Dynamic Parameter Memory: Temporary LoRA-Enhanced LLM for Long-Sequence Emotion Recognition in Conversation

Why it matters: The work is expected to affect industry and society, and it may draw coverage in mainstream media.

Read the source (export.arxiv.org)

arXiv:2507.09076v2 Announce Type: replace-cross

Abstract: Recent research has focused on applying speech large language models (SLLMs) to improve speech emotion recognition (SER). However, the inherently high frame rate of the speech modality severely limits the signal processing and understanding capabilities of SLLMs. For example, an SLLM with a 4K context window can process only about 80 seconds of audio at a 50 Hz feature sampling rate before reaching its capacity limit. The input token compression methods used in SLLMs overlook the continuity and inertia of emotions across multiple conversation turns. This paper proposes a Dynamic Parameter Memory (DPM) mechanism with contextual semantics and sentence-level emotion encoding, which enables an SLLM with a limited context window to process audio of unlimited length. Specifically, DPM progressively encodes sentence-level information and emotions into a temporary LoRA module during inference, so that the model effectively "memorizes" the contextual information. The authors trained an emotion SLLM as a backbone and incorporated DPM into inference for emotion recognition in conversation (ERC). Experimental results on the IEMOCAP dataset show that DPM significantly improves the emotion recognition capabilities of the SLLM when processing long audio sequences, achieving state-of-the-art performance.
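
The 80-second figure in the abstract follows from simple arithmetic, which the sketch below reproduces. It is only an illustration and makes several assumptions: each feature frame costs exactly one token, no tokens are reserved for a text prompt, and "4K" is read as either 4000 or 4096 tokens. The function names and the 10-minute conversation length are hypothetical and are not taken from the paper.

```python
import math


def max_audio_seconds(context_tokens: int, frame_rate_hz: float) -> float:
    """Seconds of audio that fit in the window, assuming one token per frame."""
    return context_tokens / frame_rate_hz


def windows_needed(audio_seconds: float, context_tokens: int,
                   frame_rate_hz: float) -> int:
    """Context windows needed to cover the audio with no token compression."""
    frames = math.ceil(audio_seconds * frame_rate_hz)
    return math.ceil(frames / context_tokens)


# Reading "4K" as 4000 tokens gives the abstract's 80 seconds exactly.
print(max_audio_seconds(4000, 50))    # 80.0
# Reading "4K" as 4096 tokens gives roughly 82 seconds.
print(max_audio_seconds(4096, 50))    # 81.92
# A hypothetical 10-minute conversation (30000 frames) spans several windows.
print(windows_needed(600, 4096, 50))  # 8
```

Under these assumptions, a multi-turn conversation quickly exceeds a single window, which is the limitation the abstract gives as motivation for carrying context in a temporary LoRA module rather than in the input tokens.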

Published: 2025-09-24 19:00 UTC
