Representation Convergence: Mutual Distillation is Secretly a Form of Regularization

Why it matters: The work is expected to have an impact on industry and society, and it may also reach mainstream media coverage.

Source: export.arxiv.org

Title (EN): Representation Convergence: Mutual Distillation is Secretly a Form of Regularization

arXiv:2501.02481v5 Announce Type: replace-cross
Abstract: In this paper, we argue that mutual distillation between reinforcement learning policies serves as an implicit regularization, preventing them from overfitting to irrelevant features. We highlight two separate contributions: (i) Theoretically, for the first time, we prove that enhancing the policy robustness to irrelevant features leads to improved generalization performance. (ii) Empirically, we demonstrate that mutual distillation between policies contributes to such robustness, enabling the spontaneous emergence of invariant representations over pixel inputs. Ultimately, we do not claim to achieve state-of-the-art performance but rather focus on uncovering the underlying principles of generalization and deepening our understanding of its mechanisms.

Published: 2025-09-24 19:00 UTC
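The abstract does not specify the training objective, so the following is only a rough illustration of what "mutual distillation between policies" commonly looks like, not the paper's method. It assumes two policies with a discrete action space, each adding a KL-divergence term that pulls its action distribution toward its peer's on the same batch of observations. The function names, the weight `alpha`, and the choice of KL divergence are all assumptions made for this sketch.

```python
import numpy as np


def softmax(logits: np.ndarray) -> np.ndarray:
    """Row-wise softmax over action logits of shape (batch, n_actions)."""
    z = logits - logits.max(axis=-1, keepdims=True)  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def kl(p: np.ndarray, q: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Per-sample KL(p || q) between categorical distributions."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)


def mutual_distillation_losses(logits_a: np.ndarray,
                               logits_b: np.ndarray,
                               alpha: float = 0.5) -> tuple[float, float]:
    """Hypothetical distillation terms added to each policy's own RL loss.

    Policy A is pulled toward B's action distribution and vice versa.
    In an autodiff framework the peer's distribution would typically be
    treated as a fixed target (stop-gradient); here we only compute values.
    """
    pa, pb = softmax(logits_a), softmax(logits_b)
    loss_a = float(alpha * kl(pb, pa).mean())  # A distills from B
    loss_b = float(alpha * kl(pa, pb).mean())  # B distills from A
    return loss_a, loss_b


if __name__ == "__main__":
    la = np.array([[1.0, 2.0, 0.5]])
    lb = np.array([[0.0, 0.0, 3.0]])
    print(mutual_distillation_losses(la, la.copy()))  # identical policies: zero loss
    print(mutual_distillation_losses(la, lb))         # disagreeing policies: positive loss
```

The penalty vanishes only when the two policies agree. This is one way to read the regularization claim: each policy is discouraged from relying on decision cues that its peer does not share. That reading is an interpretation offered for this sketch, not a statement from the paper.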