How Model Size, Temperature, and Prompt Style Affect LLM-Human Assessment Score Alignment

Why it matters: The findings are likely to affect businesses and society, and may draw coverage in mainstream media.

Read the source (export.arxiv.org)

arXiv:2509.19329v1 (announcement type: new)

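Summary: This study examined how model size, temperature, and prompt style affect the alignment of large language models (LLMs) within a single model, between models, and with human raters when assessing clinical reasoning skills. Model size emerged as a key factor in LLM-human score alignment. The study highlights the importance of checking alignment at multiple levels.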

Title (EN): How Model Size, Temperature, and Prompt Style Affect LLM-Human Assessment Score Alignment

arXiv:2509.19329v1 Announce Type: new
Abstract: We examined how model size, temperature, and prompt style affect Large Language Models’ (LLMs) alignment within itself, between models, and with human in assessing clinical reasoning skills. Model size emerged as a key factor in LLM-human score alignment. Study highlights the importance of checking alignments across multiple levels.

Published: 2025-09-24 19:00 UTC
