Guiding Language Model Choice for Specialized Applications in Healthcare: Small or Large? Zero-Shot or Finetuned?

Why it matters: The findings are likely to affect businesses and society, and may also be picked up by general media.

Read the source (export.arxiv.org)

arXiv:2504.21191v2 Announce Type: replace-cross

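Summary: This study aims to guide language model selection by examining four questions: 1) whether finetuning is necessary compared with zero-shot use, 2) the benefits of domain-adjacent pretrained models over generic ones, 3) the value of further domain-specific pretraining, and 4) the continued relevance of Small Language Models (SLMs) relative to Large Language Models (LLMs) for specific tasks. Using electronic pathology reports from the British Columbia Cancer Registry (BCCR), the authors evaluated three classification scenarios that differ in difficulty and data size. The models comprised several SLMs and one LLM. The SLMs were evaluated both zero-shot and finetuned; the LLM was evaluated zero-shot only. Finetuning significantly improved SLM performance over the zero-shot results in every scenario. The zero-shot LLM outperformed the zero-shot SLMs but was consistently outperformed by the finetuned SLMs. After finetuning, domain-adjacent SLMs generally performed better than the generic SLM, especially on the harder tasks. Further domain-specific pretraining yielded modest gains on the easier tasks but significant improvements on the complex, data-scarce task. The results highlight the critical role of finetuning SLMs in specialized domains, which enables them to surpass zero-shot LLM performance on targeted classification tasks. Pretraining on domain-adjacent or domain-specific data provides further advantages, particularly for complex problems or when finetuning data is limited. Although LLMs offer strong zero-shot capabilities, their performance on these specific tasks did not match that of appropriately finetuned SLMs. Even in the era of LLMs, SLMs remain relevant and effective, offering a potentially superior performance-resource trade-off compared with LLMs.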
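The guidance in the summary can be sketched as a small decision helper. This is only an illustrative simplification, not the authors' method: the `Task` fields, the `recommend` function, and the returned strings are all hypothetical names introduced here, and the rules merely restate the qualitative findings reported for the three BCCR classification scenarios.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    """Hypothetical description of a classification task (illustrative fields)."""
    has_labeled_data: bool  # is any finetuning data available?
    is_complex: bool        # a "harder" task in the summary's sense
    data_is_scarce: bool    # finetuning data is limited


def recommend(task: Task) -> List[str]:
    """Return ordered suggestions that mirror the summary's qualitative findings."""
    if not task.has_labeled_data:
        # Without finetuning data, the zero-shot LLM beat the zero-shot SLMs.
        return ["zero-shot LLM"]

    # Finetuned SLMs consistently outperformed the zero-shot LLM.
    steps = ["finetune an SLM"]

    if task.is_complex or task.data_is_scarce:
        # Domain-adjacent pretraining helped most on harder tasks; further
        # domain-specific pretraining helped most on the complex, data-scarce task.
        steps.insert(0, "start from a domain-adjacent SLM")
        steps.insert(1, "consider further domain-specific pretraining")
    return steps


if __name__ == "__main__":
    for step in recommend(Task(has_labeled_data=True, is_complex=True, data_is_scarce=True)):
        print(step)
```

The thresholds for "complex" and "scarce" are deliberately left as booleans, because the summary gives no numeric cut-offs; any real selection would still need to be validated on the target data.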

Original text (English)

Title (EN): Small or Large? Zero-Shot or Finetuned? Guiding Language Model Choice for Specialized Applications in Healthcare

arXiv:2504.21191v2 Announce Type: replace-cross
Abstract: This study aims to guide language model selection by investigating: 1) the necessity of finetuning versus zero-shot usage, 2) the benefits of domain-adjacent versus generic pretrained models, 3) the value of further domain-specific pretraining, and 4) the continued relevance of Small Language Models (SLMs) compared to Large Language Models (LLMs) for specific tasks. Using electronic pathology reports from the British Columbia Cancer Registry (BCCR), three classification scenarios with varying difficulty and data size are evaluated. Models include various SLMs and an LLM. SLMs are evaluated both zero-shot and finetuned; the LLM is evaluated zero-shot only. Finetuning significantly improved SLM performance across all scenarios compared to their zero-shot results. The zero-shot LLM outperformed zero-shot SLMs but was consistently outperformed by finetuned SLMs. Domain-adjacent SLMs generally performed better than the generic SLM after finetuning, especially on harder tasks. Further domain-specific pretraining yielded modest gains on easier tasks but significant improvements on the complex, data-scarce task. The results highlight the critical role of finetuning for SLMs in specialized domains, enabling them to surpass zero-shot LLM performance on targeted classification tasks. Pretraining on domain-adjacent or domain-specific data provides further advantages, particularly for complex problems or limited finetuning data. While LLMs offer strong zero-shot capabilities, their performance on these specific tasks did not match that of appropriately finetuned SLMs. In the era of LLMs, SLMs remain relevant and effective, offering a potentially superior performance-resource trade-off compared to LLMs.

Published: 2025-09-24 19:00 UTC
