PromptSculptor: Multi-Agent Based Text-to-Image Prompt Optimization

Why it matters: This work is expected to affect businesses and society, and it may spread to mainstream media.

Read the source (export.arxiv.org)

arXiv:2509.12446v2 Announce Type: replace-cross

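Summary: The rapid development of generative AI has democratized access to powerful tools such as Text-to-Image (T2I) models. To generate high-quality images, however, users still have to write prompts that specify scene, style, and context in detail, and this often takes several rounds of refinement. This paper proposes PromptSculptor, a novel multi-agent framework that automates this iterative prompt optimization process. The system decomposes the task across four specialized agents, which work together to turn a short, vague user prompt into a comprehensive, refined one. Using Chain-of-Thought reasoning, the framework infers hidden context and enriches scene and background details. To refine the prompt iteratively, a self-evaluation agent checks that the modified prompt stays aligned with the original input, and a feedback-tuning agent incorporates user feedback for further refinement. Experimental results show that PromptSculptor significantly improves the quality of generated outputs and reduces the number of iterations needed to reach user satisfaction. Its model-agnostic design also allows seamless integration with various T2I models, paving the way for industrial applications.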

Original text (English)

Title (EN): PromptSculptor: Multi-Agent Based Text-to-Image Prompt Optimization

arXiv:2509.12446v2 Announce Type: replace-cross
Abstract: The rapid advancement of generative AI has democratized access to powerful tools such as Text-to-Image models. However, to generate high-quality images, users must still craft detailed prompts specifying scene, style, and context, often through multiple rounds of refinement. We propose PromptSculptor, a novel multi-agent framework that automates this iterative prompt optimization process. Our system decomposes the task into four specialized agents that work collaboratively to transform a short, vague user prompt into a comprehensive, refined prompt. By leveraging Chain-of-Thought reasoning, our framework effectively infers hidden context and enriches scene and background details. To iteratively refine the prompt, a self-evaluation agent aligns the modified prompt with the original input, while a feedback-tuning agent incorporates user feedback for further refinement. Experimental results demonstrate that PromptSculptor significantly enhances output quality and reduces the number of iterations needed for user satisfaction. Moreover, its model-agnostic design allows seamless integration with various T2I models, paving the way for industrial applications.
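
To make the four-agent flow described in the abstract more concrete, here is a minimal Python sketch. It is not the authors' implementation. The abstract names only the self-evaluation and feedback-tuning agents, so the roles "intent-inference" and "scene-enrichment", the instruction strings, the `PromptState` class, and the `refine` and `run_agent` functions are all assumptions made for illustration. The language model is passed in as a plain callable, which loosely mirrors the model-agnostic design the abstract mentions; `echo_llm` is a toy stand-in so the example runs without any external service.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional

# Hypothetical stand-in for any text-generation backend: prompt in, text out.
LLM = Callable[[str], str]


@dataclass
class PromptState:
    original: str                                      # short user prompt, never modified
    current: str                                       # prompt being refined
    history: List[str] = field(default_factory=list)   # roles applied so far


def run_agent(llm: LLM, role: str, instruction: str, state: PromptState) -> PromptState:
    """Ask the backend to rewrite the current prompt under one agent's instruction."""
    request = (
        f"Role: {role}\n"
        f"Instruction: {instruction}\n"
        f"Original prompt: {state.original}\n"
        f"Current prompt: {state.current}\n"
    )
    state.current = llm(request).strip()
    state.history.append(role)
    return state


def refine(llm: LLM, user_prompt: str, feedback: Optional[str] = None) -> PromptState:
    """Run the agents in sequence; the first two role names are assumptions."""
    state = PromptState(original=user_prompt, current=user_prompt)
    run_agent(llm, "intent-inference",
              "Think step by step about the hidden context the user likely "
              "intends, then restate the prompt with that context explicit.", state)
    run_agent(llm, "scene-enrichment",
              "Add concrete scene, style, and background details.", state)
    run_agent(llm, "self-evaluation",
              "Compare the current prompt with the original prompt and rewrite "
              "it so that it stays aligned with the original.", state)
    if feedback:  # the feedback-tuning step only runs when the user supplies feedback
        run_agent(llm, "feedback-tuning",
                  f"Revise the prompt to address this user feedback: {feedback}", state)
    return state


def echo_llm(request: str) -> str:
    """Toy backend for demonstration only: tags the current prompt with the role."""
    fields = dict(line.split(": ", 1) for line in request.splitlines() if ": " in line)
    return f"{fields['Current prompt']} [{fields['Role']}]"


if __name__ == "__main__":
    result = refine(echo_llm, "a cat", feedback="warmer light")
    print(result.history)
    print(result.current)
```

Replacing `echo_llm` with a function that calls a real model is the only change needed to try the flow on actual prompts. The refined text in `state.current` could then be sent to whichever T2I model is in use.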

Published: 2025-09-24 19:00 UTC
