Large Model Based Crossmodal Chinese Poetry Creation

Oct 4, 2024 · L. Yang (equal contribution), 张志东 (equal contribution), K. Niu, S. Pan, W. Zhu, C. Ma
System Structure
Abstract
Generating Chinese poetry is a complex task for which large models show significant potential. However, most current systems support only a single input modality, and their output lacks interpretability. This paper proposes a large-model-based system that supports cross-modal input of text and images, provides interpretable annotations for the generated Chinese poems, and supports multiple rounds of iterative optimization. First, it analyzes images with CLIP and MiniGPT-4 and generates descriptive text from the analysis with ERNIE-4.0. Then, it generates ancient Chinese poems from the input text and the descriptive text with ERNIE-4.0, using prompts we devised based on CRISPE. Finally, it evaluates and then optimizes the created poems with few-shot prompts. Preliminary evaluations have validated the efficacy of our poetry scoring criteria and demonstrated the superior performance of the system when text and imagery are combined as cross-modal inputs.
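The three stages described in the abstract can be pictured as a small pipeline. The following is a minimal sketch under stated assumptions, not the paper's implementation: `analyze_image` and `llm` are hypothetical callables standing in for the CLIP/MiniGPT-4 analysis and the ERNIE-4.0 calls, the prompt wording is illustrative, and the CRISPE slots follow the commonly cited expansion (Capacity and Role, Insight, Statement, Personality, Experiment), which may differ from the prompts the authors devised.

```python
from typing import Callable, Optional

# Hypothetical stand-ins: any text-in/text-out model call (e.g. ERNIE-4.0)
# and any image analyzer (e.g. CLIP + MiniGPT-4). Neither is a real API.
LLM = Callable[[str], str]
ImageAnalyzer = Callable[[bytes], str]

# Placeholder only; scored example poems would be supplied by the user.
FEW_SHOT_EXAMPLES = "(scored example poems go here)"


def build_crispe_prompt(theme: str, description: str) -> str:
    """Assemble a prompt along five CRISPE slots (illustrative wording)."""
    parts = [
        ("Capacity and Role", "You are a poet versed in ancient Chinese poetry."),
        ("Insight", f"User theme: {theme}. Image description: {description or 'none'}."),
        ("Statement", "Write one ancient-style Chinese poem on this theme."),
        ("Personality", "Use a refined, classical register."),
        ("Experiment", "Annotate each line with a short explanation."),
    ]
    return "\n".join(f"{name}: {text}" for name, text in parts)


def create_poem(
    text: str,
    image: Optional[bytes],
    analyze_image: ImageAnalyzer,
    llm: LLM,
    rounds: int = 2,
) -> str:
    """Run the analyze -> generate -> evaluate/optimize loop."""
    # Stage 1: turn the image analysis into descriptive text (skipped if no image).
    description = ""
    if image is not None:
        analysis = analyze_image(image)
        description = llm(f"Describe the scene based on this analysis: {analysis}")

    # Stage 2: generate the poem from the input text plus the description.
    poem = llm(build_crispe_prompt(text, description))

    # Stage 3: evaluate with a few-shot prompt, then revise; repeat per round.
    for _ in range(rounds):
        feedback = llm(
            "Score this poem and list its weaknesses, following the examples.\n"
            f"{FEW_SHOT_EXAMPLES}\nPoem: {poem}"
        )
        poem = llm(f"Revise the poem using this feedback.\nFeedback: {feedback}\nPoem: {poem}")
    return poem
```

Passing the model and the image analyzer in as callables keeps the sketch independent of any particular SDK; with an image and `rounds=2`, it makes six model calls in total (one description, one generation, and two evaluate/revise pairs).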
Type
Publication
In 2024 IEEE Smart World Congress (SWC)
Authors
张志东 (he/him)
Master Student