Large Model Based Crossmodal Chinese Poetry Creation

Oct 4, 2024

L. Yang

Equal contribution

张志东

Equal contribution

K. Niu

S. Pan

W. Zhu

C. Ma

System Structure

Abstract

Generating Chinese poetry is a complex task with significant potential for large models. However, most current systems support only a single input modality, and their output lacks interpretability. This paper proposes a large-model-based system that supports cross-modal input of text and images, provides interpretable annotations for generated Chinese poems, and supports multiple rounds of iterative optimization. First, it analyzes images with CLIP and MiniGPT-4 and uses ERNIE-4.0 to generate descriptive text from that analysis. Then, it uses ERNIE-4.0 to generate classical Chinese poems from the input text and the descriptive text, guided by prompts we designed around the CRISPE framework. Finally, it evaluates and then optimizes the generated poems with few-shot prompts. Preliminary evaluations validate our poetry scoring criteria and show that the system performs better when text and images are combined as cross-modal input.
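
The generate, evaluate, and optimize loop described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the prompt wording, the `CrispePrompt` class, the `build_poem_prompt` and `create_poem` helpers, the score threshold, and the round count are all hypothetical. The model calls are passed in as plain callables (`generate`, `score`) so that no specific ERNIE-4.0, CLIP, or MiniGPT-4 API is assumed. The five prompt fields follow the commonly cited CRISPE breakdown (Capacity and Role, Insight, Statement, Personality, Experiment).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CrispePrompt:
    """Hypothetical container for the five CRISPE prompt fields."""
    capacity_role: str
    insight: str
    statement: str
    personality: str
    experiment: str

    def render(self) -> str:
        parts = [
            ("Capacity and Role", self.capacity_role),
            ("Insight", self.insight),
            ("Statement", self.statement),
            ("Personality", self.personality),
            ("Experiment", self.experiment),
        ]
        return "\n".join(f"{name}: {text}" for name, text in parts)


def build_poem_prompt(user_text: str, image_description: Optional[str] = None) -> str:
    """Combine the user's text with an optional image description (cross-modal input)."""
    sources = [user_text]
    if image_description:
        sources.append(image_description)
    return CrispePrompt(
        capacity_role="You are a poet versed in classical Chinese poetry.",
        insight="Source material: " + " | ".join(sources),
        statement="Write one classical Chinese poem and annotate each line.",
        personality="Use a restrained, classical register.",
        experiment="Return a single poem followed by its annotations.",
    ).render()


def create_poem(
    user_text: str,
    image_description: Optional[str],
    generate: Callable[[str], str],   # assumed wrapper around a text model
    score: Callable[[str], float],    # assumed wrapper around a poem scorer
    rounds: int = 2,                  # illustrative value, not from the paper
    threshold: float = 8.0,           # illustrative value, not from the paper
) -> str:
    """Generate a poem, then revise it until the score passes or rounds run out."""
    poem = generate(build_poem_prompt(user_text, image_description))
    for _ in range(rounds):
        if score(poem) >= threshold:
            break
        poem = generate("Revise this poem to improve it:\n" + poem)
    return poem
```

Injecting `generate` and `score` keeps the control flow testable with stub functions and leaves the choice of model client open.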

Type

Conference paper

Publication

In 2024 IEEE Smart World Congress (SWC)