Hello everyone, we are Professor Chang’s ERSP 2023 cohort, also known as the UCSB Kangaroos. Our research topic is prompt engineering for Stable Diffusion using ChatGPT and prompt-based learning.
First, let’s go over some key terminology. Stable Diffusion is a deep learning text-to-image model that generates images from a written prompt. ChatGPT is a large language model developed by OpenAI that interacts with users through text in a conversational format.
Our key strategies are prompt-based learning and chain-of-thought reasoning. Prompt-based learning is a way to adapt a large language model to a new task through the prompt alone, without retraining or fine-tuning. Chain-of-thought reasoning, in our case, means giving ChatGPT a Stable Diffusion prompt along with an explanation of each component in that prompt.
We used Stable Diffusion to generate images with and without specific modifiers. Adding trending modifiers, such as lighting and composition terms, improves both the relevance and the aesthetic quality of the images.
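As a rough illustration of what "with and without modifiers" can look like, here is a minimal sketch that appends modifier phrases to a base prompt. The function name `add_modifiers`, the base prompt, and the modifier phrases are all made up for this example and are not the prompts used in our experiments.

```python
def add_modifiers(base_prompt, modifiers):
    """Append comma-separated modifiers to a base prompt, skipping duplicates."""
    base = base_prompt.strip()
    parts = [base]
    seen = {base.lower()}
    for modifier in modifiers:
        cleaned = modifier.strip()
        # Ignore empty strings and modifiers that were already added.
        if cleaned and cleaned.lower() not in seen:
            parts.append(cleaned)
            seen.add(cleaned.lower())
    return ", ".join(parts)


# Hypothetical base prompt and modifiers, for illustration only.
plain = "a kangaroo on a beach"
modified = add_modifiers(
    plain,
    ["golden hour lighting", "wide-angle composition", "golden hour lighting"],
)
print(plain)
print(modified)
```

Both strings would then be sent to the image model separately, so the two outputs can be compared side by side.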
Our research question is whether we can use ChatGPT to automate prompt generation and produce Stable Diffusion prompts that yield relevant and aesthetically pleasing images.
Our solution combines prompt-based learning with step-by-step reasoning to rewrite a user’s input prompt into one better suited to the Stable Diffusion model. We help ChatGPT identify the patterns and components that make up a high-quality prompt by providing input-output examples and detailed explanations.
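To make the idea of input-output examples with explanations more concrete, here is a minimal sketch of how such an instruction could be assembled as plain text before it is sent to ChatGPT. The `Example` class, the `build_instruction` function, the task description, and the example contents are assumptions made for illustration; they are not the exact instructions from our experiments, and no API call is shown.

```python
from dataclasses import dataclass


@dataclass
class Example:
    user_input: str   # the short prompt a user might type
    sd_prompt: str    # the rewritten Stable Diffusion prompt
    explanation: str  # why each component of the rewritten prompt is there


def build_instruction(task, examples, new_input):
    """Assemble a task description, worked examples, and the new input."""
    lines = [task, ""]
    for number, example in enumerate(examples, start=1):
        lines += [
            f"Example {number}",
            f"Input: {example.user_input}",
            f"Output: {example.sd_prompt}",
            f"Explanation: {example.explanation}",
            "",
        ]
    # Leave the final output blank so the model completes it.
    lines += [f"Input: {new_input}", "Output:"]
    return "\n".join(lines)


# Hypothetical task description and example, for illustration only.
task = "Rewrite the input as a detailed Stable Diffusion prompt."
examples = [
    Example(
        user_input="a castle",
        sd_prompt="a stone castle on a cliff, dramatic lighting, wide shot",
        explanation=(
            "'stone castle on a cliff' keeps the subject, "
            "'dramatic lighting' sets the mood, "
            "'wide shot' sets the composition."
        ),
    )
]
instruction = build_instruction(task, examples, "a city at night")
print(instruction)
```

Keeping the explanation next to each example is what distinguishes this from plain input-output examples: the model sees why each component was added, not only the final prompt.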
To evaluate our methods, we ran a blind test in which participants ranked the images generated by each prompt method. The results showed that the control (ChatGPT plus chain-of-thought reasoning) outperformed our methods in terms of aesthetics.
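One simple way to summarize a blind ranking test is to average the position each method receives across all ballots. The sketch below assumes that aggregation rule, which is not necessarily the one we used, and the method labels and ballots are placeholder values, not our actual results.

```python
from collections import defaultdict


def mean_ranks(ballots):
    """Return the average rank per method (1 = best) over all ballots."""
    totals = defaultdict(int)
    counts = defaultdict(int)
    for ballot in ballots:
        # Each ballot lists method labels from best to worst.
        for position, method in enumerate(ballot, start=1):
            totals[method] += position
            counts[method] += 1
    return {method: totals[method] / counts[method] for method in totals}


# Placeholder ballots with anonymous labels; NOT data from the study.
ballots = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["A", "C", "B"],
]
scores = mean_ranks(ballots)
best = min(scores, key=scores.get)  # lowest mean rank wins
print(scores)
print(best)
```

Using anonymous labels such as A, B, and C keeps the test blind: raters never see which prompt method produced which image, and the labels are mapped back to methods only after the ranks are averaged.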
From our experiments, we made three key observations. First, ChatGPT is capable of creating highly detailed prompts with aesthetically pleasing modifiers. Second, the combinations of modifiers that ChatGPT chooses do not always work well on unseen test prompts, which lowers image quality and relevance. Third, ChatGPT’s modifiers can sometimes be unrelated to the original prompt, because aesthetic modifiers are prioritized.
In conclusion, ChatGPT can create detailed and relevant Stable Diffusion prompts when given carefully written instructions. Further research could look for the task description and chain-of-thought reasoning that produce the most relevant and aesthetically pleasing images.