ANDREI KOVALEV'S
MIDJOURNEY
GUIDE

IMAGE-TO-TEXT
CREATIVE TECHNIQUE
FOR AI ARTISTS
FEB 18 2023
"I am always doing that which I cannot do, in order that I may learn how to do it." - Pablo Picasso
If you read my previous guides, you know my prompting strategy is highly minimalist. I keep my prompts as simple as possible. And when going for a complex one, I build it from smaller parts that I know work on their own. This gives me control over the outcome and makes it repeatable.
lonely hero in field by Hayao Miyazaki --v 4
far dieselpunk megastructure by Hayao Miyazaki
crepuscular rays by Hayao Miyazaki
lonely hero in field with far dieselpunk megastructure and crepuscular rays by Hayao Miyazaki
The idea is to check that Midjourney can interpret specific parts of the final complex prompt on their own in a given style.
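The build-up above can be sketched in a few lines of Python. This is only an illustration of the habit of testing each part before combining them. The helper names `part_prompts` and `combined_prompt` are made up for this sketch, and nothing here talks to Midjourney: it only prints the prompts to try.

```python
def part_prompts(parts, style, params=""):
    """Return one stand-alone prompt per part, each in the same style."""
    suffix = f" {params}" if params else ""
    return [f"{part} by {style}{suffix}" for part in parts]


def combined_prompt(subject, extras, style, params=""):
    """Join a subject and extra parts: 'subject with a and b by style'."""
    suffix = f" {params}" if params else ""
    if extras:
        return f"{subject} with {' and '.join(extras)} by {style}{suffix}"
    return f"{subject} by {style}{suffix}"


parts = ["lonely hero in field", "far dieselpunk megastructure", "crepuscular rays"]
style = "Hayao Miyazaki"

# Step 1: the small prompts to try one by one.
for prompt in part_prompts(parts, style):
    print(prompt)

# Step 2: the complex prompt, built only from parts that worked on their own.
print(combined_prompt(parts[0], parts[1:], style))
```

If one of the small prompts fails in the chosen style, that part is the one to rephrase or drop before building the complex prompt.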
However, in this study, the prompts used to achieve the final results are the exact opposite of my approach. They are wordy, seem half-random, and some parts don't make sense—even as stand-alone prompts.
Image-to-text AI as creative tool for AI artists
a couple of people laying on top of a lush green field, by Elsa Bleda, magic realism, anya taylor - joy and emma stone, wide high angle view, barefeet in grass, marat zakirov, concert, two girls, hammershøi, yelena belova, midair, benjamin vnuk, alexi zaitsev, innocence --ar 3:2 --v 4
The method we will dive into today can be such a fantastic creative tool that I, for once, will step away from my cozy and controlled prompt strategy. (≧∀≦)

IMAGE TO TEXT TO IMAGE

A cover photo I made for RBC magazine some years ago. The previous image is its image-to-text interpretation made by CLIP Interrogator 2.
In a nutshell, CLIP Interrogator 2 (available to play with for free on Hugging Face) analyzes an image and composes a prompt based on what it "saw."
[ORIGINAL IMAGE]
movie poster about deadly atompunk Archbishop. Hand-colored photographic print --ar 2:3 --test --creative
[INTERPRETATION]
a black and white photo of a man with red smoke coming out of his head, inspired by Matthias Stom, nuclear art, priest, bbc promotional artwork, looking furious, butcher billy, still image from tv series, le corbusier, sacrament, full color still, ffffound, a bald, fellini almodovar john waters, 2 0 s --ar 2:3
It's difficult to believe, but despite the inflated and often absurd prompts, Midjourney's interpretation of them is stunningly close to the original. It seems like an AI knows how to talk to another AI... |。_。|
[ORIGINAL IMAGE]
Star Wars characters by Tsutomu Nihei and Hilma af Klint --ar 16:9
[INTERPRETATION]
a couple of women standing next to each other, by Petros Afshar, afrofuturism, starwars, three women, sakimichan frank franzzeta, oilpunk, japanese pop surrealism, edgar maxence, trio, norman rockwell ross tran, beautiful art uhd 4 k, !5 three eyed goddesses, tom bagshaw style --ar 16:9
CLIP Interrogator's primary goal is Stable Diffusion prompts. However, the experience of using its prompts in Midjourney is simply fantastic! It gives you a whole new perspective on creating art in MJ and launches a creative journey to chaos, full of unexpected insights and mind-blowing visuals.
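CLIP Interrogator does not output Midjourney parameters, so something like --ar has to be added before the text goes to Midjourney. Below is a minimal sketch of that step, assuming you want the ratio to follow the source image. The function names and the pixel sizes are hypothetical, and in practice you are free to pick any ratio you like.

```python
from math import gcd


def aspect_ratio(width, height):
    """Reduce pixel dimensions to the smallest whole-number ratio."""
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


def to_midjourney_prompt(interrogator_text, width, height):
    """Tidy the interrogator's text and append an --ar parameter."""
    # Collapse stray whitespace and line breaks left over from copy-pasting.
    text = " ".join(interrogator_text.split())
    return f"{text} --ar {aspect_ratio(width, height)}"


# Hypothetical 1024x1536 source image.
print(to_midjourney_prompt(
    "a close up of a skull on a colorful background, digital art",
    1024,
    1536,
))
```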
[ORIGINAL IMAGE]
necromonk. Chemigram. Acid colors --ar 4:5 --test --creative
[INTERPRETATION]
a close up of a skull on a colorful background, digital art, by Aleksander Kotsis, psychedelic art, liquid fire, yellow infrared, in style of beeple, 70s psychedelic style, whirling death, colored analog photography, voodoo --ar 2:3
[ORIGINAL PHOTO]
patron of the Northern nature by Wangechi Mutu --ar 2:3
[INTERPRETATION]
a painting of a woman with antlers on her head, inspired by Yoann Lossel, sylvain sarrailh and igor morski, beeple and james jean, afrofuturist, high detailed), beautiful detail and color, behance. high detail, she has a crown of dried flowers, dark skin, tom bagshaw inspired, portrait of female humanoid, wearing a crown made of antlers --ar 2:3
CLIP Interrogator has a very straightforward interface, works fast (each sample in this study took between 30 and 120 seconds to interpret), and has only four settings.
Different modes render different results, and each is worth experimenting with. For this study, however, I primarily used the default Best mode.
[ORIGINAL IMAGE]
cat wanderer wearing long coat in Istanbul by Ando Fuchs --ar 2:3 --test --creative
[BEST MODE at 16 flavors]
a person wearing a red coat and a black cat mask, a picture, by Maciej Kuciara, shutterstock, digital art, he is wearing a trenchcoat, cinematic outfit photo, red and brown color scheme, alucard, new cats movie, stock photo --ar 2:3
[CLASSIC MODE]
a person wearing a red coat and a black cat mask, a picture by Maciej Kuciara, shutterstock, digital art, he is wearing a trenchcoat, superhero with a cat head, anthropomorphic cat --ar 2:3
[FAST MODE]
a person wearing a red coat and a black cat mask, he is wearing a trenchcoat, superhero with a cat head, anthropomorphic cat, blacksad, catman, shadowy and eerie character, mysterious man, the cat is walking, by Maciej Kuciara, by maxim verehin, sergey zabelin, cat the assassin --ar 2:3
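To see what the three modes agree on, you can split each result into its comma-separated phrases and keep the ones that appear in every result. This is a rough sketch of that comparison, not part of CLIP Interrogator itself. The helper names are made up, and the matching is exact, so "a picture, by Maciej Kuciara" and "a picture by Maciej Kuciara" count as different phrases.

```python
def phrases(prompt):
    """Split a prompt into comma-separated phrases, dropping '--' parameters."""
    text = prompt.split(" --")[0]
    return [p.strip() for p in text.split(",") if p.strip()]


def shared_phrases(*prompts):
    """Phrases present in every prompt, in the order of the first prompt."""
    common = set(phrases(prompts[0]))
    for prompt in prompts[1:]:
        common &= set(phrases(prompt))
    return [p for p in phrases(prompts[0]) if p in common]


best = ("a person wearing a red coat and a black cat mask, a picture, "
        "by Maciej Kuciara, shutterstock, digital art, he is wearing a trenchcoat, "
        "cinematic outfit photo, red and brown color scheme, alucard, "
        "new cats movie, stock photo --ar 2:3")
classic = ("a person wearing a red coat and a black cat mask, "
           "a picture by Maciej Kuciara, shutterstock, digital art, "
           "he is wearing a trenchcoat, superhero with a cat head, "
           "anthropomorphic cat --ar 2:3")
fast = ("a person wearing a red coat and a black cat mask, "
        "he is wearing a trenchcoat, superhero with a cat head, "
        "anthropomorphic cat, blacksad, catman, shadowy and eerie character, "
        "mysterious man, the cat is walking, by Maciej Kuciara, "
        "by maxim verehin, sergey zabelin, cat the assassin --ar 2:3")

# The core description that all three modes produced.
print(shared_phrases(best, classic, fast))
```

For this sample, the shared core is the opening caption and the trenchcoat; everything else is where the modes diverge, and where the experimenting starts.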
[ORIGINAL IMAGE]
Fairy Kei samurai by Erwin Olaf
[INTERPRETATION]
a woman with pink hair wearing a helmet, inspired by Kanō Naizen, hyperrealism, chinese armor, trending on juxtapoz magazine, elaborate costume, pale and coloured kimono, award winning color photo, mikado, portrait of a warrior, portrait of a japanese teen
[ORIGINAL PHOTO]
Goddess of Floral Decay by James C. Christensen photographed by Tim Walker
[INTERPRETATION]
a close up of a statue of a woman surrounded by flowers, digital art, inspired by james christensen, rococo, photography alexey gurylev, crown of blue flowers, clothes made out of flower, agostino arrivabene, jean-sebastien rossbach, portrait of virgin mary, dressed in ornate, shaxi

NEW PERSPECTIVE

I started the previous chapter with an example from my photographic portfolio. Of course, it's not the only one. :) For visual artists with a portfolio of past works, CLIP Interrogator is an excellent instrument to re-imagine their work and look at it with fresh eyes.

Naturally, I couldn't help but start by offering CLIP Interrogator the portrait of Francis D. for interpretation.
[ORIGINAL PHOTO]
Francis D.
[INTERPRETATION]
a man with a serious look on his face, a character portrait, by Jan Tengnagel, flickr, studio medium format photograph, mick jagger, tony hawk, portrait mode photo, medium format. soft light, gemmy woud - binendijk, neil breen face, candid portrait photograph, ribbed man face portrait, promo photo --ar 2:3
[ORIGINAL PHOTO]
Patrice Piveteau from Cognac Frapin
[INTERPRETATION]
a man standing in a field holding a stick, a portrait, by Matthias Stom, flickr, man with glasses, hugh kretschmer, destroying houses, black fork, production still, portrait of bald, tf 1, in garden, neuromancy, press shot, american gothic, 2014, medieval, adrien girod, 8 0 0 mm, promotional image --ar 2:3
[ORIGINAL IMAGE]
Danish performance artist Christian Falsnaes for Baku magazine
[INTERPRETATION]
a man sitting on a chair in a room, a character portrait, inspired by Nabil Kanso, conceptual art, principal set photography, standing in front of a mirror, eliran kantor, with professional lighting, portrait of mr bean, high res photo, hana alisa omer, in a studio, hq --ar 3:2
[ORIGINAL PHOTO]
Polar Stalker. A photograph from my 2018 photography course
[INTERPRETATION]
a man sitting on a pile of luggage in the woods, a portrait, by Kurt Roesch, old man doing with mask, smoky laboratory, tools and junk on the ground, red fog, santa, 8k octan advertising photo, from the borderlands series, glenn barr, yulia nevskaya, marmoset toolbag, viral photo --ar 3:2
[ORIGINAL PHOTO]
Promo for DiveBar Tbilisi
[INTERPRETATION]
a woman standing in front of a table filled with candles, inspired by Elsa Bleda, light bulbs, gardening, cinematic industrial lighting, woman made of plants, a wide full shot, with professional lighting, advert, charli bowater --ar 3:2
Apart from looking at your own work through different optics, I can easily imagine image-to-text helping artists overcome creative blocks. How about not just getting a prompt from an image but developing a whole story from one picture?

IMAGE TO TEXT TO STORY TO IMAGE

Undoubtedly, you've heard about ChatGPT. So I'll skip the introduction and cut to the chase: I used it to generate a sensible short story from the CLIP Interrogator's results. Here is what I asked:
Please, rewrite the following gibberish into a sensible text. Use all the words from the original.
And this is what happened:
[ORIGINAL IMAGE]
Anime squad of most incredible creatures by Hayao Miyazaki and Josef Sudek --ar 16:9 --v 4
[CLIP INTERROGATOR INTERPRETATION]
a group of cartoon characters sitting around a table, inspired by Jean Tabaud, anime movie screenshot, anthropomorphic rabbit, dark sci - fi movie scene, tea ceremony scene, 3 0 s, high quality image, anime elon musk, junji ito 4 k, an epic fantastic realism, captura, deco, in claymation, high quality anime --ar 16:9
[CHATGPT RE-INTERPRETATION]
A group of anthropomorphic rabbits, depicted as cartoon characters, are captured in a high quality image inspired by the works of Jean Tabaud. The image shows the characters sitting around a table, participating in a tea ceremony scene from an anime movie screenshot set in a dark sci-fi environment. The image is a stunning example of epic fantastic realism, emphasized by its high quality 4K technology. The style is reminiscent of the works of Junji Ito and showcases elements from anime and claymation. The overall aesthetic is characterized by captivating deco elements, making it a top-notch anime piece --ar 16:9
[ORIGINAL IMAGE]
closeup portrait of biopunk asian female floral cyborg, cyberpunk face made of plants. By Piero della Francesca --ar 2:3 --test --creative
[CLIP INTERROGATOR INTERPRETATION]
a painting of a woman surrounded by flowers, a surrealist painting, by Ikuo Hirayama, naotto hattori, detailed face with mask, draped in rich green and pink, gong li, ayami kojima and yoshitaka amano, alexey egorov, stems, (collage), hiromasa ogura, yang qi --ar 2:3
[CHATGPT RE-INTERPRETATION]
A surrealist painting showcases a woman, surrounded by blooming flowers. This masterpiece was crafted by the talented Ikuo Hirayama and is reminiscent of the works of Naotto Hattori. The subject's detailed face is partially obscured by a mask, and she is draped in a garment of rich green and pink. The painting draws inspiration from the works of Gong Li, Ayami Kojima, and Yoshitaka Amano, as well as Alexey Egorov's style. The woman is surrounded by stems and the piece has a collage-like quality, reminiscent of Hiromasa Ogura's style. The artwork is a collaboration with Yang Qi. --ar 2:3
But how about we remove the limitation and let ChatGPT run wild(er)?
Please, rewrite the following gibberish into a short story (think of it as a movie still). Use all the words from the original.
[ORIGINAL IMAGE]
gigantic robotic biopunk cat walking through city --ar 2:3
[CLIP INTERROGATOR INTERPRETATION]
a man walking down a street past a giant robot cat, trending on Artstation, street art, hr giger. 8 k, modern street art, anthropomorphic female cat, absolutely outstanding image, in the city, abduzeedo, blue cyborg, full shot photo --ar 2:3
[CHATGPT RE-INTERPRETATION]
In the bustling city, a man walked down a neon-lit street past a giant robot cat, a marvel of modern street art that had been trending on Artstation. The anthropomorphic female cat loomed over him, her metallic body shining in the flickering lights. The street was filled with other pieces of street art, but nothing could compare to the absolutely outstanding image of the robot cat, reminiscent of the twisted creations of HR Giger. As the man passed by, he couldn't help but snap a full shot photo of the blue cyborg, determined to show it off to his friends back home. Little did he know, the robot cat was more than just a piece of art - it was a powerful machine, capable of abducting unsuspecting victims and taking them to an unknown destination. As the man walked on, he was oblivious to the danger lurking in the shadows, and the looming threat of the robot cat --ar 2:3
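The two requests above follow the same routine: take the CLIP Interrogator text, put an instruction in front of it, and later attach the Midjourney parameters to whatever comes back. Here is a minimal sketch of that routine. It assumes the parameters are set aside before the text goes to ChatGPT, which is my choice for this sketch rather than a requirement. All function names are hypothetical, and sending the message to ChatGPT is left out: you can simply paste it into the chat.

```python
REWRITE_AS_TEXT = (
    "Please, rewrite the following gibberish into a sensible text. "
    "Use all the words from the original."
)
REWRITE_AS_STORY = (
    "Please, rewrite the following gibberish into a short story "
    "(think of it as a movie still). Use all the words from the original."
)


def split_params(prompt):
    """Separate the descriptive text from trailing '--' parameters."""
    text, sep, params = prompt.partition(" --")
    return text.strip(), (sep + params).strip()


def build_request(instruction, prompt):
    """Return (message for the chat, parameters to re-attach later)."""
    text, params = split_params(prompt)
    return f"{instruction}\n\n{text}", params


def back_to_prompt(rewritten_text, params):
    """Re-attach the saved parameters to the rewritten text."""
    return f"{rewritten_text.strip()} {params}".strip()


message, params = build_request(
    REWRITE_AS_STORY,
    "a man walking down a street past a giant robot cat, "
    "trending on Artstation, street art, full shot photo --ar 2:3",
)
print(message)  # paste this into the chat
print(params)   # keep this for the final Midjourney prompt
```

Swapping `REWRITE_AS_STORY` for `REWRITE_AS_TEXT` gives the stricter first request; the rest of the routine stays the same.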

CONCLUSION

Image-to-text AI tools can be great for analyzing other people's generations to figure out their prompts, overcoming a creative block, or finding a new perspective on your past work. And, for sure, for having pure artistic fun!
[ORIGINAL IMAGE]
Five generations of Georgian women. Personal project
[INTERPRETATION]
a black and white photo of a group of women, an album cover, by irakli nadar, victorian setting, all from the group nct, dressed in black, liepke, victorian inspired clothing, alexi zaitsev, family photo, ji-min, group of seven, exclusive, slide show, living, fashion, orthodox, iu, 8 --ar 3:2
Happy midjourneys!

— Andrei