Exploring Midjourney: Image Diffusion, Effective Prompts, and Grok 2

Aug 21, 2024

Introduction

This week, I began exploring some of the AI imagery platforms. As a designer, you might think I’m a bit late to the party since these tools have been around for a while. However, I never really took the time to sit down and explore how they work and what you can create with them. The arrival of Grok 2 from x.ai also got me quite interested.

As it turns out, these platforms, like most AI tools, are constantly changing and improving, so no matter when you decide to get stuck in, you can take comfort in knowing that everyone is a newcomer!

As I’ve been diving into this, I’ve spent hours going down rabbit holes with images. Interestingly, as I created and edited them, I found myself feeling strangely attached to the results—almost proud of them. It’s an odd feeling considering all I’ve done is describe what I want through a series of prompts. I’ve gone from frustration to amazement, and as I’ve delved deeper into it, I’ve realised there’s an art and science to crafting prompts. I discuss this a bit more below. Once you start to understand how to curate and refine your prompts, you’ll be amazed at the results.

One thing I noticed at the beginning was that my imagination seemed to disappear, but as I got more into it, crazier ideas started popping into my head. I hope this post inspires you to start exploring what you can create with these tools.

Midjourney

I was planning to do a little review of each of the imagery platforms, but there’s so much to learn that I decided to focus on Midjourney. The key thing to remember is that they all work in a similar way using an image diffusion process (explained in the video below), but each platform has its own style. For example, if you typed the same text prompt into different platforms, the styles would be slightly different, and that’s because of the following:

Diffusion Process: They all have slightly different diffusion techniques, which involve starting with random noise and gradually refining it into a coherent image through multiple iterations.
Training Data: They are trained on different datasets of text-image pairs, enabling them to understand and generate images based on textual descriptions.
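To make the "start with noise and refine it" idea a bit more concrete, here is a toy Python sketch. It is only an illustration under heavy assumptions: a real diffusion model uses a trained neural network to predict what noise to remove, whereas this sketch cheats by nudging towards a known target list of numbers (standing in for pixel values). The function name toy_denoise and its parameters are my own, not part of any platform.

```python
import random

def toy_denoise(target, steps=50, seed=0):
    """Toy illustration: start from random noise and refine it in small steps.

    `target` stands in for what a trained model has learned the image
    should look like. A real model never sees the answer directly.
    """
    rng = random.Random(seed)
    # Step 0: pure random noise, unrelated to the target.
    image = [rng.gauss(0.0, 1.0) for _ in target]
    errors = []
    for step in range(steps):
        # Close an equal share of the remaining gap at every step.
        blend = 1.0 / (steps - step)
        image = [x + blend * (t - x) for x, t in zip(image, target)]
        # Track how far the picture still is from the target.
        errors.append(sum(abs(x - t) for x, t in zip(image, target)) / len(target))
    return image, errors

final_image, errors = toy_denoise([0.0, 0.5, 1.0], steps=20, seed=1)
print("error after first step:", errors[0])
print("error after last step:", errors[-1])
```

The error shrinks at every step, which is the rough intuition behind an image slowly emerging from static.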

The video below explains how image diffusion works. To be honest, it took a while for it all to sink in. Fortunately, I found a fantastic video from a guy called Derrick Schultz from Artificial Images where he explains it really well. I highly recommend watching it, as understanding the process and the importance of prompts is crucial.

Prompts

When I say prompts, I mean the short text phrases that you type into imagery tools like Midjourney, which then interpret them to produce an image. A prompt can be as simple as a single word, a phrase, or even an emoji. The tools break down the words and phrases in a prompt into smaller pieces, called tokens, that can be compared to their training data and then used to generate an image.

To write effective prompts you have to be specific and concise, use descriptive language, avoid ambiguity, and experiment with different prompts. You can also use negative keywords to exclude certain elements from your image.
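As a rough sketch of how these pieces fit together, here is a small Python helper that assembles a prompt from a subject, some details, a style, and a list of things to exclude. The helper build_prompt is hypothetical (it is not part of Midjourney), and it assumes Midjourney's --no parameter for negative keywords, so check the current documentation before relying on it.

```python
def build_prompt(subject, details=(), style="", exclude=()):
    """Assemble a prompt string: subject, details, style, then exclusions."""
    parts = [subject, *details]
    if style:
        parts.append(style)
    # Join the descriptive parts with commas, skipping any empty entries.
    prompt = ", ".join(p.strip() for p in parts if p.strip())
    if exclude:
        # Assumption: negative keywords are passed with Midjourney's --no parameter.
        prompt += " --no " + ", ".join(exclude)
    return prompt

print(build_prompt(
    "A giraffe scuba diving",
    details=["in a deep dark ocean", "a shoal of fish circling it"],
    style="photo realistic",
    exclude=["people", "text"],
))
# prints: A giraffe scuba diving, in a deep dark ocean, a shoal of fish circling it, photo realistic --no people, text
```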

Below I’ve shown the basic elements of writing good prompts. It can get much more complicated, but I’ve found that by following this structure you can create some pretty cool imagery.

The image off the back of this prompt…

I’ve listed some helpful resources for building your prompts below. However, one of the easiest methods is to describe what you want in a tool like ChatGPT and ask it to create a well-crafted prompt for Midjourney.

Botcreative

This site helps you find a style to reference in your prompts, and it also helps with inspiration.

Promptomania

This tool gives you a helping hand creating prompts and also allows you to add weighting, meaning you can give parts of your prompt more or less weight to make them more or less important.
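For a sense of what weighting looks like in the prompt itself, here is a minimal Python sketch. It assumes Midjourney's multi-prompt syntax, where a double colon followed by a number sets the relative importance of the text before it. The helper weighted_prompt is hypothetical, and the exact syntax may differ between Midjourney versions.

```python
def weighted_prompt(parts):
    """Build a weighted prompt from (text, weight) pairs.

    Assumption: a higher number after '::' makes that part more important.
    """
    chunks = []
    for text, weight in parts:
        if not text.strip():
            raise ValueError("each part needs some text")
        # ':g' drops a trailing '.0', so 2.0 is written as 2.
        chunks.append(f"{text.strip()}::{weight:g}")
    return " ".join(chunks)

print(weighted_prompt([("stormy ocean", 2), ("surfer", 1)]))
# prints: stormy ocean::2 surfer::1
```

In this example the stormy ocean is asked to matter twice as much as the surfer.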

Some of my images in Midjourney

Below are some of the images I created in Midjourney. I’ve included the prompts underneath each one. If you ever wanted a window into my brain, here it is.

A giraffe, scuba diving, in a deep dark ocean, with a shoal of fish circling it, with detail I can't believe in a photo, with colours going from light to dark

An old mans face, who just returned from the coal mine to see his family, in detail I can't believe in a photo, in black and white

A well crafted wood hut with large glass windows, a large log fire burning, overlooking a stormy ocean with waves, with a surfer surfing the waves, the sun is setting, in detail you can't believe

At the peak of a snow capped steep mountain, a camel looks into the distance, lost, surrounded by other mountain ranges, in a photo realistic, in photo realistic detail

A dinosaur, walking down a typical suburban street in England, looking from behind with the road in the distance, on a rainy day, with people amazed as it passes them, in a photo realistic style

Some questions I had…

If I create an image using Midjourney, who owns it?

For Midjourney, paid users own the images they create and can use them commercially, but Midjourney retains a license to use those images. Free users don’t have exclusive ownership; their images fall under a Creative Commons license, allowing personal use with attribution but no commercial use. Always check Midjourney’s terms of service for the most current details.

If I type in the exact same prompt twice, will I get the same image?

If you type in the exact same prompt twice, you will typically not get the same image. Most AI image generation tools, including Midjourney, incorporate some level of randomness in their algorithms. This randomness means that even with the same prompt, the output can vary slightly or significantly each time. This feature is designed to allow for creative diversity and exploration, providing different interpretations of the same input.
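The randomness comes from the starting noise, which is normally different on every run. The Python sketch below illustrates the idea with a hypothetical starting_noise function: fixing the seed reproduces the same noise, while leaving it unset gives fresh noise each time. As I understand it, Midjourney exposes this through a --seed parameter, but treat that as an assumption and check the documentation.

```python
import random

def starting_noise(seed=None, size=4):
    """Return a short list of random numbers standing in for image noise."""
    # With seed=None, Python picks a fresh seed on every call.
    rng = random.Random(seed)
    return [round(rng.random(), 3) for _ in range(size)]

# Same seed, same starting noise, so the same prompt can lead to the same result.
print(starting_noise(seed=42) == starting_noise(seed=42))  # prints: True

# No seed: the noise (and therefore the image) will almost certainly differ.
print(starting_noise())
print(starting_noise())
```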

Grok 2 Release

Whilst I was pulling this post together, Grok 2 was released by x.ai. I thought it was worth mentioning, as it has come with a fair amount of controversy. The main issue is that it has fewer restrictions than other tools, and users are already creating controversial images.

One of the key advantages of Grok 2 is its tight integration with the X platform (formerly Twitter). The integration allows Grok 2 to leverage real-time information from X to provide users with the most current data available, which could differentiate it from other AI models that rely on static datasets.

Grok 2 uses the Flux AI model for image generation, which is built on the diffusion technology covered above. This integration allows Grok 2 to generate images from text prompts with high fidelity and realism. The Flux model, developed by Black Forest Labs, is noted for its advanced capabilities in producing detailed and varied images, outperforming many existing text-to-image generators in terms of quality and prompt adherence.

Below, the first image was created by Midjourney and the second by Grok 2. You can really see the different styles, but also the high fidelity and realism of the Flux model.

(Source of the image and example: Generative AI Publication)

Summary

Midjourney is an incredible tool that has added a whole new dimension to my creative process. Once my imagination started to take over, I found it to be an incredibly enjoyable experience. However, at the moment, I'm not sure I'll be diving back into it immediately, aside from writing this post and getting a better grasp of how it works. But who knows.

My day to day

In my day-to-day job, AI imagery tools haven't made a significant impact yet, though I can see their potential in areas like product shots or adding scenes to videos. They could also be valuable for creating storyboards, which would be quite useful. Even for tasks like social media or blog posts, tools like Midjourney could help produce a visually appealing set of artwork to work with.

What does this mean for photographers?

As I write this, I can't help but think about the implications for photographers and illustrators today. If you can generate a photo in the exact style you need, why would a company invest in commissioning a photographer or illustrator? These are tough questions, but I believe they are important to consider.

It’s possible that these tools could assist with the more time-consuming aspects of these roles, such as editing, and speed up the process, allowing photographers to focus on other creative elements.

I came across a great post by photographer Paul Reiffer, Artificial Stupidity – Could AI be the End of Landscape Photography? In it, he asks, So – Are We All Doomed?, and this is his response:

So as photographers, should we be worried?
If I was shooting product, still life, or studio photography that shows off a pre-staged environment – I’d be concerned. It’s possible that AI could help and become a part of my workflow – but where that type of photography has huge setup costs to “fake the scene”, AI can step in to literally fake it for 0.001% of the current outlay.
Is that fair, or positive for the industry? Probably not – but it’ll become a reality.
For those of us who capture scenes that are spontaneous, crafted by nature, out of our control – the threat would seem to depend on how and why we do it.
If it’s staged – by definition, AI can stage it – faster, cheaper, arguably better.
If you prefer to “blend” realities, or fake your images, then AI’s got your job already.
But if we’re there to capture a real moment that actually happened, and share that with other humans to evoke a feeling, a memory, an emotion – then AI’s all out of tricks.

When I was looking at Paul’s work, a headline above his gallery caught my eye. It says:

Our iconic artwork comes with a disclaimer: Every photograph you see is real moment in time.

I imagine we will be seeing more disclaimers like this.

Misuse of AI-generated images

I am slightly concerned about the potential misuse of AI-generated images, as it’s becoming nearly impossible to distinguish between what is real and what is not—especially in examples like Grok. It's already challenging to discern reality from misinformation in the news, and this will only add to the confusion. However, I’m cautiously optimistic that, alongside the risks, we will develop solutions to manage this and highlight AI-generated content. In addition to detection tools, it's crucial to educate people on the risks, what to look out for, and to encourage critical thinking.

Something a bit more light-hearted to finish

Whilst writing this post and playing with Midjourney, my daughters Amelie (10) and Evie (7) were over my shoulder asking me about the images. They were keen to create their own so here you go, you’re very welcome!

Wonder.ai
