August 19, 2022 05:19 am GMT

I spent $15 in DALLE 2 credits creating this AI image, and here's what I learned

Yes, that's a llama dunking a basketball. A summary of the process, limitations, and lessons learned while experimenting with the closed beta version of DALLE 2.


Llama playing basketball, generated using DALLE 2 by author.

This article was originally published by me on Medium.

I've been dying to try DALLE 2 ever since I first saw this artificially generated image of a Shiba Inu Bento Box.

Wow, now that's disruptive technology.

For those of you unfamiliar, DALLE 2 is a system created by OpenAI that can generate original images from text.

It's currently in closed beta: I signed up for the waitlist in early May and got access at the end of July. During the beta, users receive credits (50 free in the first month, then 15 every month after that). Every use costs 1 credit and produces 3 to 4 images. You can also purchase 115 credits for US$15.
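To put those numbers in perspective, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted above (115 credits for US$15, 3 to 4 images per credit); the function names are made up for illustration, and the pricing may have changed since the beta.

```python
# Pricing figures quoted in this article (closed beta, mid-2022).
PAID_CREDITS = 115
PAID_PRICE_USD = 15.00


def cost_per_credit() -> float:
    """Price of a single purchased credit in US dollars."""
    return PAID_PRICE_USD / PAID_CREDITS


def cost_per_image(images_per_credit: int) -> float:
    """Price of one generated image, given how many images a credit yields."""
    return cost_per_credit() / images_per_credit


def cost_of(credits: int) -> float:
    """Price of spending the given number of purchased credits."""
    return credits * cost_per_credit()


if __name__ == "__main__":
    print(f"per credit: ${cost_per_credit():.2f}")
    print(f"per image (4 per credit): ${cost_per_image(4):.3f}")
    print(f"per image (3 per credit): ${cost_per_image(3):.3f}")
    print(f"100 credits: ${cost_of(100):.2f}")
```

In other words, each prompt costs roughly 13 cents once the free credits run out, which adds up quickly when you are iterating on wording.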

P.S. If you can't wait to try it, give DALLE mini a go for free. However, its images are generally of poorer quality (giving rise to a host of DALLE memes), and it takes about 60 seconds per prompt (DALLE 2, in comparison, takes only 5 seconds or so).

You've probably seen various cherry-picked images online showing what DALLE 2 is capable of (provided the right creative prompt). In this article, I share a candid walkthrough of what it takes to create a usable image from scratch for the subject matter: a llama playing basketball. You might find it useful if you're thinking of trying out DALLE 2 yourself, or you're just interested in understanding what it's capable of.

The starting point

There's both an art and a science to knowing what prompt to feed DALLE 2. To illustrate, here are the results for "llama playing basketball":


Images generated by the author using DALLE 2 with prompt: "llama playing basketball".

Why is DALLE 2 inclined to generate cartoon images for this prompt? I assume it has something to do with the lack of actual images of a llama playing basketball seen during training.

I attempted to go a step further by adding the key term "realistic photo of":


Images generated by the author using DALLE 2 with prompt: "realistic photo of llama playing basketball".

That llama's looking more photorealistic, but the whole image is starting to look like a botched Photoshop job. In this case, DALLE 2 clearly needed some hand-holding to create a cohesive scene.

Prompt engineering, aka the art of specifying exactly what you want

In the context of DALLE, prompt engineering refers to the process of designing prompts to give you the desired results.

The DALLE 2 Prompt Book is a fantastic resource for this. It contains a detailed list of inspirations for prompts using keywords from photography and art.

Why is something like this necessary? Because getting a usable output from DALLE 2 is finicky (especially when you're not sure what DALLE 2 is capable of). So much so that a new startup is creating a marketplace charging $1.99 for prompts, to save you the time and money of coming up with your own.

My personal favorite find is "dramatic backlighting":


Now we're talking! Images generated by the author using DALLE 2 with prompt: Film still of a llama dunking a basketball, low angle, extreme long shot, indoors, dramatic backlighting.

It's important to tell DALLE 2 exactly what you want. Apparently, it's not obvious from the context that this llama should be dressed for the occasion. However, DALLE 2 does a great job of realizing this fantasy scene when "llama wearing a jersey" is specified:


Basketball dunking llama, now comes with jerseys. Images generated by author with DALLE 2 using prompt: film still of an alpaca wearing a jersey, dunking a basketball, low angle, long shot, indoors, dramatic backlighting, high detail.

It doesn't stop there. To add some drama to the image and really get this llama flying, I needed to specify phrases such as "dunking a basketball", "action shot of", or my personal favorite: "llama in a jersey dunking a basketball like Michael Jordan":


Michael Jordan if he was a llama, according to DALLE 2. Images generated by author with DALLE 2 using prompt: film still of a llama in a jersey dunking a basketball like Michael Jordan, low angle, show from below, tilted frame, 35, Dutch angle, extreme long shot, high detail, indoors, dramatic backlighting.
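The prompts in this section follow a rough pattern: a medium ("Film still"), a subject, and then a comma-separated list of photography keywords. The snippet below is a minimal sketch of that pattern in Python. The `build_prompt` helper is hypothetical and not part of any DALLE 2 tooling; it only assembles the text that you would paste into the prompt box.

```python
def build_prompt(medium: str, subject: str, modifiers: list[str]) -> str:
    """Join a medium, a subject, and keyword modifiers into one prompt string.

    Hypothetical helper for illustration: it mirrors the
    "<medium> of <subject>, <modifier>, <modifier>, ..." shape used above.
    """
    # Drop blank modifiers and stray whitespace so the prompt stays tidy.
    cleaned = [m.strip() for m in modifiers if m.strip()]
    return ", ".join([f"{medium} of {subject}"] + cleaned)


if __name__ == "__main__":
    prompt = build_prompt(
        "Film still",
        "a llama dunking a basketball",
        ["low angle", "extreme long shot", "indoors", "dramatic backlighting"],
    )
    print(prompt)
```

Keeping the modifiers in a list makes it easy to add or remove one keyword at a time and see which change actually affected the result.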

Tip: DALLE 2 only stores the previous 50 generations in your history tab. Make sure to save your favourite images as you go.

You might have noticed: DALLE 2 isn't great at composition.

You'd think that from the context of dunking a basketball, it'd be obvious where the llama, ball, and hoop should be relative to each other. More often than not, the llama dunks the wrong way, or the ball is positioned in such a way that the llama has no real hope of making the shot. Though all the elements of the prompt are there, DALLE 2 doesn't truly understand the relationship between them. This article covers the topic in more depth.


Image generated by author using DALLE 2 with prompt: Film still of a llama in a jersey dunking a basketball like Michael Jordan, low angle, shot from below, tilted frame, 35, Dutch angle, extreme long shot, high detail, indoors, dramatic backlighting.

Another artifact of DALLE 2 not really understanding the scene is the occasional mix-up in textures. In the image below, the net is made out of fur (a morbid scene once you think about it):


Image generated by author using DALLE 2 with prompt: Expressive photo of a llama wearing a jersey dunking a basketball like Michael Jordan, low angle, extreme wide shot, indoors, dramatic backlighting, high detail.

DALLE 2 struggles to generate realistic faces

According to some sources, this may have been a deliberate attempt to avoid generating deepfakes. I thought that would only apply to human subjects, but apparently, it applies to llamas too.

Some of the results were downright creepy.


Image generated by author using DALLE 2 with prompt: Dramatic photo of an llama wearing a jersey dunking a basketball like Michael Jordan, low angle, wide shot, indoors, dramatic backlighting, high detail.

Some other limitations of DALLE 2

Here are some other minor issues I experienced:

Angles and shots are interpreted loosely

No matter how many variants of "in the distance" or "extreme long shot" I used, it was difficult to find images where the entire llama fit within the frame.

In some cases, the framing was ignored entirely:


Image generated by the author using DALLE 2 with prompt: Dramatic film still of a llama wearing a jersey dunking a basketball, low angle, shot from below, tilted frame, 35, Dutch angle, extreme long shot, indoors, dramatic backlighting, high detail.

DALLE 2 can't spell

I guess this shouldn't be too surprising given that DALLE 2 struggles to understand the relationship between components. It is, however, capable of attempting some fully formed letters in the right context:


Image generated by author using DALLE 2 with prompt: Film still of a fluffy llama in a jersey dunking a basketball like Michael Jordan, low angle, shot from below, tilted frame, 35, Dutch angle, extreme long shot, high detail, indoors, dramatic backlighting.

DALLE 2 can be temperamental with complex or poorly-worded prompts

Occasionally, adding keywords or phrasing the prompt in certain ways led to results that were completely different from what was expected.

In this case, the real subject of the prompt ("llama wearing a jersey") was completely ignored:


Now that is an impressive dunk. Images generated by author using DALLE 2 with prompt: A low angle, long shot, indoors, dramatic backlighting, professional photo of a llama wearing a jersey, dunking a basketball.

Even adding the term "fluffy" led to dramatically worse results, including multiple cases where it looked like DALLE 2 just broke:


Images generated by the author using DALLE 2 with prompt: Film still of a fluffy llama in a jersey dunking a basketball like Michael Jordan, high detail, indoors, dramatic backlighting. (Image intentionally modified to blur and hide faces).

When working with DALLE 2, it's important to be specific about what you want without over-stuffing the prompt or adding redundant words.

DALLE 2's ability to transfer styles is impressive

You need to try this!

Once you have your keyword subject matter, you can generate the image in an impressive number of other art styles.

Abstract painting of...


Images generated by the author using DALLE 2 with prompt: Abstract painting of a llama in a jersey dunking a basketball like Michael Jordan, shot from below, tilted frame, 35, Dutch angle, extreme long shot, high detail, dramatic backlighting, indoors. In the background is a stadium full of people.

Vaporwave


Images generated by the author using DALLE 2 with prompt: Film still of a llama in a jersey dunking a basketball like Michael Jordan, dramatic backlighting, vibrant sunset, vaporwave.

Digital art


Images generated by the author using DALLE 2 with prompt: llama in a jersey dunking a basketball like Michael Jordan, shot from below, tilted frame, 35, Dutch angle, extreme long shot, high detail, dramatic backlighting, epic, digital art

Screenshots from the Miyazaki anime movie


Images generated by the author using DALLE 2 with prompt: Llama in a jersey dunking a basketball like Michael Jordan, screenshots from the Miyazaki anime movie. Thanks to the tip in this article.
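Looking at the prompts above, a style is applied in one of two ways: as a medium placed in front of the subject ("Abstract painting of ...") or as a keyword appended after it ("..., vaporwave"). Here is a small Python sketch of that idea. The `style_variants` function and its parameters are hypothetical names chosen for illustration, and the output is just prompt text to try by hand.

```python
SUBJECT = "a llama in a jersey dunking a basketball like Michael Jordan"


def style_variants(subject: str, mediums: list[str], styles: list[str]) -> list[str]:
    """Return one prompt per style for the same subject.

    Mediums go before the subject ("<medium> of <subject>");
    style keywords go after it ("<subject>, <style>").
    """
    prefixed = [f"{medium} of {subject}" for medium in mediums]
    suffixed = [f"{subject}, {style}" for style in styles]
    return prefixed + suffixed


if __name__ == "__main__":
    prompts = style_variants(
        SUBJECT,
        mediums=["Abstract painting"],
        styles=[
            "vaporwave",
            "digital art",
            "screenshots from the Miyazaki anime movie",
        ],
    )
    for prompt in prompts:
        print(prompt)
```

Each variant still costs a credit, so it is worth settling on a subject that works before exploring styles.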

Final thoughts

After over 100 credits (~US$13) and a lot of trial and error, here's my final image:


My winning image. https://labs.openai.com/s/HYv3Kp8ElKDAWKHq2vs76VXu

The image isn't perfect, but DALLE 2 managed to fulfill about 80% of the brief.

Most of the credits went towards trying to get the right combination of style, faces, and composition to work together.

According to OpenAI's DALLE announcement,

users get full usage rights to commercialize the images they create with DALLE, including the right to reprint, sell, and merchandise.

Expect many users to play fast and loose with these rules.

As a content creator, I expect DALLE 2 to be most useful for creating simple illustrations, photos, and graphics for blogs and websites. I'll be using it as an alternative to Unsplash to create blog cover images that won't look the same as everyone else's.

If you're about to try out DALLE 2 yourself, here's a tl;dr of tips before you start:

  • Check out the DALLE 2 Prompt Book! (Also, the fan-made Prompt Engineering Sheet).
  • Be prepared to do some trial and error to get what you want. Fifteen free credits might sound like a lot, but it really isn't. Expect to use at least 15 credits to generate a usable image. DALLE 2 is not cheap.
  • Don't forget to save your favorite images as you go.

Thanks for reading! I'd love to hear about your experience with DALLE 2, and I welcome any thoughts or feedback.


Original Link: https://dev.to/joooyz/i-spent-15-in-dalle-2-credits-creating-this-ai-image-and-heres-what-i-learned-4hl1
