It's a little premature, fine, but I want to start liquidating my rhetorical swaps here. I've been saying since last summer (sometimes on HN, sometimes elsewhere) that "prompt engineering" is BS: in a world where AI keeps getting better, expecting to develop lasting competency in an area of AI-adjacent performance (i.e. telling an AI what to do in exactly the right way to get the right result) is like expecting to build a long-lasting business around hand-cranking people's cars for them when they fail to start.
Like, come on. We're now seeing AIs take on tasks many people thought would never be doable by machine. And granted, many people (myself included to some extent) have adjusted their priors properly. And yet so many people act like AI is going to stall in its current lane and leave room for human work, as opposed to developing orders of magnitude better intelligence and obliterating all of its current flaws.
Been doing a lot with prompts lately. What people are calling "prompt engineering" I'd call "knowing what to even ask for and also asking for it in a clear manner". That was a valuable skill before computers and will continue to be one as AI progresses.
I've been pretty disappointed to introduce ChatGPT to people in jobs where it would be a game changer, only to find they just don't know what to do with it. They ask it for not-useful things, or useful things in a non-productive way: "here is some ad copy I wrote, write it better". Whether you're instructing a human, ChatGPT, or an AI god... those instructions are just too vague.
> I'd call "knowing what to even ask for and also asking for it in a clear manner".
It was a very important skill for searching. Nowadays, with Google's "I know what you want better than you" search, it's not that useful anymore (not useless: I get better search results by not using Google and knowing what I want, it's just less required).
IMHO it stems from lack of imagination. Impressive as the results may sometimes be, the user interfaces for AI are still extremely crude.
Soon we will see AI being used to define semantic operations on images that are hard to define exactly (imagine a knob to make an image more or less "cyberpunk", for example).
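One plausible way such a "cyberpunk knob" could work is embedding arithmetic: take the direction between an embedding of the target style and a neutral one, and shift the image's latent along it by a user-controlled amount. A toy sketch with made-up 3-dimensional vectors (the function name and all vectors here are illustrative, not any real model's API):

```python
import numpy as np

def style_knob(latent, neutral_emb, style_emb, strength):
    """Shift a latent along the (style - neutral) direction.

    strength = 0.0 leaves the latent unchanged; larger values
    push the decoded image further toward the style.
    """
    direction = style_emb - neutral_emb
    direction = direction / np.linalg.norm(direction)  # unit-length direction
    return latent + strength * direction

# Toy 3-D vectors standing in for real text/image embeddings.
latent = np.array([0.2, 0.5, -0.1])
neutral = np.array([1.0, 0.0, 0.0])
cyberpunk = np.array([0.0, 1.0, 0.0])

more_cyberpunk = style_knob(latent, neutral, cyberpunk, strength=0.8)
```

In a real system the knob would just be a slider wired to `strength`, with the decoder re-run on the shifted latent.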
I also expect AI-powered inpainting to become a ubiquitous piece of functionality in drawing and editing tools (there are already Photoshop plugins).
Furthermore, my hunch is that many of the use cases around image creation will gradually move towards direct manipulation. Somewhat like painting, but without a physical model. AI components will probably be applied to interpreting the user's touch input, similar to how they are currently deployed to understand text input.
LLMs and image AIs are the opposite of self-driving cars. "Everybody" has had concrete expectations for at least half a decade now that the moment when self-driving cars would surpass human ability was imminent, yet the tech hasn't lived up to it (yet). Meanwhile, practically nobody was expecting AI to do the jobs of artists, programmers or poets anywhere near human level anytime soon, yet here we are.
Great work, congratulations! One question: if I understood it right, you based your demo on GPT-2. What has your experience been working with those open-source language AIs, in terms of computational requirements and performance?
I'm really fascinated by all the tools the OSS community is building on top of Stable Diffusion (like OP's), which compares favourably with the latest closed-source models like DALL-E or Midjourney and can run reasonably well on a high-end home computer or a very reasonably sized cloud instance. For language models, the requirements seem substantially higher, and it's hard to match the latest GPT versions in terms of quality.
If LLMs (etc.) had the same requirements and business models as AV cars, they'd still be considered a failure. Nobody expects Stable Diffusion to have a six-sigma accuracy rate*, nor do we expect ChatGPT to seamlessly integrate into a human community. The AV business model discourages individual or small-scale participation, so we wouldn't even have SD (would anyone allow a single OSS developer to drive or develop an AV car? OK, there's Comma, but that's all there is on the OSS side).
* The number of times I've seen an 'impressive' selection of AI images that I consider a critical failure deserves its own word. The AIs are impressive for even getting that far; it's just that some people have bad taste and pick the bad outputs.
Yes, it's true that not all technology evolves as fast as predicted (by some, at some point), but first of all I still believe we will see self driving cars in the future and secondly, it's one anti-example in a forest of examples of tech that evolves beyond anyone's expectations. I don't find it very convincing.
Autonomous driving as it currently exists came unexpectedly to most people. Many now look at it with the power of hindsight, but back in the day the majority never thought we'd have cars (partly) driving on their own within a few years. The case of AI art seems exactly the same to me: now that many are working on it there's lots of progress, but it's still nowhere near what an experienced human can do. And that seems to be the general rule, not an exception.
We might need to create real intelligence for that to become true. A machine that can think and is aware of its purpose.
Large language models are stateless. The apps and fine-tuned models are doing prompt engineering on users' behalf. It's very much a thing for developers, with the goal of making it invisible for end users.
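A concrete way to see "stateless": a chat front-end has to resend the entire conversation on every call, because the model itself remembers nothing between requests. A minimal sketch of that client-side bookkeeping (the `generate` function is a hypothetical stand-in for a real model API, and the transcript format is made up):

```python
def generate(prompt: str) -> str:
    """Stand-in for a stateless model call: its output depends
    only on the prompt it receives in this one request."""
    return f"[reply to {prompt.count('User:')} user message(s)]"

class ChatSession:
    """The 'memory' lives entirely in the client, not the model."""

    def __init__(self, system: str):
        self.history = [f"System: {system}"]

    def send(self, user_msg: str) -> str:
        self.history.append(f"User: {user_msg}")
        # Every turn, the FULL transcript is packed into one prompt.
        reply = generate("\n".join(self.history) + "\nAssistant:")
        self.history.append(f"Assistant: {reply}")
        return reply

chat = ChatSession("You are a helpful assistant.")
chat.send("Hello")
chat.send("What did I just say?")  # only answerable because the client resent turn 1
```

The "prompt engineering on the user's behalf" is exactly what `send` does: deciding how to pack the system message and history into the single string the model actually sees.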
Has it? I mean, maybe the idea of people doing this as a long-time career has, but practically, I still find it a challenge to get those AIs to do exactly what I want. I've played around with Dreambooth-style extensions now, and that goes some way for some applications, and I'm excited to try OP's solution, but in my experience, it is still a bit of a limitation for working with those AIs right now.
Oh yeah it's definitely still an issue right now! But I think the power of ChatGPT's ability to understand and execute instructions has convinced most people that "prompt engineering" isn't going to be a career path in the future.
Absolutely. I briefly thought about asking ChatGPT to write a prompt, but then I remembered that the training corpus is probably older than those tools (I've heard that if you ask it the right way, it will tell you that its corpus ended in 2021; whether that's true or not, it sounds plausible). But that's a truly temporary issue: the respective subreddits probably have enough information to train an AI for prompt engineering already (if you start from a strong foundation like the latest GPT versions).
In the near future you can totally imagine a dialogue like the one you'd have with a real designer: "can you make it pop a bit more?" or "can you move that logo to the right side?". It might take some trial and error, but it's only going to improve.
Making the AI truly creative (which means going beyond what the client asks for, towards things the client doesn't even know they want) would be a much larger leap and potentially take a lot longer.
I don't get it. Pre-ChatGPT prompt engineering was a BS exercise in guessing how a given model's front-end tokenizes and processes the prompt. ChatGPT only made it more BS. But I saw a paper the other day implementing a more structured, formal prompt language, with working logic operators implemented one layer below: instead of adding more cleverly structured English, they were stepping the language model with variations of the prompt (as determined by the operators) and doing math on the probability distributions of next tokens the model returned. That, to me, sounds like a valid, non-BS approach, and strictly better than doubling down on natural language.
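The idea, as I understood it (this sketch is my own reconstruction, not the paper's code), is that the operators act on distributions rather than on English. E.g. an AND over two prompt variants can be a renormalized elementwise product of their next-token distributions, and an OR a mixture; the toy distributions below are made up:

```python
def normalize(p):
    """Rescale a token->probability dict so it sums to 1."""
    total = sum(p.values())
    return {tok: v / total for tok, v in p.items()}

def op_and(p, q):
    """AND: a token must be likely under BOTH prompt variants
    (renormalized elementwise product of the distributions)."""
    return normalize({tok: p.get(tok, 0.0) * q.get(tok, 0.0) for tok in p})

def op_or(p, q, w=0.5):
    """OR: a weighted mixture of the two distributions."""
    return normalize({tok: w * p.get(tok, 0.0) + (1 - w) * q.get(tok, 0.0)
                      for tok in set(p) | set(q)})

# Toy next-token distributions from two phrasings of the same prompt.
p = {"cat": 0.6, "dog": 0.3, "car": 0.1}
q = {"cat": 0.5, "dog": 0.1, "car": 0.4}

both = op_and(p, q)   # "cat" dominates: it is likely under both variants
either = op_or(p, q)
```

The point is that the combination happens in probability space, where the math is well defined, instead of hoping the model parses an English "and" correctly.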
Think about the problem in an end-to-end fashion: the user has an idea of what sort of image they want, they just need an interface to tell the machine. A combination of natural language plus optional image/video input is probably the most intuitive interface we can provide (at least until we've made far more progress on reading brain signals more directly).
How exactly we get there, whether by adding layers on top (like language models) or layers below (like what you described), doesn't seem like such a fundamental difference. It's engineering: you try different approaches, vary your parameters and see what works best. And from the outset, natural language does seem like a good candidate for encoding nuances like "make it pink, but not cheesy" or "has the vibes of a 50s Soviet propaganda poster, but with friendlier colors".