2

Consider the part of the following text (from Kant) that I've emphasized in bold:

The conception of a dog indicates a rule, according to which my imagination can delineate the figure of a four-footed animal in general, without being limited to any particular individual form which experience presents to me, or indeed to any possible image that I can represent to myself in concreto. This schematism of our understanding in regard to phenomena and their mere form, is an art, hidden in the depths of the human soul, whose true modes of action we shall only with difficulty discover and unveil. Thus much only can we say:—The image is a product of the empirical faculty of the productive imagination—the schema of sensuous conceptions (of figures in space, for example) is a product, and, as it were, a monogram of the pure imagination à priori, whereby and according to which images first become possible, which, however, can be connected with the conception only mediately by means of the schema which they indicate, and are in themselves never fully adequate to it.

So, a bunch of people, myself included, think that reading words in AI-generated images is like trying to read in a dream. To be precise: I asked Gemini (Google's construct) to make an image of a moth made of words, and it did, but the words were less than gibberish; they were more like asemic writing. Then I asked it to make a calligram of the word "hope" in the shape of a butterfly, and it didn't, although it claimed that it "Certainly" had (and produced a very poor ASCII graphic that it explained, step by step, as a butterfly image, despite the explanation not making any sense).

But I also asked it to read the following picture:

[Image: a picture of text]

Maybe I shouldn't have been amazed, but I was, because Gemini was able to put all the words together perfectly. On the other hand, I then asked it to read the following:

[Image: a picture of wavy, distorted text]

And Gemini just got hung up on it, not offering even an attempt at reading it. Could the waviness of the words really have been the difference-maker? Yet the second picture has more of that "trying to read words in a dream" feel to it, which I would have thought would resonate more with the AI.[1] Like, I'd have expected it to "subconsciously" correct for the distorted angling of the letters; but that's not what it actually did.

With these LLMs and stuff, has humanity somehow figured out the "art" of the productive imagination that Kant spoke about?


INot "artificial intelligence," here, but "artificial imagination." But now didn't this thing about these image generators have something to do with "dream recording" experiments? Like, you'd train the "recorder" to read certain brain signals as words, then match the words to images from Google Images (or wherever), etc. But to streamline this better, they needed a way to handle the transitions between images more organically, and even the well-formed (normal number of fingers, etc.) AI images have this weird "fuzziness" to their edges that is a total tell; again, though, as if AI imagery is a technological realization of the brain modules/pathways that conduct our dreams (and hence the productive imagination). (I haven't been able to double-check that memory yet; but it seems like I remember this issue from back in like 2021 or 2022, around the time they said something about Google's AI making its own language and speaking to other AIs in this language.)


Clarification: I guess my question can be phrased as, "Is the way that these LLMs/image generators/etc. are programmed or coded (although I've heard there's some sort of non-coding involved???) an example of the process of schematism?" Because when an image generator makes nine versions of a response to a prompt, is that a matter of starting from the prompt-as-schema and cashing it out in the manner of the productive imagination, but then necessarily in various ways (these things could make a hundred versions of a prompt-response, after all)?

Or, with a different meaning though: "When Kant says that there is a discovery that would be made with difficulty, and that 'thus much we can say' for now, is he saying that he has made this discovery, after his difficult reflections upon transcendental ideality, or is he saying that this discovery awaits a later age with a new spirit of understanding regarding these matters?" And so if the latter: is this the time of that understanding, via these AI/like "entities"?
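For what it's worth, here is a minimal sketch of the "nine versions" point: one fixed prompt gets cashed out into several different concrete images just by varying a random seed. It assumes the open-source Hugging Face diffusers library and an illustrative Stable Diffusion checkpoint, not whatever Gemini actually runs.

```python
# Minimal sketch: one fixed prompt (the "schema"), many concrete images.
# Assumes the open-source `diffusers` library; the model id is illustrative,
# and this is NOT Gemini's actual, proprietary pipeline.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe = pipe.to("cuda" if torch.cuda.is_available() else "cpu")

prompt = "a moth made of words"  # the prompt stays the same for every image

for seed in range(9):  # nine variations, as in the question
    generator = torch.Generator().manual_seed(seed)  # only the seed changes
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"moth_{seed}.png")
```

The prompt plays something like the role of the schema; the seed (together with the learned weights) decides which of the indefinitely many images answering to that prompt actually gets produced.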

4
  • i have a headache. fwiw, i don't find LLMs dreamy/fuzzy. i suppose that when i talk to them, i get frustrated by them in a sharper way than i am by other people
    – andrós
    Commented 2 days ago
  • what are the poems? they work as fragments and the speech rhythms are satisfying, but i don't think they cohere well.
    – andrós
    Commented 2 days ago
  • 1
    @andrós not all AI images are dreamlike/fuzzy; the ones that are just "people who have never really existed" look like photos. And not every dream is fuzzy, after all. Still, whatever programming or code it is that these things run on seems like it might be "isomorphic to" the process that generates audiovisualization/etc. and vivid, incl. lucid/controllable, dreaming (this is where the apriority of the matter appears). Commented 2 days ago
  • lucid dreaming, i can see the overlap with. apologies if i seem facile
    – andrós
    Commented 2 days ago

3 Answers

2

Much like "intelligence", "imagination" is poorly defined, and what we've been able to achieve with AI has challenged our understanding of both concepts. The most tenable position seems to be that these concepts relate to combinations of things and exist on spectrums. "Imagination" specifically relates to an ability to generate things outside of what has directly been experienced (similar to what Kant said), which AI can certainly do by combining and transforming direct "experiences" (i.e. input data) and adding randomness (it's unclear to me whether human imagination has any sources outside of that).

One might specifically tie those concepts to consciousness, which is also poorly defined, and untestable, but it seems reasonable to say AI doesn't have it at this point in time.

As for Kant talking about imagining an image outside of what "I can represent to myself in concreto", I don't know that I can even imagine that - my imagination is fairly concrete, so that may be asking me to imagine something I cannot imagine: a contradiction in terms. Unless we're just talking about the abstract, i.e. a verbal description, but AI can talk about that too.

As for interpreting images, there may be an element of imagination there, but I'd say that's more about intelligence. Producing images (generating patterns) would be imagination, whereas interpreting images (making sense of patterns) would be intelligence. Why AI struggles with waviness is probably more a question for AI researchers: it might be that the particular AI you were using wasn't trained on similar-enough data, or it may be that it's simply difficult for code to make sense of bending (it's probably a combination of both; there are AIs that are more effective than humans at solving CAPTCHAs, which commonly include bendy text).
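(As a rough illustration of the "not trained on similar-enough data" point, here is a small sketch, using plain NumPy and Pillow with hypothetical file names, of the kind of sinusoidal warp that turns ordinary rendered text into "wavy" text. Unless a reader model saw warps like this during training, such text is effectively out-of-distribution for it.)

```python
# Sketch: apply a sinusoidal "wave" warp to an image of text.
# File names are hypothetical; this only illustrates the kind of distortion
# a text-reading model may never have seen during training.
import numpy as np
from PIL import Image

def wave_warp(img, amplitude=8.0, wavelength=60.0):
    arr = np.asarray(img)
    out = np.zeros_like(arr)
    for y in range(arr.shape[0]):
        # shift each row horizontally by a sinusoid of its vertical position
        shift = int(round(amplitude * np.sin(2 * np.pi * y / wavelength)))
        out[y] = np.roll(arr[y], shift, axis=0)
    return Image.fromarray(out)

if __name__ == "__main__":
    warped = wave_warp(Image.open("plain_text.png"))
    warped.save("wavy_text.png")
```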

1

Answering the direct question – can ML models learn to schematize their inputs, and do they use that information to classify specific instantiations of 'categories'? – I think the answer is yes. A fully accurate answer would require much greater progress in mechanistic interpretability (the attempt to understand what is going on internally in these large models), a field that is still growing and has proven incredibly challenging. However, some examples lean heavily toward yes.

Look, for example, at this article on circuits in image recognition models: https://distill.pub/2020/circuits/zoom-in/. As shown in the image I have pasted below (their third example), the model they were investigating contains a 'pose-invariant dog head circuit' that activates when it sees dogs.

As an aside, I think that Kant intends (in your bolded quote) to imply that his inquiry will unveil the application of schematism to our understanding in order to acquire concepts – he discusses this system at length in the First Critique.

[Image: dog head activation circuit, from the Distill "Zoom In" article]
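To make the "zooming in" idea a bit more concrete, here is a minimal sketch of reading out one intermediate layer's activations with a forward hook. It uses torchvision's pretrained ResNet-50 rather than the InceptionV1 model the Distill article actually studies, and "dog.jpg" is a hypothetical input file; finding a genuine "dog head circuit" takes far more analysis than this.

```python
# Sketch: inspect which channels of one intermediate CNN layer light up
# for a given image. Uses torchvision's ResNet-50, not the InceptionV1
# model from the Distill article; "dog.jpg" is a hypothetical file.
import torch
from torchvision import models
from PIL import Image

weights = models.ResNet50_Weights.DEFAULT
model = models.resnet50(weights=weights).eval()

activations = {}

def save_activation(name):
    def hook(module, inputs, output):
        activations[name] = output.detach()
    return hook

# Hook one mid-level stage of the network.
model.layer3.register_forward_hook(save_activation("layer3"))

preprocess = weights.transforms()
image = preprocess(Image.open("dog.jpg").convert("RGB")).unsqueeze(0)

with torch.no_grad():
    model(image)

act = activations["layer3"][0]      # shape: (channels, height, width)
per_channel = act.mean(dim=(1, 2))  # average activation per channel
top = torch.topk(per_channel, k=5)
print("Most active layer3 channels:", top.indices.tolist())
```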

-1

Yeah, Kant talked about how people can think of things they've never seen, like a perfect circle. This is called "productive imagination."

AI can do some similar things:

Make new text and pictures
Mix ideas in creative ways
Turn vague ideas into specific outputs

But AI doesn't think like humans:

It doesn't truly understand things
It uses patterns, not real thinking

AI writing often feels like a dream because it's based on guessing what words might fit, not on real meaning. It might be good at reading normal text but bad at reading wavy text because it's trained differently for each task. AI pictures and dreams can both be weird and unreal. But calling AI "imagination" might be going too far. It is good at spotting and copying patterns, but it doesn't really understand or intend things like human imagination does.
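A toy way to see "guessing what words might fit": real LLMs use learned neural networks over huge vocabularies, but the final step is conceptually like sampling from a probability table. The table below is entirely made up for illustration.

```python
# Toy sketch of next-word sampling: the "model" only knows which continuations
# are statistically likely in a context, not which one is meant.
# The probability table is made up for illustration.
import random

next_word_probs = {
    ("the", "dog"): {"barked": 0.5, "slept": 0.3, "dreamed": 0.2},
    ("dog", "barked"): {"loudly": 0.6, "again": 0.4},
}

def sample_next(context):
    words, weights = zip(*next_word_probs[context].items())
    return random.choices(words, weights=weights, k=1)[0]

words = ["the", "dog"]
while (words[-2], words[-1]) in next_word_probs:
    words.append(sample_next((words[-2], words[-1])))
print(" ".join(words))
```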

So, to answer your question: while we haven't fully realized Kant's "art" of productive imagination in AI, I think we've created tools that mimic parts of it in intriguing ways.

4
  • what is the difference between a word that fits and a word that is meaningful?
    – andrós
    Commented 2 days ago
  • @andrós I guess a word that fits is statistically likely and grammatically correct in a given context, while a meaningful word is intentionally chosen to convey specific ideas, nuances, and deeper relationships. And it shows a true understanding of the concepts.
    – Groovy
    Commented 2 days ago
  • so lacking intention, yeah i kinda get that. in context, we might call that 'expression'
    – andrós
    Commented 2 days ago
  • I didn't downvote this answer; I don't have anything to criticize about it now anyway, but I do want to bring up a strange thing about Gemini not reading the smiley-face calligram: I also learned recently that these entities(?) can read CAPTCHAs, so now the tests check more for how the cursor moves to the submit button, like if it moves in too straight a line, they count that as a mark that it might be a bot trying to do the CAPTCHA. But then why can these things read the warped text of a CAPTCHA, but not the smiley-face? Is it just Gemini, though, or did I use the wrong prompt? Commented 2 days ago
