
Jasmine Nackash is a multidisciplinary designer and developer interested in creating unique and innovative experiences.

Left & Right

Implicit/Explicit

I kept thinking about that story from class about the people who tried to recreate Picasso's painting and were terrible at it, until someone flipped the image upside down and they suddenly managed to do a much better job copying it. I've seen these techniques used in drawing classes, and it makes sense that they work. It kind of forces you to focus on the shape, its angles and curves, how light falls on it, its texture, rather than looking at it as an object, as an idea of a thing, something that is almost (or actually) just an instance of a class.

With this in mind, and as I was watching the interview videos of Dr. Iain McGilchrist, it made me think about the way ML models work. Could we say that, in a way, the models (at least the image generation ones) work similarly to the left brain? Or are they more of a combination, like us? Like they get this signified kind of representation of a thing that is in itself represented by a set of characters — making the outcome at least twice removed from "reality". But our perception of the same thing is perhaps not that different?

The general breakdown of the left and right brain resonated with me — I often find it hard, or not hard but reductive, to put my feelings into words. When I talk to people who are close to me I usually look them in the eye and pay attention to their body language, and use my own too as a way to convey meaning. I feel like this is just as big a part of communicating as using words. In my opinion, this right-brained intuitiveness and specificity is absolutely essential to having meaningful relationships, and it's becoming increasingly hard to maintain the more we move into online communication that lacks this kind of affordance. It is interesting to see attempts being made to allow for it in the digital realm, or even in robotics, which often touches on the uncanny valley, a whole subject worth mentioning but one I'll leave for another time. I also wonder how the future might change our body language, instead of, or in combination with, incorporating it into how we communicate online. Like, there are certain movements you can make with a cursor that we easily understand as correlating to a feeling and so on, but then again, I'm not sure how that could cross the line from any cursor to a specific one.

Reading some pages from The Language Instinct made me think there's something innately incomplete about language, in that we use these agreed-upon rules of what things mean, but it's ultimately trying to describe things that are abstract, and we kind of have to trust each other that these words mean the same thing to everyone, while knowing that it's never 100% correct. It's just the best system we have at the moment. This reminds me of Gödel's incompleteness theorem, which, in a way, proved our mathematical system to be... wrong, or incomplete. In simple terms, Gödel showed that no matter how well you design a formal mathematical system (at least one powerful enough to do basic arithmetic), it will either miss some truths or be unable to prove that it won't lead to contradictions. This challenged the previously held belief that mathematics could be made entirely complete and free of ambiguity. A linguistic example would be the following paradoxical sentence: "This statement is false" — if it is true then it is false, and vice versa. So any and all discrete combinatorial systems, even coding languages, have limitations on what they can achieve. It's just the best we've got at the moment.

Making

github repo | live site

On a different note, I tried to think of cases of blended language — portmanteau words, which are created by blending the sounds and combining the meanings of two others, like "motel", "brunch", "smog", "infomercial", "ginormous", and even "Brangelina"!

I was also very curious about embeddings and VAEs, as we talked about in class. I tried to make something more elaborate using a model that could supposedly output an embedding of an image and a text (and a few other things), but I couldn't get it to work. I stripped it down to the bare minimum and managed to get a number — a single floating-point number that supposedly corresponds to a string input. But I wasn't able to figure out what to do with it to achieve my goal, and I felt like I didn't really understand how to move forward.

So I tried a different approach: I used a combination of an LLM (like we did previously) and a textual embeddings model — the same one we saw in class, so I could figure out how to connect all the parts. I did try other LLMs, but the prompt engineering was not happening...
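To give a sense of the blending step, here's a minimal sketch of the kind of call involved, assuming an OpenAI-style chat endpoint as a stand-in; the actual model, prompt, and setup I used may well differ.

```javascript
// Sketch of the blending step: ask an LLM to merge two words into a portmanteau.
// Assumes an OpenAI-style chat completions endpoint; the model name and
// API key below are placeholders, not necessarily what the project uses.
const API_KEY = "YOUR_API_KEY"; // placeholder

async function blendWords(wordA, wordB) {
  const response = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${API_KEY}`,
    },
    body: JSON.stringify({
      model: "gpt-4o-mini",
      messages: [
        {
          role: "user",
          content: `Blend "${wordA}" and "${wordB}" into one new portmanteau word. Reply with only the word.`,
        },
      ],
    }),
  });
  const data = await response.json();
  // The chat API returns the reply under choices[0].message.content.
  return data.choices[0].message.content.trim();
}
```

The important part is just getting a single clean blended word back, so the rest of the pipeline has something tidy to embed.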

In this version, two inputs are taken in and sent to the LLM, which blends them into a single new word (a portmanteau, if you will). Then this word, as well as the two inputs, is sent to an embeddings model that outputs these terribly long arrays that should point to their positions in the model's latent space (unless I got it very, very wrong?). Then some custom functions position the words relative to each other. I asked Copilot for help with the positioning because it felt very complex. It suggested a bunch of visualization libraries that felt like overkill and didn't work. After some direction, and after looking at the p5 example as a reference, I got to something that sort of works.
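For a rough idea of the positioning logic, here's a sketch of one way to do it (not necessarily what my custom functions ended up being): compare the embedding arrays with cosine similarity, then slide the blended word along the line between its two parents, closer to whichever one it resembles more.

```javascript
// Cosine similarity between two embedding vectors (the "terribly long arrays").
function cosineSimilarity(a, b) {
  let dot = 0, magA = 0, magB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    magA += a[i] * a[i];
    magB += b[i] * b[i];
  }
  return dot / (Math.sqrt(magA) * Math.sqrt(magB));
}

// Place the blended word on a line between the two input words, pulled
// toward whichever parent it is more similar to in the embedding space.
// xA and xB are the screen positions of the two input words.
function positionBlend(embedA, embedB, embedBlend, xA, xB) {
  const simA = cosineSimilarity(embedBlend, embedA);
  const simB = cosineSimilarity(embedBlend, embedB);
  const t = simB / (simA + simB); // 0 = right on top of A, 1 = right on top of B
  return xA + t * (xB - xA);      // same idea as p5's lerp(xA, xB, t)
}
```

With p5, the returned value can just be used as the x coordinate when drawing the blended word between its two parents.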

Some results are better than others, but it's interesting to see things like how the LLM knew not to break "graph", and to use "i" instead of "y" in the blended word. So many things about language are intuitive, both to us and to LLMs... Even though languages are mostly governed by rules, most of us don't really remember or know them; rather, we intuitively understand them. Like riding a bike.

Thanks for reading!