Hacker News

It is literally manipulating high-dimensional vectors. GPT-3's embedding dimension is 12288 and Llama-3 8B's is 4096. Thousands of these high-dimensional vectors come into the model, and operations like +, -, ×, /, exp, log, GELU, etc. are applied to them and to combinations of them. And during training, geometric relationships are literally created between the vectors based on concepts humans use. This isn't some pie-in-the-sky assertion using hand-wavy words; those are concrete statements that don't muddy any water.
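The "geometric relationships between vectors" point can be sketched in a few lines. This is a toy illustration with hand-picked 4-d vectors (real models use thousands of dimensions, and these numbers are not from any trained model) showing the classic analogy arithmetic that trained embeddings exhibit:

```python
import numpy as np

# Toy 4-d "embeddings"; values are hand-picked purely for illustration,
# not taken from any trained model.
emb = {
    "king":  np.array([0.9, 0.8, 0.1, 0.0]),
    "queen": np.array([0.9, 0.1, 0.8, 0.0]),
    "man":   np.array([0.1, 0.8, 0.1, 0.0]),
    "woman": np.array([0.1, 0.1, 0.8, 0.0]),
}

def cosine(a, b):
    # cosine similarity: the "geometric relationship" between two vectors
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# "king - man + woman" should land closest to "queen"
target = emb["king"] - emb["man"] + emb["woman"]
best = max(emb, key=lambda w: cosine(emb[w], target))
print(best)  # queen
```

In a trained model these directions emerge from the data rather than being hand-built, but the arithmetic being performed is exactly this kind of vector manipulation.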

It might be magical to you, but it isn't magical to anyone actually working with these things: you can trivially swap out the last layer and have the outputs represent whatever you want. Get some samples, write a loss function, and go to work. Suggesting they can only output words/tokens just displays a complete misunderstanding of how they work under the hood.
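The "swap the last layer, write a loss function" recipe can be sketched with plain numpy. Here the hidden states are synthetic stand-ins (in practice you'd cache real activations from a frozen model), and the target is an arbitrary made-up scalar signal rather than tokens — the point is that the new head predicts whatever quantity you define:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for frozen hidden states from the model's final block
# (synthetic here; in practice, cache real activations).
n, d_model = 256, 32
H = rng.normal(size=(n, d_model))

# Target is any quantity you care about -- a made-up scalar signal here,
# not tokens.
true_w = rng.normal(size=d_model)
y = H @ true_w + 0.01 * rng.normal(size=n)

# New "last layer": a linear head trained with an MSE loss via gradient descent.
w = np.zeros(d_model)
lr = 0.1
for _ in range(500):
    grad = 2.0 / n * H.T @ (H @ w - y)   # d(MSE)/dw
    w -= lr * grad

mse = float(np.mean((H @ w - y) ** 2))
print(f"final MSE: {mse:.4f}")
```

With a real model you'd do the same thing in a deep-learning framework: freeze the backbone, attach a new head, and optimize your loss over it.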



Why do you keep saying it's magical to me? You're the one describing the underlying maths as if it makes the model special. To the model, these "high-dimensional embeddings" are just long lists of numbers.

I've said it before and I'll say it again: using terabytes of number lists to achieve decent NLP isn't all that impressive.


Because you don't understand it, and that's understandable. Get a version of GPT-2 or something from Hugging Face and go through it layer by layer.
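What you'd see going layer by layer is a stack of blocks like the one below. This is a single-head, untrained toy block in plain numpy (random weights, tiny dimensions — GPT-2 small actually uses 12 layers, d_model=768, 12 heads), just to show that each layer is the vector arithmetic described above:

```python
import numpy as np

rng = np.random.default_rng(0)

def gelu(x):
    # tanh approximation of GELU, as used in GPT-2
    return 0.5 * x * (1 + np.tanh(np.sqrt(2 / np.pi) * (x + 0.044715 * x**3)))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def layer_norm(x, eps=1e-5):
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

T, d = 8, 16                      # sequence length, model width (toy sizes)
x = rng.normal(size=(T, d))       # stand-in for token embeddings
Wq, Wk, Wv, Wo = (rng.normal(size=(d, d)) * 0.1 for _ in range(4))
W1 = rng.normal(size=(d, 4 * d)) * 0.1
W2 = rng.normal(size=(4 * d, d)) * 0.1

def block(x):
    # causal self-attention with a residual connection
    h = layer_norm(x)
    q, k, v = h @ Wq, h @ Wk, h @ Wv
    scores = q @ k.T / np.sqrt(d)
    mask = np.triu(np.ones((T, T), dtype=bool), k=1)
    scores[mask] = -1e9            # each token attends only to the past
    x = x + softmax(scores) @ v @ Wo
    # position-wise MLP with GELU, plus residual
    x = x + gelu(layer_norm(x) @ W1) @ W2
    return x

out = block(x)
print(out.shape)  # (8, 16)
```

Stack a dozen of these, train the weights, and that's the whole "magic": +, ×, exp, and GELU applied to long lists of numbers, arranged very carefully.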

Anyone with a brain is impressed by state-of-the-art LLMs. What's next, the latest GPUs aren't impressive because "shoving a bunch of electrons around to do elementary arithmetic isn't impressive"? The difficulty is in getting a bajillion little things to come together in a way that is useful. Dismissing a complex thing as unimpressive because it's "just lots of little simple things" is dumb as hell.



