
DNNs do not have special generalization powers. If anything, their generalization is likely weaker than that of more mathematically principled techniques such as the SVM.

If you try to train a DNN to solve a classical ML problem like the "Wine Quality" dataset from the UCI Machine Learning repo [0], you will get abysmal results and overfitting.
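One way to check this claim yourself is to pit an SVM against an oversized MLP on a small tabular dataset. The sketch below uses scikit-learn's built-in wine dataset as a stand-in for the UCI Wine Quality data (an assumption: the built-in set is smaller and easier, but the small-tabular-data setting is the same), and prints the train/test accuracy gap for each model:

```python
# Sketch: SVM vs. an oversized MLP on a small tabular dataset.
# Uses scikit-learn's built-in wine dataset as a stand-in for the
# UCI "Wine Quality" data (assumption noted in the text above).
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier

X, y = load_wine(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# A kernel SVM with default-ish regularization.
svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))

# A deliberately over-parameterized MLP for ~125 training rows.
mlp = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(512, 512), max_iter=2000, random_state=0),
)

for name, model in [("SVM", svm), ("MLP", mlp)]:
    model.fit(X_tr, y_tr)
    # A large train/test gap indicates overfitting.
    print(name,
          "train acc %.2f" % model.score(X_tr, y_tr),
          "test acc %.2f" % model.score(X_te, y_te))
```

The exact numbers depend on the dataset and seed; the point is the comparison methodology, not any particular score.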

The "magic" of LLMs comes from the training paradigm. Because the optimization objective is word prediction, you effectively have a data sample size equal to the number of words in the corpus - an inconceivably vast number. Because you are training against a vast dataset, you can use a proportionally immense model (e.g. 400B parameters) without overfitting. This vast (but justified) model complexity is what creates the amazing abilities of GPT/etc.
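The "one sample per word" point can be made concrete: every position in the corpus supplies a (context, next word) training pair, so a T-token corpus yields roughly T supervised examples for free. A toy sketch:

```python
# Sketch: next-word prediction turns a T-token corpus into ~T
# supervised (context, target) training pairs.
corpus = "the cat sat on the mat".split()

# One training pair per position: predict word i from words 0..i-1.
pairs = [(corpus[:i], corpus[i]) for i in range(1, len(corpus))]

for context, target in pairs:
    print(context, "->", target)

print(len(pairs))  # 5 pairs from a 6-word corpus
```

Scale the same construction to a trillion-token corpus and you get on the order of a trillion training examples, which is what justifies the enormous parameter counts.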

What wasn't obvious 10 years ago was the principle of "reusability": the idea that the vastly complex model you trained under the LLM paradigm would have any practical value. Why is it useful to build an immensely sophisticated word-prediction machine? Who cares about predicting words? The reason is that all the concepts the model learned from word prediction can be reused for related NLP tasks.

[0] https://archive.ics.uci.edu/dataset/186/wine+quality



You may want to look at this: neural network models with enough capacity to memorize random labels are still capable of generalizing well when trained on actual data.

Zhang et al. (2021), "Understanding deep learning (still) requires rethinking generalization"

https://dl.acm.org/doi/10.1145/3446776
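The paper's randomization test is easy to reproduce at toy scale. A minimal sketch, assuming scikit-learn's digits dataset and a small MLP as stand-ins for the paper's large CNNs on CIFAR-10/ImageNet: train the same architecture once on real labels and once on randomly permuted labels, then compare test accuracy.

```python
# Sketch of the Zhang et al. randomization test at toy scale:
# the same architecture is trained on real labels and on randomly
# permuted labels, then evaluated on a held-out test set.
import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Permuting the labels destroys any input-label relationship,
# so fitting them well requires pure memorization.
rng = np.random.default_rng(0)
y_random = rng.permutation(y_tr)

def fit(labels):
    m = MLPClassifier(hidden_layer_sizes=(512,), max_iter=3000, random_state=0)
    m.fit(X_tr, labels)
    return m

real = fit(y_tr)
rand = fit(y_random)

print("real labels:   train %.2f  test %.2f"
      % (real.score(X_tr, y_tr), real.score(X_te, y_te)))
print("random labels: train %.2f  test %.2f"
      % (rand.score(X_tr, y_random), rand.score(X_te, y_te)))
```

The random-label model should score near chance on the test set while the real-label model generalizes, even though both share the same capacity - which is the paper's puzzle in miniature.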



