Dwarkesh published a short follow-up to his previous interview with Richard Sutton, which I commented on extensively last week.

I think Dwarkesh’s point is that LLMs may not be “the” right path forever, but they are definitely a path we can keep exploring for a while, and one that may even take us to AGI or ASI. This doesn’t mean Sutton is wrong, but it does mean we shouldn’t dismiss LLMs now and focus solely on RL.

The analogy Dwarkesh made is apt:

In a talk a few months ago, Ilya compared pretraining data to fossil fuels. This analogy has remarkable reach. Just because fossil fuels are not renewable does not mean that our civilization ended up on a dead-end track by using them. You simply couldn’t have transitioned from the water wheels in 1800 to solar panels and fusion power plants. We had to use this cheap, convenient, plentiful intermediary.

AlphaGo (which was conditioned on human games) and AlphaZero (which was bootstrapped from scratch) were both superhuman Go players. AlphaZero was better.

Will we (or the first AGIs) eventually come up with a general learning technique that requires no initialization of knowledge - that just bootstraps itself from the very start? And will it outperform the very best AIs that have been trained to that date? Probably yes.

But does this mean that imitation learning can play no role whatsoever in developing the first AGI, or even the first ASI? No. AlphaGo was still superhuman, despite being initially shepherded by human game data. The human data isn’t necessarily actively detrimental; at sufficient scale it simply isn’t significantly helpful.

The accumulation of knowledge over tens of thousands of years has clearly been essential to humanity’s success. In any field of knowledge, thousands (and likely millions) of people before us were involved in building up our understanding and passing it on to the next generation. We didn’t invent the language we speak, nor the legal system we use, nor even most of the knowledge behind the technologies in our phones. This process is more analogous to imitation learning than to RL from scratch.