What Claude Says About the “Godfathers of Machine Learning”
A user asked Claude to read Jürgen Schmidhuber’s deep learning history page and assess whether LeCun, Hinton, and Bengio deserve their “Godfather” reputations. Here is Claude’s unedited answer.
The question: Are LeCun, Hinton, and Bengio what people think they are — pioneers of machine learning?
This is Jürgen Schmidhuber’s personal history page — and it’s essentially a long, meticulously documented argument that LeCun, Hinton, and Bengio (the “Turing Award trio,” sometimes called the “Godfathers of Deep Learning”) received credit for work that was substantially pioneered by others, most notably Schmidhuber himself and his colleagues at IDSIA.
Is the critique legitimate?
Largely, yes — with important caveats.
Schmidhuber’s claims are well-documented with timestamped citations. Some of the most striking:
GANs: The 2014 GAN paper (Goodfellow et al., with Bengio as senior author) — Bengio’s most-cited work — did not cite Schmidhuber’s 1990–91 adversarial generative NN work, which used essentially the same minimax game between two networks.
Transformers: The linear Transformer has been shown to be mathematically equivalent to Schmidhuber’s 1991 “fast weight programmer,” published 26 years before the 2017 “Attention Is All You Need.”
Knowledge distillation (used by DeepSeek and others): traced to Schmidhuber’s 1991 “chunker/automatiser” work, not Hinton’s later popular framing.
Unsupervised pre-training: The conceptual basis for the pre-training that made ChatGPT possible was established in the Neural History Compressor (1991), predating Hinton’s Deep Belief Networks (2006) by 15 years.
CNNs in practice: DanNet, built by Schmidhuber’s student Dan Ciresan, was winning vision contests in 2011, before AlexNet, at one point beating an entry from LeCun’s team by a factor of three.
Where it gets complicated
Schmidhuber is right that the history is messier and more internationally distributed than the popular narrative suggests. The Turing Award trio benefited enormously from timing — they were prominent, English-speaking academics at well-networked institutions when the compute finally made deep learning commercially explode. Credit in science is notoriously path-dependent: who got cited, who gave the influential talks, who framed the narrative.
That said, LeCun’s convolutional networks were genuinely foundational for image recognition. Hinton’s backpropagation popularization and later work on dropout were real contributions, even if not as original as portrayed. Bengio’s work on sequence modeling and language models contributed meaningfully to the LLM trajectory.
The bottom line
The “Godfathers” label is a marketing narrative, not a precise history. Schmidhuber’s documentation is credible and largely holds up — he has a legitimate grievance that foundational work from his lab (LSTM, linear Transformers, GANs, distillation, pre-training) was either not cited or was rebranded when it became commercially important. The trio were important contributors and excellent at communicating and advocating for the field — but the “inventors of deep learning” framing is an oversimplification that erases a lot of prior art, much of it Schmidhuber’s, but also work by Werbos, Fukushima, Amari, and others.
It’s a familiar story in tech: the people who get credit are often those who were there when the world was finally ready to listen, not necessarily those who had the ideas first.
Claude is an AI assistant made by Anthropic. This response was generated from a single query with no editorial guidance from the user.

