
23 Oct 2024

This year’s Nobel prize in physics was awarded to John Hopfield and Geoffrey Hinton *for* “foundational discoveries and inventions that enable machine learning with artificial neural networks” (press release of the *Royal Swedish Academy of Sciences*, October 8, 2024).

Machine learning algorithms with artificial neural networks excel at image analysis, locating and classifying objects in digital images with unprecedented accuracy. This became widely known after a neural network with many layers of neurons – a *deep* neural network^{1} – won the ImageNet competition in 2012. Although one may wonder exactly what is meant by “Computers now better than humans at recognising and sorting images”^{2}, we know that machine learning with neural networks is here to stay, and it will have a tremendous effect on our lives, for better or worse. So-called *generative AI* algorithms generate images or videos that are very hard to recognize as machine-made. The latest *large language models* (the algorithms behind, e.g., ChatGPT) generate text that looks like it may have been written by humans.

The work awarded the Nobel prize in physics in 2024 was done in the 1980s, but theoretical research on neural networks began at least 40 years before that. What were Hopfield’s and Hinton’s insights that are so significant today? How is their work connected to the neural-network algorithms used today? And why did it take so long for the “deep-learning revolution”^{3} to take place, almost 30 years after Hopfield’s and Hinton’s prize-winning work?

John Hopfield^{4} recognized that artificial neural networks can be programmed to memorize patterns, such as hand-written digits. The digits are stored in the network by assigning connection strengths between the neurons, using a rule not unlike the one discussed in Hebb’s book “The Organization of Behavior: A Neuropsychological Theory”, published in 1949. When presented with a digit distorted, for example, by noise, the network associates the distorted digit with the correct memorized one (**Figure 1**). John Hopfield’s idea is significant because it shows that artificial neural networks can perform useful tasks besides representing Boolean functions^{5}. A much more important point, however, is that Hopfield showed how the network finds the correct pattern: the memorized patterns correspond to local minima in a high-dimensional landscape of hills and valleys (called the *energy landscape*). The idea is that the machine finds the correct minimum by moving downhill.
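The mechanism can be sketched in a few lines of code – an illustrative reconstruction, not code from the original paper; the function names, pattern size, and number of flipped bits are all arbitrary choices. A pattern is stored with a Hebb-like rule, and recall repeatedly updates the neurons so the state moves downhill in the energy landscape until a stored pattern is recovered:

```python
import numpy as np

def store(patterns):
    """Hebb-like rule: strengthen connections between co-active neurons."""
    n = patterns.shape[1]
    w = sum(np.outer(p, p) for p in patterns) / n
    np.fill_diagonal(w, 0.0)  # no self-connections
    return w

def recall(w, state, steps=20):
    """Move downhill in the energy landscape until a fixed point is reached.
    (Synchronous updates for brevity; Hopfield's analysis uses asynchronous ones.)"""
    s = state.copy()
    for _ in range(steps):
        s_new = np.sign(w @ s)
        s_new[s_new == 0] = 1
        if np.array_equal(s_new, s):
            break  # fixed point: a stored memory
        s = s_new
    return s

rng = np.random.default_rng(0)
pattern = rng.choice([-1, 1], size=32)   # a "memorized digit"
w = store(pattern[None, :])
noisy = pattern.copy()
noisy[:5] *= -1                          # distort a few bits with "noise"
restored = recall(w, noisy)
print(np.array_equal(restored, pattern))
```

The distorted state lies on the slope of the valley belonging to the stored pattern, so the downhill updates restore it exactly.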

However, the algorithm may get stuck in the wrong minimum if it can’t move uphill. Hinton and Sejnowski^{6} solved this problem a year later by adding a little bit of noise, allowing the algorithm to sometimes move up instead of down and thus to escape wrong minima. The resulting network model is analogous to *spin glasses*, models for disordered magnetic systems studied in statistical physics. The connection strengths between neurons correspond to interactions between local microscopic magnets (spins).
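The role of noise can be illustrated on a toy one-dimensional energy landscape (a hypothetical example standing in for the high-dimensional spin-glass energy; the landscape, temperature, and step size are arbitrary). A greedy downhill walker stays trapped in the shallower valley, while Metropolis-style updates accept occasional uphill moves with probability exp(−ΔE/T) and can climb over the barrier:

```python
import numpy as np

def energy(x):
    # Two valleys: a shallower one near x = +1, a deeper one near x = -1.
    return (x**2 - 1)**2 + 0.3 * x

def walk(x0, temperature, steps, rng):
    """Metropolis updates: downhill moves are always accepted; uphill moves
    are accepted with probability exp(-dE/T) -- the 'little bit of noise'."""
    xs = [x0]
    x = x0
    for _ in range(steps):
        proposal = x + rng.normal(scale=0.3)
        d_e = energy(proposal) - energy(x)
        if d_e < 0 or rng.random() < np.exp(-d_e / temperature):
            x = proposal
        xs.append(x)
    return xs

greedy = walk(1.0, 1e-9, 2000, np.random.default_rng(0))  # essentially no noise
noisy = walk(1.0, 0.5, 2000, np.random.default_rng(0))    # finite temperature
print(min(greedy) > 0.0, min(noisy) < -0.5)
```

The greedy walker never leaves the shallow valley near x = +1, whereas the noisy one also visits the deeper valley near x = −1.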

During the last 30 years, physicists have made tremendous progress in understanding the physical properties of spin glasses. Three years ago, Giorgio Parisi received the Nobel prize in physics for his contributions to spin-glass physics. Physicists have used these and earlier results to understand how neural networks learn, leading to a detailed understanding of how the associative memory works (and when it fails). Related theories have recently been used to understand how deep neural networks learn.

Noisy Hopfield networks are also called *Boltzmann machines*, recognizing that the work of Ludwig Boltzmann (1844–1906) established the foundations of *statistical physics*, a research field that describes the macroscopic behavior of systems comprising a large number of interacting units (such as spins or neurons). In other words, the 2024 Nobel prize in physics recognizes work in statistical physics, and it underscores how purely theoretical work may become crucial to future applications of immense practical importance.

A few years later, Geoffrey Hinton suggested using Boltzmann machines as generative models^{7}. Instead of specifying the neural connections using Hebb’s rule, one adjusts the connection strengths iteratively in order to fit a model distribution to a data distribution. This is called *training*. The model distribution is in fact the Boltzmann distribution used in statistical physics to describe how an energy landscape of hills and valleys is explored at a given temperature. After training, one samples from the model distribution by Monte Carlo sampling. In this way, the Boltzmann machine can generate numerical digits that look like real ones, but that have never been written by a human before. In other words, the Boltzmann machine is the first generative AI model.
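For a network small enough to enumerate, the model distribution can be written down exactly – a toy illustration with arbitrary random weights; real Boltzmann machines need Monte Carlo sampling precisely because this enumeration becomes infeasible for many neurons:

```python
import numpy as np
from itertools import product

# For n spins/neurons s_i = +/-1 with symmetric connections W, the energy is
# E(s) = -1/2 s^T W s, and the model distribution is the Boltzmann
# distribution p(s) proportional to exp(-E(s)/T).
rng = np.random.default_rng(2)
n = 3
w = rng.normal(size=(n, n))
w = (w + w.T) / 2            # symmetric connections
np.fill_diagonal(w, 0.0)     # no self-connections

states = np.array(list(product([-1, 1], repeat=n)))   # all 2^n states
energies = np.array([-0.5 * s @ w @ s for s in states])
T = 1.0
p = np.exp(-energies / T)
p /= p.sum()                 # normalization (the partition function)

# Low-energy states are exponentially more probable:
print(states[np.argmax(p)])
```

Training then amounts to nudging the entries of W so that this Boltzmann distribution assigns high probability to the training data.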

It turns out that *hidden neurons* – neurons that are neither input nor output units – are needed for Boltzmann machines to work well^{8}. Hinton showed how such machines with one or more layers of hidden neurons (*restricted* Boltzmann machines) could be efficiently trained to become the first generative AI models. More importantly perhaps, he explained why hidden neurons are necessary in the first place, and how they help the network to learn^{7}, setting the foundations for the deep networks of today.
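A compact sketch of how such a machine can be trained – a simplified contrastive-divergence (CD-1) update of the kind Hinton popularized, with bias terms omitted and the layer sizes, learning rate, and training pattern chosen arbitrarily for illustration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(3)
n_visible, n_hidden, lr = 6, 4, 0.1
w = 0.01 * rng.normal(size=(n_visible, n_hidden))  # visible-hidden connections only

def cd1_step(v0, w, rng):
    # Positive phase: hidden units driven by the data.
    ph0 = sigmoid(v0 @ w)
    h0 = (rng.random(ph0.shape) < ph0).astype(float)
    # Negative phase: one step of Gibbs sampling ("reconstruction").
    pv1 = sigmoid(h0 @ w.T)
    v1 = (rng.random(pv1.shape) < pv1).astype(float)
    ph1 = sigmoid(v1 @ w)
    # Update: data correlations minus model correlations.
    return w + lr * (np.outer(v0, ph0) - np.outer(v1, ph1))

v = np.array([1.0, 0.0, 1.0, 0.0, 1.0, 0.0])  # a single training pattern
for _ in range(200):
    w = cd1_step(v, w, rng)

# After training, the reconstruction should resemble the training pattern.
recon = sigmoid(sigmoid(v @ w) @ w.T)
print(np.round(recon, 2))
```

Because the two layers have no connections within them, the hidden units can be sampled in parallel given the visible ones (and vice versa), which is what makes this training efficient.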

The breakthrough, in other words, was to use physics models consisting of many interacting units (a.k.a. neurons) for machine learning, and to use the concepts of statistical physics (energy landscape, Boltzmann distribution) to understand how these machines work. Many of the fundamental insights (the role of noise in training, the significance of hidden neurons) are highly relevant for the algorithms of today, even though it is now known that the energy landscape of deep convolutional networks has more saddle points than minima.

Shortly after the seminal work on Hopfield networks and Boltzmann machines, it was understood how to train deep networks for image recognition^{9}. But at that time we did not have data sets large and accurate enough to train machines for advanced image-analysis tasks. Now we have that, but I want to stress that the network layout and the training techniques are essentially the same as in the 1980s.
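The back-propagation technique of reference 9 in its simplest form can be written down in a few lines – a hypothetical minimal example on the XOR problem, not the historical code; the layer sizes, learning rate, and iteration count are arbitrary:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(4)
x = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)   # XOR targets

w1, b1 = rng.normal(size=(2, 8)), np.zeros(8)     # hidden layer
w2, b2 = rng.normal(size=(8, 1)), np.zeros(1)     # output layer
lr = 0.5

for _ in range(5000):
    # Forward pass through both layers.
    h = sigmoid(x @ w1 + b1)
    out = sigmoid(h @ w2 + b2)
    # Backward pass: propagate the output error back through the layers.
    d_out = (out - y) * out * (1 - out)     # error at the output units
    d_h = (d_out @ w2.T) * h * (1 - h)      # error assigned to hidden units
    w2 -= lr * h.T @ d_out
    b2 -= lr * d_out.sum(axis=0)
    w1 -= lr * x.T @ d_h
    b1 -= lr * d_h.sum(axis=0)

print(np.round(out.ravel(), 2))
```

Apart from scale and choice of nonlinearity, this forward/backward structure is essentially what trains today’s deep networks.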

The work of Hopfield and Hinton is so important not only because it has led to the deep-learning revolution, but also because they analyzed how artificial neural networks can learn. In order to more reliably assess the risks these machine-learning methods pose, as well as the opportunities they present, we need to understand much better how deep neural networks and large language models work. This is our task as scientists, and this is what I emphasize when I teach machine learning with neural networks to my students. Therefore I am delighted by the decision of the Nobel committee to award the physics prize for 2024 to John Hopfield and Geoffrey Hinton.

**References**

1. A. Krizhevsky, I. Sutskever & G. E. Hinton, *ImageNet Classification with Deep Convolutional Neural Networks*, NIPS (2012).

2. A. Hern, *Computers now better than humans at recognising and sorting images*, The Guardian (2015).

3. T. J. Sejnowski, *The deep learning revolution*, MIT Press (2018).

4. J. J. Hopfield, *Neural networks and physical systems with emergent collective computational abilities*, PNAS 79, 2554 (1982).

5. M. Minsky & S. A. Papert, *Perceptrons: An Introduction to Computational Geometry*, MIT Press (1969).

6. G. E. Hinton & T. J. Sejnowski, *Optimal perceptual inference*, Proc. IEEE Conf. on Computer Vision and Pattern Recognition (1983).

7. G. E. Hinton & T. J. Sejnowski, *Learning and relearning in Boltzmann machines*, in: *Parallel distributed processing: explorations in the microstructure of cognition*, MIT Press (1986).

8. P. Smolensky, *Information processing in dynamical systems: foundations of harmony theory*, in: *Parallel distributed processing: explorations in the microstructure of cognition*, MIT Press (1986).

9. D. E. Rumelhart, G. E. Hinton & R. J. Williams, *Learning representations by back-propagating errors*, in: *Parallel distributed processing: explorations in the microstructure of cognition*, MIT Press (1986).
