Abstract
Neural networks trained on large datasets by minimizing a loss have become the state-of-the-art approach for solving data science problems, particularly in computer vision, image processing and natural language processing. In spite of their striking results, our theoretical understanding of how neural networks operate is limited. In particular, what are the extrapolation capabilities, if any, of trained neural networks? In this paper we discuss a theorem of Domingos stating that “every machine learned by continuous gradient descent is approximately a kernel machine”. According to Domingos, this fact leads to the conclusion that all machines trained on data are mere kernel machines. We first extend Domingos’ result to the discrete case and to networks with vector-valued output. We then study its relevance and significance on simple examples. We find that in simple cases, the “neural tangent kernel” arising in Domingos’ theorem does provide understanding of the networks’ predictions. When the task given to the network grows in complexity, the interpolation capability of the network can be effectively explained by Domingos’ theorem, and no extrapolation capability of the network beyond its learning domain is found, even when the network’s structure would allow for it. We illustrate this fact on a classic perception theory problem: recovering a shape from its boundary.
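To make the kernel-machine reading of gradient descent concrete, here is a minimal sketch that is not taken from the paper. It assumes a toy model that is linear in its parameters (a quadratic polynomial), a squared loss, and plain discrete gradient descent; every name in it (`features`, `predict`, `tangent_kernel`, `train_and_kernel_expand`) is hypothetical. Under these assumptions the tangent kernel is constant during training, so the trained prediction equals the initial prediction minus a weighted sum of kernel evaluations against the training points, up to floating-point rounding. For a general network the kernel changes along the training path and the correspondence need not be exact.

```python
# Toy illustration (assumed setup, not the paper's experiments):
# model f(x; w) = w0 + w1*x + w2*x^2, squared loss, discrete gradient descent.

def features(x):
    # Gradient of f(x; w) with respect to w; constant in w for this model.
    return [1.0, x, x * x]

def predict(w, x):
    return sum(wj * pj for wj, pj in zip(w, features(x)))

def tangent_kernel(x, x2):
    # K(x, x') = grad_w f(x) . grad_w f(x')
    return sum(a * b for a, b in zip(features(x), features(x2)))

def train_and_kernel_expand(xs, ys, w0, x_query, lr=0.01, steps=200):
    w = list(w0)
    kernel_sum = 0.0  # accumulates sum_t sum_i L'_i(t) * K(x_query, x_i)
    for _ in range(steps):
        # Loss derivative for 0.5 * (f - y)^2 is the residual f - y.
        residuals = [predict(w, xi) - yi for xi, yi in zip(xs, ys)]
        kernel_sum += sum(
            r * tangent_kernel(x_query, xi) for r, xi in zip(residuals, xs)
        )
        grad = [
            sum(r * features(xi)[j] for r, xi in zip(residuals, xs))
            for j in range(len(w))
        ]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    direct = predict(w, x_query)                       # trained model output
    kernel_form = predict(w0, x_query) - lr * kernel_sum  # kernel expansion
    return direct, kernel_form

xs = [-1.0, 0.0, 1.0]
ys = [1.0, 0.0, 1.0]
w_init = [0.5, -0.3, 0.1]
# The query point 2.0 lies outside the training interval [-1, 1].
direct, kernel_form = train_and_kernel_expand(xs, ys, w_init, x_query=2.0)
print(direct, kernel_form)
```

The two printed values should agree to within rounding error, which shows that the prediction at a point outside the training interval is fully determined by the initial model and by kernel similarities to the training points.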
| Original language | English |
|---|---|
| Article number | 79 |
| Journal | Revista de la Real Academia de Ciencias Exactas, Fisicas y Naturales - Serie A: Matematicas |
| Volume | 117 |
| Issue number | 2 |
| Early online date | 2 Mar 2023 |
| DOIs | |
| Publication status | Published - Apr 2023 |
| Externally published | Yes |
Bibliographical note
Publisher Copyright: © 2023, The Author(s) under exclusive licence to The Royal Academy of Sciences, Madrid.
Keywords
- Gradient descent
- Kernel machine
- Machine learning
- Neural networks
- Neural tangent kernel
- Planar topology
Fingerprint
Dive into the research topics of 'Can neural networks extrapolate? Discussion of a theorem by Pedro Domingos'. Together they form a unique fingerprint.