Adrian
3 min read · Nov 16, 2021

There’s No Free Lunch in Machine Learning: 10 Quotes

Data Science Series

"Consider any of the heuristics that people have come up with for supervised learning: avoid overfitting, prefer simpler to more complex models, boost your algorithm, bag it, etc. The no free lunch theorems say that all such heuristics fail as often (appropriately weighted) as they succeed. This is true despite formal arguments some have offered trying to prove the validity of some of these heuristics." (David H Wolpert, "The lack of a priori distinctions between learning algorithms", Neural Computation Vol. 8(7), 1996)

"[...] an algorithm’s average performance is determined by how 'aligned' it is with the underlying probability distribution over optimization problems on which it is run." (David H Wolpert & William G Macready, "No free lunch theorems for optimization", IEEE Transactions on Evolutionary Computation 1 (1), 1997)

"The No Free Lunch (NFL) theorem […] tells us that without any structural assumptions on an optimization problem, no algorithm can perform better on average than blind search." (Yu-Chi Ho, "The no free lunch theorem and the human-machine interface", IEEE Control Systems Magazine, 1999)

"[...] a general-purpose universal optimization strategy is theoretically impossible, and the only way one strategy can outperform another is if it is specialized to the specific problem under consideration." (Yu-Chi Ho & David L Pepyne, "Simple explanation of the no-free-lunch theorem and its implications", Journal of Optimization Theory and Applications 115, 2002)

"No Free Lunch dictates that any algorithm may be deceived, a difficulty to which the inference algorithm is not immune." (Christopher K Monson, "No Free Lunch, Bayesian Inference, and Utility: A Decision-Theoretic Approach to Optimization", [thesis] 2006)

"A priori, it is clear that no method will always be the best [...]. However, it is reasonable to argue that each method will have a set of functions, a type of data, and a range of sample sizes for which it is optimal – a sort of catchment region for each procedure. Ideally, one could partition a space of regression problems into catchment regions, depending on which methods were under consideration, and determine which catchment region seemed most appropriate for each method. This ideal solution would amount to a selection principle for nonparametric methods. Unfortunately, it is unclear how to do this, not least because the catchment regions are unknown." (Bertrand Clarke et al, "Principles and Theory for Data Mining and Machine Learning", 2009)

"The problem of comparing classifiers is not at all an easy task. There is no single classifier that works best on all given problems, phenomenon related to the 'No-free-lunch' metaphor, i.e., each classifier (’restaurant’) provides a specific technique associated with the corresponding costs (’menu’ and ’price’ for it). It is hence up to us, using the information and knowledge at hand, to find the optimal trade-off." (Florin Gorunescu, "Data Mining Concepts, Models and Techniques", 2011)

"As a consequence of the no free lunch theorem, we need to develop many different types of models, to cover the wide variety of data that occurs in the real world. And for each model, there may be many different algorithms we can use to train the model, which make different speed-accuracy-complexity tradeoffs." (Kevin P Murphy, "Machine Learning: A Probabilistic Perspective", 2012)

"The idea of feature learning is to automate the process of finding a good representation of the input space. As mentioned before, the No-Free-Lunch theorem tells us that we must incorporate some prior knowledge on the data distribution in order to build a good feature representation." (Shai Shalev-Shwartz & Shai Ben-David, "Understanding Machine Learning: From Theory to Algorithms", 2014)

"Roughly stated, the No Free Lunch theorem states that in the lack of prior knowledge (i.e. inductive bias) on average all predictive algorithms that search for the minimum classification error (or extremum over any risk metric) have identical performance according to any measure." (N D Lewis, "Deep Learning Made Easy with R: A Gentle Introduction for Data Science", 2016)

More quotes on “There’s No Free Lunch” at sql-troubles.blogspot.com.
