## Statistics’ Mis(usage) in Data Science: 10 Quotes

*“A witty statesman said, you might prove anything by figures.” (Thomas Carlyle, Chartism, 1840)*

*“Great discoveries which give a new direction to currents of thoughts and research are not, as a rule, gained by the accumulation of vast quantities of figures and statistics. These are apt to stifle and asphyxiate and they usually follow rather than precede discovery. The great discoveries are due to the eruption of genius into a closely related field, and the transfer of the precious knowledge there found to his own domain.” (Theobald Smith, “Boston Medical and Surgical Journal”, Vol. 172, 1915)*

*“Of itself an arithmetic average is more likely to conceal than to disclose important facts; it is the nature of an abbreviation, and is often an excuse for laziness.” (Arthur L Bowley, “The Nature and Purpose of the Measurement of Social Phenomena”, 1915)*
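Bowley's point can be seen in a few lines. This is a minimal sketch with invented numbers: the two hypothetical samples below have the same arithmetic mean, and only the spread (standard deviation and range) shows how different they are.

```python
from statistics import mean, pstdev

# Hypothetical data, invented purely for illustration
steady = [50, 50, 50, 50, 50]
volatile = [0, 0, 50, 100, 100]

def summarize(values):
    """Return the mean together with two measures of spread."""
    return mean(values), pstdev(values), max(values) - min(values)

for name, values in [("steady", steady), ("volatile", volatile)]:
    avg, sd, rng = summarize(values)
    # Identical averages, very different spread
    print(f"{name}: mean={avg}, sd={sd:.1f}, range={rng}")
```

Reporting the average alone would make the two samples look identical.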

*“The preliminary examination of most data is facilitated by the use of diagrams. Diagrams prove nothing, but bring outstanding features readily to the eye; they are therefore no substitutes for such critical tests as may be applied to the data, but are valuable in suggesting such tests, and in explaining the conclusions founded upon them.” (Sir Ronald A Fisher, “Statistical Methods for Research Workers”, 1925)*

*“The enthusiastic use of statistics to prove one side of a case is not open to criticism providing the work is honestly and accurately done, and providing the conclusions are not broader than indicated by the data. This type of work must not be confused with the unfair and dishonest use of both accurate and inaccurate data, which too commonly occurs in business. Dishonest statistical work usually takes the form of: (1) deliberate misinterpretation of data; (2) intentional making of overestimates or underestimates; and (3) biasing results by using partial data, making biased surveys, or using wrong statistical methods.” (John R Riggleman & Ira N Frisbee, “Business Statistics”, 1951)*

*“The statistics themselves prove nothing; nor are they at any time a substitute for logical thinking. There are […] many simple but not always obvious snags in the data to contend with. Variations in even the simplest of figures may conceal a compound of influences which have to be taken into account before any conclusions are drawn from the data.” (Alfred R Ilersic, “Statistics”, 1959)*

*“The manipulation of statistical formulas is no substitute for knowing what one is doing.” (Hubert M Blalock Jr., “Social Statistics”, 2nd Ed., 1972)*

*“Statistics is a very powerful and persuasive mathematical tool. People put a lot of faith in printed numbers. It seems when a situation is described by assigning it a numerical value, the validity of the report increases in the mind of the viewer. It is the statistician’s obligation to be aware that data in the eyes of the uninformed or poor data in the eyes of the naive viewer can be as deceptive as any falsehoods.” (Theoni Pappas, “More Joy of Mathematics: Exploring mathematical insights & concepts”, 1991)*

*“Averages, ranges, and histograms all obscure the time-order for the data. If the time-order for the data shows some sort of definite pattern, then the obscuring of this pattern by the use of averages, ranges, or histograms can mislead the user. Since all data occur in time, virtually all data will have a time-order. In some cases this time-order is the essential context which must be preserved in the presentation.” (Donald J Wheeler, “Understanding Variation: The Key to Managing Chaos”, 2nd Ed., 2000)*
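Wheeler's observation can be illustrated with a minimal sketch, assuming two hypothetical series that contain exactly the same values: one drifts steadily upward, the other is the same values scrambled. Every order-blind summary (mean, range, sorted values, which is all a histogram sees) is identical. A simple order-aware check, here a count of consecutive increases, exposes the drift.

```python
from statistics import mean

# Hypothetical measurements: a steady upward drift over ten periods
drifting = [10, 11, 12, 13, 14, 15, 16, 17, 18, 19]
# The same ten values in a scrambled order (no drift)
scrambled = [14, 10, 18, 12, 16, 11, 19, 13, 17, 15]

def order_blind_summary(values):
    """Mean, range and sorted values: none of these depend on time-order."""
    return mean(values), max(values) - min(values), sorted(values)

def upward_steps(values):
    """Order-aware check: count how many consecutive points increase."""
    return sum(1 for a, b in zip(values, values[1:]) if b > a)

print(order_blind_summary(drifting) == order_blind_summary(scrambled))
print(upward_steps(drifting), upward_steps(scrambled))
```

The drifting series rises at every one of its 9 steps, while the scrambled one rises at only 4. That difference is invisible in the average, the range, or a histogram.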

*“Even properly done statistics can’t be trusted. The plethora of available statistical techniques and analyses grants researchers an enormous amount of freedom when analyzing their data, and it is trivially easy to ‘torture the data until it confesses’.” (Alex Reinhart, “Statistics Done Wrong: The Woefully Complete Guide”, 2015)*
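One common way data gets "tortured" is by running many tests and reporting whichever one comes out significant. The sketch below is a simplified model, not Reinhart's own example. It assumes k independent tests, each at significance level α, with every null hypothesis true. Under those assumptions, the chance of at least one false positive is 1 − (1 − α)^k.

```python
def family_wise_error(alpha: float, k: int) -> float:
    """Probability of at least one false positive across k independent
    tests at level alpha, assuming every null hypothesis is true."""
    return 1 - (1 - alpha) ** k

# How quickly "something significant" appears by chance alone
for k in (1, 5, 20, 100):
    print(k, round(family_wise_error(0.05, k), 3))
```

At α = 0.05, twenty independent tests on pure noise already give roughly a 64% chance of at least one "significant" result. Real analyses are rarely fully independent, so this is only an approximation.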

*For more quotes on “Statistics’ Mis(usage)” see **http://sql-troubles.blogspot.com**.*