Leading-Digit Distribution According to Benford

Benford's law is a very curious property of naturally occurring quantities. The first digit of such a quantity is not spread evenly over 1 to 9, with each digit turning up one time in nine; instead, the small digits are much more frequent.

Checking Against Real Data

I tried this out with the files on my hard disk, and it really works. The figure shows how often each digit occurs in the sizes of all files (measured in bytes). The points show the first and second digits of the file sizes. The line shows the theory, Benford's law. While the second digit is (almost) uniformly distributed, the first digit follows the Benford distribution quite closely.

Application: checking the plausibility of reported figures (a tax return, for example). If the numbers have been made up, they generally do not follow Benford's law.
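The file-size experiment described above could be repeated with a few lines of Python. This is only a minimal sketch, not the script used for the figure: the helper names `digit_counts` and `file_sizes` are made up for illustration, and skipping empty files and symbolic links is an assumption of this sketch.

```python
import os
from collections import Counter


def digit_counts(sizes):
    """Tally the first and second decimal digits of positive integers."""
    first, second = Counter(), Counter()
    for size in sizes:
        if size <= 0:
            continue  # an empty file has no leading digit (assumption: skip it)
        text = str(size)
        first[int(text[0])] += 1
        if len(text) > 1:  # one-digit sizes have no second digit
            second[int(text[1])] += 1
    return first, second


def file_sizes(root):
    """Yield the size in bytes of every regular file below root."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                if os.path.isfile(path) and not os.path.islink(path):
                    yield os.path.getsize(path)
            except OSError:
                continue  # unreadable or vanished file: skip it
```

Calling `digit_counts(file_sizes(root))` for some directory `root` returns two counters; dividing each count by the total gives relative frequencies that can be set against the theoretical curve.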

Benford's Law

Benford's Law (which was apparently first stated by Simon Newcomb in 1881) states that if you randomly select a number from a table of physical constants or statistical data, the probability that the first digit will be a "1" is about 0.301, rather than 1/9 (about 0.111) as we might expect if all nine possible leading digits were equally likely. In general, the "law" says that the probability of the first digit being a "d" is

			  log_10( 1+ (1/d) )

This implies that a number in a table of physical constants is more likely to begin with a smaller digit than a larger digit. Newcomb published the law in a paper entitled "Note on the Frequency of Use of the Different Digits in Natural Numbers", which appeared in The American Journal of Mathematics (1881) 4, 39-40. It was re-discovered by Benford in 1938, and he published an article called "The Law of Anomalous Numbers" in Proc. Amer. Phil. Soc. 78, pp. 551-72. Several other references can be found in Hill's article.
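To make the formula concrete, the nine probabilities can be tabulated directly. The following is a small sketch; the function name `benford_probability` is chosen here for illustration only.

```python
import math


def benford_probability(d):
    """Probability that the leading decimal digit equals d (1..9)."""
    if not 1 <= d <= 9:
        raise ValueError("leading digit must be between 1 and 9")
    return math.log10(1 + 1 / d)


probabilities = {d: benford_probability(d) for d in range(1, 10)}
for d, p in probabilities.items():
    print(d, round(p, 3))
```

The nine values add up to 1, because the sum of log_10((d+1)/d) for d = 1..9 telescopes to log_10(10).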

Just for fun, I tabulated the first-digits of the physical constants listed in Table 2.3 of Abramowitz and Stegun's "Handbook of Mathematical Functions":

		  1  2
		  1  2
		  1  2*
		  1  2
		  1  2   *
		  1  2     4  5
		  1  2     4* 5
		  1  2     4  5* 6        9
		  1  2     4  5  6   *    9*
		  1  2  3  4  5  6  7  8  9

The "*" symbols indicate the approximate predicted number of occurrences according to Benford's Law. Aside from the conspicuous deficiency of 3's, that's not a bad match for just 44 data points.

Although there have been many lengthy "explanations" for Benford's Law, it seems to me this is a good candidate for a "Proof Without Words":



[Okay, not entirely without words:] The underlying premise is simply that physical constants, expressed in base 10 and in more or less arbitrary units, will be somewhat evenly distributed on a logarithmic scale. This is confirmed by the fact that the exponents on these constants are fairly uniformly distributed, at least over several "decades". As a result, the probability of the leading digit being "d" clearly approaches

		    log(d+1) - log(d)
		    -----------------  =  log(1+(1/d))
		    log(10) - log(1)

where "log" signifies the common logarithm (base 10). Of course, we COULD have chosen units for our physical constants such that the leading digits were all 9's (for example), but evidently we have a natural tendency to choose units so that our numbers are evenly distributed by order of magnitude, rather than absolute value. This may be related to our basic impressions of hearing and sight (not to mention earthquakes), where our intuitive senses of loudness and brightness are logarithmic.
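The premise can be checked numerically: draw numbers whose logarithms are uniformly distributed over a few decades and count their leading digits. This is a rough simulation sketch, not a proof; the range of exponents (-5 to 5), the sample size, and the seed are arbitrary choices made here, and `leading_digit` and `simulate` are illustrative names.

```python
import math
import random


def leading_digit(x):
    """Leading decimal digit of a positive real number."""
    while x >= 10:
        x /= 10
    while x < 1:
        x *= 10
    return int(x)


def simulate(samples=100_000, low_exp=-5.0, high_exp=5.0, seed=0):
    """Leading-digit frequencies for values uniform on a log scale."""
    rng = random.Random(seed)
    counts = [0] * 10
    for _ in range(samples):
        # The exponent is uniform, so the value is uniform on a log scale.
        x = 10 ** rng.uniform(low_exp, high_exp)
        counts[leading_digit(x)] += 1
    return {d: counts[d] / samples for d in range(1, 10)}


frequencies = simulate()
for d in range(1, 10):
    # digit, simulated frequency, predicted log(1 + 1/d)
    print(d, round(frequencies[d], 3), round(math.log10(1 + 1 / d), 3))
```

Because the exponent range covers a whole number of decades, the simulated frequencies should agree with log(1+(1/d)) up to sampling noise. With a range that is not a whole number of decades the match is only approximate, and it improves as the range grows.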

(see also "Der Spiegel", 47/1998, 16 November 1998)