Showing how Benford's law applies to real-life data

                                                              
Introduction

In TOK we have been discussing whether mathematics is discovered or invented. I had always thought that it was invented, but when I heard that mathematics can explain things in the real world, such as magnetism and waves, I immediately wanted to see with my own eyes whether mathematics really applies to the real world. And even if we can "see" mathematics in nature, does it actually explain anything, and can we use it somehow? Finding mathematical explanations in nature is great, but if we could use that information in real life, that would be glorious. Studying Benford's law is interesting because the law applies to so many real-life lists of numbers, such as house prices, population numbers, and death rates. To test it, I will investigate three sets of data, calculate the incidence of the leading digits, and see how well it matches the distribution of leading digits predicted by the law. The aim of this exploration is to see how mathematics can be seen in nature. My sources are mostly electronic, as literature on this topic was not available in the libraries in my country. One of the main sources is Wikipedia, as it contained the most detailed information about the law and many other sources recommended it for further detail. I also used one university-level essay, which is more detailed than Wikipedia, but the mathematics in it is above IB standard level.

                                                             
Benford's law

In 1881 Simon Newcomb was reading through logarithm tables when he noticed that the earlier pages were more worn than the later pages. In 1938 the physicist Frank Benford tested the observation on real data, including numbers taken from newspapers, population sizes, air pressure measurements and many other sources, and introduced the law in a more detailed way to other mathematicians.1 Benford's law is a mathematical law that describes the distribution of leading digits. The leading digit is the first digit of a number, so, for example, the leading digit of 1099 is 1, and 1, 2, 3, 4, 5, 6, 7, 8 and 9 are all the possible leading digits. When looking at a list of numbers, we often assume that the leading digits are evenly distributed, so that 1 occurs as often as 9, but Benford's law states that 1 is actually the first digit with a probability of 30.1%, which is far greater than the 11.1% we would first guess.2 Benford's law makes predictions for the distribution of the other digits too, and it can be used to explain the distribution of leading digits in sets of numbers. The distribution of first digits can be seen in the bar graph below:

Graph 1. Distribution of first digits

This graph is not mine.3

Each bar represents a digit, and the height of the bar is the percentage of numbers that start with that digit. The graph shows that 1 has the greatest probability of appearing as a leading digit, and that the probability then gradually decreases as the digits get bigger, down to 9, which has the smallest probability of appearing as a leading digit.

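The probability of the first digit d in a set of numbers that satisfies Benford's law can also be expressed by a formula.

Formula for the probability of the first digit d, where d ∈ {1, …, 9}.4

$$P(d) = \log_{10}(d+1) - \log_{10}(d)$$

It can be simplified as:

$$P(d) = \log_{10}\left(\frac{d+1}{d}\right) = \log_{10}\left(1 + \frac{1}{d}\right)$$

Here the base is 10, but Benford's law also works with any other base b > 1, with the logarithm taken in that base.5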

Now, we can use this formula to calculate the probability of a leading
digit 2 (or any other leading digit):

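$$
\begin{aligned}
P(2) &= \log_{10}\left(1 + \tfrac{1}{2}\right) \\
     &= \log_{10}(1.5) \\
     &= 0.17609\ldots \\
     &\approx 17.6\% \text{ (rounded)}
\end{aligned}
$$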
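
As a check on the formula, the nine probabilities should add up to 1, since every non-zero number has a leading digit between 1 and 9. A short derivation, assuming the formula has the form P(d) = log10(1 + 1/d) used in the calculation above:

```latex
\sum_{d=1}^{9} P(d)
  = \sum_{d=1}^{9} \left[ \log_{10}(d+1) - \log_{10}(d) \right]
  = \log_{10}(10) - \log_{10}(1)
  = 1 - 0
  = 1
```

The sum telescopes: each term log10(d+1) is cancelled by the next term's log10(d), so only the last and the first logarithm remain.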

This way we can calculate the
distribution of leading digits and show them in a table.

Table 1. Distribution of
leading digits calculated using the formula

This table shows the same information as Graph 1, but in tabular form. It shows the same trend: the probability gets smaller as the leading digit gets bigger.
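
The values of such a table can be reproduced with a few lines of code. The sketch below is only an illustration in Python, not the method used in this essay; the function name `benford_probability` and the `base` parameter are my own choices.

```python
import math

def benford_probability(d: int, base: int = 10) -> float:
    """Probability that d is the leading digit: log_base(1 + 1/d)."""
    if not 1 <= d < base:
        raise ValueError("leading digit must be between 1 and base - 1")
    return math.log(1 + 1 / d, base)

# One entry per leading digit in base 10, as a percentage rounded to one decimal.
benford_table = {d: round(100 * benford_probability(d), 1) for d in range(1, 10)}

for digit, percent in benford_table.items():
    print(f"{digit}: {percent:.1f}%")
```

The first two lines printed are 30.1% for digit 1 and 17.6% for digit 2, which agree with the values quoted in the text.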

Logarithmic scale

This scale is not mine.6

One way to explain Benford's law is to look at a logarithmic scale. If we take a number, for example 5645, we find that $\log_{10}(5645) \approx 3.75$. On the scale, the fractional part 0.75 lies between $\log_{10} 5 \approx 0.70$ and $\log_{10} 6 \approx 0.78$, so 5645 has 5 as its leading digit.7 Also, the distance between consecutive values gets shorter as you move along the scale. The width of the section for digit d is proportional to $\log_{10}(d+1) - \log_{10}(d)$.8
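
The reading of the logarithmic scale can be sketched in code: take the fractional part of the logarithm and find the section of the scale it falls in. This is a minimal Python illustration; the function name `leading_digit_from_log` is hypothetical, and because it relies on floating-point logarithms it may be off by one for values lying exactly on a section boundary.

```python
import math

def leading_digit_from_log(x: float) -> int:
    """Find the leading digit of a positive number from its place on the log scale."""
    if x <= 0:
        raise ValueError("x must be positive")
    mantissa = math.log10(x) % 1  # fractional part, about 0.75 for 5645
    # Digit d owns the section log10(d) <= mantissa < log10(d + 1).
    for d in range(1, 10):
        if mantissa < math.log10(d + 1):
            return d
    return 9  # guard against rounding at the very top of the scale

print(leading_digit_from_log(5645))  # prints 5
```

The same function also works for numbers below 1, because only the fractional part of the logarithm matters.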

Now we can take this scale a step further. In the scale below, the colored areas show the probability of each leading digit (the table below it shows which color represents which leading digit):

This scale is not mine.9

The table below shows each leading digit, its probability, and the color representing it.

This table is not mine.10

Restrictions of the law

It should be noted that Benford's law does not apply to all sets of numbers. The law only works if the values are distributed across multiple orders of magnitude, and it works best with large sets of numbers.11 The order of magnitude is a measure of the size of a number in powers of ten, so values distributed across multiple orders of magnitude differ greatly from each other. Benford's law would therefore not work with, for example, the heights of humans, because the values are not distributed across multiple orders of magnitude: all human heights lie between zero and about two meters, and most adult heights in meters begin with the digit 1.12
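
One rough way to check this restriction before testing a data set is to measure how many powers of ten the values cover. The sketch below is only an illustration in Python; the function name and both lists of values are hypothetical and are not taken from the data used in this essay.

```python
import math

def orders_of_magnitude_spanned(values):
    """Return log10(max / min) over the positive values."""
    positive = [v for v in values if v > 0]
    if not positive:
        raise ValueError("need at least one positive value")
    return math.log10(max(positive) / min(positive))

# Hypothetical illustrative values, not real measurements.
adult_heights_m = [1.52, 1.60, 1.68, 1.75, 1.83, 1.95]
populations = [11_000, 540_000, 5_500_000, 83_000_000, 1_400_000_000]

print(round(orders_of_magnitude_spanned(adult_heights_m), 2))
print(round(orders_of_magnitude_spanned(populations), 2))
```

The heights cover far less than one order of magnitude, while the made-up population values cover about five, which is the kind of spread the law needs.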

                                                                    
Analysis

Now I can look at data found online and see whether the law works there and, if it does, how closely the distribution of leading digits matches Table 1. Each of my sets of numbers contains about 200 values or fewer. I copied the values from the source into Excel one by one and then used Excel to count the number of incidences of each leading digit. I use data sets of about 200 values because Benford's law works best with a large set of numbers (100 and above): the bigger the data set, the smaller the difference between the observed incidences and the values in Table 1. The raw data can be found at the end of this essay in the Appendix.

First, I will show step by step how Excel was used. I learned these steps from a document with instructions on how to apply Benford's law; a link to the document can be found in footnote 13.13

1. Start with an Excel sheet that contains all the values you need.

2. In the cell (box) next to the first number, perform the following steps:

   Type in =LEFT(

   Copy in the first number

   Type in )

   Press the Enter key

3. Now the leading digit of the first number will appear. Click it, hold down the left mouse button and drag the cursor down until you reach the end of the list of numbers. Then release the mouse button.

4. Click the A-Z button at the top of the menu.

5. When the Sort Warning window appears, select Expand the selection and click Sort.

6. All the values will now be arranged by their first digit.

7. Select the cells in the column that contain the leading digit 1. Click Data in the top menu and choose Subtotal.

8. Choose Use function in the Subtotal window and click Count. Then click OK.

9. Now you will have the total number of leading digit 1s.

10. Repeat steps 7 and 8 for the other leading digits.

11. Make a table.

I calculated the incidence percentage for each leading digit by dividing its number of incidences by the total number of incidences and multiplying by 100%.
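
The same counting can be sketched in a few lines of code. This is not the Excel procedure used in this essay, only a Python illustration of the same idea: take the first digit of each value, count the digits, and turn the counts into percentages. The function names and the short `sample` list are hypothetical placeholders.

```python
from collections import Counter

def leading_digit(value) -> int:
    """Return the first non-zero digit of a number (like LEFT for whole numbers)."""
    for ch in str(abs(value)):
        if ch in "123456789":
            return int(ch)
    raise ValueError(f"{value!r} has no non-zero leading digit")

def incidence_table(values):
    """Map each leading digit to (number of incidences, incidence percentage)."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: (counts[d], 100 * counts[d] / total) for d in range(1, 10)}

# Hypothetical placeholder values, not one of the data sets from the trials.
sample = [1099, 245, 18, 3120, 170, 96, 1400, 27, 5]
for digit, (count, percent) in incidence_table(sample).items():
    print(f"{digit}: {count} ({percent:.1f}%)")
```

A value of 0 raises an error here, because 0 has no leading digit in the sense of Benford's law, so such values would have to be removed first.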

Trial 1.

First, we are going to look at the populations of 200 different countries and see whether the law works here and, if it does, how closely the incidences of the leading digits obey the law. I will look at all the values in Excel, count how many times each leading digit occurs in the data, and then calculate the incidences as percentages.

Table 2. Number of incidences
and the incidence as percentage, for each leading digit in data containing
populations of 200 countries

When the incidence percentages in this table are compared with the distribution of leading digits in Table 1 or Graph 1, we can observe that the incidence percentage for leading digit 3 is exactly the same, and almost the same for leading digits 1, 9 and 6. Overall the values obey the law almost perfectly, and the incidence does gradually decrease as the leading digit gets bigger. The only exception is between leading digits 7 and 8, where the incidence is greater for leading digit 8 than for 7. This is remarkable, as we can see that the mathematical law exists in the real world, and I can now say without doubt that Benford's law does apply to real sets of numbers.

Graph 2. The distribution
of leading digits for populations of countries

Each bar in this graph represents a leading digit, and its height is the incidence percentage. The trendline shows the trend of decreasing incidence as the leading digit gets bigger. The trend is not as smooth here as in Graph 1, obtained from Benford's law, and the reason may be the size of the data set. As Benford's law works best with large sets of numbers, the bigger the set of numbers, the more the graph resembles Graph 1.

Trial 2.

We can also look at 199 countries listed by their total area and see whether Benford's law works here and, if it does, how closely. My gut already tells me that it works, but maybe not as precisely as in Trial 1 for leading digit 1, as it is difficult for me to imagine why it would occur so often here. Let us see if I am right.

Table 3. Number of incidences
and the incidence as percentage, for each leading digit in data containing
total areas of 200 countries

As we compare the incidences in this table with the values from Graph 1 or the distribution of leading digits in Table 1, we observe that the values are really close to each other. For example, the difference for first digit 5 between this table and Graph 1 is only 2.4%, and the difference is even smaller for the other values. For some reason, this data did not obey the law as clearly as in Trial 1, but we can still clearly say that the law works here. Also, in Graph 1 we saw a trend where the incidence gets gradually smaller as the leading digit gets bigger, but unfortunately here we see that trend for only the first four leading digits. The total areas of the countries existing today were set by wars and by history in general, so I think it is pretty remarkable that the law works even here, as it does not seem natural that a mathematical theory could match history.

Graph 3. The distribution
of leading digits for total areas of countries

Each bar in this graph represents a leading digit, and its height is the incidence percentage. The trendline shows the general trend of decreasing incidence as the leading digit gets bigger, but some bars are higher than the bars of smaller leading digits. For example, the bar for leading digit 9 is higher than the bars for leading digits 8, 7 and 5.

Trial 3.

We can then look at something in nature. It could be linked to rivers, lakes, altitudes, etc., but I have chosen to look at the elevations of 166 countries, as I found enough data to examine them. My elevations are in meters, but the law should work just as well with other units. The original data contained the elevations of 200 countries, but some of them had an elevation of 0 meters, and I will not include them in my investigation, as Benford's law does not consider 0 to be a leading digit.

Table 4. Number of incidences and the incidence as
percentage, for each leading digit in data containing elevations of countries

From this table, we can see that the incidences are similar to the incidences obtained from Benford's law, but they definitely differ a lot more than in Trials 1 and 2. I am not sure why, but possible reasons could be the smaller size of the data set and possible human manipulation of the numbers, as Benford's law works best with numbers that occur naturally and have not been changed by humans.

Graph 4. The distribution
of leading digits for elevations of countries

Each bar in this graph represents a leading digit, and its height is the incidence percentage. From this graph, we can see the trend of decreasing incidence as the leading digit gets bigger, but the trend is scattered and a lot less precise than in Benford's graph (Graph 1) or in Graph 2. For some reason, some bars are higher than the bar of a smaller leading digit; an example of this would be the bars for leading digits 4 and 5.

                                                          
Applications

One of the reasons Benford's law is so amazing is its variety of applications. The most famous is its use in fraud detection. If a person makes up numbers to cheat, for example, the government or the tax system, the person is likely to distribute the leading digits uniformly. As Benford's law shows, this should not happen naturally in large sets of numbers, so fraud can be detected by comparing the values with the distribution predicted by Benford's law.
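
The comparison described above can be sketched in code. The example below is a simplified Python illustration under my own assumptions, not a validated fraud test: it measures the average gap between the observed first-digit percentages and Benford's percentages. Both lists are made up for the illustration: powers of 2 are known to follow Benford's law closely, while the second list spreads its first digits evenly, as an invented list might.

```python
import math
from collections import Counter

def first_digit(n: int) -> int:
    """Leading digit of a positive whole number."""
    return int(str(n)[0])

def mean_abs_deviation_from_benford(values) -> float:
    """Average gap, in percentage points, between observed and Benford shares."""
    counts = Counter(first_digit(v) for v in values)
    total = len(values)
    gaps = [
        abs(100 * counts[d] / total - 100 * math.log10(1 + 1 / d))
        for d in range(1, 10)
    ]
    return sum(gaps) / 9

# Two made-up lists of 300 numbers each (illustrative only, not real records).
benford_like = [2 ** n for n in range(300)]
evenly_spread = list(range(100, 1000, 3))

print(round(mean_abs_deviation_from_benford(benford_like), 2))
print(round(mean_abs_deviation_from_benford(evenly_spread), 2))
```

The second list gives a much larger average gap than the first, which is the kind of signal that would prompt a closer look at a real set of numbers.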

The law can also be used to check the reliability of election results, and it was used as evidence of fraud in the 2009 Iranian election. However, it should be noted that some experts do not consider the law reliable in the case of elections.

There are also other cases where Benford's law has been used to catch fraud. For example, some years after Greece joined the eurozone, the macroeconomic data it had reported in order to join was shown, using the law, to be probably fraudulent.14

                                                       
   Conclusion

Overall, the data I tested to see whether Benford's law works matched the distribution of leading digits obtained from the formula (the values in Table 1) almost perfectly, and showed best that the incidence of leading digit 1 is always about 30%. The trend of decreasing probability as the leading digit gets bigger is not shown as clearly, and there were errors, but overall the results still show the trend. These errors could have been reduced by using larger data sets, as Benford's law works best with large sets of numbers. It would also have been interesting to examine the law with smaller sets of numbers (about 100 values), which is significantly fewer than in Trials 1 and 2, and I would like to see whether reducing the number of values so dramatically moves the distribution of leading digits further away from the values obtained using the formula (Table 1). As mentioned before, Benford's law works best with large sets of numbers, but about 100 values should still be enough to see the law working. The importance of the law can be seen from its applications, arguably most importantly in catching fraud against tax systems and governments. To me, the importance of this exploration was to understand and "see with my own eyes" that mathematics is discovered, not invented: if it were invented, I do not think mathematical theories could be seen in so many real-life scenarios or have such a variety of applications.

1 Jamain, Adrian. “Benford's Law.” Imperial College of London, Sept. 2001, wwwf.imperial.ac.uk/~nadams/classificationgroup/Benfords-Law.pdf., 30.12.17

2 “Benford’s Law.” From Wolfram MathWorld,
mathworld.wolfram.com/BenfordsLaw.html., 30.12.17

3 “Benford’s Law.” Wikipedia, Wikimedia Foundation,
9 Dec. 2017, en.wikipedia.org/wiki/Benford%27s_law#History., 30.12.17

4 Corn, Patrick. “Benford’s Law.” Brilliant Math
& Science Wiki, brilliant.org/wiki/benfords-law/., 01.01.18

5 “Benford’s Law.” Wikipedia, Wikimedia Foundation, 9 Dec. 2017,
en.wikipedia.org/wiki/Benford%27s_law#History., 03.01.18

 

6 Corn, Patrick. “Benford’s Law.” Brilliant Math
& Science Wiki, brilliant.org/wiki/benfords-law/.,
20.01.18

 

7 Corn, Patrick. “Benford’s Law.” Brilliant Math
& Science Wiki, brilliant.org/wiki/benfords-law/.,
20.01.18

8 Berry, Nick. “Benford´s Law.” Benford’s Law,
datagenetics.com/blog/march52012/index.html., 20.01.18

9 Berry, Nick. “Benford´s Law.” Benford’s Law,
datagenetics.com/blog/march52012/index.html., 20.01.18

10 Berry, Nick. “Benford´s Law.” Benford’s Law,
datagenetics.com/blog/march52012/index.html., 20.01.18

11 “Number 1 and Benford’s Law – Numberphile.”
Numberphile, 20 Jan. 2013, www.youtube.com/watch?v=XXjlR2OK1kM.,
01.01.2018

12 “Benford’s Law.” Wikipedia, Wikimedia Foundation,
9 Dec. 2017, en.wikipedia.org/wiki/Benford%27s_law#History., 31.12.17

13 “APPLYING BENFORD’S LAW.” Benford’s Law,
datagenetics.com/blog/march52012/index.html., 20.01.18

14 “Benford’s Law.” Wikipedia, Wikimedia
Foundation, 9 Dec. 2017, en.wikipedia.org/wiki/Benford%27s_law#History., 01.01.18