Showing how Benford’s law applies to real-life data

Introduction

In TOK we have been discussing whether mathematics is discovered or invented. I had always thought that it was invented, but when I heard that mathematics can explain things in the real world, such as magnetism and waves, I immediately wanted to see with my own eyes whether mathematics really applies to the real world. And even if we can “see” mathematics in nature, does it actually explain anything, and can we use it somehow? Finding mathematical explanations in nature is great, but being able to use that information in real life would be glorious.

Studying Benford’s law is interesting because the law applies to so many real-life lists of numbers, such as house prices, population numbers, and death rates. To study it, I will investigate three sets of data, calculate the incidence of each leading digit, and see how well it matches the distribution of leading digits predicted by the law. For me, the aim of this exploration is to see how mathematics can be seen in nature. Most of my sources are electronic, because literature on this topic was not available in the library in my country. One of the main sources is Wikipedia, as it contained the most detailed information about the law and many other sources recommended it for further detail. I also used one university-level essay that is more detailed than Wikipedia, but the mathematics in it is above IB standard level.

Benford’s law

In 1881 Simon Newcomb was reading through logarithm tables when he observed that the earlier pages were more worn than the later pages. In 1938 the physicist Frank Benford tested this observation on real data, including numbers taken from newspapers, population sizes, air pressure measurements and many other sources, and introduced the law in a more detailed way to other mathematicians.1 Benford’s law is a mathematical law that describes the distribution of leading digits. The leading digit is the first digit of a number: for example, the leading digit of 1099 is 1, and 1, 2, 3, 4, 5, 6, 7, 8 and 9 are all the possible leading digits. When looking at a list of numbers, we often assume that the leading digits are evenly distributed, so that 1 occurs as often as 9. Benford’s law states that 1 is actually the first digit with probability 30.1%, which is far greater than the first guess of 11.1%.2 Benford’s law makes predictions for the other digits too, so it can be used to explain the distribution of leading digits in sets of numbers. This distribution of first digits can be seen in the bar graph below:

Graph 1. Distribution of first digits

This graph is not mine.3

Each bar represents a digit, and the height of the bar is the percentage of numbers that start with that digit. The graph shows that 1 has the greatest probability of appearing as a leading digit, and that the probability gradually decreases as the digit gets bigger, down to 9, which has the smallest probability of appearing as a leading digit.

The probability of the first digit d, where d ∈ {1, …, 9}, in a set of numbers that satisfies Benford’s law can also be represented by the following formula.4

$$P(d) = \log_{10}(d+1) - \log_{10}(d)$$

And it can be simplified as:

$$P(d) = \log_{10}\left(1 + \frac{1}{d}\right)$$

Here the base is 10, but for any other base b > 1 Benford’s law also works.5
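As a quick check that the formula really describes a probability distribution, the nine probabilities P(d) = log10(d+1) − log10(d) can be added up. This short derivation is not taken from the sources; it only uses the rules of logarithms. Every term except the first and the last cancels, so the sum is 1:

```latex
\sum_{d=1}^{9} P(d)
  = \sum_{d=1}^{9} \left[ \log_{10}(d+1) - \log_{10}(d) \right]
  = \log_{10}(10) - \log_{10}(1)
  = 1 - 0
  = 1
```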

Now we can use this formula to calculate the probability of the leading digit 2 (or of any other leading digit):

$$P(2) = \log_{10}\left(1 + \frac{1}{2}\right) = \log_{10}(1.5) = 0.17609\ldots \approx 17.6\%$$

In this way we can calculate the probability of every leading digit and show the distribution in a table.

Table 1. Distribution of leading digits calculated using the formula

Leading digit    Probability
1                30.1%
2                17.6%
3                12.5%
4                9.7%
5                7.9%
6                6.7%
7                5.8%
8                5.1%
9                4.6%

This table shows the same thing as Graph 1, but in tabular form, with the same trend: the probability gets smaller as the leading digit gets bigger.
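The values of Table 1 can also be reproduced with a few lines of code. The following is only a minimal Python sketch of the formula P(d) = log10(1 + 1/d); the names `benford_probability` and `benford_table` are chosen here for illustration and do not come from any of the sources.

```python
import math

def benford_probability(d: int) -> float:
    """Probability that the leading digit is d (1 to 9) under Benford's law, base 10."""
    if not 1 <= d <= 9:
        raise ValueError("d must be a digit from 1 to 9")
    return math.log10(1 + 1 / d)

# Expected share of each leading digit in percent, rounded to one decimal place.
benford_table = {d: round(100 * benford_probability(d), 1) for d in range(1, 10)}

for digit, percent in benford_table.items():
    print(f"{digit}: {percent}%")
```

Rounding to one decimal place is an assumption made here to match the 30.1% and 17.6% quoted in the text.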

Logarithmic scale

This scale is not mine.6

One way to explain Benford’s law is to look at the logarithmic scale. If we take a number, for example 5645, we can observe that $\log_{10}(5645) \approx 3.75$. On the scale, the fractional part 0.75 lies between $\log_{10}(5) \approx 0.70$ and $\log_{10}(6) \approx 0.78$, so 5645 has 5 as its leading digit.7 Also, the distance between consecutive values gets shorter as you move along the scale. The width of each section is proportional to $\log_{10}(d+1) - \log_{10}(d)$.8
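The idea of reading the leading digit off the logarithmic scale can be sketched in Python. This is an illustration only: the function name `leading_digit_from_log` and the small tolerance used against floating-point rounding are assumptions made here, not part of the sources.

```python
import math

def leading_digit_from_log(x: float) -> int:
    """Find the leading digit of a positive number from the fractional part of log10(x)."""
    if x <= 0:
        raise ValueError("x must be positive")
    mantissa = math.log10(x) % 1  # position of x on the 0-to-1 logarithmic scale
    # Digit d owns the section from log10(d) up to log10(d + 1) on that scale.
    # The tiny tolerance keeps values such as 2000 from slipping into the
    # section below because of floating-point rounding.
    for d in range(1, 10):
        if mantissa < math.log10(d + 1) - 1e-12:
            return d
    return 9  # only reached for values a hair below a power of ten

print(leading_digit_from_log(5645))  # 5
```

Because the sections get narrower along the scale, a value placed at random on the scale is most likely to land in the section belonging to the digit 1.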

Now we can take this scale to the next level. In the scale below, the colored area shows the probability of each leading digit (the table below it shows which color represents which leading digit):

This scale is not mine.9

The table below shows each leading digit, its probability, and the color representing it.

This table is not mine.10

Restrictions of the law

It should be noted that Benford’s law does not apply to all sets of numbers. The law only works if the values are distributed across multiple orders of magnitude, and it works best with large sets of numbers.11 The order of magnitude is a measure of the size of a number (roughly, its power of ten), so values distributed across multiple orders of magnitude differ from each other by factors of ten or more. Benford’s law would therefore not work with, for example, the heights of humans, because the values are not distributed across multiple orders of magnitude: all humans have a height between zero and about two meters, so nearly every adult height in meters starts with the digit 1.12
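One rough way to check this restriction before applying the law is to count how many orders of magnitude a data set spans. The sketch below is only an illustration: the helper `orders_of_magnitude_spanned` is a name invented here, and both lists are hypothetical example numbers, not data used in this exploration.

```python
import math

def orders_of_magnitude_spanned(values):
    """Return log10(max / min) for the positive values in a list."""
    positive = [v for v in values if v > 0]
    if not positive:
        raise ValueError("at least one positive value is needed")
    return math.log10(max(positive) / min(positive))

# Hypothetical illustrative numbers only.
adult_heights_m = [1.52, 1.60, 1.68, 1.75, 1.83, 1.91]
populations = [11_000, 540_000, 5_500_000, 83_000_000, 1_400_000_000]

print(orders_of_magnitude_spanned(adult_heights_m))  # far below 1
print(orders_of_magnitude_spanned(populations))      # a little over 5
```

A result far below 1, as for the heights, suggests that the leading digits cannot follow Benford’s law, while a result of several orders of magnitude, as for the populations, makes the law plausible.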

Analysis

Now I can look at data found online and see whether the law works there and, if it does, how closely the distribution of leading digits matches Table 1. Each of my sets of numbers will contain about 200 values or fewer. I will copy the values from the source into Excel and then use Excel to count the incidences of each leading digit. I use sets of about 200 values because Benford’s law works best with large sets of numbers (100 values and above): the bigger the data set, the smaller the difference between the observed incidences and the values in Table 1. The raw data can be found in the Appendix at the end of this essay.

First, I will show step by step how Excel was used. I learned these steps from a document with instructions on how to apply Benford’s law; a link to the document can be found in footnote 13.13

1. Start with an Excel sheet that contains all the values you need.

2. In the cell (box) next to the first number, perform the following steps:

   Type in =LEFT(
   Copy the first number
   Type in )
   Press the Enter key

3. Now the leading digit of the first number will appear. Click it, hold down the left mouse button and drag the cursor down until you reach the end of the list of numbers. Then release the mouse button.

4. Click the A-Z button at the top of the menu.

5. When the Sort Warning window appears, select Expand the selection and click Sort.

6. All the values will now be arranged by their first digit.

7. Select the cells containing all of the leading digits 1. Click Data in the top menu and choose Subtotal.

8. Choose Use function in the Subtotal window and click Count. Then click OK.

9. Now you will have the total number of leading digits 1.

10. Repeat steps 7 and 8 for the other leading digits.

11. Make a table.

I calculated the incidence percentage for each leading digit by dividing its number of incidences by the total number of values and multiplying by 100%.
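The same counting could also be done outside Excel. The following Python sketch is an alternative to the steps above, not the method actually used in this exploration. The names `leading_digit` and `leading_digit_table` are invented here, and the short `sample` list is a hypothetical placeholder rather than real data.

```python
from collections import Counter
from typing import Optional

def leading_digit(value) -> Optional[int]:
    """Return the first non-zero digit of a number, or None if it has none (for example 0)."""
    for ch in str(value):
        if ch.isdigit() and ch != "0":
            return int(ch)
    return None

def leading_digit_table(values):
    """Map each digit 1-9 to (number of incidences, incidence as a percentage)."""
    digits = [d for d in map(leading_digit, values) if d is not None]
    if not digits:
        raise ValueError("no values with a leading digit")
    counts = Counter(digits)
    total = len(digits)
    return {d: (counts[d], round(100 * counts[d] / total, 1)) for d in range(1, 10)}

# Hypothetical placeholder values; the real data would be pasted in here.
sample = [1099, 5645, 120, 18, 2300, 1750, 96, 310, 14, 1]

for digit, (count, percent) in leading_digit_table(sample).items():
    print(digit, count, f"{percent}%")
```

Taking the first non-zero digit of the written number plays the same role as the =LEFT( ) step, and the percentage is the number of incidences divided by the total number of values, multiplied by 100.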

Trial 1.

First, we are going to look at the populations of 200 different countries and see whether the law works here and, if it does, how precisely the incidences of the leading digits obey it. I will look at all the values in Excel, count how many times each leading digit occurs in the data, and then calculate the incidences as percentages.

Table 2. Number of incidences, and the incidence as a percentage, for each leading digit in the data containing the populations of 200 countries

When the incidence percentages in this table are compared with the distribution of leading digits in Table 1 or in Graph 1, we can observe that the incidence percentage for the leading digit 3 is exactly the same, and those for the leading digits 1, 6 and 9 are almost the same. Overall the values obey the law almost perfectly, and the incidence does gradually decrease as the leading digit gets bigger. The only exception is between the leading digits 7 and 8, where the incidence is greater for 8 than for 7. This is remarkable, as we can see that the mathematical law exists in the real world, and I can now say without doubt that Benford’s law does apply to real sets of numbers.

Graph 2. The distribution of leading digits for the populations of countries

Each bar in this graph represents a leading digit, and its height is the incidence percentage. The trendline shows the trend of decreasing incidence as the leading digit gets bigger. The trend is not as smooth here as in Graph 1, obtained from Benford’s law, and the reason for that may be the size of the data set. As Benford’s law works best with large sets of numbers, the bigger the set of numbers, the more closely the graph should resemble Graph 1.
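The kind of comparison made in this trial can also be sketched in Python. The counts below are hypothetical placeholders that add up to 200; they are not the values of Table 2 and are only there so that the sketch runs. The function names `compare_with_benford` and `increases` are invented for this illustration.

```python
import math

def compare_with_benford(counts):
    """Per leading digit: (observed %, expected %, difference in percentage points)."""
    total = sum(counts.values())
    rows = {}
    for d in range(1, 10):
        observed = 100 * counts.get(d, 0) / total
        expected = 100 * math.log10(1 + 1 / d)
        rows[d] = (round(observed, 1), round(expected, 1), round(observed - expected, 1))
    return rows

def increases(counts):
    """List the digit pairs (d, d + 1) where the count goes up instead of down."""
    return [(d, d + 1) for d in range(1, 9) if counts.get(d + 1, 0) > counts.get(d, 0)]

# Hypothetical counts for illustration only (NOT the data of this trial).
hypothetical_counts = {1: 58, 2: 36, 3: 25, 4: 20, 5: 17, 6: 14, 7: 9, 8: 12, 9: 9}

for digit, row in compare_with_benford(hypothetical_counts).items():
    print(digit, row)
print(increases(hypothetical_counts))  # [(7, 8)]
```

The first function gives the size of each deviation from Table 1, and the second lists the places where the decreasing trend is broken.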

Trial 2.

We can also look at 199 countries listed by their total area and see whether Benford’s law works here and, if so, how precisely. My gut already tells me that it works, but maybe not as precisely as in Trial 1 for the leading digit 1, as it is difficult for me to imagine why that digit would occur so often here. Let us see if I am right.

Table 3. Number of incidences, and the incidence as a percentage, for each leading digit in the data containing the total areas of 200 countries

When we compare the incidences in this table with the values from Graph 1 or the distribution of leading digits in Table 1, we observe that the values are really close to each other. For example, the difference for the first digit 5 between this table and Graph 1 is only 2.4 percentage points, and the difference is even smaller for the other digits. For some reason, this data did not obey the law as clearly as in Trial 1, but we can still clearly say that the law works here. Also, in Graph 1 we saw a trend where the incidence gets gradually smaller as the leading digit gets bigger, but unfortunately here we see that trend for only the first 4 leading digits. The total areas of the countries existing today were set by wars and by history in general, so I think it is pretty remarkable that the law works even here, as it does not seem natural that a mathematical theory could match history.

Graph 3. The distribution of leading digits for the total areas of countries

Each bar in this graph represents a leading digit, and its height is the incidence percentage. The trendline shows the general trend of decreasing incidence as the leading digit gets bigger, but some bars are higher than the bars of smaller leading digits: for example, the bar for the leading digit 9 is higher than the bars for the leading digits 8, 7 and 5.

Trial 3.

We can then look at something in nature. It could be linked to rivers, lakes, altitudes and so on, but I have chosen to look at the elevations of 166 countries, as I found enough data to examine them. My elevations are in meters, but the law should work just as well with other units. The original data contained the elevations of 200 countries, but some of them had an elevation of 0 meters, and I will not include them in my investigation, as Benford’s law does not consider 0 to be a leading digit.
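Both points, leaving out the zeros and the choice of unit, can be illustrated with a small Python sketch. It uses synthetic stand-in numbers spread evenly on a logarithmic scale, not the elevations of this trial, and the function names are invented here. The conversion factor of 3.28084 feet per meter is the only outside fact used.

```python
import random

def leading_digit(x: float) -> int:
    """Leading digit of a positive number, read from its scientific notation."""
    return int(f"{x:e}"[0])

def share_of_leading_one(values):
    """Fraction of the positive values whose leading digit is 1."""
    positive = [v for v in values if v > 0]  # zeros have no leading digit, so drop them
    return sum(leading_digit(v) == 1 for v in positive) / len(positive)

random.seed(0)  # fixed seed so that the run is repeatable

# Synthetic stand-in data: values spread evenly on a log scale from 1 to 10,000,
# plus a few zeros. These are NOT the elevations used in Trial 3.
meters = [10 ** random.uniform(0, 4) for _ in range(10_000)] + [0.0] * 30
feet = [m * 3.28084 for m in meters]

print(share_of_leading_one(meters))
print(share_of_leading_one(feet))
```

For data of this kind, both shares should come out close to the 30.1% of Table 1, which is what is meant by the law working with other units as well.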

Table 4. Number of incidences, and the incidence as a percentage, for each leading digit in the data containing the elevations of countries

From this table, we can see that the incidences are similar to those obtained from Benford’s law, but they definitely differ a lot more than in Trials 1 and 2. I am not sure why, but possible reasons could be the smaller size of the data set and possible human manipulation of the numbers, as Benford’s law works best with numbers that are natural and have not been changed by humans.

Graph 4. The distribution of leading digits for the elevations of countries

Each bar in this graph represents a leading digit, and its height is the incidence percentage. From this graph we can see the trend of decreasing incidence as the leading digit gets bigger, but the trend is scattered and a lot less precise than in Benford’s graph (Graph 1) or in Graph 2. For some reason, some bars are higher than the bars of smaller leading digits; an example of this is the bars for the leading digits 4 and 5.

Applications

One of the reasons Benford’s law is so amazing is the variety of its applications. The most famous one is its use in fraud detection. If a person makes up numbers to cheat, for example, the government or the tax system, the person is likely to aim to distribute the numbers uniformly. As Benford’s law shows, such a uniform distribution is not to be expected naturally in large sets of numbers, so fraud can be detected by comparing the leading digits of the reported values with the distribution given by Benford’s law.
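The idea behind this application can be illustrated with a small simulation. This is a sketch under strong assumptions and not a real fraud test: the “made-up” numbers are simply random three-digit numbers, the “natural” numbers are synthetic values spread over several orders of magnitude, and all names in the code are invented here.

```python
import math
import random

def first_digit_shares(values):
    """Share of each leading digit 1-9 among positive whole numbers."""
    counts = {d: 0 for d in range(1, 10)}
    for v in values:
        counts[int(str(v)[0])] += 1
    return {d: counts[d] / len(values) for d in range(1, 10)}

def mean_abs_deviation(shares):
    """Average absolute gap between the observed shares and Benford's law."""
    return sum(abs(shares[d] - math.log10(1 + 1 / d)) for d in range(1, 10)) / 9

random.seed(1)  # fixed seed so that the run is repeatable

# Assumption: a cheater spreads invented amounts evenly between 100 and 999.
made_up = [random.randint(100, 999) for _ in range(2000)]
# Synthetic "natural" amounts spread over four orders of magnitude.
spread_out = [int(10 ** random.uniform(2, 6)) for _ in range(2000)]

print(mean_abs_deviation(first_digit_shares(made_up)))
print(mean_abs_deviation(first_digit_shares(spread_out)))
```

The evenly spread numbers give each leading digit a share of about one ninth, so their deviation from the law is clearly larger than that of the values spread over several orders of magnitude. A large deviation in real data is only a reason to look more closely, not proof of fraud.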

The law can also be used to check the reliability of election results, and it was used as evidence of fraud in the 2009 Iranian election. It should be noted, however, that some experts do not consider the law reliable in the case of elections.

There are also other cases where Benford’s law has been used to catch fraud. For example, some years after Greece joined the eurozone, the macroeconomic data it had used to get into the eurozone was shown to be false using the law.14

Conclusion

Overall, the data I tested matched the distribution of leading digits obtained from the formula (the values in Table 1) almost perfectly, and showed most clearly that the incidence of the leading digit 1 is always about 30%. The trend of decreasing probability as the leading digit gets bigger is not shown as clearly, and there were exceptions, but overall the results still show the trend. These deviations could probably have been reduced by using larger data sets, as Benford’s law works best with large sets of numbers. It would also have been interesting to examine the law with smaller sets of numbers (about 100 values), which is significantly fewer than in Trials 1 and 2, to see whether decreasing the number of values so dramatically moves the distribution of leading digits further away from the values obtained using the formula (Table 1). As mentioned before, Benford’s law works best with large sets of numbers, but about 100 values should still be enough to see the law working.

The importance of the law can be seen from its applications, arguably most importantly in catching fraud against tax systems and governments. To me, the importance of this exploration was to understand and “see with my own eyes” that mathematics is discovered, not invented: if it were invented, I do not think mathematical theories could be seen in so many real-life scenarios or have such a variety of applications.

1 Jamain, Adrian. “Benford’s Law.” Imperial College London, Sept. 2001, wwwf.imperial.ac.uk/~nadams/classificationgroup/Benfords-Law.pdf, 30.12.17

2 “Benford’s Law.” From Wolfram MathWorld, mathworld.wolfram.com/BenfordsLaw.html, 30.12.17

3 “Benford’s Law.” Wikipedia, Wikimedia Foundation, 9 Dec. 2017, en.wikipedia.org/wiki/Benford%27s_law#History, 30.12.17

4 Corn, Patrick. “Benford’s Law.” Brilliant Math & Science Wiki, brilliant.org/wiki/benfords-law/, 01.01.18

5 “Benford’s Law.” Wikipedia, Wikimedia Foundation, 9 Dec. 2017, en.wikipedia.org/wiki/Benford%27s_law#History, 03.01.18

6 Corn, Patrick. “Benford’s Law.” Brilliant Math & Science Wiki, brilliant.org/wiki/benfords-law/, 20.01.18

7 Corn, Patrick. “Benford’s Law.” Brilliant Math & Science Wiki, brilliant.org/wiki/benfords-law/, 20.01.18

8 Berry, Nick. “Benford’s Law.” Benford’s Law, datagenetics.com/blog/march52012/index.html, 20.01.18

9 Berry, Nick. “Benford’s Law.” Benford’s Law, datagenetics.com/blog/march52012/index.html, 20.01.18

10 Berry, Nick. “Benford’s Law.” Benford’s Law, datagenetics.com/blog/march52012/index.html, 20.01.18

11 “Number 1 and Benford’s Law – Numberphile.” Numberphile, 20 Jan. 2013, www.youtube.com/watch?v=XXjlR2OK1kM, 01.01.18

12 “Benford’s Law.” Wikipedia, Wikimedia Foundation, 9 Dec. 2017, en.wikipedia.org/wiki/Benford%27s_law#History, 31.12.17

13 “APPLYING BENFORD’S LAW.” Benford’s Law, datagenetics.com/blog/march52012/index.html, 20.01.18

14 “Benford’s Law.” Wikipedia, Wikimedia Foundation, 9 Dec. 2017, en.wikipedia.org/wiki/Benford%27s_law#History, 01.01.18