The Squared Circle

Indian states as countries

August 8, 2009 · 6 Comments

The 28 Indian states labeled by countries with similar populations

The 28 Indian states labeled by countries with similar populations

  1. India with an estimated population of 1.17 billion is one-sixth of the world’s population.
  2. The population of India is greater than the population of any continent (except Asia which contains India).
  3. The population of India is significantly more than twice the population of North America (including the so-called “central America”).
  4. India has 28 states and 7 union territories. The most populous state of Uttar Pradesh (UP) has almost as many people living in it as the combined populations of Germany, France and UK. UP’s population is comparable to that of Brazil – the fifth most populated nation.
  5. The two most populous states of UP and Maharashtra combined have almost as many people as the USA.
  6. The island city of Mumbai (Bombay) along with the suburbs of Navi Mumbai and Thane have as many people living in it as the continent nation of Australia.

For completeness, I am also including the country population equivalents of the Union Territories of India:

  1. National Capital Territory (Delhi) – Niger
  2. Pondicherry – East Timor
  3. Chandigarh – Equitorial Guinea + Bahamas
  4. Andaman and Nicobar Islands – Brunei
  5. Dadra and Nagar Haveli – Barbados
  6. Daman and Diu – Samoa
  7. Lakshadweep – Dominica

The last Indian population census was conducted in 2001. There are estimates for the population of the country in 2009, but I couldn’t find estimates for state populations. The state populations that I used are uniformly scaled up according to the growth in the country population.

Many, many thanks to PlaneMad Arun Ganesh who blogs at BitterScotch for producing free editable maps of India among a huge number of other maps related to India.

Also many thanks to Strange Maps, which is one of the most delightful blogs I have come across. My post is obviously inspired by a similar one for US states and by a similar concept applied to GDP.

Without straying too much from the topic, a couple of observations from the GDP maps:

  1. India’s GDP is equivalent to that of Texas and the population is 50 times that of Texas.
  2. India’s GDP is only slightly greater than that of Mexico but the population is 11 times. When compared to the US both countries may be poor, but Mexico is significantly wealthy in terms of per capita GDP compared to India.

Disclaimer: All the figures used in this post are approximate (and are often estimates). Comparisons and relations with other known quantities make it easier to ‘understand’ figures, but for precise statistics I strongly recommend looking up more authoritative sources.

→ 6 CommentsCategories: India · demographics · graphic design
Tagged: , , , ,

That can’t be right!

June 26, 2009 · 2 Comments

Many years ago, I came across this remarkable ‘fact’

    \displaystyle \sum_{n=1}^\infty n^{-n} = \int_0^1 x^{-x}dx

Sadly I don’t know the source of it, nor have I ever seen a proof. It looks like a result that should have a name attached to it. So it occurred to me that Wolfram Alpha might be able to shine light on it, but apart from being provided with evidence in favor of the equality, I learned nothing more

→ 2 CommentsCategories: mathematics

Sum of powers (Part II – An estimate)

June 22, 2009 · Leave a Comment

To continue from last time, you can show that \displaystyle f(m):= \sum_{i=1}^{m-1}\left(\frac{i}{m}\right)^m < 1 for all m by thinking of the sum of powers of integers as a lower sum of a Riemann integral.

Here I am interested in finding \displaystyle \lim_{m \rightarrow \infty} f(m). This article evaluates the limit as an illustration of the Euler-Maclaurin formula. But if all you cared about is finding the limit, then it can be done much more simply.

First thing to notice is that m shows up in 3 different places in the sum, each individual m ‘pulls’ in a different direction. We are trying to find how the three pulls ‘balance’ to give an interesting sum. So to get insight, let’s generalize by using different variables, and if necessary we will set two of them equal to each other.

    \displaystyle \sum_{i=1}^{t-1} \left(\frac{i}{n}\right)^m

Sum of powers are generally difficult and so if I want to make progress I should try to get the index i in the exponent. First I will flip the sum over. To do this I need to set t=n.

    \displaystyle \sum_{i=1}^{n-1} \left(\frac{i}{n}\right)^m = \sum_{i=1}^{n-1} \left(1-\frac{i}{n}\right)^m

Anticipating an exponential, let’s write this as

    \displaystyle \sum_{i=1}^{n-1} \left(\left(1-\frac{i}{n}\right)^n\right)^{m/n}

Now if you let m \rightarrow \infty, then it will drag the sum down to 0, and on the other hand n \rightarrow \infty will take the sum to \infty. So there is a possibility of something interesting when both go to \infty together (i.e. linearly with respect to each other). In fact let \displaystyle  \frac{m}{n}  = c > 0 . Then letting n \rightarrow \infty, the sum becomes

    \displaystyle \lim_{n \rightarrow \infty} \sum_{i=1}^{n-1}\left(\frac{i}{n}\right)^{cn} = \sum_{i=1}^{\infty} e^{-ic} = \frac{e^{-c}}{1-e^{-c}}.

For the original problem, c=1 and so the limit is \displaystyle \frac{1}{e-1} which gives the useful estimate

    \displaystyle \sum_{i=1}^{m-1} i^m \approx \frac{m^m}{e-1}.

In fact for arbitrary integers n,m we get the following estimate

    \displaystyle \sum_{i=1}^{n-1} i^m \approx \frac{n^m e^{-m/n}}{1-e^{-m/n}}.

This estimate is particularly good for large values of m (how large?). For small values of m, we may as well use the Riemann sum estimate \displaystyle \frac{n^{m+1}}{m+1}.

→ Leave a CommentCategories: math · mathematics

Sum of powers

June 22, 2009 · 1 Comment

Let \displaystyle f(m) = \sum_{i=1}^{m-1}\left(\frac{i}{m}\right)^m.

    Show the following:

1) f(m) < 1 for all m \in \mathbb{N}.

The bound begs for a probabilistic interpretation, but I haven't found one yet. Can you?

To state the problem in terms of integers, \displaystyle \sum_{i=1}^{m-1}i^m < m^m.

2) f(m) is increasing.

3) Find \displaystyle \lim_{m \rightarrow \infty}f(m).

I don’t know yet how to prove (or disprove!) (2), (1) is pretty straightforward and so is (3), for a clue on (3) you could look at my previous post on a birthday problem.

→ 1 CommentCategories: math · mathematics
Tagged: , , ,

The other birthday problem (Part II-It all adds up to nothing)

June 11, 2009 · 2 Comments

Problem: What is the probability p(m,n) that every face on an m sided fair die has appeared at least once after n rolls?

In the previous post, I showed that this probability is

\displaystyle p(m,n) = \sum_{i=0}^m {m \choose i} \left(1-\frac{i}{m}\right)^n (-1)^i

Clearly, when the number of rolls (n) is less than the number of faces (m) on the die the probability is 0. Think about the sum above, it is not at all obvious that the sum is zero for n<m. In fact, let's rewrite the sum as

\displaystyle p(m,n) = (-1)^m\sum_{i=0}^m {m \choose m-i} \left(\frac{m-i}{m}\right)^n (-1)^{m-i}

and so by a change of variables

\displaystyle p(m,n) = \frac{(-1)^m}{m^n}\sum_{i=0}^m {m \choose i} i^n (-1)^i := \frac{(-1)^m}{m^n}q(m,n)

and ignoring the scaling factor, we have just discovered the remarkable combinatorial result

Theorem:

For all n=0,1,2,\ldots,m-1,

    \displaystyle q(m,n) = \sum_{j=0}^m {m \choose j} j^n (-1)^j = 0

I have only seen a special case of this result when n=0 which can be stated in words as the alternating sum of elements in any row of a Pascal’s triangle is 0. (By definition, 0^0=1).

The result is too elementary to be a new result, but I haven’t been able to find it anywhere on the web. Has anyone seen it? I found a similar result on the Wikipedia page on binomial coefficients, which says that

\sum_{j=0}^n (-1)^j\tbinom n j j^{(k)} = 0 for all k=0,1,2,\ldots,n-1

where I presume that j^{(k)} denotes the falling factorial, i.e. j^{(k)} = j(j-1)\ldots(j-k+1)

In fact, even though the probabilistic proof is perfectly fine (and way too elegant), we should give an analytic proof to be on the safe side. The analytic proof also has a useful byproduct. So here goes:

Proof: First note that -1 = e^{i \pi} and so we can write

\displaystyle q(m,n) = \sum_{j=0}^m \left. {m \choose j} j^n e^{\lambda j}  \right|_{\lambda=i \pi}

= \displaystyle \sum_{j=0}^m \left. \frac{d^n}{d\lambda^n}{m \choose j} e^{\lambda j}  \right|_{\lambda=i \pi}

= \displaystyle \left. \frac{d^n}{d\lambda^n}\left(1 + e^{\lambda}\right)^m  \right|_{\lambda=i \pi}

Note that for n=0,1,2,\ldots,m-1, each term in the the n -th derivative has the factor (1+e^\lambda), which evaluates to 0 when we plug in \lambda = i \pi. This proves the result \square.

Note that the above expression gives the exact value of q(m,m) since the only term that doesn’t evaluate to 0 is m!(-1)^m and so

\displaystyle p(m,m) = \frac{m!}{m^m}

We can check our work using a probabilistic calculation, every one of the m people needs to have a different birthday so \displaystyle p(m,m) = \left(1-\frac{1}{m}\right)\left(1-\frac{2}{m}\right)\ldots\left(1-\frac{m-1}{m}\right) which is clearly the same as what we found above.

Since p(m,n) is clearly monotonic with respect to n (for a fixed m), p(m,m) is the smallest non-zero value over non-negative integers and asymptotically p(m,n) goes to 1 as a function of n.

    Okay, so how many people do I need to invite?

If you have been paying attention, we haven’t yet answered the original question about how many people you need to invite, so that with probability \frac{1}{2}, each day is someone’s birthday. That amounts to solving the equation

\displaystyle p(m,n) = \sum_{j=0}^m {m \choose j} \left(1-\frac{j}{m}\right)^n (-1)^j

for n. Since we don’t know how to do that, for sufficiently large m, we may use the approximation

\displaystyle \left(1-\frac{j}{m}\right)^n = \left(\left(1-\frac{j}{m}\right)^m\right)^{\frac{n}{m}} \approx \displaystyle e^{-\frac{jn}{m}}

and so

\displaystyle p(m,n) \approx \sum_{j=0}^m {m \choose j} e^{-\frac{jn}{m}} (-1)^j = \sum_{j=0}^m {m \choose j} \left(-e^{-n/m}\right)^j = \left(1-e^{-n/m}\right)^m

which implies that

n \approx -m \ln \left(1- p^{1/m}\right)

For instance when m =365 and p=\frac{1}{2}, the above expression tells us that n \approx 2287.59.

To check how good this approximation is, we calculate p(365,2286) = 0.499414 and p(365,2287) = 0.500371. So we need to invite 2287 people to ensure that with probability \frac{1}{2} each day is a birthday.

It turned out to be a pretty good approximation for this value of p. In fact, when p is closer to 0 or 1, a little thought shows that the approximation is even better!

    Some more numerology

(All of which has been performed using the exponential approximation and a computer software):

1) The probability that there is a day which is not a birthday for anyone alive today is \approx p(365, 6.5 \times 10^9) = 1.7 \times 10^{-7734009}.

2) For a city of size 1 million, this probability is 5 \times 10^{-1188}.

3) For a largish university of size 10,000, it is roughly 4.6 \times 10^{-10}. Compare this to 2,287 that we needed to cross 1/2 !

4) Once a certain size threshold is reached it becomes rapidly certain that every birthday will be taken. For instance, the number of people needed to have a probability 0.1 of all days being birthdays is n(0.1)=1850, while n(0.9) = 2975. Similarly, n(0.01)=1598 and n(0.99)=3832.

    Two and a half men

Why invite only integer number of people? One might wonder what does the function q(m,n) do for a fixed large m and n \in \mathbb{R}^+.

The question is what does the function

Q(x) := q(m,\cdot)(x) : \mathbb{R}^+ \rightarrow \mathbb{R} look like?

    Tame to wild in 60 seconds

It’s perhaps safe to say that the function is monotonic and boring (proof?) for x>m. What about x<m? We know that the integers (less than m) are zeros of the function Q. Which are the other zeros? How many zeros are there? Can we set bounds on the values of Q? Bounds on its derivative?

I will write more in a subsequent post about some of these fascinating questions. Incidentally, as I play around with the sum q(m,n) both MATLAB and Mathematica are giving me very misleading results due to (I presume) floating-point errors.

→ 2 CommentsCategories: mathematics · probability
Tagged: , , , , ,

The other birthday problem (Part I – Party every day)

June 9, 2009 · 1 Comment

Everyone has heard about the birthday problem which is to find the minimum number of people so that with probability \frac{1}{2}, two of them share a birthday. So I will talk about its less popular, but no less entertaining cousin.

The other birthday problem asks for the minimum number of people required so that with probability \frac{1}{2} each day is someone’s birthday.

To state it a little more generally:

Problem How many times should we roll an m sided fair die to get all m faces at least once with probability p?

Or to invert it: What is the probability p(m,n) that every face on an m sided fair die has appeared at least once after n rolls? (m birthdays, n people)

Solution: Suppose I roll the die n times. The probability p that every face shows up at least once is the complement of the probability that at least one face never shows up.

The probability that 3 (for instance) never shows up is \displaystyle \left(1-\frac{1}{m}\right)^n. But there is nothing special about 3, so the probability that any one of the sides doesn’t show up should be roughly \displaystyle m\left(1-\frac{1}{m}\right)^n. This actually gives a pretty good approximation to the answer i.e. p(m,n) \approx 1 - m\left(1-\frac{1}{m}\right)^n (and in fact can be easily inverted for n).

However this answer is not exact because we are overcounting. For instance, the probability that 3 doesn’t show up and 7 doesn’t show up are not mutually exclusive (both may not show up!), so we need to subtract this probability. Since there {m \choose 2} pairs each of which is equally likely to not show up, this probability is \displaystyle {m \choose 2}\left(1-\frac{2}{m}\right)^n.

But when we subtract, we end up taking away too much and we need to add back the probability that some three faces do not show up and so on. Doing this recursively amounts to using the Inclusion-Exclusion Principle, and we end up with an exact answer.

\displaystyle p(m,n) = \sum_{i=0}^m {m \choose i} \left(1-\frac{i}{m}\right)^n (-1)^i

Thereby we declare the problem solved (at least the inverse problem). But the fun really begins here! Coming up – in the next post – the above expression will take us to some surprising places.

→ 1 CommentCategories: mathematics · probability

Goldbach-Euler theorem

June 7, 2009 · Leave a Comment

The previous post asked about the sum of a certain series.

Problem What is \displaystyle \sum_{k=2}^\infty \sum_{n=2}^\infty \frac{1}{k^n}?
Solution The summation with respect to n is a geometric series which sums to \displaystyle \frac{1}{k(k-1)} = \frac{1}{k-1}-\frac{1}{k}. When summed over k, this telescopes to 1 \square.

Problem If S = \{a^n : a,n \in \mathbf{N}, a,n \geq 2\}, then what is \displaystyle \sum_{k \in S} \frac{1}{k-1}?

Solution Let T = \mathbf{N} \setminus \{1\} \setminus S, i.e. T is the set of integers greater than or equal to 2 that are not non-trivial integer powers of other integers.

Note the following:
(1) \displaystyle \mathbf{N}\setminus \{1\} =  \dot{\bigcup_{m \geq 1}} T^m.

(2) \displaystyle S = \dot{\bigcup_{n \geq 2}} T^n.

So the sum

\displaystyle \sum_{k \in S} \frac{1}{k-1} = \sum_{n=2}^\infty \sum_{j \in T}\frac{1}{j^n-1}

\displaystyle = \sum_{n=2}^\infty  \sum_{j \in T} \sum_{m=1}^\infty\left(\frac{1}{j^n}\right)^m = \sum_{n=2}^\infty  \sum_{m=1}^\infty \sum_{j \in T} \frac{1}{j^{mn}}

\displaystyle = \sum_{n=2}^\infty \sum_{k=2}^\infty \frac{1}{k^n}=1.

We used (2), then the geometric series formula, a slight rearrangement followed by (1) and finally the result of the previous problem \square.

This result is not contained in any of the several hundreds of extant letters of correspondence between Euler and Goldbach. But the result has been attributed to Goldbach by Euler.

→ Leave a CommentCategories: history of math · mathematics

Reciprocal sum of powers

June 2, 2009 · 1 Comment

Here’s a completely elementary series summation. What is the sum of reciprocals of all integers that can be written as nontrivial powers of other integers? i.e.

Problem What is \displaystyle \sum_{k=2}^\infty \sum_{n=2}^\infty \frac{1}{k^n}?

If you haven’t seen it before, you will like working it out.

The above sum has repeats, for example 2^4 = 4^2 =16. If we disallow repeats and subtract one from each integer, then again the exact sum can be written down.

Problem If S = \{a^n : a,n \in \mathbf{N}, a,n \geq 2\}, then what is \displaystyle \sum_{k \in S} \frac{1}{k-1}?

This one uses the result from the previous problem but is not completely straightforward. Enjoy!

Solutions coming up soon! Feel free to write solutions in comments or email me.

→ 1 CommentCategories: mathematics

Are you game? (Part VI – Epilogue)

June 1, 2009 · 1 Comment

In the previous post I left the proofs for finding expected time as exercises.

Let’s just go over one easy but interesting case. Suppose player 1 wins with probability \frac{1}{2}, player 2 with probability \frac{1}{3} and in general player i with probability \frac{1}{i+1}. Then the probability that the game hasn’t ended when it is player i’s turn is

\displaystyle \left(1-\frac{1}{2}\right)\left(1-\frac{1}{3}\right)\left(1-\frac{1}{4}\right)\ldots\left(1-\frac{1}{i}\right) = \frac{1}{2}\frac{2}{3}\frac{3}{4}\ldots\frac{i-1}{i} = \frac{1}{i} which goes to 0 as i goes to \infty. So the game ends almost surely and the probability that the game ends with player i is \displaystyle \frac{1}{i} \frac{1}{i+1}. The expected time of completion is \displaystyle\sum_{i=1}^\infty i \frac{1}{i} \frac{1}{i+1} = \sum_{i=1}^\infty  \frac{1}{i+1} = \infty.

→ 1 CommentCategories: mathematics · probability

Are you game? (Part V – Game over)

May 31, 2009 · 3 Comments

Recall that a game consists of a countable sequence of events that occur at the discrete times 1,2,3, \ldots. A game-ender occurs at time i with probability p_i where 0 \leq p_i \leq 1 where not all p_i are 0.

1) Then the the game ends almost surely if and only if \displaystyle \alpha = \prod_{i=1}^\infty (1-p_i) = 0 i.e. if and only if \displaystyle \prod_{i=1}^\infty (1-p_i) diverges (to 0).

2) The expected time of completion of the game is given by \displaystyle E = \sum_{j=1}^\infty j \alpha_{j-1}p_j where \displaystyle \alpha_j = \prod_{i=1}^j (1-p_i).

Suppose that \displaystyle p_i = \frac{1}{(i+1)^k}. Then you can show with a little bit of work the following :

1) If k>1, then the game does not end with probability 1 (\alpha > 0).

2) If 0 \leq k < 1, then the game ends with probability 1 (\alpha = 0) and the expected time of completion is finite.

3) If k=1, then the game ends almost surely, but the expected time of completion is \infty.

The thing I find interesting is that we calculated the expected time of completion of the game where the sequence p_i is periodic with period n to be

\displaystyle E  = \frac{\sum_{j=1}^n jp_j\alpha_{j-1} + n \alpha_n}{1-\alpha_n}

We should be able to let n \rightarrow \infty and recover the expression for expected time that we obtained earlier. But if we do so, then the two expressions give the same result for the finite expected time case if and only if n \alpha_n \rightarrow 0 as n \rightarrow \infty.

This leads me to the following conjecture:

    n \alpha_n \rightarrow 0 if and only if E < \infty that is iff \displaystyle \sum_{j=1}^\infty j \alpha_{j-1}p_j < \infty

Note: The conjectural leap is that if n \alpha_n \rightarrow 0 then the sum converges. The converse is more or less clear from the previous argument.

I haven’t (yet) tried to prove the conjecture. It is possible that either 1) it is false or 2) it is true for some trivial reason. I would appreciate feedback.

→ 3 CommentsCategories: mathematics · probability