# How to approach this algorithm question?

- A website has a database of n questions.
- You click a button and are shown one random question per click. The probability of a particular question showing up at the click event is 1/n.

On average, how many clicks would be required to see all the questions in the database?

What is the approach required for such questions?

## Answers

See coupon collector's problem.

This is a mathematical question rather than an algorithmic one. As sdcvvc said, this is the famous coupon collector's problem.

Suppose you have n questions to "gather". Let X be a random variable denoting the required number of clicks. If we define Xi to be the number of clicks from the moment we have (i-1) questions to the moment we have i questions, then:

X = X1 + X2 + ... + Xn

Due to the linearity of the expected value:

E(X) = E(X1 + X2 + ... + Xn) = EX1 + EX2 + ... + EXn

If we inspect Xi, we see that in fact it has geometric distribution with p=(n-i+1)/n, hence a mean value of n/(n-i+1). Therefore:

EX = n * (1/n + 1/(n-1) + ... + 1/2 + 1/1) = n * Hn

Where Hn is the nth Harmonic number, which can be approximated by ln n:

**EX ~= n * ln n**

You can run a simple simulation and test this approximation.

Why don't we run a simulation and found out?

<?php function simulate($size) { $range = range(1, $size); $hits = 0; $hit = array(); while(count($hit) != $size) { $rand = array_rand($range); $hit[$rand] = 1; $hits++; } return $hits; } for ($j=10; $j<101; $j+=10) { $res = array(); for ($i=0; $i<10; $i++) { $res[] = simulate($j); } echo "for size=$j, avg=" . array_sum($res)/10 . "<br />"; }

Output:

for size=10, avg=35.9 for size=20, avg=68.8 for size=30, avg=123.3 for size=40, avg=176.9 for size=50, avg=205.9 for size=60, avg=276.8 for size=70, avg=304.9 for size=80, avg=401.9 for size=90, avg=371 for size=100, avg=617.7