Scribed by Madhur Tulsian
Summary
Today we show how to construct a pseudorandom function from a pseudorandom generator.
1. Pseudorandom generators evaluated on independent seeds
We first prove a simple lemma which we will need. This lemma simply says that if is a pseudorandom generator with output length , then if we evaluate on independent seeds the resulting function is still a pseudorandom generator with output length .
Lemma 1 (Generator Evaluated on Independent Seeds) Let be a pseudorandom generator running in time . Fix a parameter , and define as
Then is a pseudorandom generator.
Proof: We will show that if for some , is not a psedorandom generator, then cannot be a pseudorandom generator.
The proof is by a hybrid argument. If is not a pseudorandom generator, then there exists an algorithm of complexity at most , which distinguishes the output of on a random seed, from a truly random string of bits i.e.
We can then define the hybrid distributions , where in we relplace the first outputs of the pseudorandom generator by truly random strings.
As before, the above statement which says would imply that there exists an between 0 and such that
We can now define an algorithm which violates the pseudorandom property of the generator . Given an input , generates random strings , , and outputs . It then follows that
Hence, distinguishes the output of on a random seed from a truly random string , with probability at least . Also, the complexity of is at most , where the term corresponds to generating the random strings and the terms corresponds to evaluating on at most random seeds.
2. Construction of Pseudorandom Functions
We now describe the construction of a pseudorandom function from a pseudorandom generator. Let be a length-doubling pseudorandom generator. Define such that equals the first bits of , and define such that equals the last bits of .
The the GGM pseudorandom function based on is defined as follows: for key and input :
The evaluation of the function can be visualized by considering a binary tree of depth , with a copy of the generator at each node. The root receives the input and passes the outputs and to its two children. Each node of the tree, receiving an input , produces the outputs and which are passed to its children if its not a leaf. The input to the function , then selects a path in this tree from the root to a leaf, and produces the output given by the leaf.
We will prove that if is a pseudorandom generator running in time , then is a secure pseudorandom function.
2.1. Considering a tree of small depth
We will first consider a slightly simpler situation which illustrates the main idea. We prove that if is pseudorandom and runs in time , then the concatenated output of all the leaves in a tree with levels, is pseudorandom. The result is only meaninful when is much smaller than .
Theorem 2 Suppose is a pseudorandom generator and is computable in time . Fix a constant and define as Then defined as
is a pseudorandom generator.
Proof: The proof is again by a hybrid argument. The hybrids we consider are easier to describe in terms of the tree with nodes as copies of . We take to be the distribution of outputs at the leaves, when the input to the nodes at depth is replaced by truly random bits, ignoring the nodes at depth . Hence, is simply distributed as for a random i.e. only the input to the root is random. Also, in we replace the outputs at depth by truly random strings. Hence, is simply distributed as a random string of length . The figure below shows the hybrids for the case , with red color indicating true randomness.
We will prove that is not a secure pseudorandom generator, then is not secure. If we assume that there is an algorithm of complexity such that
then we get that there is an such that .
We now consider again the difference between and . In the bits which are the inputs to the nodes at depth are replaced by random bits. These are then used to generate bits which serve as inputs to nodes at depth . In , the inputs to nodes at depth are random.
Let denote the function which given bits, treats them as inputs to the nodes at depth and evaluates the output at the leaves in the tree for . If , then is distributed as . Also, if , then is distributed as .
Hence, can be used to create a distinguisher which distinguishes evaluated on independent seeds, from random strings of length . In particular, for , we take . This gives
Hence, distinguishes from a random string. Also, has complexity . However, by Lemma 1, if is not secure then is not secure. Since , this completes the proof.
2.2. Proving the security of the GGM construction
Recall that the GGM function is defined as
We will prove that
Theorem 3 If is a pseudorandom generator and is computable in time , then is a secure pseudorandom function.
Proof: As before, we assume that is not a secure pseudorandom function, and will show that this implies is not a pseudorandom generator. The assumption that is not secure, gives that there is an algorithm of complexity at most which distinguishes on a random seed from a random function , i.e.
We consider hybrids as in the proof of Theorem 2. is the distribution of for and is the uniform distribution over all functions from to . As before, there exists such that
However, now we can no longer use to construct a distinguisher for as in Theorem 2 since may now be as large as . The important observation is that since has complexity , it can make at most queries to the function it is given as an oracle. Since the (at most) queries made by will be paths in the tree from the root to the leaves, they can contain at most nodes at depth . Hence, to simulate the behavior of , we only need to generate the value of a function distributed according to or at inputs.
We will use this to contruct an algorithm which distinguishes the output of on independent seeds from random strings of length . takes as input a string of length , which we treat as pairs with each being of length . When queried on an input , will pick a pair according to the first bits of (i.e. choose the randomness for the node at depth which lies on the path), and then choose . In particular, works as below:
- Start with counter .
- Simulate . When given a query
- Check if a pair has already been chosen from the first pairs.
- If not, set and set .
- Answer the query made by as .
- Return the final output given by .
Then, if all pairs are random strings of length , the answers received by are as given by a oracle function distributed according to . Hence,
Similarly, if the pairs are outputs of the pseudorandom generator on independent seeds , then the view of is the same as in the case with an oracle function distributed according to . This gives
Hence, distinguishes the output of from a random string with probability . Also, it runs in time . Then Lemma 1 gives that is not secure.