CS276 Lecture 14: Pseudorandom Functions from Pseudorandom Generators

Scribed by Madhur Tulsian

Summary

Today we show how to construct a pseudorandom function from a pseudorandom generator.

1. Pseudorandom generators evaluated on independent seeds

We first prove a simple lemma which we will need. This lemma simply says that if {G} is a pseudorandom generator with output length {m} , then if we evaluate {G} on {k} independent seeds the resulting function is still a pseudorandom generator with output length {km} .

Lemma 1 (Generator Evaluated on Independent Seeds) Let ${G:\{ 0,1 \}^n \rightarrow \{ 0,1 \}^m}$ be a ${(t,\epsilon)}$ pseudorandom generator running in time ${t_g}$ . Fix a parameter , and define ${G^k : \{ 0,1 \}^{kn} \rightarrow \{ 0,1 \}^{km}}$ as

$\displaystyle G^k(x_1,\ldots,x_k) := G(x_1),G(x_2),\ldots,G(x_k)$

Then ${G^k}$ is a ${(t-O(km + kt_g),k\epsilon)}$ pseudorandom generator.

Proof: We will show that if for some ${(t,\epsilon)}$ , ${G^k}$ is not a ${(t,\epsilon)}$ psedorandom generator, then {G} cannot be a ${(t+O(km + kt_g),\epsilon/k)}$ pseudorandom generator.

The proof is by a hybrid argument. If ${G^k}$ is not a ${(t, \epsilon)}$ pseudorandom generator, then there exists an algorithm {D} of complexity at most {t} , which distinguishes the output of ${G^k}$ on a random seed, from a truly random string of {km} bits i.e.

$\displaystyle \left\lvert\mathop{\mathbb P}_{x_1 , \ldots, x_k}\left[D(G(x_1), \ldots, G(x_k)) = 1\right] - \mathop{\mathbb P}_{r_1, \ldots, r_k}\left[D(r_1, \ldots, r_k) = 1\right] \right\rvert ~>~ \epsilon$

We can then define the hybrid distributions ${H_0, \ldots, H_k}$ , where in ${H_i}$ we relplace the first {i} outputs of the pseudorandom generator {G} by truly random strings.

$\displaystyle H_i = (r_1, \ldots, r_i, G(x_{i+1}), \ldots, G(x_n))$

As before, the above statement which says ${\left\lvert\mathop{\mathbb P}_{z \sim H_0} [D(z) = 1] - \mathop{\mathbb P}_{z \sim H_k}[D(z) = 1]\right\rvert > \epsilon}$ would imply that there exists an {i} between 0 and {k-1} such that

$\displaystyle \left\lvert\mathop{\mathbb P}_{z \sim H_i} [D(z) = 1] - \mathop{\mathbb P}_{z \sim H_{i+1}}[D(z) = 1]\right\rvert > \epsilon/k$

We can now define an algorithm {D'} which violates the pseudorandom property of the generator {G} . Given an input ${y \in \{ 0,1 \}^m}$ , {D'} generates random strings ${r_ 1, \ldots, r_i \in \{ 0,1 \}^m}$ , ${x_{i+2}, \ldots, x_{k} \in \{ 0,1 \}^n}$ , and outputs ${D(r_1, \ldots, r_i, y, G(x_{i+2}), \ldots, G(x_{k}))}$ . It then follows that

$\displaystyle \mathop{\mathbb P}_{x \sim \{ 0,1 \}^n} [D'(G(x)) = 1] ~= \mathop{\mathbb P}_{z \sim H_i}[D(z) = 1] ~~\text{and}~ \mathop{\mathbb P}_{r \sim \{ 0,1 \}^m} [D'(r) = 1] ~= \mathop{\mathbb P}_{z \sim H_{i+1}}[D(z) = 1]$

Hence, {D'} distinguishes the output of {G} on a random seed {x} from a truly random string {r} , with probability at least ${\epsilon/k}$ . Also, the complexity of {D'} is at most ${t + O(km) + O(kt_g)}$ , where the {O(km)} term corresponds to generating the random strings and the ${O(kt_g)}$ terms corresponds to evaluating {G} on at most {k} random seeds. $\Box$

2. Construction of Pseudorandom Functions

We now describe the construction of a pseudorandom function from a pseudorandom generator. Let ${G:\{ 0,1 \}^n \rightarrow \{ 0,1 \}^{2n}}$ be a length-doubling pseudorandom generator. Define ${G_0 : \{ 0,1 \}^n \rightarrow \{ 0,1 \}^n}$ such that ${G_0(x)}$ equals the first {n} bits of {G(x)} , and define ${G_1:\{ 0,1 \}^n \rightarrow \{ 0,1 \}^n}$ such that ${G_1(x)}$ equals the last {n} bits of {G(x)} .

The the GGM pseudorandom function based on {G} is defined as follows: for key ${K\in \{ 0,1 \}^n}$ and input ${x\in \{ 0,1 \}^n}$ :

$\displaystyle F_K (x) := G_{x_n} (G_{x_{n-1}} ( \cdots G_{x_2} ( G_{x_1} (K) ) \cdots )) \ \ \ \ \ (1)$

The evaluation of the function {F} can be visualized by considering a binary tree of depth {n} , with a copy of the generator {G} at each node. The root receives the input {K} and passes the outputs ${G_0(K)}$ and ${G_1(K)}$ to its two children. Each node of the tree, receiving an input {z} , produces the outputs ${G_0(z)}$ and ${G_1(z)}$ which are passed to its children if its not a leaf. The input {x} to the function ${F_K}$ , then selects a path in this tree from the root to a leaf, and produces the output given by the leaf.

We will prove that if ${G:\{ 0,1 \}^n \rightarrow \{ 0,1 \}^{2n}}$ is a ${(t,\epsilon)}$ pseudorandom generator running in time ${t_g}$ , then {F} is a ${(t/O(n\cdot t_g), \epsilon \cdot nt)}$ secure pseudorandom function.

2.1. Considering a tree of small depth

We will first consider a slightly simpler situation which illustrates the main idea. We prove that if {G} is ${(t,\epsilon)}$ pseudorandom and runs in time ${t_g}$ , then the concatenated output of all the leaves in a tree with {l} levels, is ${(t-O(2^l \cdot t_g), l2^l \cdot\epsilon)}$ pseudorandom. The result is only meaninful when {l} is much smaller than {n} .

Theorem 2 Suppose ${G: \{ 0,1 \}^n \rightarrow \{ 0,1 \}^{2n}}$ is a ${(t,\epsilon)}$ pseudorandom generator and is computable in time ${t_g}$ . Fix a constant and define ${F_K:\{ 0,1 \}^l \rightarrow \{ 0,1 \}^n}$ as ${F_K (y) := G_{y_l} (G_{y_{l-1}} ( \cdots G_{y_2} ( G_{y_1} (K) ) \cdots ))}$ Then ${\overline{G}:\{ 0,1 \}^n \rightarrow \{ 0,1 \}^{2^l\cdot n}}$ defined as
$\displaystyle \overline{G}(K) := (F_K(0^l), F_K(0^{l-1}1), \ldots, F_K(1^l))$

is a ${(t - O(2^{l}\cdot t_g), l \cdot 2^l \cdot \epsilon)}$ pseudorandom generator.

Proof: The proof is again by a hybrid argument. The hybrids we consider are easier to describe in terms of the tree with nodes as copies of {G} . We take ${H_i}$ to be the distribution of outputs at the leaves, when the input to the nodes at depth {i} is replaced by truly random bits, ignoring the nodes at depth {i-1} . Hence, ${H_0}$ is simply distributed as ${\overline{G}(K)}$ for a random {K} i.e. only the input to the root is random. Also, in ${H_l}$ we replace the outputs at depth {l-1} by truly random strings. Hence, ${H_l}$ is simply distributed as a random string of length ${n \cdot 2^l}$ . The figure below shows the hybrids for the case {l=2} , with red color indicating true randomness.

We will prove that ${\overline{G}}$ is not a ${(t, \epsilon)}$ secure pseudorandom generator, then {G} is not ${(t + O(2^{l}\cdot t_g), \epsilon/(l \cdot 2^l))}$ secure. If we assume that there is an algorithm {D} of complexity {t} such that

$\displaystyle \left\lvert\mathop{\mathbb P}_{x \sim \{ 0,1 \}^n}[D(\overline{G}(x)) = 1] - \mathop{\mathbb P}_{r \sim \{ 0,1 \}^{2^l\cdot n}}[D(r) = 1]\right\rvert ~>~ \epsilon$

then we get that there is an {i} such that ${\left\lvert\mathop{\mathbb P}_{z \sim H_i}[D(z) = 1] - \mathop{\mathbb P}_{z \sim H_{i+1}}[D(z) = 1]\right\rvert ~>~ \epsilon/l}$ .

We now consider again the difference between ${H_i}$ and ${H_{i+1}}$ . In ${H_i}$ the ${2^i \cdot n}$ bits which are the inputs to the nodes at depth {i} are replaced by random bits. These are then used to generate ${2^{i+1}\cdot n}$ bits which serve as inputs to nodes at depth {i+1} . In ${H_{i+1}}$ , the inputs to nodes at depth {i+1} are random.

Let ${\overline{G}_{i+1}:\{ 0,1 \}^{2^{i+1}\cdot n} \rightarrow \{ 0,1 \}^{2^l \cdot n}}$ denote the function which given ${2^{i+1} \cdot n}$ bits, treats them as inputs to the nodes at depth {i+1} and evaluates the output at the leaves in the tree for ${\overline{G}}$ . If ${r_1, \ldots, r_{2^i} \sim \{ 0,1 \}^{2n}}$ , then ${\overline{G}_{i+1}(r_1, \ldots, r_{2^i})}$ is distributed as ${H_{i+1}}$ . Also, if ${x_1, \ldots, x_{2^i} \sim \{ 0,1 \}^{n}}$ , then ${\overline{G}_{i+1}(G(x_1), \ldots, G(x_{2^i}))}$ is distributed as ${H_{i}}$ .

Hence, {D} can be used to create a distinguisher {D'} which distinguishes {G} evaluated on ${2^i}$ independent seeds, from ${2^i}$ random strings of length {2n} . In particular, for ${z \in \{ 0,1 \}^{2^{i+1}\cdot n}}$ , we take ${D'(z) = D(G_{i+1}(z))}$ . This gives

$\displaystyle \left\lvert \mathop{\mathbb P}_{x_1, \ldots, x_{2^i}}[D'(G(x_1), \ldots, G(x_{2^i})) = 1] - \mathop{\mathbb P}_{r_1, \ldots, r_{2^i}}[D'(r_1, \ldots, r_{2^i}) = 1] \right\rvert\ > \epsilon/l$

Hence, {D'} distinguishes ${G^{2^i}(x_1, \ldots, x_{2^i}) = (G(x_1), \ldots, G(x_{2^i}))}$ from a random string. Also, {G'} has complexity ${t + O(2^l \cdot t_g)}$ . However, by Lemma 1, if ${G^{2^i}}$ is not ${(t + O(2^l \cdot t_g),\epsilon/l)}$ secure then {G} is not ${(t + O(2^l \cdot t_g + 2^i \cdot n), \epsilon/(l\cdot 2^i))}$ secure. Since ${i \leq l}$ , this completes the proof. $\Box$

2.2. Proving the security of the GGM construction

Recall that the GGM function is defined as

$\displaystyle F_K (x) := G_{x_n} (G_{x_{n-1}} ( \cdots G_{x_2} ( G_{x_1} (K) ) \cdots )) \ \ \ \ \ (2)$

We will prove that

Theorem 3 If ${G: \{ 0,1 \}^n \rightarrow \{ 0,1 \}^{2n}}$ is a ${(t,\epsilon)}$ pseudorandom generator and is computable in time ${t_g}$ , then is a ${(t/O(nt_g), \epsilon \cdot n \cdot t)}$ secure pseudorandom function.

Proof: As before, we assume that {F} is not a ${(t,\epsilon)}$ secure pseudorandom function, and will show that this implies {G} is not a ${(t\cdot O(nt_g), \epsilon/(n\cdot t))}$ pseudorandom generator. The assumption that {F} is not ${(t,\epsilon)}$ secure, gives that there is an algorithm {A} of complexity at most {t} which distinguishes ${F_K}$ on a random seed {K} from a random function {R} , i.e.

$\displaystyle \left\lvert \mathop{\mathbb P}_{K}\left[A^{F_K(\cdot)} = 1\right] - \mathop{\mathbb P}_R\left[A^{R(\cdot)} = 1\right]\right\rvert > \epsilon$

We consider hybrids ${H_0, \ldots, H_n}$ as in the proof of Theorem 2. ${H_0}$ is the distribution of ${F_K}$ for ${K \sim \{ 0,1 \}^n}$ and ${H_n}$ is the uniform distribution over all functions from ${\{ 0,1 \}^n}$ to ${\{ 0,1 \}^n}$ . As before, there exists {i} such that

$\displaystyle \left\lvert \mathop{\mathbb P}_{h \sim H_i}\left[A^{h(\cdot)} = 1\right] - \mathop{\mathbb P}_{h \sim H_{i+1}}\left[A^{h(\cdot)} = 1\right]\right\rvert > \epsilon/n$

However, now we can no longer use {A} to construct a distinguisher for ${G^{2^i}}$ as in Theorem 2 since {i} may now be as large as {n} . The important observation is that since {A} has complexity {t} , it can make at most {t} queries to the function it is given as an oracle. Since the (at most) {t} queries made by {A} will be paths in the tree from the root to the leaves, they can contain at most {t} nodes at depth {i+1} . Hence, to simulate the behavior of {A} , we only need to generate the value of a function distributed according to ${H_i}$ or ${H_{i+1}}$ at {t} inputs.

We will use this to contruct an algorithm {D} which distinguishes the output of ${G^{t}}$ on {t} independent seeds from {t} random strings of length {2n} . {D} takes as input a string of length {2tn} , which we treat as {t} pairs ${(z_{1,0},z_{1,1}), \ldots, (z_{t,0}, z_{t,1})}$ with each ${z_{i,j}}$ being of length {n} . When queried on an input ${x \in \{ 0,1 \}^n}$ , {D} will pick a pair ${(z_{k,0},z_{k,1})}$ according to the first {i} bits of {x} (i.e. choose the randomness for the node at depth {i} which lies on the path), and then choose ${z_{k,x_{i+1}}}$ . In particular, ${D((z_{1,0},z_{1,1}), \ldots, (z_{t,0}, z_{t,1}))}$ works as below:

Start with counter .
Simulate . When given a query
- Check if a pair ${P(x_1, \ldots, x_i)}$ has already been chosen from the first pairs.
- If not, set ${P(x_1, \ldots, x_{i+1}) = k+1}$ and set .
- Answer the query made by as ${G_{x_n}( \cdots G_{i+2}(z_{P(x_1, \ldots, x_{i+1}),x_{i+1}}) \cdots )}$ .
Return the final output given by .

Then, if all pairs are random strings ${r_1, \ldots, r_t}$ of length {2n} , the answers received by {A} are as given by a oracle function distributed according to ${H_{i+1}}$ . Hence,

$\displaystyle \mathop{\mathbb P}_{r_1,\ldots,r_t}[D(r_1, \ldots, r_t) = 1] = \mathop{\mathbb P}_{h \sim H_{i+1}}\left[A^{h(\cdot)} = 1\right]$

Similarly, if the {t} pairs are outputs of the pseudorandom generator {G} on independent seeds ${x_{1}, \ldots, x_{t} \in \{ 0,1 \}^n}$ , then the view of {A} is the same as in the case with an oracle function distributed according to ${H_i}$ . This gives

$\displaystyle \mathop{\mathbb P}_{x_1,\ldots,x_t}[D(G(x_1), \ldots, G(x_t)) = 1] = \mathop{\mathbb P}_{h \sim H_i}\left[A^{h(\cdot)} = 1\right]$

Hence, {D} distinguishes the output of ${G^t}$ from a random string with probability ${\epsilon/n}$ . Also, it runs in time ${O(t \cdot n \cdot t_g)}$ . Then Lemma 1 gives that {G} is not ${(O(t \cdot n\cdot t_g), \epsilon/(n \cdot t))}$ secure. $\Box$

Latest Images

Trending Articles

Latest Images