Attack on GCM in TLS August 18, 2016

An attack on TLS with AES-GCM was recently presented at BlackHat (slides, paper). The attack is relatively simple, so I thought it’d be a good idea for a first post to explain it.

AES-GCM is a block cipher introduced in TLS 1.2, the latest TLS standard. It was proposed after repeated discoveries of attacks on TLS. In light of that, researchers argued that a particularly weak link is in combining encryption and authentication. They thus set out to combine both of them in a single protocol; such protocols are commonly known by the acronym AEAD, for Authenticated Encryption with Associated Data. An AEAD encrypts only part of the data to be sent; however, all of it is authenticated.

As an AEAD, AES-GCM became a popular choice. Currently it is the most common one. It is not without weaknesses, however. During the standardization process, Antoine Joux pointed out that GCM is vulnerable to an attack if IVs are reused. This is the attack used in the BlackHat paper.

GCM

GCM follows the so-called encrypt-then-MAC paradigm: encrypt the plaintext with a block cipher, then use a MAC to authenticate the ciphertext along with the associated data, then send both the ciphertext and the authentication tag.

Say the plaintext is divided into $n$ blocks: $P_1,\dots,P_n$, the associated data is divided into $m$ blocks: $A_1,\dots,A_m$, and let $K$ be the secret key. GCM proceeds as follows:

compute $J_i := J(\text{IV}, i)$ for $i \in {0,\dots,n}$, where $J$ is a counter function specified by the protocol, and IV is a value chosen by the application.
compute $C_i = \text{Enc}_K(J_i) \oplus P_i$ for $i \in {1,\dots,n}$, where $\text{Enc}_K$ is the encryption function for a symmetric-key algorithm (here, AES), used with key $K$.
let $H=\text{Enc}(0^{128})$.
compute $X_0=0$, $X_i = G_H(X_{i−1} \oplus A_i)$ for $i \in {1,\dots,m}$, where $G_H$ is defined as multiplication by $H$ in the Galois field $\text{GF}(2^{128})$.
compute $X_{i+m} = G_H(X_{i+m−1} \oplus C_i)$ for $i \in {1,\dots,n}$.
compute the authentication tag $T = G_H(X_{m+n} \oplus L) \oplus S$ where $L=\text{len}(A) | \text{len}(C)$ and $S = \text{Enc}_K(J_0)$.

The attack

The attack is possible when an IV is reused, so that two authentication tags are computed using the same value $S$.

First, note that $T$ is given by computing the following polynomial in $\text{GF}(2^{128})$:

$$g(X) = A_1 X^{m+n+1} + \dots + A_m X^{n+2} + C_1 X^{n+1} + \dots + C_n X^2 + L X + S$$

at the point $X = H$.

Now assume that an IV is reused to compute two tags $T_M,T_{M'}$ for two messages $M, M'$. We can write $T_M = g_M(H), T_{M'} = g_{M'}(H)$ with $g_m(X) = f_m(X) + L_m X + S$, where $f_m(X)$ is a polynomial on both the ciphertext $c$ and associated data $a$ and $L_m=\text{len}(a) | len(c)$. Note that the attacker can see the ciphertext and associated data, so she knows $f_M,f_{M′},L_M,L_{M′}$. It does not know $S$, but $S$ does *not* change with the message.

Note also that by computing $g_M(X) + g_{M'}(X)$, we get a polynomial that depends only on values known to the attacker (since the field has characteristic 2, $S + S = 0$). Now let $g'(X) = g_M(X) + g_{M'}(X) + T_M +T_{M'}$. This polynomial is also known to the attacker, and furthermore $g'(H) = 0$. The attacker can thus factor $g'(X)$ and try to guess $H$. In principle $g'(X)$ could have a large number of roots, since its degree is the number of blocks, but in practice it is usually small.

Once $H$ is known, it is easy to forge messages. This makes it possible to do a man-in-the-middle attack on TLS. The attacker simply relays messages back and forth, recording the IVs used. Once an IV is reused, she performs the computation described above to learn $H$.

The attacker can then redirect the client to a static page on the server. She can now easily replace the server’s response by its own messages, as follows. Since the contents of the page are known, she simply XORs the content with the server’s response and XORs the resulting string with her own message. Due to the way GCM is constructed, this will be a valid ciphertext. She then sends that along with an authentication tag (which she can generate on her own, now that she knows $H$) to the client.

The attack in the real world

As remarked by Joux, IVs should never be reused, so the fix is easy from the theoretical point of view: just don’t reuse IVs! Nevertheless, the authors have found a number of vulnerable devices. They note that most of their attempts to contact the owners—some of which were pretty big companies, like VISA—were unsuccessful. This feels frustrating to me—does VISA not have people working on Internet security?—, but for them it’s probably just a matter of cost.