A lot has been written about procrastination, perhaps as a way for the author to distract themselves from what they really want to write about. So I’ll be brief.

For me it really comes down to a mismatch between feeling productive and actually being productive. I want to write, but first I want to make sure I have the perfect environment for writing: perfect text editor, perfect-looking website, perfect publish process, and so on.

I do enjoy each of those activities and they have some value, but the amount of time I’m willing to spend doing them is insanely disproportionate with the benefit they bring. But doing them makes me feel productive, so I keep at it. In an attempt to remove obstacles from writing, I end up creating obstacles in my mind.

I decided to try out blogging again. This time I’ll try not to spend too much time researching static site generators and reorganizing files. At least not after the initial setup.

I’m going to try Hugo instead of Jekyll, which I used on the old blog. The reason is that Hugo is the new hotness. I could say that it’s because it supports org-mode natively, but I’m actually going with ox-hugo (exporting to Markdown) because it’s apparently more powerful.

The getting started guide on Hugo’s page is simple enough. Unfortunately the theme it recommends seems to strip out all markup, which led me to some confusion: I thought something was wrong with my system, but it’s the theme that’s wrong. I know this because I tried Minimal (which turns out not to be that minimal) and the markup was all there.

I guess I’ll eventually have to write my own theme (or more realistically, tweak a nice-looking, functioning theme). This initial experience suggests that it’ll be a bumpy road.

I could never get into poetry. Whenever I read a poem I wind up only paying attention to form and style and miss whatever the poem is trying to say. I’m sure there is a way to learn how to read poems, but I’ve never managed to find it.

This of course leaves me out of a lot of literature I’d like to know better, like much of Ancient Greece, The Canterbury Tales, Os Lusíadas, and Shakespeare plays (excluding the ones I had the opportunity to see performed). The closest I can generally get to those works is by reading summaries on Wikipedia.

So I was excited when I heard that John Dolan, who I’d known for his straightforward and insightful analysis of war (usually under his nickname The War Nerd), was writing a prose translation of the Iliad. After a few years of waiting I finally had the chance to read it, and I have a few thoughts.

(Disclaimer: prior to reading this I knew very little about the Iliad, Homer, or indeed Greek mythology in general, besides references in modern popular culture. So a lot of what I say might sound obvious, outrageous or just wrong.)

I was familiar with the story of the Trojan War, mostly via the 2004 movie Troy. I remember a common criticism of the movie was that the gods don’t show up at all, whereas they are prevalent in the original text. I imagined the Iliad would have gods do things like God in the Old Testament: bringing a huge flood, causing ten plagues, or sending out a bear to kill children. But in fact Greek gods are not just occasionally doing things to people, they are the main drivers of action. And they are of course extremely human. Indeed they are just like humans, just a lot more powerful and better connected. As Dolan notes in the introduction, they’re more like the Sopranos than the Christian god.

As powerful as Hektor and Akilles are, they always need help from Zeus, Apollo, Athena or Thetis. And the gods are always meddling in human affairs, often for petty reasons that are nevertheless well-established in the text (often disliking a relative or not receiving enough sacrifices). This differentiates the use of gods here from the artifice of deus ex machina that was used in the tragedies of Classical Greece.

I’m convinced that prose it’s the best way to consume this work. If the Iliad really evolved over centuries as oral tradition, as modern scholars conjecture, and was not conceived as poetry by a single person, as Classical scholar assume, then a prose translation is the most true to its origins, given that people tell stories as prose these days.

Eroom’s law, named after Moore’s law spelled backwards, predicts that discovering drugs is becoming more difficult with time; more specifically, the average cost of getting a drug approved by the FDA doubles every 9 years.

If Moore’s law has fueled a large part of the technological revolution of the past 50 years, as well as speculation about what technology still has to bring us, then Eroom’s law reflects a lackluster performance in pharmacology, and predicts a grimmer future for it still.

Why does Eroom’s law hold? One easy explanation that jumps to mind is that of course, discovering things gets harder with time. But that might be incorrect; Moore’s law provides a striking counterexample for that. Fortunately, the paper on Eroom’s law provides a thorough discussion of the possible issues.

I recently listened to the Greater than Code episode with Felienne Hermans. A lot of interesting points are raised during the episode, but the one that really stuck with me was the question of how the way programming is perceived in society can drive people (in particular women) from it. Felienne says that the more people perceive skill as coming from innate ability, the more women are driven away from it.

This is of course related to the idea of fixed vs. growth mindset and the work of Carol Dweck. A person with fixed mindset believes that skills are innate, whereas a growth mindset postulates that people learn skills mainly through effort. I still haven’t read the research (or its criticism, for example Scott Alexander’s several long posts on the subject), but I want to offer my own view of it before I’m inevitably proven wrong by hard numbers.

I believe most people have a mixture of both mindsets, in a specific way: they believe in the growth mindset in general, but when it comes to concrete skills, fixed mindset sets in. The belief in growth mindset is exemplified by inspirational phrases and quotes such as “Genius is 10 percent inspiration and 90 percent transpiration”. These phrases come in particularly handy whenever people see others being a lot more skilled than they are at various tasks.

But where growth mindset serves as a coping mechanism, people think with a fixed mindset when choosing which skills to learn, or, more importantly, which skills not to learn. When I hear non-mathematically inclined people talking about their experience with mathematics, they most commonly say things like “I was never good at math” or “math wasn’t for me”. There’s never any mention of the fact that they at some point made an effort to learn the subject, and that at a later point they chose to stop making that effort.

Mathematics might be an extreme example of fixed mindset domination, but it really shows up whenever people decide to stop learning a skill. Math is just something everyone who went to school had an experience with, with many finding it unpalatable. The fact that the fixed mindset effectively excludes certain skills from people’s radar should be of more concern, especially if you’re (as I am) less interested in measuring how well people learn and more interested in how and why people choose different things to learn.

This ties us back to the discussion on Greater than Code. Regardless of how the effectiveness of teaching is influenced by the two mindsets, it seems clear that in order to teach people technical skills like programming, the growth mindset must be encouraged. It’s embarrassing that it’s never been this easy to start programming, and yet the programmer class is still very much a closed, homogeneous club. As Felienne notes, the way forward is removing the barriers that act as gatekeepers but aren’t really fundamental to programming. She talks a bit about the more algorithmic, CS-y aspects of it, which I agree are massively overrated; I’d also point to the emphasis on tools, which generally get a lot more attention than they deserve.

It’s no coincidence that these two aspects relate back to two other male-dominated fields, namely math and (real) engineering. Fortunately programming allows for a lot more variety than that, and I’m always excited to see what programs people come from neither of these backgrounds are writing. I particularly like the parallels Felienne draws between programming and writing (beyond the obvious ones); certainly we want everyone to be able to write, not just the ones who come from literary backgrounds.

I recently attended a lecture by Bjarne Stroustrup on the future of C++.
As someone who hasn’t written any C++ since the days of the C++98
standard, and so still sees the STL as the main feature of the language,
I was curious to see what has been developed since then. I knew that it
now has lambda expressions and a primitive form of type inference, but
not much beyond that. Also, as someone who was exposed to Stroustrup’s
book in college, I was curious to see him speaking.

To be honest, I was expecting a cranky, old, cult-like leader figure.
But now I don’t have that impression at all. While I still see some
hubris there — when asked about possible design failures, he didn’t
recognize anything major, besides focusing too much on multiple
inheritance —, he seemed to be just a guy who wants to provide the
highest possible level of low (or zero) cost abstractions.

As to the future of C++, I’m actually a bit optimistic. It seems that
the standards committee has recognized the competition (and here I’m
thinking of Rust) and added quite a few nice features to the language.
For instance, concepts (which seem to be similar to Rust traits), error
types and ranges. And it looks like ASIO is finally making it into the
language. It might not seem like much, but for a language that was
originally designed as an extension of C (and thus was more-or-less
doomed from the beginning) and has accumulated a lot of cruft over the
years, it’s understandable that things take a long time.

Stroustrup says that the next challenges for him are concurrency, and
complete type and resource safety. I’m excited to see the solutions the
committee will come up with, although I’ll probably have to wait a few
years for that. After the talk, someone revealed his lack of enthusiasm
about the new standards (which are supposed to arrive every 3 years now,
according to Stroustrup) by commenting that compiler vendors for
embedded devices are very slow at implementing them. Which is a bit of a
downer, especially considering C++’s stated goal of being the fastest
language around. But at least clang and gcc seem to be very eager to
implement the new changes, so there’s no good reason not to try them out
if you have access to these compilers.

A lot of cryptography has been developed based on assumptions that some problem
is computationally hard. For lattice-based cryptography, the hardness assumption
is on that some problem on lattices—usually the problem is finding the shortest
vector in the lattice (SVP).

There’s an obvious caveat in every computational assumption—what if it’s
wrong?—, but often there are good reasons to believe it, at least compared to
other assumptions. Under this lens, lattices look pretty good: there haven’t
been any decent advances in classical algorithms for SVP in decades, whereas one
cannot say the same about factoring or discrete log. (Of course, that might just
be because fewer people care about lattices now.) Perhaps more importantly,
nobody has succeeded at using quantum computers to improve the situation,
compared to a complete break for factoring and discrete log.

Why should anyone expect quantum computers to help? As usual in complexity,
there’s a lot of conjectures everywhere, but some are more plausible than
others, and we can use them to guide us. The first thing is that approximating
SVP (also known as GapSVP) within a polynomial factor is known not to be
NP-hard, unless NP=coNP. Hence it is a candidate for being in the strange class
of problems that are neither in P nor are NP-complete. Other interesting
candidates are graph isomorphism as well as our friends factoring and discrete
log.

Now if we combine the widely-believed conjecture that NP ≠ coNP with the also
widely-believed conjecture that NP ⊄ BQP, then there’s some hope that we can
find a quantum algorithm that solves GapSVP. (If those conjectures are false,
then an algorithm would be a huge breakthrough in complexity theory, which is
considered unlikely).

The hidden subgroup problem (HSP)

Another point is that SVP is connected to the hidden subgroup problem, which
has been the source of a few interesting positive results in quantum complexity
theory.

Let \(G\) be a finite group and \(H \leq G\). Assume we have oracle access to a
function \(f \colon G \to \{0,1\}^* \) which is promised to be constant on each
coset of \(H\) (we say that \(f\) *hides* \(H\)). HSP asks us to find \(H\).

HSP is perhaps better seen as a framework. Some problems for which quantum
computers seem to help can be seen as instances of it. For example, consider
Simon’s problem, where we’re given oracle access to a function \(f \colon
\{0,1\}^n \to \{0,1\}^n\) which we are promised to be either 1-to-1 or 2-to-1,
and we have to find out which is the case. Note that this is HSP over
\(\mathbb{Z}_2^n\).

It has been shown that if you only have a classical computer, you need an
exponential number of queries to \(f\), whereas with a quantum computer, you only
need a linear number of queries, plus a polynomial-time computation. In other
words, Simon’s problem is a case of a separation between classical and quantum
query complexity.

More famously, Shor’s algorithm for factoring is very similar to the algorithm
to solve Simon’s problem. This is because factoring can be seen to reduce to the
problem of period finding, where we are given oracle access to \(f \colon
\mathbb{Z} \to \mathbb{Z}\) and we have to find its period. Here we can also see
the problem as finding a hidden subgroup, with the caveat that since
\(\mathbb{Z}\) is not finite, the problem doesn’t quite fit in HSP. In any case,
this is another case where quantum computers since to help, although this
doesn’t prove a separation since we don’t know whether factoring is not in P.

As mentioned, the relevance of HSP for lattices is that GapSVP reduces to the
dihedral coset problem, which is essentially HSP over the dihedral group \(D_n\).
(I say “essentially” because the problem also asks that the solution be done by
the “standard method” for HSP: namely, sampling cosets.) Another interesting
reduction is from graph isomorphism to HSP over the symmetric group \(S_n\). As
noted, graph isomorphism is unlikely to be NP-hard, so it might be that these
specific instances of the HSP are not without reach, at least from the quantum
point of view.

LWE and quantum reductions

Nevertheless, just like in the classical case, there hasn’t been a lot of
progress in quantum algorithms for SVP. In light of that, some people have used
the quantum hardness of the problem as an assumption. This enables a reduction
from LWE to GapSVP.

The reduction, as shown
by Regev, is roughhly as
follows. Assume we have an oracle for the LWE problem
on with error distribution \(\mathcal{N}(0, n)\). The first point is that it is
sufficient to solve the discrete Gaussian sampling (DGS) problem: given a
lattice \(L\) and \(r > 0\), sample a point from \(D_{L,r}\), the Gaussian
distribution on \(r\), where each point is distributed according to a Gaussian
distribution with variance proportional to \(r\). Here, \(r\) is chosen large enough
that \(D_{L,r}\) behaves somewhat like a continuous Gaussian distribution, but not
too large.

The algorithm proceeds iteratively. We start with a set of \(\text{poly}(n)\)
samples from \(D_{L,R}\), where \(R\) is large (say \(R = r 2^n\)), and end with a set
of \(\text{poly}(n)\) samples from \(D_{L,r}\). Because \(R\) is very large, it is
possible to sample from it efficiently.

First the algorithm uses its \(\text{poly}(n)\) samples to produce a sample from
\(D_{L,R’}\), where \(R’ < R/2\). This is done by using the LWE oracle and a quantum
algorithm. Repeating this enough times, we get \(\text{poly}(n)\) samples from
\(D_{L,R’}\). Now we can repeat this procedure \(n\) times to get \(\text{poly}(n)\)
samples from \(D_{L,R/(2^n)} = D_{L,r}\).

In a bit more detail, each iteration step is as follows. Assume we have the
samples from \(D_{L,R}\). We use the LWE oracle to solve a CVP instance in the
dual lattice \(L^*\). Then, we prepare a quantum state representing the
(classical) Fourier transform of the lattice. This can be done by using the CVP
solution. Finally, we perform a quantum Fourier transform to get a state
corresponding to \(D_{L,R’}\).

It is interesting that there’s only a small (although repeated many times)
quantum step in the reduction, which yet seems difficult to remove. As Regev
notes, giving an oracle for the CVP instance seems completely useless in the
classical case: in order to properly use the oracle, we’d have to use
information we already know. But with a quantum computer, the oracle enables us
to perform a reversible computation on a quantum superposition, which in this
case gives us the desired result.

It also makes me wonder whether one could get more interesting reductions by
delving more deeply into the quantum world. Of course classical reductions are
in general more convincing, but to me that just reflects the fact that we are a
lot more uncertain about quantum computers. But quantum computers aren’t
all-powerful, and it’d be interesting to see more stuff based on quantum
hardness.

I’ve been learning a bit about lattice-based cryptography. Specifically,
constructions based on Ring-LWE. These constructions have been getting a
bit of attention lately because they can be made quite efficient, and it
is hoped that they are post-quantum.

By “post-quantum”, cryptographers mean cryptosystems that would answer
the question “If someone built a large, universal quantum computer now,
what could we still use?” Note that RSA and Diffie-Hellman are out,
since factoring and discrete log are fast in a quantum computer, and the
same is true for elliptic cryptography. Note also that they do not
mean algorithms that will never be broken by quantum computers, as such
a result would be a major breakthrough in complexity theory; one still
uses assumptions. The assumption for lattice-based cryptography is that
the problem of finding the shortest vector in a lattice (SVP) is hard
even for quantum computers.

Ring-LWE is inspired by a problem in machine learning: given a list of
input-output pairs from an unknown function $f$, where some random noise is
added to the output, find $f$. Ring-LWE is essentially this problem in a ring.
It is assumed to be hard because there’s a reduction from SVP to it. Also, there
have been efficient implementations of cryptosystems based on the assumption
that it is hard, and it looks like there’s still room to improvement. So this
area seems to have the very nice qualities of being both interesting for
research and close to practical.

\(\Lambda \circ \lambda\)

I was particularly delighted when I saw this
paper detailing a framework for
doing Ring-LWE in Haskell. I have always been a fan of Haskell,
especially for math-y stuff, so I was very curious to know how it works,
and, more importantly, how I can work with it.

The first obstacle is that the (very well-written) paper explains how
the framework works, but not how one uses it. Which is fair: this is a
research paper, not documentation. But it also meant that getting
started with it would be a bit of a challenge. Especially as the actual
documentation, which I suppose would be enough for a sufficiently
experienced Haskell programmer, was also not enough for me.

My main issues seem to derive from the fact that the framework does some
quite “advanced” use of the Haskell type system: GADTs, phantom types,
kind polymorphism. These might very well be standard now—I haven’t
followed the evolution of Haskell over the past few years. Back when I
was learning the language, the main obstacle was understanding how
monads work, and after that most code seemed more or less
straightforward. But using the
API requires being versed in these more advanced features.
Of course, there is a good reason why they were used. After a quick look
at the documentation, one will note that even things like integers are
often supposed to be used as types. The purpose here is that checking
things like whether an integer divides another can then be done at
compile time. Which is nice, but also makes programming a bit more
difficult to learn (which of course is partly the point: if you don’t
know what you’re doing, your program won’t compile).

I suppose that’s one drawback of pandering to both researchers and
practitioners: I am interested in the practice part, but a lot more has
been written about the research. At any rate, I’m willing to help with
the practice part. So I have been writing a tutorial of sorts on how to
use $\Lambda \circ \lambda$
. I’ve been writing it mostly for myself, but I also want it to be
useful to others, so I put it up on my
GitHub page. It is pretty
bare-bones as of now, but check this space in the future.

An attack on TLS with AES-GCM was recently presented at BlackHat
(slides,
paper). The attack is relatively simple,
so I thought it’d be a good idea for a first post to explain it.

AES-GCM is a block cipher introduced in TLS 1.2, the latest TLS standard. It was
proposed after repeated discoveries of attacks on TLS. In light of that,
researchers argued that a particularly weak link is in combining encryption and
authentication. They thus set out to combine both of them in a single protocol;
such protocols are commonly known by the acronym AEAD, for Authenticated
Encryption with Associated Data. An AEAD encrypts only part of the data to be
sent; however, all of it is authenticated.

As an AEAD, AES-GCM became a popular choice. Currently it is the most common
one. It is not without weaknesses, however. During the standardization process,
Antoine
Joux
pointed out that
GCM is vulnerable to an attack if IVs are reused. This is the attack used in the
BlackHat paper.

GCM

GCM follows the so-called encrypt-then-MAC paradigm: encrypt the plaintext with
a block cipher, then use a MAC to authenticate the ciphertext along with the
associated data, then send both the ciphertext and the authentication tag.

Say the plaintext is divided into \(n\) blocks: \(P_1,\dots,P_n\), the associated
data is divided into \(m\) blocks: \(A_1,\dots,A_m\), and let \(K\) be the secret key.
GCM proceeds as follows:

compute \(J_i := J(\text{IV}, i)\) for \(i \in {0,\dots,n}\), where \(J\) is a
counter function specified by the protocol, and IV is a value chosen by the
application.

compute \(C_i = \text{Enc}_K(J_i) \oplus P_i\) for \(i \in {1,\dots,n}\), where
\(\text{Enc}_K\) is the encryption function for a symmetric-key algorithm
(here, AES), used with key \(K\).

let \(H=\text{Enc}(0^{128})\).

compute \(X_0=0\), \(X_i = G_H(X_{i−1} \oplus A_i)\) for \(i \in {1,\dots,m}\),
where \(G_H\) is defined as multiplication by \(H\) in the Galois field
\(\text{GF}(2^{128})\).

compute \(X_{i+m} = G_H(X_{i+m−1} \oplus C_i)\) for \(i \in {1,\dots,n}\).

compute the authentication tag \(T = G_H(X_{m+n} \oplus L) \oplus S\) where
\(L=\text{len}(A) | \text{len}(C)\) and \(S = \text{Enc}_K(J_0)\).

The attack

The attack is possible when an IV is reused, so that two authentication tags are
computed using the same value \(S\).

First, note that \(T\) is given by computing the following polynomial in
\(\text{GF}(2^{128})\):

Now assume that an IV is reused to compute two tags \(T_M,T_{M’}\) for two
messages \(M, M’\). We can write \(T_M = g_M(H), T_{M’} = g_{M’}(H)\) with \(g_m(X) =
f_m(X) + L_m X + S\), where \(f_m(X)\) is a polynomial on both the ciphertext \(c\)
and associated data \(a\) and \(L_m=\text{len}(a) | len(c)\). Note that the
attacker can see the ciphertext and associated data, so she knows
\(f_M,f_{M′},L_M,L_{M′}\). It does not know \(S\), but \(S\) does *not* change with the
message.

Note also that by computing \(g_M(X) + g_{M’}(X)\), we get a polynomial that
depends only on values known to the attacker (since the field has characteristic
2, \(S + S = 0\)). Now let \(g’(X) = g_M(X) + g_{M’}(X) + T_M +T_{M’}\). This
polynomial is also known to the attacker, and furthermore \(g’(H) = 0\). The
attacker can thus factor \(g’(X)\) and try to guess \(H\). In principle \(g’(X)\)
could have a large number of roots, since its degree is the number of blocks,
but in practice it is usually small.

Once \(H\) is known, it is easy to forge messages. This makes it possible to do a
man-in-the-middle attack on TLS. The attacker simply relays messages back and
forth, recording the IVs used. Once an IV is reused, she performs the
computation described above to learn \(H\).

The attacker can then redirect the client to a static page on the server. She
can now easily replace the server’s response by its own messages, as follows.
Since the contents of the page are known, she simply XORs the content with the
server’s response and XORs the resulting string with her own message. Due to the
way GCM is constructed, this will be a valid ciphertext. She then sends that
along with an authentication tag (which she can generate on her own, now that
she knows \(H\)) to the client.

The attack in the real world

As remarked by Joux, IVs should never be reused, so the fix is easy from the
theoretical point of view: just don’t reuse IVs! Nevertheless, the authors have
found a number of vulnerable devices. They note that most of their attempts to
contact the owners—some of which were pretty big companies, like VISA—were
unsuccessful. This feels frustrating to me—does VISA not have people working on
Internet security?—, but for them it’s probably just a matter of cost.