Acoustic cryptanalysis
On nosy people and noisy machines
[preliminary proof-of-concept presentation]
Adi Shamir, Eran Tromer
Introduction and FAQ
One method for extracting information from supposedly secure systems is the side-channel attack: a cryptanalytic technique that relies on information unintentionally leaked by computing devices. Most side-channel research has focused on electromagnetic emanations (TEMPEST), power consumption and, recently, diffuse visible light from CRT displays. The oldest eavesdropping channel, namely acoustic emanations, has received little attention. Our preliminary analysis of acoustic emanations from personal computers shows them to be a surprisingly rich source of information on CPU activity.
Q1: What information is leaked?
This depends on the specific computer hardware. We have tested several desktop and laptop computers, and in all cases it was possible to distinguish an idle CPU (i.e., the 80x86 "HLT" state) from a busy CPU. For some computers, it was also possible to distinguish various patterns of CPU operations and memory access. This can be observed both in artificial cases (e.g., loops of various CPU instructions) and in real-life cases (e.g., RSA decryption). The time resolution is usually on the order of milliseconds. In some contexts, such information can be used to reveal secret keys; see the next question.
Q2: How can a low-frequency (kHz) acoustic source yield information on a much faster (GHz) CPU?
In two ways. First, when the CPU is carrying out a long operation, it may create a characteristic acoustic spectral signature: for example, below we show how RSA signature/decryption sounds different for different secret keys. Second, we get temporal information about the length of each operation, which can be used to mount timing attacks (see Q10), especially when the attacker can affect the input to the operation (e.g., in a chosen-ciphertext attack scenario).
Q3: Won't the attack be foiled by loud fan noise, by multitasking, or by several computers in the same room?
Probably not. The interesting acoustic signals are mostly above 10 kHz, whereas typical computer fan noise and normal room noise are concentrated at lower frequencies and can thus be filtered out by suitable equipment. In a task-switching system, different tasks can be distinguished by their different acoustic spectral signatures. When several computers are present, they can be told apart by their different acoustic signatures, since these vary with the hardware, the component temperatures, and other environmental conditions.
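The filtering mentioned above can also be approximated digitally. The sketch below (in Python, our choice; the presentation itself contains no code) applies a second-order high-pass "biquad" filter using the well-known coefficient formulas from Robert Bristow-Johnson's Audio EQ Cookbook. The 10 kHz cutoff and the 96 kHz sample rate are illustrative assumptions, not parameters of the experiments, which used an analog mixer equalizer instead; the function names are hypothetical.

```python
import math


def highpass_biquad(samples, sample_rate, cutoff_hz, q=1 / math.sqrt(2)):
    """Second-order (12 dB/octave) high-pass filter.

    Coefficients follow the RBJ Audio EQ Cookbook high-pass formulas;
    q = 1/sqrt(2) gives a Butterworth (maximally flat) response.
    """
    w0 = 2 * math.pi * cutoff_hz / sample_rate
    cos_w0 = math.cos(w0)
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    # Normalize all coefficients by a0.
    b0 = (1 + cos_w0) / 2 / a0
    b1 = -(1 + cos_w0) / a0
    b2 = b0
    a1 = -2 * cos_w0 / a0
    a2 = (1 - alpha) / a0

    x1 = x2 = y1 = y2 = 0.0  # filter state: previous two inputs and outputs
    out = []
    for x in samples:
        y = b0 * x + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def rms(xs):
    """Root-mean-square level of a sequence of samples."""
    return math.sqrt(sum(v * v for v in xs) / len(xs))
```

For example, `highpass_biquad(samples, 96000, 10000)` attenuates a 500 Hz tone (a stand-in for fan hum) by roughly 50 dB while passing a 20 kHz tone almost unchanged; a steeper roll-off can be obtained by cascading several such sections.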
Q4: What countermeasures are available?
One obvious countermeasure is to use sound-dampening equipment, such as "sound-proof" boxes, designed to sufficiently attenuate all relevant frequencies. Alternatively, a sufficiently strong wide-band noise source can mask the informative signals, though ergonomic concerns may render this unattractive. Careful circuit design and high-quality electronic components can probably reduce the emanations. Finally, one can employ known algorithmic techniques to reduce the usefulness of the emanations to an attacker. These techniques ensure that the coarse-scale behavior of the algorithm is independent of the inputs it receives; they usually carry some performance penalty, but are often already used to thwart other side-channel attacks.
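One well-known technique of this kind (our choice of illustration; the presentation does not name a specific one) is the Montgomery ladder for modular exponentiation, which performs exactly one multiplication and one squaring per exponent bit regardless of the bit's value. Below is a minimal Python sketch, assuming the exponent is processed over a fixed, public bit length `bits`; the function name is hypothetical.

```python
def ladder_modexp(c, d, n, bits):
    """Compute c^d mod n with a Montgomery ladder.

    Each of the `bits` iterations performs exactly one multiplication
    and one squaring, whatever the value of the exponent bit.
    Returns (result, number_of_modular_operations).

    NOTE: this only illustrates a uniform operation sequence. Python's
    branches and big-integer arithmetic are not constant-time, so real
    implementations also need branch-free code and other defenses.
    """
    if d.bit_length() > bits:
        raise ValueError("exponent is longer than the declared bit length")
    r0, r1 = 1 % n, c % n  # invariant: r1 == r0 * c (mod n)
    ops = 0
    for i in reversed(range(bits)):
        if (d >> i) & 1:
            r0, r1 = (r0 * r1) % n, (r1 * r1) % n
        else:
            r0, r1 = (r0 * r0) % n, (r0 * r1) % n
        ops += 2
    return r0, ops
```

The price is the performance penalty mentioned above: a squaring-plus-multiplication is paid for every bit, including the 0-bits that plain square-and-multiply would handle with a squaring alone.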
Q5: What about other acoustic attacks?
Eavesdropping on keyboard keystrokes has often been discussed; keys can be distinguished by timing, or (as recently proposed by Asonov and Agrawal) by their different sounds. While this attack is applicable to data entered manually (e.g., passwords), it is not applicable to larger secret data such as RSA keys. Another acoustic source is hard disk head seeks; this source does not appear very useful in the presence of caching, delayed writes and multitasking. Predating modern computers, one may recall MI5's "ENGULF" technique (recounted in Peter Wright's book Spycatcher), whereby a phone tap was used to eavesdrop on the operation of an Egyptian embassy's Hagelin cipher machine, thereby recovering its secret key.
Q6: Why bother with acoustic attacks when TEMPEST and power-analysis attacks are available?
Side-channel attacks based on electromagnetic emanations are indeed very powerful and widely discussed. For precisely this reason, secure facilities take measures to protect against them, such as Faraday cages and isolated power supplies. However, these measures may be transparent to acoustic emanations: consider a Faraday cage constructed of metallic mesh. Also, digital audio recording equipment is ubiquitous, and this creates new attack scenarios. For example, a compromised laptop carried into a secure computer room may record valuable acoustic information without its owner's knowledge. Another scenario is a program recording the computer on which it runs in order to learn information about other running programs, thereby breaching sandbox security boundaries or compromising NGSCB-like systems. Finally, known eavesdropping techniques, such as detecting window vibrations by their effect on reflected laser beams, could enable additional attack scenarios.
Q8: What's so special about the "HLT" instruction, and why is it useful to detect it?
The CPU instruction that is easiest to detect acoustically, though by no means the only detectable one, is the 80x86 "HLT" instruction. This instruction puts the CPU into a special low-power sleep state that lasts until the next hardware interrupt. On modern CPUs this temporarily shuts down many of the on-chip circuits, which dramatically lowers power consumption and alters acoustic emissions for a relatively long time. Experimentally, the difference between active computation (which normally never involves HLT instructions) and an idle CPU (where the kernel executes HLT instructions in its idle loop) is usually very prominent. If the only program running is a cryptographic application, this alone suffices to detect when the program awakens to handle input and when it finishes its cryptographic tasks, and this information can be used to mount timing attacks as discussed above. Of course, additional, subtler acoustic cues will yield further information.
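The idle/busy distinction can be turned into operation durations with very little processing. Below is a minimal Python sketch, assuming the recording has already been band-pass filtered to a band where the contrast is strong and that a suitable threshold has been chosen by inspecting the data. Which side of the threshold corresponds to HLT depends on the machine (on the computer recorded below, the sleep state shows up as wideband noise). All names and parameters are hypothetical.

```python
import math


def frame_rms(samples, frame_len):
    """RMS level of consecutive non-overlapping frames.
    A trailing partial frame is dropped."""
    levels = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        levels.append(math.sqrt(sum(x * x for x in frame) / frame_len))
    return levels


def segment_states(levels, threshold, frame_sec):
    """Split a sequence of frame levels into runs above/below `threshold`.

    Returns a list of (above_threshold, start_sec, duration_sec) tuples,
    one per maximal run of consecutive frames on the same side.
    """
    runs = []
    run_start = 0
    for i in range(1, len(levels) + 1):
        at_end = i == len(levels)
        if at_end or (levels[i] > threshold) != (levels[run_start] > threshold):
            runs.append((levels[run_start] > threshold,
                         run_start * frame_sec,
                         (i - run_start) * frame_sec))
            run_start = i
    return runs
```

The durations of the resulting runs are the kind of timing measurements that the attacks described in Q10 consume; a practical version would likely add hysteresis so that a single noisy frame does not split a run.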
Q9: What's so special about cryptographic operations?
Our experiments suggest that in most computers, each type of operation has an acoustic signature: a characteristic sound. This applies to any operation, cryptographic or otherwise. We focus on cryptographic operations because they are designed and trusted to protect information, so information leakage from within them can be critical. For example, recovering a single decryption key can compromise the secrecy of all messages sent over the corresponding communication channel.
Q10: How do timing attacks work?
Timing attacks are one class of attacks that take advantage of auxiliary side-channel information. They exploit the fact that the running time of many computational operations depends on their inputs, so by measuring the running time of an operation we learn something about its inputs. For example, consider the RSA cryptosystem. In this system, a ciphertext c is decrypted by treating c as a large number and raising it to the d-th power modulo the public modulus n, where d is the secret key. The simplest (though impractically slow) algorithm for this exponentiation is to start from c and multiply by c another d-1 times; this takes time proportional to d, so by measuring this time we get an estimate of d. The algorithms used in practice ("square and multiply" and its variants) are much more efficient, but exhibit similar properties unless carefully designed to thwart such attacks. By combining many measurements that correspond to different properties of the key, the possibilities can be narrowed down until the key is fully recovered. This type of timing attack was introduced by Kocher and demonstrated in practical settings by Boneh and Brumley.
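The key dependence can be made concrete under a simplified cost model in which every modular squaring or multiplication takes one time unit (real timings also depend on operand values, which practical attacks exploit as well). The Python sketch below counts operations for the naive method and for textbook left-to-right square-and-multiply; it is an illustration, not GnuPG's code, and the function names are hypothetical.

```python
def naive_modexp(c, d, n):
    """c^d mod n by repeated multiplication: d - 1 multiplications for d >= 1.
    Returns (result, multiplications)."""
    if d == 0:
        return 1 % n, 0
    result, mults = c % n, 0
    for _ in range(d - 1):
        result = (result * c) % n
        mults += 1
    return result, mults


def square_and_multiply(c, d, n):
    """Left-to-right binary exponentiation.
    One squaring per bit of d, plus one multiplication per 1-bit of d.
    Returns (result, squarings, multiplications)."""
    result, squarings, mults = 1 % n, 0, 0
    for bit in bin(d)[2:]:
        result = (result * result) % n
        squarings += 1
        if bit == "1":
            result = (result * c) % n
            mults += 1
    return result, squarings, mults
```

For exponents of the same bit length, square-and-multiply always performs the same number of squarings but one multiplication per 1-bit, so its total running time reveals the Hamming weight of d: 8 = 0b1000 costs 4 squarings and 1 multiplication, while 15 = 0b1111 costs 4 squarings and 4 multiplications.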
Experimental setup
Below are several short samples, each given in the form of a spectrogram and a WAV file. The spectrograms are snapshots from the Baudline signal analysis software running on GNU/Linux; the horizontal axis is frequency (0 Hz to 48 kHz), the vertical axis is time, and intensity indicates power per frequency window (the greener, the stronger). All recordings were equalized (roughly -10 dB below 1 kHz, +10 dB above 10 kHz) using the mixer's rudimentary built-in equalizer.
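A spectrogram of this kind is built from a short-time Fourier transform: the recording is cut into overlapping, windowed frames, and the power in each frequency bin of each frame is computed. The sketch below uses a naive DFT in pure Python for clarity (an FFT library would be used in practice). The frame length, hop size, Hann window and the 96 kHz sample rate (consistent with the 0 to 48 kHz axis) are our assumptions, not Baudline's settings. Rows are time and columns are frequency, matching the layout described above.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Short-time power spectrum via a naive DFT.

    Returns a list of rows, one per frame (time), each holding the power
    of frequency bins 0 .. frame_len // 2.
    """
    # Periodic Hann window to reduce spectral leakage between bins.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    rows = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(frame_len // 2 + 1):
            coeff = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                        for n in range(frame_len))
            row.append(abs(coeff) ** 2)
        rows.append(row)
    return rows


def bin_frequency(k, frame_len, sample_rate):
    """Center frequency in Hz of DFT bin k."""
    return k * sample_rate / frame_len
```

Bin k corresponds to k * sample_rate / frame_len; with 64-sample frames at 96 kHz each bin spans 1.5 kHz, so a pure 15 kHz tone peaks in bin 10 of every row.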
The recordings below were made using low-end equipment: a Røde NT3 condenser microphone (US$170), an Alto S-6 mixer (US$55) serving as an amplifier and rudimentary equalizer, and a Creative Labs Audigy 2 sound card (US$70) for recording into a separate computer. They were also made under nearly ideal conditions: the microphone was placed 20 cm from the recorded computer, the PC case was opened, and noisy fans were disconnected (where applicable).
Comparable results were achieved under more realistic conditions (i.e., with the subject computer intact and placed 1 to 2 m from the microphone) using more expensive audio equipment. For example, a high-quality analog equalizer can be used to attenuate strong low-frequency fan hum and background noise, allowing further amplification of the interesting signals before analog-to-digital quantization.
Except where noted otherwise, the computer being recorded is a no-brand box using a PC Chips M754LMR motherboard, an Intel Celeron 666 MHz CPU and an Astec ATX200-3516 power supply. This computer was chosen for its particularly striking acoustic emanations, but it is by no means a special case: every computer we tested showed significant correlation between acoustic spectrum and CPU activity, and in about half the cases the effect could be heard by the naked ear when using appropriate CPU activity patterns.
The sound of GnuPG RSA signatures
The following is a recording of GnuPG 1.2.4 signing a short message using a randomly generated 4096-bit RSA key, prepared in advance. The signing operation is performed twice, each time preceded by a sleep state (HLT instruction), which manifests as wideband noise. GnuPG uses CRT-based exponentiation for signing, and this is visible in the spectrogram: the duration of each signature is partitioned into two similar but distinct stages, corresponding to the exponentiations modulo p and modulo q.
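CRT-based signing replaces one exponentiation modulo n = pq with two half-size exponentiations, one modulo p and one modulo q, followed by a cheap recombination; this is why two stages appear. Below is a textbook sketch in Python (3.8 or later, for `pow(q, -1, p)`) using Garner's recombination. GnuPG's own code differs in details, and the function name is hypothetical.

```python
def rsa_crt_sign(m, d, p, q):
    """Textbook RSA-CRT: m^d mod (p*q) from two half-size exponentiations."""
    dp = d % (p - 1)          # reduced exponents (Fermat's little theorem)
    dq = d % (q - 1)
    q_inv = pow(q, -1, p)     # q^-1 mod p (Python 3.8+)
    s_p = pow(m, dp, p)       # stage 1: exponentiation modulo p
    s_q = pow(m, dq, q)       # stage 2: exponentiation modulo q
    # Garner's recombination: the unique s mod p*q with
    # s = s_p (mod p) and s = s_q (mod q).
    h = (q_inv * (s_p - s_q)) % p
    return s_q + h * q
```

With the classic toy key p = 61, q = 53, d = 2753, `rsa_crt_sign(2790, 2753, 61, 53)` returns 65, the same value as `pow(2790, 2753, 3233)`.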
How can we be sure that we're picking up a real acoustic signal, and not just electromagnetic emanations with the microphone or its cable acting as an antenna? For one, an audible difference can be heard by an attentive but unassisted human listener. For more conclusive evidence, here is the above experiment repeated, except that this time the microphone is muffled by placing a folded, non-conductive handkerchief in front of it:
If we turn off the microphone (using its built-in switch) but leave it connected to a running amplifier, the signal is gone entirely:
Sound signatures of signatures
The following recording captures GnuPG 1.2.4 signing a fixed message using several different 4096-bit RSA keys generated beforehand. Each signature is preceded by a short sleep (HLT state). X-curve equalization is applied to attenuate low frequencies. You can clearly see that each signature (and in fact, each of its exponentiations, modulo p and modulo q) has a unique spectral signature.
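Telling keys apart by their spectra could be automated by treating each spectral signature as a vector and matching an unknown recording against reference recordings. In the hypothetical sketch below, a signature is the time-average of spectrogram rows over one signing operation, and matching uses cosine similarity; both choices are our assumptions, not the method used to produce these recordings. Comparing log-power (dB) spectra is a common alternative.

```python
import math


def average_spectrum(rows):
    """Time-average of spectrogram rows (one power value per frequency bin)."""
    return [sum(column) / len(rows) for column in zip(*rows)]


def cosine_similarity(a, b):
    """Cosine of the angle between two spectra; 1.0 means same shape up to scale."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def best_match(unknown, references):
    """Return (label, similarity) of the reference signature closest to `unknown`."""
    return max(
        ((label, cosine_similarity(unknown, sig)) for label, sig in references.items()),
        key=lambda item: item[1],
    )
```

Cosine similarity ignores overall loudness, which is convenient when the microphone distance or gain differs between the reference and the unknown recording.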
Loops of CPU operations
We next turn to a more controlled experiment, trying to distinguish between the characteristic spectra of different CPU operations. We wrote a simple program that executes (partially unrolled) loops containing one of the following: the x86 instructions HLT, MUL, FMUL or REP NOP, or memory accesses that miss the L1 and L2 caches. Below we execute each such homogeneous loop in turn, and then repeat the whole sequence a second time. X-curve equalization is applied.
Here is the same experiment (apart from a difference in time scale), carried out on an IBM ThinkPad T21 running on batteries. Notably, its acoustic emanations are different (and less informative) when it runs on AC power.
Source of acoustic emanations
The PC Chips M754LMR motherboard has a bank of 1500 µF capacitors near the CPU and power connector. Here is the effect of applying a generous dose of Quik-Freeze spray (non-conductive, non-flammable, "will freeze small areas to -48°C") to these capacitors while the CPU is executing a loop of MUL instructions:
This concludes the preliminary proof-of-concept presentation. Questions and suggestions are very welcome.
We are indebted to Pankaj Rohatgi for inspiring this research, to Nir
Yaniv for use of the Nir
Space Station recording studio and for valuable advice, and to
Oded Smikt for his help with the experimental setup. Erik Olson's Baudline signal analysis software
was instrumental to this research.