Some Thoughts on Serial Numbers in Intel CPUs
---------------------------------------------

Ronald L. Rivest
MIT Laboratory for Computer Science
1/26/1999 (with slight revisions 8/23/99)

Today's New York Times contains an article, ``Intel Alters Plan Said to Undermine PC Users' Privacy'' (NYT, 1/26/1999, page 1) [1]. The article explains that EPIC and other groups are calling for a boycott of the new Intel CPU because each CPU will contain a unique serial number that can be read by any program, unless this feature is turned off by the user. The concern is that this feature might contribute to the loss of privacy by users, even as it contributes to electronic commerce and guards against software piracy.

I must admit that I was a little surprised by this reaction to the Intel announcement, which was made at the annual RSA Data Security Conference in San Jose last week. It hadn't occurred to me that someone might see such a feature as a threat to privacy.

It is worth noting that many computers on the Internet already have unique identifying numbers: the IP addresses used to route information to them. Each computer on the Internet is uniquely identified by its IP address. (Some computers have more than one IP address.) Furthermore, it is not hard for a typical application program to determine the IP address of the computer it is running on. Thus, a CPU serial number would not in these cases add anything new, since the computer is already uniquely identified by its IP address. However, many users who have dial-up connections to the Internet have IP addresses dynamically assigned by their Internet Service Provider (ISP), so in these cases the IP address only identifies the user's computer temporarily. Nonetheless, the presence of a new identifying number is not something dramatically different from what already exists for many users.

There are other ways in which a computer can be uniquely identified by software running on that computer. For example, there is normally a unique number on each board connecting a computer to the Ethernet. This could also be used as a unique identifying number for the computer.

The Intel proposal would give every CPU a unique identifying serial number that could be easily read by a program in a standard way. While Intel asserts that this feature could be turned off by the user, they don't say how this would be implemented. For example, if the feature is under program control, then a program could turn on the feature, read the number, and then turn it off. On the other hand, if the feature is under manual control (e.g. a new switch on the keyboard), then how is the user to know that only the program he wishes will be able to read the serial number? A modern computer can be running many processes at once, and a corrupted process running alongside the normal ones could sample the serial number and save it away. Without further details from Intel, it is hard to see how they can make this feature controllable in a secure way. Probably they have some thoughts on how to do this.

But a real concern is that the user will be forced to leave the serial number feature turned on in order to be able to execute programs that he has purchased or downloaded off the Internet. If it becomes standard for a program to refuse to run unless the feature is turned on, then the user will eventually give up and leave the serial number feature always enabled. I think this is a likely evolution of the state of affairs.
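To make concrete the observation above that ordinary software can already read identifying numbers such as the host's IP address or the Ethernet board's unique number, here is a minimal sketch, assuming a Python environment and using only its standard library (the particular calls are one illustration, not the only way to obtain these values):

    # Illustrative sketch: two identifiers an ordinary program can
    # already read today, using only the Python standard library.

    import socket
    import uuid

    # An IP address associated with this host's name (on some
    # configurations this may be a loopback address).
    ip_address = socket.gethostbyname(socket.gethostname())

    # The 48-bit hardware (Ethernet) address of one of the machine's
    # network interfaces, as reported by uuid.getnode(); if no hardware
    # address can be found, the library substitutes a random number.
    ethernet_number = uuid.getnode()

    print("IP address:       ", ip_address)
    print("Ethernet address: ", hex(ethernet_number))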
How damaging to a user's privacy is the serial number feature? Well, one risk is that Internet applets could leak this number in their communications back to their home server. This is not in and of itself a privacy problem. The risk is that servers could get together and correlate (link) their information on users, using the CPU serial number as a common identifying tag. Website A would know that some user with CPU serial number 4136795 was browsing sites about some nasty disease, and Website B would find out that a user named Mary Smith with credit-card number 41556792346601 was connecting from a computer with CPU serial number 4136795. Putting two and two together, they discover that Mary Smith is interested in some nasty disease.

While this is a possible concern, I still find it a bit surprising that this sort of issue is raised in a country where credit cards are so prevalent, and where everyone's buying habits are minutely detailed and correlated by the credit card companies. I guess the concern may be one of control; people are happy to give up their privacy when using their credit cards, because they know that they could in principle not use the cards, whereas CPU serial numbers are bothersome because users may not have such discretionary control over their use. (This seems a bit weak as an argument, since there is no easy way to make purchases over the Internet except by using a credit card.) I don't really see the difference between the option (?) not to use a credit card number and the option (?) to turn off the CPU serial number feature. And credit cards are perhaps a more insidious problem because they are already linked to your name, whereas the Intel CPUs would be sold without any record-keeping that would let anyone know who has the CPU with which serial number.

Nonetheless, the privacy issue, once raised, prompts the question of whether the benefits gained are worth the privacy risks, whatever you assess those to be, and whether there might not be better ways to achieve those benefits without incurring the risks. At the end of this paper, I sketch a proposal for replacing serial numbers with a functionality that may accomplish these goals.

First, we must ask: what are the benefits of serial numbers on a CPU? To my mind, the benefits of a serial number scheme are that it might help fight the battle against software piracy, and that it might assist more generally in protecting intellectual property rights. Distributors of software and music might be able to (albeit weakly) guarantee that the software and music they distribute would be runnable or playable only on designated CPUs. A software program (e.g. Microsoft Office) would check that it was running on an authorized CPU, by checking the serial number before (and even during) execution. If not, it would halt execution. Similarly, a music player could check that the music that was downloaded was specifically intended to be played on that CPU. If not, the music wouldn't play.

Such schemes have been around for a long time. Some manufacturers provide "dongles" to attach to your PC that provide the PC with a unique serial number, where one was previously lacking, allowing software that checks for the dongle number to run only when the dongle is present. (The dongle has an advantage over the CPU serial number in that it can be moved to a new machine when the user upgrades, whereas the same is not true of the CPU serial number.)
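A sketch of what such a serial-number (or dongle-number) check might look like follows, assuming Python; the function read_cpu_serial() is a hypothetical stand-in for whatever machine instruction or driver call would actually return the identifying number:

    # Sketch of a naive per-CPU licensing check. read_cpu_serial() is
    # hypothetical; it stands in for the real serial-number instruction.

    AUTHORIZED_SERIAL = 4136795   # serial number embedded by the vendor

    def read_cpu_serial():
        # Placeholder: a real implementation would execute the CPU's
        # serial-number instruction. Here we simply return a fixed value.
        return 4136795

    def run_protected_program():
        if read_cpu_serial() != AUTHORIZED_SERIAL:
            raise SystemExit("This copy is not licensed for this CPU.")
        print("Serial number matches; running the program.")

    run_protected_program()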
It is well recognized that such simple schemes are often not hard to defeat, by spoofing the dongle-checking routine into believing that it has queried the dongle when it has not, or by modifying the software so that it no longer checks for the dongle. Similarly, it would be possible in principle to modify software that checks the CPU serial number so that it no longer checks for this number. (It would, however, presumably be hard to spoof the checking routine, since the CPU serial number is directly available by executing a certain machine instruction.)

Extensions to the basic idea involve incorporating "essential functionality" into the dongle, rather than having it contain just a serial number. For example, the dongle could contain a key subroutine for the program. (But then this dongle is only usable for that one program.) In another variant, the dongle contains a secret key that can be used to decrypt portions of the code so that they can be executed by the PC. The dongle might even contain a CPU itself, so that an encrypted subroutine could be loaded into the dongle and executed there. Steve Kent's Master's thesis [2] gives a discussion of some of these variants.

I note that there is an issue of "key management" or "serial number management" involved in these schemes. That is, the user (the purchaser of the software) must somehow let the manufacturer (or distributor) know the serial number or secret key of the CPU or dongle, so that the manufacturer can prepare a version of the software that runs only on that CPU (or in the presence of that dongle). This is explicitly an "identification" procedure. The user needs to identify himself (or at least identify his CPU) so that the manufacturer can prepare the software. Thus, the user is clearly giving up his privacy in such a scenario.

Is there some way in which you could get the benefits of protection against software piracy without having such an explicit identification scenario as a necessary part of the process? I think it is fair to say that a manufacturer is only likely to be concerned about piracy when he is being paid for the software (or music or whatever) that he wishes to distribute. Who cares about piracy of free software? But this implies that schemes for software protection are always going to violate the user's privacy (or at least reveal his identity), unless an anonymous payment scheme is used to pay for the software. By paying for the software in the first place, the user has already given away who he is. While schemes for anonymous payment are certainly possible in principle, they have not caught on in practice. Perhaps it is best to assume that this is likely to remain true, at least for a while.

On the other hand, even if one were to grant that one must reveal one's identity in order to purchase intellectual property like software (and this is not really a given, since some corporations purchase software en masse with a site license, for example), it would still be potentially bothersome to have a mechanism that is designed to prevent software piracy (for paid transactions) turn out to be usable to further compromise a user's privacy in other situations (e.g. for free transactions). The CPU serial number risks being bothersome in exactly this way, since the CPU can't really tell whether it is being queried in order to facilitate electronic commerce (by preventing piracy) or to facilitate snooping on individuals (by giving away an identifying tag on their free transactions).
Here is a simple proposal for a variant scheme that might satisfy the desiderata for the current situation: it facilitates electronic commerce without providing unique identifiers. I'm sure that my crypto colleagues can invent many further elaborations of this simple idea; further improvements are certainly possible.

First of all, we eliminate the serial number from the CPU. There is no serial number, and so it can't be queried, or used as an identifier for the user of the CPU.

Second, we give each CPU a unique secret key Ki. These secret keys might be 128-bit AES (Advanced Encryption Standard) keys, for example. No two chips have the same key Ki. The keys might be randomly generated by Intel as it manufactures the CPUs. We trust Intel not to keep copies of these keys. (This is a soft spot in the design, which can presumably be addressed by having the chip itself generate Ki and store it in nonvolatile memory without revealing it, or by some variation on the rest of the scheme.) There is no way for a user of the CPU to determine Ki; it can't be "read out" like a serial number.

Third, we give the CPU two new instructions: a "challenge" instruction and a "decrypt and compare" instruction. The "challenge" instruction causes the CPU to do a randomized encryption of a supplied challenge and return the resulting ciphertext. The "decrypt and compare" instruction causes the chip to determine whether two such ciphertexts could have been produced on the current CPU from the same challenge. Details in a moment.

Note that the Intel proposal also proposes that the Intel Pentium III architecture will allow the chip to generate random numbers from thermal noise. Presumably there is a new instruction that causes the chip to return a register (or several registers) full of random bits. Generating random numbers is an essential requirement for the proposal here, so it is convenient that Intel has proposed this capability.

The "challenge" instruction works as follows: the chip takes in a (say) 64-bit challenge c. It then generates a (say) 64-bit random number r, using the random number generation circuitry already announced. It then returns as the result of the challenge instruction the ciphertext

    C(c,r) = AES(Ki, c || r)

That is, it returns the encryption, using the AES algorithm under control of the key Ki, of the plaintext consisting of the concatenation of the challenge c with the random value r. (The first 64-bit half of the plaintext is c; the second 64-bit half is r.) The resulting 128-bit ciphertext C(c,r) is returned by the chip in an appropriate register or set of registers. The AES algorithm (not yet chosen) takes in 128-bit plaintext values and returns 128-bit ciphertext values, under control of a 128 (or 192 or 256)-bit key.

The "decrypt and compare" instruction takes in two values C1 and C2 and decrypts them using the chip's secret key Ki, to obtain (c1,r1) and (c2,r2), where C1 = C(c1,r1) and C2 = C(c2,r2). That is, C1 was produced (or could have been produced) by the challenge instruction on input challenge c1, and C2 was produced (or could have been produced) by the challenge instruction on input challenge c2. The chip returns "true" if c1 = c2, and "false" otherwise.

Note again that the challenge instruction is randomized---it returns (with very high probability) a different result every time it is invoked, even if it is invoked with the same challenge. Thus, it is not usable as a way of producing a unique "serial number" for the chip.
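To make the two instructions concrete, here is a toy software simulation, assuming Python 3 with the pycryptodome package for the block cipher; the key size, block layout, and function names are illustrative stand-ins for what would really be fixed in hardware:

    # Toy simulation of the proposed "challenge" and "decrypt and compare"
    # instructions. Assumes Python 3 and pycryptodome (pip install
    # pycryptodome); all names and sizes here are illustrative.

    import os
    from Crypto.Cipher import AES

    K_i = os.urandom(16)              # the chip's unique 128-bit secret key

    def challenge(c):
        """Return C(c,r) = AES(Ki, c || r) for a fresh 64-bit random r."""
        assert len(c) == 8            # 64-bit challenge
        r = os.urandom(8)             # 64-bit random value from the chip
        return AES.new(K_i, AES.MODE_ECB).encrypt(c + r)

    def decrypt_and_compare(C1, C2):
        """Return True iff C1 and C2 encrypt the same challenge under Ki."""
        p1 = AES.new(K_i, AES.MODE_ECB).decrypt(C1)
        p2 = AES.new(K_i, AES.MODE_ECB).decrypt(C2)
        return p1[:8] == p2[:8]       # compare only the challenge halves

    # The same challenge yields a different ciphertext on each invocation,
    # yet the chip can still recognize that the two ciphertexts match:
    c = b"\x00" * 8
    C1, C2 = challenge(c), challenge(c)
    print(C1 != C2)                                      # True (w.h.p.)
    print(decrypt_and_compare(C1, C2))                   # True
    print(decrypt_and_compare(C1, challenge(b"\x01" * 8)))  # False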
For example, the result of running the challenge instruction on input challenge "0" is always changing, so it can't be used to identify the chip.

I also note that although the scheme proposed here involves an encryption operation, it is not possible to use the chip to "get at" the underlying AES encryption and thus perform encryption efficiently. This is important if one must live (as Intel currently must) with the current set of (defective, in my mind) export control laws on encryption. Chips with this scheme on them could presumably be exported without difficulty.

Now: how does a manufacturer use these instructions to provide software that can only be run on a particular CPU? The "serial number management" or "key management" process that we had before for dongles now becomes the following three-step process. First, the user runs the challenge instruction on some challenge c on his CPU. The challenge might be supplied by the manufacturer, or chosen randomly by the user. Second, the user informs the manufacturer of the challenge c and the response C1 = C(c,r) that he obtained from the chip. Third, the manufacturer supplies the user with custom software that has embedded within it the ciphertext C1 and a test of the form: give the challenge c to the chip, apply the "challenge" instruction, and then use the "decrypt and compare" instruction to compare the result of the challenge instruction with C1. If the "decrypt and compare" instruction returns "true", proceed to execute the software. Otherwise, abort. (A sketch of this exchange, continuing the toy simulation above, appears at the end of this section.)

Software produced this way will run only on the CPU that produced the original response C1. This allows one to protect against software piracy, in that a manufacturer can produce software that runs on only one CPU. (The scheme extends easily to handle the case where the user owns multiple CPUs, by embedding multiple ciphertexts in the software and seeing whether any of them compare successfully.) Note that manufacturers cannot get together off-line to compare what they know, since all they have are ciphertexts produced under unknown keys from plaintexts of which they know only half. There is no way to "link" together different results of the "challenge" instruction without using the very same chip on which those results were produced. Of course, this scheme has the problem that a user cannot upgrade his hardware easily; all of his purchased and protected software also needs to be upgraded to run on the new CPU.

This finishes my description of the scheme. I suppose that every scheme needs a name, so why don't we call this the "C/DAC" scheme (challenge / decrypt and compare).

This note needs substantial elaboration to include additional pointers to other relevant work (e.g. Canetti's work on randomized hash functions, other schemes for preventing software piracy, etc.). This note also needs the usual caveats that even a scheme like this is not so hard to defeat, since one can presumably modify the purchased software to remove the checks it makes, just as one could modify software to remove checks on serial numbers. But it is presumably somewhat worthwhile nonetheless to have a software piracy protection scheme that provides protection against naive but malicious users who would copy code if they could, but who don't have the skill needed to hack the code. (It is arguable whether this is a huge benefit, since one skilled hacker can then provide the code to all of his friends...)
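As promised, here is a sketch of the three-step exchange, continuing the toy simulation given earlier (it reuses the challenge and decrypt_and_compare functions from that sketch, and every name here is illustrative rather than part of the proposal itself):

    # Sketch of the three-step "key management" exchange, reusing the
    # challenge() and decrypt_and_compare() functions from the earlier
    # toy simulation.

    import os

    # Step 1: the user runs the challenge instruction on his own CPU.
    c = os.urandom(8)                 # challenge, chosen randomly by the user
    C1 = challenge(c)

    # Step 2: the user sends (c, C1) to the manufacturer.
    #         (Nothing else about the CPU is revealed.)

    # Step 3: the manufacturer embeds (c, C1) in the software it ships,
    #         together with a startup test like the following.
    def protected_main(embedded_c, embedded_C1):
        if not decrypt_and_compare(challenge(embedded_c), embedded_C1):
            raise SystemExit("Not authorized for this CPU.")
        print("Authorized CPU; running the program.")

    protected_main(c, C1)             # succeeds only on the original chip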
However, it is interesting to see that the benefits that seem to accrue to the serial number scheme can be obtained without providing a means for violating a user's privacy by facilitating the linking of various transactions using the CPU serial number as an identifier. (If there are other applications of the serial number scheme that cannot be handled by the C/DAC scheme, I would be interested in hearing about them...)

[1] Jeri Clausing, ``Intel Alters Plan Said to Undermine PC Users' Privacy,'' New York Times, January 26, 1999, page 1.

[2] Stephen Kent, ``Protecting Externally Supplied Software in Small Computers,'' MIT Laboratory for Computer Science Technical Report TR-255, 1981.