Can Amazon Have A Feature-Rich Site And Protect Sensitive Information?How much information should a company retain about its customers? I started wondering about this question when I wrote Translucent Databases last year to explore the many ways that someone could build a database that kept no personal information yet still performed useful work. The technology is easy to implement, but the social matrix where the technology must live is hard to understand.
Over the last few weeks, I've been discussing the usefulness of the approach with Jon Udell. He sees the technical charm, but wonders whether anyone would ever want to use it. So as a challenge, I offered to show how Amazon could use the techniques to offer almost all of their services without retaining any personal information about their customers. If someone breaks into their computers or someone on staff decides to get nosy, they can't find a customer's browsing habits, credit card number, address or anything. Yet, Amazon could still tune their offerings to you and suggest items related to your previous purchases.
Although it sounds a paradox for a database to contain no useful information and give valuable responses, the systems can be built. The right amount of encryption can scramble data without retaining it. One-way functions like SHA can turn names, social security numbers, and medical records into unintelligible 160-bit numbers. These 160-bit numbers can be used as surrogates for names, credit card numbers and what not, but they can't be reversed because, behold the magic of definition, the functions like SHA only work in one-way. They can't be inverted.
Inscrutable SurrogatesThe best way to continue is with an example. SHA is a cryptographically secure hash function. That means no one has publically described an efficient way to take SHA(x) and figure out a value of x that produced it. For instance, SHA("firstname.lastname@example.org/swordfish") produces the hexadecimal string 0185fd1f5137ec04a564fdd8ef043e12fd643511. If you start with 0185fd1f5137ec04a564fdd8ef043e12fd643511, though, it is practically impossible to find any string that generates the number, much less find "email@example.com/swordfish".
The reason I computed that string is that they're my email address (I can't possibly get more spam) and a password. Mixing them together with SHA produces a bag of 160 bits that can act as a surrogate for the email address and password. Anyone who knows the name and address can compute them, but someone who begins with the value 0185fd1f5137ec04a564fdd8ef043e12fd643511 can't go backwards. There's no public method for reversing them and it seems unlikely that there's any method at all.
These surrogates can replace your email address and password in the Amazon system. Every time the machines at Amazon store information about the items you examined or purchased, they can store them in their huge databases under the surrogate 0185fd1f5137ec04a564fdd8ef043e12fd643511.
When you return, you log into the system again with your email address and password. Amazon passes them through SHA, looks the result up in their database and starts tuning the experience again. They can look up what 0185fd1f5137ec04a564fdd8ef043e12fd643511 bought on the last trip and start posting similar items on the screen as temptation.
It should be obvious that a malicious hacker or a curious insider at Amazon can't find out what someone bought or even browsed for in the past. If someone wants to browse through the databases, all they find is inscrutable numbers like 0185fd1f5137ec04a564fdd8ef043e12fd643511.
Offering ServicesHow many services offered by Amazon can be converted to use this system? A surprising amount. Without any real knowledge of the Amazon system beyond what I've experienced as a customer, I can divide their databases into these categories:
Some argue that the success of Amazon shows that many people don't care about privacy. This assumption may be a bit of a leap because it could be that many people weigh the benefits of Amazon against the costs of giving up some of their privacy. Free shipping and a great selection outweigh having someone track what you read. The approach in this note shows that the company could give people most of the benefits and protect their personal information.
The techniques can also fight crime and terrorism, a big problem today. One man watched his credit card number get stolen and used to send a night-vision scope to the Middle East. Insiders steal personal information all of the time. Others plunder databases filled with email and sell the lists to spammers. While I'm sure the majority of folks at Amazon are clean cut citizens, there may be someone who isn't. A translucent database can stop the insiders with malice aforethought.
MoreThis note is only meant to address a challenge that Jon Udell made to help determine the utility of translucency. He seemed skeptical that the ideas would find widespread use. I hope this note will at least make it clear that a fully-functional, highly customized store like Amazon can be built with a few very simple techniques. Amazon can have its cake and eat most of it too.
Of course, just because an idea is simple and stops terrorism (among other things), doesn't mean that it can or will be widely adopted. I think the resistence in ourselves is deeply buried, perhaps even below our logical layer. Many people still feel a packrat's instinct with data. They feel that this information should be kept around, just in case. This is a natural human wish, but it should also be balanced by the just as natural aversion to responsibility. Most businesses don't have to pay the price if a customer's identity gets stolen, their credit cards get cloned, or their bank account is raided. This may change as more people and businesses become aware of the danger of misused information and the responsibility to protect it.
If anyone has thoughts about the advantages and limitations of the approach taken here, please write. --- Peter Wayner, firstname.lastname@example.org