Other Books by Peter Wayner
Contact the author: p3@wayner.org


A Supplementary Syllabus


If you're a professor teaching a database course, you may want to use Translucent Databases as an additional textbook. This one-week module presents some of the most important concepts from Translucent Databases. It consists of three parts that roughly correspond to the three hours spent in a classroom in a typical week.


Part I -- One-Way Functions
  • One-way functions are easy to compute but hard to reverse.
  • Some of the common ones are MD5, SHA, and raising a number to a power modulo a prime number. This section will just use generic one-way functions and call them h(x). There is no reason to do more with advanced mathematics.
  • Most common one-way functions are not truly impossible to reverse; they're just practically impossible. Describe how hash functions like MD5 produce their answer. How long does it take to search for a collision? How long does it take to mount a brute-force attack?
  • Show how to protect passwords using this approach. Anyone can look at the file, and anyone can test whether a candidate password is the real one. But no one can take the password database and work backwards to determine the password.
  • Show how to protect credit cards. (Some systems leave the last four digits in the clear. Mention that this is a hint for how information is treated in Part III.)
  • Show how multiple people can use h(x) to look up information instead of just x. This can be used to synchronize schedules or protect personal information.
  • Show how to design a store database that stores h(name) instead of name.
  • Emphasize that the regular SQL database features still work with the fields of the database that aren't scrambled by h.
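
The store example above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: SHA-256 stands in for the generic h(x), the table layout and the helper names (build_store, orders_for, total_sales) are hypothetical, and names are lowercased before hashing so that lookups match. Note that an unsalted hash of a common name can still be guessed by hashing a list of likely names, which is a useful point for class discussion.

```python
import hashlib
import sqlite3


def h(x: str) -> str:
    """Generic one-way function h(x); SHA-256 is an assumption of this sketch."""
    normalized = x.strip().lower()  # assumed normalization so lookups match
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def build_store() -> sqlite3.Connection:
    """Create a hypothetical orders table that stores h(name), never the name."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (customer TEXT, item TEXT, price REAL)")
    rows = [
        (h("Alice Smith"), "umbrella", 12.50),
        (h("Bob Jones"), "raincoat", 40.00),
        (h("Alice Smith"), "boots", 30.00),
    ]
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    return conn


def orders_for(conn: sqlite3.Connection, name: str) -> list:
    """A customer who knows their own name can still find their rows."""
    cur = conn.execute(
        "SELECT item, price FROM orders WHERE customer = ? ORDER BY item",
        (h(name),),
    )
    return cur.fetchall()


def total_sales(conn: sqlite3.Connection) -> float:
    """Ordinary SQL still works on the fields that are left in the clear."""
    return conn.execute("SELECT SUM(price) FROM orders").fetchone()[0]
```

Calling orders_for(conn, "alice smith") returns that customer's items, while someone browsing the table sees only hash values in the customer column. Aggregates such as total_sales are unaffected because the price field is not scrambled.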



Part II -- Determining Reality

  • Digital signatures can be built from one-way functions. This section won't use the more sophisticated, traditional versions like RSA or Diffie-Hellman, although it could. It will only use simpler versions that are often called Message Authentication Codes (MACs). Describe how a MAC offers a weaker guarantee: anyone who can check it can also forge it.
  • Someone can create a signature or MAC by computing h(password,document). Only someone with the right password can recompute the value and check whether it was generated from the document.
  • Show how fake entries in the database can disguise the real ones.
  • Only someone with the password can distinguish between the real and the fake.
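
A minimal sketch of this idea in Python follows. It makes one substitution that should be named plainly: instead of a bare h(password,document), it uses HMAC with SHA-256 from the standard library, which is a common way to build a MAC from a hash function. The row layout and the helper names (real_row, fake_row, is_real) are hypothetical.

```python
import hashlib
import hmac
import secrets


def mac(password: str, document: str) -> str:
    """Keyed one-way function standing in for h(password, document)."""
    return hmac.new(
        password.encode("utf-8"), document.encode("utf-8"), hashlib.sha256
    ).hexdigest()


def real_row(password: str, document: str) -> tuple:
    """A genuine entry carries a tag computed with the password."""
    return (document, mac(password, document))


def fake_row(document: str) -> tuple:
    """A decoy carries a random tag of the same length (64 hex characters)."""
    return (document, secrets.token_hex(32))


def is_real(password: str, row: tuple) -> bool:
    """Only a holder of the password can tell real rows from decoys."""
    document, tag = row
    return hmac.compare_digest(mac(password, document), tag)
```

To anyone without the password, the tags on real and fake rows look like equally random strings. A user with the password filters the table with is_real and ignores the decoys.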


Part III -- Blurring Reality with Quantization


  • Quantization is the act of taking a number from a big set and assigning it the closest value from a smaller subset.
  • Rounding off values is one form of quantization.
  • More sophisticated algorithms don't distribute the small set of surrogates evenly over the larger set.
  • Some basic algorithms block some fields if they make it too easy to identify the human behind the record.
  • Other algorithms add random amounts to the data to disguise the true value.
  • Some encrypt this random amount so some users can get the real values.
  • Show how this can be applied to medical records used for research.
  • Show how this can help hide the position of ships.
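
The simplest forms of quantization listed above, rounding to an evenly spaced grid and adding a random offset, can be sketched as follows. The function names, the grid step of half a degree, and the uniform noise distribution are all assumptions chosen for illustration.

```python
import random


def quantize(value: float, step: float) -> float:
    """Replace value with the nearest multiple of step.

    Python's round() sends exact halves to the even neighbor.
    """
    return round(value / step) * step


def blur_position(lat: float, lon: float, step: float = 0.5) -> tuple:
    """Snap a ship's position to a coarse grid (step in degrees, assumed)."""
    return (quantize(lat, step), quantize(lon, step))


def add_noise(value: float, spread: float, rng: random.Random) -> float:
    """Disguise a value by adding a random amount in [-spread, +spread]."""
    return value + rng.uniform(-spread, spread)
```

For a medical record, quantize(43, 5) reports a patient's age as 45, which is often precise enough for research while matching many more people. For a ship, blur_position(36.87, -122.31) gives (37.0, -122.5), which places it only within a half-degree cell.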


Sample Homework Questions:


  • Write a program to try random values of x until MD5(x) ends with the eight-bit value FF. How many random values should it take? Run your program. Do you come close? Repeat this 1000 times and report the average number of samples that must be tested before one is found. Now, extrapolate how long it would take your computer to find an x that matches a complete 128-bit result from MD5.
  • Create a tool for protecting medical records in a trial. Determine which fields to scramble and which fields to leave in the clear. 
  • Describe some possible attacks against the scheduling algorithms described in Chapter 4.
  • Describe three ideal databases where one-way functions can prevent abuse. Describe several examples where the technique will fail.
  • Describe three ideal databases where false entries can distract attackers. Describe several cases where the fake entries will corrupt the database. Can this problem be avoided?
  • Describe three examples where blurring data with quantization can add enough confusion to block attackers. Can you think of examples where too much confusion also confounds the regular users? Are there examples where there's no middle ground?
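
For instructors who want a reference point for the first question, here is one possible sketch in Python. It assumes the target is the single byte FF at the end of the digest (pass b"\xff\xff" for a two-byte target), draws x as 16 random bytes, and takes a seeded random.Random so that runs can be repeated. The function names are hypothetical.

```python
import hashlib
import random


def expected_trials(suffix: bytes) -> int:
    """Each digest byte matches with probability 1/256, so expect 2**(8*n) tries."""
    return 2 ** (8 * len(suffix))


def trials_until_suffix(rng: random.Random, suffix: bytes = b"\xff") -> int:
    """Count random inputs tried until MD5(x) ends with the given bytes."""
    trials = 0
    while True:
        trials += 1
        x = rng.getrandbits(128).to_bytes(16, "big")  # random 16-byte input
        if hashlib.md5(x).digest().endswith(suffix):
            return trials


def average_trials(runs: int, rng: random.Random, suffix: bytes = b"\xff") -> float:
    """Average the trial count over many independent searches."""
    return sum(trials_until_suffix(rng, suffix) for _ in range(runs)) / runs
```

With a one-byte target the average over 1000 runs should land near 256. A full MD5 digest is 16 bytes, so the same reasoning predicts about 2**128 trials for a complete match, and timing the short search gives students the rate they need for the extrapolation.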