Can Intelligence Agencies Read Overwritten Data?
A response to Gutmann.
Claims that intelligence agencies can read overwritten data on disk drives have been commonplace for many years now. The most commonly cited source of evidence for this supposed fact is a paper, "Secure Deletion of Data from Magnetic and Solid-State Memory", by Peter Gutmann, presented at a 1996 Usenix conference. I found this an extraordinary claim, and therefore deserving of extraordinary proof. Thanks to an afternoon at the Harvard School of Applied Science library I have had a chance to examine the paper (http://www.usenix.org/publications/library/proceedings/sec96/full_papers/gutmann/index.html) and many of the references contained therein.
Of course, modern operating systems can leave copies of "deleted" files scattered in unallocated sectors, temporary directories, swap files, remapped bad blocks, etc., but Gutmann believes that an overwritten sector can be recovered under examination by a sophisticated microscope, and this claim has been accepted uncritically by numerous observers. I don't think these observers have followed up on the references in Gutmann's paper, however.
Gutmann explains that when a 1 bit is written over a zero bit, the "actual effect is closer to obtaining a .95 when a zero is overwritten with a one, and a 1.05 when a one is overwritten with a one". Given that, a read head 20 times as sensitive as the one in a production disk drive, and knowledge of the pattern of bits written on top, one could in principle recover the under-data.
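The arithmetic behind this argument can be made concrete with a toy model. The sketch below is only an illustration, not Gutmann's method: it assumes the read signal is the new bit plus a residue of plus or minus 0.05 from the old bit (matching the 0.95 and 1.05 figures quoted for a written 1, and assuming the mirror-image values for a written 0), plus Gaussian noise. The function names are hypothetical. Subtracting the known overwrite bit and checking the sign of what remains recovers the under-data perfectly when there is no noise, and degrades toward a coin flip as the noise approaches and exceeds the 0.05 residue.

```python
import random

# Assumed residue from the previously written bit, taken from the
# 0.95 / 1.05 figures (a written 1 reads 1 - 0.05 or 1 + 0.05).
RESIDUE = 0.05


def read_analog(old_bits, new_bits, noise_sd=0.0, rng=None):
    """Simulated analog head signal under a simple linear model.

    A 1 written over 0 reads 0.95 and over 1 reads 1.05; the values for a
    written 0 (-0.05 over 0, 0.05 over 1) are a symmetric assumption.
    """
    if rng is None:
        rng = random.Random(0)
    return [
        new + RESIDUE * (2 * old - 1) + rng.gauss(0.0, noise_sd)
        for old, new in zip(old_bits, new_bits)
    ]


def recover_old_bits(signal, new_bits):
    """Guess the under-data: subtract the known overwrite, threshold at 0."""
    return [1 if s - new > 0 else 0 for s, new in zip(signal, new_bits)]


def bit_accuracy(a, b):
    """Fraction of positions where two bit lists agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    rng = random.Random(42)
    old = [rng.randint(0, 1) for _ in range(10_000)]
    new = [rng.randint(0, 1) for _ in range(10_000)]
    for sd in (0.0, 0.02, 0.05, 0.2):
        sig = read_analog(old, new, noise_sd=sd, rng=random.Random(1))
        print(f"noise sd={sd}: accuracy {bit_accuracy(old, recover_old_bits(sig, new)):.3f}")
```

Run as a script, it prints the recovery accuracy at a few noise levels; the figures depend on the seed and the assumed model, so they say nothing about any real drive.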
The references Gutmann provides suggest that his piece is much overwrought. None of the references lead to examples of sensitive information being disclosed. Rather, they refer to experiments where STM microscopy was used to examine individual bits, and some evidence of previously written bits was found.
There is a large literature on the use of Magnetic Force Scanning Tunneling Microscopy (MFM or STM) to image bits recorded on magnetic media. The apparent point of this literature is not to retrieve overwritten data, but to test and improve the design of drive read/write heads. Two of the references (Rugar et al., Gomez et al.) had pictures of overwritten bits, showing parts of the original data clearly visible in the micrograph. The authors considered these examples of sub-optimal head design. The total number of bits seen was 6 in one photo and 8 in the other. Neither micrograph was a total success: in one case only transitions from one to zero were visible, and in the other one of the transitions was ambiguous. Nevertheless, I accept that overwritten bits might be observable under certain circumstances.
So I can say that Gutmann doesn't cite anyone who claims to be reading the under-data in overwritten sectors, nor does he cite any articles suggesting that ordinary wipe-disk programs wouldn't be completely effective.
I should qualify that last paragraph a "bit". I was unable to locate a copy of the master's thesis with the tantalizing title "Detection of Digital Information from Erased Magnetic Disks" by Venugopal Veeravalli. However, a brief visit to his web page shows that it was never published, that he has never published on this or a related topic (his field is security of mobile communications), and that his other work does not suggest familiarity with STM microscopes. So I am fairly sure he didn't design a machine to read under-data with an "unwrite" system call. In an email message to me Dr. Veeravalli said that his work was theoretical, and studied the possibility of using DC erase heads. [Since writing this paragraph the thesis has been posted. It is indeed theoretical, but it makes quantitative predictions about the possibility of recovering data with varying degrees of erasure. There isn't any suggestion that ordinary erase procedures would be inadequate.]
Gutmann claims that "Intelligence organisations have a lot of expertise in recovering these palimpsestuous images." but there is no reference for that statement. There are 18 references in the paper, but none of the ones I was able to locate even referred to that possibility. Subsequent articles by diverse authors do make that claim, but only cite Gutmann, so they do not constitute additional evidence for his claim.
Gutmann mentions that, after a simple setup of the MFM device, bits start flowing within minutes. This may be true, but the bits he refers to are not from disk files; they are pixels in the pictures of the disk surface. Charles Sobey has posted an informative paper, "Recovering Unrecoverable Data", with some quantitative information on this point. He suggests that it would take more than a year to scan a single platter with recent MFM technology, and that tens of terabytes of image data would have to be processed.
In one section of the paper Gutmann suggests overwriting with 4 passes of random data. That is apparently because he anticipates using pseudo-random data that would be known to the investigator. A single write is sufficient if the overwrite is truly random, even given an STM microscope with far greater powers than those in the references. In fact, data written to the disk prior to the data whose recovery is sought will interfere with recovery just as much as data written after; the STM microscope can't tell the order in which magnetic moments are created. It isn't like ink, where later applications are physically on top of earlier markings.
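For readers who want to see what a single pass of unpredictable data looks like, here is a minimal sketch in Python, assuming a file-level overwrite is what is wanted. `overwrite_file_once` is a hypothetical helper name. `os.urandom` draws from the operating system's cryptographic random source, whereas a generator such as `random.Random(seed)` reproduces the same bytes for anyone who knows the seed. Overwriting a file in place does not reach copies left in swap, journals or remapped blocks, and on copy-on-write filesystems and SSDs the new data may land on different physical blocks entirely.

```python
import os

CHUNK = 1 << 20  # write in 1 MiB pieces to bound memory use


def overwrite_file_once(path: str) -> int:
    """Overwrite every byte of an existing file in place with os.urandom data.

    Returns the number of bytes written. This only affects the blocks the
    filesystem currently maps to the file; it is not a whole-disk wipe.
    """
    size = os.path.getsize(path)
    written = 0
    with open(path, "r+b") as f:  # "r+b" opens at offset 0 without truncating
        while written < size:
            n = min(CHUNK, size - written)
            f.write(os.urandom(n))
            written += n
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to the device
    return written
```

The design choice here is one pass of unpredictable bytes rather than several passes, which is the point argued above; whether the device itself honors the write location is outside what a program at this level can verify.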
After posting this information to a local mailing list, I received a reply suggesting that the recovery of overwritten data was an industry, and that a search on Google for "recover overwritten data" would turn up a number of firms offering this service commercially. Indeed it does turn up many firms, but all but one are quite explicit that they can recover "overwritten files", which is quite a different matter. An overwritten file is one whose name has been overwritten, not its sectors. Likewise, partitioning, formatting, and "Ghosting" typically affect only a small portion of the physical disk, leaving plenty of potential for sector reads to reveal otherwise hidden data. There is no implication in the marketing material that these firms can read physically overwritten sectors. The one exception I found (Dataclinic in the UK) did not respond to an email enquiry, and they do not mention any STM facility on their web site.
A letter from an Australian homicide investigator confirms my view that even police agencies have no access to the technology Gutmann describes.
Of course, it has been several years since Gutmann published. Perhaps microscopes have gotten better? Yes, but data densities have gotten higher too. An hour on the web this month looking at STM sites failed to turn up a single laboratory claiming an ability to read overwritten data.
Recently I was sent a fascinating piece by Wright, Kleiman and Sundhar (2008) who show actual data on the accuracy of recovered image data. While the images include some information about underlying bits, the error rate is so high that it is difficult to imagine any use for the result. While the occasional word might be recovered out of thousands, the vast majority of apparently recovered words would be spurious.
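Why a modest per-bit success rate yields almost nothing readable comes down to compounding. A minimal sketch, assuming each bit is recovered correctly and independently with the same probability p (a simplification; the p values in the example are illustrative, not figures from Wright et al.): the chance of getting every bit of an n-character ASCII word right is p raised to the power 8n. The function name is hypothetical.

```python
def word_recovery_probability(bit_accuracy: float, n_chars: int, bits_per_char: int = 8) -> float:
    """Probability that all bits of an n_chars-long word are read correctly.

    Assumes every bit is recovered independently with the same accuracy.
    """
    if not 0.0 <= bit_accuracy <= 1.0:
        raise ValueError("bit_accuracy must be between 0 and 1")
    return bit_accuracy ** (n_chars * bits_per_char)


if __name__ == "__main__":
    # Illustrative per-bit accuracies for a five-letter word (40 bits).
    for p in (0.6, 0.9, 0.99):
        print(f"p={p}: {word_recovery_probability(p, 5):.3g}")
```

Under these assumptions, even 99% per-bit accuracy gets roughly a third of five-letter words wrong, and at 90% only about one in seventy comes out intact.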
Another fact to ponder is the failure of anyone to read the "18 minute gap" Rose Mary Woods created on the tape of Nixon discussing the Watergate break-in. Even though the data density of a 1960s-era analog recorder was roughly one millionth that of current drive technology, and audio recovery would not require a high degree of accuracy, not one phoneme has been recovered.
The requirement of military forces and intelligence agencies that disk drives with confidential information be destroyed rather than erased is sometimes offered as evidence that these agencies can read overwritten data. I expect the real explanation is far more prosaic. The technician tasked with discarding a hard drive may or may not have enough computer knowledge to know whether a command such as "dd if=/dev/urandom of=/dev/sda2" has covered an entire disk with random data or only one partition (on Linux, /dev/sda2 is the second partition of the disk /dev/sda), nor is it easy to confirm that it was done. How would you confirm that the overwrite was not pseudo-random? Smashing the drive with a sledgehammer is easy to do, easy to confirm, and very hard to get wrong. The GPL'ed package DBAN is an apparent attempt to address this uncertainty without destroying hardware. Hardware appliances with similar aims include the "Drive Erazer" and the "Digital Shredder".
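One reason confirmation is hard can be sketched in a few lines of Python (3.9 or later, for `random.Random.randbytes`). The hypothetical `byte_entropy` helper measures how evenly byte values are spread in a sample, in bits per byte. A region overwritten with zeros scores 0 and is easy to verify, but data from `os.urandom` and from a seeded, fully predictable generator both score close to the maximum of 8, so a check of this kind cannot tell whether an overwrite was truly random.

```python
import math
import os
import random
from collections import Counter


def byte_entropy(data: bytes) -> float:
    """Shannon entropy of the byte-value histogram, in bits per byte (0 to 8)."""
    if not data:
        return 0.0
    n = len(data)
    return -sum(c / n * math.log2(c / n) for c in Counter(data).values())


if __name__ == "__main__":
    size = 1 << 20
    samples = {
        "zeros": bytes(size),
        "os.urandom": os.urandom(size),
        "seeded PRNG (predictable)": random.Random(1234).randbytes(size),
    }
    for label, block in samples.items():
        print(f"{label}: {byte_entropy(block):.4f} bits/byte")
```

In practice a tool would sample blocks read from the device; the point of the sketch is only that the two random-looking sources are statistically indistinguishable by this measure.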
Surveying all the references, I conclude that Gutmann's claim belongs in the category of urban legend.
Or it may be in the category of marketing hype. I note that it is being used to sell a software package called "The Annililator".
Since writing the above, I have noticed a comment attributed to Gutmann conceding that overwritten sectors on "modern" (post-2003?) drives cannot be read by the techniques outlined in the 1996 paper, but he does not withdraw the overwrought claims of the paper with respect to older drives.
An updated copy of this memo will be kept at http://www.nber.org/sys-admin/overwritten-data-gutmann.html. Additional information may be sent to feenberg at nber dot org.
"Magnetic force microscopy: General principles and application to longitudinal recording media", D. Rugar, H. Mamin, P. Guenther, S. Lambert, J. Stern, I. McFadyen, and T. Yogi, Journal of Applied Physics, Vol. 68, No. 3 (August 1990), p. 1169.
"Magnetic Force Scanning Tunnelling Microscope Imaging of Overwritten Data", Romel Gomez, Amr Adly, Isaak Mayergoyz, and Edward Burke, IEEE Trans. on Magnetics, Vol. 28, No. 5 (September 1992), p. 3141.
"Overwriting Hard Drive Data: The Great Wiping Controversy", C. Wright, D. Kleiman, and S. R. S. Sundhar, ICISS 2008 (2008), pp. 243-257. Available at http://portal.acm.org/citation.cfm?id=1496285 or http://www.vidarholen.net/~vidar/overwriting_hard_drive_data.pdf . See also a summary at http://sansforensics.wordpress.com/2009/01/15/overwriting-hard-drive-data/