We found a ticking data time bomb. Broken bits can linger at a recently written value for seconds before corrupting. Certain writes initially appear successful, which could easily confuse a write verification algorithm.
Follow us below for the Flash Destroyer wrap-up. We’ll also look at the live stream viewer statistics: nearly 5.5 viewer-years were spent watching flash destruction in the first week alone.
You can preorder the Flash Destroyer ‘I like to solder’ kit for $30, including worldwide shipping.
Exact final count
Since the final EEPROM write-verify cycle count was greater than 10 million, the least significant digit (1s) was dropped from the display. The exact final count was a mystery until we read the saved value from the PIC. We used the USB bootloader to dump the PIC to a HEX file, then loaded the HEX in MPLAB.
The image shows the final saved count: 0x00af62b5, or 11,494,069 in decimal. 0x78 is a checksum byte the Flash Destroyer uses to verify that the saved count is valid.
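The conversion is easy to check by hand or in a few lines of Python. The sketch below assumes the four count bytes are stored in the order they are written above (00 af 62 b5). The firmware's actual checksum scheme isn't described here, so the XOR is only a guess: XORing the four count bytes happens to give 0x78, which is consistent with a simple XOR checksum. The function name is hypothetical.

```python
def decode_saved_count(raw):
    """Return (count, xor_of_bytes) for a count stored most significant byte first.

    The byte order and the XOR checksum are assumptions made for illustration,
    not a description of the Flash Destroyer firmware.
    """
    count = int.from_bytes(raw, "big")
    xor_sum = 0
    for b in raw:
        xor_sum ^= b  # running XOR over every count byte
    return count, xor_sum


count, xor_sum = decode_saved_count(bytes([0x00, 0xAF, 0x62, 0xB5]))
print(f"{count:,}")   # 11,494,069
print(hex(xor_sum))   # 0x78
```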
We removed the EEPROM from the Flash Destroyer and used the Bus Pirate to read the contents of the chip. Much to our surprise, all the values were correct and none appeared stuck. This was a bit disappointing, as we expected to see a nice clean failure with one bit or byte permanently stuck. We used the Bus Pirate to write 0xaa to the whole chip and verified it manually. The write was completely successful.
It keeps working
To get a better understanding of the error, we modified the Flash Destroyer firmware to display the location of the verify error, the value it wrote, and the value it got back. We put the EEPROM back in the Flash Destroyer and started the write-verify process.
The first run reached about 17,000 write-verify cycles before encountering another error. The error occurred at byte 23: the Flash Destroyer wrote 0x55 (01010101) and read 0x45 (01000101). Bit 4 (counting from bit 0 on the right) read 0 when it should have read 1. We dumped the chip with the Bus Pirate, but again the values were correct and 0x45 was nowhere to be found.
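A quick way to pick out the bad bit is to XOR the written and read values: any bit set in the result differs between the two. This is just a sketch for checking the arithmetic, and the helper name is our own invention rather than anything in the firmware.

```python
def flipped_bits(written, read):
    """List (bit, expected, actual) for each bit that differs in one byte."""
    diff = written ^ read  # a 1 marks every position where the bytes disagree
    return [
        (bit, (written >> bit) & 1, (read >> bit) & 1)
        for bit in range(8)
        if (diff >> bit) & 1
    ]


print(f"{0x55:08b} written")
print(f"{0x45:08b} read")
print(flipped_bits(0x55, 0x45))  # [(4, 1, 0)]
```

Bit 0 is the least significant (rightmost) bit, so the result says bit 4 was written as 1 and read back as 0.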
We tried six more times. Each attempt had between 10 and 100,000 successful writes before an error was detected. Errors always occurred at bit 4 of byte 23 after writing 0x55, but the error wasn’t there when we dumped the chip with the Bus Pirate.
After about 200,000 additional writes we finally got a catastrophic failure. The Flash Destroyer detected an error on every write, logging 0 successful writes.
Despite the catastrophic failure, we still didn’t see any errors when we dumped the EEPROM with the Bus Pirate. The dump still showed all correct 0x55 values.
The EEPROM was full of 0x55, so we used the Bus Pirate to fill it with 0xaa, and then dumped it to verify. The first read showed the correct values, but further reads returned the output shown above.
Once again byte 23 bit 4 had a problem: we wrote 0xaa (10101010) and got 0xba (10111010) back instead. The bit stayed at 0 for a few seconds (or reads), which gave the correct value (0xaa). With time and/or wear it eventually flipped to 1 and we read 0xba instead. The error is consistent now and seems to happen every time we test it.
The ‘conventional wisdom’ is usually that solid state storage will stick at the last successfully written value when it dies. That does not appear to be the case in our test. Our failure seems to be intermittent, and is sometimes only detectable after a few seconds.
In our test, the damaged bit continued to work for over 200,000 writes after the initial error. It then entered a phase of continuous, but time-delayed, bit-flipping. This failure could be difficult to detect during normal use because the corruption happens only after a few seconds (or reads) have passed.
When the bad bit is written with a 1, it seems to linger at 0 for milliseconds before turning to 1. This is long enough to cause problems with the Flash Destroyer’s high-speed write-verify routine, but not the Bus Pirate’s slower interface.
When the bad bit is written with 0, it appears to work for a few seconds, but eventually drifts back to 1. This is a ticking data time bomb! Even if you verify every write, this error can’t be detected until a few seconds have elapsed.
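To show why an immediate verify misses this, here is a toy model of a single byte whose bit 4 drifts back to 1 some time after being written with 0. Everything in it is an assumption made for illustration: the class and function names are hypothetical, the 3 second drift time is a stand-in for "a few seconds", and time is simulated rather than measured. It also ignores the other symptom, the bit lingering at 0 for milliseconds after a 1 is written.

```python
class DriftingByte:
    """Toy model: one EEPROM byte with a worn bit that drifts back to 1."""

    def __init__(self, bad_bit=4, drift_after=3.0):
        self.bad_mask = 1 << bad_bit
        self.drift_after = drift_after  # seconds; assumed, not measured
        self.value = 0xFF
        self.written_at = 0.0

    def write(self, value, now):
        self.value = value
        self.written_at = now

    def read(self, now):
        # Once enough simulated time has passed, the worn bit reads as 1.
        if now - self.written_at >= self.drift_after:
            return self.value | self.bad_mask
        return self.value


def write_verify(cell, value, now, recheck_delay=0.0):
    """Write a value, then read it back after recheck_delay simulated seconds."""
    cell.write(value, now)
    return cell.read(now + recheck_delay) == value


cell = DriftingByte()
print(write_verify(cell, 0xAA, now=0.0))                      # True
print(write_verify(cell, 0xAA, now=10.0, recheck_delay=5.0))  # False
print(hex(cell.read(now=20.0)))                               # 0xba
```

In this model an immediate read-back always passes, and only a check made after the drift time catches the flip. Writing 0x55 never fails here because bit 4 is already 1 in that pattern.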
This is a single test of one EEPROM, and there are probably many different failure types. We can’t draw any firm conclusion from a single data point, but the randomness of our failure suggests that it’s better to design within the manufacturer’s stated limits than to depend on write-verification to catch errors.
ustream.tv live streaming performed very well, even during a slashdotting. At this point we only have access to the traffic statistics for the first week (May 25 to June 2); the second week report will be available in a few days. There were 98 thousand views in the first week, and nearly 5.5 man-years were expended watching the EEPROM destruction. Thank you for sharing your time with us!
The ustream broadcast originates from a Flash application in a web browser. We ran it in Firefox on an old laptop. There were two crashes during the 15-day live stream. It felt much more reliable than 6 months ago, when we used it for the @tweet_tree.
The webcam we used is capable of high-quality images, but it has a fixed focus that made the Flash Destroyer close-up look pretty fuzzy. We briefly swapped in another cam with adjustable focus, but it had really bad auto color balance that got confused by the Flash Destroyer LED display.
Taking it further
Thank you for following our Flash Destroyer antics. If you’d like to destroy some solid state storage of your own, you can get a Flash Destroyer kit for $30, including worldwide shipping.