Especially today I have much to be thankful for.
For starters, we had Thanksgiving dinner tonight. My lovely wife cooked a 2 (two) pound turkey, stuffing, squash, her patented garlicky mashed potatoes and the ever-lovely berry pie. Of the turkey we have exactly one breast left. Of the mashed potatoes we have a multitude, but I have a feeling they won’t last long.
The kids were delightful today. I don’t have prostate cancer. It’s movie night for Mommy and Daddy. And of course many other things too numerous to count.
Probably the best thing about today, or at least the most unexpected good thing, was that I was able to help a fellow student. I had offered to help her with some LaTeX problems she was having. There were tons of errors in the file she sent me, and I couldn’t help thinking to myself that she wasn’t very good at LaTeX. As proof that you shouldn’t judge (and should always back up), I found out that her hard drive had crashed and the specialist had only been able to recover the PDF files (he didn’t speak English, nor she Hungarian, so perhaps more could have been done, but it’s too late for that now). Her earlier LaTeX files were all missing. So she had copied and pasted all the text into a new file and was going through it trying to make it all work again. She said she wasn’t too worried about the text (since she could copy and paste it), but she had some large derivation diagrams or something like that. She’s studying philosophy and doing her thesis in the logic of temporal what-sa-ma-hu-sit.
On a lark I asked if she still had the recovered files, and she said yes. I did a quick grep and noticed that some files, mostly “jpeg”s and “mp4”s, contained at least parts of her old LaTeX files. So I collected all the files that looked promising and ran strings on them to gather all the text into one big file. It was still working on that when I had to leave (there were about 50,000 files with LaTeX-looking parts, and the output was over 1 GB when I left). Of course the majority of that file is junk. I then gave her some instructions on how to proceed. First, run split -p to break it into smaller files, each containing (hopefully) either all junk or all useful information. Then use some bash and grep to delete all the junk files, leaving only her old LaTeX files. Hopefully there won’t be too many duplicates. If it had finished before I left, I might have tried to use diff to eliminate duplicates.
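The duplicate-removal idea can be sketched roughly like this. Note that it swaps diff for checksums: cksum fingerprints every chunk, and only byte-identical files are removed (cmp confirms each pair before anything is deleted). Near-duplicates, which diff would reveal, are left alone. The function name dedupe_chunks is hypothetical, and the sketch assumes neither the directory path nor the file names contain whitespace (true for split’s possible.aaa… style names). Untested against a real 1 GB recovery, so treat it as a starting point.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete byte-identical duplicate files in a directory,
# keeping the copy whose name sorts first.
# Assumes no whitespace in the directory path or file names.
dedupe_chunks() {
    local dir=$1
    # find -exec ... + avoids "argument list too long" with tens of thousands
    # of files. cksum prints: CRC SIZE NAME
    find "$dir" -type f -exec cksum {} + |
        sort -k1,1n -k2,2n -k3,3 |
        awk '{
                key = $1 " " $2              # CRC plus byte count
                if (key in first) print first[key], $3
                else first[key] = $3         # first file seen with this key
             }' |
        while read -r keep dup; do
            # A matching checksum is only a hint; cmp confirms identical bytes.
            if cmp -s "$keep" "$dup"; then
                rm -- "$dup"
            fi
        done
}
```

Usage would be something like dedupe_chunks ~/recovered, run once the split pieces exist.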
Just in case your hard drive ever crashes and your LaTeX files weren’t recovered properly, here’s what to do. First open a terminal. If you don’t know how to do this, please do not follow the instructions here. Get your nephew to help you or something. :-) Please note that I am typing these from memory and haven’t actually tested them. If you don’t know what you are doing, please simply read for your own enlightenment.
    cd /path/to/where/recovered/files/are
    grep -lr -e 'usepackage' * > ~/interestingfiles.txt
This looks for files that contain \usepackage, which all but the simplest LaTeX files have. It saves their names in a file for later.
    export IFS=$'\n'
    for f in $(cat ~/interestingfiles.txt); do strings "$f" >> ~/strings.txt; done
This runs strings on each of the interesting files. Note that we need to set the Internal Field Separator to just a newline; otherwise file names with spaces will cause problems.
    mkdir ~/recovered
    split -a 100 -p '(begin|end)\{document\}' ~/strings.txt ~/recovered/possible.
Here we split the file into smaller files at \begin{document} or \end{document}. Since this is how LaTeX files begin and end, we are hopefully separating the wheat from the chaff. Note that we need to set the suffix length (-a) big enough to accommodate the potentially thousands of output files. 100 is almost certainly overkill, but hey, who cares.
    for f in ~/recovered/possible.*; do grep -qE 'usepackage' "$f" || rm -i "$f"; done
We delete any of the newly split files that don’t have \usepackage in them. They are almost certainly full of junk.
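The -p option used in the split step exists in BSD split (as on macOS) but not in GNU split, which most Linux systems ship. Where it is missing, a small awk function can stand in for it. This is a sketch under assumptions: the name split_on_pattern is hypothetical, the pieces get a zero-padded numeric suffix (possible.000000, possible.000001, …) rather than split’s letters, and, as with split -p, each line matching the pattern becomes the first line of a new piece.

```shell
#!/usr/bin/env bash
# Hypothetical stand-in for BSD `split -p PATTERN INPUT PREFIX`.
# Starts a new output file at every line matching the extended regex PATTERN.
split_on_pattern() {
    local pattern=$1 input=$2 prefix=$3
    # Pass values through the environment so awk does not reinterpret
    # backslash escapes the way it would with -v.
    PAT="$pattern" PREFIX="$prefix" awk '
        BEGIN { n = 0; out = sprintf("%s%06d", ENVIRON["PREFIX"], n) }
        NR > 1 && $0 ~ ENVIRON["PAT"] {
            close(out)      # keep only one output file open at a time
            n++
            out = sprintf("%s%06d", ENVIRON["PREFIX"], n)
        }
        { print > out }
    ' "$input"
}
```

For the LaTeX case that would be split_on_pattern '(begin|end)[{]document[}]' ~/strings.txt ~/recovered/possible. where the bracket expressions keep the braces literal in every awk implementation.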
Note that the last step is potentially destructive so I added -i to the rm command, which will prompt you before deleting any of the thousands of files and hence be very annoying. After you have verified that everything is working properly you may remove the -i. Of course recovering other types of files will require changing the regular expressions used. That is left as an exercise to the reader, or at least it’s left until you actually have to do it.
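One way around both the annoying prompts and the risk of deleting something useful is to move non-matching pieces into a junk directory instead of removing them, with the pattern passed in as a parameter so other file types only need a different regex. A minimal sketch, assuming a hypothetical function name quarantine_chunks and a junk directory of your choosing:

```shell
#!/usr/bin/env bash
# Hypothetical, non-destructive version of the clean-up step.
# Files that do NOT match KEEP_PATTERN (an extended regex) are moved into
# JUNK_DIR, so a bad pattern can be undone with a single mv.
quarantine_chunks() {
    local keep_pattern=$1 junk_dir=$2
    shift 2
    mkdir -p "$junk_dir"
    local f
    for f in "$@"; do
        # -q: only grep's exit status matters, not the matching lines.
        if ! grep -qE -- "$keep_pattern" "$f"; then
            mv -- "$f" "$junk_dir"/
        fi
    done
}
```

For the LaTeX case: quarantine_chunks 'usepackage' ~/recovered-junk ~/recovered/possible.* (a shell function is not subject to the “argument list too long” limit). For, say, BibTeX files a pattern such as '^@[A-Za-z]+[{]' might be a starting point.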
It’s nice to be able to help someone who has really had a bad accident, and it’s nice to know that my computer skills aren’t completely useless now that I no longer work for Omniture.
