...making Linux just a little more fun!

Recovering FAT Directory Stubs with SleuthKit

By Silas Brown

When I accidentally dropped an old Windows Mobile PocketPC onto the floor at the exact moment it was writing to a memory card, the memory card's master FAT was corrupted and several directories disappeared from the root directory. Since it had not been backed up for some time, I connected the memory card to a Linux system for investigation. (At this point it is important not to actually mount the card. If you have an automounter, turn it off before connecting. You have to access it as a device, for example /dev/sdb1. To see which device it is, you can do ls /dev/sd* both before and after connecting it and see what appears. The following tools read from the device directly, or from an image of it copied using the dd command.)
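For instance, a quick way to spot the new device node from Python (a minimal sketch; your card may well appear under a name other than /dev/sdb1):

import glob
# Run this once before the card is plugged in and once afterwards; whatever
# appears in the second listing but not the first is the card's device node
# (which must NOT be mounted).
print sorted(glob.glob("/dev/sd*"))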

fsck.vfat -r offered to return over 1 GB of data to the free space. This meant there was over 1 GB of data that was marked as in use but no longer referenced by any directory entry, i.e. it belonged to the lost directories. It was important not to let fsck.vfat mark it as free yet, as doing so would impair data recovery. fsck.vfat can also reclaim these orphaned cluster chains as files, but that would not have been ideal either, because the directory structure and all the filenames would have been lost, leaving thousands of meaninglessly-named files to be sorted out by hand (some of which might have ended up split across more than one reclaimed file).

The directory structure was in fact still there; only the entries for the top-level directories had been lost. The listings of their subdirectories (including all filenames etc) were still present somewhere on the disk, but the root directory was no longer saying where to find them. A few Google searches seemed to suggest that the orphaned directory listings are commonly known as "directory stubs" and I needed a program that could search the disk for lost directory stubs and restore them. Unfortunately such programs were not very common. The only one I found was a commercial one called CnW Recovery, but that requires administrator access to a Windows PC (which I do not have).

SleuthKit to the rescue

A useful toolkit is SleuthKit, available as a package on most distributions (apt-get install sleuthkit) or from sleuthkit.org. SleuthKit consists of several commands, the most useful of which here are fls and icat. The fls command takes an image file (or device) and an inode number, and attempts to display the directory listing stored at that inode number (if there is one). This listing shows files and other directories, along with the inode numbers where they are stored. The icat command also takes an image file and an inode number, and prints out the contents of the file stored at that inode number, if there is one. Hence, if you know the inode number of the root directory, you can chase through a normal filesystem with fls commands (following the inode numbers of subdirectories and so on) and use icat to access the files within them. fls also lists deleted entries (marked with a *) as long as those entries are still present on the disk. (Incidentally, these tools also work on several other filesystems besides FAT, and they make them all look the same.)
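For example, here is a rough sketch of that chasing process, using the same approach as the recovery script later in this article (the device name and the inode numbers are placeholders only):

import commands, os

image = "/dev/sdb1"   # placeholder: the card's device, or a dd image of it

# Show the directory listing stored at inode 2 (the example root inode
# mentioned below); each line of output gives an entry's type, its inode
# number and its name.
print commands.getoutput("fls -f fat " + image + " 2")

# After reading an inode number off that listing, run fls on it again if it
# is a subdirectory, or dump it with icat if it is a file (12345 is just a
# placeholder inode number).
os.system("icat -f fat " + image + " 12345 > /tmp/example-file")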

The range of valid inode numbers can be obtained using SleuthKit's fsstat command. This tells you the root inode number (for example 2) and the maximum possible inode number (for example 60000000). fsstat also prints quite a lot of other information, so you may want to pipe its output through a pager such as more or less (for example, fsstat /dev/sdb1 | less) in order to catch the inode range near the beginning of the output.
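If you would rather pull those numbers out programmatically, something like this sketch may do (it assumes fsstat reports the inode range on a line containing the word "Range"; if that assumption does not hold for your version, just page through the full output as described above):

import commands
for line in commands.getoutput("fsstat /dev/sdb1").split("\n"):
    if "Range" in line: print line   # "Range" is an assumed label; adjust if needed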

Scanning for directory stubs

Because the root directory's entries for the lost directories had been destroyed, using fls on it did not reveal their inode locations. Therefore it was necessary to scan through all possible inode numbers in order to find them. This is a lot of work to do manually, so I wrote a Python script to call the necessary fls commands automatically. First it checks the root directory and all its subdirectories for the locations of "known" files and directories, and then it scans all the other inodes to see if any of them contain directory listings that are not already known about. If it finds a lost directory listing, it tries to recover all the files and subdirectories in it with their correct names, although it cannot recover the name of the top-level directory it finds.

Sometimes the script finds data that just happens to pass the validity check for a directory listing but isn't one, which results in a "recovered" directory full of junk. But often it correctly recovers a lost directory.

image = "/dev/sdb1"
recover_to = "/tmp/recovered"

import os, commands, sys

def is_valid_directory(inode):
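    # Treat the inode as a valid directory if fls exits successfully and
    # produces some output for it.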
    exitstatus,outtext = commands.getstatusoutput("fls -f fat "+image+" "+str(inode)+" 2>/dev/null")
    return (not exitstatus) and outtext
def get_ls(inode): return commands.getoutput("fls -f fat "+image+" "+str(inode))

def scanFilesystem(inode, inode2path_dic, pathSoFar=""):
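    # Parse the fls listing at 'inode', recording inode -> path in
    # inode2path_dic for every file and subdirectory found, and recursing
    # into each subdirectory.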
    if pathSoFar: print "   Scanning",pathSoFar
    for line in get_ls(inode).split('\n'):
        if not line: continue
        try: theType,theInode,theName = line.split(None,2)
        except: continue # perhaps no name (false positive inode?) - skip
        if theInode=='*': continue # deleted entry (theName starts with inode) - ignore
        assert theInode.endswith(":") ; theInode = theInode[:-1]
        if theType=="d/d": # a directory
            if theInode in inode2path_dic: continue # duplicate ??
            inode2path_dic[theInode] = pathSoFar+theName+"/"
            scanFilesystem(theInode, inode2path_dic, pathSoFar+theName+"/")
        elif theType=="r/r": inode2path_dic[theInode] = pathSoFar+theName

print "Finding the root directory"
i=0
while not is_valid_directory(i): i += 1

print "Scanning the root directory"
root_inode2path = {}
scanFilesystem(i,root_inode2path)

print "Looking for lost directory stubs"

recovered_directories = {}
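# 60000000 stands in for the maximum inode number; substitute the value that
# fsstat reports for your filesystem (or a narrower range if you know roughly
# where the lost directories are).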
while i < 60000000:
    i += 1
    if str(i) in root_inode2path or str(i) in recovered_directories: continue # already known (keys are strings taken from fls output)
    sys.stdout.write("Checking "+str(i)+"   \r") ; sys.stdout.flush()
    if not is_valid_directory(i): continue
    inode2path = root_inode2path.copy()
    scanFilesystem(i,inode2path)
    for n in inode2path.keys():
        if n in root_inode2path: continue # already known
        p = recover_to+"/lostdir-"+str(i)+"/"+inode2path[n]
        if p[-1]=="/": # a directory
            recovered_directories[n] = True
            continue 
        print "Recovering",p
        os.system('mkdir -p "'+p[:p.rindex("/")]+'"')
        os.system("icat -f fat "+image+" "+str(n)+' > "'+p+'"')

Note that the script might find a subdirectory before it finds its parent directory. For example, if a lost directory A has a subdirectory B, the script may find B first and recover it; later, when it finds A, it will recover A and re-recover B, placing the new copy of B inside the recovered A, so you end up with A, a standalone B, and A/B. You have to decide manually which of the recovered directories are actually the top-level ones. The script does not bother to check for .. entries pointing to a directory's parent, because these were not accurate on the FAT storage card I had (they may be more useful on other filesystems).

If you want, you can modify the script to first run the inode scan to completion without recovering anything, and then analyse the candidate top-level directories it found, discarding any that are also contained within others. However, running the scan to completion is likely to take far longer than looking at the directories by hand.
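A rough sketch of that analysis step (hypothetical: it assumes the scan loop has been changed to record, for each candidate top-level inode, the set of inode numbers seen anywhere inside its listing, instead of recovering files straight away):

def top_level_candidates(found):
    # 'found' maps each candidate top-level inode to the set of inode
    # numbers that appeared anywhere inside its listing.  A candidate is
    # only genuinely top-level if no other candidate's set contains it.
    return [i for i in found
            if not any(i in members
                       for j, members in found.items() if j != i)]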

As it is, you can interrupt the script once it has recovered what you want. If Control-C does not work, use Control-Z to suspend it and then kill %1 (or whatever job number bash reported when you suspended the script).

This script can take several days to run through a large storage card. You can speed it up by using dd to take an image of the card to the hard disk (which likely has faster access times than a card reader); you can also narrow the range of inodes that are scanned if you have some idea of the approximate inode numbers of the lost directories (you can get such an idea by using fls to check on directories that are still there and that were created in about the same period of time as the lost ones).
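For instance, the imaging step might look like this (a rough sketch; the device name, image path and block size are only examples, and the destination filesystem needs room for the whole card):

import os
os.system("dd if=/dev/sdb1 of=/tmp/card.img bs=1M")
# ...then set  image = "/tmp/card.img"  at the top of the recovery script.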

After all the directories have been recovered, you can run fsck.vfat -r and let it return the orphaned space back to the free space, and then mount the card and copy the recovered directory structures back onto it.

Some GNU/Linux live CDs have a forensic mode that doesn't touch local storage media. For example, if you boot the GRML live CD you can select "forensic mode" and safely inspect attached hard disks or other media. AFAIK Knoppix has a similar option. -- René


Silas Brown is a legally blind computer scientist based in Cambridge, UK. He has been using heavily-customised versions of Debian Linux since 1999.


Copyright © 2010, Silas Brown. Released under the Open Publication License unless otherwise noted in the body of the article. Linux Gazette is not produced, sponsored, or endorsed by its prior host, SSC, Inc.

Published in Issue 172 of Linux Gazette, March 2010
