Directories are an integral part of a filesystem; all current systems have moved to the hierarchical model of directory trees.

Abstractly, a UNIX directory is a file that contains a table which maps filenames to inodes, as in:

    filename -----> inode      (This info is not in the directory!)
    _________       _____      (Where is it?)
    .               1251       // required inode for this dir
    ..              531        // required inode for parent
    README          2014       // text file
    a.c             4000       // text file
    a.o             4001       // object file
    a               4007       // executable file
    R               2014       // same text file as README
    DIR             3011       // directory

The following restrictions are enforced by the operating system:

    1) All inode numbers in a directory refer to inodes in the same partition as the directory.
    2) Every directory has an entry ".", which maps to the directory's inode.
    3) Every directory has an entry "..", which maps to the directory's parent's inode.
    4) The partition will remain a tree (no loops and only one root).
    5) Lexical restrictions on filenames (character set, length of name, ...).
    6) A normal user can only use "directory safe" system calls to modify a directory.

The old (1970s) v7 UNIX directory simply had 16 byte records:

    Bytes 0-1     The i-node number (0 - 2^16-1)
    Bytes 2-15    A null padded filename (for 14 character names no null was stored)

This limited a filesystem to at most 64K files, which at the time was sufficient.

Long ago UNIX filesystems removed these restrictions. Filenames can typically be 255 characters (or longer), and a filesystem can have more i-nodes than you can shake a stick at. This does require a more complicated structure, with variable length fields and some alignment restrictions for efficiency.

On systems that allow a directory to be read as a file, you can explore the local directory structure with the od(1) utility. This may not work on remote file systems.

To provide an API for portable directory programming that is independent of the filesystem type, and that also works for remote directories, the opendir(3c), closedir(3c), readdir(3c), and rewinddir(3c) functions were added to the standard C library.
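As a rough illustration of that API, the sketch below prints the filename-to-inode table of a directory, one entry per line. The function name list_directory is made up for this example, and it assumes a POSIX system where struct dirent has the d_ino member. For brevity it does not tell a read error apart from the end of the directory.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>

/*
 * Print one "name  inode" line per entry of dir to out.
 * Returns the number of entries printed, or -1 if dir cannot be opened.
 * (list_directory is a hypothetical name used only in this sketch.)
 */
int list_directory(const char *dir, FILE *out)
{
    DIR *dirp;
    struct dirent *dp;
    int count = 0;

    assert(dir != NULL && out != NULL);

    dirp = opendir(dir);
    if (dirp == NULL)
        return -1;

    /* readdir(3c) returns NULL at the end of the directory (or on error). */
    while ((dp = readdir(dirp)) != NULL) {
        fprintf(out, "%-20s %lu\n", dp->d_name, (unsigned long)dp->d_ino);
        count++;
    }

    closedir(dirp);
    return count;
}
```

Calling list_directory(".", stdout) should print "." and ".." along with the other entries, in whatever order the filesystem hands them back.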
Although opendir(3c), closedir(3c), and readdir(3c) "sound like" the open(2), close(2), and read(2) system calls, they are not simple system calls. They act more like the stream I/O functions fopen(3c), fclose(3c), and fread(3c). They will buffer directory entries between readdir(3c) calls.

EXAMPLE

    /* check for file name in directory dir */
    #include <dirent.h>
    #include <errno.h>
    #include <string.h>

    /* These should be in a header for this function */
    #define FOUND      0
    #define NOT_FOUND  1
    #define READ_ERROR 2
    #define OPEN_ERROR 3

    int check_for_file(const char *name, const char *dir)
    {
        DIR *dirp;            /* stream for directory dir */
        struct dirent *dp;    /* current directory entry */
        int status;

        dirp = opendir(dir);
        if (dirp == NULL)
            return OPEN_ERROR;

        errno = 0;            /* so end of directory and error can be told apart */
        while ((dp = readdir(dirp)) != NULL) {
            if (strcmp(dp->d_name, name) == 0) {
                closedir(dirp);
                return FOUND;
            }
            errno = 0;
        }

        /* readdir returned NULL: errno is still 0 at end of directory */
        status = (errno == 0) ? NOT_FOUND : READ_ERROR;
        closedir(dirp);
        return status;
    }

This does point out one weakness in typical UNIX filesystems. The only way to find out if a name is in a directory is to do a linear search of the directory, walking through all entries with readdir(3c) in a loop. For directories with a very large number of entries, this can be very slow. The fact that long filenames are handled as strings is also of some concern. For the best results, limit file names to reasonable lengths (about 14 characters is fine) and do not create directories with a large number (thousands) of entries. There are experimental UNIX filesystems and APIs that try to remove these restrictions without losing performance.

For applications that need large database operations, use a database, rather than overloading the filesystem.

< mention database filesystems (Oracle) and database files such as with MySQL, or making your own disk file structure within a file or a couple of files, e.g. as in the standard C library ndbm: dbm_clearerr, dbm_close, dbm_delete, dbm_error, dbm_fetch, dbm_firstkey, dbm_nextkey, dbm_open, dbm_store - database functions >

Notes:

Consider the inode for the root "/" of a filesystem.
A directory is usually the same as a regular file, except that a normal user cannot create it with open(2) or creat(2), nor can it be modified with write(2). A normal user must use the system calls mkdir(2), rmdir(2), ...

The system calls fd = open("dir", O_RDONLY) and read(fd, buf, n) "may", or may not, work on an existing directory "dir".

Consider building a hashtable of all the names of the files in a directory, so that later you can determine if a file exists in the directory without searching it (as common shells do with PATH).

    1) Can you also save the i-node number of the file, so the system does not have to look it up if you want to cd to it or execute it after finding it?
    2) What happens if you change a file in your already hashed directory?
    3) What happens if you add or remove a file in it?

Exercise: List all possible limitations a UNIX filesystem might have.

Exercise: How can you find the limitations a UNIX filesystem has?

Exercise: Build the hashtable function mentioned above. Compare its speed to doing a linear search of directories.

Exercise: Use ndbm to create a large lookup table, and compare its performance to that of a linear search.

Exercise: Write a program "ls-l" that works like "ls -l", using the readdir function and stat calls in a loop.
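As a possible starting point for the hashtable exercise, here is a minimal sketch of one way to do it, not the only one. All names in it (name_table, table_build, table_lookup, table_free, NBUCKETS) are made up for this example, and the bucket count and hash function are arbitrary choices. It saves d_ino alongside each name, and it is only a snapshot: it does nothing about files added, removed, or changed after the table is built.

```c
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

#define NBUCKETS 1024              /* arbitrary size chosen for this sketch */

struct name_node {
    struct name_node *next;        /* next entry in the same bucket */
    ino_t ino;                     /* inode number saved from d_ino */
    char *name;                    /* private copy of d_name */
};

struct name_table {
    struct name_node *bucket[NBUCKETS];
};

/* Simple multiplicative string hash; any reasonable hash would do. */
static unsigned long hash_name(const char *s)
{
    unsigned long h = 5381;

    while (*s != '\0')
        h = h * 33 + (unsigned char)*s++;
    return h % NBUCKETS;
}

/* Free every node in the table, leaving all buckets empty. */
void table_free(struct name_table *t)
{
    for (int i = 0; i < NBUCKETS; i++) {
        while (t->bucket[i] != NULL) {
            struct name_node *n = t->bucket[i];
            t->bucket[i] = n->next;
            free(n->name);
            free(n);
        }
    }
}

/* Read every entry of dir into t. Returns 0 on success, -1 on failure. */
int table_build(struct name_table *t, const char *dir)
{
    DIR *dirp;
    struct dirent *dp;

    assert(t != NULL && dir != NULL);
    for (int i = 0; i < NBUCKETS; i++)
        t->bucket[i] = NULL;

    if ((dirp = opendir(dir)) == NULL)
        return -1;

    while ((dp = readdir(dirp)) != NULL) {
        size_t len = strlen(dp->d_name) + 1;
        struct name_node *n = malloc(sizeof *n);

        if (n == NULL || (n->name = malloc(len)) == NULL) {
            free(n);
            closedir(dirp);
            table_free(t);
            return -1;
        }
        memcpy(n->name, dp->d_name, len);
        n->ino = dp->d_ino;

        /* Push the new node onto the front of its bucket's chain. */
        unsigned long h = hash_name(n->name);
        n->next = t->bucket[h];
        t->bucket[h] = n;
    }
    closedir(dirp);
    return 0;
}

/*
 * Return 1 if name is in t, else 0. When name is found and ino is not
 * NULL, the saved inode number is stored in *ino.
 */
int table_lookup(const struct name_table *t, const char *name, ino_t *ino)
{
    assert(t != NULL && name != NULL);

    for (const struct name_node *n = t->bucket[hash_name(name)];
         n != NULL; n = n->next) {
        if (strcmp(n->name, name) == 0) {
            if (ino != NULL)
                *ino = n->ino;
            return 1;
        }
    }
    return 0;
}
```

A caller would run table_build(&t, dir) once, then call table_lookup(&t, name, &ino) as often as needed, and finish with table_free(&t). Timing that against repeated linear searches, and deciding when the table must be rebuilt, is left to the exercise.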