Files and Disks
- the File Manager manages files, directories (collections of files), and
file systems (collections of directories)
- a file is a named collection of data stored on a device
- the file manager is responsible for maintaining the file abstraction as a way of hiding details of long-term storage
- in other words, disk drives don't really know what a file is
- consider what happens when a disk begins to get full and there's not enough contiguous space for a large file
- a suitcase won't fit in a trunk? let the file manager take care of it by sawing the suitcase into pieces and fitting each piece into one of the smaller, available spaces
- a disk read or write typically works with a group of bytes called a block (sometimes called a cluster)
- a file is really just a name associated with a collection of blocks on the device
- this implies there is a table of contents somewhere that allows the file manager to find the blocks given a file name
- a disk can be formatted to use relatively small blocks (e.g. 512 bytes) or large blocks (e.g. 8KB)
- in general devices can be categorized as block devices (e.g. hard drives, DVD drives) or character devices (e.g. keyboards)
- a disk is organized in tracks and sectors
- each disk drive typically has multiple platters, and a read/write head moves across the tracks of each platter surface
- the same track on all platters makes a cylinder
- consider the physical movements involved in retrieving data from a hard disk
- most drivers optimize by reordering pending requests to minimize head movement
- contrast these strategies (see handout for details): FCFS, SSTF, Scan/Look, Circular Scan/Look
- consider how blocks are allocated when saving a file
- some possible strategies: contiguous, linked list, indexed
- most systems use indexed allocation, which operates much like the page tables of a memory manager
- fixed-size blocks lead to internal fragmentation (the unused space at the end of a file's last block)
- a defragger rearranges a file's scattered blocks to be contiguous; it does not recover the space lost to internal fragmentation
- so the choice of block size calls for some wisdom
- since a directory really just has a pointer to a file, a file can show
up in multiple directories (Unix does this with links)
- ultimately all directories have to report to somebody; even the
top-level directories have to adhere to some naming scheme
- this global naming scheme is really the file system; file systems fit
into the operating system's scheme
- consider how mounting a file system works
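The scheduling strategies contrasted earlier in this section can be compared by counting how far the head travels. The following is a minimal sketch, not a driver implementation: the function names, the request queue, and the starting cylinder are all made up for illustration, and the Look variant assumes the head starts out moving toward higher cylinder numbers.

```python
def fcfs(start, requests):
    """First come, first served: visit cylinders in arrival order."""
    total, pos = 0, start
    for cyl in requests:
        total += abs(cyl - pos)
        pos = cyl
    return total


def sstf(start, requests):
    """Shortest seek time first: always visit the closest pending cylinder."""
    pending = list(requests)
    total, pos = 0, start
    while pending:
        nearest = min(pending, key=lambda cyl: abs(cyl - pos))
        pending.remove(nearest)
        total += abs(nearest - pos)
        pos = nearest
    return total


def look(start, requests):
    """Look: sweep upward to the highest request, then reverse and sweep down."""
    up = sorted(c for c in requests if c >= start)
    down = sorted((c for c in requests if c < start), reverse=True)
    total, pos = 0, start
    for cyl in up + down:
        total += abs(cyl - pos)
        pos = cyl
    return total


# Hypothetical queue of requested cylinders, with the head at cylinder 53.
queue = [98, 183, 37, 122, 14, 124, 65, 67]
for name, strategy in [("FCFS", fcfs), ("SSTF", sstf), ("Look", look)]:
    print(name, strategy(53, queue), "cylinders travelled")
```

For this particular queue FCFS travels the farthest. SSTF travels the least here, but it can starve requests that sit far from the head, which is one reason the sweeping strategies exist.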
Streams and Buffers
- consider how the file manager manages files
- the device doesn't just store the file's contents; it
also keeps various attributes (e.g. owner, permissions)
- every time a file is opened, the file manager creates a file
descriptor to manage the interaction between application and file
- the application views a file as a stream of bytes with a file pointer;
however this is not how the data is actually stored
- data stored in sectors and tracks is read as blocks, which are loaded
into buffers, which the application interprets as streams
- consider the effect of a buffer when reading and writing (note particularly what happens when we insert text into a file)
- in addition to streams, some programming environments allow the data to be interpreted as records, or their modern equivalent: objects
- a persistent object is backed by a file or database
- historically this was referred to as record-level processing, a typical pattern for COBOL
programs and early databases
- in the end, each layer deals with different abstractions at the appropriate level of detail
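The layering described in this section (blocks loaded into a buffer, then presented as a stream with a file pointer) can be sketched in a few lines. This is an illustration under stated assumptions only: BlockDevice, Stream, insert, and the tiny 8-byte block size are all hypothetical names and values, not any real file manager's interface.

```python
BLOCK_SIZE = 8  # unrealistically small, chosen only to keep the example readable


def split_blocks(data: bytes):
    """Cut a byte string into fixed-size blocks (the last one may be short)."""
    return [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]


class BlockDevice:
    """Hypothetical device that can only hand back whole blocks."""

    def __init__(self, data: bytes):
        self.blocks = split_blocks(data)
        self.reads = 0  # counts physical block reads

    def read_block(self, n: int) -> bytes:
        self.reads += 1
        return self.blocks[n]


class Stream:
    """Presents the device's blocks as a stream of bytes with a file pointer."""

    def __init__(self, device: BlockDevice):
        self.device = device
        self.pos = 0           # the file pointer
        self.buf_index = None  # which block currently sits in the buffer
        self.buf = b""

    def read(self, count: int) -> bytes:
        out = bytearray()
        while count > 0:
            index, offset = divmod(self.pos, BLOCK_SIZE)
            if index >= len(self.device.blocks):
                break  # end of file
            if index != self.buf_index:  # buffer miss: fetch a whole block
                self.buf = self.device.read_block(index)
                self.buf_index = index
            chunk = self.buf[offset:offset + count]
            if not chunk:
                break  # short final block is exhausted
            out += chunk
            self.pos += len(chunk)
            count -= len(chunk)
        return bytes(out)


def insert(device: BlockDevice, pos: int, text: bytes) -> int:
    """Insert text at byte offset pos; return how many blocks were rewritten."""
    data = b"".join(device.blocks)
    device.blocks = split_blocks(data[:pos] + text + data[pos:])
    return len(device.blocks) - pos // BLOCK_SIZE


dev = BlockDevice(b"hello, block world!")
stream = Stream(dev)
print(stream.read(5), stream.read(2), dev.reads)  # two reads, one block fetched
print(stream.read(6), dev.reads)                  # crosses into the next block
print(insert(BlockDevice(b"hello, block world!"), 7, b"big "))
```

Small reads that fall inside the buffered block cost no extra device access, while inserting text near the start of a file shifts every later byte, so every block from the insertion point onward has to be rewritten.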
RAID
- RAID stands for Redundant Array of Inexpensive Disks (note: some say the I stands for Independent)
- RAID creates a layer of abstraction between the file manager and the device
- in particular, RAID software translates between logical blocks and physical blocks
- the file manager translates physical blocks on a device into logical buffers in RAM
- RAID does not change this, but it provides a layer of virtual devices above the physical devices
- this layer of abstraction allows the contents of a file to be distributed and/or duplicated across multiple hard drives
- possible benefits of this include better performance, better reliability and recoverability
- RAID comes in different configurations or levels
- consider how splitting a file across multiple disks can improve performance
- consider how duplicating a file across multiple disks can improve reliability/recoverability
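The two ideas above, splitting and duplicating, come down to different mappings from a logical block number to physical locations. The sketch below assumes the simplest possible layouts (block-level striping as in RAID 0, full mirroring as in RAID 1); the names stripe, mirror, and MirroredArray are hypothetical, and real RAID software adds parity, stripe sizes, and rebuild logic that are omitted here.

```python
def stripe(logical_block: int, num_disks: int):
    """Striping: consecutive logical blocks rotate across the disks.

    Returns (disk, physical_block). Neighbouring blocks land on different
    disks, so a large file can be read from several drives in parallel.
    """
    return logical_block % num_disks, logical_block // num_disks


def mirror(logical_block: int, num_disks: int):
    """Mirroring: every disk holds a copy at the same physical position."""
    return [(disk, logical_block) for disk in range(num_disks)]


class MirroredArray:
    """Toy mirrored array; each disk is modelled as a dict of blocks."""

    def __init__(self, num_disks: int):
        self.disks = [{} for _ in range(num_disks)]

    def write(self, logical_block: int, data: bytes):
        for disk, physical in mirror(logical_block, len(self.disks)):
            self.disks[disk][physical] = data

    def read(self, logical_block: int, failed=()):
        """Return the block from the first disk that has not failed."""
        for disk, physical in mirror(logical_block, len(self.disks)):
            if disk not in failed:
                return self.disks[disk][physical]
        raise IOError("all copies of the block are lost")


# Six logical blocks of one file striped over three disks.
print([stripe(n, 3) for n in range(6)])

# The same block survives the loss of one disk in a two-disk mirror.
array = MirroredArray(2)
array.write(0, b"payroll")
print(array.read(0, failed={0}))
```

The file manager above this layer still asks for logical block 0, 1, 2 and so on; only the mapping underneath changes, which is the abstraction the notes describe.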