

4    File System




4.1    Overview

Digital UNIX Version 4.0 supports the following file systems, all of which are accessed through the OSF/1 Version 1.0 Virtual File System (VFS):

  -  UNIX File System (UFS)
  -  Network File System (NFS)
  -  CD-ROM File System (CDFS)
  -  Memory File System (MFS)
  -  /proc file system
  -  File-on-File Mounting (FFM) file system
  -  File Descriptor File System (FDFS)
  -  POLYCENTER Advanced File System (AdvFS)

Note that all of the file systems are integrated with the Virtual Memory Unified Buffer Cache (UBC).

In addition, Digital UNIX Version 4.0 supports the Logical Storage Manager (LSM) and the Prestoserve file system accelerator.

Note that the Logical Volume Manager is being retired in this release.

The following sections briefly discuss VFS, the file systems supported in Digital UNIX Version 4.0, the Logical Storage Manager, and the Prestoserve file system accelerator.




4.2    Virtual File System

The Virtual File System (VFS), which is based on the Berkeley 4.3 Reno Virtual File System, provides a uniform interface, abstracted from the file system layer, that allows common access to files regardless of the file system on which they reside. A structure known as a vnode (analogous to an inode) contains information about each file in a mounted file system and is essentially a wrapper around the file system-specific node. If, for example, a read or write is requested on a file, the vnode directs the system call to the routine appropriate for that file system (a read request is directed to ufs_read if the file resides in a UFS file system, or to nfs_read if the file resides in an NFS-mounted file system). As a result, file access across different file systems is transparent to the user.
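
The following is a minimal conceptual sketch, in C, of how this kind of dispatch works: each file system supplies a vector of operation routines, and generic code calls through the vector held in the vnode. The structure and function names (simple_vnode, simple_vnodeops, vn_read) are illustrative stand-ins, not the actual Digital UNIX kernel declarations.

    /*
     * Conceptual sketch of vnode operation dispatch; the names are
     * illustrative stand-ins, not the Digital UNIX kernel declarations.
     */
    struct simple_vnode;
    struct simple_uio;                  /* stand-in for struct uio (I/O descriptor) */

    struct simple_vnodeops {
        int (*vop_read)(struct simple_vnode *vp, struct simple_uio *uio);
        int (*vop_write)(struct simple_vnode *vp, struct simple_uio *uio);
        /* ... other operations: lookup, readdir, pathconf, and so forth ... */
    };

    struct simple_vnode {
        struct simple_vnodeops *v_op;   /* the file system's operation vector */
        void                   *v_data; /* file system-specific node (for example, an inode) */
    };

    /* Generic read path: the caller never knows which file system it reaches. */
    static int
    vn_read(struct simple_vnode *vp, struct simple_uio *uio)
    {
        return (*vp->v_op->vop_read)(vp, uio);  /* resolves to ufs_read, nfs_read, ... */
    }

A UFS vnode would carry a vector whose read member points at the UFS read routine; an NFS vnode would carry one pointing at the NFS client read routine.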

Digital's VFS implementation also supports Extended File Attributes (XFAs). Although originally intended to support system security (Access Control Lists) and the Pathworks PC server (so that a Pathworks PC server could assign PC-specific attributes to a file, such as icon color, the startup size of the application, its backup date, and so forth), the XFA implementation was expanded to support any application that wants to assign an XFA to a file. Currently, both UFS and AdvFS support XFAs, as does the pax backup utility, which has tar and cpio front ends. XFAs are also supported on remote UFS file systems when the server supports a special protocol that currently only Digital implements. For more information on XFAs, see setproplist(2). For more information on pax, see pax(1).


Information for File System Developers

In Digital UNIX Version 4.0, the VOP_READDIR kernel vnode operation interface has been changed to accommodate a new structure, kdirent, in addition to the existing dirent structure.

The new kdirent structure was developed to make file systems other than UFS work properly over NFS.

Note, however, that if you implement a file system under Digital UNIX, you do not need to make any changes to your VOP_READDIR interface routine for Digital UNIX Version 4.0, and applications see the same interface as before the addition of the new kdirent structure.

Unlike the dirent structure, the kdirent structure has a kd_off field that subordinate file systems can set to point to the on-disk offset of the next directory entry. Arrays of struct kdirent must be padded to 8-byte boundaries, using the KDIRSIZE macro, so that the off_t is properly aligned; arrays of struct dirent are padded only to 4-byte boundaries.
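
As an illustration of the layout, a kdirent-style record and its rounding macro might look like the following sketch. Only the kd_off field and the 8-byte rounding are taken from the description above; the remaining field names, the name-buffer size, and the macro name are assumptions, not the actual Digital UNIX declarations.

    #include <stddef.h>
    #include <sys/types.h>

    /*
     * Illustrative sketch of a kdirent-style directory entry.  Only
     * kd_off and the 8-byte rounding come from the documentation; the
     * other fields are assumed for illustration.
     */
    struct kdirent_sketch {
        off_t          kd_off;          /* on-disk offset of the next directory entry */
        ino_t          kd_ino;          /* (assumed) file serial number */
        unsigned short kd_reclen;       /* (assumed) length of this record */
        unsigned short kd_namlen;       /* (assumed) length of the entry name */
        char           kd_name[256];    /* (assumed) entry name, NUL-terminated */
    };

    /*
     * Round each record up to a multiple of 8 bytes so that the off_t in
     * the following record stays aligned; dirent records are rounded
     * only to a multiple of 4.
     */
    #define KDIRSIZE_SKETCH(namlen) \
        ((offsetof(struct kdirent_sketch, kd_name) + (namlen) + 1 + 7) & ~7)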

Each mounted file system has the option of setting the M_NEWRDDIR flag in the mount structure m_flag field. If the M_NEWRDDIR flag is set, the routine calling VOP_READDIR expects the readdir on that vnode to return an array of struct kdirent; if the M_NEWRDDIR flag is clear (the default), the readdir on that vnode returns an array of struct dirent.

In terms of NFS, if the M_NEWRDDIR flag is not set, the NFS server uses the dirent structures and calculates the offsets that are later passed back to the server. Thus, to ensure proper operation over NFS, any file system that does not have the M_NEWRDDIR flag set must be prepared to have VOP_READDIR called with offsets based on a packed array of struct dirent, which may conflict with the offsets in the on-disk directory structure. However, if the M_NEWRDDIR flag is set, the NFS server uses the kd_off fields of the kdirent structures to generate the offsets that are passed back to the server.
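
A file system that returns kdirent records advertises the fact at mount time by setting the flag in its mount structure. The fragment below is a sketch only; the structure definition, the flag value, and the myfs_mount routine are placeholder stand-ins rather than the Digital UNIX declarations.

    /*
     * Sketch only: a file system whose VOP_READDIR routine fills in
     * struct kdirent sets M_NEWRDDIR in the mount structure's m_flag
     * field at mount time.  The names and the flag value below are
     * placeholder stand-ins.
     */
    #define M_NEWRDDIR_SKETCH  0x0100       /* stand-in for the kernel's M_NEWRDDIR */

    struct mount_sketch {
        int m_flag;                         /* mount flags */
        /* ... many other fields ... */
    };

    int
    myfs_mount(struct mount_sketch *mp)
    {
        /* ... normal mount processing for the file system ... */

        mp->m_flag |= M_NEWRDDIR_SKETCH;    /* our readdir returns struct kdirent */
        return 0;
    }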

A new vnode operation, VOP_PATHCONF, was added to the kernel to return file system-specific information for the fpathconf() and pathconf() system calls. This vnode operation takes as arguments a pointer to the struct vnode, the pathconf name (an int), a pointer to a long for the return value, and an error int; it sets the return value and errno. Note that each file system must implement the vnode operation by providing a function in the vnodeops structure after the vn_delproplist component (at the end of the structure). This function takes as arguments the pointer to the vnode, the pathconf name, and the pointer to the long return value. The function sets the return value and returns zero for success or an error number.
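
The per-file-system function might look roughly like the following sketch, which follows the calling convention just described (vnode pointer, pathconf name, pointer to a long return value; zero or an error number returned). The myfs_pathconf name, the particular _PC_* cases, and the limits returned are illustrative assumptions.

    #include <errno.h>
    #include <unistd.h>         /* _PC_LINK_MAX, _PC_NAME_MAX, ... */

    struct vnode;               /* kernel vnode; details omitted in this sketch */

    /*
     * Sketch of a file system's pathconf function.  It sets the value
     * through retval and returns zero for success or an error number;
     * the cases handled and the limits returned are illustrative.
     */
    int
    myfs_pathconf(struct vnode *vp, int name, long *retval)
    {
        switch (name) {
        case _PC_LINK_MAX:
            *retval = 32767;    /* assumed maximum link count for this file system */
            return 0;
        case _PC_NAME_MAX:
            *retval = 255;      /* assumed maximum pathname component length */
            return 0;
        default:
            return EINVAL;      /* name not supported by this file system */
        }
    }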




4.3    UNIX File System

The UNIX File System (UFS) is compatible with the Berkeley 4.3 Tahoe release. UFS allows a pathname component to be up to 255 bytes long, with a fully qualified pathname length limit of 1023 bytes. The Digital UNIX Version 4.0 implementation of UFS supports file sizes that exceed 2 GB.

Digital added support for file block clustering, which provides sequential read and write access at speeds equivalent to the raw device speed of the disk and yields up to a 300% performance increase over previous releases of the operating system; added file-on-file mounting (FFM) for STREAMS; and integrated UFS with the Unified Buffer Cache. UFS also supports Extended File Attributes (XFAs). For more information on XFAs, see Section 4.2.




4.4    Network File System

The Network File System (NFS) is a facility for sharing files in a heterogeneous environment of processors, operating systems, and networks, by mounting a remote file system or directory on a local system and then reading or writing the files as though they were local.

Digital UNIX Version 4.0 supports NFS Version 3, in addition to NFS Version 2. NFS Version 2 code is based on ONC Version 4.2, which Digital licensed from Sun Microsystems. The NFS Version 3 code supersedes ONC Version 4.2, although at the time that NFS Version 3 was ported to Digital UNIX, Sun Microsystems had not yet released a newer, public version of ONC with NFS Version 3 support.




4.4.1    NFS Version 3 Functionality

NFS Version 3 supports all the features of NFS Version 2 as well as the following:

Since Digital UNIX supports both NFS Version 3 and Version 2, the NFS client and server bind at mount time using the highest NFS version number they both support. For example, a Digital UNIX Version 4.0 client will use NFS Version 3 when it is served by a Digital UNIX Version 4.0 NFS server; however, when it is served by an NFS server running an earlier version of Digital UNIX, the Digital UNIX Version 4.0 NFS client will use NFS Version 2.

For more detailed information on NFS Version 3, see the paper NFS Version 3: Design and Implementation (USENIX, 1994).




4.4.2    Digital Enhancements to NFS

In addition to the NFS Version 3.0 functionality, Digital UNIX supports the following Digital enhancements to NFS:




4.5    CD-ROM File System

Digital UNIX Version 4.0 supports the ISO-9660 CDFS standard for data interchange between multiple vendors; the High Sierra Group standard for backward compatibility with earlier CD-ROM formats; and an implementation of the Rock Ridge Interchange Protocol (RRIP), Version 1.0, Revision 1.09. The RRIP extends ISO-9660 by using the system use areas defined by ISO-9660 to provide mixed-case and long filenames; symbolic links; device nodes; deep directory structures (deeper than ISO-9660 allows); UIDs, GIDs, and permissions on files; and POSIX time stamps.

This code was taken from the public domain and enhanced by Digital.

In addition, Digital UNIX Version 4.0 also supports X/Open Preliminary Specification (1991) CD-ROM Support Component (XCDR). XCDR allows users to examine selected ISO-9660 attributes through defined utilities and shared libraries, and allows system administrators to substitute different file protections, owners, and file names for the default CD-ROM files.




4.6    Memory File System

Digital UNIX Version 4.0 supports a Memory File System (MFS) which is essentially a UNIX File System that resides in memory. No permanent file structures or data are written to disk, so the contents of an MFS file system are lost on reboots, unmounts, or power failures. Since it does not write data to disk, the MFS is a very fast file system and is quite useful for storing temporary files or read-only files that are loaded into it after it is created.

For example, if you are performing a software build that would have to be restarted anyway if it failed, MFS is an appropriate choice for storing the temporary files created during the build, because its speed shortens the build time. For more information, see the newfs(8) reference page.




4.7    /proc File System

The /proc file system enables running processes to be accessed and manipulated as files by the system calls open, close, read, write, lseek, and ioctl. While the /proc file system is most useful for debuggers, it enables any process with the correct permissions to control another running process. Thus, a parent/child relationship does not have to exist between a debugger and the process being debugged. The dbx debugger that ships in Digital UNIX Version 4.0 supports attaching to running processes through /proc. For more information, see the proc(4) and dbx(1) reference pages.
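
For example, a debugger-like tool can open a target process's /proc entry and read the target's memory with lseek and read, treating offsets into the file as addresses in the target's address space. The sketch below uses placeholder values for the /proc entry name and the address; see proc(4) for the exact entry naming and the available ioctl operations.

    #include <sys/types.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /*
     * Minimal sketch: read a few bytes of another process's memory
     * through /proc.  The entry name and the address are placeholders;
     * a real debugger would obtain both elsewhere and must have the
     * appropriate permissions on the target process.
     */
    int
    main(void)
    {
        const char *path = "/proc/01234";   /* placeholder entry; see proc(4) */
        off_t       addr = 0x120000000L;    /* placeholder address in the target */
        char        buf[16];
        int         fd;

        fd = open(path, O_RDONLY);
        if (fd < 0) {
            perror("open");
            return 1;
        }

        /* Offsets into the /proc file correspond to target addresses. */
        if (lseek(fd, addr, SEEK_SET) == (off_t)-1 ||
            read(fd, buf, sizeof buf) != (ssize_t)sizeof buf) {
            perror("read target memory");
            close(fd);
            return 1;
        }

        close(fd);
        return 0;
    }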




4.8    File-on-File Mounting File System

The File-on-File Mounting (FFM) file system allows regular, character-special, or block-special files to be mounted over regular files and, for the most part, is used only by the SVR4-compatible system calls fattach and fdetach to attach and detach a STREAMS-based pipe (or FIFO). With FFM, a FIFO, which normally has no file system object associated with it, is given a name in the file system space. As a result, a process that is unrelated to the process that created the FIFO can access the FIFO.

In addition to programs using FFM through the fattach system call, users can mount one regular file on top of another using the mount command. Mounting a file on top of another file does not destroy the contents of the covered file; it simply associates the name of the covered file with the mounted file, making the contents of the covered file temporarily unavailable. The covered file can be accessed again after the file mounted on top of it is unmounted, whether by a reboot, by a call to fdetach, or by the umount command. Note that the contents of the covered file remain available to any process that had the file open at the time of the call to fattach or when a user issued the mount command that covered the file.
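
As a sketch of the programmatic use, the fragment below creates a pipe and uses fattach to give one end a name in the file system so that an unrelated process can open it by that name. The path /tmp/ffm_pipe is a placeholder, and the sketch assumes that pipe() yields a STREAMS-based pipe, as it does on SVR4-style systems.

    #include <stdio.h>
    #include <stropts.h>        /* fattach(), fdetach() */
    #include <unistd.h>

    /*
     * Sketch: name one end of a pipe in the file system with fattach()
     * so that an unrelated process can open it by path.  The path is a
     * placeholder and must already exist as a writable regular file.
     */
    int
    main(void)
    {
        int fds[2];

        if (pipe(fds) < 0) {
            perror("pipe");
            return 1;
        }

        if (fattach(fds[0], "/tmp/ffm_pipe") < 0) {
            perror("fattach");
            return 1;
        }

        /* ... unrelated processes can now open /tmp/ffm_pipe ... */

        /* Undo the attachment; the covered file becomes visible again. */
        if (fdetach("/tmp/ffm_pipe") < 0)
            perror("fdetach");

        return 0;
    }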




4.9    File Descriptor File System

The File Descriptor File System (FDFS) allows applications to reference a process's open file descriptors (0, 1, 2, 3, and so forth) as if they were files in the UNIX File System (for example, /dev/fd/0, /dev/fd/1, /dev/fd/2) by aliasing the open file descriptors to file objects. When the FDFS is mounted, opening or creating a file descriptor file has the same effect as calling the dup(2) system call.
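
For instance, once the FDFS is mounted, the following fragment obtains a duplicate of the process's standard input by opening its file descriptor file. It is a minimal sketch that assumes the conventional /dev/fd mount point.

    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /*
     * Minimal sketch: with the FDFS mounted on /dev/fd, opening
     * /dev/fd/0 has the same effect as dup(0); the returned descriptor
     * refers to the process's standard input.
     */
    int
    main(void)
    {
        int fd = open("/dev/fd/0", O_RDONLY);

        if (fd < 0) {
            perror("open /dev/fd/0");
            return 1;
        }

        /* fd is now a duplicate of file descriptor 0, as if from dup(0). */
        close(fd);
        return 0;
    }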

The FDFS allows applications that were not written with support for UNIX I/O to avail themselves of pipes, named pipes, and I/O redirection.

The FDFS is not mounted by default; it must be mounted either by hand or through an entry in the /etc/fstab file.

For more information on the FDFS, see the fd(4) reference page.




4.10    POLYCENTER Advanced File System

The POLYCENTER Advanced File System (AdvFS), which consists of a file system that ships with the base system and a set of file system utilities that are available as a separate, layered product, is a log-based (journaled) file system that is especially valuable on systems with large amounts of storage. Because it maintains a log of active file-system transactions, AdvFS avoids lengthy file system checks on reboot and can therefore recover from a system failure in seconds. AdvFS writes log records to disk before the corresponding data records, ensuring that file domains (file systems) are recovered to a consistent state. AdvFS uses extent-based allocation for optimal performance.

To users and applications, AdvFS looks like any other UNIX file system. It is compliant with POSIX and SPEC 1170 file-system specifications. AdvFS file domains and other Digital UNIX file systems, like UFS, can exist on the same system and are integrated with the Virtual File System (VFS) and the Unified Buffer Cache (UBC). AdvFS file domains can also be remote-mounted with NFS and support extended file attributes (XFAs). For more information on XFAs, see Section 4.2.

In addition to providing rapid restart and increased file-system integrity, AdvFS supports files and file systems much larger than 2 GB and, by separating the file system directory layer from the logical storage layer, provides increased file-system flexibility and manageability.

In addition to the Advanced File System that ships as part of the base operating system, the POLYCENTER Advanced File System Utilities are available as a layered product. The AdvFS Utilities enable a system administrator to create multivolume file domains, add and remove volumes online, clone filesets for online backup, unfragment and balance file domains online, stripe individual files, and establish trashcans so that users can restore their deleted files. The AdvFS Utilities also provide a Graphical User Interface for configuring and managing AdvFS file domains. The AdvFS Utilities require a separate license Product Authorization Key (PAK). Contact your Digital representative for additional information on the AdvFS Utilities product. For more information on AdvFS, see the System Administration guide and the POLYCENTER Advanced File System Utilities Technical Summary.




4.11    Logical Storage Manager

Digital UNIX Version 4.0 supports the Logical Storage Manager (LSM), a more robust logical storage manager than the Logical Volume Manager (LVM), which it replaces. LSM supports all of the following:

Mirroring, striping, and the graphical interface require a separate license PAK. The LSM code came from VERITAS (the VERITAS Volume Manager) and was enhanced by Digital.

For each logical volume defined in the system, the LSM volume device driver maps logical volume I/O to physical disk I/O. In addition, LSM uses a user-level volume configuration daemon (vold) that controls changes to the configuration of logical volumes. Users can administer LSM either through a series of command-line utilities or by availing themselves of an intuitive Motif-based graphical interface.

To ensure a smooth migration from LVM to LSM, Digital has developed a migration utility that maps existing LVM volumes into nonstriped, nonmirrored LSM volumes while preserving all of the LVM data. After the migration is complete, administrators can mirror the volumes if they so desire.

Similarly, to help users transform their existing UFS or AdvFS partitions into LSM logical volumes, Digital has developed a utility that will transform each partition in use by UFS or AdvFS into a nonstriped, nonmirrored LSM volume. After the transformation is complete, administrators can mirror the volumes if they so desire.

Note that LSM volumes can be used in conjunction with AdvFS, as part of an AdvFS domain; with RAID disks; and with the Available Server Environment (ASE), since LSM supports logical volume failover. For more information on LSM, see the Logical Storage Manager guide.




4.12    Overlap Partition Checking

The enhancements related to Overlap Partition Checking are described next.




4.12.1    Partition Overlap Checks Added to Utilities

Partition overlap checks were added to a number of commands in Digital UNIX Version 4.0. Some of the commands that use these checks are newfs, fsck, mount, mkfdmn, swapon, voldisksetup, and voldisk. The enhanced checks require a disk label to be installed on the disk. Refer to the disklabel(8) reference page for further information.

The checks ensure that a partition will not be overwritten if it, or a partition that overlaps it, is already in use (for example, mounted or used as a swap device). The checks also ensure that a partition will not be overwritten if it, or a partition that overlaps it, is marked as in use in the fstype field of the disk label.

If a partition or an overlapping partition has an in-use fstype field in the disk label, some commands ask interactively whether the partition can be overwritten.




4.12.2    Library Functions for Partition Overlap Checking

Two new functions, check_usage(3) and set_usage(3), are available for use by applications. These functions check whether a disk partition is marked as in use and set the fstype of a partition in the disk label. See the reference pages for these functions for more information.




4.13    Prestoserve File System Accelerator

The Prestoserve file system accelerator is a hardware option that speeds up synchronous disk writes, including NFS server access, by reducing the amount of disk I/O. Frequently-written data blocks are cached in nonvolatile memory and then written to disk asynchronously.

The software required to drive the board ships as an optional subset in Digital UNIX Version 4.0; once the subset is installed, the software is enabled by a PAK that comes with the board.

Prestoserve uses a write cache for synchronous disk I/O, working much as the system buffer cache does for asynchronous disk I/O requests. Prestoserve is interposed between the operating system and the device drivers for the disks on a server. Mounted file systems and unmounted block devices selected by the administrator are accelerated.

When a synchronous write request is issued to a disk with accelerated file systems or block devices, it is intercepted by the Prestoserve pseudodevice driver, which stores the data in nonvolatile memory instead of on the disk. Thus, synchronous writes occur at memory speeds, not at disk speeds.

As the nonvolatile memory in the Prestoserve cache fills up, it asynchronously flushes the cached data to the disk in portions that are large enough to allow the disk drivers to optimize the order of the writes. A modified form of Least Recently Used (LRU) replacement is used to determine the order. Reads that hit (match blocks) in the Prestoserve cache also benefit.

Nonvolatile memory is required because data must not be lost if the power fails or if the system crashes. As a result, the hardware board contains a battery that protects data in case the system crashes. From the point of view of the operating system, Prestoserve appears to be a very fast disk.

Note that there is a substantial performance gain when Prestoserve is used on an NFS Version 2 server.

The dxpresto command allows you to monitor Prestoserve activity and to enable or disable Prestoserve on machines that allow that operation. For more information on Prestoserve, see the Guide to Prestoserve and the dxpresto(8X) reference page.