1 Understanding Prestoserve

The Prestoserve product is a combination of the Prestoserve NVRAM hardware and the Prestoserve software. This manual assumes that the Prestoserve hardware is already installed in your system.

This chapter explains how Prestoserve improves disk I/O performance by caching synchronous disk writes. It also describes the disk operations that can utilize Prestoserve and describes how Prestoserve can alleviate Network File System (NFS) performance problems.

1.1 Prestoserve and Synchronous Write Operations

Prestoserve speeds up any application that requires synchronous writes to ensure data reliability. A file modification is synchronous if it must be written to disk before the application can continue. Synchronous writes ensure data reliability because the data is written directly to disk instead of being held in volatile memory and written later. For example, all UFS and NFS file system modifications due to creating or deleting files are written synchronously. In addition, all NFS data writes are written synchronously. Many database or transaction systems require synchronous writes and can show significant performance improvements with Prestoserve.

Applications that require synchronous writes are sometimes implemented by opening files requiring synchronous update with the O_FSYNC synchronous write flag. This flag can also be set by using the fcntl system call. An alternative to making every write synchronous is to commit a series of write operations with the fsync system call. This call synchronously writes all modified blocks of a file to disk. See fcntl(2) and fsync(2) for more information on the system calls.

In addition, the mount -o sync command causes all file system writes to be synchronous. Refer to mount(8) for more information.
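The two application-level approaches described above (a synchronous write flag at open time, or a single fsync after a series of writes) can be sketched as follows. This is a minimal illustration, not code from the Prestoserve software. It assumes a UNIX-like system where Python exposes the flag as os.O_SYNC, which stands in here for O_FSYNC. The function names are hypothetical.

```python
import os
import tempfile


def write_sync_each(path, records):
    """Open with a synchronous write flag so each write() returns only
    after the data has been committed to stable storage."""
    # os.O_SYNC is used here as the stand-in for O_FSYNC (an assumption).
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC, 0o600)
    try:
        for record in records:
            os.write(fd, record)  # small writes; partial writes are not handled
    finally:
        os.close(fd)


def write_then_fsync(path, records):
    """Issue ordinary writes, then commit them all with one fsync call."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for record in records:
            os.write(fd, record)
        os.fsync(fd)  # synchronously writes all modified blocks of the file
    finally:
        os.close(fd)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    write_sync_each(os.path.join(workdir, "each.log"), [b"one\n", b"two\n"])
    write_then_fsync(os.path.join(workdir, "batch.log"), [b"one\n", b"two\n"])
```

The first form pays the synchronous cost on every write. The second pays it once per batch, which is why it is often preferred when the application can tolerate losing an uncommitted batch.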

1.2 How Prestoserve Works

Prestoserve uses the Prestoserve buffer cache (NVRAM hardware) to temporarily, but securely, store synchronous disk I/O. Instead of immediately writing the I/O to disk, Prestoserve stores the data in the cache's nonvolatile memory and then writes the data to disk when appropriate. Nonvolatile memory is used to ensure that data is not lost because of a power failure or a system crash. To the operating system, Prestoserve appears to be a very fast disk.

Prestoserve accelerates synchronous writes to mounted file systems by making synchronous disk writing more efficient. The Prestoserve software allows you to specify which file systems you want to accelerate.

Prestoserve works in a way that is similar to the way the system buffer cache speeds up asynchronous disk I/O requests. The Prestoserve buffer cache is interposed between the operating system and the device drivers for the disks on a server. When a synchronous write request is issued to a file system that has been accelerated with Prestoserve, the write is intercepted by the Prestoserve pseudodevice driver, which stores the data in the cache's nonvolatile memory instead of on the disk. This causes the synchronous write to occur at memory speed, not at disk speed.

As the nonvolatile memory fills up, the cache asynchronously flushes the data to disk in portions that are large enough to allow the disk drivers to optimize the order of the writes. A modified form of Least Recently Used (LRU) replacement is used to determine the order. Reads that hit or match blocks in the Prestoserve cache's nonvolatile memory can also realize performance benefits because the data does not have to be read from disk.
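The caching behavior described above can be illustrated with a toy model. This sketch is not the Prestoserve algorithm: it uses plain LRU rather than the modified form mentioned in the text, and the class name, capacity, and batch size are all hypothetical. It shows only the general idea that writes complete in the cache, that the least recently used blocks are flushed in a batch when the cache fills, and that each batch is sorted by block number so the disk can service it in one sweep.

```python
from collections import OrderedDict


class WriteBackCache:
    """Toy model of a nonvolatile write cache in front of a disk."""

    def __init__(self, capacity, flush_batch):
        self.capacity = capacity        # number of blocks the cache can hold
        self.flush_batch = flush_batch  # number of blocks written per flush
        self.cached = OrderedDict()     # block number -> data, oldest first
        self.disk = {}                  # stands in for the real disk
        self.flush_log = []             # order in which blocks reached disk

    def write(self, block, data):
        """Complete a synchronous write in the cache."""
        if block in self.cached:
            self.cached.move_to_end(block)  # rewritten block is most recent
        self.cached[block] = data
        if len(self.cached) > self.capacity:
            self._flush()

    def read(self, block):
        """Serve a read from the cache when possible, otherwise from disk."""
        if block in self.cached:
            return self.cached[block]
        return self.disk.get(block)

    def _flush(self):
        """Write the least recently used blocks to disk in block order."""
        count = min(self.flush_batch, len(self.cached))
        victims = [self.cached.popitem(last=False) for _ in range(count)]
        for block, data in sorted(victims, key=lambda item: item[0]):
            self.disk[block] = data
            self.flush_log.append(block)


if __name__ == "__main__":
    cache = WriteBackCache(capacity=4, flush_batch=2)
    for number in (9, 3, 7, 1, 5):
        cache.write(number, f"data-{number}".encode())
    print(cache.flush_log)
```

Rewriting a block that is still in the cache replaces the cached copy without generating another disk write, which is the effect the following paragraphs rely on.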

Note
Some database applications use raw character device disk partitions to manage their own file system data structures. Prestoserve neither accelerates nor interferes with raw character device I/O.

There are several reasons why reliable write caching can boost performance:

A single UFS file system write operation causes two or three writes to disk because each write must update not only the data block but also the file definition blocks (inodes and indirect blocks). Because the same file definition block is updated for each data block in the file, 50 percent to 65 percent of all disk writes can be eliminated by rewriting the definition block cached in the Prestoserve nonvolatile memory buffers. Data blocks can also be found in the Prestoserve cache, although the frequency of these cache hits is significantly less than the frequency of hits on the file definition blocks.
The data in the Prestoserve nonvolatile cache can be flushed asynchronously to optimize disk I/O performance. This allows blocks of data to be scheduled in order to take advantage of disk arm position. Because disk seek times are significant, this represents a major performance improvement.
Because read caching is already effective, operations that modify file data account for a disproportionately large amount of actual disk traffic. However, read operations that cannot be satisfied from the traditional system buffer cache are essentially synchronous (some read-ahead is possible) and must compete with the heavy write traffic. Altogether, operations that modify data typically make up about 20 percent of a normal operation mix, but they account for about 60 percent of the requests for disk I/O.
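The first reason above can be checked with simple arithmetic. The sketch below uses a deliberately simplified model that is an assumption, not a measurement: every data block written to a file also updates the same one or two file definition blocks, and the cache absorbs every repeated update so that each definition block reaches disk only once. The function name is hypothetical.

```python
def fraction_of_writes_eliminated(data_blocks, definition_blocks_per_write):
    """Return the fraction of disk writes saved by caching definition blocks.

    Simplified model (an assumption): without a cache, each data block
    write also writes every definition block; with a cache, each
    definition block is written to disk once for the whole file.
    """
    uncached = data_blocks * (1 + definition_blocks_per_write)
    cached = data_blocks + definition_blocks_per_write
    return 1 - cached / uncached


if __name__ == "__main__":
    # One definition block per write (two disk writes per operation).
    print(fraction_of_writes_eliminated(100, 1))
    # Two definition blocks per write (three disk writes per operation).
    print(fraction_of_writes_eliminated(100, 2))
```

For a 100-block file the model gives savings of roughly one half and two thirds, which is close to the 50 percent to 65 percent range quoted above.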

1.3 NFS Environment and Performance Problems

The Network File System (NFS) allows users to access files transparently across networks. NFS supports a spectrum of network topologies, from small and simple networks to large and complex networks. To gain the maximum advantage that Prestoserve can provide, it is necessary to understand how different network design defects affect performance.

Figure 1-1 shows a typical NFS environment: one server supporting several clients connected by an Ethernet. The server manages the shared resources, such as data files and applications, and is responsible for multiplexing its resources among the various clients. The server also must maintain and protect the data within these shared resources.

Figure 1-1: Example of NFS Environment

NFS performance problems can be broken down into three basic areas: client, network, and server problems. The following sections describe each of these areas and show why the server and, in particular, the server's I/O subsystem are usually the primary causes of poor NFS performance.

1.3.1 Network Problems

The network used to communicate between the client and server does not normally cause a performance problem. There are, however, two conditions to look out for: network delays and high retransmission rates. If the Ethernet is overutilized, clients experience long delays waiting for a free slot to send requests. Ethernet utilization over 50 percent often indicates excessive network delay.

Network topology often contributes to excessive delay. If clients are located across many gateways from the servers that they use often, their requests experience long delays. You may be able to solve the problem by restructuring the network topology to distribute the load more evenly.

Excessive retransmissions can cause poor performance because the client must wait for the server to respond before it retransmits a request. Excessive retransmissions can be caused by the following problems:

Overloaded servers that drop packets due to insufficient buffering
Inadequate Ethernet transceivers that cause packets to be dropped under busy conditions
Physical network errors, such as those caused by a noisy coaxial cable

You can use the nfsstat -c command to measure the NFS retransmission rate on client machines. Refer to nfsstat(8nfs) for more information.

The average NFS response time to a client request under a low to medium load is approximately 30 milliseconds. Most clients retransmit a request after approximately 1 second. If a 10 percent reduction in performance is acceptable, then a 3-millisecond increase in average response time (10 percent of 30 milliseconds) is an acceptable limit. This limit gives an acceptable NFS retransmission rate of 0.3 percent. The calculation is as follows:

   0.003 sec/request
------------------------  =  0.003 retransmission/request
 1.0 sec/retransmission

Because the worst-case NFS request (read or write 8 kilobytes over the Ethernet) requires seven packets (one request and six fragmented replies), the error rate of the network must be less than 0.04 percent. The calculation is as follows:

 0.3 percent
-------------  =  0.04 percent
      7

The calculation shows the overall acceptable error rate for both the client and the server, so the acceptable error rate measured at either machine is half of this rate (0.02 percent).
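The chain of calculations above can be reproduced in a few lines, which makes it easy to substitute values for a different site. The defaults below are the figures from the text. The function names are hypothetical, and the calls and retrans counts are assumed to be read from the client statistics that nfsstat -c reports.

```python
def acceptable_rates(response_time_s=0.030, slowdown=0.10,
                     retransmit_timeout_s=1.0, packets_per_request=7):
    """Return (retransmission rate, network error rate, per-machine rate).

    All three values are fractions, not percentages.
    """
    added_delay_s = response_time_s * slowdown            # 0.003 sec/request
    retransmission_rate = added_delay_s / retransmit_timeout_s
    network_error_rate = retransmission_rate / packets_per_request
    per_machine_error_rate = network_error_rate / 2       # client or server
    return retransmission_rate, network_error_rate, per_machine_error_rate


def observed_retransmission_rate(calls, retrans):
    """Return retransmissions per call from client RPC counters."""
    if calls <= 0:
        raise ValueError("calls must be positive")
    return retrans / calls


if __name__ == "__main__":
    limit, network, per_machine = acceptable_rates()
    print(f"retransmission limit: {limit:.2%}")
    print(f"network error limit:  {network:.2%}")
    print(f"per-machine limit:    {per_machine:.2%}")
    # Hypothetical counter values, for illustration only.
    print(observed_retransmission_rate(200000, 400) <= limit)
```

The unrounded network error rate is slightly above 0.04 percent. The text rounds it down, which errs on the side of a stricter limit.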

You can use the netstat -i command to measure the network error rate. If this rate is unacceptably high, determine if an individual machine is generating an excessive number of errors. If the problem appears to be pervasive, analyze the cabling technology that is being used. For example, if you have difficulties with noisy nonstandard coaxial cable, you could switch to a twisted-pair Ethernet. Refer to netstat(1) for more information.
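As a rough sketch, the per-interface error rate can be computed from the packet and error counters that netstat -i displays (the Ipkts, Ierrs, Opkts, and Oerrs columns) and compared with the 0.02 percent per-machine limit derived earlier. The function name and the sample counter values are hypothetical.

```python
PER_MACHINE_LIMIT = 0.0002  # 0.02 percent, the per-machine limit from the text


def interface_error_rate(ipkts, ierrs, opkts, oerrs):
    """Return errors per packet for one network interface."""
    total_packets = ipkts + opkts
    if total_packets == 0:
        return 0.0
    return (ierrs + oerrs) / total_packets


if __name__ == "__main__":
    # Hypothetical counters, for illustration only.
    rate = interface_error_rate(ipkts=500000, ierrs=50, opkts=500000, oerrs=50)
    print(f"{rate:.4%}", "acceptable" if rate <= PER_MACHINE_LIMIT else "too high")
```

Because the counters are cumulative, sampling them twice and using the differences gives a better picture of current conditions than a single reading.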

1.3.2 Client Problems

Adding disks or memory to a client can improve performance in two ways: by improving access time and by reducing the overall load on the server and network. A client can avoid NFS performance problems for files that are not shared (such as root, swap, and temporary files) by using local disks for these files. For diskless clients, increased memory can make a big improvement in performance by allowing the client to swap and page less often. By adding local resources, the demands on the server and the network can be reduced.

While it is easy to improve client performance by adding memory or disks, these improvements may not be cost effective because of the additional administrative tasks that are needed to maintain the operating system. For example, if you store valuable data on local disks, you must ensure that the disks are backed up. If the data is shared, you may also have to ensure that other systems have access. If you add resources to the server, the additional administrative costs are less than if you add the resources to the client.

1.3.3 Server Problems

On most NFS servers, the limiting factor is the speed of the disk. Most high-speed disks can sustain from 30 to 40 disk operations per second. Most of the time spent waiting for a disk operation occurs during head seeks or rotational delay. If you use a faster disk or disk controller and if you spread the load over multiple disks, you can obtain a small improvement in I/O performance. However, the best way to improve I/O performance is to reduce the number of disk operations.

To alleviate performance problems, you should concentrate your resources on the server. If you have already added memory to your server to increase the size of the buffer cache and the server is still too slow, you could obtain another server and split the load between the two servers. However, not only does this solution have a large direct cost, but there is a significant administrative cost associated with supporting an additional server. Prestoserve is an alternative solution that can increase the performance of the NFS server without an additional server and its added administrative cost.

1.3.4 NFS Server Performance

Digital UNIX uses a buffer cache in memory to avoid disk operations whenever possible. This memory is effective in reducing the client waiting time for relatively slow disk I/O. It also makes disk I/O more efficient by allowing the staging and scheduling of disk operations.

You can improve performance by allowing the disk device driver to schedule several requests at a time to take advantage of the position of the disk arm. The total amount of disk I/O is reduced, because repeat requests may be found in the cache. If NFS read activity is high, then adding more memory to your server can improve server performance because the size of the buffer cache is a percentage of the size of memory.

The system buffer cache is ineffective, however, when the server handles remote write requests, and this is a source of performance problems at the server. NFS uses a simple stateless protocol, which requires that each client request be complete and self-contained and that the server completely process each request before sending an acknowledgment back to the client. If the server crashes or if an acknowledgment is lost, the client retransmits its request to the server. This design has the following consequences:

The server cannot acknowledge the client's request until data is safely written to stable storage.
The client knows exactly how much modified data has been safely stored by the server.
The server cannot cache modified data in volatile storage because the data may be lost if the server crashes.

You cannot use the system buffer cache to improve performance with NFS requests that modify data. If a server writes modified data only to volatile memory, a server crash would jeopardize the data integrity. The client may assume that its data is safely stored, but if a crash occurs and the data was stored only in volatile memory, the data may be lost. Because a single server stores data for many clients, many clients can be affected. However, if modifications are always synchronously written to disk, data will not be lost, and you can recover from server crashes.
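The following toy model illustrates why an acknowledgment must not be sent while modified data exists only in volatile memory. It is a simplified sketch, not NFS server code: the class, its methods, and the dictionary-based "storage" are hypothetical stand-ins.

```python
class ToyServer:
    """Toy file server that acknowledges writes from one of two stores."""

    def __init__(self, cache_writes_in_memory):
        self.cache_writes_in_memory = cache_writes_in_memory
        self.volatile = {}  # contents are lost when the server crashes
        self.stable = {}    # contents survive a crash (disk or NVRAM)

    def handle_write(self, block, data):
        """Store the data and return True as the acknowledgment."""
        if self.cache_writes_in_memory:
            self.volatile[block] = data  # fast, but unsafe
        else:
            self.stable[block] = data    # safe: committed before the reply
        return True

    def crash_and_reboot(self):
        """Simulate a crash: volatile memory is cleared."""
        self.volatile.clear()

    def read(self, block):
        """Return the current contents of a block, or None if unknown."""
        return self.volatile.get(block, self.stable.get(block))


if __name__ == "__main__":
    for unsafe in (True, False):
        server = ToyServer(cache_writes_in_memory=unsafe)
        acknowledged = server.handle_write(1, b"payroll")
        server.crash_and_reboot()
        print(unsafe, acknowledged, server.read(1))
```

In the unsafe configuration the client holds an acknowledgment for data that no longer exists after the reboot, so it never retransmits and the loss goes undetected.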

Client operations that modify data, such as file creation, file removal, and attribute modification must be written synchronously to disk before the server responds to the client. For example, when the client creates a new file, the server may have to update the data and file definition blocks for the directory that contains the file. To ensure file system integrity in the local case, these operations are also written synchronously to disk.

Because NFS operations are synchronously committed to disk, data integrity is ensured and a server can survive system failures. However, performance is degraded because these operations take place at disk speeds and not at the memory speeds available to cachable operations. In addition, because these operations are processed serially, there is no opportunity to optimize the scheduling of the disk arm. Because modifications are written synchronously to disk rather than held in the cache, there is also no opportunity to reduce disk write traffic.

Unless your server is only supplying read-only access to files, some NFS operations must be synchronously committed to disk. Because disks are much slower than memory, this is a large burden. Prestoserve stores synchronous writes to nonvolatile memory; therefore data is secure without a corresponding decrease in performance.

1.3.5 Prestoserve's Impact on NFS Server Performance

Prestoserve's performance impact on any particular server can vary widely as a result of the demands placed on the NFS server by its client systems. Heavily loaded NFS servers (those for which writes, creates, and deletes make up more than 10 percent of NFS operations) will benefit the most from Prestoserve. Conversely, lightly loaded NFS servers (those for which writes make up less than 4 percent of NFS operations) may see no noticeable benefit from Prestoserve.

In addition to improving response time, Prestoserve uses the server's disk more efficiently. For example, in many cases, Prestoserve allows you to double the number of diskless clients that a single NFS server can support if it has the necessary disk capacity and a sufficient amount of main memory. Prestoserve's improvement to an NFS server is most noticeable when the server is busy.
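The write-mix thresholds given in this section can be applied to a server's operation counts, such as those reported by nfsstat -s. The sketch below is one possible reading of those thresholds, with percentages taken relative to all NFS operations. The function name, the "moderate" and "idle" labels, and the operation names used as dictionary keys are assumptions made for illustration.

```python
def classify_nfs_load(op_counts):
    """Classify a server by the share of data-modifying NFS operations.

    op_counts maps an operation name to the number of calls observed.
    """
    total = sum(op_counts.values())
    if total == 0:
        return "idle"
    modifying = sum(op_counts.get(op, 0) for op in ("write", "create", "remove"))
    if modifying / total > 0.10:
        return "heavy"     # expected to benefit the most
    if op_counts.get("write", 0) / total < 0.04:
        return "light"     # may show no noticeable benefit
    return "moderate"      # between the two thresholds


if __name__ == "__main__":
    # Hypothetical operation counts, for illustration only.
    sample = {"read": 700, "getattr": 150, "write": 100, "create": 30, "remove": 20}
    print(classify_nfs_load(sample))
```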