F Advanced File System Issues

This appendix provides information about how to solve problems you may encounter with the Advanced File System (AdvFS).

F.1 Recovering from AdvFS Domain Panics

Typically, an AdvFS domain panic signals a hardware problem. However, it also may indicate a software failure, particularly in a domain that contains critical system files, such as the root file system.

The benefit of an AdvFS domain panic is that it prevents further access to the affected AdvFS file domain only, and allows the filesets in that domain to be unmounted and examined. All other AdvFS domains remain online and in normal operation, unaffected by the domain panic.

To recover an AdvFS domain from a domain panic, first collect as much information as you can about the condition of the domain's filesets and metadata, in case there is a software problem that you need to correct or document in a problem report. Then, prepare the domain so that you can run the verify utility on it to check the integrity of its filesets.

To recover from an AdvFS domain panic, perform the following steps. If you cannot successfully complete steps 1 through 6, go to step 8.

1. Use the mount command to obtain a list of all the filesets in the domain. Then, use the umount command to unmount all of the filesets in the domain. For example:
   # mount -t advfs
   # umount fileset_name

   The filesets in a domain must be unmounted before you can run the verify utility on them. See the restrictions noted in the verify(8) reference page for more information.
2. Use the ls -l command on the domain's directory under /etc/fdmns to obtain a list of the AdvFS volumes in the domain. For example:
   # ls -l /etc/fdmns/staff_projects1
3. Use the vfile command to collect information about the metadata files on each volume in the domain, in case you cannot recover the domain with the verify utility. Record the bitfile metadata table (BMT), the storage bitmap (SBM), the root tag directory, and the transaction log file for each disk. See the vfile(8) reference page for additional information. For example:
   # /sbin/advfs/vfile 0 0 rz3c > bmt_rz3c
   # /sbin/advfs/vfile 0 1 rz3c > sbm_rz3c
   # /sbin/advfs/vfile 0 2 rz3c > tag_rz3c
   # /sbin/advfs/vfile 0 3 rz3c > log_rz3c
4. Use the dia utility to extract information about the domain panic from the binary error log, as documented in the dia(8) reference page.
5. If the problem is a hardware problem, fix it before going to step 6.
6. Run the verify utility on the domain to check all of its filesets. For example:
   # verify staff_projects1
7. If the verify utility exits successfully, mount all of the filesets you unmounted in step 1 and resume normal operations. If the verify utility indicates that there is a problem, go to step 8.
8. If a failure prevents complete recovery, re-create the domain and restore the domain's data from backup media. Then, mount all of the restored filesets in the domain and resume normal operations. You should also file a problem report with DIGITAL that includes the information you collected during this procedure.
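When a domain has several volumes, steps 2 and 3 can be combined into a small script. The following is a minimal Bourne shell sketch, not a DIGITAL-supplied tool. It assumes that each entry in /etc/fdmns/domain is a symbolic link whose name (for example, rz3c) is the volume argument that vfile expects, as in the examples above, and it reuses the same vfile arguments (0 0 through 0 3). The function name collect_metadata and the FDMNS and VFILE overrides are hypothetical, added only so the script can be tried without AdvFS.

```shell
#!/bin/sh
# Sketch: save the vfile output for the BMT (0 0), SBM (0 1), root tag
# directory (0 2), and transaction log (0 3) of every volume in a domain,
# using the same vfile arguments as the examples in step 3.
# ASSUMPTION: each symbolic link in /etc/fdmns/<domain> is named after the
# volume (for example, rz3c), and that name is what vfile expects.
# FDMNS and VFILE are hypothetical overrides, not AdvFS settings.
FDMNS=${FDMNS:-/etc/fdmns}
VFILE=${VFILE:-/sbin/advfs/vfile}

collect_metadata() {
    domain=$1                 # for example, staff_projects1
    outdir=$2                 # receives bmt_<vol>, sbm_<vol>, tag_<vol>, log_<vol>
    mkdir -p "$outdir" || return 1
    for link in "$FDMNS/$domain"/*; do
        [ -L "$link" ] || continue     # skip anything that is not a volume link
        vol=${link##*/}                # link name, for example rz3c
        i=0
        for name in bmt sbm tag log; do
            "$VFILE" 0 "$i" "$vol" > "$outdir/${name}_$vol" || return 1
            i=$((i + 1))
        done
    done
}
```

For example, collect_metadata staff_projects1 /outdir would write bmt_rz3c, sbm_rz3c, tag_rz3c, and log_rz3c into /outdir. Choose an output directory that is not in the domain being recovered, because that domain's filesets are unmounted.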

F.2 Correcting Overlapping Frag Data Corruption

In versions of DIGITAL UNIX prior to Version 4.0D, a bug in AdvFS could corrupt the data in user files. Only sparse files (files that contain offsets at which no data has been stored) were affected by this bug. The bug has been fixed, but files created with older versions of DIGITAL UNIX may still be corrupted. In these files, AdvFS may have stored two different versions of a particular page (an 8 KB segment of the file). User intervention is necessary to correct this problem: each affected file must be re-created, or repaired with the help of the verify command.
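Because a page is 8 KB (8192 bytes, the block size that the dd commands later in this section use), the page that holds a given byte offset, and the byte range that a page covers, follow from integer arithmetic. Pages are numbered from 0, as in the repair steps below. A minimal sketch; page_of_offset and page_bounds are hypothetical helper names, not AdvFS utilities.

```shell
PAGE_SIZE=8192   # one AdvFS page, matching dd bs=8192 in the repair steps

# Print the zero-based page number that contains byte offset $1.
page_of_offset() {
    echo $(( $1 / PAGE_SIZE ))
}

# Print the first and last byte offsets (zero-based) covered by page $1.
page_bounds() {
    start=$(( $1 * PAGE_SIZE ))
    echo "$start $(( start + PAGE_SIZE - 1 ))"
}
```

For example, page_bounds 1 prints 8192 16383, the bytes at stake when verify reports page 1, and page_of_offset 10000 prints 1.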

The verify command, located in /sbin/advfs, can detect files that have been corrupted by this bug. Run verify, as in the following example, to see whether any files are corrupted. Then, optionally, run the command again with the -f flag to enable retrieval of the missing data.

# /sbin/advfs/verify test_domain

+++ Domain verification +++

Domain Id 32d3e638.000a46a0

Checking disks ...

Checking storage allocated on disk /dev/rz1a

Checking mcell list ...

Checking mcell position field ...

Checking tag directories ...

+++ Fileset verification +++

+++ Fileset test_fileset +++

Checking frag file headers ...

Checking frag file type lists ...

Scanning directories and files ...
Overlapping frag data corruption detected in:
File: <mount point>/50226.file.4
Page: 1
Run verify -f on this domain to enable recovery of this data.

Scanning tags ...

Searching for lost files ...

#

The verify utility has detected a corrupted file in the test_fileset fileset. The corrupted file is 50226.file.4, located in the top-level directory of the fileset (the fileset's mount point when it is mounted), and the corrupted page is page 1. The verify utility also suggests running verify again with the -f flag to enable recovery of the hidden data for page 1.
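On a domain with many files, it can help to extract the affected files and pages from a saved verify report. The following is a minimal awk-based sketch that assumes the report format shown above: a line reading "Overlapping frag data corruption detected in:" followed by "File: " and "Page: " lines. The name list_frag_corruption is hypothetical.

```shell
# Read saved verify output on standard input and print one line per
# overlapping frag corruption: the file path, a tab, and the page number.
# ASSUMPTION: the report uses the "File: " and "Page: " lines shown above.
list_frag_corruption() {
    awk '
        /^Overlapping frag data corruption detected in:/ { want = 1; next }
        want && /^File: / { file = substr($0, 7); next }
        want && /^Page: / { print file "\t" substr($0, 7); want = 0 }
    '
}
```

For example, save the report with /sbin/advfs/verify test_domain > verify.out 2>&1 and then run list_frag_corruption < verify.out. For the report above, it prints <mount point>/50226.file.4 and 1, separated by a tab.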

At this point, you have two choices:

Delete the file and re-create it. Because the bug has been fixed, the re-created file will not be affected by this corruption.
Execute verify -f to identify the corrupted data, as in the following example:

# /sbin/advfs/verify -f test_domain

+++ Domain verification +++

Domain Id 32d3e638.000a46a0

Checking disks ...

Checking storage allocated on disk /dev/rz1a

Checking mcell list ...

Checking mcell position field ...

Checking tag directories ...

+++ Fileset verification +++

+++ Fileset test_fileset +++

Checking frag file headers ...

Checking frag file type lists ...

Scanning directories and files ...
Overlapping frag data corruption detected in:
File: <mount point>/50226.file.4
Page: 1
Temporary files created representing the two versions of
page 1 of file <mount point>/50226.file.4
Refer to the Release Notes for a description
of how to use these temporary files to recover
from this overlapping frag corruption problem.
Scanning tags ...

Searching for lost files ...

#

The verify utility reports that it has created two temporary files in the same directory as the corrupted file. Mount the fileset to identify these two files:

# mount test_domain#test_fileset /test
# ls -l /test

total 169
drwx------   2 root     system      8192 Jan  8 13:23 .tags
-rw-r--r--   1 root     system     24576 Jan  9 12:27 50226.file.1
-rw-r--r--   1 root     system     40960 Jan  9 12:27 50226.file.2
-rw-r--r--   1 root     system     32768 Jan  9 12:27 50226.file.3
-rw-r--r--   1 root     system     24576 Jan  9 12:27 50226.file.4
-rw-------   1 root     system      8192 Jan 13 14:32 50226.file.4.page_1.ext
-rw-------   1 root     system      8192 Jan 13 14:32 50226.file.4.page_1.frag
-rw-r-----   1 root     operator    8192 Jan  8 13:23 quota.group
-rw-r-----   1 root     operator    8192 Jan  8 13:23 quota.user
#
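In a larger fileset, the temporary files may be spread across several directories, because verify creates them next to each corrupted file. A minimal sketch that lists them, assuming the naming shown above (file.page_n.ext and file.page_n.frag); list_page_files is a hypothetical name, and pruning the .tags directory is a precaution rather than a documented requirement.

```shell
# List the temporary page files that verify -f created under a mounted
# fileset, skipping the .tags directory.
# ASSUMPTION: names follow <file>.page_<n>.ext and <file>.page_<n>.frag,
# as in the listing above.
list_page_files() {
    find "$1" -name .tags -prune -o \
        -type f \( -name '*.page_*.ext' -o -name '*.page_*.frag' \) -print |
        sort
}
```

For the fileset above, list_page_files /test would print /test/50226.file.4.page_1.ext and /test/50226.file.4.page_1.frag.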

The corrupted file and the two temporary files contain the following:

50226.file.4
This is the original corrupted file.
50226.file.4.page_1.ext
This file contains the hidden version of page 1 of the corrupted file. A read() system call cannot retrieve this data.
50226.file.4.page_1.frag
This file contains the fragmented version of page 1 of the corrupted file. This is the same data that a read() of page 1 would return.
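Before choosing between the two versions, it can help to see how much of the page differs. A minimal sketch, assuming a POSIX cmp whose -l option prints one line per differing byte (a 1-based offset followed by the two byte values in octal); compare_pages is a hypothetical name. It prints the number of differing bytes and the first and last differing offsets within the page (zero-based), or 0 if the files are identical.

```shell
# Summarize how two page images differ: the number of differing bytes and
# the first and last differing offsets (zero-based), or 0 if identical.
# cmp -l prints one line per differing byte, starting with a 1-based offset.
compare_pages() {
    cmp -l "$1" "$2" 2>/dev/null | awk '
        NR == 1 { first = $1 }
        { last = $1 }
        END { if (NR) print NR, first - 1, last - 1; else print 0 }
    '
}
```

For example, run compare_pages 50226.file.4.page_1.ext 50226.file.4.page_1.frag, then use od -c on each file to inspect the bytes in that range.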

To fix the corrupted file, work in the directory that contains it (/test in this example):

1. View the .ext and .frag files to determine which one to keep; you may also want to merge the two. If the 50226.file.4.page_1.ext file contains the data you want, enter:
   # ln -s 50226.file.4.page_1.ext desired_page_1

   If the 50226.file.4.page_1.frag file contains the data you want, enter:
   # ln -s 50226.file.4.page_1.frag desired_page_1

   If you must merge the two files, do the merge and put the result into a new file called desired_page_1.
2. Use the corrupted file and the new file (desired_page_1 in this example) to create a fixed version of the corrupted file. Copy page 0 from the corrupted file into a new file:
   # dd if=50226.file.4 of=newfile bs=8192 count=1 > /dev/null 2>&1
   Append the page 1 that you selected to the new file:
   # dd if=desired_page_1 of=newfile bs=8192 count=1 seek=1 > /dev/null 2>&1
   Append the remainder of the original file to the end of the new file:
   # dd if=50226.file.4 of=newfile bs=8192 seek=2 skip=2 > /dev/null 2>&1

3. Compare the new file with the original (for example, with the diff or cmp command) to confirm that only page 1 has changed and that the difference is what you want.
4. Rename the new file and remove the temporary files:
   # mv newfile 50226.file.4
   # rm 50226.file.4.page_1.ext 50226.file.4.page_1.frag desired_page_1
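The three dd commands in the repair procedure generalize to any page number, and the comparison step can be made precise with cmp -l. The following is a minimal sketch that assumes 8 KB pages and a replacement page of exactly 8192 bytes; splice_page and changed_only_in_page are hypothetical names. It relies on the POSIX behavior that dd, without conv=notrunc, truncates the output file at the seek offset, which is also what keeps the earlier pages intact in the original commands.

```shell
PAGE_SIZE=8192   # one AdvFS page

# Write OUT as a copy of ORIG in which page PAGE is replaced by the first
# 8 KB of NEWPAGE: the same three dd steps as above, for any page number.
splice_page() {
    orig=$1 page=$2 newpage=$3 out=$4
    if [ "$page" -gt 0 ]; then
        # Pages 0 .. PAGE-1 come from the original file.
        dd if="$orig" of="$out" bs=$PAGE_SIZE count="$page" 2>/dev/null || return 1
    else
        : > "$out"           # nothing precedes page 0
    fi
    # The selected version of the corrupted page.  Without conv=notrunc,
    # dd truncates OUT at the seek offset, so earlier pages are kept.
    dd if="$newpage" of="$out" bs=$PAGE_SIZE count=1 seek="$page" 2>/dev/null || return 1
    # Every page after the corrupted one.
    next=$((page + 1))
    dd if="$orig" of="$out" bs=$PAGE_SIZE skip="$next" seek="$next" 2>/dev/null
}

# Succeed only if ORIG and OUT have the same size and every differing byte
# lies inside page PAGE.  cmp -l prints 1-based offsets, one per byte.
changed_only_in_page() {
    orig=$1 out=$2 page=$3
    [ $(wc -c < "$orig") -eq $(wc -c < "$out") ] || return 1
    lo=$(( page * PAGE_SIZE + 1 ))
    hi=$(( (page + 1) * PAGE_SIZE ))
    cmp -l "$orig" "$out" |
        awk -v lo="$lo" -v hi="$hi" '$1 < lo || $1 > hi { bad = 1 } END { exit bad }'
}
```

For the example in this section, splice_page 50226.file.4 1 desired_page_1 newfile && changed_only_in_page 50226.file.4 newfile 1 builds newfile and checks it before the mv command. Because dd writes out the zeros it reads from holes, the new file may no longer be sparse, and because dd creates it with default permissions, check its owner and mode before renaming it.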

Optionally, unmount the fileset and run the verify command on the domain again to confirm that the data corruption problem is gone.