This appendix provides information about how to solve problems you may encounter with the Advanced File System (AdvFS).
Typically, an AdvFS domain panic signals a hardware problem. However, it also may indicate a software failure, particularly in a domain that contains critical system files, such as the root file system.
The benefit of an AdvFS domain panic is that it prevents further user access to a single AdvFS file domain and allows the filesets in that domain to be unmounted and examined. All other AdvFS domains remain online and in normal operation, unaffected by the domain panic.
To recover an AdvFS domain from a domain panic, you need to collect as much information as you can about the condition of AdvFS filesets and metadata, in case there are software problems you need to correct or document for a problem report. Then, you need to take the steps that will let you run the verify utility on the domain in an attempt to check the integrity of its filesets.
To recover from an AdvFS domain panic, perform the following steps. If you cannot successfully complete steps 1-6, go to step 8:
1. Use the mount command to obtain a list of all the filesets in the domain. Then, use the umount command to unmount all of the filesets in the domain. For example:

   # mount -t advfs
   # umount fileset_name

   Note that the filesets in a domain must be unmounted before you run the verify utility that checks them. See the restrictions noted in the verify(8) reference page for more information.

2. Use the ls -l command on the /etc/fdmns directory for the domain to obtain a list of the AdvFS volumes in the domain. For example:

   # ls -l /etc/fdmns/staff_projects1

3. Use the vfile command to collect information about the metadata files for each volume in the domain, in case you cannot recover the domain with the verify operation. You need to record information about the bitfile metadata table (BMT), the storage bitmap (SBM), the root tag directory, and the transaction log file for each disk. See the vfile(8) reference page for additional information. For example:

   # /sbin/advfs/vfile 0 0 rz3c > bmt_rz3c
   # /sbin/advfs/vfile 0 1 rz3c > sbm_rz3c
   # /sbin/advfs/vfile 0 2 rz3c > tag_rz3c
   # /sbin/advfs/vfile 0 3 rz3c > log_rz3c

4. Use the dia utility to extract information about the domain panic from the binary error log, as documented in the dia(8) reference page.

5. Run the verify utility on all of the filesets in the domain. For example:

   # verify staff_projects1

6. If the verify utility exits successfully, mount all of the filesets you had unmounted in step 1. You can then resume normal operations. If the verify utility indicates that there is a problem, go to step 8.
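The information-gathering steps above lend themselves to a small helper script. The following is a minimal sketch, not part of AdvFS: the function names filesets_in_domain and vfile_commands are hypothetical, and the parser assumes that mount -t advfs prints lines whose first field has the form domain#fileset. The script only prints the commands from steps 1 and 3 so that you can review them before running anything.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, and nothing here touches a domain.

# Read `mount -t advfs` output on stdin and print the domain#fileset names
# that belong to the given domain. Assumes the first field of each line
# has the form "domain#fileset".
filesets_in_domain() {
    awk -v d="$1" 'index($1, d "#") == 1 { print $1 }'
}

# Print the four vfile commands from step 3 for one volume. The argument
# pairs (0 0 through 0 3) and the bmt/sbm/tag/log output names are copied
# from the example in step 3.
vfile_commands() {
    vol=$1
    i=0
    for name in bmt sbm tag log; do
        echo "/sbin/advfs/vfile 0 $i $vol > ${name}_$vol"
        i=$((i + 1))
    done
}
```

For example, mount -t advfs | filesets_in_domain staff_projects1 lists the filesets to unmount, and vfile_commands rz3c prints the four vfile commands for the rz3c volume.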
In versions of DIGITAL UNIX prior to Version 4.0D, a bug existed in AdvFS that could result in data corruption of user files. Only sparse files, those files that contain offsets at which no data has been stored, were affected by this bug.

The bug has been fixed, but files created using older versions of DIGITAL UNIX may still be corrupted. In these files, AdvFS may have stored two different versions of a particular page (an 8 KB segment). User intervention is necessary to correct this problem. The files must be recreated or fixed with the verify command.
The verify command, located in /sbin/advfs, can detect files that have been corrupted by this bug. Run verify, as in the following example, to see whether any files are corrupted. Then, optionally, run the command again with the -f flag to enable retrieval of the missing data.
# /sbin/advfs/verify test_domain
+++ Domain verification +++
Domain Id 32d3e638.000a46a0
Checking disks ...
Checking storage allocated on disk /dev/rz1a
Checking mcell list ...
Checking mcell position field ...
Checking tag directories ...
+++ Fileset verification +++
+++ Fileset test_fileset +++
Checking frag file headers ...
Checking frag file type lists ...
Scanning directories and files ...
Overlapping frag data corruption detected in:
    File: <mount point>/50226.file.4
    Page: 1
Run verify -f on this domain to enable recovery of this data.
Scanning tags ...
Searching for lost files ...
#
The verify utility has detected a corrupted file in the test_fileset fileset. The name of the file is 50226.file.4, and it is located in the uppermost directory of the fileset when it is mounted. The corrupted page is page 1. The verify utility also suggests running verify again, using the -f flag to enable recovery of the hidden data for page 1.
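When a domain is large, it can help to pull the corruption reports out of a saved verify log. The following is a minimal sketch under stated assumptions: the function name corrupt_pages is hypothetical, and the parser assumes the File: and Page: labels appear exactly as in the sample output above, with the page number immediately following Page:.

```shell
#!/bin/sh
# Sketch only: corrupt_pages is a hypothetical helper, not an AdvFS tool.

# Read verify output on stdin and print "file: page N" for each
# "File: ... Page: N" pair. Splitting the input into one word per line
# makes the parser indifferent to how the report is wrapped.
corrupt_pages() {
    tr -s '[:space:]' '\n' | awk '
        $0 == ""      { next }
        $0 == "File:" { in_file = 1; file = ""; next }
        $0 == "Page:" { in_file = 0; want_page = 1; next }
        in_file       { file = (file == "" ? $0 : file " " $0); next }
        want_page     { print file ": page " $0; want_page = 0 }'
}
```

For example, /sbin/advfs/verify test_domain > verify.log followed by corrupt_pages < verify.log prints one line per corrupted page that verify reported.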
At this point, you have two choices:
Run verify -f to identify the corrupted data, as in the following example:
# /sbin/advfs/verify -f test_domain
+++ Domain verification +++
Domain Id 32d3e638.000a46a0
Checking disks ...
Checking storage allocated on disk /dev/rz1a
Checking mcell list ...
Checking mcell position field ...
Checking tag directories ...
+++ Fileset verification +++
+++ Fileset test_fileset +++
Checking frag file headers ...
Checking frag file type lists ...
Scanning directories and files ...
Overlapping frag data corruption detected in:
    File: <mount point>/50226.file.4
    Page: 1
Temporary files created representing the two versions of
page 1 of file <mount point>/50226.file.4
Refer to the Release Notes for a description of how to use
these temporary files to recover from this overlapping frag
corruption problem.
Scanning tags ...
Searching for lost files ...
#
The verify utility reports that it has created two temporary files in the same directory as the corrupted file. Mount the fileset to identify these two files:
# mount test_domain#test_fileset /test
# ls -l /test
total 169
drwx------   2 root     system      8192 Jan  8 13:23 .tags
-rw-r--r--   1 root     system     24576 Jan  9 12:27 50226.file.1
-rw-r--r--   1 root     system     40960 Jan  9 12:27 50226.file.2
-rw-r--r--   1 root     system     32768 Jan  9 12:27 50226.file.3
-rw-r--r--   1 root     system     24576 Jan  9 12:27 50226.file.4
-rw-------   1 root     system      8192 Jan 13 14:32 50226.file.4.page_1.ext
-rw-------   1 root     system      8192 Jan 13 14:32 50226.file.4.page_1.frag
-rw-r-----   1 root     operator    8192 Jan  8 13:23 quota.group
-rw-r-----   1 root     operator    8192 Jan  8 13:23 quota.user
#
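If verify -f reported corruption in several directories, searching for the temporary files may be quicker than reading ls output. The following is a minimal sketch: the function name list_recovery_files is hypothetical, and the name patterns assume the file.page_N.ext and file.page_N.frag naming seen in the listing above.

```shell
#!/bin/sh
# Sketch only: list_recovery_files is a hypothetical helper.

# Print, in sorted order, every file under the given directory whose name
# matches the *.page_*.ext or *.page_*.frag pattern from the listing.
list_recovery_files() {
    find "$1" \( -name '*.page_*.ext' -o -name '*.page_*.frag' \) -print | sort
}
```

For example, list_recovery_files /test prints the paths of the two temporary files in the mounted fileset.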
The .ext and .frag files contain the following information from the corrupted area:

50226.file.4
    This is the original corrupted file.

50226.file.4.page_1.ext
    This file contains the hidden version of page 1 of the corrupted file. A read() system call cannot retrieve this data.

50226.file.4.page_1.frag
    This file contains the fragmented version of page 1 of the corrupted file. This is the same data that a read() of page 1 would return.
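Before choosing between the two versions of the page, it can be useful to know how much they differ. The following is a minimal sketch that uses the standard cmp command; the function name count_diff_bytes is hypothetical.

```shell
#!/bin/sh
# Sketch only: count_diff_bytes is a hypothetical helper.

# Print the number of byte positions at which two files differ.
# `cmp -l` writes one line per differing byte; if one file is shorter,
# cmp reports that on standard error, which is discarded here.
count_diff_bytes() {
    cmp -l "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}
```

For example, count_diff_bytes 50226.file.4.page_1.ext 50226.file.4.page_1.frag prints 0 if the two versions are identical, in which case either one can be used.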
To fix the corrupted file:
1. Examine the .ext and .frag files to determine which to keep. Note that you may want to merge the two files. If the 50226.file.4.page_1.ext file contains the data you want, enter:

   # ln -s 50226.file.4.page_1.ext desired_page_1

   If the 50226.file.4.page_1.frag file contains the data you want, enter:

   # ln -s 50226.file.4.page_1.frag desired_page_1

   If you must merge the two files, do the merge and put the result into a new file called desired_page_1.

2. Use the file that contains the data you want (desired_page_1 in this example) to create a new, fixed version of the corrupted file. Copy page 0 from the corrupted file into a new file, write the desired version of page 1 after it, and then copy the remaining pages from the corrupted file:

   # dd if=50226.file.4 of=newfile bs=8192 count=1 > /dev/null 2>&1
   # dd if=desired_page_1 of=newfile bs=8192 count=1 seek=1 > /dev/null 2>&1
   # dd if=50226.file.4 of=newfile bs=8192 seek=2 skip=2 > /dev/null 2>&1

3. Run the diff command on the new and the original file to confirm that only page 1 has changed and that the difference is what you want.

4. Replace the corrupted file with the new file and remove the temporary files:

   # mv newfile 50226.file.4
   # rm 50226.file.4.page_1.ext 50226.file.4.page_1.frag desired_page_1
If you want, you can now run the verify command on the domain again to confirm that the data corruption problem is gone.
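The dd sequence in the repair procedure can be generalized to any page number. The following is a minimal sketch, not a supported tool: the function names rebuild_page and changed_pages are hypothetical, the 8192-byte page size is taken from the text, and the replacement page file is assumed to hold exactly one full page. The changed_pages helper uses cmp -l rather than diff because it reports byte offsets, which makes it possible to name the pages that differ.

```shell
#!/bin/sh
# Sketch only: rebuild_page and changed_pages are hypothetical helpers.
PAGE_SIZE=8192   # AdvFS page size given in the text

# rebuild_page ORIG PAGE_FILE PAGE_NO OUT
# Write OUT as a copy of ORIG with page PAGE_NO taken from PAGE_FILE.
rebuild_page() {
    orig=$1; page_file=$2; n=$3; out=$4
    if [ "$n" -gt 0 ]; then
        # Copy the pages that precede the corrupted page.
        dd if="$orig" of="$out" bs="$PAGE_SIZE" count="$n" 2>/dev/null
    else
        : > "$out"   # page 0 is being replaced, so start with an empty file
    fi
    # Write the chosen version of the page at its original position.
    dd if="$page_file" of="$out" bs="$PAGE_SIZE" count=1 seek="$n" 2>/dev/null
    # Copy the pages that follow the corrupted page.
    dd if="$orig" of="$out" bs="$PAGE_SIZE" \
        skip=$((n + 1)) seek=$((n + 1)) 2>/dev/null
}

# changed_pages OLD NEW
# Print the number of each page that contains at least one differing byte.
# cmp -l reports 1-based byte offsets, so offset 8193 falls in page 1.
changed_pages() {
    cmp -l "$1" "$2" 2>/dev/null |
        awk -v ps="$PAGE_SIZE" '
            { p = int(($1 - 1) / ps) }
            !(p in seen) { seen[p] = 1; print p }'
}
```

For example, rebuild_page 50226.file.4 desired_page_1 1 newfile followed by changed_pages 50226.file.4 newfile should print only the page number 1 if the rebuild touched nothing else. Review the result before replacing the original file.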