http://www.debian-administration.org/users/Grimnar/weblog/27#comment_5
With drives disappearing/reappearing I'd be tempted to say you have a pending drive failure, or other hardware problem.
Run dmesg and look for disk errors with something like this:
dmesg | grep hd
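If your disks show up as /dev/sd? rather than /dev/hd? (SATA, SCSI, or newer kernels), the grep above will miss them. A minimal sketch that catches both naming schemes; the function name disk_messages is made up for this example, and the pattern is only an assumption about how device names appear in your kernel log:

```shell
# Hypothetical helper: keep only log lines that mention an hdX or sdX device.
# The "(^|[^a-z])" part stops the pattern from matching inside longer words.
disk_messages() {
    grep -E '(^|[^a-z])(hd|sd)[a-z]'
}

# Usage sketch (commented out so the block only defines the function):
# dmesg | disk_messages
```

Anything that looks like an I/O error, a bus reset, or a timeout in the filtered lines is worth a closer look.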
Steve
I agree with Steve, above, about pending (or existing) disk failures. You might want to look into the smartmontools suite as well. You can schedule tests for your disks, and query them directly about their status.
Write back with any info from smartctl when you've got it.
To run a bunch of short tests on all your disks, just do:
for disk in /dev/hd? ; do smartctl -t short "$disk" ; done
There should be some messages printed about what time you should expect the tests to complete. Once the tests have completed (probably 5 minutes max for most short tests, but it depends on the disks), you can read the info with:
for disk in /dev/hd? ; do
echo "===${disk}==="
smartctl -a "$disk"
done > diskreports.txt
You should then be able to read the diskreports.txt file to see what the disks have to say for themselves.
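With several disks the report gets long. As a rough sketch, something like the following pulls out just the per-disk headers and the lines that usually carry the verdict. The function name summarize_reports is made up here, and the patterns are assumptions based on typical smartctl -a output, so adjust them for your drives:

```shell
# Hypothetical helper: print the "===/dev/hdX===" headers from the report,
# plus the overall-health line, the error-log summary, and the numbered
# self-test log entries. Defaults to diskreports.txt if no file is given.
summarize_reports() {
    grep -E '^===.*===$|overall-health|Errors Logged|^# *[0-9]+ ' "${1:-diskreports.txt}"
}

# Usage sketch (commented out so the block only defines the function):
# summarize_reports diskreports.txt
```

This is only a quick filter; the full report is still the place to look if anything seems off.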
For a single disk, of course, the commands are even simpler:
smartctl -t short /dev/hdX
## wait until the suggested time
smartctl -a /dev/hdX
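If you'd rather not watch the clock, the wait can be scripted. A minimal sketch, assuming smartctl prints a "Please wait N minutes" line when the test starts; the function name suggested_minutes is made up for this example, and the 5-minute fallback is just a guess for when that line is missing:

```shell
# Hypothetical helper: read the output of "smartctl -t short" on stdin and
# print the number of minutes from its "Please wait N minutes" line.
# Prints nothing if no such line is found.
suggested_minutes() {
    sed -n 's/.*Please wait \([0-9][0-9]*\) minutes.*/\1/p' | head -n 1
}

# Usage sketch (commented out; needs root and a real disk in place of /dev/hdX):
# mins=$(smartctl -t short /dev/hdX | suggested_minutes)
# sleep "$(( ${mins:-5} * 60 ))"   # fall back to 5 minutes if nothing was parsed
# smartctl -a /dev/hdX
```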
documented on: 29 Sep 2007, dkg
% smartctl -t short /dev/sda; sleep 2m; printf '\a'
smartctl version 5.37 [i686-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
Sending command: "Execute SMART Short self-test routine immediately in off-line mode".
Drive command "Execute SMART Short self-test routine immediately in off-line mode" successful.
Testing has begun.
Please wait 2 minutes for test to complete.
Test will complete after Sat Mar 29 15:46:13 2008
Use smartctl -X to abort test.
% smartctl -a /dev/sda
Short self-test routine recommended polling time: ( 2) minutes.
SMART Error Log Version: 1
No Errors Logged
SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Short offline       Completed without error       00%     27420         -
documented on: 2008-03-29