[TriLUG] Help with a "broken" LVM drive
Brian McCullough via TriLUG
trilug at trilug.org
Wed Oct 20 15:40:06 EDT 2021
Folks,
The more that I work on this, the more that I feel that paranoia is the
better part of valour.
I recently ( a week ago or so ) had a drive go bad on me, bad blocks,
unreadable sectors, the works.
Because of that, the machine will not boot: it stops part-way through
the boot process and asks for the root password.
I removed the drive, bought a "work" drive and started with ddrescue.
According to fdisk, there are three partitions on this drive: a Windows
partition of more than half the size ( which is a bit strange, since it
would have had no use in one of my machines ), a 300GB LVM partition,
and a Linux partition of about 30GB. OK, it has been so long since I
first set up this drive that I have to accept this, I guess. It is a
3TB portable Seagate, so I can accept that it was initially
pre-formatted as Windows.
Using ddrescue, I was apparently able to recover both the LVM and Linux
partitions ( separately ).
When I ran ddrescue on the LVM partition, it ran for some hours and then
completed, recording zero errors.
I found some data at the beginning of the LVM partition that indicated
that it had been an LVM partition. I found the string " LVM2 " at just
after 0x01000, and something that looks a lot like a ( partial )
vgconfig file at 0x01200, even if that seems to start part-way into that
file.
I don't find any evidence of LVM on either of the other two partitions.
I did not find the "LABELONE" tag which should be at the beginning of a
PV, just zeros. ( possibly part of the ddrescue process??? )
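
A minimal sketch of the signature check described above, for anyone who
wants to repeat it on a ddrescue image. It assumes the usual LVM2 layout:
the "LABELONE" label in one of the first four 512-byte sectors, and the
metadata-area magic ( which begins with " LVM2 " ) just after 0x1000. The
function name and the image path in the usage note are made up for
illustration.

```python
SECTOR = 512
LABEL_ID = b"LABELONE"   # PV label, expected in one of the first four sectors
MDA_MAGIC = b" LVM2 "    # beginning of the metadata-area magic string


def inspect_pv_image(path):
    """Report where LVM2 signatures appear at the start of an image file."""
    with open(path, "rb") as f:
        head = f.read(0x2000)  # the first 8 KiB is enough for this check

    # Look for the label at the start of each of the first four sectors.
    label_sector = None
    for n in range(4):
        start = n * SECTOR
        if head[start:start + len(LABEL_ID)] == LABEL_ID:
            label_sector = n
            break

    # Look for the metadata-area magic between 0x1000 and 0x1200.
    magic_at = head.find(MDA_MAGIC, 0x1000, 0x1200)

    return {
        "label_sector": label_sector,
        "mda_magic_offset": magic_at if magic_at != -1 else None,
    }
```

Called as inspect_pv_image("lvm_partition.img"), a result with
"label_sector" of None and a "mda_magic_offset" just past 0x1000 would
match what is described here: the metadata area is present but the label
sector is blank.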
OK, that's the preamble. ( Sorry )
Since I am able to start to boot the machine, I was able to get to
/etc/lvm/backup and retrieve the appropriate vgconfig file that this
partition should be part of. In the data on the partition itself, I
found what should probably be the UUID for this PV. What bothers me is
that I do not find that UUID in the current vgconfig backup file. (
Well, one of the things that bothers me. ) That UUID does show up in
archive versions of that file. However, if that UUID is no longer used,
why does the machine not boot? Oh, well.
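
The comparison between the current backup and the archived versions can
be sketched in a few lines of Python. This is only an illustration: the
function name is invented, and it simply looks for the UUID as a plain
string in every file of the directories it is given.

```python
from pathlib import Path


def find_uuid(uuid, directories):
    """Return (file, line number, line) for every line that mentions uuid."""
    hits = []
    for directory in directories:
        base = Path(directory)
        if not base.is_dir():
            continue  # skip directories that do not exist
        for path in sorted(base.iterdir()):
            if not path.is_file():
                continue
            text = path.read_text(errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if uuid in line:
                    hits.append((str(path), lineno, line.strip()))
    return hits
```

Run as root ( the directories are normally not world-readable ), a call
such as find_uuid(pv_uuid, ["/etc/lvm/backup", "/etc/lvm/archive"]) would
show which files still mention the UUID. /etc/lvm/archive is assumed here
to be where the archive versions live; pv_uuid stands for the UUID read
from the partition.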
I also purchased a drive to replace that partition, and used ddrescue to
copy the contents to it. However, at the moment, the partition is not
recognized as a PV.
If I use the UUID that I found on that partition and run "pvcreate
--uuid" should I use "--restorefile" or "--norestorefile"? I tried this
with --restorefile, using the data that I dd'd from the partition, which
turned out to be an incomplete vgconfig file, and it overwrote what was
on the partition at 0x01200, so I stopped at that point. However, now
that I have been examining things, including the vgconfig backup file
from the machine, I think that it would probably have been successful
if I had been using the proper vgconfig file.
Now that I have talked through all of that, I am wondering whether I am
possibly heading down the correct path, and have some hope of restoring
this machine to operation?
Do you have any further suggestions for tests or things that I should be
doing, or whether I should just go ahead with pvcreate? If I run
pvcreate on a different machine ( where I am doing all of my recovery
work ), I suspect that I should not run vgcfgrestore until I re-attach
the new drive to the original machine, correct?
( The instructions that I have been reading say to run pvcreate followed
by vgcfgrestore. )
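
For reference, the sequence under discussion can be written out as a
dry run. The sketch below only builds and prints the command lines; it
executes nothing. The UUID, backup file, device and VG name are all
placeholders, and the first command uses LVM's --test option, which is
meant to go through the motions without updating metadata. Whether this
is the right sequence for this drive is exactly the open question.

```python
import shlex


def recovery_commands(pv_uuid, backup_file, device, vg_name):
    """Build (but do not run) the pvcreate / vgcfgrestore command lines."""
    return [
        # Trial run first: --test should not write anything to the device.
        ["pvcreate", "--test", "--uuid", pv_uuid,
         "--restorefile", backup_file, device],
        # Recreate the PV label with the old UUID, guided by the backup file.
        ["pvcreate", "--uuid", pv_uuid,
         "--restorefile", backup_file, device],
        # Restore the volume group metadata from the same backup file.
        ["vgcfgrestore", "--file", backup_file, vg_name],
    ]


# All four arguments below are hypothetical placeholders.
for cmd in recovery_commands(
    "XXXXXX-XXXX-XXXX-XXXX-XXXX-XXXX-XXXXXX",
    "/etc/lvm/backup/myvg",
    "/dev/sdX2",
    "myvg",
):
    print(shlex.join(cmd))
```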
If there is anything more that I can say, answer questions or perform any
more experiments, just say so.
Thank you,
Brian