Too Busy For Words - the PaulWay Blog

Tue 21st Nov, 2006

Not learning the hard way

It's now mid Sunday, and I'm mentally and physically wrecked. This is partly due to a throat infection thing that's going around, and partly due to weird LVM stuff, and partly due to sheer bloody-minded stupidity.

It started a couple of days ago, when I took a day off because of a sore throat. I decided, finally, to upgrade my MythTV machine to Fedora Core 6, and in the process remove the old boot drive and change over to a new Logical Volume (LV) under LVM. After several earlier attempts at this problem, I'd decided to use a mirrored LV to store the root volume on. Luckily I had three disks - mirroring in LVM requires a disk per mirror and an extra for the 'transaction log' - and I set it up, copied the old root file system to the new mirror, and that was enough for Fedora Core to recognise in order to install.

But, after a couple of mysterious crashes that ended up in file system checks throwing pages and pages of errors, I started really wondering. LVM is wonderful and stable and allows you to agglomerate disks in ways you would otherwise pay lots of money for hardware solutions to achieve, but my experience so far is that when it goes bad, it starts getting rather difficult to recover. Since the root file system was stored in a way that I wasn't sure I could ever recover if one disk went bad - all FAQs and HowTos to the contrary - I decided to go back to plain old partitions.

I bought a 400GB disk for $199 at Aus PC Market with the intention of pensioning the 160GB drive off in the MythTV machine, and giving it a bit more recording headroom. But for various otiose reasons Friday knocked me out and I started feeling very congested and the sore throat had returned from Wednesday. Unable to sleep, I put the new drive in the MythTV machine, partitioned it, copied all the files from the old root file system across, and booted it - it came up fine. In a fit of what seemed at the time to be inspiration but I now know to be a madness brought on by addiction to Lemsip, I also decided to move the data off one of the 250GB drives temporarily so I could partition it.

Long ago when I was setting up the system, I had realised that LVM PVs can be created on the raw disk device as well as in partitions. This sounded like a brilliant idea - no partition to worry about, LVM could put LVs on it anyway, and one less command to perform. Interestingly, you also get about 96MB of extra space. However, this decision has come back to haunt me.

Firstly, back when I was first trying to eliminate the old 40GB disk, I wanted to have a three-way RAID. LVM doesn't do that, but MD does. But you need three partitions the same size. I couldn't repartition /dev/hdb because, well, there wasn't a partition on there to alter. So that idea eventually went out the window.

Now, I thought, I could lay the problem to rest. I had used pvmove before to move space off a SATA disk that I'd bought without knowing that (at the time, at least) the way SATA drives are accessed also causes my DVB cards to stutter (I think it's something to do with DMA, but I haven't traced this down). So, innocently, I issued pvmove /dev/hdb /dev/hda3.

Nothing happened. It wouldn't respond to Ctrl-C or Ctrl-Z (although other characters, uselessly, came up fine). Then every process that tried to access the LVM also seized up. "OK," I thought, "reboot and it'll be fine." But no: rebooting threw up a bunch of errors about a bad LVM state and the kernel panicked. It's 5AM and I'm not feeling well and I have a dead MythTV machine - brilliant.

Of course, to add to my complications, I had returned to the old four-drive problem - I had to unplug one of the LVM drives (and thus render the LVM inoperable) in order to plug in the DVD drive to install something. I had the old MythTV partition still backed up in LVM (hopefully), so I reinstalled Fedora Core 6 from scratch (after a bunch of fruitless searching about how to disable the LVM checks at boot-up - it's possible, but you have to edit the nash init script and repack your initrd image and even then it didn't work perfectly; I was hoping for a nice kernel command-line option). Oh, and I had to install in Text mode because I didn't feel like lugging the monitor from downstairs, and even though the NVidia GeForce 5200 will display boot-up on all monitors and TV sets you have plugged in, it won't thereafter show any graphical modes on the TV without options in the Xorg config. Yay.

The new Fedora Core install allowed me to do a pvmove --abort, which then allowed me to see the storage VG and the old root VG. "Hell," I thought, "while I'm here I'll just rebuild the thing from scratch - I've got too much ATrpms cruft in there anyway." That merrily ate up the hours from six until nine - copying config across, setting daemons to start, turning unwanted services off, updating the repository config with local mirrors, getting the video drivers working again, and so forth.

That night, I woke up for otiose reasons at about four in the morning. Unable to get back to sleep, I decided to look at the config again. The wool in my head and the nettles in my throat made me decide that retrying the pvmove command would be perfectly reasonable - it must have been a temporary glitch. This time, just in case, I dd'd the entire newly-created partition over to another system on my network, created a new 'old root' LV that wasn't striped, mirrored or afraid of water, copied the old 'old root' LV over to that, and removed the old one just in case it was something to do with the mirroring that had caused LVM to bork out. Now secure in my preventative measures, I issued the pvmove command again.

Same result. System locked up.

I rebooted, this time using the System Rescue CD, which allowed me to see the network and the partitions. Right, copy the dd image back again, and reboot... Nope, same problem. Worse, now the LVM partition on /dev/hda3 doesn't exist. Hmmmm. This is bad. Hmmmmm. /dev/hda3 sounds familiar - with that growing horror that computer problems specialise in, I realise that I copied the 20GB partition to /dev/hda3 (the LVM PV) rather than /dev/hda2 (the ext2 file system). Bugger. I can boot, and everything runs, but now the VG won't come up because one of its PVs is AWOL.
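
In hindsight, a small guard in front of that dd would have caught the mistake. The following is only a sketch, not something I ran at the time: it assumes the LVM2 label is the string LABELONE sitting somewhere in the first four 512-byte sectors of a PV, and the function names has_lvm_label and safe_restore are made up for illustration.

```shell
# has_lvm_label TARGET: succeed if TARGET looks like an LVM2 physical volume.
# Assumption: the LVM2 label string "LABELONE" appears within the first four
# 512-byte sectors of a PV.
has_lvm_label() {
    dd if="$1" bs=512 count=4 2>/dev/null | grep -qa 'LABELONE'
}

# safe_restore IMAGE TARGET: copy IMAGE onto TARGET, but refuse if TARGET
# still carries an LVM label (i.e. it is probably the wrong partition).
safe_restore() {
    if has_lvm_label "$2"; then
        echo "refusing: $2 looks like an LVM PV" >&2
        return 1
    fi
    dd if="$1" of="$2" bs=64k
}
```

Called as safe_restore with the image file and /dev/hda2 it would just copy; pointed at /dev/hda3 it should bail out before touching the PV.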

I tried grabbing the first couple of sectors of another PV and inserting the correct UUID (which, fortunately, the VG still knows about and includes in its complaints) in the correct spot (after a bit of guesswork - thank Bram Moolenaar for the binary editing capabilities of vim). Nup, no luck - didn't think I could fool it that easily. No-one in any of the IRC channels I was in could offer any assistance (#lvm on freenode is usually quiet as a grave anyway).

One of my worst habits is the way I avoid any problem that's stumped me a bit. Several games of Sudoku, Spider and Armagetron and a lot of idle chatting on various IRC channels later, I was still no nearer a solution. Then, realising that no-one was going to help me and I had to do it myself, I probed around in the options of pvcreate, and found I could specify a UUID. Brilliant! Suddenly the PV, VG and LV were back on the air. Five hours after I'd woken up, I collapsed back into bed. It was Sunday. (At this point, LVM hadn't put anything permanently in the /dev/hda3 PV, so it was merely a question of making sure it was included.)
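
For anyone in the same hole, the recovery looks roughly like the sketch below. It is not a transcript of what I typed: it assumes LVM still has a metadata backup of the VG under /etc/lvm/backup (newer versions of pvcreate insist on --restorefile alongside --uuid), and the UUID and VG name shown are placeholders. The helper recreate_pv_cmds is hypothetical and only prints the commands, so they can be checked before being run as root.

```shell
# recreate_pv_cmds UUID DEVICE VG: print (not run) the commands to recreate a
# lost PV label with its old UUID and restore the VG metadata on top of it.
# Assumption: /etc/lvm/backup/VG holds a usable metadata backup for the VG.
recreate_pv_cmds() {
    uuid=$1
    dev=$2
    vg=$3
    # Recreate the PV label with the UUID the VG is complaining about.
    echo "pvcreate --uuid $uuid --restorefile /etc/lvm/backup/$vg $dev"
    # Put the VG metadata back, then activate its LVs.
    echo "vgcfgrestore $vg"
    echo "vgchange -ay $vg"
}

# Placeholder UUID and VG name, purely for illustration.
recreate_pv_cmds 'XXXXXX-XXXX-XXXX-XXXX-XXXX-XXXX-XXXXXX' /dev/hda3 storage
```

The real UUID is the one the VG names when it complains about the missing PV.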

That afternoon, I made sure that MythTV was going to update its programme guide and relaxed, watching a few TV shows. It seemed an uncommon luxury.


All posts licensed under the CC-BY-NC license. Author Paul Wayper.

Main index / tbfw/ - © 2004-2016 Paul Wayper