Too Busy For Words - the PaulWay Blog

Jumping into a new project

(Nearly all of this was written after the PSIG meeting on the 9th of November; then I got too busy and didn't finish it off. So "tonight" is two weeks ago as of this posting.)

Tonight at the Programmer's SIG we were 'supposed' to be having a sort of round-table discussion, with people who have ideas meeting up with people who know how to implement them. Or, at least, with people who have more knowledge of the way Linux is organised and could recommend language choices, libraries to look for and people to speak to. If any of those people had actually turned up, this would have happened. But they didn't.

After the usual early round of "Hey, have you seen this cool stuff / weird shit" as meals were served (amazingly quickly, this time), I tried to jump-start things by asking what people's ideas were. Maybe it was just me, but this didn't seem to get any real discussion started. Conversation kept revolving around Pascal Klein's idea of rewriting the Linux kernel in C#, and the manifold reasons why this would be a Bad Thing. As amusing as it is to discuss bad language choices, the things we hate about customers and what's new on Slashdot, this wasn't really doing it for me as someone who a) has ideas and b) is a programmer.

Despite the good nature of Steve Walsh's teasing, I do worry that I'm talking too much about my own ideas. I say this because we then had a long and quite spirited discussion about how to solve a problem with my backup process. It started with me noting that I'd thought of an improvement to rsync:

At the moment, rsync will only try to synchronise changes to a file if the destination directory already has a file with that name. If you've renamed the file, or copied it into a new directory, then rsync (AFAIK) won't recognise that and will copy the entire file again. However, rsync already has a mechanism for recognising which files are the same: it can generate a checksum for each file it compares, and only copies the file if the checksums differ. So the idea is for the receiver to check whether it already has a file with that checksum somewhere else, and copy that locally rather than receive the whole file again. There's more to it than this, but I'll develop that in another post.
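
A minimal sketch of the receiver-side idea, under loud assumptions: it uses whole-file MD5 digests from Python's hashlib (rsync's own checksums and protocol work differently), and `build_checksum_index`, `place_file` and the `fetch_from_sender` callback are hypothetical names for illustration, not anything in rsync. The receiver indexes every file it already holds by checksum; when the sender announces a file, the receiver copies a matching local file if it has one and only asks for the data otherwise.

```python
from __future__ import annotations

import hashlib
import os
import shutil
from pathlib import Path
from typing import Callable


def file_checksum(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the MD5 hex digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_checksum_index(root: Path) -> dict[str, Path]:
    """Map each checksum found under root to the first file seen with it."""
    index: dict[str, Path] = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            index.setdefault(file_checksum(path), path)
    return index


def place_file(
    index: dict[str, Path],
    dest_root: Path,
    rel_path: Path,
    checksum: str,
    fetch_from_sender: Callable[[Path], bytes],  # hypothetical network fetch
) -> str:
    """Put the file the sender announced at dest_root/rel_path.

    Reuses a local file with the same checksum when one exists, so a
    renamed or copied file costs a local copy instead of a transfer.
    """
    target = dest_root / rel_path
    existing = index.get(checksum)
    if existing == target:
        return "unchanged"
    target.parent.mkdir(parents=True, exist_ok=True)
    if existing is not None:
        shutil.copy2(existing, target)  # local copy: no file data sent
        result = "copied locally"
    else:
        target.write_bytes(fetch_from_sender(rel_path))
        result = "transferred"
    index.setdefault(checksum, target)
    return result
```

The sketch only covers the copy half; removing a renamed file's old name afterwards is left out, as is guarding against two different files that happen to share a checksum.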

This all supports my partner's method of backing up her PhD: every once in a while, she takes all the files so far and copies them into a directory named 'Backup date'. Separately from this, I rsync her entire directory up to my brother's machine in Brisbane as an off-site backup. I'm not especially worried about the time it takes or the amount of data transferred, but since rsync's principal aim is to reduce both of these, I thought it would be a useful improvement to optimise for the case where a file has been renamed on the client. Why transmit the whole file again if you can just copy and delete on the server?
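
For reference, the snapshot half of that routine might look like this in Python. The 'Backup YYYY-MM-DD' naming is an assumption (the post only says 'Backup date'), and `snapshot` is a hypothetical helper rather than anything she actually runs.

```python
from __future__ import annotations

import datetime
import shutil
from pathlib import Path

SNAPSHOT_PREFIX = "Backup "  # assumed naming: "Backup YYYY-MM-DD"


def snapshot(work_dir: Path, day: datetime.date | None = None) -> Path:
    """Copy everything in work_dir, except earlier snapshots, into a dated folder."""
    if day is None:
        day = datetime.date.today()
    target = work_dir / f"{SNAPSHOT_PREFIX}{day.isoformat()}"
    target.mkdir()  # raises FileExistsError if this date's snapshot exists
    for entry in sorted(work_dir.iterdir()):
        if entry.name.startswith(SNAPSHOT_PREFIX):
            continue  # skip snapshots, including the one just created
        if entry.is_dir():
            shutil.copytree(entry, target / entry.name)
        else:
            shutil.copy2(entry, target / entry.name)
    return target
```

Every run stores another full copy of each unchanged file under a new path, which, as described above, plain rsync would then send to the remote machine in full.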

I suppose the thing I enjoyed was the idea of co-operatively solving a problem using the tools at everyone's disposal. Several people suggested that revision control systems would be better in this scenario, because they would only store the diffs and would allow instant reversion to any point in time. Others suggested automated folders that would pick up the files in a 'drop' directory, put them in an appropriately labelled directory, and then start a remote copy of that folder on the remote server. Still others suggested that having two backups was overkill: as long as the remote server was kept up to date, I could retrieve backup copies from it should anything go wrong locally. All of these were good suggestions, and even though they didn't solve the problem the way I wanted it solved, I really did appreciate the new ideas and approaches.
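
To illustrate the 'only store the diffs' suggestion, Python's difflib can produce the delta between two revisions of a file. This is a toy illustration of the idea, not how any particular revision control system stores its history, and `revision_delta` is a hypothetical name.

```python
import difflib


def revision_delta(old_text: str, new_text: str, name: str) -> str:
    """Return a unified diff recording how old_text became new_text."""
    return "".join(
        difflib.unified_diff(
            old_text.splitlines(keepends=True),
            new_text.splitlines(keepends=True),
            fromfile=f"{name} (previous)",
            tofile=f"{name} (current)",
        )
    )


# Usage: a one-line edit to a 50-line file yields a delta much smaller
# than a second full copy of the file.
old = "".join(f"line {i}\n" for i in range(50))
new = old.replace("line 5\n", "line five\n")
delta = revision_delta(old, new, "chapter1.txt")
print(delta)
```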

That led me to my next question. rsync is a largish and complicated piece of software. The philosophy of Open Source says that if you have an idea, you should modify the source rather than ask someone else to do it; and I can program in C, so the source of rsync wouldn't be foreign to me. So where do I start? One suggested approach was to generate a tags file and start tracing through the execution of the main routine; another was to find the text messages that are printed at the point where I want my change to take effect, and start reading from there. A further approach was to draw a concept map: sketch out the top-down design of rsync in order to narrow down the code I had to read. All excellent suggestions, and when I have some spare time I shall try them.
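
The 'start from the printed messages' approach amounts to searching the source tree for a fragment of a message rsync prints. `grep -rn` does this directly; the Python sketch below, with a hypothetical `find_message` helper, does the same for .c and .h files and reports the file and line number of each match.

```python
from __future__ import annotations

import os
from pathlib import Path


def find_message(
    source_root: Path, fragment: str, suffixes: tuple[str, ...] = (".c", ".h")
) -> list[tuple[Path, int, str]]:
    """Return (file, line number, line) for each source line containing fragment."""
    hits: list[tuple[Path, int, str]] = []
    for dirpath, dirnames, filenames in os.walk(source_root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            if not name.endswith(suffixes):
                continue
            path = Path(dirpath) / name
            with open(path, encoding="utf-8", errors="replace") as handle:
                for lineno, line in enumerate(handle, start=1):
                    if fragment in line:
                        hits.append((path, lineno, line.strip()))
    return hits
```

Searching for a distinctive fragment works better than the whole message, since printed text is often assembled from format strings (with `%s` and the like) and may not appear verbatim in the source.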

Then we had some real nuts-and-bolts stuff: I showed Hugh how to do Doxygen documentation, and Daniel showed me a bit about autoconf/automake and how to integrate them into my coding. He also suggested a technique for checking at runtime whether a library (e.g. libmagic) exists, in order to decide whether to call the libmagic routines to check file type; unfortunately, I can't now remember what this magical call was. I should have written this up nine days ago.
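
The post doesn't record the call Daniel suggested, so what follows is one possible approach, not necessarily his. In C the usual route is dlopen(3)/dlsym(3); this sketch uses Python's ctypes, which wraps the same mechanism on POSIX systems, to look for libmagic and fall back gracefully when it is missing. It assumes libmagic's C functions `magic_open`, `magic_load`, `magic_file` and `magic_close`; `load_optional_library` and `describe_file_type` are hypothetical names.

```python
from __future__ import annotations

import ctypes
import ctypes.util


def load_optional_library(name: str) -> ctypes.CDLL | None:
    """Return a handle to lib<name> if it can be found and loaded, else None."""
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        return None


LIBMAGIC = load_optional_library("magic")
if LIBMAGIC is not None:
    # Declare the C signatures so ctypes passes pointers correctly.
    LIBMAGIC.magic_open.argtypes = [ctypes.c_int]
    LIBMAGIC.magic_open.restype = ctypes.c_void_p
    LIBMAGIC.magic_load.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    LIBMAGIC.magic_load.restype = ctypes.c_int
    LIBMAGIC.magic_file.argtypes = [ctypes.c_void_p, ctypes.c_char_p]
    LIBMAGIC.magic_file.restype = ctypes.c_char_p
    LIBMAGIC.magic_close.argtypes = [ctypes.c_void_p]
    LIBMAGIC.magic_close.restype = None


def describe_file_type(path: str) -> str:
    """Ask libmagic for a description of path, or say why we can't."""
    if LIBMAGIC is None:
        return "unknown (libmagic not available)"
    cookie = LIBMAGIC.magic_open(0)  # 0 is MAGIC_NONE: default behaviour
    if not cookie:
        return "unknown (magic_open failed)"
    try:
        if LIBMAGIC.magic_load(cookie, None) != 0:  # None loads the default database
            return "unknown (no magic database)"
        result = LIBMAGIC.magic_file(cookie, path.encode())
        return result.decode(errors="replace") if result else "unknown"
    finally:
        LIBMAGIC.magic_close(cookie)
```

The point of the design is that the program still runs on machines without libmagic; it just reports less about each file.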

It started out not looking so good, but I think it was one of the better Programming SIGs I've been to.

P.S. I've also learnt tonight that, if my WiFi is connecting and then almost immediately disconnecting after showing no signal strength, unloading and reloading the kernel module (after stopping the ipw3945d service) will reset it; starting the ipw3945d service again then gets things back on track. Or so it would seem from this initial test.
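
Written down as a script, the sequence is: stop the daemon, unload and reload the driver, then start the daemon again. The post doesn't name the module, so `ipw3945` is an assumption (the driver that went with the ipw3945d daemon), as is using the `service` and `modprobe` commands; check both on your own system. The sketch defaults to a dry run that only prints the commands.

```python
from __future__ import annotations

import subprocess

# Assumptions: the module name and the SysV-style "service" command.
DRIVER_MODULE = "ipw3945"
DAEMON_SERVICE = "ipw3945d"


def reset_steps(module: str = DRIVER_MODULE, service: str = DAEMON_SERVICE) -> list[list[str]]:
    """The reset sequence from the post, as argument lists, in order."""
    return [
        ["service", service, "stop"],   # stop the daemon first
        ["modprobe", "-r", module],     # unload the kernel module
        ["modprobe", module],           # load it again
        ["service", service, "start"],  # restart the daemon
    ]


def reset_wifi(dry_run: bool = True) -> list[list[str]]:
    """Print each step (dry run) or run it; real runs need root."""
    steps = reset_steps()
    for cmd in steps:
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)  # stop at the first failing step
    return steps
```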


Thu 9th Nov, 2006