Too Busy For Words - the PaulWay Blog

Fri 1st Sep, 2006

Rocket and Jifty

Work on my Rocket module continues slowly. Slowly, because I've discovered yet another project - a database of music, instructions and notes on Irish Set Dances. It's an extrapolation of the spreadsheet I had that showed which sets were on which CDs and whether I had instructions for them; now it's fully relational so you can have multiple CDs and multiple instruction sources for each set, as well as recording the other tracks on each CD so that if you need waltzes (for example) you know which CDs they're on. Now that I've got the database structure set up, I've started using Rocket again to do the basic CGI work, so I've got back into working on the module again.

However, a friend mentioned Jifty, something he calls "Perl On Rails". A quick look at the Jifty website (which uses a wiki called Wifty - guess what it's based on) shows that, yes, indeed, it does have a lot of similarities to Ruby On Rails - Model/View/Controller structure, centralised config, templates with overrides - without the hassle of learning a new language that's irritatingly similar to one I already know, and without a name that's an irritating metaphor for long lengths of steel used to support things. I'm installing it on my home server. My, does it need a lot of Perl packages, though...

Last updated: | path: tech / perl | permanent link to this entry

Insert witty rejoinder here

Michael Ellerman[1] has noticed that I let a bit of hyperbole creep into my piece about the reliability of Pebble Bed Modular Reactors. OK, so they're not completely fault-tolerant: they won't withstand this decade's popular Grand Failure Mode (i.e. hit it with a plane) or an earthquake - at least not without a big containment facility. Which most designs include as a matter of course. The point I was trying to make in saying 'completely fault tolerant' was that disasters like Three Mile Island, Sellafield, and Chernobyl were caused by human error and designs which relied on some level of mechanised control to stop a full scale criticality incident. The Pebble Bed design means that the fuel spheres expand as they heat up, thus forcing the fuel particles apart and reducing the rate of nuclear heating. The system is thus self-limiting - you can't get a criticality because the fuel particles never get close enough.

And I think your opinions about the shutdown of the German AVR plant need a bit of checking, too. The operators accidentally damaged a defective fuel sphere in removing it from the reactor, and, yes, the actual fuel pellet (which is about as big as a grain of wheat) was released. And, yes, this incident led to the plant being shut down. So, yes, an accident occurred - just like the many thousands of accidents that occur in coal-fired power stations every year without the press getting agitated. But the design is not inherently unsafe. And when you're dealing with a new technology, there are always going to be some teething troubles. This shouldn't discount further improvements to the technology, any more than the cotton mill fires in the early Industrial Era stopped us from using cotton as a clothing fibre.

I'll go further in the Full Disclosure quest: the major problem with the design of the Pebble Bed Reactor is that the pebbles must be kept away from oxygen. The reactor's typical all-control-removed idling temperature is around 1500 degrees, and the spheres are made from a compound including activated carbon, which will burn quite well at this temperature if it ever gets near oxygen. All the reactor designs use either helium or nitrogen, or possibly carbon dioxide (although I would consider this a bad design idea), as a coolant medium. The entire reactor 'core' and its heat exchanger area is filled with this inert gas, so no contact with air or anything that can produce oxygen can happen. So, if you really did spear through it with an ultraviolet laser[2] and let some air in, it would probably be about as bad as Chernobyl.

Don't do that, then.

Ultimately, this leads to my overall point, which I'd hoped I'd made in my previous post but I fear I may have to reiterate. I'd love to see the entire world powered by solar power in some form or another (by 'solar power' here I mean all energy derived, directly or indirectly, from the sunlight falling on our planet now). But looking at the inertia of the fossil fuel industry and the governments that are implicitly supporting it with tariffs and tax breaks, I don't see this happening in my lifetime. But the current projections have oil and coal running out (as in 'almost completely stopping') in about thirty to forty years anyway, and we have an exponentially increasing requirement for electricity and fuel, particularly from places like China and India. As I see it this means two options:

  1. Entire areas of the globe run out of power as the fuel fails.
  2. We use nuclear power until solar power can meet our entire energy needs.

We can't just continue to turn on more computers, more homes, more cars and more offices, with our power supply coming from a dwindling supply of fossil fuels. I don't have the time or the energy (hah) to research how much money it would take to convert the entirety of the Australian power generation - 1485 terawatt-hours (a third of which is lost in the process of sending the power to where it's used, by the way) - to solar power. But I bet it would be measured in GDP-years - spending the entire GDP of Australia for years and years just to get to the point where we're generating enough power to run Australia today.

It's just not going to happen yet.

I should also address the many cries of "solar power is too expensive" - things like solar panels using more energy to build than they produce, or wind farms costing more money to build than conventional power generation, or even that wind power doesn't completely eliminate the need for existing power generation and therefore it's useless. Again, these arguments miss the point - they justify the existing technology as if its fuel were unlimited, when it clearly is not. Sure, it costs a lot to build, and maintain, a wind farm, or a solar furnace generator, or a solar panel farm the size of the Great Victoria Desert. But they will be producing energy and emitting minimal harmful wastes long after the coal power stations have shut down and the oil and gas burning power stations are silent. And we will come up with new, better technologies - even power from space - as we improve these new technologies, which won't happen if we pull our heads into our shells and pretend that coal is infinite.

And I'd just like to say, to all those people who don't want a wind farm in their neighbourhood or don't want a solar farm cluttering up the beautiful rolling countryside of Mildura: "Fine then. Also do us the favour of switching off your power and your car. Because we can't afford to keep on supplying electricity to you if you don't want to help in its creation. And, while you're about it, tell your "Not In My Back Yard" complaints to the people who live near all the coal-fired power stations, oil refineries, service stations, trucking companies, and, in fact, every other industry that supplies you with your goods and services and power, because it was their back yard too and you don't mind them sacrificing a bit."

*pant pant pant* Insane ... rage... subsiding...

So to me the only alternative to global industrial collapse as the power gets shut off and things cease to move is to use nuclear power, and try like hell to make it safe, and try like hell to move to having power that's generated from the sunlight falling on our planet now rather than on radioactive decay or the frozen sunlight in oil and coal. I wish it were otherwise.

BTW, I believe that we may discover how to correctly harness the 'ladder-down' process that Wil McCarthy talks about in his book "Bloom" - which could be used to transmute the radioactive elements in reactors into harmless, non-reactive elements. ("Bloom" talks about the streets being paved with gold, because it's so easy (in the novel) to transmute heavier elements into gold that it's become a waste material.) This might mean that current nuclear reactors can eventually be made totally safe, even if the technology doesn't exist now. (Hey, people are already having their heads or bodies cryogenically frozen in order to get to a medical science that can cure their diseases.) But I'm not betting on that to justify nuclear power.

And, also incidentally, a friend told me about a talk that refuted a lot of the myths surrounding wind power. One idea discussed was to have entire hedgerows of wind generators no larger than an average tree. They don't create the same eyesores as a gigantic hundred-metre tower, they produce hundreds of megawatts, and the presenters even believe that they contribute to slowing down the winds that cause soil erosion and other environmental problems. So there are benefits to 'green' power that go beyond just the production of electricity without the consumption of non-renewable fuels.

[1] - if I'd known it was going to be that easy to instantly achieve lasting fame and congratulation from a myriad of kernel developers, I would have started sooner :-) But I fear it may take a bit more than just a patch; it also requires that developers change some of their coding standards to not use constructs that will break on maintenance, and then come up with crufty hacks to get around these problems. And it seems to me that this is not only going to be sooooo much harder, but is going to lead to infamy and ridicule rather than fame and fortune. :-) Seriously, though, it's not a bad idea - I'll think about coding it up and submitting the patch and I'll let you know how I get on.

[2] - because you presumably have put the reactor in a building that will withstand earthquakes, planes being dropped on it, and other fairly well imaginable disaster scenarios.

Last updated: | path: society | permanent link to this entry

A new approach to the problem

I've revisited a program of mine that I've written for work. The basic idea is to read a set of DNA sequences and produce different files which list all the different 'subsequences' (jargon: oligonucleotides) of different lengths, and how many we saw of each. I presented this idea at a Programmers SIG a while back and the reply came back: write them out and use sort(1) and uniq(1) -c. In my vanity I hadn't thought of this, and was instead making up great complex data structures and slab memory allocators and all sorts of junk. Then I was diverted to work on other projects; when I came back all this looked like so much gibberish.

So I wrote a simple test and showed that, for an average set of virus sequences, I get files ranging from about 24MB (six-letter words, called 'six-mers' in molecular biologists' jargon) to 77MB (21-mers). Sure enough, sort and uniq produce what I want in a not inefficient manner. But I'd like to run this as a single standalone executable, much as that's against the Unix piping ethos. For efficiency reasons I generate all the files simultaneously, and the thought of forking off fifteen separate sort | uniq -c pipes makes me shudder. There must be a better way, I think.

The first obvious improvement is to keep the lists in memory and use some kind of in-memory sort function. The files contain about three and a half million words apiece, so it would be possible using qsort(3) to fill a 21MB array with six-mers (since you wouldn't have to store the null at the end). There's a setting in talloc to allow grabbing chunks of memory greater than 64MB in size, so doing even the 21-mers (77MB in size) would be possible using this method.

The problem, though, is that the way I generate the sequences is to generate all the different ranges simultaneously - doing it in one pass through the sequences. Storing all of them in arrays simultaneously would require 812MB (roughly), and this seems not much better than my previous dodgy tree implementation.

Then I realised: all the six-mers are just the prefixes of all the seven-mers, plus all the six-mers that didn't fit in seven-mers. This process applies right up the scale. So I could generate an array which contained the longest strings (null-terminated) available at each location (up to the maximum length required), and sort that. If I did that with fixed-length 32-byte strings (more than enough for all the things we've been doing so far) you'd use 112MB or so. That now contains an ordered list of all the strings of lengths between the minimum and maximum we're using. In order to extract all the N-mers, ignore all strings of length less than N, and take the first N characters of the remaining strings. They're still in order, so counting their frequency is a simple matter. You could even do this in parallel, in one pass across the array (which would increase cache read performance).

Consider, for a moment, though, if you can't allocate huge arrays like that. So you have to break the single array into smaller, more manageable arrays, sort each in turn, and do a merge-condense to recombine each block. Which lends itself to parallelisation: each slave is given a contiguous chunk of the sequence (i.e. one with no non-sequence characters in it - like being given a paragraph), and breaks it up into words, sorts them, and then returns the data to the master (either in single messages, or better yet in chunks) for merging.

But think the sort process through: most common sort routines recursively descend through successively smaller chunks of the file, sorting each and then combining the sorted bits back together into larger sorted chunks. Could this process not also combine elements that are the same? We might sort records of a string and a count (e.g. 30-byte strings and a two-byte count), initially starting each count at one; as each sort finds duplicated strings, they can be combined and their counts added together. This would also compact the data storage as we go, which might mean it would be good to read each sequence into a separate block, which is then sorted and compacted independently (giving a certain granularity to the process), with the final merge process happening N-way rather than two-way. If that's too complex, make each merge two-way but just do as many two-way merges as necessary to get all the data into one sorted array, which would by then also be compacted and contain the counts anyway.

The best part of this, AFAICS, is that it's also neatly parallelisable. I can even write an application which, if it finds a PVM machine set up, can distribute itself amongst the hosts neatly, but if it doesn't, can still operate in 'single thread' mode and do all the work itself. Which, in turn, means that as long as I keep in mind that at some stage I could distribute the workload, I won't end up having to substantially rewrite the program (as was looking likely with the last method).

So, you see, I do do things other than rant and rave sometimes.

Last updated: | path: tech / c | permanent link to this entry

All posts licensed under the CC-BY-NC license. Author Paul Wayper.

Main index / tbfw/ - © 2004-2016 Paul Wayper