Too Busy For Words - the PaulWay Blog

Why I Love Computers Part 001

At my work I've written a program which is designed to find 'conserved sites' in DNA: places in a set of sequences that don't change much. It can rank sites by several different methods and chooses the top N sites according to the ranking method you pick. To test it, I wrote another program to pick random subsamples of a set of viruses, and Mark asked me to compare, for each of the methods, how many of the sites chosen from each subsample were also chosen from the full set of sequences. This covered five random samples at each of four sample sizes (5%, 10%, 20%, and 50%).
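The post doesn't say which ranking methods the program used, so the sketch below is only an illustration under stated assumptions. It assumes the sequences are already aligned and of equal length, substitutes a simple stand-in score (the fraction of sequences sharing the most common base at a column), and reads the comparison as "how many of a subsample's top-N sites are also in the full set's top N". The functions `column_conservation`, `top_n_sites` and `overlap_with_full` are hypothetical names, not the author's code.

```python
import random
from collections import Counter


def column_conservation(seqs, i):
    # Stand-in ranking method (hypothetical): the fraction of sequences that
    # share the most common base at column i. 1.0 means fully conserved.
    counts = Counter(s[i] for s in seqs)
    return counts.most_common(1)[0][1] / len(seqs)


def top_n_sites(seqs, n):
    # Rank every column by its score and keep the indices of the top n.
    # sorted() is stable even with reverse=True, so tied columns keep
    # left-to-right order.
    length = len(seqs[0])
    ranked = sorted(range(length),
                    key=lambda i: column_conservation(seqs, i),
                    reverse=True)
    return set(ranked[:n])


def overlap_with_full(seqs, fraction, n, rng):
    # Draw a random subsample (without replacement) of the given fraction,
    # then count how many of its top-n sites are also in the full set's top n.
    k = max(1, round(len(seqs) * fraction))
    sample = rng.sample(seqs, k)
    return len(top_n_sites(sample, n) & top_n_sites(seqs, n))


if __name__ == "__main__":
    rng = random.Random(42)
    # Toy data only: 100 random sequences of length 60, standing in for an
    # aligned set of virus sequences.
    seqs = ["".join(rng.choice("ACGT") for _ in range(60)) for _ in range(100)]
    full_top = 10
    for fraction in (0.05, 0.10, 0.20, 0.50):   # the four sample sizes
        for replicate in range(1, 6):            # five random samples each
            hits = overlap_with_full(seqs, fraction, full_top, rng)
            print(f"{fraction:.0%} sample {replicate}: {hits}/{full_top} sites shared")
```

With five replicates at four sizes this produces the 20 comparisons described above; a real version would plug in the actual ranking methods in place of `column_conservation`.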

So naturally I wrote another program to do that calculation for me, after doing about half a page of the 20 comparisons by hand, throwing the pencil across the room and saying "Why am I doing it the hard way?". Writing the program took half an hour, versus more than two hours to do it by hand; always a good sign. Then I wrote another program that built on that one and produced the entire statistical summary for me, identifying each data file from its name (which I had kept to a consistent pattern, based on long experience). It wrote its results in CSV format, which I loaded into OpenOffice Calc to make nice graphs.
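Identifying each data file from its name can be sketched as follows. The naming pattern here (`<method>_<size>pct_rep<n>.txt`) is invented for illustration, since the post only says the names followed a consistent pattern, and the sketch assumes the overlap count for each file has already been computed. `parse_name` and `summarise` are hypothetical helpers; only the standard library's `re`, `csv` and `statistics` modules are used.

```python
import csv
import io
import re
import statistics
from collections import defaultdict

# Hypothetical naming pattern, e.g. "entropy_10pct_rep3.txt" means ranking
# method "entropy", a 10% subsample, replicate 3. The post only says the
# names followed a consistent pattern; this particular one is invented.
NAME_PATTERN = re.compile(r"^(?P<method>\w+?)_(?P<size>\d+)pct_rep(?P<rep>\d+)\.txt$")


def parse_name(filename):
    # Return (method, sample_pct, replicate), or None if the name doesn't match.
    m = NAME_PATTERN.match(filename)
    if m is None:
        return None
    return m.group("method"), int(m.group("size")), int(m.group("rep"))


def summarise(results):
    # results maps filename -> overlap count. Group the counts by
    # (method, sample size) and return CSV text with the mean and sample
    # standard deviation of each group, ready to load into a spreadsheet.
    groups = defaultdict(list)
    for filename, overlap in results.items():
        parsed = parse_name(filename)
        if parsed is None:
            continue  # ignore files that don't follow the naming pattern
        method, size, _rep = parsed
        groups[(method, size)].append(overlap)

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["method", "sample_pct", "n", "mean_overlap", "sd_overlap"])
    for (method, size), values in sorted(groups.items()):
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        writer.writerow([method, size, len(values),
                         f"{statistics.mean(values):.2f}", f"{sd:.2f}"])
    return out.getvalue()


if __name__ == "__main__":
    example = {
        "entropy_10pct_rep1.txt": 8,
        "entropy_10pct_rep2.txt": 10,
        "entropy_5pct_rep1.txt": 6,
        "notes.txt": 99,  # skipped: doesn't match the pattern
    }
    print(summarise(example), end="")
```

Emitting plain CSV keeps the graphing step separate: any spreadsheet, OpenOffice Calc included, can open the output directly.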

We took the plots for three of these comparisons to a statistician at ANU, who said that we seemed to be treating the numbers reasonably and agreed that the method I didn't like didn't make a lot of sense. Later, Mark asked me to produce the plots for the other two comparisons. Rewriting the command line for those two plots and rerunning the program took ten seconds. That's the sort of work I like to do.

path: tech

Wed 12th Apr, 2006