Too Busy For Words - the PaulWay Blog

Junk DNA Is Watching You

I note Leon has linked to a story about the IBM research into repeated bits of the human genome. Coincidentally, I've got that very paper on my desk as it's in a field that I'm working on. But I'd like to tell you about a lecture I heard from a professor from UQ on the issue of what all this 'Junk DNA' is really used for. If I could remember names, or technical details, I'd use them, but for now you'll just have to cope with my limited memory.

A quick revision for you non-biologists: DNA is transcribed into RNA inside the nucleus, and that RNA is then translated into proteins, which go off and do their work in the body. The shape of the protein, as it folds up in three dimensions while it's being created, is what gives the protein its particular abilities - it binds to the three-dimensional shape of whatever it's supposed to work on. There's a fairly direct mapping between the DNA and the protein it produces, so it's relatively easy to find the bits of DNA that 'code for' a particular protein. These are called 'coding regions', and the 'non-coding regions' are where this 'junk DNA' lies.
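For the programmers in the audience, here's a rough sketch of that 'fairly direct mapping' in Python. It's my own toy illustration, not anything from the lecture: the names (`CODON_TABLE`, `translate`) are made up for this post, the table is the standard genetic code, and it assumes a single clean strand with no introns, which real genes rarely oblige you with.

```python
from itertools import product

# The standard genetic code, written compactly: codons are enumerated in
# T, C, A, G order and each maps to a one-letter amino acid ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): amino
    for codon, amino in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate the first open reading frame found in `dna`.

    Assumption (a big simplification): reading starts at the first ATG and
    runs three bases at a time until a stop codon or the end of the string.
    """
    dna = dna.upper()
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon, so nothing 'codes for' anything here
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino = CODON_TABLE[dna[i:i + 3]]
        if amino == "*":  # stop codon: the coding region ends here
            break
        protein.append(amino)
    return "".join(protein)


print(translate("CCATGAAATGA"))  # prints MK (ATG -> M, AAA -> K, TGA -> stop)
```

Everything before the ATG and after the stop codon is simply skipped by this sketch, which is more or less how the 'junk' label came about.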

Firstly, junk DNA is not restricted to humans. Anything past the prokaryotic stage - anything complex enough to have a nucleus to contain its DNA - has junk DNA. The more 'complex' (in an arbitrary, non-technical sense) the organism, the more of this 'junk DNA'; the actual number of proteins that the DNA codes for stays roughly the same (in fact, some viruses express more proteins than our DNA does). And the most interesting thing is that large tracts of the non-coding regions are still copied almost perfectly across generations, which implies that there's a lot of selective pressure for them to be there. More mutation occurs in coding regions than in some of these non-coding regions!

The professor's hypothesis is that the 'junk DNA' codes for proteins (or even RNA - roughly, single-stranded DNA) that stay inside the nucleus and regulate when various proteins are made, and possibly even how they're folded. This would explain a lot of what we don't know about protein production, which is mostly in the area of why the body produces some particular protein at some times and not others. They do have evidence to show that RNA within the nucleus affects and regulates protein production. Research continues, as far as I know.

It's like observing a machine from the other side of the internet. You have the source code but it's not in any language you understand, and you're trying to deduce what parts of the code do. You can map what inputs and outputs it has, and from those you can pick up what bits of code might produce those messages. But the memory management stuff? The swapping? The disk IO routines? Even the process management code is never physically represented by a single packet sent from the machine. So you write off all that code as 'junk code' and don't worry about trying to understand it.

Stupid, eh?

One final challenge that the Professor offered, which I think is worthy of the minds of Open Source: Try to come up with a way of encoding a picture such that the picture contains the instructions to build itself, and the machinery to execute those instructions, at any scale, and is still a recognisable picture (i.e. simple quines don't count - it has to look like something.)
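For anyone who hasn't met a quine: it's a program whose output is exactly its own source code. Here's a minimal sketch in Python using the well-known `%r` formatting trick. The `run` helper and the variable names are just my own scaffolding so the claim can be checked; the quine itself is only the two lines held in `PROGRAM`.

```python
import contextlib
import io

# Template for a two-line quine: %r is replaced by the repr of the
# template itself, and %% collapses to a single % when formatted.
TEMPLATE = 's = %r\nprint(s %% s)'
PROGRAM = TEMPLATE % TEMPLATE  # the actual quine source, as a string


def run(program: str) -> str:
    """Execute `program` in a fresh namespace and return what it prints."""
    captured = io.StringIO()
    with contextlib.redirect_stdout(captured):
        exec(program, {})
    return captured.getvalue()


# The program's output is its own source (plus print's trailing newline).
print(run(PROGRAM) == PROGRAM + "\n")  # prints True
```

That's the 'simple' kind the challenge rules out: it reproduces itself, but it doesn't look like anything, and it leans on the Python interpreter rather than carrying its own machinery.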



Sun 30th Apr, 2006
