Too Busy For Words - the PaulWay Blog

CFile now does bzip2

Over the last couple of days, I've been getting to do a fair bit of actual coding. It started with me adding basic support for bzip2 compression to my CFile library. Then I decided to reorganise my Subversion repository so that each of the libraries I've written so far (cfile, progress and pwlib) lives in its own directory, rather than using the more standard but somewhat restrictive trunk|branches|tags structure that the Subversion book recommends.

Today I've added the rest of the bzip2 support, namely being able to read lines from bzip2 files. It involved adapting an implementation of fgets that I found in stdio.c and writing my own fgetc for bzip2 on top of an array buffer. The hard part is detecting EOF: it seems that BZ2_bzerror doesn't actually report BZ_STREAM_END when the stream is at an end, it just reports BZ_OK. But BZ2_bzread returns 0 bytes if you're at the end of the file, so I detect that and return EOF accordingly.
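To make the idea concrete, here is a minimal sketch of a buffered fgetc/fgets pair that treats a zero-byte read as EOF. It is not the CFile code: the names reader, reader_getc, reader_gets, read_fn, mem_src and mem_read are all made up for illustration, and the read callback is only assumed to have the same shape as BZ2_bzread (handle, buffer, length, returning the byte count). A memory-backed source stands in for the bzip2 stream so the sketch compiles without libbz2.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical callback shaped like BZ2_bzread: returns the number of
   bytes placed in buf, 0 at end of stream, negative on error. */
typedef int (*read_fn)(void *handle, void *buf, int len);

#define RBUF_SIZE 4096

typedef struct {
    read_fn read;
    void *handle;
    char buf[RBUF_SIZE];
    int pos;  /* index of the next unread byte in buf */
    int len;  /* number of valid bytes in buf */
    int eof;  /* set once the callback has returned 0 or an error */
} reader;

static void reader_init(reader *r, read_fn fn, void *handle) {
    r->read = fn;
    r->handle = handle;
    r->pos = 0;
    r->len = 0;
    r->eof = 0;
}

/* fgetc-style: refill the buffer when it runs dry; a read of 0 bytes
   (or an error) is remembered and reported as EOF from then on. */
static int reader_getc(reader *r) {
    if (r->pos >= r->len) {
        if (r->eof) return EOF;
        int n = r->read(r->handle, r->buf, RBUF_SIZE);
        if (n <= 0) {
            r->eof = 1;
            return EOF;
        }
        r->len = n;
        r->pos = 0;
    }
    return (unsigned char)r->buf[r->pos++];
}

/* Roughly fgets-style: reads up to size-1 bytes, stops after a newline,
   and returns NULL if nothing could be read before EOF. */
static char *reader_gets(reader *r, char *s, int size) {
    if (size < 2) return NULL;  /* no room for any data */
    int i = 0;
    while (i < size - 1) {
        int c = reader_getc(r);
        if (c == EOF) break;
        s[i++] = (char)c;
        if (c == '\n') break;
    }
    if (i == 0) return NULL;
    s[i] = '\0';
    return s;
}

/* Memory-backed stand-in for a compressed stream (illustration only). */
typedef struct {
    const char *data;
    int len;
    int off;
} mem_src;

static int mem_read(void *h, void *buf, int len) {
    mem_src *m = h;
    int n = m->len - m->off;
    if (n > len) n = len;
    memcpy(buf, m->data + m->off, (size_t)n);
    m->off += n;
    return n;  /* 0 once the data is exhausted */
}
```

With libbz2, the callback would presumably be a one-line wrapper that casts the handle to BZFILE * and calls BZ2_bzread(handle, buf, len); the EOF logic in reader_getc stays the same, since it only looks at the returned byte count.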

This also gave me the impetus to correctly detect EOF in the rest of the code, something I hadn't got right before. I'm still not sure I'm obeying whatever ANSI or POSIX guidelines there are on this subject, but the test-cat program I've written reports no differences between the original uncompressed file and the same data fed through CFile, so I'm assuming I'm doing something right.

My experience, given that I can be reading 1.5GB uncompressed sequence files, is that compressing the inputs and outputs saves not only space, but time (to read and write the files). I noticed the other day that The Gimp also allows you to read and write files with .bz2 or .gz extensions as if they were the uncompressed images. Hopefully the CFile library will give that functionality to people who only want a dependency on Talloc, rather than on half of the Gimp, an external compression program, and enough spare filesystem space for the uncompressed file...
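As an illustration of the "open by extension" behaviour described above, a library could pick a decompressor from the file name's suffix. This is only a sketch under that assumption, not CFile's actual interface: cf_type, has_suffix and cf_type_from_name are hypothetical names.

```c
#include <string.h>

/* Hypothetical tag for which backend should handle a file. */
typedef enum { CF_PLAIN, CF_GZIP, CF_BZIP2 } cf_type;

/* True if name ends with suffix (case-sensitive). */
static int has_suffix(const char *name, const char *suffix) {
    size_t n = strlen(name);
    size_t s = strlen(suffix);
    return n >= s && strcmp(name + n - s, suffix) == 0;
}

/* Choose a backend purely from the file name; anything that is not
   .gz or .bz2 is treated as an uncompressed file. */
static cf_type cf_type_from_name(const char *name) {
    if (has_suffix(name, ".gz")) return CF_GZIP;
    if (has_suffix(name, ".bz2")) return CF_BZIP2;
    return CF_PLAIN;
}
```

Going by the name alone is the simplest choice; checking the file's magic bytes would be more robust for misnamed files, at the cost of an extra read when opening.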



Mon 9th Oct, 2006
