Today I've added the rest of the bzip2 support, namely being able to read lines from bzip2-compressed files. This involved copying an implementation of fgets that I found in stdio.c and writing my own fgetc for bzip2 on top of an array buffer. The hard part is detecting EOF: it seems that BZ2_bzerror doesn't actually report BZ_STREAM_END when the stream is at its end, it just reports BZ_OK. But BZ2_bzread returns 0 bytes when you're at the end of the file, so I detect that and return EOF accordingly.
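For anyone curious what that looks like, here is a minimal sketch of the idea rather than the actual CFile code. All the names (cbuf_reader, cbuf_getc, cbuf_gets, mem_source, mem_read) are made up for illustration. The reader takes a callback with the same shape as BZ2_bzread (handle, buffer, length, returning the number of bytes read), and treats a return of 0 or less as end of stream. The in-memory mem_read is only a stand-in so the sketch compiles without libbz2; with the real library the callback would be a thin wrapper that casts the handle to a BZFILE * and calls BZ2_bzread.

```c
#include <assert.h>
#include <stdio.h>   /* for EOF */
#include <string.h>

#define CBUF_SIZE 4096

/* Hypothetical callback with the same shape as BZ2_bzread:
 * returns bytes read, 0 at end of stream, negative on error. */
typedef int (*read_fn)(void *handle, void *buf, int len);

typedef struct {
    read_fn read;                  /* how to refill the buffer */
    void *handle;                  /* passed through to read() */
    unsigned char buf[CBUF_SIZE];  /* the array buffer */
    int pos;                       /* next unread byte in buf */
    int len;                       /* number of valid bytes in buf */
    int eof;                       /* set once a read returns <= 0 */
} cbuf_reader;

static void cbuf_init(cbuf_reader *r, read_fn read, void *handle) {
    r->read = read;
    r->handle = handle;
    r->pos = 0;
    r->len = 0;
    r->eof = 0;
}

/* fgetc equivalent: refill when the buffer is exhausted, and report
 * EOF when the underlying read hands back no bytes. */
static int cbuf_getc(cbuf_reader *r) {
    assert(r != NULL);
    if (r->pos >= r->len) {
        if (r->eof) return EOF;
        int n = r->read(r->handle, r->buf, CBUF_SIZE);
        if (n <= 0) {              /* 0 bytes (or an error): stop here */
            r->eof = 1;
            return EOF;
        }
        r->len = n;
        r->pos = 0;
    }
    return r->buf[r->pos++];
}

/* fgets equivalent built on cbuf_getc: reads up to size-1 characters,
 * stops after a newline, returns NULL if EOF arrives before any data. */
static char *cbuf_gets(cbuf_reader *r, char *s, int size) {
    if (size <= 0) return NULL;
    int i = 0;
    while (i < size - 1) {
        int c = cbuf_getc(r);
        if (c == EOF) break;
        s[i++] = (char)c;
        if (c == '\n') break;
    }
    if (i == 0 && size > 1) return NULL;
    s[i] = '\0';
    return s;
}

/* Stand-in data source for trying the sketch without libbz2. */
typedef struct { const char *data; int len; int off; } mem_source;

static int mem_read(void *handle, void *buf, int len) {
    mem_source *m = handle;
    int n = m->len - m->off;
    if (n > len) n = len;
    memcpy(buf, m->data + m->off, (size_t)n);
    m->off += n;
    return n;                      /* 0 once the data is used up */
}
```

Note that this sketch lumps read errors in with end of stream; real code would probably want to keep the two apart.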
This also gave me the impetus to correctly detect EOF in the rest of the code, something that I hadn't implemented correctly. I'm still not sure I'm obeying whatever ANSI or POSIX guidelines there are on this subject, but the test-cat program I've written reports no differences between the original uncompressed file and the same file fed through CFile, so I'm assuming I'm doing something right.
My experience, given that I can be reading 1.5GB uncompressed sequence files, is that compressing the inputs and outputs saves not only space, but time (to read and write the files). I noticed the other day that The Gimp also allows you to read and write files with .bz2 or .gz extensions as if they were the uncompressed images. Hopefully the CFile library will give that functionality to people who only want a dependency on Talloc, rather than on half of the Gimp, an external compression program, and enough spare filesystem space for the uncompressed file...
All posts licensed under the CC-BY-NC license. Author Paul Wayper.