Too Busy For Words - the PaulWay Blog

Wed 12th Jul, 2006

Shorter isn't always better!

I'm reading lines in a format I invented that uses a simple run-length compression on zeros: a line can be something like '1 4 2 8 5=7 3 6' and that means 'the values 1, 4, 2, 8, five zeros, 7, 3 and 6'. My code was:

foreach $count (split(/ /, $_)) {
    if ($count =~ /^(\d+)=(\d+)$/) {
        # 'N=V' means N zeros followed by the value V
        push @counts, (0) x $1, $2;
        $csubnum += $1 + 1;
    } elsif ($count =~ /^\d+$/) {
        push @counts, $count;
        $csubnum++;
    } elsif ($count =~ /^\s*$/) {
        # empty field, e.g. from a doubled space
        next;
    } else {
        warn "bad format in WAIF $vers line: $_ (part $count)\n";
    }
}

"No, wait!" I thought. "I can do all that in a mapping:"

@counts = map (
    {$_ =~ /(\d+)=(\d+)/ ? ((0) x $1, $2) : $_}
    (split / /, $_)
);
$csubnum += scalar @counts;
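
As a sanity check that the two forms really do produce the same list, here is a minimal, self-contained sketch run against the example line from the top of the post. The sub names expand_foreach and expand_map are hypothetical and exist only for this illustration. The regexes are anchored, and the map form assumes every field is well formed, since it has no branch for reporting bad input.

```perl
use strict;
use warnings;

# Hypothetical helper: expand one line with the foreach/push approach.
sub expand_foreach {
    my ($line) = @_;
    my @counts;
    foreach my $field (split / /, $line) {
        if ($field =~ /^(\d+)=(\d+)$/) {
            push @counts, (0) x $1, $2;    # $1 zeros, then the value $2
        } elsif ($field =~ /^\d+$/) {
            push @counts, $field;          # plain value, kept as-is
        }
    }
    return @counts;
}

# Hypothetical helper: the same expansion written as a single map.
sub expand_map {
    my ($line) = @_;
    return map { /^(\d+)=(\d+)$/ ? ((0) x $1, $2) : $_ } split / /, $line;
}

my $line = '1 4 2 8 5=7 3 6';
my @a = expand_foreach($line);
my @b = expand_map($line);
print "foreach: @a\n";    # foreach: 1 4 2 8 0 0 0 0 0 7 3 6
print "map:     @b\n";    # map:     1 4 2 8 0 0 0 0 0 7 3 6
```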

Testing, though, proved another thing. Reading a file with a reasonable number of zeros per line (lines like '114 0 3 6=3 27=3 3=1 10=3 79=1 8=1 0 1 4=3 16=1 0 1 43=7 15=12 36=16 27=2' are typical) took 21 seconds with the foreach code and 28 seconds with the map code. So that's an extra 33% time per file read. I can only assume this is because Perl is now juggling a bunch of different arrays in-place rather than just appending via push. Still, it's an interesting observation - the regexp gets tested on every value in both versions, so it's definitely not that...
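
The comparison above can be reproduced in isolation with the core Benchmark module. What follows is a rough sketch, not the original test: it times only the expansion of one sample line held in memory (no file reading), the sub names with_foreach and with_map are hypothetical, and the one-second run time is an arbitrary choice. The relative speeds it reports will depend on the Perl version and on the data, so they may not match the 21 versus 28 seconds seen on real files.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Sample line taken from the post.
my $line = '114 0 3 6=3 27=3 3=1 10=3 79=1 8=1 0 1 4=3 16=1 0 1 43=7 15=12 36=16 27=2';

# Expansion by appending to an array with push.
sub with_foreach {
    my @counts;
    foreach my $field (split / /, $line) {
        if ($field =~ /^(\d+)=(\d+)$/) {
            push @counts, (0) x $1, $2;
        } else {
            push @counts, $field;
        }
    }
    return scalar @counts;
}

# Expansion by building the whole list in one map.
sub with_map {
    my @counts = map { /^(\d+)=(\d+)$/ ? ((0) x $1, $2) : $_ } split / /, $line;
    return scalar @counts;
}

# Both must expand to the same number of values before timing them.
die "expansions differ\n" unless with_foreach() == with_map();

# A negative count means "run each sub for at least that many CPU seconds";
# cmpthese then prints a table of rates and relative differences.
cmpthese(-1, { foreach_push => \&with_foreach, map_list => \&with_map });
```

Timing the expansion on its own, away from the file I/O, makes it easier to tell whether the difference really comes from how the list is built.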



All posts licensed under the CC-BY-NC license. Author Paul Wayper.

