20110524

grep 2.8

So I was sitting around today, lamenting the fact that grep isn't multithreaded, and decided to actually do something about it.  Since all of the appropriate Google searches just seem to list other people lamenting the same fact, I started messing around with things.  Turns out, I can get a 20-50% speedup just by doing this:

find $files -type f -print0 | xargs -0 -n10 -P8 grep -H $ARGS "$pattern"
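For anyone who wants to reuse that one-liner, here is a minimal sketch of it wrapped in a shell function, with each flag commented. The name pargrep is made up for illustration (it is not part of grep or findutils), and the -n10 / -P8 values are simply the ones from the command above, not tuned recommendations.

```shell
# pargrep: hypothetical helper name, used here only for illustration.
# Usage: pargrep PATTERN PATH [PATH...]
pargrep() {
    pattern=$1
    shift
    # -type f   : only regular files, so grep is never handed a directory
    # -print0/-0: NUL-separated names survive spaces and newlines
    # -n10      : at most 10 file names per grep invocation
    # -P8       : run up to 8 grep processes at the same time
    # -e        : a pattern starting with a dash is not read as an option
    find "$@" -type f -print0 | xargs -0 -n10 -P8 grep -H -e "$pattern"
}
```

Something like pargrep TODO src lib would then search both trees. Two caveats: the order of the output lines depends on which grep process finishes first, and the exit status comes from xargs rather than grep, so a batch with no matches can make the whole pipeline report failure even when other batches matched.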

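(Word to the wise: there is also a similar, but very wrong, way to do it. Without -type f, find hands directories to grep as well, and -r recurses into every one of them, so the same files get searched again and again...)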

find $files -print0 | xargs -0 -n10 -P8 grep -H -r "$pattern"
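To see why the second form is wrong, a small throwaway experiment helps. The sketch below builds a hypothetical one-file tree under a temp directory (the names demo, src and a.txt are made up) and counts the match lines each command produces.

```shell
# Hypothetical demo tree: one directory holding one matching file.
demo=$(mktemp -d)
mkdir "$demo/src"
echo 'needle' > "$demo/src/a.txt"

# Right way: only regular files reach grep, so the line is reported once.
right=$(find "$demo" -type f -print0 | xargs -0 -n10 -P8 grep -H needle | wc -l)

# Wrong way: find also prints $demo and $demo/src, and grep -r descends
# into both of them, so the same line is reported once per argument.
wrong=$(find "$demo" -print0 | xargs -0 -n10 -P8 grep -H -r needle | wc -l)

echo "right=$right wrong=$wrong"
rm -rf "$demo"
```

The deeper the tree, the more often each file gets re-read, which is exactly the opposite of a speedup.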

So after tuning the threading numbers there, and feeling pretty good about myself, I decided to look up the mainline GNU grep source and see if there was something I could do to help. So I pulled up the current source and upgraded my box to grep 2.8 instead of Gentoo's stable 2.5.4, and Holy Snapdragon! (Yes, I actually said that...) I got a 10x-60x speedup on the workloads I was testing. Yes, you read that right. My tests that were taking 48-80 seconds were now finishing in 1-2 seconds. I dunno what kinda magic they threw into grep 2.8, but good work, guys. Still no multithreading, but with the new version, I have real trouble just finding a workload that needs it.

So, my Gentoo countrymen, if ever there were a time to use unstable, this is it!
