find $files -type f -print0 | xargs -0 -n10 -P8 grep -H $ARGS "$pattern"
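For reuse, the same pipeline can be wrapped in a small shell function. This is a minimal sketch that assumes bash, GNU findutils, GNU grep and coreutils' `nproc`. The name `pargrep` is hypothetical. Sizing `-P` from `nproc` is also an assumption rather than the hand-tuned 8. The `$ARGS` hook is left out to keep it short.

```shell
# pargrep is a hypothetical helper name, not a real grep or findutils tool.
# Usage: pargrep PATTERN [PATH...]   (searches "." when no PATH is given)
pargrep() {
  local pattern=$1
  shift
  # Default to the current directory if no paths were passed.
  if [ "$#" -eq 0 ]; then
    set -- .
  fi
  # -type f          only regular files reach grep, so nothing is searched twice
  # -print0 / -0     NUL-separated names survive spaces and newlines
  # -n10             at most 10 files per grep process
  # -P"$(nproc)"     one concurrent grep per CPU (an assumption; tune as needed)
  # -H               always prefix matches with the file name
  # --line-buffered  flush after every matching line
  # --               end of options, so a pattern like "-foo" is not a flag
  find "$@" -type f -print0 |
    xargs -0 -n10 -P"$(nproc)" grep -H --line-buffered -- "$pattern"
}
```

Example: `pargrep TODO src lib`. The `--line-buffered` flag matters because several greps share one output pipe. With block-buffered output, one process's write can end mid-line and another's can start right after it. Flushing per line keeps short matches intact, at some cost in throughput.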
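(Word to the wise: there is also a similar, but very wrong, way to do it. It drops -type f and adds -r, so find hands directories to grep, and grep then re-searches every file beneath them...)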
find $files -print0 | xargs -0 -n10 -P8 grep -H -r "$pattern"
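You can see the damage by building a throwaway tree with a single matching file and counting the matches both ways. This sketch assumes GNU grep and `mktemp -d`. The directory layout is made up purely for the demonstration.

```shell
# Throwaway tree with exactly one match: $tmp/a/b/file.txt
tmp=$(mktemp -d)
mkdir -p "$tmp/a/b"
echo needle > "$tmp/a/b/file.txt"

# Right way: find lists only the regular file, so grep searches it once.
right=$(find "$tmp" -type f -print0 | xargs -0 -n10 -P8 grep -H needle | wc -l)

# Wrong way: find also lists $tmp, $tmp/a and $tmp/a/b, and grep -r walks
# each of those directories again, so the same file is searched once per
# enclosing directory plus once as a plain file argument.
wrong=$(find "$tmp" -print0 | xargs -0 -n10 -P8 grep -H -r needle | wc -l)

echo "right: $right  wrong: $wrong"
rm -rf "$tmp"
```

With GNU grep this should report one match the right way and four the wrong way: the file itself plus its three enclosing directories. On a real source tree, every file gets searched as many extra times as it is deep.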
So after tuning the parallelism numbers there (-n10 and -P8), and feeling pretty good about myself, I decided to look up the mainline GNU grep source and see if there was anything I could do to help. So I pulled up the current source, upgraded my box to grep 2.8 instead of Gentoo's stable 2.5.4, and Holy Snapdragon! (Yes, I actually said that...) I got a 10x-60x speedup on the workloads I was testing. Yes, you read that right: tests that had been taking 48-80 seconds were now finishing in 1-2 seconds. I dunno what kinda magic they threw into grep 2.8, but good work, guys. There's still no multithreading, but with the new version I have real trouble just finding a workload that needs it.
So, my Gentoo countrymen, if ever there were a time to use unstable, this is it!