One of our biggest CPU hogs is the image-processing library ImageMagick - and fortunately it supports OpenMP, a system for spreading work across multiple cores. Since it sees 24 cores, it tries to use all of them.
Perhaps unsurprisingly, this is non-optimal without tuning; what I did find surprising was how it failed. Here's what happened when I benchmarked four iterations of one of our more expensive operations - resizing a 9MB, 3507×2480 PNG file to 920×651 - using from 1 to 24 threads:
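A benchmark like this can be sketched as a shell loop - file names and the exact thread counts here are illustrative, not the originals, but `MAGICK_THREAD_LIMIT` is ImageMagick's documented way to cap its OpenMP thread pool per invocation:

```shell
# Hypothetical benchmark sketch: time the same resize at several thread counts.
if command -v convert >/dev/null 2>&1; then
    # Generate a throwaway input image roughly the size discussed above.
    convert -size 3507x2480 xc:gray input.png

    for threads in 1 2 3 4 6 8 12 16 24; do
        # MAGICK_THREAD_LIMIT caps ImageMagick's OpenMP thread count;
        # setting it per-command avoids affecting other OpenMP programs.
        start=$(date +%s)
        MAGICK_THREAD_LIMIT=$threads convert input.png -resize 920x651 out.png
        echo "$threads threads: $(( $(date +%s) - start ))s real"
    done
else
    echo "ImageMagick not installed; skipping benchmark"
fi
```

Wrapping each run in `time` instead would also capture user CPU time, which is where the overhead shows up.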
When ImageMagick can't use 21+ cores, it seems to fall back to using around three. This is decidedly non-optimal. It gets worse: for smaller images, real time starts to increase at around 11 cores, or even 7. And there's more overhead than when actually using three threads; compare the larger user time values for  vs. .
This could be because it's trying to run 11×2=22 or 7×3=21 threads given those image sizes, or it could be splitting memory/cache access across multiple processors. Or it could be something completely different.
As a result, I'm limiting it to six threads on our main server, and four on our secondary - in each case, the number of physical cores on one of its CPUs. This consistently gives a 2-2.5x speedup over one thread. In some cases, it's non-optimal in terms of wall-time, but only a little - and it leaves the other cores free.
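For a server-wide cap like this, ImageMagick can also read the limit from its policy.xml, so individual scripts don't each need to set an environment variable. A sketch - the file's path varies by version and distro, and the value shown matches the six-thread cap:

```xml
<!-- In policy.xml (location varies, e.g. /etc/ImageMagick-6/policy.xml):
     cap ImageMagick's OpenMP thread pool at six threads system-wide. -->
<policy domain="resource" name="thread" value="6"/>
```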
Going past that burns excessive CPU cycles as overhead. If nothing else, this heats up the system, reducing its lifetime… it also means those cores can't be used to serve web pages or database queries. This'll matter more once we deploy PostgreSQL 9.6, which supports parallel execution within a single query.
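For reference, 9.6's per-query parallelism is opt-in and governed by a couple of settings; a minimal postgresql.conf sketch (the specific values here are illustrative, not our production config):

```
# postgresql.conf (PostgreSQL 9.6): enable intra-query parallelism while
# keeping a lid on total background worker processes.
max_worker_processes = 8             # global cap on background workers
max_parallel_workers_per_gather = 2  # workers per Gather node (0 disables)
```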
tl;dr Upload processing of large images should be snappier now, whilst using less CPU time. Yay!