A month or so ago, while reviewing performance graphs, one thing popped out at us - a curious pattern in database row access and CPU usage that couldn't be explained by slow queries. [Database terminology.]
We determined that it must be a lot of small queries running one after the other. Our logs confirmed it: someone was using our API to index Inkbunny's artists and submissions, four times a day.
You might think our next step was to stop them doing that. But hold on! Access to public data is exactly what the API is designed for - while it was significant, and perhaps excessive, it shouldn't be causing such massive CPU usage. So what was up? The problem turned out to be a missing index.
Using an index is like having a desk clerk check a rolodex rather than having to go back to the stacks and rifle through the files. In this case, we check for adjacent pool submissions every time you view a submission - a process that's about 200 times slower (60ms vs 0.3ms) without an index, which we swiftly added. Certain submissions are looked at thousands of times each day, so this adds up!
Now, you don't want too many indexes; they take up space, and you have to keep them up to date. We've recently removed over 150 that weren't being used, cutting database size by 15% (about 1.8Gb). But when you can use them, they're vital - and in this case, improving the API helped regular users too.
Postscript: It might seem strange to be bothered about speed. We've got two 6-core CPUs, yet we're only using about four or five cores even at the busiest times. So what's the deal?
First, most of Intel's modern chips can work faster when using fewer cores because they know it won't overheat the CPU (see the green line on page 14 of this report). Less CPU usage means less power used, and less heat - we like to keep our data cool, too!
Next, most CPU graphs average out the last five minutes; but web server traffic isn't a constant workload, it's bursty. If you visit when other users are being served, you might have to wait in line. Speeding up submission views (and the related API queries) doesn't just gives us more headroom for expansion; it allows us to serve existing Inkbunny users faster - and that's a feature in itself.
We've done a fair few things to make it faster! I'll be summarizing them in our next development post, but more/better indexes (in some cases specific to certain queries) and tweaking queries so as not to inadvertently do silly things (like processing all 10000 items in a gallery list to order it) are 90% of it.
We've done a fair few things to make it faster! I'll be summarizing them in our next development pos
Hum, does an index work like a local cache? I mean.... if you can cache all the most frequently viewed submission into local memory or use a QoS system, then I suppose viewing popular submissions would be no problem. Viewing other data files like audio and Flash files which are larger would benefit from QoS. But then again, I have no idea how sophisticated the server setup is really.
By the word index I just automatically assume the id number in the PHP submission URL header like the inkbunny*.php?id=XXXXXX
Hum, does an index work like a local cache? I mean.... if you can cache all the most frequently view
I think you may be confusing this with the concept of a disk cache (which we also use).
Indexes reduce the reading needed to lookup information. You know indexes in books? Like that.
The benefit is even more noticeable when your database does not fit in memory, but your indexes do. Fortunately right now we can fit all the database and a good chunk of submissions in memory.
I think you may be confusing this with the concept of a disk cache (which we also use). Indexes ( h
That is actually really awesome. I'm jelly of guys who can get ahold of good database hardware. One time I may have had an opportunity to try an manage a test database.... You know, a server room for students at my college. But I turned it down because they wanted me to sit at a help desk with a phone and do preliminary stuff first. They need to learn what the difference between someone who works in IT and someone who wants to manage a database. Also a CS/CE major cannot be sitting down answering phone calls and solving others' problems all the time. Phhhf. Eh, oh well.
That is actually really awesome. I'm jelly of guys who can get ahold of good database hardware. One
I'm not sure the exact situation you were in, but sometimes it can help to get experience on the front lines before working in the rest of a business. I did customer support for four years before getting my most recent job at Inkbunny - and I still do it. :-)
Most hardware nowadays is powerful enough to handle databases of quite significant size. You could run Inkbunny off a laptop, although it might not be the smartest thing to do.
I'm not sure the exact situation you were in, but sometimes it can help to get experience on the fro