This week we expanded Inkbunny's usable submission storage by 50% without paying a penny. This journal details why, and how, for the curious. (This diagram of RAID levels may help.)
Three years sounds like a long time. But traffic is increasing, as is the number of active users. We'd like to get as far ahead of ourselves as possible with our current hardware.
Up 'til now, Inkbunny's submissions have been stored in a RAID 1+0 array of four 1TB 7200-RPM disks; its database is stored on two 64GB SSDs, in RAID 1. Both of these RAID setups perform "mirroring" - storing two copies of each bit to provide redundancy against hardware failure.
In theory, RAID 5 is slower than RAID 1+0 for general use. In practice, our RAID controller stores writes in 1GB of RAM and reports immediate completion, giving similar write performance regardless of RAID level. It's got a backup capacitor; if you pull the plug out, it stores in-flight data in flash until power is restored.
[RAID controllers can get very fancy. We could plug in more SSDs and cache our hard disks with them, but that'd be overkill at our size.]
RAID 5 has become less popular as disk capacity has grown faster than write speeds. Rebuilding after a failure means reading every remaining disk in full, which stresses them; if another disk fails during that window, you face restoring at least part of your data from backups - and parity calculations limit rebuild speed (to around 33MB/sec in our case). But we only have four disks, and they're not too big. They've lasted several years and average 25-35% utilization at 33°C, which research suggests is close to ideal.
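For the curious, the parity trick behind RAID 5 is just XOR: each stripe's parity block is the XOR of its data blocks, so any single missing block can be rebuilt from the survivors. A minimal sketch in Python - the principle only, nothing like what the controller actually runs:

    def parity(blocks):
        """XOR a list of equal-length blocks together."""
        out = bytearray(len(blocks[0]))
        for block in blocks:
            for i, b in enumerate(block):
                out[i] ^= b
        return bytes(out)

    # One stripe's worth of data on three disks, plus parity on the fourth.
    d0, d1, d2 = b"submission", b"thumbnail!", b"metadata.."
    p = parity([d0, d1, d2])

    # Disk 1 dies: its block is the XOR of the surviving data and parity.
    assert parity([d0, d2, p]) == d1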
As a bonus, RAID 5 can be faster for reads. Due to their construction, the start of a hard disk is normally faster than the end - this is why they get slower as they fill up. With data striped across three disks rather than two, the "distance" from the start is reduced by 33%. We get many requests for little files, and we can't cache all of them, so read access time is important.
----
That's why we moved to RAID 5. But how? On the RAID side, it took just two steps:
* Transforming from RAID 1+0 to RAID 5 (the longest part; the controller had to rewrite every bit)
* Increasing the size of the logical disk (almost instant, with some background work)
The transformation was timed to fall outside our peak period - a good thing, since it took 15 hours. The controller was now presenting a 3TB "logical disk" - and once we'd poked Linux, it knew that as well.
Unfortunately, our system was set up with a master boot record for its partitions. MBR only works up to the 2TB mark. That's fine when your disk is 2TB, but poses problems after that.
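The 2TB figure falls straight out of the arithmetic: MBR records partition sizes as 32-bit sector counts, and our disks use 512-byte logical sectors.

    # 32-bit sector count times 512-byte logical sectors, expressed in TiB:
    print(2**32 * 512 / 2**40)   # -> 2.0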
What we had to do:
* convert the partition table to GPT
* allocate a BIOS boot partition to store our bootloader's second stage
* reinstall said bootloader
* replace the old partition record with one indicating that the partition was now larger
* reboot to get Linux to accept the new partition values
* and then expand the filesystem
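If you ever want to verify which kind of partition table a disk has, here's a rough sketch of the check in Python (it assumes 512-byte logical sectors and needs root; the device path is just an example). The GPT header lives in the second sector and starts with "EFI PART"; a plain MBR only has the 0x55AA boot signature at the end of the first sector.

    import sys

    def partition_table_type(device):
        with open(device, "rb") as disk:
            lba0 = disk.read(512)   # the MBR lives in the first sector
            lba1 = disk.read(512)   # the GPT header lives in the second
        if lba1[:8] == b"EFI PART":
            return "GPT"
        if lba0[510:512] == b"\x55\xaa":
            return "MBR"
        return "unknown"

    if __name__ == "__main__":
        print(partition_table_type(sys.argv[1]))   # e.g. /dev/sda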
As one staff member noted, it felt "like open heart and brain surgery all at once". Inkbunny's server is in a datacenter hundreds of miles away; if we'd got it wrong, we'd be left to fix it in rescue mode (the equivalent of a Live CD) over a remote connection. But it all worked out in the end.
From what we can see, there's essentially no difference in performance - we just have more space. If anything, disk utilization has decreased. As a bonus, if we have to increase storage beyond 3TB, it'll take one more disk, rather than two. Hopefully we won't have to do that for about five years, though!
Technically, if you use a native "advanced format" drive with 4KB physical and logical sectors, you could push MBR up to 16TB, since the limit is on the number of logical sectors rather than on bytes. We're not doing that, though.
Did you say you're using an SSD for database work? Last I heard, SSDs don't like heavy writes at all (given that they're essentially just large EEPROMs rated for 10,000 to 100,000 write cycles per cell). So putting a write-heavy database on one sounds rather counterproductive to me.
Actually, SSDs are preferable in environments where high IOPS matters - such as a database server. Concerns about SSD write cycles are usually unfounded outside of enterprise use cases - unless, of course, you're seriously abusing the storage in ways it wasn't meant to be used. Most SSDs on the market use a wear-levelling algorithm to ensure individual cells wear down at a roughly even pace - though that doesn't mean much for InnoDB databases, which don't release storage space when a row is deleted. Regardless, the cell life of an SSD may ultimately exceed 2 petabytes of writes, depending on make and model.
I doubt Inkbunny's database will create that many writes within the hardware life-cycle.
In over a year, we've only made 92 TB of writes to the database filesystem - though we put swap on it as well. The SSDs are Transcend SSD 320 2.5". Not sure it counts as "enterprise", but if one of them fails, then it shouldn't take long to mirror it back from the other - and our host will have to pay to replace it. The hope is that they don't both fail at exactly the same time!
Looking at the SMART information on the individual drives, the SSD "wear-out indicator" is 0 (raw value: 43117 cycles), but that really doesn't mean all that much - drives can fail before or after that. There are no reallocations or uncorrectable errors on any of the drives we're using.
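As a back-of-envelope figure (taking the ~2 PB endurance estimate above at face value - it isn't a manufacturer rating for these particular drives):

    # Rough scale only; both numbers are approximate.
    writes_tb_per_year = 92      # measured: ~92 TB over a bit more than a year
    endurance_tb = 2000          # the ~2 PB ballpark quoted above
    print(endurance_tb / writes_tb_per_year)   # -> ~21 years at that rate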
As for databases - we're using PostgreSQL for the site, and MySQL only for statistics. PostgreSQL works in a similar way to InnoDB, writing new row versions while keeping the old ones around. We do a VACUUM FULL every few months, which completely rewrites the tables (we have TRIM/discard enabled). Of course, these are SandForce-based drives, and much of the database is compressible, so fewer bits actually hit the flash.
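For anyone curious what that maintenance looks like, here's a minimal sketch using psycopg2 (the connection string and table name are placeholders, not our actual maintenance script). The one gotcha is that VACUUM refuses to run inside a transaction block, so the connection has to be in autocommit mode:

    import psycopg2

    conn = psycopg2.connect("dbname=inkbunny")   # placeholder DSN
    conn.autocommit = True                       # VACUUM can't run inside a transaction
    with conn.cursor() as cur:
        # Rewrites the whole table, returning dead-row space to the OS.
        cur.execute("VACUUM FULL submissions")   # placeholder table name
    conn.close()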
"like open heart and brain surgery all at once" ..... been there, done that, picked up the pieces afterwards, more than once :p RAID is hell to deal with sometimes, but when things go right it's pretty awesome stuff :)
"like open heart and brain surgery all at once" ..... been there, done that, picked up the pieces a