Big Server
(or, how I learned to stop worrying and love the hardware)

First off, an apology. This document is as rough as guts. It attempts to explain, in brief, how I configured my new "big" server to be redundant and scalable. It is for home users who want to build a Linux box they can grow with more storage and more CPU capacity, essentially forever. Oh, and cheaply too.

I've been on a quest for scalability for several years. One can achieve a lot of scalability by using rackmount servers (Intel 1RU, 2RU, etc.) and deploying a new machine for each function. This costs a lot -- too much for the home user. My solution provides some scalability at lower cost, with some putting together of pieces required. I spent many hours designing this server and testing its performance and reliability; this document may save you some time.

The Hardware
Configuring the hardware
Filesystem architecture

This is where the system design becomes interesting. We're using software RAID, in particular RAID-1, which mirrors the device. On top of that we use LVM (the Logical Volume Manager) to provide partitions which can grow or shrink on the fly. On top of LVM we use the ext3 filesystem, for its journalling (filesystem integrity after a crash) and for its ability to resize an existing filesystem (although the filesystem has to be unmounted to do that, which makes resizing the root filesystem awkward).

Partitioning

Partition both disks identically, with 2 primary partitions: the first about 100 megs, the second the rest of the disk (79.9 gigs or so). Set the type of all partitions to 0xFD (Linux RAID autodetect). Partition 1 will be for /boot; partition 2 will be used by the Logical Volume Manager as a volume group called rootvg.

RAID

Initialise 2 RAID-1 devices: md0 for partition 1 on both drives, and md1 for partition 2 on both drives.

Logical Volume Manager

Install the most recent sistina.com LVM tools package and patch your kernel sources to include the same level of the LVM driver in the kernel. Create a volume group rootvg containing one device, /dev/md1. What does this achieve? It means any LVM operations you perform -- creating logical volumes, for example, or filesystems on those logical volumes -- will all go through /dev/md1, which uses the software RAID layer to mirror the data onto your second drive. Thus your whole system is automatically mirrored, and if a drive dies, your system will keep working in single-drive mode while you obtain a replacement.

Partitions

/dev/md0 is a fixed-size partition. mkfs it as type ext3. Make sure your kernel has ext3 compiled in; it is easier than messing around with an initrd to boot.
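The steps above can be sketched as a command sequence. This is illustrative only: it must run as root against real disks, the device names (/dev/hda, /dev/hdc) and volume sizes are assumptions for the sake of the example, and the original setup predates mdadm (it would have used raidtools and /etc/raidtab), so I've shown the equivalent mdadm invocations instead.

```shell
# Sketch only -- run as root; device names and sizes are assumed, adjust to suit.

# Create the two RAID-1 mirrors: md0 for /boot, md1 for everything else
mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/hda1 /dev/hdc1
mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/hda2 /dev/hdc2

# Put LVM on the big mirror: physical volume, then the rootvg volume group
pvcreate /dev/md1
vgcreate rootvg /dev/md1

# Carve out logical volumes (sizes are hypothetical)
lvcreate -L 10G -n root rootvg
lvcreate -L 40G -n home rootvg

# Filesystems: ext3 on /boot (md0) and on each logical volume
mkfs.ext3 /dev/md0
mkfs.ext3 /dev/rootvg/root
mkfs.ext3 /dev/rootvg/home
```

Everything below /dev/md1 is mirrored automatically, so the LVM layer never needs to know there are two physical drives.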
When you configure boot-from-hard-disk, LILO understands RAID-1 devices and will set up correct MBRs on each drive so you can boot from either.

Create some logical volumes as you require. I created /dev/rootvg/root for the root filesystem, which also includes /usr, /var and /tmp. If you are running a big mail server, for example, you might wish to make /var a separate filesystem. I used to configure Linux boxes with separate filesystems for /, /tmp, /var, /usr, /usr/local and /home, but these days it is more sensible to share the available space. However, if you are running a multi-user server or have untrusted users on this machine, it is vitally important to have no world-writable directories on the root filesystem. That means at least /tmp and /var must be on a different partition.

I also created /dev/rootvg/home for my home directory. I took the opportunity to combine various directories I was using on the old server and on other servers into a single gluggy mess. There are a few gigs of duplicated files, but at least it's all in one place now.

Complicated things

As mentioned, LILO knows about RAID-1. At least, I hope it does. I haven't had a failure of one of the disks yet.

Because the root filesystem is on LVM, you need to use an initial ramdisk. The LVM tools package (or Sistina's website docs) provides a method to create such a ramdisk. The ramdisk has to use the LVM tools to start the LVM system after the kernel has loaded from disk (/dev/hda1 or /dev/hdc1, which is /dev/md0 at runtime) but before the root filesystem is mounted. This ramdisk arrangement works fine.

I had the hardest time setting up my server. I was booting it over NFS for much of the setup phase, while testing the RAID-1, LVM, filesystem growing and other things. NFS-root kernels can _only_ load the root filesystem over NFS, so you'll have to build both variants.

Scaling the beast

The first thing is that you can scale the storage to as much disk as you can fit into the chassis.
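A minimal lilo.conf sketch for this layout might look like the following. The raid-extra-boot option is what tells LILO to write a boot record to each underlying drive of the mirror; the kernel image path and initrd filename here are hypothetical, and your ramdisk will be whatever the LVM tools' initrd procedure produced.

```
# /etc/lilo.conf sketch -- boot from the RAID-1 /boot mirror.
boot=/dev/md0
raid-extra-boot=mbr-only   # write boot records to both drives' MBRs

image=/boot/vmlinuz
    label=linux
    root=/dev/rootvg/root
    initrd=/boot/initrd-lvm.gz   # ramdisk that starts LVM before mounting root
    read-only
```

With this in place, either drive can boot the system alone if its twin dies.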
When you add more disk, if you want it mirrored, do the same thing as for the main system: create twin Linux RAID autodetect partitions (type 0xFD), create a RAID device, and add that device to rootvg in LVM. If you are not using RAID, create 1 partition covering the whole disk and add that instead; I don't know what partition type to use in that case. Now you can put as much storage as you like into rootvg, and you can create new logical volumes or extend existing ones to use the new storage on the new device(s). When you extend an existing logical volume, you need to unmount its filesystem and run a forced fsck on it before you can extend the filesystem.

Assuming you have run out of drive bays in your chassis (it happens pretty quickly with desktop chassis), the next step is to set up another server. Make it identical to your first server -- you will really appreciate being able to run the same kernels and whatnot. Make all your servers NFS servers, and rearrange your storage so that the new server has some files to share also. Try to balance the storage so 50% of the file data is on the first server and 50% on the second.

If you want to scale CPU power, you need to set up a cluster. I haven't done this myself, but I believe the application to use is called openMosix. Install openMosix on all the servers you are going to combine and let them share processor power.

Distributed Filesystems

I have spent many hours researching distributed filesystems online. I have checked out Coda, AFS, OpenAFS, Frangipani, InterMezzo, GFS, Lustre and Berkeley xFS. My conclusion is that the state of play of distributed filesystems is still immature, so my big server runs NFS and will do for a while yet. What you really want is a cluster filesystem where you can add new machines (with local disk storage) and have that disk storage added to an existing pool, combining to present a single image of the filesystem's contents.
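The grow-the-storage procedure can be sketched as follows. Again this is a sketch, not a recipe: it assumes a new mirror has already been created as /dev/md2, the volume name and size are hypothetical, and it must run as root. ext3 of this era can only be resized offline, hence the unmount and forced fsck before the resize.

```shell
# Sketch: grow /dev/rootvg/home onto a newly added mirror, /dev/md2.
# Device and volume names are assumed; adjust to suit.

pvcreate /dev/md2              # make the new RAID device an LVM physical volume
vgextend rootvg /dev/md2       # add its space to the rootvg volume group

umount /home                   # the filesystem must be offline to resize
lvextend -L +50G /dev/rootvg/home
e2fsck -f /dev/rootvg/home     # the forced fsck described above
resize2fs /dev/rootvg/home     # grow the filesystem to fill the volume
mount /home
```

The same pattern applies to any logical volume in rootvg; only the root filesystem is painful, since you can't unmount it while the system is running.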
A serverless filesystem is a type of cluster filesystem with no "master" or "primary" server: all the servers are peers. That sounds good, but I don't think it's mandatory. I think Lustre is the filesystem to use when it's time for me to start distributing my storage.

openMosix is apparently looking at providing some kind of cluster filesystem. One of the concerns with openMosix is that all I/O is performed through the "home processor", which means it travels over the network -- maybe twice when a distributed filesystem is in use. It seems a key requirement that a clustered processing system should support a single filesystem image across all processors, and not require the "home processor" to do all the I/O work.

The end of the document

That's an overview of the state of play and the design. A future revision will identify key topics which I have utterly failed to address, clean up the descriptions, and provide configuration examples.