recommended_cluster_config.md
Recommended configurations for host computers and the management console running Debian GNU/Linux within a cluster.

Partitioning and file system for hosts.

I personally suggest using a 24–32 GiB SSD to store the code and static data (both of which can be provided by the package management system), and a much larger HDD to store the data generated by daemons and users.

Needless to say, the most capacity-consuming storage on a virtual machine host is the storage pool for disk images, but in a cluster that pool is usually a piece of shared infrastructure accessible from every host (see below for further discussion), so not much data needs to be stored on a host's OWN (non-shared) storage. Because of that, using a single large SSD to store all the data a host needs is also feasible.

From now on, I assume the small SSD is sda and the large HDD is sdb. Only small adjustments are needed to adapt this to a single-disk scheme.

Using GPT on both disks is recommended, and a BIOS boot partition is best placed right after the primary GPT, usually occupying sectors 34–2047.

On sda, behind the BIOS boot partition lie /boot/efi, /boot, /var/lib/dpkg, / and /usr/local in order.

I usually leave 34 MiB for /boot/efi, and /boot can be 512 MiB to store kernels and initramfs images. /var/lib/dpkg stores the essential metadata of the Debian package management system (dpkg); it needs around 1 GiB and should be kept on the same disk as the root partition, so that the two can be migrated together and stay consistent. The remaining capacity is shared between the root partition and /usr/local; since those two partitions lie side by side, it is easy to adjust the boundary between them.
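For illustration only (the Debian installer's partitioner can of course do the same job), the sda layout above could be created with sgdisk(8) from the gdisk package roughly as follows; all sizes are placeholders taken from the suggestions above, and this overwrites the existing partition table:

# sgdisk -a 1 -n 1:34:2047 -t 1:ef02 -c 1:"BIOS boot" /dev/sda
# sgdisk -n 2:0:+34M  -t 2:ef00 -c 2:"EFI system (/boot/efi)" /dev/sda
# sgdisk -n 3:0:+512M -t 3:8300 -c 3:"/boot" /dev/sda
# sgdisk -n 4:0:+1G   -t 4:8300 -c 4:"/var/lib/dpkg" /dev/sda
# sgdisk -n 5:0:+16G  -t 5:8300 -c 5:"root" /dev/sda
# sgdisk -n 6:0:0     -t 6:8300 -c 6:"/usr/local" /dev/sda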

On sdb, behind the BIOS boot partition lie the swap partition, /var/lib/dpkg/updates, /var and /home.

/var/lib/dpkg/updates is used to store temporary files and the context dpkg(1) needs to perform package management. Its contents must survive until the next time dpkg(1) is called, so that dpkg(1) can recover if the system crashes during package management; therefore it MUST NOT be a tmpfs, and in the double-disk scheme it is best separated from /var/lib/dpkg to avoid unnecessary SSD wear. (On single-disk systems there is no need to separate /var/lib/dpkg/updates from its parent directory, and their capacities should be combined.) I usually leave 1 GiB for it.

The remaining capacity is to be shared between /var, /var/log and /home.

In a production environment, logs are very important, so /var/log should be separated from /var. Its capacity depends on the host's usage.

Every partition should be mounted with the option user_xattr if supported, in order to make it easy to deploy grsecurity later.
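As an illustrative /etc/fstab fragment (the device identifiers are placeholders; use the UUIDs of your actual partitions):

UUID=<root-uuid>   /       ext4    defaults,user_xattr    0    1
UUID=<home-uuid>   /home   ext4    defaults,user_xattr    0    2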

Storage devices based on flash memory, such as SSDs, need to erase a block containing old data FIRST before writing new data to it. The easiest way to handle this is to mount the filesystems that sit DIRECTLY on an SSD with the discard option, which erases the blocks occupied by a file as it is deleted. The other way is to run fstrim(8) periodically on every filesystem that supports TRIM, erasing every unused block.

The quality of discard support varies between SSD firmwares, and all TRIM commands were synchronous before SATA 3.1, so periodic trimming is recommended over the discard mount option. You can configure it after installation, e.g. as sketched below.
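A minimal sketch of periodic trimming, assuming fstrim(8) from util-linux supports the --all option (it does in jessie); on systemd-based installations you may prefer enabling a fstrim.timer unit if your util-linux ships one. Create /etc/cron.weekly/fstrim with the following content and make it executable (chmod 755):

#!/bin/sh
# Discard unused blocks on every mounted filesystem that supports TRIM.
/sbin/fstrim --all || true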

Configs applied after installation but before reboot.

There will be a notification for the completion of the installation, after the bootloader has been properly installed. If you confirm this notification, the computer will reboot into the newly installed operating system.

At this point, you can still switch to an unoccupied virtual terminal (tty1 is occupied by the UI of the Debian installer, and tty4 is used for logging) to get a shell in the installer environment, with the whole installed system mounted at /target.

You can cd or chroot to /target NOW, to apply any config you want, before the first boot of the installed system.
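For example (a sketch of the interactive steps; editor(1) resolves to whatever default editor the installed system provides):

# chroot /target
# editor /etc/fstab
# exit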

For example, if you want to mount a tmpfs at /tmp, you had better configure it NOW, because the installed system will start writing to /tmp right after its first boot; if you set this up only after the first boot, the space already occupied beneath the covered-over /tmp directory will be hard to reclaim.

Add the following entry to /etc/fstab to mount /tmp as a tmpfs:

tmpfs     /tmp tmpfs     nodev,nosuid,size=20%,mode=1777    0    0

Or, on Debian GNU/Linux, you can uncomment RAMTMP=yes in /etc/default/tmpfs (this method may be outdated on Debian GNU/Linux versions that use systemd as init).

/media is also best mounted as a tmpfs, in order not to wear / unnecessarily. Since /media should only contain other temporary mount points for removable storage, its reserved space can be minimized to 1%:

tmpfs     /media tmpfs     size=1%,mode=0755     0    0
Repositories setup.

I recommend using the following repositories in /etc/apt/sources.list for Debian GNU/Linux 8:

deb http://httpredir.debian.org/debian jessie main
deb-src http://httpredir.debian.org/debian jessie main

deb http://httpredir.debian.org/debian jessie-updates main
deb-src http://httpredir.debian.org/debian jessie-updates main

deb http://security.debian.org/ jessie/updates main
deb-src http://security.debian.org/ jessie/updates main

deb http://httpredir.debian.org/debian jessie-backports main
deb-src http://httpredir.debian.org/debian jessie-backports main

According to Debian GNU/Linux's policy, functional improvements and new programs are added to the unstable and testing branches, not to a published stable branch, which only receives security fixes and bugfixes. However, Debian provides "backports": packages rebuilt specifically against the corresponding stable branch, for those who run stable but want newer software from testing or unstable.

The source server can be replaced with a nearer mirror, and http can be replaced with https (after the package apt-transport-https is installed) if that mirror supports it.
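Note that packages from backports are never installed by default; you have to request the backports suite explicitly. For example (the package name is a placeholder):

# apt-get update
# apt-get -t jessie-backports install some-package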

Install the packages needed for virtual machine hosts.

Install the qemu-kvm package with apt-get(8) or aptitude(8), e.g. using this command:

# aptitude install qemu-kvm libvirt-bin

In the current stable branch, jessie, libvirt-bin is already a transitional package with no real content; it only exists to pull in its dependencies libvirt-clients and libvirt-daemon-system, and it has been removed from testing and unstable. You should therefore replace libvirt-bin with libvirt-clients and libvirt-daemon-system on future Debian versions.

The libvirt daemon from libvirt-daemon-system will start automatically at boot time and load the appropriate kvm module, kvm-amd or kvm-intel, both of which are shipped with the Debian Linux kernel package. If you intend to create VMs from the command line, install virtinst; alternatively you can use the GUI tool virt-manager.
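As an illustrative sketch of creating a guest from the command line (all names, the bridge br0 and the ISO path are placeholders, and option names vary slightly between virtinst versions; see virt-install(1)):

$ virt-install --connect qemu:///system \
    --name testvm --ram 2048 --vcpus 2 \
    --disk pool=default,size=20,format=qcow2 \
    --cdrom /var/lib/libvirt/images/debian-8-netinst.iso \
    --network bridge=br0 --graphics vnc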

Use a shared storage pool to store disk images of virtual machines.

libvirt's "migrate" action can only migrate the definition, as well as the whole state when performing a live migration. In reality, libvirt assumes that the STORAGE POOL is shared between the source and the destination hosts, and mounted to the same path, the images should remain accessible via the very same path when the migration is done, otherwise the migration will fail to start. So, the easiest way to config the hosts is mounting a shared storage (NFS and the like) on the path /var/lib/libvirt/images, where the pool default is defined, of each hosts, making them effectively a cluster.

For example, if the shared storage is an NFS export, mount it at /var/lib/libvirt/images with an /etc/fstab entry like the following:

$hostname_of_nfs_server.local:/the/exported/path/for/nfs /var/lib/libvirt/images nfs auto 0 0

Source images (e.g. ISOs for installation) can be put into the pool using the vol-upload sub-command of virsh(1), and you can get a backup of an image inside the pool using the vol-download sub-command.
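For example (the volume and file names below are placeholders):

$ virsh --connect qemu:///system vol-create-as default debian-8-netinst.iso 300M --format raw
$ virsh --connect qemu:///system vol-upload debian-8-netinst.iso /tmp/debian-8-netinst.iso --pool default
$ virsh --connect qemu:///system vol-download testvm.qcow2 /backup/testvm-backup.qcow2 --pool default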

Use a normal user to perform libvirt-related maintenance.

There are a lot of libvirt-related documents in which the root user is used to perform guest-related maintenance, but in reality that is bad practice.

In Debian GNU/Linux, the permissions needed to manage virtual machines are assigned to the groups libvirt and kvm, so you should create a user dedicated to managing VMs, e.g. virtmgr, and add it to those two groups:

# adduser virtmgr
# usermod -aG libvirt virtmgr
# usermod -aG kvm virtmgr

For non-root users libvirt defaults to the qemu:///session URI, so from virtmgr you need to request qemu:///system explicitly:

$ virsh --connect qemu:///system list --all

You can use the environment variable LIBVIRT_DEFAULT_URI to change this default.
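For example, to make qemu:///system the default for the virtmgr account, you could append the variable to its shell profile:

$ echo 'export LIBVIRT_DEFAULT_URI=qemu:///system' >> ~/.profile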

Such a user account can also be used to perform remote management via ssh. You can use virsh, virt-manager or other libvirt-based tools to connect to the host through this account and do everything libvirt provides, e.g.

$ virsh --connect qemu+ssh://virtmgr@$HOSTNAME_OF_THE_HOST.local/system ...

Use mDNS to make it possible to access computers via domain names derived from their hostnames instead of IP addresses.

By deploying an mDNS server (one famous implementation of which is Apple's Bonjour) on each computer within the same subnet, the computers can contact each other using domain names of the form "$HOSTNAME_OF_THE_TARGET_HOST.local".

It is needless to say that a domain name derived from the hostname is easier to remember than an IP address.

The major implementation of mDNS on most Unix-like operating systems is Avahi. In Debian GNU/Linux it is split into many packages; to make use of the most basic mDNS functionality, you can just install avahi-daemon as the mDNS server and libnss-mdns to plug mDNS name resolution into the Name Service Switch:

# apt-get install avahi-daemon libnss-mdns
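Installing libnss-mdns normally adjusts /etc/nsswitch.conf by itself; for verification, the hosts line should end up looking roughly like this (the exact ordering may differ):

hosts: files mdns4_minimal [NOTFOUND=return] dns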

Enjoy mDNS name resolution after deploying them on every host within your subnet!

Making the management console of the cluster able to mount the shared storage pool may be beneficial.

A management console of a cluster is a computer that the cluster administrators log into, from which they can access and manage the hosts within the cluster and, through the hosts, the guests running on them. But what happens if the console also mounts the shared storage pool used by the worker hosts, and becomes a functional host itself?

Virtual machines can then be created and calibrated on the management console and migrated onto an appropriate worker host once they are ready for production use. Malfunctioning but still running guests can also be migrated onto the management console to be examined and repaired.

Disk images can be uploaded and downloaded "locally" if the management console directly mounts the shared storage pool, eliminating the communication overhead between the console and a worker host.

Specific recommended configs for the management console.

The configs for the management console can be based on those for ordinary hosts, but since maintenance work is performed on the management console, it may need additional local storage capacity, so I suggest using the double-disk scheme mentioned above for it.

Besides, the management console is where administrators work, so some software eases their job, such as GUI tools (e.g. virt-manager) that speak the X window protocol; these can use an X socket forwarded by ssh(1) from the computer whose X server the administrator is directly using, displaying the GUI on the screen right in front of them.
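For example, from the workstation in front of the administrator (the hostname is a placeholder):

$ ssh -X virtmgr@$HOSTNAME_OF_THE_CONSOLE.local virt-manager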

Reference:
[1] man page tmpfs(5)
[2] https://wiki.archlinux.org/index.php/Solid_State_Drives
[3] https://wiki.debian.org/SourcesList
[4] https://wiki.debian.org/Backports
[5] https://backports.debian.org/Instructions/
[6] https://wiki.debian.org/libvirt
[7] https://wiki.debian.org/KVM
[8] https://libvirt.org/virshcmdref.html
[9] https://docs.fedoraproject.org/en-US/Fedora/13/html/Virtualization_Guide/chap-Virtualization-KVM_live_migration.html#sect-Virtualization-KVM_live_migration-Live_migration_requirements
[10] https://docs.fedoraproject.org/en-US/Fedora_Draft_Documentation/0.1/html/Virtualization_Deployment_and_Administration_Guide/App_Migration_Disk_Image.html