How We Build Private Clouds
Approach
We have been providing IT infrastructure for our customers since 2005 and building private clouds since 2009. We specialize in dedicated setups. Our definition of a private cloud is one running on hardware dedicated to a single customer, not a group of VMs on shared hardware. This allows our customers to maintain complete control over their environment and retain the flexibility to allocate resources as they see fit. It is our customer’s decision whether to oversubscribe physical resources and how to plan resource allocation across the physical servers.
Our customers are generally smaller IT firms that build and manage applications for their customers; mid-size websites that need performance, capacity, reliability, and dedicated infrastructure; and various other small businesses that need professionally managed infrastructure. We’ve built private clouds ranging from a single physical server to setups with 20+ physical servers and almost 100 TB of storage, and everything in between. Our customers demand reliability and performance while remaining sensitive to cost.
To meet those demands, we provide enterprise-grade hardware from vendors such as Supermicro, Intel, LSI, Western Digital, Seagate, Netgear, Cisco, and others, and we use only well-tested, open-source software.
Hardware
Our customers tend to be very sensitive to price while requiring reliable solutions that perform well. For that reason, we find it beneficial to use hardware that is one or two generations old when building solutions. It doesn’t matter much that the CPUs are a little old when you consider how they will be used in a cloud configuration: you can allocate cores and threads to the VMs that need the most processing power, and end users will not notice a difference. With enterprise CPUs, a 10-20% boost in performance may cost an additional 30-100%. For price-conscious customers, that makes a big difference, especially when a private cloud spans multiple servers.
The one place where we will not use older equipment is with hard drives. That is really the only place that we see hardware failures. There is redundancy built in everywhere, but we still don’t like losing hard drives.
Proxmox
We have found Proxmox to be a great tool for building private clouds. Proxmox is an open-source project that ties together other leading open-source solutions for building and managing a private cloud. At its heart are KVM and LXC, two leading open-source technologies for running virtual machines. KVM is a hypervisor that emulates a physical machine, allowing VMs to run any operating system. LXC is a container solution for running independent Linux instances.
Integrated into the Proxmox environment are all of the features you need to run your private cloud. We use local storage, LVM, ZFS, iSCSI, NFS, and Ceph, depending on the requirements of each configuration; all of these storage types are integrated into Proxmox. There is an integrated firewall to protect the entire cloud as well as each individual VM, along with built-in monitoring of CPU, RAM, and disk utilization for each physical server and VM.
All of this is easily managed from a full-featured GUI available through a browser or mobile device.
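Everything the GUI shows is also exposed through a REST API, which is handy for automation and monitoring. Here is a minimal sketch in Python; the host name and API token are placeholders, and verify=False is only for a self-signed lab certificate:

```python
# Minimal sketch: list the cluster's nodes and their VMs through the
# Proxmox REST API. Host and token below are illustrative placeholders.
import requests

PROXMOX = "https://pve.example.com:8006/api2/json"
HEADERS = {"Authorization": "PVEAPIToken=root@pam!monitoring=00000000-0000-0000-0000-000000000000"}

def get(path):
    # Proxmox wraps every response in a {"data": ...} envelope.
    resp = requests.get(f"{PROXMOX}{path}", headers=HEADERS, verify=False)
    resp.raise_for_status()
    return resp.json()["data"]

# Walk every node in the cluster and report each VM's status.
for node in get("/nodes"):
    print(f"Node {node['node']}: {node['status']}")
    for vm in get(f"/nodes/{node['node']}/qemu"):
        print(f"  VM {vm['vmid']} ({vm.get('name', '?')}): {vm['status']}")
```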
Storage
The biggest decision in setting up a private cloud is how you are going to configure the storage. How much do you need? Do you need it shared? What kind of fault tolerance? How fast? How much do you expect it to grow?
Local Storage
The only time we really encourage local storage is when we set up a single-server cloud for a customer. In that case, we’ll usually set up at least two drives with a hardware RAID controller and add an extra drive for backups. This configuration works well in this environment, providing added configuration options, extra speed, redundancy, and plenty of storage and backup space. Expansion and flexibility are challenging with local storage, but migrating to a bigger cloud is a lot easier than migrating physical servers.
There are situations where running a VM on local storage may be desired within a multi-node configuration. For instance, one current client has an extra disk set up on one node, used by a VM whose job is to run custom backup scripts and store the data locally; they don’t want that VM sharing physical disks with the rest of the VMs. A VM might also be given a local SSD or NVMe drive for a job that needs extra speed.
Shared Storage
To get all of the advantages of running a private cloud, some kind of shared storage is a necessity. You could run multiple nodes configured as single-server clouds with independent local storage on each node; the VMs would still run in a private network and be easily managed from a single GUI, but you would miss the benefits described below.
With shared storage, you can invest more cost-effectively by building boxes designed specifically to meet your data storage demands. You will have processing nodes optimized for processing and storage nodes designed for data speed, capacity and redundancy. You can take advantage of live migration, greater flexibility, and more efficient allocation of resources.
Live migration allows you to move VMs between nodes with no downtime. With the virtual hard disk available on any node that has access to the shared storage, the RAM of the VM just needs to be copied over to another node, and the VM will then be running on the new node without missing a beat.
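As an illustration, a live migration can be triggered through the same REST API the GUI uses. This is a hedged sketch; the host, API token, node names, and VM ID are all placeholders:

```python
# Sketch: live-migrate VM 101 from node "pve1" to node "pve2".
# Node names, VM ID, host, and token are illustrative placeholders.
import requests

PROXMOX = "https://pve.example.com:8006/api2/json"
HEADERS = {"Authorization": "PVEAPIToken=root@pam!admin=00000000-0000-0000-0000-000000000000"}

resp = requests.post(
    f"{PROXMOX}/nodes/pve1/qemu/101/migrate",
    headers=HEADERS,
    data={"target": "pve2", "online": 1},  # online=1 keeps the VM running
    verify=False,  # self-signed certificate in this example
)
resp.raise_for_status()
print("Migration task started:", resp.json()["data"])  # returns a task ID
```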
You have greater flexibility when it comes to future expansion when you have a central storage system. Depending on which configuration we set up, expanding the capacity of the cloud could be as easy as adding a few more drives, adding another node in the storage network to share storage, or adding a new storage device that is available on all the nodes.
When we set up clouds with multiple nodes, we dedicate two local disks to the cloud OS and configure software RAID for them (a ZFS mirror). The additional cost of hardware RAID usually isn’t justified for drives that will be lightly used; we put those resources into the shared storage instead.
SAN – iSCSI
Our simplest shared storage is an iSCSI SAN. On the hardware side, we’ll use a high-quality RAID card with battery backup and RAID-edition enterprise hard drives (at least six, configured in RAID 10). With this hardware, we’ve seen faster disk access from the SAN than from the same disks running locally. The speed improvement comes from the RAID card (with caching) and from spreading data across all of the drives. This configuration is best when you want a simpler setup.
We have had the best experience running LIO on Debian. We’ve run other iSCSI packages on other operating systems, and they have all performed equally well in terms of speed and reliability. We prefer Debian for our iSCSI servers because Proxmox itself is based on Debian, and the two seem to communicate better. That said, we still have more SANs running CentOS, but we recommend Debian for new installs.
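Once the SAN exports a target, attaching it to the cluster is a single storage definition, which Proxmox then makes available to every node. A sketch using the storage API; the storage name, portal address, and IQN are placeholders, and the exact parameters may vary by Proxmox version:

```python
# Sketch: register an iSCSI target as shared storage for the whole cluster.
# The storage ID, portal IP, and IQN below are illustrative placeholders.
import requests

PROXMOX = "https://pve.example.com:8006/api2/json"
HEADERS = {"Authorization": "PVEAPIToken=root@pam!admin=00000000-0000-0000-0000-000000000000"}

resp = requests.post(
    f"{PROXMOX}/storage",
    headers=HEADERS,
    data={
        "storage": "san1",            # name shown in the GUI
        "type": "iscsi",
        "portal": "10.0.0.5",         # SAN address on the storage network
        "target": "iqn.2003-01.org.example.san1:cloud",
        "content": "images",          # use the LUNs for VM disks
    },
    verify=False,
)
resp.raise_for_status()
```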
SAN – Ceph
Ceph is a storage platform that distributes and replicates storage across multiple disks and multiple servers. In a virtualized environment, each VM disk is stored in the Ceph cluster as a series of blocks. The blocks are distributed across all of the servers/disks in the cluster to provide fast access to the data from any processing node with access to the cluster, and each block is replicated to a different server/disk to provide reliability in the event of a server or disk failure. Each block can be replicated any number of times; more copies increase both fault tolerance and performance, provided there are more servers/disks than there are copies.
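To make the replication math concrete, here is a tiny back-of-the-envelope sketch (the drive counts are just an example): with the common replica count of three, usable capacity is roughly one third of raw capacity.

```python
def usable_tb(raw_tb: float, replicas: int) -> float:
    """Rough usable capacity of a replicated Ceph pool:
    each block is stored `replicas` times."""
    return raw_tb / replicas

# Example: 12 x 4 TB drives spread across three servers, replicated 3 ways.
raw = 12 * 4.0
print(f"{raw:.0f} TB raw -> ~{usable_tb(raw, 3):.0f} TB usable")
# 48 TB raw -> ~16 TB usable
```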
There is no limit to how big you can grow a Ceph cluster. Adding more disks or servers can be done at any time and is seamless to the processing nodes running your VMs. If a disk or server fails, the cluster is self-healing and will start to move data around to maintain the desired redundancy.
An entry-level Ceph cluster would be three servers with multiple disks allocated for storage, fast journaling disk(s), two fast networks, and economy CPUs. You would also need three Ceph monitors, which can be VMs as long as they are not stored on the cluster itself.
Shared – Ceph
A cost-effective way to get all of the benefits of a Ceph SAN is to have the Ceph storage services and the cloud processing nodes reside on the same servers. Ceph doesn’t require a lot of processing power, so it has little impact on the compute capacity left for the VMs.
In this configuration, you get all of the benefits of Ceph without the cost of setting up separate hardware, and upgrading or expanding any aspect of the configuration is easy, with no limits on how it can grow. From the initial setup, you can add processing-only nodes or storage-only nodes.
Backup
A backup strategy is a critical part of any private cloud setup. We need to prepare for hardware failures as well as corruption within a particular VM. Making a backup of a VM is fairly simple in that you are just making a copy of its virtual hard drives; the only questions are where you want to store them and how often you want to make them. The backup can be made locally to a separate physical disk, or it can be stored on a dedicated NFS server or on space set aside on the shared SAN. From there, backups can be copied off-site as needed.
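As a sketch of how an image backup can be scripted, the API exposes the same vzdump mechanism the GUI’s scheduled backups use; the node name, VM ID, and backup storage name below are placeholders:

```python
# Sketch: back up VM 101's disks to a storage named "backup-nfs".
# Node name, VM ID, and storage name are illustrative placeholders.
import requests

PROXMOX = "https://pve.example.com:8006/api2/json"
HEADERS = {"Authorization": "PVEAPIToken=root@pam!admin=00000000-0000-0000-0000-000000000000"}

resp = requests.post(
    f"{PROXMOX}/nodes/pve1/vzdump",
    headers=HEADERS,
    data={
        "vmid": 101,
        "storage": "backup-nfs",  # NFS server or SAN space set aside for backups
        "mode": "snapshot",       # back up while the VM keeps running
        "compress": "lzo",
    },
    verify=False,
)
resp.raise_for_status()
```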
The other backup strategy you can employ is to use snapshots to record the state of a VM at a specific point in time. The virtual disks are not copied anywhere; the storage system records the state of the VM so that it can be rolled back to that point in time. Taking a snapshot is extremely fast, since no data is copied, and it uses very little storage space. The downside is that snapshots reside on the same hardware as the original image, so a catastrophic failure that takes out your storage system takes your snapshots with it. Snapshots are not available with every storage type, but we employ them on Ceph, ZFS, and LVM-thin.
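Taking and rolling back a snapshot is equally scriptable. Another hedged sketch with placeholder names; the storage backing the VM must be one of the snapshot-capable types above:

```python
# Sketch: snapshot VM 101, then roll back to it later if needed.
# Host, token, node name, VM ID, and snapshot name are placeholders.
import requests

PROXMOX = "https://pve.example.com:8006/api2/json"
HEADERS = {"Authorization": "PVEAPIToken=root@pam!admin=00000000-0000-0000-0000-000000000000"}
VM = f"{PROXMOX}/nodes/pve1/qemu/101"

# Record the VM's state; no data is copied, so this returns quickly.
requests.post(f"{VM}/snapshot", headers=HEADERS,
              data={"snapname": "pre-upgrade"}, verify=False).raise_for_status()

# Later: roll the VM back to exactly that point in time.
requests.post(f"{VM}/snapshot/pre-upgrade/rollback",
              headers=HEADERS, verify=False).raise_for_status()
```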
An ideal strategy is a combination of the two, tuned to the requirements of each piece of the cloud. Critical machines might have a snapshot taken every hour, with full image backups taken every night.
Security
We’ve implemented security in a few different ways. Some clients have a hardware firewall in front of their cloud, others use the firewall built into the cloud software, and some simply secure each server individually. In the past we have implemented a virtual firewall running on a virtual server, but we don’t recommend it: it was complicated to set up, and the performance wasn’t great.
Hardware Firewalls
Hardware firewalls provide an easy solution for security and offer more peace of mind. We typically use SonicWalls, as we have found them to be reliable and feature-rich for the price. We also find their interface easier to use than some of their competitors’, and all of their devices share the same GUI. A hardware firewall gives you one place to manage all of your security and can provide port filtering, content filtering, virus protection, VPNs, and many more features to protect your network.
The main downsides to a hardware firewall are the cost, the possible single point of failure, and the limited ability to protect individual VMs within the cloud.
Cloud Firewall
There is a firewall built into Proxmox that is controlled at the cloud level. Access rules can be applied to the entire cloud or to individual VMs, the rules stay with a VM regardless of which node it is running on, and they are enforced on each node, providing greater throughput and reliability.
With the ability to apply access rules to each VM, you can build firewalls between all of your nodes and VMs. With entry-level/SMB/mid-range firewalls, access rules are applied to sections of your network (WAN, LAN, DMZ, etc.), and traffic must flow through the firewall for each zone to be protected. If you have a database VM that needs to be protected from everything (including other VMs), it is easy to apply those rules to the VM directly, as sketched below, without segregating it onto another network segment, installing another piece of hardware, or upgrading to a more expensive box.
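Here is a sketch of what that looks like in practice (node name, VM ID, and the application subnet are placeholder values): a single API call adds an allow rule for MySQL to the database VM, and with the VM firewall enabled and a default-deny input policy, nothing else can reach it.

```python
# Sketch: allow only the app subnet to reach MySQL on VM 101, relying on
# the VM firewall's default-deny input policy to block everything else.
# Node name, VM ID, and subnet are illustrative placeholders.
import requests

PROXMOX = "https://pve.example.com:8006/api2/json"
HEADERS = {"Authorization": "PVEAPIToken=root@pam!admin=00000000-0000-0000-0000-000000000000"}

resp = requests.post(
    f"{PROXMOX}/nodes/pve1/qemu/101/firewall/rules",
    headers=HEADERS,
    data={
        "type": "in",
        "action": "ACCEPT",
        "proto": "tcp",
        "dport": 3306,            # MySQL
        "source": "10.0.1.0/24",  # the application servers' subnet
        "enable": 1,
    },
    verify=False,
)
resp.raise_for_status()
```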
Implementing the access rules at the cloud layer also provides performance advantages. With a hardware firewall, you usually have a single point through which all of your traffic flows; with the cloud firewall, you essentially have a firewall running on each node. In a five-node cloud, you have five firewalls running: five times the bandwidth of a single hardware firewall, and no single point of failure. You also keep the access rules off of the VMs themselves, so each VM can be configured for its workload without devoting resources to managing a local firewall.
The biggest limitation of the cloud firewall is that it doesn’t provide features beyond port/IP filtering. If you need to set up VPNs, or want your firewall to do deep packet inspection to guard against viruses, malware, or spam, you’ll need to implement another solution. That solution could be a VM within your cloud; OpenVPN, for example, could be set up on a VM to provide VPNs to and from your cloud.
Solutions
Bringing all of the pieces together in a cost-effective way is what we specialize in, and over the past decade we have been a valuable resource for the companies we work with. Feel free to contact us with any questions about setting up a private cloud for your enterprise.
If we can help you out in any way, please give us a call at 888-749-7067 or send us a message below. We would be happy to discuss how we can build a solution to meet your requirements.