Bye-bye SAN, Hello Tintri

I first came across Tintri over a year ago, had a look at their product on their website and thought “this sounds rather good”. I’ve chatted to them at IP Expo for the last two years, and recently had a WebEx with a few of their guys and a handful of my colleagues, where they went into some detail about what the product does, plus a quick demo.

If you’re currently running VMware VMs from VMFS volumes presented from something like an EMC Clariion SAN then you really want to look very seriously at the Tintri T540.

What is it? 13.5TB of storage in a 3U box. Fast storage, 50k-75k IOPS. Designed for VMware (Hyper-V support coming soon).

How do you get that amount of performance from 3U? You combine SSD and HDD. The T540 has eight 300GB SSDs in RAID 6 (striped with two parity disks) and eight 3TB HDDs, also in RAID 6. Nothing new there, except that Tintri have written their own optimised filesystem called VMstore, featuring inline dedupe and some clever MLC flash optimisations. Without these, the flash would wear out quite fast due to the highly random writes caused by your VMs hammering away all the time. Even if each of your VMs writes mostly sequential data, when many sequential streams are combined the component disks in the array see a random workload. SSD loves random workloads.
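To see why many sequential streams look random to the array, here's a minimal sketch (purely illustrative, nothing to do with Tintri's code) of three VMs each writing sequentially to their own region of a shared datastore, serviced round-robin:

```python
# Three VMs, each writing purely sequentially within its own region.
# Block addresses and region sizes are made up for illustration.
def vm_stream(start_lba, count):
    """One VM's purely sequential write stream."""
    return list(range(start_lba, start_lba + count))

streams = [vm_stream(vm * 1000, 5) for vm in range(3)]

# The array services the VMs round-robin, so the disks see this order:
interleaved = [lba for group in zip(*streams) for lba in group]
print(interleaved)  # 0, 1000, 2000, 1, 1001, 2001, ...

# Count how many back-to-back I/Os are actually sequential (LBA+1):
seq = sum(1 for a, b in zip(interleaved, interleaved[1:]) if b == a + 1)
print(f"{seq} of {len(interleaved) - 1} transitions are sequential")
```

Every VM thinks it is writing sequentially, yet not a single transition at the disk level is sequential.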

The data within the Tintri is structured such that all writes go to SSD, the data is then migrated to HDD where appropriate in 8k blocks – this is the size they found worked best during testing, and conveniently works nicely with most enterprise apps, including SQL Server and Exchange. Tintri claim to be able to service around 99% of IO from SSD. You can also pin entire VMs or just individual .vmdk files to sit only on SSD, should you want to.
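The flash-first write path described above can be sketched roughly like this. This is a hypothetical model, not Tintri's implementation: the class name, heat counter and threshold are all invented for illustration.

```python
# Illustrative sketch of a flash-first tiering policy: every write lands
# on SSD; cold 8 KB blocks are later demoted to HDD, unless their VM is
# pinned to flash. Names and thresholds are made up, not Tintri's.
BLOCK_SIZE = 8 * 1024  # 8 KB migration granularity

class TieredStore:
    def __init__(self):
        self.ssd = {}            # block_id -> (vm, heat)
        self.hdd = {}
        self.pinned_vms = set()  # VMs forced to stay on SSD

    def write(self, vm, block_id):
        # Every write goes to flash first.
        self.hdd.pop(block_id, None)
        heat = self.ssd.get(block_id, (vm, 0))[1] + 1
        self.ssd[block_id] = (vm, heat)

    def migrate_cold(self, heat_threshold=1):
        # Periodically demote cold, unpinned blocks to HDD.
        for block_id, (vm, heat) in list(self.ssd.items()):
            if heat <= heat_threshold and vm not in self.pinned_vms:
                self.hdd[block_id] = self.ssd.pop(block_id)

store = TieredStore()
store.pinned_vms.add("sql01")    # pin this VM's data to flash
store.write("sql01", 1)
store.write("web01", 2)
store.migrate_cold()
print(sorted(store.ssd))  # → [1]: the pinned VM's block stays on flash
```

The point is the ordering: flash absorbs every write, and demotion to HDD is a background decision made per 8KB block, with pinning as an override.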

Let’s contrast this with how a Clariion works. The Clariion was designed to service defined, fairly consistent workloads, e.g. a single database on a set of disks. Sure, you can carve more than one LUN from a set of disks, but you need to understand the workload of each LUN so that one doesn’t negatively impact another. Try doing that for hundreds of VMs all on the same Pool/RAID Group… On the Clariion, all writes smaller than the write-aside value (default 1MB) go via the storage processor RAM cache. On a CX4-480 this is about 4GB. There is (usually) a tiny amount of read-ahead cache allocated too, but it’s small because it hardly ever gets used. Back to the write cache: if it fills above the high watermark (default 80%), the SP initiates flushing to disk until the cache falls below the low watermark (default 60%). If you perform a read and the data you want is still held in the write cache, the read will be serviced from the cache, otherwise it’ll come from disk. In reality this means basically all reads come from disk of one form or another. Think about how much provisioned storage you have in your Clariion (it’ll be TB) vs a tiny 4GB write cache.
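The watermark behaviour described above can be modelled in a few lines. This is an illustrative simulation, not EMC's code; the cache size and watermarks match the defaults quoted above, the workload is invented.

```python
# Toy model of Clariion SP write-cache watermark flushing: writes fill
# the cache; once it passes the high watermark the SP flushes to disk
# until it drops below the low watermark. Numbers as quoted in the text.
CACHE_MB = 4096            # ~4 GB write cache on a CX4-480
HIGH_WM, LOW_WM = 0.80, 0.60

class WriteCache:
    def __init__(self):
        self.used_mb = 0
        self.flushes = 0

    def write(self, mb):
        self.used_mb += mb
        if self.used_mb > CACHE_MB * HIGH_WM:
            self.flush()

    def flush(self):
        # Drain dirty data to disk until below the low watermark.
        self.flushes += 1
        self.used_mb = int(CACHE_MB * LOW_WM)

cache = WriteCache()
for _ in range(100):
    cache.write(64)        # a steady stream of 64 MB writes
print(cache.flushes, cache.used_mb)  # → 4 3033
```

Even a modest sustained workload keeps the SP cycling between the watermarks, which is why a 4GB cache in front of terabytes of disk does so little for you.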

The Clariion has a couple of data tiering features that you might think would help, and they might, but probably not as well as you’ve been led to believe, especially if your workload is servicing VMs. These are FAST and FAST Cache:

  • FAST (Fully Automated Storage Tiering) moves data around between different types of disk within a Pool, so you could have SSD, FC HDD and SATA HDD. It does not work on RAID Group LUNs as you can only have one type of disk within a RAID Group. The tiering chunk size is 1GB – which is nowhere near granular enough for the jumbled mass of data that you’ll have if you’ve been thin-provisioning your vmdk files. Writes will go to whichever tier the 1GB chunk that holds the block you’re addressing is currently sat on – so possibly SATA.
  • FAST Cache is a misnomer, it’s not a cache at all. It’s enabled per LUN, and allows up to 2TB of SSD to hold promoted 64kB blocks of data. If you enable FAST Cache for some LUNs, the FAST Cache algorithm monitors the blocks on those LUNs and gradually promotes the most active blocks to SSD. Whether a block is busy enough to be promoted depends on its relative “busy-ness” compared to all other blocks on FAST Cache-enabled LUNs across the entire SAN. Writes to otherwise untouched blocks do not go direct to SSD, and writes to relatively quiet blocks do not go direct to SSD. Only writes to already busy blocks that have been promoted will go direct to SSD.
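The promotion-by-relative-busyness behaviour can be sketched like this. Again this is a hypothetical illustration, not EMC's algorithm; the promotion count and access pattern are made up.

```python
# Illustrative sketch of FAST Cache-style promotion: only the busiest
# 64 KB blocks, relative to all blocks on all enabled LUNs, end up on
# SSD. Writes/reads to unpromoted blocks still hit HDD.
from collections import Counter

access_counts = Counter()  # hits per 64 KB block across enabled LUNs
promoted = set()           # blocks currently held on SSD

def io(block_id):
    access_counts[block_id] += 1
    return "SSD" if block_id in promoted else "HDD"

def promote_hottest(n=2):
    # Promote the n busiest blocks relative to everything else.
    promoted.update(b for b, _ in access_counts.most_common(n))

for b in [1, 1, 1, 2, 2, 3]:   # blocks 1 and 2 are busy, 3 is quiet
    io(b)
promote_hottest()
print(io(1), io(3))  # → SSD HDD
```

Note that every I/O before promotion went to HDD, and the quiet block never benefits at all, which is exactly the problem for a churning mass of VM data.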

I’m not as familiar with other SAN vendors’ tech as I am with the stuff from EMC, but a lot of the above will be similar, if not the same. Check out the granularity of the tiering mechanism, and see if the SSD “cache” is actually a cache.

The Tintri T540 is presented to your ESXi hosts as a single large NFS datastore, via dual controllers (active/standby config), each with two 1/10Gbps Ethernet interfaces (also in active/standby config). Because it uses NFS, and thus Ethernet, you could potentially ditch not only your SAN but also your Fibre Channel fabric(s). (Oh, and your storage team… oops)

There’s no more creating LUNs and then adding them as datastores, no upgrading VMFS versions, no managing SAN RAID Groups or Storage Pools. The Tintri T540 also understands what the files in the NFS datastore are – it knows what makes up a VM, lets you see how busy a VM is in a variety of different ways, and lets you apply Quality of Service to it.

As you can probably tell, moving your VMs to a T540 will massively reduce complexity and management overhead. Yes – you can adjust the placement of VMs and set QoS, but unless you have some specific requirements this box is “set it and forget it”. Because the T540 integrates with vCenter and understands the data it’s holding, it’s not “just another” SSD+HDD box. It has a load of performance overview and reporting features to make the VM admin’s life easier. And given the massive amount of performance it can soak up and throw out, think how nice it’d be to run SQL Server or Exchange from it.

The T540 makes provisioning a new VM from the vSphere Client super-fast – we’re talking seconds – thanks to its VAAI support. It also has built-in cloning and snapshot capabilities, and the latter can be scheduled. Think how handy that speed could be for devs who need a constant supply of new servers or who want to snapshot a VM before they roll out an update.

Check out the features for yourself.

At the head of the company are people from VMware, Sun, and Data Domain. Beyond those companies, the engineering team includes people from Citrix, NetApp, Google and Brocade. They know their stuff, and they know the storage issues facing anyone who’s taken advantage of server virtualisation whilst having to use legacy SAN/NAS technology.

Clearly, you need to have some idea of what your environment is doing to know if your storage workload would be suitable, but I reckon it would be perfect for a lot of people who’ve had no choice apart from traditional SAN or NAS until now. And if you have a VM-based VDI solution then Tintri could solve all your storage provisioning and boot storm issues.

Oh, and they’re winning awards all over the place. I want one. No, I want two.

This entry was posted in Hardware, Storage, vSphere. Bookmark the permalink.

4 Responses to Bye-bye SAN, Hello Tintri

  1. Pingback: vSphere: Convert RDM to VMDK and vice-versa on Windows VM with no downtime | Robin CM's IT Blog

  2. Wernher says:

    There is a vendor that has been doing this for longer and does it far better than Tintri – have a look at X-IO (www.x-io.com)

  3. Pingback: vSphere: Convert RDM to VMDK and vice-versa on Windows VM with no downtime | Robin CM’s IT Blog | Интересные заметки
