Datastore Groups Explained
"Why can't I fail over a single VM using array-based replication?"
I get this question a lot. The truth is, you can, as long as you understand how datastore groups work. The smallest "thing" that SRM can fail over using array-based replication is a datastore group. In VMware terms, a LUN almost always maps 1-to-1 to a datastore or an RDM, so for the purposes of this article we will ignore the rare exceptions, as they fall outside the scope of what I am trying to explain. I give you the test environment.
In the test environment, we have 3 VMs: Alpha, Beta and Charlie, and 3 datastores: datastore1, datastore2 and datastore3. We start the test with Alpha's disk on datastore1, Beta's disk on datastore2 and Charlie's 2 disks on datastore3.
So remember how I said the smallest thing you can fail over is a datastore group? Well, what is a datastore group, you might ask? A datastore group is the smallest set of LUNs that satisfies the dependencies of all the VMs on those LUNs. I know, that's confusing. Don't worry, I will explain.
So let's look at just VM Alpha and datastore1.
As you can see, the only VMDK file on datastore1 is VM Alpha's only disk. Because VM Alpha doesn't depend on any other datastores and because datastore1 doesn't hold any other VMs, they make up a datastore group. In this way, you could fail over a single VM.
So what about VM Charlie with 2 VMDKs?
That's a good question. Let's think this one through. VM Charlie has no dependencies on any datastore other than datastore3, and datastore3 has no VMs on it besides VM Charlie, so they make up a datastore group as well! As you can see, datastore groups don't care how many disks a VM has, just which datastores they are on. Likewise, datastore groups don't care how many VMDKs are on a datastore, only which VMs they belong to.
So if that is the case, then why do datastores end up lumping themselves together?
Here is where things get a little more tricky. Let's say that you move one of VM Charlie's VMDKs to datastore2.
So now, we can't fail over just datastore3, because to fail over datastore3 we MUST fail over all VMs on that datastore. To do that, we have to fail over VM Charlie, but to fail over VM Charlie we MUST fail over all of the datastores that it depends on. Because of this rule we must also fail over datastore2. The datastore group that is created now contains both datastore2 and datastore3. Note that in this example, VM Beta from the above picture is unregistered. So what happens if we register it?
If you guessed that it also has to be failed over, you're right!
In this example, we see that the datastore group now includes VM Beta. VM Beta will have to be failed over because its VMDK file lives on datastore2, which also holds VM Charlie, which has another disk dependency on datastore3. In this example, you can't fail over just VM Beta or just VM Charlie because of the dependencies of other VMs on the datastores that the VMDK files live on.
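If it helps to see the rule as code, here is a minimal sketch that models it. To be clear, this is not SRM's API or its actual implementation: the function name `datastore_groups` and the `layout` dictionary are made up for illustration, and the sketch simply merges any datastores that hold disks of the same VM (a union-find over datastores).

```python
from collections import defaultdict


def datastore_groups(vm_datastores):
    """Model of the grouping rule from this post (hypothetical helper, not an SRM call).

    vm_datastores maps a VM name to the set of datastores holding its disks.
    Returns a list of (datastores, vms) tuples, one per datastore group.
    """
    parent = {}

    def find(ds):
        # Register the datastore on first sight, then walk up to its root.
        parent.setdefault(ds, ds)
        while parent[ds] != ds:
            parent[ds] = parent[parent[ds]]  # path halving
            ds = parent[ds]
        return ds

    # A VM with disks on several datastores ties those datastores together.
    for stores in vm_datastores.values():
        ordered = sorted(stores)
        for ds in ordered:
            find(ds)
        for other in ordered[1:]:
            parent[find(ordered[0])] = find(other)

    groups = defaultdict(lambda: (set(), set()))
    for ds in list(parent):
        groups[find(ds)][0].add(ds)
    for vm, stores in vm_datastores.items():
        if stores:
            groups[find(min(stores))][1].add(vm)
    return sorted(groups.values(), key=lambda g: sorted(g[0]))


# The layout from this example: one of Charlie's disks moved to datastore2,
# and Beta registered again.
layout = {
    "Alpha": {"datastore1"},
    "Beta": {"datastore2"},
    "Charlie": {"datastore2", "datastore3"},
}

for datastores, vms in datastore_groups(layout):
    print(sorted(datastores), sorted(vms))
# ['datastore1'] ['Alpha']
# ['datastore2', 'datastore3'] ['Beta', 'Charlie']
```

Alpha still sits in a group of its own, while Beta gets pulled into the group that Charlie created by spanning datastore2 and datastore3.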
OK, last example. What if I have 2 VMs that live on 2 different datastores and don't share any datastores with each other, but another VM has disks on both of those datastores? Well, that would look like this:
So you can see, in this example, we gave VM Beta another VMDK that lives on datastore1. VMs Alpha and Charlie share no cross dependencies but are grouped into the same datastore group. Why, you might ask? Because VM Beta links them together.
In this example, to fail over VM Alpha you must fail over datastore1. To fail over datastore1 you must fail over VM Beta. To fail over VM Beta you must fail over datastore2. To fail over datastore2, you must fail over VM Charlie, and to fail over VM Charlie, you must fail over datastore3. Whew!
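That chain of "to fail over X you must fail over Y" is really just a walk across VMs and datastores until nothing new turns up. Here is a rough sketch of that walk as a breadth-first search. The function name `failover_closure` and the `layout` dictionary are hypothetical and only model the rule described here; SRM works this out for you and does not expose anything by this name.

```python
from collections import deque


def failover_closure(start_vm, vm_datastores):
    """Return (vms, datastores) that must fail over together with start_vm.

    vm_datastores maps a VM name to the set of datastores holding its disks.
    Hypothetical helper that models the dependency chain, not an SRM call.
    """
    # Reverse index: which VMs have a disk on each datastore?
    ds_vms = {}
    for vm, stores in vm_datastores.items():
        for ds in stores:
            ds_vms.setdefault(ds, set()).add(vm)

    seen_vms, seen_ds = {start_vm}, set()
    queue = deque([start_vm])
    while queue:
        vm = queue.popleft()
        for ds in vm_datastores[vm]:
            if ds in seen_ds:
                continue
            seen_ds.add(ds)  # the VM drags in every datastore it uses
            for other in ds_vms[ds]:
                if other not in seen_vms:
                    seen_vms.add(other)  # the datastore drags in every VM on it
                    queue.append(other)
    return seen_vms, seen_ds


# The layout from this last example: Beta has a disk on datastore1 and datastore2.
layout = {
    "Alpha": {"datastore1"},
    "Beta": {"datastore1", "datastore2"},
    "Charlie": {"datastore2", "datastore3"},
}

vms, datastores = failover_closure("Alpha", layout)
print(sorted(vms), sorted(datastores))
# ['Alpha', 'Beta', 'Charlie'] ['datastore1', 'datastore2', 'datastore3']
```

Starting from Alpha alone, the walk ends up covering all three VMs and all three datastores, which is exactly the chain spelled out above.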
In the end, it all has to do with the dependencies.
As you can see, careful planning of your SRM storage can really help you avoid an "all or nothing" failover plan. When deploying an SRM environment using array-based replication, try to create datastores specifically for the VMs that you want replicated, and only put VMs on the same datastore if you know you want to fail them over together. Another thing to think about is RDMs. If your VM is using RDMs, those will also need to be failed over any time the VM is failed over, and that will add to the number of LUNs in your datastore groups.
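One way to sanity-check a layout before you commit to it is to list the two things that make groups grow: LUNs shared by more than one VM, and VMs that touch more than one LUN (RDMs included). The sketch below does that over a hand-written inventory. Everything in it is an assumption for illustration: the function name `coupling_report`, the `inventory` dictionary, and the RDM label `rdm-beta-01` are all made up, and nothing here queries vCenter or SRM.

```python
def coupling_report(vm_luns):
    """Flag the layout choices that tie VMs together (hypothetical planning helper).

    vm_luns maps a VM name to the set of LUNs it uses, counting both
    datastores and RDMs. Returns (shared_luns, spanning_vms).
    """
    lun_vms = {}
    for vm, luns in vm_luns.items():
        for lun in luns:
            lun_vms.setdefault(lun, set()).add(vm)

    # LUNs used by more than one VM: those VMs always fail over together.
    shared_luns = {lun: vms for lun, vms in lun_vms.items() if len(vms) > 1}
    # VMs on more than one LUN: all of those LUNs land in the same group.
    spanning_vms = {vm: set(luns) for vm, luns in vm_luns.items() if len(luns) > 1}
    return shared_luns, spanning_vms


# Made-up inventory: Beta has an RDM in addition to its VMDK on datastore2.
inventory = {
    "Alpha": {"datastore1"},
    "Beta": {"datastore2", "rdm-beta-01"},
    "Charlie": {"datastore2", "datastore3"},
}

shared, spanning = coupling_report(inventory)
for lun, vms in sorted(shared.items()):
    print("shared LUN:", lun, sorted(vms))
for vm, luns in sorted(spanning.items()):
    print("VM on several LUNs:", vm, sorted(luns))
```

If both lists come back empty, every VM can be failed over on its own. Anything that shows up is a place where a datastore group will be bigger than a single VM.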
Obviously, there is a practically endless number of combinations and scenarios to go over, but if you understand the definition of a datastore group and understand why certain datastores get lumped together with other datastores, you can figure out the dependencies for yourself. I hope this helps to shed some light on what seems to be one of the most common questions I get! Don't forget to follow me on Twitter @SRM_Guru and if you have questions, please put them in the comments below!
**********************************************Disclaimer**********************************************
This blog is in no way sponsored, supported or endorsed by VMware. Any configuration or environmental changes are to be made at your own risk. Casey, VMware, and any other company and/or persons mentioned in this blog take no responsibility for anything.
Also, the vswap datastore doesn't count as a tie. You can have the virtual machine's vswap file on any datastore and it doesn't matter.
If you are positive that only the vswap file is on that vswap datastore and you still see the vswap datastore in VC under the virtual machine, try restarting the management agents on the ESXi host and check again. If it is still there, then I would assume something else is using that datastore.