Just a few things to keep in mind when dealing with clusters using MSCS:
STORAGE:
Arguably the most important component of the cluster is the shared storage device. If there’s any single point where you want to spend money to make sure your data is safe, this is it. And yet I have seen many companies skimp on storage, either by going with cheap devices or with non-redundant ones. The bottom line is, if you can’t guarantee the uptime of your storage, you’re defeating the purpose of clustering in the first place. If you’re in a bit of a budget crunch, there are numerous less-expensive (notice the lack of the word cheap) solutions. iSCSI devices can give you great performance at a much lower cost than traditional cluster storage devices. For my part, that’s what I use exclusively.
ACTIVE/ACTIVE vs ACTIVE/PASSIVE
A simple online search will provide you with thousands of opinions on this topic. What I’ve found in my travels is that it all comes down to the situation. Most of the time, going with the recommended Active/Passive approach is best. But there will always be exceptions. For example, I’ve seen a two-node Exchange cluster that was running into client-side performance issues. The problem wasn’t with the hardware, but with the switching infrastructure. The client-facing interfaces on these boxes were on a 10 Mbps switch (with a 100 Mbps uplink), and there was no budget for a switch upgrade. So a plan was put in place to make room in the budget for a switch infrastructure upgrade in the next fiscal year. In the meantime, the cluster was changed to Active/Active, with the understanding that performance would suffer in the event of a node failure. Keep in mind, though, that this is the exception, not the rule. I put this in to illustrate a point to the “never use Active/Active” crowd.
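The "performance would suffer in the event of a node failure" part can be put into rough numbers before you commit to Active/Active. The following is only a minimal sketch: the node names and load percentages are made up for illustration (they are not measurements from the cluster described above), and it assumes a failed node's work is split evenly across the survivors.

```python
def load_after_failover(node_loads, failed_node):
    """Return the load on each surviving node, as a percentage of one
    node's capacity, after failed_node's work is split evenly among them."""
    survivors = {n: load for n, load in node_loads.items() if n != failed_node}
    if not survivors:
        raise ValueError("no surviving nodes to take over the load")
    extra = node_loads[failed_node] / len(survivors)
    return {n: load + extra for n, load in survivors.items()}


# Hypothetical Active/Active pair, each node at 60% of its capacity.
active_active = {"NODE1": 60, "NODE2": 60}

after = load_after_failover(active_active, "NODE1")
overloaded = [n for n, load in after.items() if load > 100]

print(after)       # load on the surviving node after NODE1 fails
print(overloaded)  # nodes pushed past 100% of their capacity
```

If the list of overloaded nodes is not empty, you are in the same position as the cluster above: it works day to day, but a node failure means degraded service until the failed node comes back.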
OTHER THOUGHTS
If you are planning a cluster that requires a DTC, I’ve always found it easier to give the DTC its own disk, IP, and Name resources. To me it seems a tidier way to go. Also, if you have the DTC tied to a group that fails for some other reason, and there’s a second group relying on it, then you’ve just brought on an extra failure.
If you’re planning on clustering, most people (and MS) recommend having a cluster for SQL, a cluster for Exchange, and so on. Unfortunately, out here in the real world, many of us have found reasons that we are required to put multiple services on a single cluster (note: single cluster, not single node, but that happens too). When this happens, spend time planning your failover scenarios. More complex is not necessarily worse, just more complex.
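One way to spend that planning time is to write the groups down and walk through each node failure on paper, or in a few lines of script. The sketch below is a deliberately simplified model, not how the cluster service actually makes its decisions: the group names, node names, and owner lists are hypothetical, and it only says that a group on a failed node moves to the first live node in its preferred-owner list.

```python
def simulate_failure(groups, nodes, failed):
    """Return {group: owner} after the node `failed` goes down.

    A group that was not on the failed node stays where it is. A group that
    was on it moves to the first live node in its preferred-owner list;
    None means it has nowhere to go.
    """
    live = [n for n in nodes if n != failed]
    placement = {}
    for name, group in groups.items():
        if group["owner"] != failed:
            placement[name] = group["owner"]
        else:
            placement[name] = next(
                (n for n in group["preferred"] if n in live), None
            )
    return placement


# Hypothetical two-node cluster carrying several services.
nodes = ["NODE1", "NODE2"]
groups = {
    "SQL": {"owner": "NODE1", "preferred": ["NODE1", "NODE2"]},
    "Exchange": {"owner": "NODE2", "preferred": ["NODE2", "NODE1"]},
    "DTC": {"owner": "NODE1", "preferred": ["NODE1", "NODE2"]},
}

for failed in nodes:
    print(failed, "down ->", simulate_failure(groups, nodes, failed))
```

Even a toy model like this makes it obvious when every group ends up on one box, which is the moment to ask whether that box can carry them all.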
For something that seems to be an important server technology, MS doesn’t seem to have much available for those wishing to learn about clustering. Considering that I’ve heard from CTECs that the barrier to offering MSCS classes is the cost of the hardware for the class, and that MS has Windows Storage Server, I’d like to suggest that they combine the two into one class (especially since their classes are run with VMs these days anyway). Given 3 VMs, you could cover clustering in depth, along with Windows Storage Server (something I’d love to get my hands on; I’ve been having to build Linux iSCSI targets in VMs for my testing).
Finally, if you notice a strange behavior in your cluster, where a node fails to take ownership of a resource, try opening regedit, going to HKEY_LOCAL_MACHINE\Cluster\Resources, and finding the GUID of the resource. From there, give the Network Service account full control access to the key.
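Hunting for the right GUID by hand gets tedious on a cluster with many resources, so here is a minimal sketch that lists them by name. It uses Python's standard winreg module (Windows only) and assumes each resource key under Cluster\Resources carries a "Name" value holding the display name; check that against your own registry before relying on it. The function names and the sample GUIDs are made up for illustration. It only finds the key; the permission change is still done in regedit as described above.

```python
def find_resource_guids(resources, display_name):
    """Return the GUIDs whose resource name matches display_name,
    ignoring case. `resources` maps GUID -> resource name."""
    wanted = display_name.lower()
    return [guid for guid, name in resources.items() if name.lower() == wanted]


def read_cluster_resources():
    """Read GUID -> name pairs from HKEY_LOCAL_MACHINE\\Cluster\\Resources.

    Assumes each resource subkey has a "Name" value (an assumption, not
    something confirmed here); subkeys without one are skipped.
    """
    import winreg  # standard library, available on Windows only

    resources = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, r"Cluster\Resources") as root:
        subkey_count = winreg.QueryInfoKey(root)[0]
        for index in range(subkey_count):
            guid = winreg.EnumKey(root, index)
            with winreg.OpenKey(root, guid) as resource_key:
                try:
                    name, _value_type = winreg.QueryValueEx(resource_key, "Name")
                except FileNotFoundError:
                    continue
                resources[guid] = name
    return resources


# Made-up sample data so the lookup can be tried without a cluster.
sample = {
    "0a1b2c3d-0000-0000-0000-000000000001": "Cluster IP Address",
    "0a1b2c3d-0000-0000-0000-000000000002": "Disk Q:",
}
matches = find_resource_guids(sample, "disk q:")
print(matches)
```

On a cluster node, you would call find_resource_guids(read_cluster_resources(), "Disk Q:") with the name of the resource that is misbehaving, then open the matching key in regedit.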
Tags: MSCS