2

So when updating a two node cluster, our procedure is this:

  1. Backup all databases.
  2. Failover all SQL cluster roles to Node A.
  3. Install the patches on Node B (passive).
  4. Restart Node B.
  5. Failover all SQL cluster roles to Node B.
  6. Install the patches on Node A (passive).
  7. Restart Node A

My understanding is that in step 5, the updated node, will perform some updates before taking on incoming connections, as can be seen in the log.

What happens if another failover occurs before Node A is updated? If, for example, I have updated to a higher CU on Node B. Will the failover fail or just fail over and start itself on a lower CU level?

2 Answers 2

2

When the database is opened on Node B after Node B has been patched with a higher revision of SQL Server code, that database is upgraded via a series of scripts. Those scripts are managed using the same ACID-compliance as any other activity done in the database. The upgrade is an all-or-nothing action controlled via transactions.

If there is a cluster failover during upgrade of the database on node B, any uncommitted changes made will roll backwards or forwards as part of the recovery process when the database is opened on node A. This should allow Node A to open the database reliably. That being said, if the database has completely upgraded to the new version prior to the failover, Node A will not be able to open the database if it hasn't been upgraded to the same engine code as Node B. You can avoid this scenario by having a third member in the cluster such that you upgrade the unused Node C first, then upgrade the unused Node B, then failover to Node B, then upgrade Node A. If there is a problem at Node A preventing upgrade, or a failover needs to happen for any reason while the update of Node A is happening, you simply failover from Node B to Node C, and rectify the issue with Node A at your leisure.

FYI, for business-critical data, part of your backup process in step 1 should probably involve testing that backup via a restore to some other machine just to ensure you have a reliable backup.

0
-1

My coworker argued that the following statement was wrong in Hannah's answer above:

Node A will not be able to open the database if it hasn't been upgraded to the same engine code as Node B

After testing it out on a two node cluster it seems my coworker was right. The SQL server downgrades back to a lower CU when failing over to unpatched node. See below:

  • Node A, Active node, unpatched
  • Node B, Passive node, unpatched
  1. Node B, Patched up, from CU22 to CU25

Node A, Active, before failover:

select @@VERSION;

Microsoft SQL Server 2017 (RTM-CU22) (KB4577467) - 14.0.3356.20 (X64)   
Aug 20 2020 22:33:27   
Copyright (C) 2017 Microsoft Corporation  
Enterprise Edition: Core-based Licensing (64-bit) on Windows Server 2019 Standard 10.0 <X64> (Build 17763: ) (Hypervisor) 
  1. Failover to Node B

Node B, Active, after failover

select @@VERSION;
    
Microsoft SQL Server 2017 (RTM-CU25) (KB5003830) - 14.0.3401.7 (X64)   
Jun 25 2021 14:02:48   
Copyright (C) 2017 Microsoft Corporation  
Enterprise Edition: Core-based Licensing (64-bit) on Windows Server 2019 Standard 10.0 <X64> (Build 17763: ) (Hypervisor) 
  1. Second failover back to Node A

4. Node A, Active, after failover

select @@VERSION;

Microsoft SQL Server 2017 (RTM-CU22) (KB4577467) - 14.0.3356.20 (X64)   
Aug 20 2020 22:33:27   
Copyright (C) 2017 Microsoft Corporation  Enterprise Edition: Core-based Licensing (64-bit) on Windows Server 2019 Standard 10.0 <X64> (Build 17763: ) (Hypervisor) 
3
  • All you've illustrated is that you patched the instance on node B, and not patched the instance on node A. The databases themselves are affected by the instance patch level, and using mixed versions in an AG is not supported. Don't rely on this behavior working the way your coworker claims. Commented Jul 13, 2021 at 21:32
  • I also illustrated that a failover in a cluster to an unpatched node is possible without any failures. My question did not include AG, only HA clustering. I realize the databases are affected and will suffer becuse of a lower patch level but in my testing they seem to work normally. Commented Jul 15, 2021 at 16:39
  • Apologies, rare to see FCI without HA these days. I still maintain its a bad idea, "As you upgrade each node, it is left out of the possible owners of the failover cluster instance. If there is an unexpected failover, the upgraded nodes do not participate in the failover until Windows Server failover cluster role ownership is moved to an upgraded node by setup." based on the docs and not worth doing at all if you care about your data. Commented Jul 15, 2021 at 19:27

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.