EqualLogic takes time to kick in the additional paths under Windows MPIO
I spent almost four hours on the phone today, from midnight to 4 am, troubleshooting with Dell EqualLogic consultants in the US via WebEx.
We had found that EQL I/O test performance was low: only one path was active under a two-path MPIO setup, and disk latency was particularly high during writes on the newly configured array.
It was finally solved once we remembered the most fundamental concept of all: EqualLogic takes time to kick in the additional paths under MPIO! You need to wait at least 5 minutes or so to see the remaining paths kick in.
The following are my findings, mostly taken from my email exchange with EqualLogic support. Yes, it is long and boring to many, but it is extremely useful for anyone looking for a solution to the same problem. I wish someone had put it on their blog earlier; then I could have slept much better last night.
Timeline, in descending order (latest first):
- 2pm
We found a very interesting fact: THE 2ND LINK WILL ONLY KICK IN AFTER THE 1ST LINK HAS BEEN SATURATED/OVERLOADED for a period of time (see 1.gif and 2.gif). So MPIO with the Dell EqualLogic DSM (not the Microsoft generic DSM) is actually working perfectly now, and was before!
1.gif shows both links activated. I saw the 2nd link (EQL Mgt 2) suddenly kick in (maybe because we opened more copy windows to the iSCSI target), drop out again, and then come back when needed.
2.gif shows that the performance of the two active ports on the EQL iSCSI target also increased by a lot (from 45% to 80%).
So I am pretty sure the issue never existed in the first place. It just TAKES TIME FOR THE REMAINING NICs (LINKs) to be activated, gradually and automatically, over the testing period according to the load. Previously we only tested for less than 2 minutes; in other words, we didn’t give MPIO’s intelligent logic enough time to kick in additional paths for throughput or I/O.
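To make the "test long enough" point concrete, here is a minimal sketch of how one could check a log of per-link utilization for the moment the second link starts carrying traffic. Everything in it is my own illustration: the function name, the 5% "active" threshold, and the sample readings are hypothetical, not values captured from the array or from any Dell tool.

```python
from typing import Optional, Sequence, Tuple

# One sample = (seconds since test start, utilization % of each link).
Sample = Tuple[float, Sequence[float]]


def first_multipath_time(samples: Sequence[Sample],
                         active_threshold: float = 5.0) -> Optional[float]:
    """Return the time of the first sample in which at least two links
    carry traffic at or above the threshold, or None if that never happens."""
    for seconds, utilizations in samples:
        busy_links = [u for u in utilizations if u >= active_threshold]
        if len(busy_links) >= 2:
            return seconds
    return None


# Hypothetical readings, made up for illustration only:
# link 1 is busy from the start, link 2 joins around the 5-minute mark.
samples = [
    (0, [40.0, 0.0]),
    (60, [45.0, 0.0]),
    (120, [45.0, 0.0]),
    (300, [44.0, 38.0]),
]

kick_in = first_multipath_time(samples)
```

With these made-up readings, a test that stops at the 2-minute mark (the first three samples) never sees a second active link, which is the trap we fell into.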
- 12pm
See attached TR1036-MPIO_EQLX-DSM.pdf PS Series Best Practices
Configuring and Deploying the Dell EqualLogic™ Multipath I/O Device Specific Module (DSM) in a PS Series
MPIO DSM Load-Balance Policy
Microsoft MPIO DSM allows the initiator (server) to login multiple sessions to the same target (storage), and then aggregate that into a single device. Multiple target sessions can be established using different NICs to the target ports.
If one of the sessions fails, then another session continues to process I/O without interrupting the application.
Dell EqualLogic MPIO DSM supports following balancing policies.
• Fail Over Only: Data is sent in one path, while other paths are standby. This connection is used for routing data until it fails or times out. If the active connection fails, then one of the available paths is chosen until the former is available. This load balancing policy is the default configuration when MPIO DSM is disabled.
• Round Robin: All available paths are used to perform I/O in a rotating sequence (round robin sequence). There is no disruption in sending I/O even if any of the paths fails. Using this policy, all paths are used effectively.
• Least Queue Depth: I/O is sent to the path that has least queue length. The performance analyses for the above load balancing policies are presented in the following sections.
• EQL recommends using the Microsoft DSM with the “Least Queue Depth” load balancing policy on Windows Server 2003/2003
• To fully utilize Microsoft’s MPIO capabilities, Dell EqualLogic provides MPIO DSM that is complementary to ASM for both high availability and performance.
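The three policies listed above can be pictured with a small toy model. This is only a sketch of the selection rules as the document describes them; the path names and function names are hypothetical, and the real DSM logic inside Windows and the EqualLogic driver is certainly more involved.

```python
from itertools import cycle
from typing import Dict, Iterator, List


def fail_over_only(paths: List[str], healthy: Dict[str, bool]) -> str:
    """Fail Over Only: use the first healthy path, the rest stay on standby."""
    for path in paths:
        if healthy[path]:
            return path
    raise RuntimeError("no healthy path available")


def round_robin(paths: List[str]) -> Iterator[str]:
    """Round Robin: hand out the paths in a rotating sequence."""
    return cycle(paths)


def least_queue_depth(queue_depth: Dict[str, int]) -> str:
    """Least Queue Depth: pick the path with the fewest outstanding I/Os."""
    return min(queue_depth, key=queue_depth.get)


paths = ["nic1", "nic2"]  # hypothetical path names

primary = fail_over_only(paths, {"nic1": True, "nic2": True})
after_failure = fail_over_only(paths, {"nic1": False, "nic2": True})

rr = round_robin(paths)
rr_sequence = [next(rr) for _ in range(4)]

lqd_choice = least_queue_depth({"nic1": 8, "nic2": 3})
```

In this model Fail Over Only sends everything down nic1 until it fails, Round Robin alternates between the two, and Least Queue Depth follows whichever path is less busy at that moment.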
- 11am
I found something very important on Google.
Device Initialization
Recall that MPIO allows for devices from different storage vendors to coexist, and be connected to the same Windows Server 2008 based or Windows Server 2003 based system. This means a single Windows server may have multiple DSMs installed. When a new eligible device is detected via PnP, MPIO attempts to determine which DSM is appropriate to handle this particular device.
MPIO contacts each DSM, one device at a time. The first DSM to claim ownership of the device is associated with that device and the remaining DSMs are not allowed a chance to press claims for that already claimed device. There is no particular order in which the DSMs are contacted, one at a time. The only guarantee is that the Microsoft generic DSM is always contacted last. If the DSM does support the device, it then indicates whether the device is a new installation, or the same device previously installed but which is now visible through a new path.
Does this mean that if we see multiple DSMs in MPIO, the Dell EqualLogic DSM will always be used first, or that its priority is always higher than the MS DSM’s?
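One way to read the quoted passage is sketched below: the vendor DSMs are asked first (in no guaranteed order), the Microsoft generic DSM is asked last, and the first one to say "mine" wins. The function, the DSM tuples, and the device ID string are all hypothetical and only model the rule as quoted; this is not how the MPIO driver is actually implemented.

```python
from typing import Callable, List, Optional, Tuple

# A DSM is modelled as (name, function that says whether it supports a device).
Dsm = Tuple[str, Callable[[str], bool]]


def claim_device(device_id: str,
                 vendor_dsms: List[Dsm],
                 generic_dsm: Dsm) -> Optional[str]:
    """Ask each vendor DSM in turn, then the generic DSM last.
    The first DSM that supports the device claims it."""
    for name, supports in vendor_dsms + [generic_dsm]:
        if supports(device_id):
            return name
    return None


# Hypothetical DSMs and device IDs, for illustration only.
eql_dsm: Dsm = ("Dell EqualLogic DSM", lambda dev: dev.startswith("EQLOGIC"))
ms_dsm: Dsm = ("Microsoft DSM", lambda dev: True)

eql_owner = claim_device("EQLOGIC 100E-00", [eql_dsm], ms_dsm)
other_owner = claim_device("SOMEVENDOR LUN", [eql_dsm], ms_dsm)
```

Under that reading, an EqualLogic volume would end up with the Dell EqualLogic DSM whenever it is installed, because the Microsoft DSM only gets a chance after every vendor DSM has declined.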
- 10am
An update: even after I added it back with mpclaim -r -i -d "MSFT2005iSCSIBusType_0x9",
MPIO still shows Dell EqualLogic as the DSM instead of Microsoft. How can I force MPIO to select Microsoft instead of Dell EqualLogic, as desired? That would explain exactly why ONLY ONE PATH (or NIC) is working at any one time, with no load balancing across the two NICs.
I even did a real-time test: when I disabled one NIC, all traffic automatically shifted to the 2nd NIC (or path), and vice versa. So it seemed that Windows Server 2008 R2 doesn’t understand the Dell EqualLogic DSM for MPIO. In other words, if Dell EqualLogic is the DSM, then only one path is available.
I also found out from Google that Windows Server 2008 DOES NOT add "MSFT2005iSCSIBusType_0x9" automatically as Windows Server 2003 does; we need to add it manually from the MPIO GUI or CLI.
See the output.
C:\Users\Administrator>mpclaim -s -d
For more information about a particular disk, use 'mpclaim -s -d #' where # is the MPIO disk number.
MPIO Disk    System Disk  LB Policy    DSM Name
-------------------------------------------------------------------------------
MPIO Disk0   Disk 2       RR           Dell EqualLogic DSM
C:\Users\Administrator>mpclaim -s -d 0
MPIO Disk0: 02 Paths, Round Robin, ALUA Not Supported
Controlling DSM: Dell EqualLogic DSM
SN: 6090A078C06B1219D3C8D49CF188CD5B
Supported Load Balance Policies: FOO RR LQD
Path ID           State              SCSI Address      Weight
---------------------------------------------------------------------------
0000000077070001  Active/Optimized   007|000|001|000   0
0000000077070000  Active/Optimized   007|000|000|000   0
C:\Users\Administrator>mpclaim -r -i -d "MSFT2005iSCSIBusType_0x9"
So the KEY question is: how can we FORCE MPIO TO USE the Microsoft DSM instead of Dell EqualLogic’s?
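If you have to check this on several servers, the mpclaim output above is easy enough to parse. Here is a minimal sketch that pulls out the controlling DSM, the load balance policy, and the path states from text shaped like the `mpclaim -s -d 0` output in this post. The function name and the regular expressions are my own assumptions based only on that one sample; other Windows versions may format the output differently.

```python
import re
from typing import Any, Dict

# Text copied from the `mpclaim -s -d 0` output shown in this post.
SAMPLE = """\
MPIO Disk0: 02 Paths, Round Robin, ALUA Not Supported
Controlling DSM: Dell EqualLogic DSM
SN: 6090A078C06B1219D3C8D49CF188CD5B
Supported Load Balance Policies: FOO RR LQD
Path ID State SCSI Address Weight
0000000077070001 Active/Optimized 007|000|001|000 0
0000000077070000 Active/Optimized 007|000|000|000 0
"""

HEADER_RE = re.compile(r"MPIO Disk\d+: (\d+) Paths, ([^,]+),")
PATH_RE = re.compile(
    r"([0-9A-Fa-f]{16})\s+(\S+)\s+(\d{3}\|\d{3}\|\d{3}\|\d{3})\s+(\d+)$"
)


def parse_mpclaim_disk(text: str) -> Dict[str, Any]:
    """Extract DSM name, policy and per-path state from mpclaim disk output."""
    info: Dict[str, Any] = {"dsm": None, "policy": None, "paths": []}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        header = HEADER_RE.match(line)
        if header:
            info["policy"] = header.group(2)
            continue
        if line.startswith("Controlling DSM:"):
            info["dsm"] = line.split(":", 1)[1].strip()
            continue
        path = PATH_RE.match(line)
        if path:
            info["paths"].append({
                "id": path.group(1),
                "state": path.group(2),
                "scsi_address": path.group(3),
                "weight": int(path.group(4)),
            })
    return info


info = parse_mpclaim_disk(SAMPLE)
active_paths = [p for p in info["paths"] if p["state"].startswith("Active")]
```

For the sample above this reports the Dell EqualLogic DSM with Round Robin and two active paths, which matches what the MPIO GUI showed: both paths are there even when only one of them is carrying traffic.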
- 9am
1. Removed MPIO from the W2K8 Features, rebooted, removed HIT, rebooted, then re-installed and rebooted once more. Under MPIO there is still no MSFT2005iSCSIBusType_0x9.
2. This time I changed the NIC’s Flow Control to TX & RX, and read performance from the EQL also increased to 99%.
I do think we need to enable Flow Control RX as well. As we saw yesterday, only writing to the EQL ran at 99% while reading from it ran at 20%, so this proves RX is required.
3. Also, disk latency for reads is very small (39 ms, compared with 350 ms for writes) when we saturate the link using multiple 16 GB files. However, writing to the EQL and overloading the link still gives us over 300 ms of disk latency. The high re-transmit percentages all went down, from 5-6% to 1-2%.
4. No more MPIO initiator drop-outs, even without MSFT2005iSCSIBusType_0x9 in place. Maybe it is not necessary after all?
I installed HIT twice and MSFT2005iSCSIBusType_0x9 was never there, so I suspect that adding it manually can actually cause more problems. Or shall I remove MPIO from the W2K8 Features and install it again manually to see if MSFT2005iSCSIBusType_0x9 pops up?
Extra Notes:
MPIO CLI Commands
mpclaim -r -i -d "MSFT2005iSCSIBusType_0x9"
(Note: HIT installation on Windows Server 2008 R2 DID NOT add this to MPIO)
mpclaim -s -d
mpclaim -s -d device_name
mpclaim.exe -v C:\Config.txt