Foundry ServerIron load balancing using DSR with busy Windows clients
As the title suggests, this article is about load balancing with Foundry ServerIrons (in this case a chassis-based GT-C2, though it likely applies to all ServerIrons at least as far back as the XL series) when you have Windows clients connecting to a pool of servers behind the load balancer(s). This may also apply to other operating systems making many connections to the ServerIron's virtual IP (VIP), but it has only been confirmed as a problem with Windows.
The Problem
In our particular case, a customer has a pool of web servers serving an application from behind a pair of ServerIrons, with a large farm of Windows servers (in this case running Flash Communication Server) connecting very rapidly to the VIP. The problem manifests as seemingly random connection timeouts on the Windows clients when they attempt to talk to the cluster. The failures followed only a few patterns. Under test load alone (no live connections) we were unable to reproduce the problem. Only with live traffic on the Windows machines, and the correspondingly large number of connections per second, would the failures occur, and then with no obvious pattern. It also always took at least a few minutes after enabling live load before the failures started. Because of this, we spent days on the problem ripping our hair out; we simply could not track down the cause.
How to tell
If you are experiencing this issue, you will see random connections simply time out, as if the ServerIrons were ignoring them. As you will see, they actually are. A packet dump will look something like this (IPs changed here, of course):
(normal traffic)
19:15:51.098538 IP 10.0.203.179.80 > 10.0.202.43.2538: S 2756722926:2756722926(0) ack 301005437 win 5840
19:15:51.098632 IP 10.0.202.43.2538 > 10.0.203.179.80: . ack 1 win 65535
19:15:51.098738 IP 10.0.202.43.2538 > 10.0.203.179.80: P 1:184(183) ack 1 win 65535
19:15:51.098848 IP 10.0.203.179.80 > 10.0.202.43.2538: . ack 184 win 6432
19:15:51.099099 IP 10.0.202.43.2538 > 10.0.203.179.80: P 184:273(89) ack 1 win 65535
19:15:51.099184 IP 10.0.203.179.80 > 10.0.202.43.2538: . ack 273 win 6432
19:15:51.123312 IP 10.0.203.179.80 > 10.0.202.43.2538: P 1:431(430) ack 273 win 6432
19:15:51.123940 IP 10.0.202.43.2538 > 10.0.203.179.80: F 273:273(0) ack 431 win 65105
19:15:51.124045 IP 10.0.203.179.80 > 10.0.202.43.2538: F 431:431(0) ack 274 win 6432
19:15:51.124137 IP 10.0.202.43.2538 > 10.0.203.179.80: . ack 432 win 65105
(connections that simply are not being responded to)
19:15:51.270042 IP 10.0.202.43.2539 > 10.0.203.179.80: S 972254801:972254801(0) win 65535
19:15:51.645795 IP 10.0.202.43.2542 > 10.0.203.179.80: S 2692840145:2692840145(0) win 65535
19:15:52.922009 IP 10.0.202.43.2530 > 10.0.203.179.80: S 3244129118:3244129118(0) win 65535
19:15:53.223224 IP 10.0.202.43.2532 > 10.0.203.179.80: S 1974753718:1974753718(0) win 65535
19:15:53.424406 IP 10.0.202.43.2536 > 10.0.203.179.80: S 396531796:396531796(0) win 65535
19:15:54.230353 IP 10.0.202.43.2539 > 10.0.203.179.80: S 972254801:972254801(0) win 65535
19:15:54.732833 IP 10.0.202.43.2542 > 10.0.203.179.80: S 2692840145:2692840145(0) win 65535
19:15:59.057864 IP 10.0.202.43.2530 > 10.0.203.179.80: S 3244129118:3244129118(0) win 65535
19:15:59.258913 IP 10.0.202.43.2532 > 10.0.203.179.80: S 1974753718:1974753718(0) win 65535
19:15:59.460188 IP 10.0.202.43.2536 > 10.0.203.179.80: S 396531796:396531796(0) win 65535
19:16:00.263873 IP 10.0.202.43.2539 > 10.0.203.179.80: S 972254801:972254801(0) win 65535
19:16:00.867377 IP 10.0.202.43.2542 > 10.0.203.179.80: S 2692840145:2692840145(0) win 65535
(and finally, the resumption of normal traffic)
19:16:11.028382 IP 10.0.202.43.2644 > 10.0.203.179.80: S 1922631763:1922631763(0) win 65535
19:16:11.028481 IP 10.0.203.179.80 > 10.0.202.43.2644: S 3064614815:3064614815(0) ack 1922631764 win 5840
19:16:11.028560 IP 10.0.202.43.2644 > 10.0.203.179.80: . ack 1 win 65535
19:16:11.028663 IP 10.0.202.43.2644 > 10.0.203.179.80: P 1:184(183) ack 1 win 65535
19:16:11.028759 IP 10.0.203.179.80 > 10.0.202.43.2644: . ack 184 win 6432
19:16:11.028804 IP 10.0.202.43.2644 > 10.0.203.179.80: P 184:269(85) ack 1 win 65535
19:16:11.028890 IP 10.0.203.179.80 > 10.0.202.43.2644: . ack 269 win 6432
What you are seeing here is a bunch of SYN packets that the load-balanced pool simply never answers. At first we thought we were either hitting a bizarre networking issue somewhere in the path between the clients and the servers (possibly STP related), or that the ServerIrons were triggering some form of SYN/TCP flood protection. Packet dumps on the real web servers show that they never see any traffic at all for the failed connections.
The Cause
The cause is twofold, and we found it entirely by chance from looking at packet dumps. Windows 2003 Server, by default, only uses TCP ports 1025-5000 for outgoing connections. Windows also holds a closed connection in TIME_WAIT for 4 minutes by default. Do some simple math and you will see that it is easy to run Windows out of available outbound TCP ports at fairly modest connection rates, but that is a different topic for a different day.
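To put rough numbers on that math, using the defaults just mentioned:

5000 - 1025 = roughly 3,975 usable outbound ports
3,975 ports / 240 seconds in TIME_WAIT = roughly 16 new connections per second

Anything much above 16 sustained connections per second and Windows starts running out of fresh source ports to hand out.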
In addition to this, the ServerIron's default behavior for DSR connections is to keep tracking the session in its internal session table until it has seen both FINs of the TCP close. That means Foundry will not start the session aging process until both FINs are seen. With DSR, though, the server's replies (including its FIN) go straight back to the client and never pass through the ServerIron, so it only ever sees the client's single FIN, and the session sits in the table until it ages out normally.
What then happens is that Windows re-uses a source port within four or five minutes, while the Foundry still has that source IP:source port pair in its session table. The ServerIron sees a SYN for what, as far as it knows, is a session still being torn down, and silently drops it (as it should, since accepting it would open the door to TCP hijacking and all sorts of other fun).
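To make the sequence concrete (the earlier timestamp is an assumption; the port numbers come from the dump above):

around 19:11   10.0.202.43:2539 connects to the VIP, finishes, and the client sends its FIN
               the ServerIron sees only that one FIN, so the entry stays in its session table
19:15:51       Windows, having cycled through its small port pool, re-uses source port 2539
               the ServerIron still holds 10.0.202.43:2539 in its table and silently drops the SYN

The retries from port 2539 at 19:15:54 and 19:16:00 hit the same stale entry and are dropped as well, while a fresh port like 2644 sails straight through at 19:16:11.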
The Fix
The simple fix is to add "port XXX dsr fast-delete" to your virtual server configs on the ServerIron. This can be accomplished with:
SI# conf t
SI(config)# server virtual www.mysite.com
SI(config-vs-www.mysite.com)# port http dsr fast-delete
SI(config-vs-www.mysite.com)# exit
SI(config)# wr mem
According to Foundry, this causes the ServerIron to begin aging the connection as soon as it sees the first FIN. There is a small chance you could still see the problem on very busy Windows servers, so we also suggest increasing the default port range Windows uses for outbound TCP connections, and possibly shortening the time Windows holds a closed connection in TIME_WAIT.
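For reference, both of those Windows-side knobs live in the registry under HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters (MaxUserPort and TcpTimedWaitDelay). A rough sketch of the change, with example values only - check Microsoft's documentation before touching these, and note that a reboot is required for them to take effect:

C:\> reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v MaxUserPort /t REG_DWORD /d 65534
C:\> reg add HKLM\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters /v TcpTimedWaitDelay /t REG_DWORD /d 30

MaxUserPort raises the top of the outbound port range from the default 5000 toward 65534, and TcpTimedWaitDelay shortens the TIME_WAIT hold from the default 240 seconds (30 is the documented minimum).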
From Foundry:
Add "dsr fast-delete" to port commands under virtual server config.
This feature places a session in an accelerated session timeout queue
upon seeing the first FIN in DSR (as opposed to the standard two FINs).
The session is timed out in 8 seconds instead of the standard session
age.
By using the port dsr fast-delete command, upon receiving first
FIN from a client, the SI puts sessions in a deletion queue, thus
speeding up the deletion process.
Syntax: [no] port dsr fast-delete
Example:
server virtual vs
port dsr fast-delete
All done!
That's it! It seems pretty simple once you know what's going on. Unfortunately, if you don't already know how easily Windows can churn through its outbound port pool (we're not Windows guys!), you will likely never think of this as a possible cause. We wasted days checking everything else, which was quite frustrating. Hopefully this helps someone out!
Thanks
Thanks go to Matt Hallacy of Reflected Networks for being the one to say "hey! wow! Windows is sure re-using its local ports a lot!" - and the rest, as they say, is history. Also, thanks to our client who had to deal with this issue while we worked to track it down.