“A new application server has been installed in the DMZ and the DBA is using SQL*Plus to make a connection to the Database server in the internal network. The connection seems to start OK, but then freezes. Pressing CTRL+C is also ignored and won't get the DBA back to the command prompt.”
During the test, all packets to/from the Database server were captured on the application server. Have a look at the packets and pick the correct answer.
A. The 'TCP Previous segment not captured' message in frame 20 means that one or more packets from the Database server were not written to the pcap file, even though the application server received them properly.
B. The Database server did not receive packet 18 so the client sends a retransmission after a retransmission-timeout (RTO) of 200ms.
C. There is a mismatch in the configuration of the Ethernet interfaces on the two servers and the firewall.
D. The URG flag in frame 21 is used by the SQL*Plus client to try to speed up the connection process.
E. There is a bug in the SQL*Plus client that makes it send out a packet that the database server does not understand, so it does not respond to the request in packets 18 and 19.
Retransmissions and missing data
The packets were indeed captured on the application server (192.168.0.10). This can be verified by looking at the 3-way handshake. There is a delta of 981 microseconds between the SYN and the SYN/ACK, but only 18 microseconds between the SYN/ACK and the ACK. This means the capture was made on (or at least very near) the sender of the SYN and the ACK.
The 3-way handshake also tells us that the RTT between the application server (APP) and the database server (DB) is about 1 millisecond.
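The reasoning in the two paragraphs above can be expressed as a small calculation. The sketch below uses a hypothetical helper, `analyze_handshake` (not a Wireshark feature), that takes the SYN, SYN/ACK and ACK timestamps in seconds relative to the SYN. It guesses the capture side from which gap is small, and estimates the RTT as the SYN-to-ACK time, which holds roughly wherever the capture point sits, because the two gaps together span one full round trip plus some host processing time.

```python
def analyze_handshake(t_syn: float, t_synack: float, t_ack: float) -> dict:
    """Guess the capture point and estimate the RTT from the timestamps
    (in seconds) of SYN, SYN/ACK and ACK as seen in a single trace."""
    syn_to_synack = t_synack - t_syn  # SYN observed -> SYN/ACK observed
    synack_to_ack = t_ack - t_synack  # SYN/ACK observed -> ACK observed
    # The small gap is local processing on the host next to the capture
    # point; the large gap contains the trip across the network.
    if syn_to_synack > synack_to_ack:
        capture_point = "near the SYN sender (client)"
    else:
        capture_point = "near the SYN/ACK sender (server)"
    # Together, both gaps span one full round trip plus host processing.
    rtt_estimate = t_ack - t_syn
    return {"capture_point": capture_point, "rtt_s": rtt_estimate}


# Deltas from this trace: 981 us (SYN -> SYN/ACK) and 18 us (SYN/ACK -> ACK).
result = analyze_handshake(0.0, 0.000981, 0.000999)
print(result["capture_point"])       # near the SYN sender (client)
print(round(result["rtt_s"] * 1e6))  # 999 (microseconds, about 1 ms)
```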
Up until frame 18, everything looks fine. The APP sends requests to the DB and the DB responds quickly (within 1-25 milliseconds). But there is no response to the request in frame 18, not even a bare ACK. About 200ms later the retransmission timeout (RTO) expires and the request is retransmitted. The retransmission does get ACKed in frame 20, but that frame is marked with “[TCP Previous segment not captured]”. This is because after the last frame from the DB (frame 17), the next expected sequence number was 913, but the sequence number from the DB in frame 20 is 3092.
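The “[TCP Previous segment not captured]” warning boils down to comparing the sequence number Wireshark expected next with the one it actually sees. The following is a simplified, hypothetical sketch of that check (Wireshark's real TCP analysis considers many more cases). It uses 32-bit modular arithmetic so that a sequence number wrapping past 2^32 is not mistaken for a gap.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers are 32-bit and wrap around


def missing_bytes(expected_next_seq: int, observed_seq: int) -> int:
    """Return how many bytes were skipped between the next expected
    sequence number and the sequence number actually observed."""
    gap = (observed_seq - expected_next_seq) % SEQ_SPACE
    # A "gap" in the upper half of the sequence space really means the
    # segment starts before the expected number (e.g. a retransmission),
    # so nothing is missing.
    return gap if gap < SEQ_SPACE // 2 else 0


# Relative sequence numbers from this trace: 913 expected after frame 17,
# 3092 seen in frame 20.
print(missing_bytes(913, 3092))  # 2179
```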
Wireshark did not see 2179 bytes of data (3092 - 913) and therefore alerts with “[TCP Previous segment not captured]”. To validate answer “A”, we need proof of whether or not the APP saw that data. In frame 21, the APP sends data again (after 12 seconds of silence) with an ACK value of 913. This means the APP did not see the data either, so it was not a capturing problem, and answer “A” is incorrect.
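The argument above, using the receiver's later ACK to tell a capture problem from real packet loss, can be sketched as a small classification function. The function name and its labels are hypothetical, and for brevity the sketch ignores sequence number wraparound.

```python
def classify_gap(gap_start: int, gap_end: int, later_ack: int) -> str:
    """Decide whether bytes [gap_start, gap_end) missing from a trace were
    lost in the capture or never reached the receiver, based on a later
    cumulative ACK sent by that receiver."""
    if later_ack >= gap_end:
        # The receiver acknowledged bytes the trace never showed:
        # the data arrived, only the capture missed it.
        return "capture loss"
    if later_ack <= gap_start:
        # The receiver is still waiting for the first missing byte.
        return "data never arrived"
    return "partially arrived"


# Gap 913-3092 from the DB; frame 21 from the APP still acknowledges 913.
print(classify_gap(913, 3092, 913))  # data never arrived
```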
URG flag is not used to speed up the connection
If we look more closely at the ACK in frame 20, we can see a SACK block with a left edge of 1047 and a right edge of 2383. Because the ACK value is also 2383, this block lies entirely at or below the cumulative acknowledgement, so it is not a selective acknowledgement (SACK) but a duplicate SACK (DSACK). A DSACK tells the sending side that it received part of the data twice. In this case it received bytes 1047-2383 twice. Because of this DSACK we know the DB received both frame 18 and frame 19, which both carried bytes 1047-2383. This means answer “B” is incorrect: there was indeed an RTO retransmission, but it was not caused by the DB failing to receive packet 18.
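RFC 2883 describes how a data sender can recognise a DSACK: the first SACK block starts below the cumulative ACK, or it is contained in the second SACK block. The sketch below implements just those two rules with a hypothetical `is_dsack` helper. Blocks are (left edge, right edge) pairs with the right edge exclusive, as shown in Wireshark's relative numbering.

```python
from typing import Sequence, Tuple

SackBlock = Tuple[int, int]  # (left edge, right edge); right edge exclusive


def is_dsack(cumulative_ack: int, blocks: Sequence[SackBlock]) -> bool:
    """Return True if the first SACK block reports duplicate data."""
    if not blocks:
        return False
    left, right = blocks[0]
    # Rule 1: the block starts below the cumulative ACK, so it reports
    # data that had already been acknowledged, i.e. received twice.
    if left < cumulative_ack:
        return True
    # Rule 2: the block lies inside the second block (a duplicate above
    # the cumulative ACK, reported together with its surrounding block).
    if len(blocks) > 1:
        left2, right2 = blocks[1]
        return left2 <= left and right <= right2
    return False


# Frame 20: ACK 2383 with a SACK block 1047-2383.
print(is_dsack(2383, [(1047, 2383)]))  # True
```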
The fact that the DB sent 2179 bytes in response to packet 18 also means answer “E” is incorrect: the DB did respond to the request from the APP in packet 18.
Then frame 21 has the URG flag set. This is caused by pressing CTRL+C on the APP. The URG flag means that the Urgent Pointer field is significant, and the Urgent Pointer marks (part of) the TCP segment data as urgent, i.e. more important than the other data in the stream. In this case it marks the whole segment as urgent, because the client wants the other side to notice the CTRL+C before processing any other outstanding data in the TCP buffer. So the URG flag is not used to speed up the connection; it is used to give priority to the CTRL+C break signal. This means answer “D” is also incorrect.
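To see which bytes the URG flag marks, the flags byte and the Urgent Pointer can be decoded from the fixed 20-byte TCP header. The sketch below uses Python's `struct` module on a header built by hand: only the ACK value 913 comes from the trace, while the sequence number 2383, the ports, the window and the 10-byte urgent pointer are hypothetical. It follows the common interpretation (see RFC 6093) that the Urgent Pointer points to the byte just after the urgent data.

```python
import struct

TCP_FLAGS = {0x01: "FIN", 0x02: "SYN", 0x04: "RST", 0x08: "PSH",
             0x10: "ACK", 0x20: "URG", 0x40: "ECE", 0x80: "CWR"}
URG = 0x20


def parse_tcp_header(raw: bytes) -> dict:
    """Decode the fixed 20-byte part of a TCP header."""
    (src_port, dst_port, seq, ack, offset_byte, flags,
     window, _checksum, urgent_ptr) = struct.unpack("!HHIIBBHHH", raw[:20])
    info = {
        "src_port": src_port,
        "dst_port": dst_port,
        "seq": seq,
        "ack": ack,
        "header_len": (offset_byte >> 4) * 4,  # data offset in 32-bit words
        "flags": [name for bit, name in TCP_FLAGS.items() if flags & bit],
        "window": window,
        "urgent_pointer": urgent_ptr,
    }
    if flags & URG:
        # Urgent data ends just before seq + urgent pointer.
        info["urgent_end_seq"] = (seq + urgent_ptr) % 2 ** 32
    return info


# Hypothetical header resembling frame 21: PSH+ACK+URG, 10 urgent bytes.
header = struct.pack("!HHIIBBHHH", 40000, 1521, 2383, 913,
                     5 << 4, 0x08 | 0x10 | URG, 65535, 0, 10)
parsed = parse_tcp_header(header)
print(parsed["flags"])           # ['PSH', 'ACK', 'URG']
print(parsed["urgent_end_seq"])  # 2393
```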
Configuration mismatch between the Ethernet interfaces of the servers and the firewall
This leaves answer “C” as the correct answer. But how can a misconfiguration of the Ethernet interfaces cause a problem at the TCP layer? For that, we need to take a second look at the 3-way handshake. Both the SYN from the APP and the SYN/ACK from the DB carry an MSS value of 8960. Since the MSS is the MTU minus the 20-byte IP header and the 20-byte TCP header, this means both the APP and the DB have an MTU of 9000 (jumbo frames) configured on their Ethernet interfaces. The response of the DB to the request from the APP in frame 18 has a length of 2179 bytes.
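The MSS-to-MTU arithmetic in this paragraph assumes IPv4 and TCP headers of 20 bytes each, without options; the MSS option advertises the payload size excluding those headers. A minimal sketch with hypothetical helper names:

```python
IPV4_HEADER = 20  # bytes, IPv4 header without options
TCP_HEADER = 20   # bytes, TCP header without options


def mtu_from_mss(mss: int) -> int:
    """Interface MTU implied by an advertised MSS (IPv4, no options)."""
    return mss + IPV4_HEADER + TCP_HEADER


def mss_from_mtu(mtu: int) -> int:
    """MSS a host typically advertises for a given MTU (IPv4, no options)."""
    return mtu - IPV4_HEADER - TCP_HEADER


print(mtu_from_mss(8960))  # 9000 -> jumbo frames on APP and DB
print(mss_from_mtu(1500))  # 1460 -> the standard Ethernet value
```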
Because the MSS is 8960, this response was sent in a single segment. The resulting packet (2179 bytes of payload plus 40 bytes of headers) was forwarded over the routers and switches to the firewall. But the firewall is configured with the default MTU of 1500 on all of its Ethernet interfaces, so it drops this oversized packet. The DB retransmits the packet over and over again, but the firewall drops every retransmission too. That is why the connection freezes: the APP has sent its request (and received an ACK for the retransmission), so it waits for the DB, while the DB has sent the response but TCP cannot deliver it to the APP.
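A quick size check shows why the response could not pass the firewall. The sketch below is hypothetical: it assumes 20-byte IPv4 and TCP headers and models the path as an ordered mapping of hop names to MTUs. The hop names and the 9000-byte MTU of the internal switches and routers are assumptions, inferred only from the packet reaching the firewall.

```python
from typing import Dict, Optional

HEADERS = 20 + 20  # IPv4 + TCP headers without options, in bytes


def ip_packet_size(payload_len: int) -> int:
    """Size of the IP packet carrying payload_len bytes of TCP data."""
    return payload_len + HEADERS


def first_blocking_hop(payload_len: int, path_mtus: Dict[str, int]) -> Optional[str]:
    """Return the first hop whose MTU is smaller than the packet,
    or None if the packet fits through every hop."""
    size = ip_packet_size(payload_len)
    for hop, mtu in path_mtus.items():  # dicts keep insertion order
        if size > mtu:
            return hop
    return None


# Hypothetical path from DB to APP.
path = {"DB NIC": 9000, "internal switches/routers": 9000,
        "firewall": 1500, "APP NIC": 9000}
print(ip_packet_size(2179))            # 2219
print(first_blocking_hop(2179, path))  # firewall
print(first_blocking_hop(1460, path))  # None (a 1500-byte packet fits)
```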
There are several possible solutions to this problem. Since the APP also needs to communicate with systems on the Internet that use an MTU of 1500, we changed the MTU on the APP to 1500. Now it advertises an MSS of 1460, forcing the DB to split larger responses into segments of at most 1460 bytes.
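With the APP advertising an MSS of 1460, the DB has to split the same 2179-byte response into two segments. A minimal sketch of that split, using a hypothetical helper; real TCP stacks may send smaller segments, for example when TCP options such as timestamps take up header space.

```python
from typing import List


def split_into_segments(payload_len: int, mss: int) -> List[int]:
    """Payload sizes of the segments needed to send payload_len bytes."""
    full, rest = divmod(payload_len, mss)
    return [mss] * full + ([rest] if rest else [])


segments = split_into_segments(2179, 1460)
print(segments)                          # [1460, 719]
print(max(segments) + 20 + 20 <= 1500)   # True: every packet fits the firewall MTU
```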