-
Notifications
You must be signed in to change notification settings - Fork 46
SQLNA
SQL Network Analyzer (SQLNA) is a command-line tool that will read a network packet capture file or a series of chained files and produce a report highlighting potential areas of interest.
- Download version 1.5.2197.0
- Extract (unzip) to a folder of your choice. It is ready to use.
This is a C# application and requires the full .NET 4.x framework on Windows. .NET 4.7+ is recommended.
SQL Server Network Analyzer UI provides a graphical interface to select files for parsing and setting options. SQLNAUI.EXE must reside in the same folder as SQLNA.EXE. It will build the command-line and execute SQLNA.EXE from the GUI interface.
SQLNA captureFile [/output outputFile] [[/sql ipaddress,port]...] [/listConv] [/filterFmt NETMON | WireShark | Auto]
captureFile the network capture input file (e.g. .CAP, .PCAP, .PCAPNG, .ETL)
/output an output path (folder and file) where the analysis .log file is created. The file will be named the same as the capture file
/sql this is a hint in case there is little traffic to a SQL Server. This forces SQLNA to recognize it as SQL Server traffic
/listConv lists successful conversations that have executed at least one command. This could generate lots of info and is off by default.
/filterfmt changes the way IP addresses and ports are displayed in tables. By default they are separate columns, but the switch can force NETMON filter strings or WireShark filter strings or have the app auto-detect and decide which one to choose. For example, if you always use WireShark, even for .cap files, then you can force a WireShark filter string for all files. The Auto option applies NETMON filter strings for .ETL and .CAP files and WireShark filter strings for .PCAP and .PCAPNG files
-
Analyzes a .CAP, .PCAP, .PCAPNG, and NETSH and PKTMON .ETL files. The output in all cases is written to c:\temp\mytrace.log
SQLNA c:\temp\mytrace.cap SQLNA c:\temp\mytrace.pcap SQLNA c:\temp\mytrace.pcapng SQLNA c:\temp\mytrace.etl
-
Analyzes multiple chained capture files as if it was a single larger trace file by using a wildcard for the file name. The output is written to d:\temp\trace.log
SQLNA c:\temp\mytrace*.cap /output d:\temp\trace.log
-
Analyzes .CAP file from a capture and highlights traffic for the specified SQL server and port if there isn't enough traffic in the capture.
SQLNA c:\temp\mytrace.cap /sql 10.0.0.2,1433
- NETMON 2.x
- PCAP
- PCAPNG
- ETL (ETL files must end with .ETL)
- Ethernet
- Wi-Fi
- NDIS
- PKTMON
- Windows Firewall Packet Capture (WPF)
- Linux Cooked Capture
- WireShark loopback Capture
- IPV4
- IPV6
- GRE
- ERSPAN II & III
- VNETTag
- 802.1Q
- TCP
- UDP
- SMP
- SSRP
- TDS
- KerberosV5
- DNS
Note: SQL Network Analyzer only analyzes TCP traffic, not Named Pipes or Shared Memory. The client and server need to be on different machines for conversations to show in the network trace or use WireShark loopback tracing.
The .log file lists a rich set of data points:
- All files read including the beginning and ending timestamps found in each file to help line up with other logs.
- The amount of SQL Server traffic found compared to overall TCP traffic.
- All SQL Servers found, regardless of port number, along with statistics about each server.
- All Domain Controllers found and how many DNS, Kerberos, LDAP, and MSRPC conversations are seen for them.
- Optional, off by default: Statistics about all successful connections that execute queries.
- All SQL conversations that end with a reset connection, along with statistics.
- All SQL conversations where the SQL Server closes the connection with an ACK+FIN packet before the client does.
- All SQL Conversations where there is one or more Zero Windows packet.
- All TCP conversations that are SYN failures, probe connections, or otherwise do not contain any packets with a payload.
- All SQL conversations that do not complete a login, or where the LoginAck comes after the connection was closed.
- All connections to the DC that are not responded to.
- All Named Pipes connections found.
- All SQL conversations that had an Attention packet.
- All SQL conversations that did not use TLS 1.2 and whether requested by the client or forced by the server.
- All SQL conversations that were redirected (via Application Intent = readonly or the Azure DB Gateway) to another server.
- Was PKTMON used to take the trace, are there any drops or delays longer than 2ms.
- Ephemeral port usage.
- SQL Browser traffic.
- Failed Kerberos SPN requests.
- Failed DNS requests.
- A CSV file with statistics on each TCP and UDP conversation that you can filter and pivot in Excel.
- A diagnostic file containing the console window output plus additional information.
Many of the reports will contain a packet visualization of either the first 20 or last 20 packets in the conversation.
Scenario | visualization |
---|---|
Probe Connection |
>S <AS >A >AF <A <AF >A TCP open and close |
SYN failure to make TCP 3-way handshake |
>S >S >S or just >S
|
WinsockListenBacklog is full | >S <AR >S <AR >S <AR |
Keep-Alive failure |
>BAT >KA >KA >KA >KA >KA >KA >KA >KA >KA >KA >AR No response from the server |
SQL closes the connection first |
>S <AS >A >PL <PR >CH <SH <CONT >A >KE <AF >A >AF <A Server ACK+FIN comes before the client's one. |
Login progress |
>S <AS >A >PL <PR >CH <SH <CONT >A >KE <AF >A >AF <A Diffie-Hellman stopping point; Server ACK+FIN after the Key Exchange. |
Unknown reset connection |
>BAT <DATA >BAT <DATA >A >BAT <DATA >A >AR Maybe a 3rd-party client that closes using >AR instead of >AF, especially if all the connections end this way without Keep-Alive packets or retransmit packets. |
Reset after Zero Window |
>A >A >ZW <ZWP >ZW <ZWP <ZWP <ZWP <ZWP <AR Client issues Zero Window because the receive buffer is full and the Server probes the client to see if it is consuming data. If not, the connection is reset. |
Legend:
-
>
packet from the client -
<
packet from the server -
S A AS AF AR R AP
– generic packet showing the TCP flags (Syn, Ack, Fin, Reset, and Push) -
PL PR CH SH CE KE AD NC NR SS LA ERR
- SQL login packets: Prelogin and response, TLS Client Hello, Server hello, Cipher Change Spec, Key Exchange, encrypted Login Packet or other encrypted packet (App Data), NTLM Challenge and Response, SSPI, Login Ack, Login Error -
BAT RPC ATTN DATA
- SQL post-login packets: SQL Batch, RPC request, Attention, Tabular Data result -
SmpA, SmpS, SmpF
– MARS flow control packets -
RET
- retransmitted packet -
KA
- idle connection Keep-Alive packet -
ZW ZWP
- Zero Window packet and Zero Window Probe packet -
CONT
- continuation Frame; seen when a large packet spans more than one Frame
This report displays the version of SQL Network Analyzer, the command-line used to run the tool, and the date on which the analysis was performed.
This report displays all the files included in the report, including the start and end time of each file, and the file size. In later reports, the Files column shows which file the conversation starts and ends in to make it easier to locate the beginning or end of the conversation for manual analysis.
Note: Files are ordered by the timestamp of the frames in the file, not by filename and not by the file's timestamp. If the last file of a chained NMCAP capture was terminated improperly, the frame table won't be written and it may appear first in the list.
WARNING SQL Network Analyzer loads all files into memory to perform the analysis. The size of all files should not exceed RAM * 0.8. If you have a large number of chained files, you can partition them in subfolders and analyze each group separately.
While SQL Network Analyzer can read files larger than 2GB, but many analysis tools cannot. To prevent problems and to improve filter performance when performing a manual analysis, generate chained captures of between 100MB and 200MB.
This report shows the total amount of TCP traffic in the trace and the total amount of SQL traffic in the trace. If the amount of SQL traffic is a low % of the overall traffic and there are connection reliability issues, it could be an indication of a large amount of other traffic, such as a backup, flooding the network and reducing reliability.
This report also lists the IP address of the machine the network capture was taken on. This is based on having a "Bad checksum" when the machine is sending packets and Checksum Offloading is enabled. If Checksum Offloading is not enabled, then the IP address cannot be identified.
This report identifies all the SQL Server instances found in the network trace, even if not on port 1433, and various properties of the servers and their traffic statistics. The statistics show how busy each server is and if there are any problems that you should be aware of. Some information is gained from login packets and SQL Browser traffic. If these items are not present in the trace, some columns may be blank. These statistics will be broken out in later reports.
This report displays all domain controllers that could be identified from conversations on ports 53 (DNS), 88 (Kerberos), or 389 (LDAP). MRSPC conversations (ports vary) are also identified as SQL Server uses this service to confirm NTLM logins.
The following Domain Controllers were visible in the network trace:
IP Address Files Clients Conversations Kerb Conv DNS Conv LDAP Conv MSRPC Conv MSRPC Port Frames Bytes
---------- ----- ------- ------------- --------- -------- --------- ---------- ---------- ------ -------
10.10.10.1 0 2 80 0 61 0 5 49673 448 104,027
10.10.10.2 0 1 17 4 0 5 7 49673 292 111,353
This report displays which conversations were seen to have logged in successfully. Conversations that started before the trace was started are omitted, as are conversations that fail to log into the database or are fully encrypted (Encrypt=Yes).
Each SQL Server has a separate table in the report.
This report displays the conversations that contain one or more RESET or ACK+RESET packets, and who emitted the first RESET and when.
Note: In general, RESET packets are emitted upon receiving additional packets after the conversation was terminated by an ACK+RESET or an ACK+FIN. MARS connections always end with a RESET as the client emits an SMP:FIN packet followed immediately by an ACK+FIN packet. When the server responds with its own SMP:FIN packet, the client emits a RESET packet. This particular closing sequence is normal. Some non-Microsoft drivers also terminate normal conversations with an ACK+RESET rather than an ACK+FIN.
The Max RT column shows the most Retransmits of a single packet as opposed to the total number of retransmits in the conversation. Depending on network configuration, somewhere between 3 and 8 generally indicates a bad connection. Default is 5.
The KA Timeout column shows how many Keep-Alive packets in a row were not responded to. When the counter gets to 10, the connection is considered bad and Windows resets it.
Each SQL Server has a separate table in the report.
After each table is a histogram, plotting the resets over time. Each space is 1% of the time span of the total network trace. The vertical axis is a log scale, increasing by 3x each level. This can give a visual feel for whether the resets are randomly scattered or clustered and likely due to some more global event.
This histogram shows clustered resets. This could be due to an overloaded switch dropping a number of conversations at once, or some other similar reason.
This report displays all the connections where SQL Server closes the connection before the client. Normally, the client initiates all closing of connections. However, in certain abnormal circumstances, the server may do so.
This report shows all TCP connections that do not contain any application payload. Examples are SYN failures, which can be problematic if this is the server you intend to connect to, or it can be benign, in the case of an Always-On secondary in a multi-subnet configuration. Since SQL Network Analyzer relies on user application payloads to determine if a connection is for a SQL Server, all TCP failures are shown, so you may see HTTP failures or SMB failures, as well.
Probe connections will also show in this report. These typically consist of a TCP 3-way handshake followed immediately by the closing handshake.
Other visualizations are shown in a table above.
This report lists all conversations for which a login sequence was found, but SQL Network analyzer was not able to find the LoginAck packet, i.e. the server response packet containing the LoginAck token. This could represent a login failure, i.e. the connection was closed before the login could be completed, or perhaps a probe connection.
It also lists any conversations that received a login failure response, along with the error message, and it flags failures due to Diffie-Hellman compatibility issues.
The histogram lets you visually get a feel for the distribution of the login timeouts and failures.
The login progress flags indicate which types of login packets were found in the connections. For example, if you just see S PL PR, then SQL Server has likely run out of available worker threads at the moment, especially if there are a cluster of similar failures.
Note: Some false positives may also be displayed:
- Encrypted logins will have the LoginAck packet encrypted so the application cannot detect it.
- The login continues after the trace stopped.
- Already opened encrypted connections. Note: most of these are suppressed.
This report shows connections where the login took more than 2 seconds and what stage of the login process contributed to the delay. Conversations in this report may be login failures and show in that report, as well, but it also may show some successful conversations that were not fatally delayed. The purpose of the report is to show if there is a pattern to the delays, even if not fatal.
This report has the same breakdown as the Login Failure report, plus some additional columns:
- AS = Time from the initial SYN packet until the server's ACK+SYN packet.
- SS = Time from the Login packet (AD) until the SSPI packet.
- LA = Time from the prior packet (AD, NR, or SS) until the login acknowledgement packet.
- ER = Time from the prior packet (AD, NR, or SS) until the login failed (ERror) packet.
Packets not found in the login sequence are shown as blank in the report. The delay is from the last non-blank prior packet. If all prior packets are blank, then the delay is from the start of the network capture.
This report lists all failed connections (SYN packets with no response) to the identified domain controllers. The start and end offsets can be used to line up any failures with login failures and may point to issues with the DC affecting the timeliness of login processing. Events appearing in the histogram will show slightly earlier than the connection failure as this is based on the time of the final packet in the conversation.
This report lists all Named Pipes connections to SQL Server found in the network trace. It does not perform any further analysis. This is mainly so you can alter the connection strings to use TCP, which allows for a better analysis.
In this example, a connection was made to each of two SQL Server instances on the same server. Note the client port # is the same for both connections.
The client generally raises an Attention to the server if it times out a query and needs to cancel it.
TLS 1.2 is supported by SQL Server 2008 and later, as well as related driver versions. This report can show if any SQL Servers or clients have not been updated to the latest security standards.
In an Always-On cluster, you can offload read-only traffic from the primary server. In SQL Azure databases, the login gateway service will redirect connections to the actual SQL Server IP address and port number. This report displays information about any connections that are redirected to another server.
Normally, Windows 2008 and later allows for 16K outgoing connections from any particular client. If a client makes 70 connections/sec in a sustained manner to SQL Server or any other service, it could potentially exhaust the port pool and fail new connections because a new port could not be granted.
It can also be a sign the application is not using connection pooling. This is time consuming and resource intensive on the server and could result in connection delays or timeouts.
The table is only displayed when 1000 connections/minute are reached.
This report shows whether any SQL Browser traffic is reported. SSRP is a UDP-based protocol. It is connectionless in nature.
It will show one of four states:
- There was no SSRP traffic at all for the SQL Server. This is normal if listening on port 1433 or if the port is hard-coded or specified in a SQL Alias.
- There is SSRP traffic and all requests were responded to.
- There is SSRP traffic and none of the requests were responded to. This typically indicates a firewall issue and UDP port 1434 should be unblocked on the server and/or on a network-based firewall.
- There is SSRP traffic and only some requests were responded to. This can indicate an issue with the SQL Browser service itself, and probably should be restarted at a minimum.
This report shows Kerberos SPNs that could not be found or produced some other error. SQL SPNs normally have the MSSQLSvc service type, but since you can specify a custom SPN that could have any service type in the connection string, or even the SQL Server service account in place of an SPN, the report shows all failures.
The report also shows whether delegation was requested, which is the norm for SQL SPNs.
This report shows all DNS requests that resulted in an error response. This could affect connectivity if your SQL Server name appears in the list.
For anyone interested in checking out the code, the following page goes over the basic application structure.