Network Optimisation for Office 365 and other external or cloud services

These are some notes from a short video I just watched. Scanning through these notes will take less time than watching the whole 13-minute video, plus I’ve added links and more info.

Latency

PsPing is one of the SysInternals tools written by Mark Russinovich. It has a nifty feature that allows you to test latency using TCP rather than ICMP, which is what the regular ping command uses. This is beneficial because you frequently can’t ping out of corporate networks, and some external services block inbound ICMP too. Additionally, network devices often give ICMP traffic low priority, which can skew results. Usage: psping <site>:<port> e.g.
C:\sysint>psping outlook.office365.com:80
PsPing v2.01 - PsPing - ping, latency, bandwidth measurement utility
Copyright (C) 2012-2014 Mark Russinovich
Sysinternals - http://www.sysinternals.com


TCP connect to 132.245.226.34:80:
5 iterations (warmup 1) connecting test:
Connecting to 132.245.226.34:80 (warmup): 35.01ms
Connecting to 132.245.226.34:80: 39.88ms
Connecting to 132.245.226.34:80: 38.83ms
Connecting to 132.245.226.34:80: 39.75ms
Connecting to 132.245.226.34:80: 46.42ms

TCP connect statistics for 132.245.226.34:80:
Sent = 4, Received = 4, Lost = 0 (0% loss),
Minimum = 38.83ms, Maximum = 46.42ms, Average = 41.22ms

Low latency is important as it leads to a much snappier experience. If the latency is high, you can try using PsPing to test various parts of your network that the HTTP traffic is flowing through, e.g. proxy servers.
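
If PsPing isn’t to hand, the same idea (timing a TCP connect rather than an ICMP echo) can be roughly sketched in Python. This is only an approximation of what PsPing does, not its actual implementation; the function names are made up for illustration, and the single warm-up connect simply mirrors the output above.

```python
import socket
import time


def tcp_connect_ms(host, port, timeout=3.0):
    """Return the time in milliseconds taken to open a TCP connection."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it straight away
    return (time.perf_counter() - start) * 1000.0


def summarise(samples_ms):
    """Return (minimum, maximum, average), each rounded to 2 decimal places."""
    return (
        round(min(samples_ms), 2),
        round(max(samples_ms), 2),
        round(sum(samples_ms) / len(samples_ms), 2),
    )


def measure(host, port, iterations=4):
    """Do one warm-up connect (discarded), then time `iterations` connects."""
    tcp_connect_ms(host, port)
    return summarise([tcp_connect_ms(host, port) for _ in range(iterations)])
```

Calling measure("outlook.office365.com", 80) would give a (min, max, average) tuple comparable to the last line of the PsPing output. Pointing it at a proxy server’s address and port instead is one way to test each hop the HTTP traffic passes through.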

TCP Window Scaling

This can significantly improve the transfer speed when dealing with large amounts of data (e.g. uploading/downloading files, attaching files to web-based email etc.). TCP Window Scaling allows more data to be sent before an acknowledgement is required from the other end of the connection. This can help a lot on connections with high latency, as without scaling the TCP window can be no larger than 65,535 bytes (roughly 64KB), whereas the maximum it can increase to with TCP Window Scaling is about 1GB. TCP Window Scaling is defined in RFC 1323 (since superseded by RFC 7323).
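
To put some numbers on that, here is a small sketch of the arithmetic. A sender can have at most one window of data in flight per round trip, so window size divided by round-trip time gives a ceiling on the throughput of a single connection. The 50ms round trip in the usage note is an assumed figure for illustration; the function names are hypothetical.

```python
MAX_UNSCALED_WINDOW = 65_535  # largest value the 16-bit TCP window field can hold
MAX_SCALE_SHIFT = 14          # largest shift count the window scale option allows


def max_scaled_window():
    """Largest window in bytes once the maximum scale shift is applied."""
    return MAX_UNSCALED_WINDOW << MAX_SCALE_SHIFT


def throughput_ceiling_mbps(window_bytes, rtt_ms):
    """Upper bound for one connection: one full window per round trip."""
    bits_per_second = window_bytes * 8 / (rtt_ms / 1000.0)
    return bits_per_second / 1_000_000
```

max_scaled_window() comes to 1,073,725,440 bytes, which is where the "about 1GB" figure comes from. With an unscaled window and an assumed 50ms round trip, throughput_ceiling_mbps(65_535, 50) works out at roughly 10.5 Mbps, however fast the underlying link is.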

This one is a little more fiddly to test for. You need to use packet capture software, such as Wireshark.

  1. Install and load Wireshark, then go to the Capture menu and choose Options. In the Capture Options dialogue box, type the following into the Filter text box:
    host <some ip address> e.g. host 192.168.1.10
    This helps to reduce the amount of traffic that Wireshark will capture down to just packets that are sent to or from the IP address you specified.
  2. Run the capture, do some stuff that involves network communication with the host you specified, then stop the capture.
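  3. To check for TCP Window Scaling, in the main Wireshark window, type into the Filter box: tcp.window_size_scalefactor!=-1
    This will filter out all packets where Wireshark has no TCP Window Scaling factor, so if all the captured packets (in the top pane of the Wireshark window) disappear at this point, TCP Window Scaling probably isn’t working between your PC and the host you tested against. If it is working, you’ll still have some packets left showing. The scale factor is only exchanged in the SYN packets at the start of a connection, so make sure the capture includes a connection being opened. Some Wireshark versions also report a value of -2 for connections that negotiated no scaling, in which case tcp.window_size_scalefactor>0 is a stricter filter.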
  4. To verify, select one of the packets in the top pane, and expand the Transmission Control Protocol line in the middle pane. You should see a few lines saying something like:
    Window size value: 59
    [Calculated window size: 15104]
    [Window size scaling factor: 256]
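
The relationship between those three lines is simple multiplication: the value carried in the packet is multiplied by the scale factor agreed during the handshake. A minimal sketch, with hypothetical function names:

```python
def calculated_window(window_size_value, scale_factor):
    """Window in bytes: the 16-bit value from the packet times the scale factor."""
    return window_size_value * scale_factor


def shift_count(scale_factor):
    """Return the shift count behind a scale factor (the factor is 2**shift)."""
    if scale_factor < 1 or scale_factor & (scale_factor - 1):
        raise ValueError("scale factor must be a positive power of two")
    return scale_factor.bit_length() - 1
```

Using the figures above, calculated_window(59, 256) gives 15104, matching what Wireshark shows, and shift_count(256) gives 8, which is the shift count that would have been sent in the SYN packet.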

From Windows Vista onwards, the TCP/IP stack implements Receive Window Auto-Tuning, which allows the size of the receive window to be adjusted dynamically while a connection is in use.

Note that on some Windows versions, if your network location is set to “Public”, Windows might be restricting the upper limit of TCP Window Scaling. You can check by using the command:
netsh interface tcp show heuristics
TCP Window Scaling heuristics Parameters
----------------------------------------------
Window Scaling heuristics         : disabled
Qualifying Destination Threshold  : 3
Profile type unknown              : normal
Profile type public               : normal
Profile type private              : normal
Profile type domain               : normal

The above is from Windows 8.1 with default settings, and it’s all looking good: heuristics are disabled and all the profile types are set to normal. The profile to check is the one that matches your current network location. You can find your current network location via:
netsh advfirewall monitor show currentprofile
Private Profile:
----------------------------------------------------------------------
RCMTechWiFi
Ok.

If heuristics are enabled and the profile type matching your network location is showing as “restricted” or “highlyrestricted”, you can disable the heuristics with the command:
netsh interface tcp set heuristics disabled
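
If you need to check this on more than one machine, the output can be parsed with a few lines of Python. This is a sketch that assumes the output layout shown above (other Windows versions may print something different); the function names and the SAMPLE text are for illustration only.

```python
SAMPLE = """\
TCP Window Scaling heuristics Parameters
----------------------------------------------
Window Scaling heuristics         : disabled
Qualifying Destination Threshold  : 3
Profile type unknown              : normal
Profile type public               : normal
Profile type private              : normal
Profile type domain               : normal
"""


def parse_heuristics(output):
    """Turn 'name : value' lines into a dict with lower-cased keys and values."""
    settings = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep and value.strip():
            settings[key.strip().lower()] = value.strip().lower()
    return settings


def scaling_restricted(settings, profile):
    """True if the given profile (public, private or domain) is restricted."""
    level = settings.get("profile type " + profile.lower(), "normal")
    return level in ("restricted", "highlyrestricted")
```

With the sample text, scaling_restricted(parse_heuristics(SAMPLE), "private") is False, so nothing needs changing. On a real machine the text would come from running the netsh command, for example via subprocess.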

Having said all of the above, the people most likely to notice a difference are those with both higher latencies and higher bandwidth (e.g. 50+ms and 100+Mbps).
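
The reason is the bandwidth-delay product: the amount of data that has to be in flight to keep a link full is its bandwidth multiplied by its round-trip time. Only when that figure exceeds 65,535 bytes does window scaling have anything to add. A rough sketch, with hypothetical function names:

```python
def bandwidth_delay_product(bandwidth_mbps, rtt_ms):
    """Bytes that must be in flight to keep the link busy for one round trip."""
    return int(bandwidth_mbps * 1_000_000 / 8 * rtt_ms / 1000)


def scale_shift_needed(window_bytes, max_shift=14):
    """Smallest shift for which 65,535 << shift covers window_bytes, else None."""
    for shift in range(max_shift + 1):
        if (65_535 << shift) >= window_bytes:
            return shift
    return None
```

For the 100Mbps and 50ms example, bandwidth_delay_product(100, 50) is 625,000 bytes, nearly ten times the unscaled limit, and scale_shift_needed(625_000) is 4. On a 10Mbps link with the same latency the product is 62,500 bytes, which fits in an unscaled window, so scaling makes no difference there.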

DNS

Cloud providers often give different DNS responses based on where you’re doing the lookup from. For example, I’m based in the UK and get the following:

nslookup outlook.office365.com
Server: router.asus.com
Address: 192.168.1.1

Non-authoritative answer:
Name: outlook-emeawest.office365.com
etc.

Note how it’s given me the answer as being EMEA (Europe, Middle East & Africa), which is correct for where I am. If you were in e.g. the US, Japan, etc. you should get a different response. The key here is that you want to make sure that your clients are connecting to the correct place for where they’re located. If you’re physically located in Japan, but due to some internal company network or VPN end up getting your cloud application via a boundary internet connection in Europe, all your data is going to be doing a big round trip from Japan to Europe and back, rather than just talking to the lower latency servers in a more local datacentre.
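
The same check can be scripted if you want to run it from several offices. The sketch below uses the system resolver via Python’s socket module. The served_from helper just looks for a region string in the returned name; that relies on the region appearing in the name as it does above, which is an assumption about the provider’s naming rather than anything guaranteed.

```python
import socket


def resolve(name):
    """Return (primary_name, aliases, addresses) from the system resolver."""
    return socket.gethostbyname_ex(name)


def served_from(primary_name, expected_region):
    """True if the region string appears in the name (case-insensitive)."""
    return expected_region.lower() in primary_name.lower()
```

For example, take the first item returned by resolve("outlook.office365.com") and pass it to served_from along with "emea". A False result from a European office would suggest the lookup is leaving the network somewhere else.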

MTU

This operates at a different OSI layer to TCP Window Scaling. The Maximum Transmission Unit (MTU) is the largest packet an interface will send, and tends to default to 1500 bytes. For TCP traffic, that consists of the IP and TCP headers (20 bytes each), plus the data payload. You can view it with the command:

netsh int ip show int

Idx     Met         MTU          State                Name
---  ----------  ----------  ------------  ---------------------------
  1          50  4294967295  connected     Loopback Pseudo-Interface 1
  3          25        1500  connected     WiFi
  6          40        1500  disconnected  Bluetooth Network Connection
  7           5        1500  disconnected  Local Area Connection* 3

I’m using the WiFi interface at the moment, so my MTU is 1500. However, the MTU that my PPPoE ADSL modem can cope with is 1492, because PPPoE adds 8 bytes of its own headers. So what happens? When Windows starts to communicate with a remote server, a thing called Path MTU Discovery happens, where the two machines try to determine the largest MTU that they can use without the packet being fragmented by a piece of network kit along the way.

Fragmentation leads to reduced throughput due to the “wasted” space taken up by the TCP+IP headers, plus potentially some processing time. If the two machines at either end of the link can do 1500 but something in the middle can’t go that high (e.g. my ADSL modem), the device with the limiting MTU should send an ICMP “Fragmentation Needed” message reporting its MTU. This happens until everything has been reduced to a level at which fragmentation is avoided. If this gets too small, the protocol overhead from the 40 byte header can become significant. There’s some extra detail on this plus calculations here.
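
As a quick illustration of that overhead, the sketch below works out how much of each packet is payload for a given MTU. It assumes plain 20 byte IP and TCP headers with no options, as described above; the function names are hypothetical and the 576 byte MTU in the usage note is just an example of a small value.

```python
HEADER_BYTES = 40  # 20 byte IP header + 20 byte TCP header, no options


def payload_bytes(mtu):
    """Data bytes carried by one full-sized packet."""
    return mtu - HEADER_BYTES


def overhead_percent(mtu):
    """Share of each full-sized packet taken up by headers, to 2 decimal places."""
    return round(HEADER_BYTES / mtu * 100, 2)
```

At an MTU of 1500 each packet carries 1460 bytes of data and the headers cost 2.67%. At 1492 that becomes 1452 bytes and 2.68%, so dropping to the modem’s MTU costs almost nothing. At 576 the overhead rises to 6.94%, and at 9000 it falls to 0.44%.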

However, as I’ve discovered, some network devices don’t work properly and never send the ICMP message (or the message is lost along the way). This leads to what is known as a Black Hole, whereby the “too large” packet is dropped, but the machines at either end of the link aren’t told. The effect on the end user or application is data just not getting through, but often only some of the time. In my case, this showed up as web browsing and most SMTP email being fine, but emails sent from certain hosts being lost, with the sender receiving a “bounce” message saying that the connection to my mail server had timed out.

Knowing that the ADSL modem was probably to blame, and knowing that its MTU is 1492, I changed my MTU to 1492 as well and the problems went away. The command to change the MTU is:
netsh interface ipv4 set subinterface "Local Area Connection" mtu=1492 store=persistent
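
If you don’t already know the limiting MTU, one way to find it is to send pings with the Don’t Fragment flag set and search for the largest size that gets a reply. The sketch below does a binary search using the Windows ping command (-f sets Don’t Fragment, -l sets the payload size, -n the count). It assumes ICMP echo is allowed to the target, which as noted earlier often isn’t the case, and the function names and search bounds are my own choices for illustration.

```python
import subprocess

ICMP_OVERHEAD = 28  # 20 byte IP header + 8 byte ICMP header


def ping_unfragmented(host, payload):
    """Send one Windows ping with Don't Fragment set; True if a reply came back."""
    result = subprocess.run(
        ["ping", "-n", "1", "-f", "-l", str(payload), host],
        capture_output=True,
        text=True,
    )
    return result.returncode == 0 and "TTL=" in result.stdout


def largest_mtu(probe, low=548, high=1472):
    """Binary-search payload sizes with probe(payload); return the MTU or None."""
    if not probe(low):
        return None  # even the smallest size failed, e.g. ICMP is blocked
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid       # mid got through, so try larger sizes
        else:
            high = mid - 1  # mid was too big, so try smaller sizes
    return low + ICMP_OVERHEAD
```

On Windows this could be called as largest_mtu(lambda size: ping_unfragmented("192.168.1.1", size)), where the address is just an example target. The search is kept separate from the ping call so that it can be tried out with a stand-in probe first.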

There’s also an option to use Jumbo Frames, which increase the MTU as high as 9000 bytes. Again though, you need to make sure that both machines plus all parts of the network support this, or you’ll be wasting your time.
