Whether IT likes it or not, home networks and local internet service providers (ISPs) are now part of the corporate network infrastructure. IT teams must monitor and diagnose performance issues that involve an employee's home network. This new reality requires IT to shed the traditional assumptions that people are in the office and apps are in the data center, and to adopt an updated methodology for troubleshooting performance problems.
Digital experience monitoring (DEM) solutions meet these new challenges. DEM tools combine synthetic transaction monitoring, network path monitoring, and endpoint device monitoring to measure, triage, and diagnose issues from the end user's perspective.
DEM tools shed light on the home network so that desktop, network, and security teams can use the following steps to diagnose performance issues:
Step 1: Objectively measure digital experience
Step 2: Rule out the application
Step 3: Examine the endpoint device and the network
Zscaler, a leading cloud security vendor, recently introduced its own DEM solution, Zscaler Digital Experience (ZDX), integrated with its Zero Trust Exchange cloud security platform.
The last year has changed how people work, sending most employees outside the traditional corporate perimeter. IT teams are now responsible for maintaining user experience across diverse home network connections, many of which aren't under their direct control.
Enterprises must have visibility into all the traffic connecting to all the assets in their distributed network. DEM solutions fill in the visibility gaps that traditional monitoring tools overlook and allow both network and security teams to leverage the same data to optimize end-user experience, no matter where those users sit.
This allows IT teams to detect complex home network issues by measuring true digital experience, rule out application issues, and determine if the issues lie on the end-user device or somewhere on the network path between the user and the application host.
When a user complains about poor performance outside the office, IT must objectively verify the claim. Leveraging synthetic transactions from the end-user device is an effective method. Using a synthetic GET to the application's URL, IT teams can see and measure page load times in the device browser. Continuous monitoring enables the IT team to establish the baseline performance measurements needed for comparison when users report issues. Let's use ZDX to analyze one performance scenario.
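The idea of a synthetic probe plus a baseline comparison can be sketched in a few lines of Python. This is a minimal illustration, not how ZDX itself is implemented: the function names and the three-sigma spike threshold are assumptions, and a timed HTTP GET is only a rough proxy for the full in-browser page load time that a real DEM agent measures.

```python
import time
import urllib.request
from statistics import mean, stdev

def probe_page_load(url, timeout=10):
    """Issue a synthetic GET and return elapsed time in milliseconds.
    A crude stand-in for true page load time, which a DEM agent
    measures inside the browser."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()  # drain the body so transfer time is included
    return (time.monotonic() - start) * 1000

def is_degraded(sample_ms, baseline_ms, sigmas=3.0):
    """Flag a sample that sits more than `sigmas` standard deviations
    above the baseline mean -- a simple spike detector against the
    baseline built from continuous monitoring."""
    mu, sigma = mean(baseline_ms), stdev(baseline_ms)
    return sample_ms > mu + sigmas * sigma
```

A spike like the ones circled in Figure 1 would trip `is_degraded` against a healthy baseline, turning a subjective complaint into an objective measurement.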
Figure 1: Spikes shown for page load time measurements for a critical externally-facing application
Let's drill down into the page load time data. The measurements in Figure 1 indicate that the user's application experience degraded dramatically over several hours, resulting in several outages (noted by red circles).
Page load time has clearly degraded, and that warrants the next investigative step: determining whether the application is to blame. We start by measuring application server response time (SRT) to see how long the server takes to respond to the browser's initial GET request.
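SRT can be approximated from the client side as the gap between sending the GET and receiving the response status line and headers. Here's a minimal sketch using Python's standard library; the function name is illustrative, and a production DEM agent would measure this more precisely and separate out DNS, TCP, and TLS phases.

```python
import time
import http.client
from urllib.parse import urlsplit

def server_response_time(url, timeout=10):
    """Approximate SRT: the delay between sending the GET and receiving
    the response status line and headers. Connection setup (TCP/TLS)
    is excluded by connecting before the timer starts."""
    parts = urlsplit(url)
    cls = (http.client.HTTPSConnection if parts.scheme == "https"
           else http.client.HTTPConnection)
    conn = cls(parts.netloc, timeout=timeout)
    try:
        conn.connect()                     # keep setup time out of SRT
        start = time.monotonic()
        conn.request("GET", parts.path or "/")
        resp = conn.getresponse()          # returns once headers arrive
        srt_ms = (time.monotonic() - start) * 1000
        resp.read()                        # drain the body
        return resp.status, srt_ms
    finally:
        conn.close()
```

Comparing this number against the full page load time shows how much of the user's wait is the server thinking versus the network and browser doing work.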
Figure 2: This data correlates the server response time and page load time.
With some caveats, a high correlation between increases in page load time and SRT (as seen in Figure 2 above) indicates that the application might be causing the performance issues. It's essential, however, to verify that end-to-end network latency remained stable during this time.
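The "correlated SRT plus stable latency" reasoning can be expressed as a simple heuristic. This is a sketch under assumed thresholds (a 0.8 correlation cutoff and a 20% coefficient of variation for "stable" latency); real DEM tooling applies more robust statistics.

```python
from statistics import mean, stdev

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs) ** 0.5
    vy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (vx * vy)

def app_suspected(page_load, srt, latency,
                  corr_threshold=0.8, latency_cv=0.2):
    """Heuristic from the article: page load time tracking SRT while
    network latency stays stable points at the application itself."""
    stable = stdev(latency) / mean(latency) < latency_cv
    return pearson(page_load, srt) > corr_threshold and stable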
Figure 3: Network latency data over time shows stable performance.
Network latency appears to have been relatively stable during this period of slower page load times. Coupled with the SRT correlation, this offers further evidence that the application itself is the culprit impacting user experience.
User experience problems can be triggered at an endpoint, so let’s take a closer look at Wi-Fi and client-device performance.
The health of the user's home Wi-Fi network can contribute to performance issues. Use the page load time metrics to find when performance degraded, and check whether end-to-end network latency rose at the same time. If spikes in page load times and network latency occur together, something in the user device's network path is causing the performance issue.
Next, check whether a severe drop in the user's Wi-Fi access point (AP) signal strength or bandwidth correlates with the high page load time and high latency. If so, the user's Wi-Fi signal strength is likely the culprit. A simple resolution might be moving closer to the AP, but other issues could be at play, such as signal interference or an improperly configured home network.
Figure 4: Data showing a correlation between app performance and Wi-Fi signal strength and bandwidth
Windows and macOS both provide a network bandwidth metric that shows estimated wireless bandwidth for each NIC. This metric can reveal fluctuations in available bandwidth, which may be caused by weak signal strength, interference, and similar factors.
Figure 5: Data showing fluctuations in network bandwidth metrics.
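On Windows, the raw signal value a DEM agent would sample is exposed by `netsh wlan show interfaces`. A minimal polling sketch follows; the parsing assumes the English-locale `Signal : NN%` field format, and the function names are illustrative rather than any vendor's API.

```python
import re
import subprocess

def parse_signal_percent(netsh_output):
    """Extract the 'Signal : NN%' field from
    `netsh wlan show interfaces` output; None if absent."""
    m = re.search(r"^\s*Signal\s*:\s*(\d+)%", netsh_output, re.MULTILINE)
    return int(m.group(1)) if m else None

def read_windows_wifi_signal():
    """Query the current Wi-Fi signal strength on Windows.
    Returns None on other platforms or when no wireless
    interface is connected."""
    try:
        out = subprocess.run(
            ["netsh", "wlan", "show", "interfaces"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    return parse_signal_percent(out)
```

Sampling this value alongside page load time is what makes a correlation like the one in Figure 4 visible.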
Not all home networks are created equal. If the cause of the degradation isn't apparent in the application or the user device, check the user's home gateway. Older home gateways running outdated firmware are a common source of performance problems.
The figure below shows activity for a user device's "gateway_mac_address" metric, the advertised MAC address of the gateway interface. The reported MAC address changes frequently, and "NA" responses correlate with page load time spikes and connection losses.
Figure 6: Intermittent slowness/outages are affecting all applications.
Figure 7: Data examining gateway interface flapping for MAC addresses.
This points to an unstable interface on the home gateway. In this case, an investigation revealed a known issue with the firmware version running on the gateway; a firmware upgrade fixed the problem.
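Detecting this pattern from a series of gateway MAC observations is straightforward. A minimal sketch, assuming "NA" marks a probe the gateway failed to answer (the function name and return shape are illustrative):

```python
def gateway_flaps(mac_samples):
    """Given a time-ordered series of observed gateway MAC addresses
    (with "NA" for probes the gateway did not answer), count how many
    times the value changed and how many probes went unanswered.
    Frequent transitions to and from "NA" suggest an unstable
    gateway interface."""
    flaps = sum(1 for prev, cur in zip(mac_samples, mac_samples[1:])
                if cur != prev)
    outages = mac_samples.count("NA")
    return flaps, outages
```

A healthy gateway yields zero flaps and zero outages over any window; the flapping seen in Figure 7 would produce counts well above that.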
After ruling out the application and the user device, it's time to analyze the network connection. The hops between the end-user device and the application include the home network, the local ISP connection, the internet backbone, and (in some cases) a forward proxy. If the network is the source of the performance problem, it's important to isolate which hop causes it.
Outbound path traces collected from the end-user machine, combined with path traces collected from a forward proxy (an option available to existing Zscaler customers), provide critical details.
In the example below, page load time degradation correlates with network latency issues (peaking at over 500ms). A hop-by-hop analysis shows that most latency came from the ISP last-mile connection (a common source of excessive delay during the pandemic due to uneven or unstable local internet provider services). This same analysis could have shown latency on any internet backbone hop, forward proxy hop, etc.
Figure 8: Page load times showing application performance drop.
Figure 9: Network latency spikes in the last mile
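The hop-by-hop attribution works by differencing cumulative per-hop round-trip times from the path trace. Here's a minimal sketch with illustrative hop names and function names; real traces are noisier, so production tools average many samples before attributing latency to a segment.

```python
def segment_latencies(hop_rtts_ms):
    """Convert cumulative per-hop RTTs from a path trace into the
    latency each segment adds (clamped at zero, since individual
    hop RTTs can jitter below the previous hop's)."""
    segs, prev = [], 0.0
    for rtt in hop_rtts_ms:
        segs.append(max(rtt - prev, 0.0))
        prev = rtt
    return segs

def worst_segment(hop_names, hop_rtts_ms):
    """Return (name, added_ms) for the segment contributing the
    most latency along the path."""
    segs = segment_latencies(hop_rtts_ms)
    i = max(range(len(segs)), key=segs.__getitem__)
    return hop_names[i], segs[i]
```

In the scenario above, the jump between the home gateway and the first ISP hop dominates, matching the last-mile diagnosis in Figure 9.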
A digital experience monitoring (DEM) tool provides real-time user experience monitoring that helps identify issues contributing to outages, downtime, or disruptions to the user experience, and helps proactively detect and resolve end-user connectivity issues.
By keeping track of end-user metrics and statistics, enterprises can proactively prevent remote employee downtime and ensure productivity for users no matter where they sit.
Zscaler, a leading SASE network security vendor, recently introduced its own DEM solution, Zscaler Digital Experience (ZDX), closely tied to its cloud security platform. Find more information on Zscaler's website.