Gathering Logs
I realized that UtahWISP's problem was deeper than I'd originally expected. Typically, you can track down wireless problems to one culprit, but this network was different. I could pinpoint no single major problem; apparently, a series of smaller problems was responsible for the slowdowns.
I would eventually need as much information as possible from the network, so I decided to install the Network Intelligence enVision HA Series log-collection device. This piece of hardware would let me aggregate logs from various devices across the network and gather all data in one place for analysis.
The Network Intelligence HA appliance, configured with the company's enVision software, is great for collecting, aggregating, and reporting on log data from a variety of platforms. The appliance has built-in templates for interpreting events coming from dozens of routers, firewalls, and other network devices.
The appliance is designed for high-speed log collection and can receive tens of thousands of events per second, so it was especially useful for this busy WISP environment. The appliance let me examine individual events as they came in and let me create aggregate views of the data through reports or custom queries. The appliance also let me create complex correlation rules that trigger alerts based on multiple events from multiple sources.
I configured the Cisco router to send its Syslog data to the Network Intelligence appliance. I also configured enVision to collect Windows event logs from the company's critical servers (e.g., Web, email, and DNS servers). Doing so let me correlate multiple events from multiple locations to get a better view of the network and have the necessary data to easily go back later and track down any problems.
After just a few minutes of collecting log data, I saw in the enVision alert monitor (which Figure 2 shows) at least one possible explanation for the performance problems: The network was flooded with virus, spyware, and peer-to-peer (P2P) file-sharing traffic. In a moment, I'll show you how we tackled these problems.
Wireless Concerns
With log collection in place, I set out to learn more about how wireless network traffic travels through the air. I knew that the wireless traffic was half-duplex, which told me that congestion and packet collision might be a concern. In half-duplex networks, multiple hosts share the same medium, such as a network cable—or, in this case, a wireless channel. Because only one host can transmit on the medium at a time, you need schemes to manage traffic and avoid packet collision. This process involves ensuring that other hosts are transmitting, detecting collisions if packets collide, and retransmitting packets if necessary. With many hosts on the same medium, more collisions and therefore more retransmissions will occur. Therefore, network efficiency on a half-duplex network is largely affected by the number of hosts and the amount of network traffic that those hosts generate.
In a half-duplex wired network, you use a shared hub. In a full-duplex wired network, you have a switched environment. Utah-WISP had the equivalent of a giant hub with more than 200 users plugged in to it.
The problem gets more complex. The radio links of wireless networks traverse a crowded, unreliable, and sometimes hostile spectrum. Radio frames might be lost and require retransmission. Frequent retransmissions might cause enough latency to affect TCP/IP packets in the payload. TCP/IP expects packet loss from congestion but doesn't expect loss from unreliable connections. It thinks that when a packet is lost, it's because the network is too crowded and busy. Consequently, it reacts by dropping its transmission window size before retransmitting packets, initiating congestion-control mechanisms, and backing off its retransmission timer—effectively slowing down to compensate for what it thinks is a crowded network. The result is less network congestion but slower performance.
Furthermore, if the radio link is poor, network performance will suffer much more than necessary. "Our hardware manufacturer brought out some equipment to test the airwaves," Dale said, "and saw more interference in the spectrum in our area than the manufacturer had seen in other larger metropolitan areas." This crowded spectrum, coupled with the inherent weaknesses of the technology, was a big problem to overcome. Some of these problems were insurmountable, but if we could eliminate any unnecessary traffic, I felt that we could certainly improve performance.
Three-Pronged Approach
When I inspected the router, I found plenty of excess bandwidth. My theory was that perhaps it was not just the number of bytes sent on the network but also the number of packets. So, I determined to reduce the number of packets on the network through a strategy of reducing broadcast traffic on the network, securing the network to eliminate any unnecessary malicious traffic, and organizing and managing network resources.
Reducing broadcast traffic. Perhaps the easiest task was to reduce broadcast traffic. Computers use broadcast traffic to locate other systems on the network and to advertise their own locations. A broadcast packet travels across the entire network, so at Utah-WISP, one customer's system would send a packet to the tower, which would send it out to every other customer in the range, then send it back to the main office, which sent it back to the other tower and out to all the other customers in that range. Every broadcast packet unnecessarily hit every system on the network.
To fix this problem, I quickly restructured the network subnets and configured the routers to reduce the broadcast scope. Essentially, I broke the network down into smaller segments so that broadcast traffic didn't travel across the entire customer network. To further reduce the amount of broadcast traffic across the network, I enabled features on the switches to cache Address Resolution Protocol (ARP) entries and control ARP floods. Doing so didn't make a huge difference, but the performance improvement was nevertheless noticeable, with slightly faster download speeds on the wireless network.
Securing the network. Because I was working with a WISP, I didn't have control over customer systems to install hotfixes or personal firewalls, and I couldn't put too many restrictions on the type of traffic I blocked. Although I could recommend that UtahWISP send email messages to all its users, recommending that they secure their systems, I knew that many users would ignore the advice or would lack the skills or knowledge necessary to perform the work.
Fortunately, this WISP had a well-defined usage policy in its contract that forbade running Internet servers within the network and participating in P2P file-sharing networks. Therefore, UtahWISP could start blocking certain traffic such as incoming Web or email traffic. Doing so, the company could greatly reduce the exposure resulting from certain customers unknowingly providing access to spammers or intruders.
In the process of upgrading the network, the company's Intrusion Detection System (IDS) alerted me that some customer systems were rapidly scanning other systems on the network. However, the customer systems had no reason to be connecting to other customer systems. I recognized this activity as a common side effect of a worm infection. One of the most common ways for such malware to spread is through email, so I knew that if UtahWISP could prevent worms from entering the networks, the chances of users clicking malicious attachments would drop to zero.