-
WREN combines active and passive monitoring techniques: it monitors actively when traffic is low and passively when traffic is high. It monitors traffic at both the source and destination end hosts, which allows for more accurate measurements. WREN uses packet traces from existing application traffic to measure the available bandwidth. WREN is split into two levels: the kernel-level packet trace facility and the user-level trace analyzer.
The kernel-level packet trace facility is responsible for capturing the information associated with incoming and outgoing packets. Figure 6 lists the information that is gathered for each packet. A buffer was added to the Web100 kernel to collect these characteristics. The buffer is accessed through two system calls: one starts the trace and provides the information needed to conduct it, while the other retrieves the trace from the kernel.
-
Figure 6: Information collected by WREN kernel level packet trace
The packet trace facility is able to coordinate measurements between the different machines. One machine will trigger the other machine by setting a flag in the header of outgoing packets to start tracing the same range of packets that it is tracing. The other machine will in turn trace all packets that it sees with the same header flag set. This coordination ensures that the information about the same packets is stored at each end of the connection regardless of what happens in between.
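The coordination described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not WREN's kernel code: the `Packet` and `TraceBuffer` classes, the `trace_flag` field, and the integer clock are all hypothetical stand-ins for the header flag and kernel buffer mentioned in the text.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Packet:
    seq: int                  # sequence number (hypothetical field)
    trace_flag: bool = False  # stands in for the header flag set by the sender


@dataclass
class TraceBuffer:
    records: list = field(default_factory=list)

    def record(self, packet, timestamp):
        # Store only what is needed to match packets across the two hosts.
        self.records.append((packet.seq, timestamp))


def send(packets, first_seq, last_seq, buf, clock):
    """Sender side: flag and record every packet in the traced range."""
    sent = []
    for p in packets:
        if first_seq <= p.seq <= last_seq:
            p.trace_flag = True
            buf.record(p, clock())
        sent.append(p)
    return sent


def receive(packets, buf, clock):
    """Receiver side: record every packet that arrives with the flag set."""
    for p in packets:
        if p.trace_flag:
            buf.record(p, clock())


tick = count()
clock = lambda: next(tick)  # deterministic stand-in for a timestamp source
sender_buf, receiver_buf = TraceBuffer(), TraceBuffer()

on_wire = send([Packet(seq) for seq in range(1, 11)], 3, 6, sender_buf, clock)
delivered = [p for p in on_wire if p.seq != 5]  # pretend packet 5 is lost
receive(delivered, receiver_buf, clock)

sender_seqs = [seq for seq, _ in sender_buf.records]
receiver_seqs = [seq for seq, _ in receiver_buf.records]
```

Because both ends key their records on the same flagged packets, comparing `sender_seqs` with `receiver_seqs` shows what happened to the traced range in between; in this toy run the missing sequence number 5 on the receiver side reveals the loss.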
The user-level trace analyzer is the other level in the WREN environment. It is the component that begins any packet traces and collects and processes the data returned from the kernel-level trace facility. By design, the user-level components are not required to read the information from the packet trace facility at all times. A trace can be analyzed immediately after it is completed to make runtime decisions, or stored for future analysis.
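As a rough illustration of what a user-level analysis step might look like, the sketch below computes the throughput achieved by a traced flow from its packet timestamps. It is not WREN's available-bandwidth estimator, which the text does not describe; the `(timestamp, size)` record layout and the function name `throughput_bps` are assumptions made for this example.

```python
def throughput_bps(trace):
    """Achieved throughput in bits per second.

    trace: list of (timestamp_seconds, payload_bytes), sorted by timestamp.
    """
    if len(trace) < 2:
        return 0.0
    duration = trace[-1][0] - trace[0][0]
    if duration <= 0:
        return 0.0
    # The first packet marks the start of the interval, so only the bytes
    # that arrived after it are counted against the elapsed time.
    bytes_after_first = sum(size for _, size in trace[1:])
    return bytes_after_first * 8 / duration


example_trace = [(0.000, 1500), (0.001, 1500), (0.002, 1500)]
rate = throughput_bps(example_trace)
```

The same function could be run right after a trace completes or later on a stored trace, which mirrors the two usage modes mentioned above.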
When traffic is low, WREN will actively introduce traffic into the network in order to maintain a continuous flow of measurements. Studies found that WREN produced the same measurements in congested and uncongested environments.
In the current implementation of WREN, users are not constrained to capturing only the traces that they initiated. Although any user can trace another user's application traffic, the information that can be obtained from another user's trace is restricted: only the sequence and acknowledgement numbers are available, not the actual data segments of the packets.
In summary, WREN is a very useful tool that utilizes the benefits of both active and passive monitoring. Although it is in its early stages, WREN can provide administrators with a valuable resource for monitoring and analyzing their networks. Self-Configuring Network Monitor (SCNM) is another tool that uses both active and passive monitoring techniques.
Author: Neftaly Malatjie
114054 LG 1.40 Watching Resources from the Edge of the Network (WREN)
114054 LG 1.39 Combinational Monitoring
-
After reading the sections above, one can reasonably conclude that a combination of active and passive monitoring is better than using one or the other. Combinational techniques utilize the best aspects of both passive and active monitoring environments. Two newly introduced combinational monitoring techniques are described below: Watching Resources from the Edge of the Network (WREN) and Self-Configuring Network Monitor (SCNM).
-
114054 LG 1.38 Passive Monitoring
-
Passive monitoring, unlike active monitoring, does not inject traffic into the network or modify the traffic that is already on the network. Also unlike active monitoring, which measures between two endpoints, passive monitoring collects information about only one point in the network. Figure 5 shows the setup of a passive monitoring system where the monitor is placed on a single link between two endpoints and monitors traffic as it passes along the link.
-
Figure 5: Passive Monitoring Setup
Passive measurements deal with information such as:
- Traffic and protocol mixes
- Accurate bit or packet rates
- Packet timing and inter-arrival timing
Passive monitoring can be achieved with the assistance of any packet sniffing program.
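To make the passive measurements listed above concrete, the following sketch derives a protocol mix, bit and packet rates, and inter-arrival times from records that a packet sniffer might have captured. The `(timestamp, protocol, size)` record layout and the function name `summarize` are assumptions for illustration; a real capture tool would supply its own format.

```python
from collections import Counter


def summarize(packets):
    """Summarize passively captured packets.

    packets: list of (timestamp_seconds, protocol, size_bytes),
             sorted by timestamp.
    """
    protocol_mix = Counter(proto for _, proto, _ in packets)
    # Inter-arrival time: gap between each packet and the one before it.
    gaps = [later[0] - earlier[0] for earlier, later in zip(packets, packets[1:])]
    duration = packets[-1][0] - packets[0][0] if len(packets) > 1 else 0.0
    total_bits = 8 * sum(size for _, _, size in packets)
    return {
        "protocol_mix": dict(protocol_mix),
        "inter_arrival_s": gaps,
        "bit_rate_bps": total_bits / duration if duration > 0 else None,
        "packet_rate_pps": len(packets) / duration if duration > 0 else None,
    }


captured = [
    (0.0, "TCP", 100),
    (0.5, "TCP", 200),
    (1.0, "UDP", 300),
    (2.0, "TCP", 400),
]
summary = summarize(captured)
```

Nothing in this sketch sends traffic; it only reads what was already observed at one point in the network, which is the defining property of passive monitoring.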
Although passive monitoring does not have the overhead that active monitoring has, it has its own set of drawbacks. [UnivPenn02] With passive monitoring, measurements can only be analyzed off-line and not as they are collected. This creates another problem: processing the huge data sets that are collected.
As one can see, passive monitoring may be better than active monitoring in that overhead data is not added into the network, but post-processing can take a large amount of time. This is why a combination of the two monitoring methods seems to be the route to go.
-
114054 LG 1.37 NON-ROUTER BASED TECHNIQUES
-
Although non-router based techniques are still limited in their abilities, they do offer more flexibility than the router based techniques. These techniques are classified as either active or passive.
- Active Monitoring
Active monitoring transmits probes into the network to collect measurements between at least two endpoints in the network. Active measurement systems deal with metrics such as:
- Availability
- Routes
- Packet Delay
- Packet Reordering
- Packet Loss
- Packet Inter-arrival Jitter
- Bandwidth Measurements (Capacity, Achievable Throughputs)
Commonly used tools such as ping, which measures delay and loss of packets, and traceroute, which helps determine the topology of the network, are examples of basic active measurement tools. Both send probe packets toward a designated host and wait for responses: ping sends ICMP Echo Requests, while traceroute sends probes with increasing time-to-live values and relies on the ICMP messages returned along the path. Figure 4 is an example of the ping command, which uses active measurements by sending an Echo Request from the source host through the network to a specified destination. The destination then sends an Echo Reply back to the source it received the request from.
-
Figure 4: ICMP ping command (Active Measurement)
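The probe that ping sends can be sketched by building the ICMP Echo Request message by hand. The layout used here (type 8, code 0, 16-bit checksum, identifier, sequence number, then payload) follows the ICMP specification; the function names, identifier value, and payload are illustrative choices. The sketch only constructs the bytes, since actually sending them needs a raw socket, which usually requires elevated privileges.

```python
import struct

ICMP_ECHO_REQUEST = 8  # ICMP type for Echo Request; Echo Reply is type 0


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by ICMP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_echo_request(identifier: int, sequence: int, payload: bytes = b"") -> bytes:
    """Return the bytes of an ICMP Echo Request (without the IP header)."""
    # The checksum is computed with the checksum field set to zero.
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, 0, identifier, sequence)
    checksum = internet_checksum(header + payload)
    header = struct.pack("!BBHHH", ICMP_ECHO_REQUEST, 0, checksum, identifier, sequence)
    return header + payload


packet = build_echo_request(identifier=0x1234, sequence=1, payload=b"ping")
```

A receiver validates the message by running the same checksum over the whole packet; a correct message yields zero.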
Not only can one collect the metrics above from active measurements, one can also determine the network topology. Another common example of an active measurement tool is iperf, which measures TCP and UDP bandwidth performance. It reports bandwidth, delay jitter, and loss.
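Several of the active metrics named above (loss, delay, jitter) reduce to simple arithmetic over probe results. The sketch below assumes a list of round-trip times in milliseconds with `None` marking a lost probe. Jitter is taken here as the mean absolute difference between consecutive round-trip times, which is one simple definition chosen for illustration and not necessarily the one any particular tool reports.

```python
def probe_stats(rtts_ms):
    """Summarize probe results.

    rtts_ms: one entry per probe sent; None marks a probe with no reply.
    """
    sent = len(rtts_ms)
    received = [rtt for rtt in rtts_ms if rtt is not None]
    loss = (sent - len(received)) / sent if sent else 0.0
    mean_rtt = sum(received) / len(received) if received else None
    # Jitter (assumed definition): average change between successive RTTs.
    diffs = [abs(b - a) for a, b in zip(received, received[1:])]
    jitter = sum(diffs) / len(diffs) if diffs else None
    return {"loss": loss, "mean_rtt_ms": mean_rtt, "jitter_ms": jitter}


stats = probe_stats([10.0, 12.0, None, 11.0, 15.0])
```

In this example one of five probes is lost, so the loss fraction is 0.2, and the mean round-trip time of the four replies is 12.0 ms.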
The problem with active monitoring is that introducing probes into the network can interfere with the normal traffic on the network. Often the active probes are also treated differently from normal traffic, which calls the validity of the information they provide into question.
As a result, active monitoring is very rarely implemented as a stand-alone method of monitoring, because it introduces a good deal of overhead. Passive monitoring, on the other hand, introduces little if any overhead into the network.
-
114054 LG 1.36 What good data looks like
-
The data you collect should have four characteristics:
- Well-understood. You should be able to quickly determine how each metric or event was captured and what it represents. During an outage you won’t want to spend time figuring out what your data means. Keep your metrics and events as simple as possible, use standard concepts described above, and name them clearly.
- Granular. If you collect metrics too infrequently or average values over long windows of time, you may lose the ability to accurately reconstruct a system’s behaviour. For example, periods of 100% resource utilization will be obscured if they are averaged with periods of lower utilization. Collect metrics for each system at a frequency that will not conceal problems, without collecting so often that monitoring becomes perceptibly taxing on the system (the observer effect) or creates noise in your monitoring data by sampling time intervals that are too short to contain meaningful data.
- Tagged by scope. Each of your hosts operates simultaneously in multiple scopes, and you may want to check on the aggregate health of any of these scopes, or their combinations.
- Long-lived. If you discard data too soon, or if after a period of time your monitoring system aggregates your metrics to reduce storage costs, then you lose important information about what happened in the past. Retaining your raw data for a year or more makes it much easier to know what “normal” is, especially if your metrics have monthly, seasonal, or annual variations.
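The point about granularity can be checked with a few lines of arithmetic. In this sketch, per-second CPU utilization samples (made-up values) include a ten-second period at 100%; averaging over a one-minute window hides it, while ten-second windows do not. The function name `window_averages` is illustrative.

```python
def window_averages(samples, window):
    """Average consecutive, non-overlapping windows of `window` samples."""
    averages = []
    for start in range(0, len(samples), window):
        chunk = samples[start:start + window]
        averages.append(sum(chunk) / len(chunk))
    return averages


# Hypothetical per-second utilization (%): 50 s at 20%, then 10 s at 100%.
utilization = [20] * 50 + [100] * 10

per_minute = window_averages(utilization, 60)
per_ten_seconds = window_averages(utilization, 10)
```

With a 60-second window the saturated period is reported as a single value of about 33%, whereas the 10-second windows end with a value of 100%, so the saturation remains visible.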
-
SUMMARY
Collect data via SNMP from:
- Bridges, routers, ethermeters, hubs and switches
Data collected includes:
- good packets, kilobytes, pkt size distribution
- errors (# of types of errors)
- pkts dropped, discarded, buffer/controller overflows
- top-10 talkers & protocol distributions
Collect data via Ping (for response, pkt loss, connectivity) from:
- critical servers, router interfaces, ethermeters
- off-site collaborators’ nodes
Other Sources:
- Poll critical Unix network daemons & services (e.g. mail, WWW, name, font, NFS …)
- ARP caches
- appearance of new unregistered nodes
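Spotting new unregistered nodes from ARP cache data amounts to comparing observed hardware addresses against a list of known ones. The sketch below assumes the ARP entries have already been collected as `(ip, mac)` pairs and that a set of registered MAC addresses exists; how either is obtained depends on the site, and all addresses shown are made-up examples.

```python
def find_unregistered(arp_entries, registered_macs):
    """Return (ip, mac) pairs whose MAC address is not registered.

    arp_entries: iterable of (ip_address, mac_address) strings.
    registered_macs: iterable of known MAC address strings.
    """
    # Compare case-insensitively, since tools print MACs in either case.
    known = {mac.lower() for mac in registered_macs}
    return sorted(
        (ip, mac.lower())
        for ip, mac in arp_entries
        if mac.lower() not in known
    )


registered = {"00:00:5E:00:53:01", "00:00:5e:00:53:02"}
arp_cache = [
    ("192.0.2.10", "00:00:5e:00:53:01"),
    ("192.0.2.11", "00:00:5E:00:53:02"),
    ("192.0.2.99", "00:00:5e:00:53:ff"),
]
unknown_nodes = find_unregistered(arp_cache, registered)
```

Running such a comparison on each poll and reporting any non-empty result gives a simple alert for nodes that appear on the network without being registered.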
-