The data you collect should have four characteristics:
- Well-understood. You should be able to quickly determine how each metric or event was captured and what it represents. During an outage you won’t want to spend time figuring out what your data means. Keep your metrics and events as simple as possible, use the standard concepts described above, and name them clearly.
- Granular. If you collect metrics too infrequently or average values over long windows of time, you may lose the ability to accurately reconstruct a system’s behaviour. For example, periods of 100% resource utilization will be obscured if they are averaged with periods of lower utilization. Collect metrics for each system at a frequency that will not conceal problems, without collecting so often that monitoring becomes perceptibly taxing on the system (the observer effect) or creates noise in your monitoring data by sampling time intervals that are too short to contain meaningful data.
- Tagged by scope. Each of your hosts operates simultaneously in multiple scopes, and you may want to check on the aggregate health of any of these scopes, or their combinations.
- Long-lived. If you discard data too soon, or if after a period of time your monitoring system aggregates your metrics to reduce storage costs, then you lose important information about what happened in the past. Retaining your raw data for a year or more makes it much easier to know what “normal” is, especially if your metrics have monthly, seasonal, or annual variations.
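To make the granularity point concrete, here is a minimal sketch of how averaging over a long window can hide a period of 100% utilization. The sample values and the `downsample` helper are hypothetical, chosen only for illustration; they do not come from any particular monitoring tool.

```python
# Hypothetical CPU utilization samples (percent), one every 10 seconds
# over a single minute. Two of the six samples are at 100%.
samples = [25, 25, 100, 100, 25, 25]


def downsample(values, window):
    """Average consecutive, non-overlapping windows of `window` samples."""
    return [
        sum(values[i:i + window]) / len(values[i:i + window])
        for i in range(0, len(values), window)
    ]


fine = downsample(samples, 1)    # keep every 10-second sample
coarse = downsample(samples, 6)  # one averaged value per minute

print(max(fine))    # the saturation is still visible at full resolution
print(max(coarse))  # the one-minute average hides the saturated period
```

At full resolution the peak is 100, while the one-minute average reports 50, so a threshold alert set at, say, 90% would never fire on the averaged series.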
SUMMARY
Collect data via SNMP from:
- Bridges, routers, ethermeters, hubs and switches
Data collected includes:
- good packets, kilobytes, pkt size distribution
- errors (# of types of errors)
- pkts dropped, discarded, buffer/controller overflows
- top-10 talkers & protocol distributions
Collect data via Ping (for response, pkt loss, connectivity) from:
- critical servers, router interfaces, ethermeters
- off-site collaborators’ nodes
Other sources:
- Poll critical Unix network daemons & services (e.g. mail, WWW, name, font, NFS …)
- ARP caches
- appearance of new unregistered nodes
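As one possible illustration of the last two items, the sketch below compares ARP cache entries against a list of registered hardware addresses and reports anything unknown. The `find_unregistered` function, the addresses, and the idea of holding the registry as a set of MAC strings are all assumptions made for this example; how the ARP cache is actually read depends on the device and is not shown.

```python
def find_unregistered(arp_cache, registered_macs):
    """Return sorted (ip, mac) pairs whose MAC is not in the registered set.

    arp_cache: dict mapping IP address -> MAC address (hypothetical input,
    e.g. parsed from a router's ARP table).
    registered_macs: iterable of known MAC addresses.
    """
    # Normalize case so "AA:BB" and "aa:bb" compare as equal.
    known = {mac.lower() for mac in registered_macs}
    return sorted(
        (ip, mac.lower())
        for ip, mac in arp_cache.items()
        if mac.lower() not in known
    )


# Hypothetical data using documentation-range IP addresses.
arp_cache = {
    "192.0.2.10": "00:11:22:33:44:55",
    "192.0.2.11": "00:11:22:33:44:66",
    "192.0.2.12": "AA:BB:CC:DD:EE:FF",
}
registered = {"00:11:22:33:44:55", "00:11:22:33:44:66"}

new_nodes = find_unregistered(arp_cache, registered)
print(new_nodes)
```

Running a check like this on each polling cycle and alerting when the result is non-empty is one simple way to notice new unregistered nodes.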