Monitoring a multi-user networked operating system depends on a monitoring strategy. The following are the key elements that the strategy must cover:
Metrics
Perhaps the first and most important step in determining the performance of your system is achieving clarity about what you really care about. If you don’t know what you want to measure, you can be quite sure you won’t successfully measure it. (Clarity alone will not guarantee success, but it is a vital first step.) What makes a difference in your system? What must go fast, and what is less important to speed up? In a word, what are your goals?
As a rule, if we care about system performance, we must quantify it. Saying your system is “fast” doesn’t mean much. Saying it can perform a complex operation in 10 nanoseconds makes it a lot clearer what’s going on. Any good performance investigation therefore aims to reduce the observed system behaviour to some set of characteristic numbers. These numbers must have useful meanings relative to your goals. Latency is likely to be expressed in some unit of time, and throughput is likely to be expressed in some unit of work that is relevant to your system divided by some unit of time. The numbers we choose to characterize the performance of our systems are called metrics, and their proper choice is of critical importance. Just because some quantity is measurable, however, does not make it a good metric. For instance, you can put your smartphone on a scale and measure its weight. That’s a metric, but it won’t tell you much about whether you can render video at a high enough frame rate to be tolerable. On the other hand, it might be relevant in a usability study. The metrics you choose need to be relevant to your goals.
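To make the two units concrete, here is a minimal Python sketch that reduces a run to two characteristic numbers: mean latency (time per operation) and throughput (operations per unit of time). The names measure and sample_operation are hypothetical, and the arithmetic loop is only a stand-in for whatever operation matters to your goals.

```python
import time

def measure(operation, repetitions):
    """Run `operation` repeatedly.

    Returns (mean latency in seconds, throughput in operations per second).
    """
    latencies = []
    start = time.perf_counter()
    for _ in range(repetitions):
        t0 = time.perf_counter()
        operation()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    mean_latency = sum(latencies) / len(latencies)  # unit: time
    throughput = repetitions / elapsed              # unit: work / time
    return mean_latency, throughput

def sample_operation():
    # Hypothetical stand-in for the operation you actually care about.
    sum(i * i for i in range(10_000))

mean_latency, throughput = measure(sample_operation, 200)
print(f"mean latency: {mean_latency * 1e6:.1f} microseconds per operation")
print(f"throughput:   {throughput:.0f} operations per second")
```

The numbers it prints depend entirely on the machine and its load, so treat them as readings from one run rather than properties of the system.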
One more important point about metrics is that they must be practically measurable by you. That’s limited by your ability to probe hardware and alter software. If you have a proprietary operating system whose source code you cannot see, you will have a hard time measuring what’s happening inside it. You will probably need to be satisfied with observing what happens as you enter and leave the operating system, perhaps augmented by whatever information the system itself will divulge to you on request. Similarly, if you are measuring the performance of a web server that you do not run yourself, you probably can’t run any code at all on that server, and you must choose metrics observable from places that you can reach. On the other hand, if you are measuring software running on top of an operating system, you might have more liberty to alter it for measurement purposes. Or you might not.
Another important point is that you are often using the system whose performance you wish to measure to actually capture your results. If the process of performing the measurements itself has a large effect on the performance, you may obtain false readings for your metrics. Consider, for example, a measurement system that regularly writes records concerning file system behaviour to the disk drive that stores that file system. Chances are that this experimental logging is competing, in a performance sense, with the actual behaviour you are trying to measure. Instead of simply observing how long it would take to perform a set of reads and writes from a group of files in the file system, you are also moving the disk head to another place on the disk where you are storing your log. Those head movements make the file system appear slower than it would be if your experimental framework were not logging data. If your experimental framework interferes with the processes you are trying to measure, you end up with a false reading for your metric that will not accurately describe the system’s behaviour when you aren’t running your experiment.
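The interference described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a benchmark: workload is a hypothetical CPU-bound stand-in rather than real file system traffic, and the function names and log file names are made up for the example. The first variant forces a log record to disk inside the timed region; the second keeps samples in memory and writes the log only after timing has stopped.

```python
import os
import tempfile
import time

def workload():
    # Hypothetical stand-in for the activity being measured.
    return sum(i * i for i in range(2_000))

def run_intrusive(log_path, repetitions):
    """Write every sample to disk inside the timed region."""
    start = time.perf_counter()
    with open(log_path, "w") as log:
        for _ in range(repetitions):
            t0 = time.perf_counter()
            workload()
            log.write(f"{time.perf_counter() - t0}\n")
            log.flush()
            os.fsync(log.fileno())  # force the record out to the device
    return time.perf_counter() - start

def run_buffered(log_path, repetitions):
    """Keep samples in memory and write the log after timing stops."""
    samples = []
    start = time.perf_counter()
    for _ in range(repetitions):
        t0 = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    with open(log_path, "w") as log:
        log.writelines(f"{s}\n" for s in samples)
    return elapsed

with tempfile.TemporaryDirectory() as tmp:
    intrusive = run_intrusive(os.path.join(tmp, "intrusive.log"), 200)
    buffered = run_buffered(os.path.join(tmp, "buffered.log"), 200)

print(f"logging inside the timed region: {intrusive:.4f} s")
print(f"logging after the timed region:  {buffered:.4f} s")
```

Both variants run the same workload, so any gap between the two totals comes from the measurement framework rather than from the system under test. Buffering is not free either, since it uses memory that the measured system might otherwise have had, so the choice is about which interference distorts your metric least.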