One important aspect of running a performance experiment is the workload you use. In some cases, you are examining the performance of a particular program or operating system element, in which case you will tailor the workload to exercise that software. In many cases, you are looking for general performance in the face of typical overall system loads. In that situation, you need to generate a realistic workload for your system. Either way, somehow you must provide data sets, background activities, network traffic, and various other types of workload-related effects to test the performance.
There are different aspects of workloads that you need to think about when designing performance experiments. Your system is designed to do certain things: schedule processes, lay out a file system on a flash drive, respond to web requests, and so on. Obviously, one important aspect of the workload is the set of tasks you give the system that relate directly to its purpose: the processes to be scheduled, the files and file accesses to be handled, the web requests that clients generate. An equally important aspect of the workload, however, arises because operating systems are complex and involve simultaneous interactions of many different components that can affect each other in unpredictable ways. How your file system would perform if the only activity on the operating system were reads and writes to it is not, as a rule, the question you need to answer. The important question is how it would perform in the face of all the other ordinary activities the operating system would be handling in a real-world setting. So your workload must also capture those background activities.
There are several different types of workloads typically used for performance measurement:
- Traces – Capture or otherwise obtain a detailed trace of the workload of the system in its ordinary activities. What such a trace consists of depends on the nature of what you are testing. For a web server, the trace is likely to be a set of web requests submitted to the server. For a mail server, it is likely to be a set of messages delivered to that server. For a file system, it might be a set of opens, reads, writes, and other file system operations. For an operating system component, it might be a set of applications run in a particular order with specified inputs. Whatever the trace consists of, you capture it from the running system, saving it in a form that allows you to recreate it faithfully. Then, for each experimental run, you replay it from beginning to end. Traces have good and bad properties for performance experiments. One good property is realism, since they represent real activities that you would actually want your system to handle well. Another good property is reproducibility: the same trace can be replayed over and over, identically for each run. There is an issue here if the performance of the system affects what would have happened in the real system. For example, a trace of a network protocol that sends a message and waits for an acknowledgement before sending the next message would have unfolded differently if the acknowledgement had arrived in half the time, double the time, or with some other delay than when the trace was gathered. If the system being tested is the one generating the acknowledgements, replaying the trace can produce unrealistic results.
A disadvantage of a trace is that it is not easily reconfigurable. If your experiment needs to examine performance under controlled levels of workload, you might not be able to get a trace for each workload level you need. Merely running two copies of one trace in parallel might not realistically represent a true doubled load, and cutting out portions of a trace might not realistically represent a smaller workload, either. Scaling a trace up or down is usually hard. Another frequent disadvantage is availability. Good traces are not easy to come by, and if your system is not yet in production, you might be unable to gather your own. Except for freshly gathered traces of your own, most traces you can find will be somewhat (to very) old. In some cases it might also be difficult to gather the information needed to create the trace with the tools available to you; you might not be able to capture all the system calls applications perform, for instance. Also, any particular single trace might or might not represent the typical activity of the system: the moment at which it was gathered might have been unusual compared to your system’s ordinary activities. Depending on exactly what you are tracing, there may be privacy implications to saving the data in a trace. For certain kinds of systems, such as those dealing with medical records, you may have legal obligations to handle some of the data in particular ways. Be aware of any such privacy problems before you store data for a trace.
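To make the replay idea concrete, here is a minimal sketch of an open-loop trace replayer in Python. The record format and the `dispatch` callback that submits an operation to the system under test are illustrative assumptions, not any standard interface.

```python
import time

def replay_trace(records, dispatch):
    """Replay (timestamp, operation) records, preserving the
    inter-arrival times observed when the trace was captured.

    records  -- iterable of (timestamp_seconds, operation) pairs,
                sorted by timestamp
    dispatch -- callable that submits one operation to the system
                under test
    """
    start = time.monotonic()
    first_ts = None
    for ts, op in records:
        if first_ts is None:
            first_ts = ts
        # Sleep until this operation's offset from the start of the
        # trace has elapsed in wall-clock time.
        target = start + (ts - first_ts)
        delay = target - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        dispatch(op)
```

Note that this open-loop replayer reproduces the recorded timing regardless of how the system responds; for the acknowledgement example above, a closed-loop replayer that waits for each response before issuing the next operation would be more faithful.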
- Live workloads – Sometimes you can perform measurements on a working system as it goes about its normal activities. A production system can also be instrumented and data gathered as it does its work. Realism is a clear advantage here. Also, provided you can continue to run tests on the system indefinitely, with enough time you can capture a very wide range of real system behavior. You will likely need little or no effort to establish realistic background loads, since they establish themselves, in essence.
This approach has its own disadvantages. One is lack of control, which manifests itself both in not being able to reproduce the behavior seen in previous tests and in not being able to scale loads up and down as desired. Another is that your experimental framework usually needs to have minimal impact, in both performance and functionality, on the running system, since it is presumably more important for the system to complete its live work than to gather your measurements. Unless this impact is essentially nil, you are unlikely to be able to run the performance measurements for very long on a working system, since those tasked with getting it to do its job will not appreciate your experiments getting in the way. As with traces, consider whether there are privacy implications to your observation of the live workload.
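One common way to keep measurement overhead low on a live system is to aggregate events in memory and report only periodically. The sketch below is illustrative rather than a production instrumentation library; real systems often use per-thread or lock-free counters to shrink the hot-path cost even further.

```python
import threading
import time
from collections import Counter

class LightweightProbe:
    """Count events in memory and flush aggregates periodically, so
    the measured system pays only a counter increment per event
    rather than a log write on its hot path."""

    def __init__(self, flush_interval=60, sink=print):
        self.counts = Counter()
        self.lock = threading.Lock()
        self.sink = sink
        t = threading.Thread(target=self._flush_loop,
                             args=(flush_interval,), daemon=True)
        t.start()

    def record(self, event):
        # Called on the hot path: one lock acquisition and increment.
        with self.lock:
            self.counts[event] += 1

    def _flush_loop(self, interval):
        while True:
            time.sleep(interval)
            with self.lock:
                # Swap in a fresh counter so flushing happens off the
                # hot path.
                snapshot, self.counts = self.counts, Counter()
            self.sink(snapshot)
```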
- Standard benchmarks – These are either sets of programs or sets of data that are intended to drive performance experiments, typically of some particular thing, such as a file system, a database, a web server, or an intrusion detection system. They may have been derived from real traces at some point, or they may be built from models of system behavior. They are typically designed to be usable by many developers, so it is often fairly easy to integrate them into your experiments, provided you are working in the same general framework they were designed for. (For example, a file system benchmark might generate POSIX-compliant file operations, so any POSIX-compatible file system can use it for testing.) They allow easy comparison to other systems’ performance, since the developers of those systems can also run the same benchmark, or, indeed, you can yourself, if those other systems are also available for testing. A well-designed benchmark is likely to exercise a wide range of system behaviors, so the results you get from it may give you a fairly complete picture of your system’s performance under different realistic conditions. Widely used benchmarks have been heavily studied themselves, so they are unlikely to have many bugs and are likely to be relatively good representations of the kind of workload they are intended to mimic. Some benchmarks (though not all) are built to be inherently scalable, allowing you to adjust the workload up or down with little more than a change to a line or two in a configuration file. Since benchmarks are artificial, there are usually no privacy implications to using them.
As you no doubt expect, though, standard benchmarks have their own set of disadvantages. First, only a limited number of them are available, and there might not be one suited to the system or situation you want to test. One consequence is that standard benchmarks might not cover portions of the workload space that are unusual in general but important for your case. Another is that it’s tempting to use a standard benchmark that isn’t quite right for your situation just because it’s easy to do so. Resist such temptations. Second, since developing a good benchmark is quite a lot of work, benchmarks tend to be used for a very long time, running the risk of representing archaic workloads that no longer match what would happen on a current system.
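In the spirit of the POSIX file system benchmark mentioned above, the following sketch shows what a tiny parameterized file system exerciser might look like. It is an illustration of the idea, not any actual standard benchmark; the keyword parameters stand in for the configuration file a real benchmark would read.

```python
import os
import random
import tempfile
import time

def run_fs_benchmark(num_files=100, writes_per_file=10,
                     write_size=4096, seed=42):
    """Create files, write fixed-size blocks, read them back, and
    delete them, reporting elapsed wall-clock time.  The parameters
    play the role of a real benchmark's configuration file."""
    rng = random.Random(seed)                 # fixed seed -> repeatable
    payload = bytes(rng.randrange(256) for _ in range(write_size))
    start = time.monotonic()
    with tempfile.TemporaryDirectory() as root:
        for i in range(num_files):
            path = os.path.join(root, f"bench_{i}.dat")
            with open(path, "wb") as f:       # POSIX-style create/write
                for _ in range(writes_per_file):
                    f.write(payload)
            with open(path, "rb") as f:       # read the file back
                while f.read(write_size):
                    pass
            os.unlink(path)
    return time.monotonic() - start

print(f"elapsed: {run_fs_benchmark():.3f} s")
```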
- Simulated workloads – In this approach, you build models of the loads you are interested in, typically models instantiated in executable code. These models are usually parameterized, allowing you to scale them up or down, to alter the mix of different elements of the load, and otherwise to create variations on the load. When testing a system’s performance, you decide which parameter settings are most relevant and run the simulated workload models with those settings. This approach has the advantage of being easily customized to many different scenarios and possibilities, since you need merely alter the model parameters accordingly. One important aspect of this flexibility is good handling of scaling, either up or down. Assuming there is no true randomization in the models, they are infinitely repeatable, allowing you to perform directly comparable tests of different system alternatives. As with standard benchmarks, the artificiality of simulated workloads has the benefit of avoiding privacy considerations. However, the validity of the performance results you achieve is only as good as the quality of the models. It is not easy to produce good models of complex systems and phenomena, and you can easily overlook important features of real loads in building your models. And while parameters can be easily altered and scaled, even if the model was faithful to a real load at some settings, it may prove unrealistic at others. It may also be unclear how to set the various parameters to produce a simulated load that matches a particular real load. If the parameters are set incorrectly, you may get a very false picture of how a real system would behave in those situations.
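As an illustration, here is one way a small parameterized workload model might look in Python: Poisson arrivals with log-normally distributed request sizes. The distribution choices and parameter values are assumptions made for the example; a fixed pseudo-random seed is what makes every run identical, so different system alternatives can be tested against exactly the same load.

```python
import random

def simulated_request_stream(rate_per_sec, duration_sec, seed=0):
    """Generate (arrival_time, request_size) pairs from a simple
    parameterized model: Poisson arrivals and log-normal sizes.
    A fixed seed makes the stream fully repeatable."""
    rng = random.Random(seed)
    t = 0.0
    while True:
        t += rng.expovariate(rate_per_sec)    # exponential inter-arrivals
        if t >= duration_sec:
            return
        size = int(rng.lognormvariate(8, 1))  # ~3 KB median, heavy tail
        yield t, size

# Scaling the load up or down is just a parameter change:
light = list(simulated_request_stream(rate_per_sec=10, duration_sec=60))
heavy = list(simulated_request_stream(rate_per_sec=100, duration_sec=60))
```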