14.5. Traffic characterisation

Communications networks transmit data with random properties. Measurements of network attributes such as response time, link utilisation and message interarrival time are statistical samples taken from random processes. In this section we review the basic statistics that are important in network modelling and performance prediction. After a family of statistical distributions has been selected that corresponds to the network attribute under analysis, the next step is to estimate the parameters of the distribution. In many cases the sample mean and the sample variance are used to estimate the parameters of a hypothesised distribution. Advanced software tools include the computations for these estimates. The mean is interpreted as the central value about which the samples cluster. The following equations can be used when discrete or continuous raw data are available. Let $X_1, X_2, \ldots, X_n$ be a sample of size $n$. The mean of the sample is defined by

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}.$$

The sample variance $S^2$ is defined by

$$S^2 = \frac{\sum_{i=1}^{n} X_i^2 - n\bar{X}^2}{n-1}.$$

If the data are discrete and grouped in a frequency distribution, the equations above are modified as

$$\bar{X} = \frac{\sum_{j=1}^{k} f_j X_j}{n}, \qquad S^2 = \frac{\sum_{j=1}^{k} f_j X_j^2 - n\bar{X}^2}{n-1},$$

where $k$ is the number of different values of $X$ and $f_j$ is the frequency of the value $X_j$ of $X$. The standard deviation $S$ is the square root of the variance $S^2$.

The variance and standard deviation show the deviation of the samples around the mean value. A small deviation from the mean indicates a strong central tendency of the samples. A large deviation reveals little central tendency and high statistical randomness.
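
As a minimal sketch of these computations, the following Python fragment estimates the mean, variance and standard deviation from raw samples and from the same data grouped into a frequency table. The frame lengths are hypothetical values chosen for illustration, the function names are ours, and the variance is assumed to use the usual unbiased $n-1$ divisor.

```python
from collections import Counter
from math import sqrt


def sample_mean(samples):
    """Sample mean: sum of the samples divided by the sample size n."""
    return sum(samples) / len(samples)


def sample_variance(samples):
    """Sample variance with the n - 1 divisor (assumed unbiased form)."""
    n = len(samples)
    mean = sample_mean(samples)
    return sum((x - mean) ** 2 for x in samples) / (n - 1)


def grouped_mean_variance(freq):
    """Mean and variance from a {value: frequency} table."""
    n = sum(freq.values())
    mean = sum(f * x for x, f in freq.items()) / n
    sum_sq = sum(f * x * x for x, f in freq.items())
    variance = (sum_sq - n * mean ** 2) / (n - 1)
    return mean, variance


# Hypothetical frame lengths in bytes (illustrative values only).
frame_lengths = [64, 64, 128, 64, 1500, 128, 64, 1500, 64, 128]

mean = sample_mean(frame_lengths)
variance = sample_variance(frame_lengths)
std_dev = sqrt(variance)

# Grouping the same data by value should give the same estimates.
grouped_mean, grouped_variance = grouped_mean_variance(Counter(frame_lengths))

print(f"raw:     mean = {mean:.1f}, variance = {variance:.1f}, std dev = {std_dev:.1f}")
print(f"grouped: mean = {grouped_mean:.1f}, variance = {grouped_variance:.1f}")
```

The raw-data function uses the sum of squared deviations, which is algebraically equivalent to the sum-of-squares form and tends to be less sensitive to rounding when the mean is large.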

Numerical estimates of the distribution parameters are required to reduce the family of distributions to a single distribution and to test the corresponding hypothesis. Figure 14.1 describes estimators for the most common distributions occurring in network modelling. If $\alpha$ denotes a parameter, the estimator is denoted by $\hat{\alpha}$. Except for an adjustment to remove bias in the estimate of $\sigma^2$ for the normal distribution and in the estimate of the parameter $b$ of the uniform distribution, these estimators are the maximum likelihood estimators based on the sample data.

Figure 14.1.  Estimation of the parameters of the most common distributions.
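
Since the figure itself is not reproduced in the text, the sketch below shows the standard textbook estimators that such a table usually lists. This is an assumption about the figure's contents, not a transcription of it. The uniform distribution is assumed to be defined on $(0, b)$, and the function names and sample values are hypothetical.

```python
from statistics import mean, variance


def estimate_exponential_rate(samples):
    """Exponential: the rate estimate is the reciprocal of the sample mean."""
    return 1.0 / mean(samples)


def estimate_poisson_mean(counts):
    """Poisson: the mean estimate is the sample mean of the counts."""
    return mean(counts)


def estimate_normal(samples):
    """Normal: sample mean and bias-adjusted variance (n - 1 divisor)."""
    return mean(samples), variance(samples)  # statistics.variance divides by n - 1


def estimate_uniform_upper(samples):
    """Uniform on (0, b): bias-adjusted estimate (n + 1) / n * max(samples)."""
    n = len(samples)
    return (n + 1) / n * max(samples)


# Hypothetical measurements (illustrative values only).
interarrival_s = [12.0, 8.0, 10.0, 6.0, 14.0]
delays_ms = [2.0, 4.0, 8.0, 6.0]

rate = estimate_exponential_rate(interarrival_s)
mu, sigma_sq = estimate_normal(interarrival_s)
b_hat = estimate_uniform_upper(delays_ms)

print(f"exponential rate = {rate:.3f} per second")
print(f"normal mean = {mu:.1f}, variance = {sigma_sq:.1f}")
print(f"uniform upper bound = {b_hat:.1f}")
```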

Probability distributions describe the random variations that occur in the real world. Although we call the variations random, randomness has different degrees; the different distributions correspond to how the variations occur. Therefore, different distributions are used for different simulation purposes. Probability distributions are represented by probability density functions, which show how likely a certain value is. Cumulative distribution functions give the probability of selecting a number at or below a certain value. For example, if the value of the cumulative distribution function at 1 is 0.85, then 85% of the time a number selected from this distribution is less than or equal to 1. The value of a cumulative distribution function at a point is the area under the corresponding probability density curve to the left of that point. Since the total area under the probability density curve is equal to one, cumulative distribution functions converge to one as we move in the positive direction. In most modelling cases, the modeller does not need to know all the details to build a simulation model successfully; he or she only has to know which distribution is the most appropriate for the case.
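
The relationship between the two functions can be checked numerically. The following sketch, which assumes an exponential distribution with a mean of 10 purely as an example, approximates the area under the density curve with the trapezoidal rule and compares it with the closed-form cumulative value. All names are illustrative.

```python
from math import exp

MEAN = 10.0  # assumed mean of the illustrative exponential distribution


def pdf(x):
    """Exponential probability density with mean MEAN."""
    return exp(-x / MEAN) / MEAN if x >= 0 else 0.0


def cdf_closed_form(x):
    """Closed-form cumulative probability P(X <= x)."""
    return 1.0 - exp(-x / MEAN) if x >= 0 else 0.0


def cdf_by_area(x, steps=10_000):
    """Area under the density from 0 to x, using the trapezoidal rule."""
    if x <= 0:
        return 0.0
    h = x / steps
    area = 0.5 * (pdf(0.0) + pdf(x))
    for i in range(1, steps):
        area += pdf(i * h)
    return area * h


for x in (5.0, 10.0, 50.0):
    print(f"x = {x:5.1f}: area = {cdf_by_area(x):.6f}, closed form = {cdf_closed_form(x):.6f}")
```

The printed pairs should agree closely, and both approach one as `x` grows.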

Below, we summarise the most common statistical distributions. We use the simulation modelling tool COMNET to depict the respective probability density functions (PDF). From the practical point of view, a PDF can be approximated by a histogram with all the frequencies of occurrences converted into probabilities.
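
The histogram approximation can be sketched as follows. The samples here are synthetic, drawn from an exponential distribution with a mean of 10 and a fixed seed; the bin width of 5 and the helper name `histogram_pdf` are arbitrary choices made for this example.

```python
import random


def histogram_pdf(samples, bin_width):
    """Map each bin start to (probability, density estimate).

    The probability is the relative frequency of the bin; dividing it by
    the bin width gives a piecewise-constant approximation of the PDF.
    """
    counts = {}
    for x in samples:
        start = (x // bin_width) * bin_width
        counts[start] = counts.get(start, 0) + 1
    n = len(samples)
    return {
        start: (count / n, count / (n * bin_width))
        for start, count in sorted(counts.items())
    }


BIN_WIDTH = 5.0
rng = random.Random(1)  # fixed seed so that the run is repeatable
samples = [rng.expovariate(1.0 / 10.0) for _ in range(10_000)]

hist = histogram_pdf(samples, BIN_WIDTH)
for start, (prob, density) in list(hist.items())[:5]:
    print(f"[{start:5.1f}, {start + BIN_WIDTH:5.1f}): p = {prob:.3f}, density = {density:.4f}")
```

The bin probabilities sum to one, which mirrors the unit area under a probability density function.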

A common use of probability distributions is to define various network parameters. A typical network parameter for modelling purposes is the interarrival time, that is, the time between successive messages when a source creates multiple messages. The time is measured from the start of one message to the start of the next message. As discussed above, the distribution most frequently used for interarrival times is the exponential distribution (see Figure 14.7).

Figure 14.7.  Exponential distribution of interarrival time with 10 sec on the average.

The parameters entered for the exponential distribution are the mean value and the random stream number to use. Network traffic is often described as a Poisson process. This generally means that the number of messages in successive time intervals has been observed and that the number of messages per interval follows a Poisson distribution. In modelling tools, the number of messages per unit of time is not entered; rather, the interarrival time between messages is required. It can be proven that if the number of messages per unit time interval is Poisson-distributed, then the interarrival time between successive messages is exponentially distributed. The interarrival distribution in the following dialog box for a message source in COMNET is defined by Exp (10.0). This means that the time from the start of one message to the start of the next message follows an exponential distribution with a mean of 10 seconds. Figure 14.8 shows the corresponding probability density function.

Figure 14.8.  Probability density function of the Exp (10.0) interarrival time.
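
The link between exponential interarrival times and Poisson-distributed counts can be illustrated with a small simulation. The sketch below does not use COMNET; it generates interarrival times with a mean of 10 seconds from Python's standard random generator, counts the messages in 60-second windows (an arbitrary window length), and compares the mean and variance of the counts, which coincide for a Poisson distribution.

```python
import random
from statistics import mean, variance

MEAN_INTERARRIVAL = 10.0  # seconds, as in Exp (10.0)
WINDOW = 60.0             # seconds per counting window (arbitrary choice)
N_MESSAGES = 50_000

rng = random.Random(42)   # fixed seed so that the run is repeatable

# Build arrival times by accumulating exponential interarrival times.
arrival_times = []
clock = 0.0
for _ in range(N_MESSAGES):
    clock += rng.expovariate(1.0 / MEAN_INTERARRIVAL)
    arrival_times.append(clock)

# Count messages per complete window; the last partial window is dropped.
n_windows = int(arrival_times[-1] // WINDOW)
counts = [0] * n_windows
for t in arrival_times:
    w = int(t // WINDOW)
    if w < n_windows:
        counts[w] += 1

expected = WINDOW / MEAN_INTERARRIVAL  # expected messages per window
count_mean = mean(counts)
count_variance = variance(counts)

print(f"expected per window = {expected:.2f}")
print(f"observed mean = {count_mean:.2f}, observed variance = {count_variance:.2f}")
```

The observed mean and variance should both lie close to the expected value of 6 messages per window.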

Many simulation models focus on the simulation of various traffic flows. Traffic flows can be simulated either by specifying the traffic characteristics as input to the model or by importing actual traffic traces that were captured during the application transactions under study. The latter will be discussed in a subsequent section on Baselining.

Network modellers usually start the modelling process by analysing the captured traffic traces to visualise network attributes. This helps the modeller understand the application-level processes deeply enough to map the corresponding network events to modelling constructs. Common tools can be used before building the model. After the preliminary analysis, the modeller may disregard processes and events that are not important for the study in question. For instance, the capture of traffic traces of a database transaction reveals a large variation in frame lengths. Figure 14.9 helps visualise the anomalies:

Figure 14.9.  Visualisation of anomalies in packet lengths.

The analysis of the same trace (Figure 14.10) also discloses a large deviation of the interarrival times of the same frames (delta times):

Figure 14.10.  Large deviations between delta times.
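
A preliminary analysis of this kind can be sketched in a few lines. The trace below is hypothetical, not taken from the capture shown in the figures. The fragment derives the delta times from the timestamps and computes the coefficient of variation (standard deviation divided by mean) of the delta times and of the frame lengths. For an exponential distribution the coefficient of variation equals 1, so a value well above 1 suggests traffic that is burstier than a Poisson process.

```python
from statistics import mean, stdev

# Hypothetical capture: (timestamp in seconds, frame length in bytes).
trace = [
    (0.000, 64), (0.002, 1500), (0.003, 1500), (0.250, 64),
    (0.251, 128), (0.900, 1500), (0.901, 64), (0.903, 1500),
]

timestamps = [t for t, _ in trace]
lengths = [length for _, length in trace]

# Delta time: the gap between each frame and the one before it.
delta_times = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]


def coefficient_of_variation(values):
    """Sample standard deviation divided by the sample mean."""
    return stdev(values) / mean(values)


cv_delta = coefficient_of_variation(delta_times)
cv_length = coefficient_of_variation(lengths)

print(f"mean delta time = {mean(delta_times) * 1000:.1f} ms, CV = {cv_delta:.2f}")
print(f"mean frame length = {mean(lengths):.1f} bytes, CV = {cv_length:.2f}")
```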

Approximating the probability density function by a histogram of the frame lengths of the captured traffic trace (Figure 14.11) helps the modeller determine the family of the distribution:

Figure 14.11.  Histogram of frame lengths.
