There are several approaches to network modelling. One is to create a starting model that follows the network topology and approximates the assumed network traffic statistically. After making changes, the modeller can investigate the impact of changes in system parameters on network or application performance. This approach is appropriate when it is more important to investigate the performance difference between two scenarios than to start from a model based on real network traffic. For instance, assuming certain client/server transactions, we may want to measure the change in response time as a function of link utilisation at 20%, 40%, 60%, and so on. In this case it is not essential to start from a model based on actual network traffic; it is enough to specify an amount of data transmission estimated by a frequent user or a designer. For this amount of data, we investigate how much the response time increases as the link utilisation rises relative to the starting scenario.
The most common approach to network modelling follows the methodology of proactive network management. It implies creating a network model that uses actual network traffic as input to simulate the current and future behaviour of the network and to predict the impact of adding new applications on network performance. Using modelling and simulation tools, network managers can change the network model by adding new devices, workstations, servers, and applications, or upgrade links to higher-speed connections, and perform “what-if” scenarios before implementing the actual changes. We follow this approach in the discussion below because it has been widely accepted in both academia and industry. In the subsequent paragraphs we elaborate a sequence of modelling steps, called the Model Development Life Cycle (MDLC), that the author has applied in various real-life scenarios when modelling large enterprise networks. The MDLC has the following steps:
Identification of the topology and network components.
Construction and validation of the baseline model. Perform network simulation studies using the baseline.
Creation of the application model using the details of the traffic generated by the applications.
Integration of the application and baseline model and completion of simulation studies.
Further data gathering as the network grows and changes and as we learn more about the applications.
Repeat the same sequence.
In the following, we expand the steps above:
Topology data describes the physical network components (routers, circuits, and servers) and how they are connected. It includes the location and configuration description of each internetworking device, how those devices are connected (the circuit types and speeds), the type of LANs and WANs, the location of the servers, addressing schemes, a list of applications and protocols, etc.
In order to build the baseline model, we need to acquire topology and traffic data. Modellers can acquire topology data either by entering it manually or by using network management tools and the configuration files of network devices. Several performance management tools use the Simple Network Management Protocol (SNMP) to query the Management Information Base (MIB) maintained by SNMP agents running in the network's routers and other internetworking devices. This process is known as SNMP discovery. We can import topology data from routers' configuration files to build a representation of the topology of the network in question. Some performance management tools can also import data using the map file from a network management platform, such as HP OpenView or IBM NetView: the map file is created with the platform's export function and then imported by the modelling tool.
The network traffic input to the baseline model can be derived from various sources: traffic descriptions from interviews and network documents, design or maintenance documents, MIB/SNMP reports, and network analyser and Remote Monitoring (RMON) traffic traces. RMON is a network management protocol that allows network information to be gathered at a single node. RMON traces are collected by RMON probes, which collect data at different levels of the network architecture depending on the probe's standard. Figure 14.21 includes the most widely used standards and the level of data collection:
Network traffic can be categorised as usage-based data and application-based data. The primary difference between the two is the level of detail that the data provides and the conclusions that can be drawn from it. The dividing line can be drawn between two adjacent OSI layers, the Transport layer and the Session layer: usage-based data is for investigating performance issues up through the Transport layer, while application-based data is for analysing the rest of the network architecture above the Transport layer. (In Internet terminology this corresponds to the cut between the TCP level and the applications above it.)
The goal of collecting usage-based data is to determine the total traffic volume before the applications are implemented on the network. Usage-based data can be gathered from SNMP agents in routers or other internetworking devices. SNMP queries sent to the routers or switches provide statistics about the exact number of bytes that have passed through each LAN interface, WAN circuit, or Permanent Virtual Circuit (PVC) interface. We can use this data to calculate the percentage utilisation of the available bandwidth for each circuit.
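As a concrete illustration of the utilisation calculation, the sketch below derives a circuit's percentage utilisation from two successive samples of an interface byte counter (such as SNMP's ifInOctets). The function name, the single-wrap handling, and the sample figures are illustrative assumptions, not part of any particular management tool.

```python
def link_utilisation(octets_t1, octets_t2, interval_s, link_bps, counter_bits=32):
    """Estimate percentage link utilisation from two successive SNMP
    byte-counter samples taken interval_s seconds apart.
    Handles at most one wrap of the (by default 32-bit) counter."""
    delta = octets_t2 - octets_t1
    if delta < 0:                       # counter wrapped past its maximum
        delta += 2 ** counter_bits
    bits_per_second = delta * 8 / interval_s
    return 100.0 * bits_per_second / link_bps

# Example: 1,500,000 octets in 60 s over a 1.544 Mbit/s T1 circuit
# gives 200,000 bit/s, i.e. roughly 13% utilisation.
```

Note that long polling intervals on fast links can hide more than one counter wrap, which is why high-speed interfaces are usually polled via 64-bit counters.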
The purpose of gathering application-based data is to determine the amount of data generated by an application and the type of demand the application makes. It allows the modeller to understand the behaviour of the application and to characterise the application-level traffic. Data from traffic analysers or from RMON2-compatible probes (Sniffer, NETScout Manager, etc.) provide specifics about the application traffic on the network. Strategically placed data collection devices can gather enough data to provide clear insight into the traffic behaviour and flow patterns of the network applications. Typical application-level data collected by traffic analysers include:
The type of applications.
Hosts communicating by network layer addresses (i.e., IP addresses).
The duration of the network conversation between any two hosts (start time and end time).
The number of bytes in both the forward and return directions for each network conversation.
The average size of the packets in the forward and return directions for each network conversation.
Packet size distributions.
Packet interarrival distributions.
Packet transport protocols.
Traffic profile, i.e., message and packet sizes, interarrival times, and processing delays.
Frequency of executing application for a typical user.
Major interactions of participating nodes and sequences of events.
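Many of the items above can be derived directly from a raw packet trace. The sketch below assumes a hypothetical record layout of (timestamp, source address, destination address, packet size in bytes) and summarises a trace into per-conversation packet counts, byte counts, average packet size, conversation start and end times, and mean interarrival time.

```python
from collections import defaultdict

def conversation_stats(packets):
    """Summarise a packet trace into per-conversation statistics.
    packets: iterable of (timestamp, src_ip, dst_ip, size_bytes) tuples."""
    convs = defaultdict(list)
    for ts, src, dst, size in sorted(packets):
        convs[(src, dst)].append((ts, size))
    stats = {}
    for pair, pkts in convs.items():
        times = [ts for ts, _ in pkts]
        sizes = [sz for _, sz in pkts]
        gaps = [b - a for a, b in zip(times, times[1:])]
        stats[pair] = {
            "packets": len(pkts),
            "bytes": sum(sizes),
            "avg_packet_size": sum(sizes) / len(sizes),
            "start": times[0],
            "end": times[-1],
            "mean_interarrival": sum(gaps) / len(gaps) if gaps else None,
        }
    return stats
```

A real analyser would additionally key conversations on ports and protocol and build full size and interarrival distributions rather than means.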
The goal of building a baseline model is to create an accurate model of the network as it exists today. The baseline model reflects the current “as is” state of the network. All studies will assess changes to the baseline model. This model can most easily be validated since its predictions should be consistent with current network measurements. The baseline model generally only predicts basic performance measures such as resource utilisation and response time.
The baseline model is a combination of the topology and the usage-based traffic data collected earlier. It has to be validated against the performance parameters of the current network, i.e., we have to show that the model behaves similarly to the actual network. The baseline model can be used either for analysing the current network or as the basis for further application and capacity planning. Using the import functions of a modelling tool, the baseline can be constructed by first importing the topology data gathered in the data collection phase of the modelling life cycle. Topology data is typically stored in topology files (.top or .csv) created by network management systems, for instance HP OpenView or Network Associates' Sniffer. Traffic files can be categorised as follows:
Conversation pair traffic files that contain aggregated end-to-end network load information, host names, packet counts, and byte counts for each conversation pair. The data sets allow the modelling tool to preserve the bursty nature of the traffic. These files can be captured by various data collection tools.
Event trace traffic files that contain network load information in the form of individual conversations on the network rather than summarised information. During simulation the file can replay the captured network activity on an event by event basis.
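The exact layout of conversation-pair traffic files differs between tools. Purely for illustration, the sketch below assumes a simple comma-separated layout of source host, destination host, packet count, and byte count, one conversation per line; the record type and field names are this example's own.

```python
from dataclasses import dataclass

@dataclass
class ConversationPair:
    src_host: str
    dst_host: str
    packet_count: int
    byte_count: int

def parse_conversation_file(lines):
    """Parse a hypothetical CSV-style conversation-pair traffic file:
    src_host,dst_host,packet_count,byte_count (one conversation per line).
    Blank lines and '#' comments are skipped."""
    records = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        src, dst, pkts, byts = line.split(",")
        records.append(ConversationPair(src, dst, int(pkts), int(byts)))
    return records
```

An event-trace file would instead carry one timestamped record per network event, so that the simulator can replay the captured activity in time order.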
Before simulation the modeller has to decide on the following simulation parameters:
Run length: The run length must exceed the longest message delay in the network. During this time the simulation should produce a sufficient number of events to allow the model to generate enough samples of every event.
Warm-up period: The simulation warm-up period is the time needed to initialise packets, buffers, message queues, circuits, and the various elements of the model. The warm-up period is equal to a typical message delay between hosts. Simulation warm-up is required to ensure that the simulation has reached steady-state before data collection begins.
Multiple replications: There may be a need for multiple runs of the same model in cases when statistics are not sufficiently close to true values. We also need multiple runs prior to validation when we execute multiple replicates to determine variation of statistics between replications. A common cause of variation between replications is rare events.
Confidence interval: A confidence interval is an interval used to estimate the likely size of a population parameter. It gives an estimated range of values that has a specified probability of containing the parameter being estimated. The most commonly used intervals are the 95% and 99% confidence intervals, which have probabilities of .95 and .99, respectively, of containing the parameter. In simulation, a confidence interval provides an indicator of the precision of the simulation results. Fewer replications result in a broader confidence interval and less precision.
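The replication-based confidence interval described above can be sketched as follows. The interval is built from the per-replication means of independent runs; the small Student-t table covers only the replication counts shown here and would normally come from a statistics library.

```python
import math
import statistics

# Two-sided Student-t critical values for a 95% interval,
# indexed by degrees of freedom (n - 1 replications).
T_95 = {2: 4.303, 3: 3.182, 4: 2.776, 5: 2.571, 9: 2.262}

def confidence_interval_95(replication_means):
    """95% confidence interval for the true mean, computed from the
    per-replication means of independent simulation runs."""
    n = len(replication_means)
    mean = statistics.mean(replication_means)
    half_width = T_95[n - 1] * statistics.stdev(replication_means) / math.sqrt(n)
    return mean - half_width, mean + half_width
```

Adding replications shrinks the half-width both through the 1/sqrt(n) factor and through the smaller t critical value, which is exactly the precision gain noted above.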
In many modelling tools, after importing both the topology and traffic files, the baseline model is created automatically. It has to be checked for construction errors prior to any attempts at validation by performing the following steps:
Execute a preliminary run to confirm that all source-destination pairs are present in the model.
Execute a longer simulation with warm-up and measure the sent and received message counts and link utilisation to confirm that correct traffic volume is being transmitted.
Validating the baseline model means demonstrating that the simulation produces the same performance parameters as those measured on the physical network. The network parameters below can usually be measured in both the model and the physical network:
Number of packets sent and received
Node's CPU utilisation
Confidence intervals and the number of independent samples affect how close a match between the model and the real network is to be expected. In most cases, the best that we can expect is an overlap of the confidence interval of predicted values from the simulation and the confidence interval of the measured data. A very close match may require too many samples of the network and too many replications of the simulation to make it practical.
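The overlap criterion can be expressed as a simple check between the two intervals, each given as a (low, high) pair. This is an illustrative helper only, not a statistical equivalence test.

```python
def intervals_overlap(sim_ci, measured_ci):
    """Validation check: does the simulation's confidence interval
    overlap the confidence interval of the measured data?
    Each interval is a (low, high) tuple."""
    sim_lo, sim_hi = sim_ci
    meas_lo, meas_hi = measured_ci
    return sim_lo <= meas_hi and meas_lo <= sim_hi
```

If the intervals fail to overlap, either the model needs correction or more samples and replications are needed to tighten the estimates.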
Application models are studied whenever there is a need to evaluate the impact of a networked application on network performance, or to evaluate how the network affects the application's performance. Application models provide the traffic details generated between network nodes during the execution of the application. The steps for building an application model are similar to those for baseline models.
Gather data on application events and user profiles.
Import application data into a simulation model manually or automatically.
Identify and correct any modelling errors.
Validate the model.
The integration of the application model(s) and the baseline model consists of the following steps:
Start with the baseline model created from usage-based data.
Use the information from the application usage scenarios (locations of users, number of users, transaction frequencies) to determine where and how to load the application profiles onto the baseline model.
Add the application profiles generated in the previous step to the baseline model to represent the additional traffic created by the applications under study.
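The loading step above can be sketched as overlaying per-link application demand onto the baseline loads. The profile layout used here (link name, number of users, bits per transaction, transactions per user per second) is a deliberate simplification; real modelling tools use much richer application profiles.

```python
def add_application_load(baseline_bps, app_profiles):
    """Overlay application traffic onto baseline per-link loads.
    baseline_bps:  {link_name: background bit/s from usage-based data}
    app_profiles:  list of (link_name, users, bits_per_transaction,
                            transactions_per_user_per_s) tuples."""
    loaded = dict(baseline_bps)
    for link, users, bits_per_txn, txn_rate in app_profiles:
        loaded[link] = loaded.get(link, 0.0) + users * bits_per_txn * txn_rate
    return loaded
```

For example, 50 users each generating one 8,000-bit transaction every 10 seconds add 40,000 bit/s to the link they share with the server.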
Completing the simulation studies consists of the following steps:
Use a modelling tool to run the model or simulation to completion.
Analyse the results: Look at the performance parameters of the target transactions in comparison to the goals established at the beginning of the simulation.
Analyse the utilisation and performance of various network elements, especially where the goals are not being met.
Typical simulation studies include the following cases:
Capacity analysis studies the changes of network parameters, for instance:
Changes in the number and location of users.
Changes in network elements capacity.
Changes in network technologies.
A modeller may be interested in the effect of the changes above on the following network parameters:
Switches and routers' utilisation
Communications link utilisation
Retransmitted and lost packets
Response time analysis
The scope of response time analysis is the study of message and packet transmission delay:
Application and network level packet end-to-end delay.
Packet round trip delay.
Application response time.
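End-to-end packet delay can be decomposed per hop into transmission (serialisation), propagation, and queueing delay. The sketch below sums these components along a path; it ignores node processing delay for simplicity, and the tuple layout is this example's own.

```python
def end_to_end_delay(packet_bits, hops):
    """One-way end-to-end delay of a packet along a path.
    hops: list of (link_bps, propagation_s, queueing_s) tuples,
    one per traversed link."""
    total = 0.0
    for link_bps, propagation_s, queueing_s in hops:
        transmission_s = packet_bits / link_bps   # serialisation onto the link
        total += transmission_s + propagation_s + queueing_s
    return total
```

Round-trip delay is then the sum of the forward and return one-way delays, which generally differ because the two directions may traverse different links and queues.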
The scope of application studies is the ratio of the total application response time to its individual components of network and application delay. Application analysis provides statistics for various measures of network and application performance, in addition to the items discussed in the previous section.
The goal of this phase is to analyse or predict how a network will perform both under current conditions and when changes to traffic load (new applications, users, or network structure) are introduced:
Identify modifications to the network infrastructure that will alter capacity usage of the network's resources.
A redesign can include increasing or decreasing capacity, relocating network elements among existing network sites, or changing communications technology.
Modify the models to reflect these changes.
Assess known application development or deployment plans in terms of projected network impact.
Assess business conditions and plans in terms of their impact on the network from projected additional users, new sites, and other effects of the plans.
Use ongoing baselining techniques to watch usage trends over time, especially those related to Internet and intranet usage.