Empirical Testing of MUSE V0.2
Background and PHOLD
Experimental results are presented to illustrate the scalability and efficiency of MUSE. For empirical evaluation, experiments were initially conducted using a square grid consisting of 512 x 512 (262,144) agents participating in PHOLD. In subsequent experiments the number of compute nodes was increased in powers of two starting from one compute-node. Each compute node has the following specs:
- CPU Model: Intel Xeon (X2)
- CPU/Core Speed: 3.0 GHz (X2)
- RAM: 4GB
- OS: Linux 2.6.9-22.ELsmp
- Interconnect type & speed: Infiniband @ 20Gbps
Different matrics were used in order to paint a better picture of the performance. The following are the matrics used:
- Execution Time
- Speedup
- Efficiency
- Scalability
Execution Time
Execution time can be calculated for both the parallel algorithm and its serial counterpart. The serial runtime (Ts) is the wall clock time elapsed between the beginning and end of an execution of a sequential program, while the parallel runtime (Tp) is the time elapsed from the moment a parallel computation started to the time when the last processing element finished execution.
Speedup
A general interest while running a parallel program is to determine the performance gain that is achieved on parallelizing an application. Speedup is the ratio of time taken to solve a problem on a single processing element to the time required to solve the same problem on a parallel computer with p identical processing elements.
S= Ts/Tp
Efficiency
An ideal parallel system containing p processing elements can deliver a speedup equal to p. An ideal behavior is difficult to achieve since the processing elements are unable to devote 100% of their time to the computation of an algorithm. It is due to this reason a metric such as efficiency is used. Efficiency can be used to determine the percentage of time for which a processing element is useful.
E = S/p
In an ideal system, speedup is equal to p and efficiency is 1. In practical applications, speedup is generally less than p and efficiency is a value between 0 and 1. Unless you acheive Super Linear Speedup!
Scalability
An important metric used for evaluating the efficacy of a parallel algorithm is scalability. Scalability is defined as the measure of capacity of parallel program to increase its speedup in proportion to the number of processing elements and the size of the problem. A program is said to be scalable if it continues to remain efficient as the number of processing elements increases. A good rule of thumb I use is E =0.85
PHOLD
PHOLD works with an X x Y agent grid, where each agent in the grid will have four neighbors. When the simulation starts, each agent K initializes by sending N events to agent K. The N events have a random receive time, with a max receive time defined by the variable Delay. During PHOLD simulation, when an agent receives an event, the following takes place.
- Randomly select a neighbor to send the next event to.
- Choose a receive time for the current time plus Random [1 to Delay].
- Create and send the event.
Since MUSE runs in parallel the agents in PHOLD will be evenly spread across the different compute nodes.
Execution times with increasing nodes
Table 1 : Execution time of MUSE running PHOLD
Figure 1 : PHOLD trend with increasing nodes on MUSE
Figure 1 shows a nice trend and generally MUSE is doing what it should do. As the number of processing elements (compute node) increase execution time should decrease. The big question now is how well does it scale? To get an answer, figuring out the efficiency and speedup of MUSE become important. Using the execution time for MUSE on one compute node as the serial version (Ts = 1663 seconds), I was able to calculate the speedup. Using speedup, I calculated efficiency.
Table 2: Speedup on MUSE
Figure 2: Speedup on MUSE
Figure 3: Efficiency on MUSE
Figure 2 shows the speedup being over the number of processing elements. How is this possible? This becomes possible if the program acheived super linear speedup. In this PHOLD simulation the speedup was impressive. Hence, the efficiency will be above 1 for all increasing nodes as shown in figure 3. Does this mean that MUSE scales well? One more experiment is needed to show scalability.
Here is the reasoning for the next test. If MUSE efficiency is above one for each increasing node, then as the workload increases and as the number of compute node increase, then the execution times for all cases in the test should be relatively the same. Right?
The idea will be to adjust the number of nodes while trying to maintain the number of agents to compute node ratio roughly consistent. Starting from one node and moving up to 32 nodes in power of twos. Also incrementing our agents such that at any given configuration each compute node works with around 8000 to 10,000 agents.
Table 3: Execution time with increasing agents and nodes
Figure 4: PHOLD on MUSE with increasing agents and nodes
The first experiment observed super linear speedup and excellent efficiency with MUSE. The second experiment proved that as the number of agents increased and the number of nodes increased, execution time remained comparatively constant. This trend is that last piece of data needed to conclude that indeed MUSE is a scalable framework.