Empirical Testing of MUSE V0.2

Background and PHOLD

Experimental results are presented to illustrate the scalability and efficiency of MUSE. For empirical evaluation, experiments were initially conducted using a square grid consisting of 512 x 512 (262,144) agents participating in PHOLD. In subsequent experiments the number of compute nodes was increased in powers of two starting from one compute-node. Each compute node has the following specs:

  • CPU Model: Intel Xeon (X2)
  • CPU/Core Speed: 3.0 GHz (X2)
  • RAM: 4GB
  • OS: Linux 2.6.9-22.ELsmp
  • Interconnect type & speed: Infiniband @ 20Gbps

Different matrics were used in order to paint a better picture of the performance. The following are the matrics used:

  • Execution Time
  • Speedup
  • Efficiency
  • Scalability

Execution Time

Execution time can be calculated for both the parallel algorithm and its serial counterpart. The serial runtime (Ts) is the wall clock time elapsed between the beginning and end of an execution of a sequential program, while the parallel runtime (Tp) is the time elapsed from the moment a parallel computation started to the time when the last processing element finished execution.

Speedup

A general interest while running a parallel program is to determine the performance gain that is achieved on parallelizing an application. Speedup is the ratio of time taken to solve a problem on a single processing element to the time required to solve the same problem on a parallel computer with p identical processing elements.

S= Ts/Tp

Efficiency

An ideal parallel system containing p processing elements can deliver a speedup equal to p.  An ideal behavior is difficult to achieve since the processing elements are unable to devote 100% of their time to the computation of an algorithm. It is due to this reason a metric such as efficiency is used. Efficiency can be used to determine the percentage of time for which a processing element is useful.

E = S/p

In an ideal system, speedup is equal to p and efficiency is 1. In practical applications, speedup is generally less than p and efficiency is a value between 0 and 1. Unless you acheive Super Linear Speedup!

Scalability

An important metric used for evaluating the efficacy of a parallel algorithm is scalability. Scalability is defined as the measure of capacity of parallel program to increase its speedup in proportion to the number of processing elements and the size of the problem. A program is said to be scalable if it continues to remain efficient as the number of processing elements increases. A good rule of thumb I use is E =0.85

PHOLD

PHOLD works with an X x Y agent grid, where each agent in the grid will have four neighbors. When the simulation starts, each agent K initializes by sending N events to agent K. The N events have a random receive time, with a max receive time defined by the variable Delay. During PHOLD simulation, when an agent receives an event, the following takes place.

  1. Randomly select a neighbor to send the next event to.
  2. Choose a receive time for the current time plus Random [1 to Delay].
  3. Create and send the event.

Since MUSE runs in parallel the agents in PHOLD will be evenly spread across the different compute nodes.

Execution times with increasing nodes

phold_var_table

Table 1 : Execution time of MUSE running PHOLD

ee_muse_trend

Figure 1 : PHOLD trend with increasing nodes on MUSE

Figure 1 shows a nice trend and generally MUSE is doing what it should do. As the number of processing elements (compute node) increase execution time should decrease. The big question now is how well does it scale? To get an answer, figuring out the efficiency and speedup of MUSE become important. Using the execution time for MUSE on one compute node as the serial version (Ts = 1663 seconds), I was able to calculate the speedup. Using speedup, I calculated efficiency.

speedup_efficiency_ee_muse

Table 2: Speedup on MUSE

speedup_graph_ee_muse

Figure 2: Speedup on MUSE

efficiency_graph_ee_muse

Figure 3: Efficiency on MUSE

Figure 2 shows the speedup being over the number of processing elements. How is this possible? This becomes possible if the program acheived super linear speedup. In this PHOLD simulation the speedup was impressive. Hence, the efficiency will be above 1 for all increasing nodes as shown in figure 3. Does this mean that MUSE scales well? One more experiment is needed to show scalability.

Here is the reasoning for the next test. If MUSE efficiency is above one for each increasing node, then as the workload increases and as the number of compute node increase, then the execution times for all cases in the test should be relatively the same. Right?

The idea will be to adjust the number of nodes while trying to maintain the number of agents to compute node ratio roughly consistent. Starting from one node and moving up to 32 nodes in power of twos. Also incrementing our agents such that at any given configuration each compute node works with around 8000 to 10,000 agents.

increase_agents_and_nodes_ee_muse_table

Table 3: Execution time with increasing agents and nodes

increase_agents_and_nodes_ee_muse_graph

Figure 4: PHOLD on MUSE with increasing agents and nodes

The first experiment observed super linear speedup and excellent efficiency with MUSE. The second experiment proved that as the number of agents increased and the number of nodes increased, execution time remained comparatively constant. This trend is that last piece of data needed to conclude that indeed MUSE is a scalable framework.