Software measurements
oct2005
This section includes general software measurements (mainly timing):
10oct05: Some linux kernels see no speed up when running 2 processes on a dual processor machine.
(23feb06: We finally looked inside the aolc boxes and they do not have
multiple cpus, even though the purchase order claimed they did. Their
timing is therefore for a single processor with hyperthreading enabled,
so the conclusions about the 2.4.21 kernels may not be correct.)
The idl routine atmclp processes the coded long pulse (clp) atm data.
It was used to benchmark some of the dual processor machines at the
observatory. The data set used was:
- 100 ipps of 10 millisecs each (so 1 second of data).
- 1 usec baud, 500 usec transmitted pulse, 5 MHz bandwidth.
- For each ipp the tx samples were complex conjugated and then used to decode each height.
- 601 heights for each ipp. At each height:
  - The data was decoded with the transmitter (tx) samples (2500 samples/height).
  - A 4K complex transform was done and then the power was accumulated.
- So 60100 4K transforms were done, as well as the decoding and power accumulation.
- This was doubled (120200 transforms) since it was a dual beam experiment.
- The data set was read into memory. This ended up taking about 40 MB (all the machines had more than 1 GB of memory).
- The fftw routine was used for the fft (rather than the idl version).
- The i/o took a small fraction of the total time (< 1%). top showed 100% cpu usage while running.
- idl 6.1 was used.
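The counts in the list above can be cross-checked with a few lines of arithmetic. This is only a sketch: the constant and function names are made up for illustration, and it assumes that "5 MHz bandwidth" means 5 complex samples per usec, which is what makes the 500 usec pulse come out to the quoted 2500 samples.

```python
# Parameters taken from the data set description above.
IPPS = 100               # number of ipps processed
IPP_MSECS = 10           # length of one ipp in milliseconds
HEIGHTS_PER_IPP = 601    # heights decoded in each ipp
BEAMS = 2                # dual beam experiment
PULSE_USECS = 500        # transmitted pulse length
SAMPLES_PER_USEC = 5     # ASSUMPTION: 5 MHz bandwidth == 5 complex samples/usec


def dataset_counts():
    """Return the derived sizes of the benchmark data set."""
    ffts_one_beam = IPPS * HEIGHTS_PER_IPP
    return {
        "data_secs": IPPS * IPP_MSECS / 1000.0,        # seconds of data
        "tx_samples": PULSE_USECS * SAMPLES_PER_USEC,  # samples used to decode a height
        "ffts_one_beam": ffts_one_beam,                # transforms for one beam
        "ffts_total": ffts_one_beam * BEAMS,           # transforms for both beams
    }


if __name__ == "__main__":
    for name, value in dataset_counts().items():
        print(f"{name:14s} {value}")
```

With a run time of about 59 secs for one copy, the 120200 transforms work out to roughly 0.49 millisecs per height, including the decoding and the power accumulation.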
A single copy of the processing was run first, and then two copies (two
separate idl sessions) were run at the same time. The times for the
processing are shown in the table below:
| cpu (host) | cpu type, freq (GHz) | hyperthread | Linux kernel         | time, 1 copy (secs) | time, 2 copies (secs)  |
|------------|----------------------|-------------|----------------------|---------------------|------------------------|
| fusion00   | xeon 2.4             | no          | 2.4.18-27.8.0smp     | 59                  | 62                     |
| fusion02   | xeon 2.2             | yes         | 2.4.21-4.ELsmp       | 61                  | 99                     |
| aolc1*     | xeon 2.4             | no*         | 2.4.21-4.ELsmp       | 58                  | 107                    |
| aolc2*     | xeon 2.4             | no*         | 2.4.21-4.ELsmp       | 61                  | 134                    |
| pserverK   | xeon 3.0             | yes         | 2.6.8-1.521smp       | 57; 53              | 57; 53 (repeat)        |
| pserverM   | xeon 3.0             | yes         | 2.6.8-1.521smp       | 52                  | 60                     |
| pserverN   | pent4 3.2            | no          | 2.6.12-1.1447_FC4smp | 61                  | 104 (but cpu was busy) |

* See the 23feb06 note above: the aolc boxes are single processor with hyperthreading enabled.
You can see that the 2.4.21-4.ELsmp kernels take close to twice as long
(1.6 to 2.2 times) to run two copies as one copy. This means that there is
little or no advantage to using the dual processor (aolc2 actually took
longer than twice the single copy time).
For most of the measurements top showed no other processes using the cpu.
The exception was pserverN, where root was running a cp that took about
30% of the cpu.
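The comparison can be made explicit by computing, for each host, how much longer two copies took than one, and how that compares with running the two jobs one after the other. The sketch below copies the times from the table; the function names are illustrative only, and pserverK uses its first run (the repeat gave 53 and 53).

```python
# (host, kernel, secs for 1 copy, secs for 2 copies), copied from the table.
TIMES = [
    ("fusion00", "2.4.18-27.8.0smp", 59, 62),
    ("fusion02", "2.4.21-4.ELsmp", 61, 99),
    ("aolc1", "2.4.21-4.ELsmp", 58, 107),
    ("aolc2", "2.4.21-4.ELsmp", 61, 134),
    ("pserverK", "2.6.8-1.521smp", 57, 57),   # first run only
    ("pserverM", "2.6.8-1.521smp", 52, 60),
    ("pserverN", "2.6.12-1.1447_FC4smp", 61, 104),  # cpu was busy with a cp
]


def slowdown(t1, t2):
    """How many times longer two simultaneous copies took than one copy."""
    return t2 / t1


def throughput_gain(t1, t2):
    """Gain over running the two jobs back to back (which would take 2 * t1).

    About 2 means both processors were fully used, about 1 means no benefit,
    and below 1 means running two copies at once was a net loss.
    """
    return 2.0 * t1 / t2


if __name__ == "__main__":
    print(f"{'host':9s} {'kernel':22s} {'slowdown':>8s} {'gain':>6s}")
    for host, kernel, t1, t2 in TIMES:
        print(f"{host:9s} {kernel:22s} "
              f"{slowdown(t1, t2):8.2f} {throughput_gain(t1, t2):6.2f}")
```

With these numbers fusion00 comes out with a gain of about 1.9, while aolc2 comes out below 1.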
For the aolc computers you should spread the jobs out over multiple
machines rather than trying to run two copies on the same machine (until
arun gets a chance to update the kernels).
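To check whether a particular machine gains anything from a second copy before putting jobs on it, the one copy versus two copies test can be repeated with any cpu-bound job. The following is a rough sketch and not the procedure used for the table: it starts separate Python processes instead of idl sessions, and the workload string and function name are hypothetical stand-ins.

```python
import subprocess
import sys
import time

# Hypothetical cpu-bound stand-in for the real processing job.
WORKLOAD = "s = 0\nfor i in range(200000):\n    s += i * i\n"


def time_copies(ncopies, workload=WORKLOAD):
    """Start ncopies separate processes together and return the wall-clock
    seconds until the last one has finished."""
    start = time.perf_counter()
    procs = [subprocess.Popen([sys.executable, "-c", workload])
             for _ in range(ncopies)]
    for proc in procs:
        proc.wait()
    return time.perf_counter() - start


if __name__ == "__main__":
    t1 = time_copies(1)
    t2 = time_copies(2)
    print(f"1 copy: {t1:.2f} secs, 2 copies: {t2:.2f} secs, "
          f"ratio: {t2 / t1:.2f}")
```

A ratio near 1 means the second copy ran on its own processor; a ratio near 2 means the two copies were sharing one. The stand-in workload is short, so for a meaningful number it should be replaced with a job that runs for at least several seconds.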
processing: x101/atm/testclp.pro