Chapter 7
Results and conclusions

To get an idea of how TC compares with PM, I have run a number of tests on both non-uniform and uniform sets, the latter consisting of either randomly generated sets or plain monotonous grid patterns. I also ran a number of these tests with PP, but as was to be expected the larger sets could not be run within a reasonable time, so I skipped PP for the sets with more than about 2000 particles. Note that in all tests in this chapter only 2D forces were used, to ensure that the results of TC and PP could be compared with the PM output.

Some remarks about the time measurements in this chapter: the runtime listed in all results is the so-called ``user-time'' spent on the processor of our HP 755/125 systems. Its value was obtained with the clock() function, so it does not show the ``real-time'' or ``system-time'' spent on the program. However, the ratio between these different sorts of time is about the same for all runs, and that ratio is what we are most interested in.

But unfortunately I found that the values on the HP were complete nonsense for the larger sets (those taking more than about an hour), while on a Linux PC the same values were fine. After a long search I found that the resolution of clock() on the HP is so high (10⁶ ticks per second) that the available space for the clock value (4 bytes, i.e. 2³² ticks) only allows runtimes of less than 2³²/10⁶ ≈ 4295 seconds to be measured. As this can render many hours or even days of computation totally useless (and in fact it did so in my case!), I decided to give this some special attention: ``forewarned is forearmed'' :-)
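As a sanity check against this pitfall, here is a minimal sketch (assuming a POSIX system; the helper name user_seconds() is my own and does not appear in the simulation code) of reading the user-time via times() instead of clock(). times() counts in much coarser system ticks, so a 32-bit value only wraps after hundreds of days.

    #include <stdio.h>
    #include <sys/times.h>      /* times(), struct tms   */
    #include <unistd.h>         /* sysconf(_SC_CLK_TCK)  */

    /* Hypothetical helper: user CPU time in seconds.  times() reports in
     * system clock ticks (typically 100 per second), so a 32-bit counter
     * overflows only after roughly 2^32/100 seconds, instead of the
     * ~4295 s limit of clock() with a resolution of 10^6 ticks/s. */
    static double user_seconds(void)
    {
        struct tms t;
        long ticks = sysconf(_SC_CLK_TCK);
        times(&t);
        return (double) t.tms_utime / (double) ticks;
    }

    int main(void)
    {
        double t0 = user_seconds();
        /* ... run the simulation ... */
        double t1 = user_seconds();
        printf("user time: %.2f s\n", t1 - t0);
        return 0;
    }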

1.  Results

First let's take a look at some non-uniform sets (only runtime and energy are shown; the error in the momentum is omitted to avoid giving too much rather meaningless information). The results for the 3 non-uniform sets from the previous section can also be found here (see table 1).

                            Time per 10000 steps (s)               Energy leak (%)
Particles        Steps       PP        TC       PM    Meshsize      PP        TC        PM
4                 8000       0.19      1.05     85.0    32x32      27.02     27.02    -9.540
16                2500       2.72      7.20     85.1    32x32       0.198     0.199    0.188
50               10000      26.36     34.53     86.3    32x32      12.20     12.26    -3.435
300 (cluster)      400     955.0     313.3      95.5    32x32       1.319     1.319    1.025
500                 50    2700.0     748       106      32x32       0.222     0.222    0.160
2000                30   43880      3907       467      64x64       0.0009    0.0009   0.0009

Table 1: Comparison between the 3 methods for several non-uniform sets.

The run for 4 particles could not be completed with PM, as one particle moved outside the mesh. The energy error for PM therefore covers only 2624 iterations, about one third of the 8000 used for the other two methods.

Note that PM always uses a meshsize of at least 32x32, which results in runtimes of about the same order for all problems except for 2000 particles, where the meshsize was increased to 64x64.

There is another important remark to be made: in all cases - as long as the number of iterations and the timestep are identical - PP necessarily has the highest accuracy. If there are large errors in the PP results, they are almost always caused by collisions and close encounters.



Figure 1: Output for the 4 body problem from chapter 1 if run with PP (left) and with PM (right). In both cases a 2D force is used to make them comparable.

Well, TC is as vulnerable to close encounters as PP, and so should show more or less the same error as PP, as long as the error made by the tree approximation is small compared with this error. So the closer the TC error is to the PP error, the more likely it is that the tree approximation is indeed a good one!

PM does not suffer from collisions and so also does not have the energy-leak problem in close encounters. Energy (and also momentum) are often reasonably well preserved. However, all details caused by close encounters are completely washed out or blurred as well. For example, the 4 body problem from the introduction (with the sun, the planet and the moon; see the pictures in chapter 1) runs fine with both TC and PP, but shows something that only vaguely resembles those pictures when run with PM (see figure 1).

Now let's take some uniform sets and do the same comparison. Table 2 shows the results for the smaller meshes (for which PP was still practically feasible).

                        Time per 10000 steps (s)               Energy leak (%)
Particles    Steps       PP        TC        PM    Meshsize      PP       TC       PM
16           10000       2.68      6.50      85.5    32x32      1.094    1.093    0.956
64            2500      43.16     44.04      87.0    32x32      1.460    1.460    1.182
256           1000     694.1     273.5       93.7    32x32      4.702    4.699    3.564
500            100    2682       632.0      104.0    32x32      0.009    0.009    0.007
1024           100   11314      1543        123.0    32x32      5.590    5.589    4.078

Table 2: Comparison between the 3 methods for several smaller uniform sets (meshes).

If we put all the results from tables 1 and 2 in one picture we can compare the efficiency of PP, PM and TC. Figure 2 shows that the computation time for PP indeed increases much faster than for PM and TC. It is also nice to see that the TC computation time increases smoothly with the number of particles, while the PM time stays roughly constant for a while but suddenly jumps when the meshsize has to be increased. The latter stems from the fact that the FFT routines only accept meshsizes that are a power of 2.


Figure 2: Comparison of the PP, TC and PM runtimes for the problems from tables 7.1 and 7.2.
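The jump in the PM runtime can be made concrete with a small sketch; this is only my illustration of the rounding step, not the mesh-size selection code actually used in the PM program.

    /* Hypothetical illustration: round a desired mesh dimension up to the
     * next power of 2, as required by the FFT routines.  For example
     * next_power_of_two(40) == 64, so a 40x40 mesh becomes 64x64 and the
     * number of mesh cells jumps from 1600 to 4096 in a single step. */
    static int next_power_of_two(int n)
    {
        int m = 1;
        while (m < n)
            m <<= 1;            /* 1, 2, 4, 8, ... until m >= n */
        return m;
    }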

I also ran some larger uniform problems - both meshes and random sets - with only PM and TC, to see what their relative accuracy is and how the computation time scales. These larger problems - all generated by the two helper programs mentioned earlier - are the same ones already discussed in the Auto-Correlation section.

A special problem arose with the computation of the energy error: the potential energy computation is unfortunately an O(N²) operation. This implies that for the large problems the bulk of the time would be spent in the energy error computation and not in the simulation itself! To give an idea: the total runtime for 131072 particles, including the energy error computation, was about 10 hours for PM and about 10.5 hours for TC, so this computation of the potential energy almost completely washes out the difference between PM and TC if only the total running time is considered.
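For reference, a minimal sketch of such an O(N²) pair sum is given below. It is my own illustration, not the code actually used: the struct and parameter names are invented, and the -G·m·m/r kernel shown is the usual 3D one, whereas the 2D force used in this chapter would take a logarithmic pair potential; the N(N-1)/2 cost is the same either way.

    #include <math.h>

    typedef struct { double x, y, m; } Particle;   /* illustrative only */

    /* Direct O(N^2) potential-energy sum: every unordered pair (i,j) is
     * visited once, giving n*(n-1)/2 evaluations in total. */
    double potential_energy(const Particle *p, int n, double G)
    {
        double U = 0.0;
        for (int i = 0; i < n; i++)
            for (int j = i + 1; j < n; j++) {
                double dx = p[i].x - p[j].x;
                double dy = p[i].y - p[j].y;
                double r  = sqrt(dx * dx + dy * dy);
                U -= G * p[i].m * p[j].m / r;
            }
        return U;
    }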

For really large problems it may be an idea to solve the Poisson equation with an FFT to compute an approximation of the potential energy, just as was done in the previous chapter (Analysis).
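As a sketch of that idea (my own notation, not taken from the actual programs): solve the Poisson equation for the mesh potential and sum it over the particles,

    \nabla^2 \phi = 4\pi G\,\rho ,
    \qquad
    E_{\mathrm{pot}} \approx \frac{1}{2} \sum_{i=1}^{N} m_i\,\phi(\mathbf{x}_i) ,

where ρ is the mesh density assigned from the particles and φ is interpolated back to the particle positions. With an M×M mesh this costs roughly O(M²·log M + N) per evaluation instead of O(N²); the normalisation of the 2D Poisson equation and the treatment of the self-energy would of course have to follow whatever the PM solver already uses.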

Particles (P)   Steps (S)   Runtime (s)   Energy leak   Mom. leak X   Mom. leak Y

Random sets and Tree-Code (Q = 1):
256               1000          28.32       5.1247 %       3905          -5891
1024               200          31.02       0.7801 %       3373          -2504
4096               100         102.50       0.0397 %         96            -61
16384               50         398.86       0.0067 %         19             11
65536               10         629.77       0.0008 %         31             34
131072               5         943.93       0.0002 %          0.15          46

Monotonous sets and Tree-Code (Q = 1):
256               1000          27.66       4.9800 %      -2750           3103
1024               200          30.52       0.7547 %      -4621           7284
4096               100          86.53       0.0739 %       -177            328
16384               50         381.33       0.0717 %       -663           1165
65536               10         681.62       0.0085 %      -1073           1928
131072               5         812.01       0.0002 %        -20             45

Table 3: Results for generated sets and Tree-Code.

The output for Tree-Code and the generated sets can be found in table 3, while the same results for Particle-Mesh can be found in table 4. For the reasons explained before, the runtimes given in these tables exclude the initialisation and the energy error computation.

Particles (P)   Steps (S)   Meshsize   Runtime (s)   Energy leak   Mom. leak X   Mom. leak Y

Random sets and Particle-Mesh:
256               1000        32x32         9.43       3.6068 %      36918          39992
1024               200        32x32         2.44       0.5712 %      10176          28111
4096               100        64x64         5.69       0.0322 %        -66.23        -168.8
16384               50      128x128        15.35       0.0048 %         -8.52          -9.75
65536               10      256x256        16.88       0.0067 %        -15.97          -8.83
131072               5      512x512        34.49       0.0002 %          0.611         -3.48

Monotonous sets and Particle-Mesh:
256               1000        32x32         9.38       3.7670 %       -435.6         -484.1
1024               200        32x32         2.45       0.5713 %       -310.3         -251.0
4096               100        64x64         5.65       0.0559 %         -1.135         -0.928
16384               50      128x128        15.07       0.0543 %         -4.257         -3.532
65536               10      256x256        15.46       0.0070 %         -0.913         -0.755
131072               5      512x512        30.12       0.0002 %         -1.620         -1.692

Table 4: Results for generated sets and Particle-Mesh.

To give an idea of the complexity of these problems, table 5 lists the number of computations for O(N²) and O(N·log N) operations for all the combinations of particles and iteration steps used.

Particles (P)   Steps (S)      P²·S         P·log P·S
256               1000       6.5·10⁷        2.0·10⁶
1024               200       2.1·10⁸        2.0·10⁶
4096               100       1.7·10⁹        4.9·10⁶
16384               50       1.3·10¹⁰       1.1·10⁷
65536               10       4.3·10¹⁰       1.0·10⁷
131072               5       8.6·10¹⁰       1.1·10⁷

Table 5: Total number of computations for both O(N²) and O(N·log N) for the generated problems.

As should be clear when comparing the numbers from table 5 with the results from tables 3 and 4, the runtimes increase somewhat faster than expected if we consider that both methods are supposed to scale as O(N·log N). There are however several explanations for this phenomenon:

2.  Conclusions

In the introduction I listed some of the targets for this research project, and I think I can say that most of them have been reached. However, although I already worked on the parallelization of the PP method described in this report two years ago, and although I have also done some work on the parallelization of the TC and PM programs, I did not find the time to complete this properly; finishing it with only half the work done would have resulted in, for example, a very poor speedup.

For this reason, and also because this report is already quite a bit longer than I intended it to be at the start, I decided to omit the parallelization part completely and put those results in a separate report, probably to be submitted next year.

Ok, now it is time to summarize some of the conclusions resulting from this report:

