To show that dynamic LibPFASST works in more complex settings, dynamic resources are applied to a 2D heat equation solver that runs in parallel in both time and space.
The original solver was part of the LibPFASST repository (Examples/hypre). This modified version is available separately on GitHub.
The following video demonstrates how the heat equation is being solved and how process set operations are being used to dynamically grow and shrink the solver in the time dimension:
The heat equation is solved on the 2D square domain [0,π]×[0,π] with a zero boundary condition. The initial condition is given by the function u(x,y,0) = sin(x) + sin(y), which creates a "bump" on the square domain. A closed-form solution is given by u(x,y,t) = exp(-2πt) (sin(x) + sin(y)). This makes it possible to compute the error of the solution produced by LibPFASST.
In addition to parallelizing in time using LibPFASST, the space domain is split up in a grid-wise manner. For example, it can be split into 4, 9, 16, ... smaller squares. This is implemented by running 4, 9, 16, ... parallel PFASST instances, where the sweeper communicates across the space domain to compute the time derivatives.
In the non-dynamic case, this communication can be hidden transparently from LibPFASST.
The main routine creates a grid using MPI_Comm_split so that each process is part of exactly one time communicator and one space communicator.
The time communicator is then used as the main LibPFASST communicator and the space communicator is used for solving the space component of the equation using the HYPRE library.
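The grid layout can be illustrated without MPI. The following is a minimal Python sketch, assuming that the nspace processes of one time step have consecutive world ranks. The helper names grid_colors and build_communicators are hypothetical and not part of LibPFASST or the showcase. The two indices are what one could pass as the color arguments of two MPI_Comm_split calls.

```python
def grid_colors(world_rank, nspace):
    """Return (time_index, space_index) for a world rank.

    Assumption: the nspace ranks of one time step are consecutive.
    """
    time_index = world_rank // nspace   # which time step this rank belongs to
    space_index = world_rank % nspace   # position inside the space grid
    return time_index, space_index


def build_communicators(nprocs, nspace):
    """Group ranks the way two MPI_Comm_split calls would group them."""
    if nprocs % nspace != 0:
        raise ValueError("nprocs must be a multiple of nspace")
    time_comms = {}   # keyed by space_index: ranks that form one PFASST instance
    space_comms = {}  # keyed by time_index: ranks that share one time step
    for rank in range(nprocs):
        t, s = grid_colors(rank, nspace)
        time_comms.setdefault(s, []).append(rank)
        space_comms.setdefault(t, []).append(rank)
    return time_comms, space_comms
```

With 8 processes and nspace=4, this yields two space groups of four ranks each and four time groups of two ranks each, so every rank belongs to exactly one group of each kind.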
Because resize operations work on process sets, we cannot rely on MPI_Comm_split but instead need to split up the process sets directly.
This is done in the following fashion (implemented in
Here we need the order preservation assumption to create a consistent grid (the indexing in the third step must work).
Because process set operations need global coordination, the fact that multiple PFASST instances are running in parallel cannot be transparently hidden from dynamic LibPFASST.
Instead, LibPFASST requires additional information to be able to resize correctly. This way of running LibPFASST is called "space-parallel mode".
It can be enabled by passing additional
horizontal_pset to the
This will set the respective attributes of pf%dynprocs and create the respective communicators.
Furthermore, the resize operations become more complex. The logic is implemented in the following subroutines:
pf_dynprocs_create_comm (for joining an existing run),
pf_dynprocs_handle_growth_global (for growing by a number of time steps and establishing communication with new processes) and
pf_dynprocs_handle_shrink_global (for shrinking by a number of time steps).
Again, we need the order preservation assumption for this to work and to keep grid consistency.
These implement resizing in the following fashion (see also the video at the top):
ntime * num_new_timesteps as the requested size.
PSETOP_UNION operation across the last num_timesteps_to_remove space process sets
PSETOP_DIFF operations for each time process set to get the new time process sets
ntime * num_timesteps_to_remove as the requested size. Pass the delta process set as an argument, so the runtime only removes these processes.
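The set arithmetic behind the shrink case can be sketched in a few lines of Python. This is only a model under stated assumptions: plain lists stand in for MPI process sets, no MPI calls are made, and the function name shrink_psets is hypothetical. The union and difference steps mirror the PSETOP_UNION and PSETOP_DIFF operations named above.

```python
def shrink_psets(time_psets, space_psets, num_timesteps_to_remove):
    """Model the shrink step on plain Python lists.

    time_psets:  one list of process names per space index, ordered by time step
    space_psets: one list of process names per time step
    """
    if not 0 < num_timesteps_to_remove < len(space_psets):
        raise ValueError("remove at least one and keep at least one time step")

    # Union of the last space process sets gives the delta process set.
    delta = []
    for pset in space_psets[-num_timesteps_to_remove:]:
        delta.extend(pset)
    removed = set(delta)

    # Difference of each time process set and delta gives the new time sets.
    # Filtering in place keeps the original order (order preservation assumption).
    new_time_psets = [[p for p in pset if p not in removed] for pset in time_psets]
    new_space_psets = space_psets[:-num_timesteps_to_remove]
    return delta, new_time_psets, new_space_psets
```

For a run with three time steps and two processes per time step, removing one time step leaves each time process set one entry shorter, and the delta set contains exactly the processes of the last time step.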
Note: If you have followed the instructions for the Docker build in the Open MPI section, the showcase was already cloned and compiled at
To compile this program, make sure that you have built the dynamic version of LibPFASST that can be found here.
Then run the following commands:
    # clone the repository
    git clone https://github.com/boi4/showcase_dyn_libpfasst.git && cd showcase_dyn_libpfasst
    # clone hypre
    git clone https://github.com/hypre-space/hypre.git
    # build hypre
    cd hypre/src && ./configure --disable-fortran && make -j && cd ../..
    # finally, compile this project
    make LIBPFASST=/path/to/LibPFASST/
Please refer to the LibPFASST documentation to see what parameters you can set in probin.nml.
The following additional parameters can control the run:
    dump_values <- logical, whether to dump solution values after each block
    dump_dir    <- string, where to dump the values to
    nspace      <- integer, number of processes per time step, must be a square number
    T0          <- float, t0
    TFin        <- float, tfin
    nsteps      <- integer, number of timesteps
Note that when nsteps/(TFin-T0) drops below a certain threshold (empirically around 1), the algorithm becomes numerically unstable and a division by zero may occur.
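These constraints can be checked before launching a run. The following Python sketch is not part of the showcase. The function name check_params and the default threshold of 1.0 are assumptions, with the threshold taken from the rough observation above rather than from any derived stability bound.

```python
import math


def check_params(nspace, T0, TFin, nsteps, min_steps_per_unit_time=1.0):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []

    # nspace must be a square number (1, 4, 9, 16, ...).
    root = math.isqrt(nspace) if nspace > 0 else 0
    if nspace < 1 or root * root != nspace:
        problems.append("nspace must be a square number")

    if TFin <= T0:
        problems.append("TFin must be greater than T0")
    elif nsteps / (TFin - T0) < min_steps_per_unit_time:
        # Assumed threshold; the real limit was only observed empirically.
        problems.append("nsteps/(TFin-T0) is below the assumed stability threshold")

    return problems
```

For example, check_params(4, 0.0, 1.0, 16) reports no problems, while a non-square nspace or too few time steps per unit of simulated time is flagged.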
You can run the solver with the following command:
mpirun <YOUR MPI RUN ARGUMENTS HERE> ./main.exe probin.nml
Because of the resize granularity restriction of the runtime, make sure to have the number of processes per node be equal to the number of processes per time step (
If you followed the Docker setup as described in the Open MPI section, you can pass the following arguments to mpirun (for nspace=4):
mpirun --mca btl_tcp_if_include eth0 --host n01:4,n02:4,n03:4,n04:4,n05:4,n06:4,n07:4,n08:4 -np 16 ./main.exe probin.nml
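For other values of nspace, the --host argument can be generated rather than typed by hand. The following is a small Python sketch, assuming the n01, n02, ... node naming of the Docker setup. The helper name host_argument is hypothetical.

```python
def host_argument(num_nodes, nspace):
    """Build a --host value with nspace slots on each of num_nodes nodes.

    Assumption: nodes are named n01, n02, ... as in the Docker setup.
    """
    if num_nodes < 1 or nspace < 1:
        raise ValueError("num_nodes and nspace must be positive")
    return ",".join(f"n{i:02d}:{nspace}" for i in range(1, num_nodes + 1))
```

Calling host_argument(8, 4) reproduces the host list used in the command above. Giving each node exactly nspace slots keeps one time step per node, which matches the resize granularity restriction.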