# Tutorial 16: Poisson equation on parallel distributed-memory computers

## Introduction

In this tutorial we will learn how to use GridapDistributed.jl and its satellite packages, GridapP4est.jl, GridapGmsh.jl, and GridapPETSc.jl, in order to solve a Poisson PDE problem on the unit square using grad-conforming Lagrangian Finite Elements for numerical discretization.

We will first solve the problem using solely the built-in tools in GridapDistributed.jl. While this is very useful for testing and debugging purposes, GridapDistributed.jl is not a library of parallel solvers. Indeed, the built-in linear solver kernel within GridapDistributed.jl, defined with the backslash operator \, is just a sparse LU solver applied to the global system gathered on a master task (thus not scalable). To address this, we will then illustrate which changes are required in the program to replace the built-in solver in GridapDistributed.jl by GridapPETSc.jl. This latter package provides the full set of scalable linear and nonlinear solvers in the PETSc numerical package.

On the other hand, in real-world applications, one typically needs to solve PDEs on more complex domains than simple boxes. To this end, we can leverage either GridapGmsh.jl, in order to partition and distribute automatically unstructured meshes read from disk in gmsh format, or GridapP4est.jl, which allows one to mesh in a very scalable way computational domains which can be decomposed as forests of octrees. The last part of the tutorial will present the necessary changes in the program in order to use these packages.

IMPORTANT NOTE: the parallel codes in this tutorial depend on the Message Passing Interface (MPI). Thus, they cannot be easily executed interactively, e.g., in a Jupyter notebook. Instead, one has to run them from a terminal using the mpiexecjl script as provided by MPI.jl, e.g., with the command mpiexecjl --project=. -n 4 julia src/poisson_distributed.jl run from the root directory of the Tutorials git repository.

## First example: GridapDistributed.jl built-in tools

using Gridap
using GridapDistributed
using PartitionedArrays

The first step in any GridapDistributed.jl program is to define a function (named main_ex1 below) to be executed on each part on which the domain is distributed. This function receives two arguments, rank_partition and distribute. The former is the process grid layout, (2,2) in this case, and the latter is a function that creates a distributed array with the identifiers of the parallel processes. The body of this function is equivalent to a sequential Gridap script, except for the CartesianDiscreteModel call, which in GridapDistributed also requires the parts and rank_partition arguments passed to the main_ex1 function. The domain is discretized using the parallel Cartesian-like mesh generator built-in in GridapDistributed.

function main_ex1(rank_partition,distribute)
parts  = distribute(LinearIndices((prod(rank_partition),)))
domain = (0,1,0,1)
mesh_partition = (4,4)
model = CartesianDiscreteModel(parts,rank_partition,domain,mesh_partition)
order = 2
u((x,y)) = (x+y)^order
f(x) = -Δ(u,x)
reffe = ReferenceFE(lagrangian,Float64,order)
V = TestFESpace(model,reffe,dirichlet_tags="boundary")
U = TrialFESpace(u,V)
Ω = Triangulation(model)
dΩ = Measure(Ω,2*order)
a(u,v) = ∫( ∇(v)⋅∇(u) )dΩ
l(v) = ∫( v*f )dΩ
op = AffineFEOperator(a,l,U,V)
uh = solve(op)
end

Once the main_ex1 function has been defined, we have to trigger its execution on the different parts. To this end, one calls the with_mpi function of PartitionedArrays.jl right at the beginning of the program.

rank_partition = (2,2)
with_mpi() do distribute
main_ex1(rank_partition,distribute)
end

The with_mpi(f) function receives one function (defined in-situ using Julia's do-block function call syntax here) assumed to have a single argument, the distribute function (see above). This function is called from with_mpi(f) and executed on each part. It in turn calls the main_ex1 function, which does the actual work.

Although not illustrated in this tutorial, we note that one may also use the with_debug(f) PartitionedArrays.jl function, instead of with_mpi(f). With this function, the code executes serially on a single process (and there is thus no need to use mpiexecjl to launch the program), although the data structures are still partitioned into parts. This is very useful, among others, for interactive execution of the code, and debugging, before moving to MPI parallelism.

## Second example: GridapDistributed.jl + GridapPETSc.jl for the linear solver

using GridapPETSc

In this example we use GridapPETSc.jl to have access to a scalable linear solver. The code is almost identical as the one above (see below). The main difference is that now we are wrapping most of the code of the main_ex2 function within a do-block syntax function call to the GridapPETSc.with(args=split(options)) function. The with function receives as a first argument a function with no arguments with the instructions to be executed on each MPI task/subdomain (that we pass to it as an anonymous function with no arguments), along with the options to be passed to the PETSc linear solver. For a detailed explanation of possible options we refer to the PETSc library documentation. Note that the call to PETScLinearSolver() initializes the PETSc solver with these options (even though options is not actually passed to the linear solver constructor). Besides, we have to pass the created linear solver object solver to the solve function to override the default linear solver (i.e., a call to the backslash \ Julia operator).

function main_ex2(rank_partition,distribute)
parts  = distribute(LinearIndices((prod(rank_partition),)))
options = "-ksp_type cg -pc_type gamg -ksp_monitor"
GridapPETSc.with(args=split(options)) do
domain = (0,1,0,1)
mesh_partition = (4,4)
model = CartesianDiscreteModel(parts,rank_partition,domain,mesh_partition)
order = 2
u((x,y)) = (x+y)^order
f(x) = -Δ(u,x)
reffe = ReferenceFE(lagrangian,Float64,order)
V = TestFESpace(model,reffe,dirichlet_tags="boundary")
U = TrialFESpace(u,V)
Ω = Triangulation(model)
dΩ = Measure(Ω,2*order)
a(u,v) = ∫( ∇(v)⋅∇(u) )dΩ
l(v) = ∫( v*f )dΩ
op = AffineFEOperator(a,l,U,V)
solver = PETScLinearSolver()
uh = solve(solver,op)
end
end

rank_partition = (2,2)
with_mpi() do distribute
main_ex2(rank_partition,distribute)
end

## Third example: second example + GridapP4est.jl for mesh generation

In this example, we define the Cartesian mesh using GridapP4est.jl via recursive uniform refinement starting with a single cell. It only involves minor modifications compared to the previous example. First, one has to generate a coarse mesh of the domain. As the domain is a just a simple box in the example, it suffices to use a coarse mesh with a single quadrilateral fitted to the box in order to capture the geometry of the domain with no geometrical error (see how the coarse_discrete_model object is generated). In more complex scenarios, one can read an unstructured coarse mesh from disk, generated, e.g., with an unstructured brick mesh generator. Second, when building the fine mesh of the domain (see UniformlyRefinedForestOfOctreesDiscreteModel call), one has to specify the number of uniform refinements to be performed on the coarse mesh in order to generate the fine mesh. Finally, when calling with_mpi(f), we do not longer specify a Cartesian partition but just the number of parts.

using GridapP4est

function main_ex3(nparts,distribute)
parts   = distribute(LinearIndices((nparts,)))
options = "-ksp_type cg -pc_type gamg -ksp_monitor"
GridapPETSc.with(args=split(options)) do
domain = (0,1,0,1)
coarse_mesh_partition = (1,1)
num_uniform_refinements = 2
coarse_discrete_model = CartesianDiscreteModel(domain,coarse_mesh_partition)
model = UniformlyRefinedForestOfOctreesDiscreteModel(parts,
coarse_discrete_model,
num_uniform_refinements)
order = 2
u((x,y)) = (x+y)^order
f(x) = -Δ(u,x)
reffe = ReferenceFE(lagrangian,Float64,order)
V = TestFESpace(model,reffe,dirichlet_tags="boundary")
U = TrialFESpace(u,V)
Ω = Triangulation(model)
dΩ = Measure(Ω,2*order)
a(u,v) = ∫( ∇(v)⋅∇(u) )dΩ
l(v) = ∫( v*f )dΩ
op = AffineFEOperator(a,l,U,V)
solver = PETScLinearSolver()
uh = solve(solver,op)
end
end

nparts = 4
with_mpi() do distribute
main_ex3(nparts,distribute)
end

## Fourth example: second example + GridapGmsh.jl for mesh generation

In this example, we want to use an unstructured mesh. The mesh is read from disk and partitioned/distributed automatically by GridapGmsh inside the call to the GmshDiscreteModel constructor.

using GridapGmsh
function main_ex4(nparts,distribute)
parts  = distribute(LinearIndices((nparts,)))
options = "-ksp_type cg -pc_type gamg -ksp_monitor"
GridapPETSc.with(args=split(options)) do
model = GmshDiscreteModel(parts,"../models/demo.msh")
order = 2
u((x,y)) = (x+y)^order
f(x) = -Δ(u,x)
reffe = ReferenceFE(lagrangian,Float64,order)
V = TestFESpace(model,reffe,dirichlet_tags=["boundary1","boundary2"])
U = TrialFESpace(u,V)
Ω = Triangulation(model)
dΩ = Measure(Ω,2*order)
a(u,v) = ∫( ∇(v)⋅∇(u) )dΩ
l(v) = ∫( v*f )dΩ
op = AffineFEOperator(a,l,U,V)
solver = PETScLinearSolver()
uh = solve(solver,op)
end