#+SETUPFILE: ../packages/org-themes/src/bigblow_inline/bigblow_inline.theme

#+STARTUP: entitiespretty
#+PROPERTY: header-args :comments no
#+OPTIONS: timestamp:nil
#+OPTIONS: h:3 num:t toc:3
#+TITLE: The pugs user manual
#+OPTIONS: author:nil date:nil
#+OPTIONS: tex:t

#+LANGUAGE: en
#+ATTR_LATEX: :width 4cm
#+HTML_HEAD_EXTRA: <style> pre.src-pugs:before { content: 'pugs'; } </style>

#+LATEX_CLASS_OPTIONS: [10pt]
#+LATEX_HEADER: \usepackage[hmargin=2.5cm,vmargin=1.5cm]{geometry}
#+LATEX_COMPILER: pdflatex --shell-escape

#+LATEX_HEADER_EXTRA: \usepackage{amsmath}
#+LATEX_HEADER_EXTRA: \usepackage{amsthm}
#+LATEX_HEADER_EXTRA: \usepackage{amssymb}
#+LATEX_HEADER_EXTRA: \usepackage{xcolor}
#+LATEX_HEADER_EXTRA: \usepackage{mdframed}
#+LATEX_HEADER_EXTRA: \BeforeBeginEnvironment{minted}{\begin{mdframed}[linecolor=blue,backgroundcolor=blue!5]}
#+LATEX_HEADER_EXTRA: \AfterEndEnvironment{minted}{\end{mdframed}}
#+LATEX_HEADER_EXTRA: \BeforeBeginEnvironment{verbatim}{\begin{mdframed}[linecolor=gray,backgroundcolor=green!5]}
#+LATEX_HEADER_EXTRA: \AfterEndEnvironment{verbatim}{\end{mdframed}}

* Introduction

~pugs~[fn:pugs-def] is a general-purpose solver collection built to
approximate solutions of partial differential equations. It is mainly
(but not only) designed to deal with hyperbolic problems using
finite-volume methods.

~pugs~ is parallel software that uses ~MPI~[fn:MPI-def] and
multi-threading. Multi-threading is achieved through an encapsulation
of some [[https://github.com/kokkos/kokkos][Kokkos]] mechanisms.

The philosophy of ~pugs~ is to provide "simple" numerical tools that are
assembled together through a high-level language (a DSL[fn:DSL-def]
close to the mathematics) to build more complex solvers. This approach
is inspired by the success of [[http://freefem.org][FreeFEM]].

Before detailing the leading concepts and choices that we have made to
develop ~pugs~, we give a simple example that should illustrate the
capabilities of the code.

** An example
For instance, the following code builds a uniform Cartesian grid of
$]-1,1[^2$ made of $20\times20$ cells and transforms it by displacing the
mesh nodes according to a user-defined vector field $T: \mathbb{R}^2
\to \mathbb{R}^2$.
#+NAME: introduction-example
#+BEGIN_SRC pugs :exports both :results output
  import mesh;
  import math;
  import writer;

  let pi:R, pi = acos(-1);
  let theta:R^2 -> R, x -> 0.5*pi*(x[0]*x[0]-1)*(x[1]*x[1]-1);
  let M:R^2 -> R^2x2, x -> (cos(theta(x)), -sin(theta(x)),
                            sin(theta(x)),  cos(theta(x)));
  let T: R^2 -> R^2, x -> x + M(x)*x;
  let m:mesh, m = cartesian2dMesh((-1,-1), (1,1), (20,20));

  m = transform(m, T);

  write_mesh(gnuplot_writer("transformed", 0), m);
#+END_SRC
The example is quite easy to read.
- First, some *modules* are loaded: the ~mesh~ module contains some
  mesh-manipulation functions, the ~math~ module provides a set of
  classical mathematical functions ($\sin$, $\cos$, ...), and the ~writer~
  module is used to save meshes or discrete functions to files in
  various formats.
- The second block of data defines variables of different kinds.
  - ~pi~ is a real value that is initialized by an approximation of $\pi$.
  - ~theta~ is the real value function $\theta$ defined by
    \begin{equation*}
      \theta: \mathbb{R}^2 \to \mathbb{R},\quad\mathbf{x} \mapsto
      \frac{\pi}{2} (x_0^2-1)(x_1^2-1)
    \end{equation*}
  - ~M~ is the $\mathbb{R}^{2\times2}$  matrix field $M$ defined by
    \begin{equation*}
      M: \mathbb{R}^2 \to \mathbb{R}^{2\times2},\quad\mathbf{x} \mapsto
      \begin{pmatrix}
        \cos(\theta(\mathbf{x})) & -\sin(\theta(\mathbf{x}))  \\
        \sin(\theta(\mathbf{x}))  & \cos(\theta(\mathbf{x}))
      \end{pmatrix}
    \end{equation*}
  - ~T~ is the vector field $T$ defined by
    \begin{equation*}
      T: \mathbb{R}^2 \to \mathbb{R}^2, \quad\mathbf{x} \mapsto (I+M(\mathbf{x}))\mathbf{x}
    \end{equation*}
  - Finally ~m~ is defined as the uniform Cartesian mesh grid of
    $]-1,1[^2$. The last argument, ~(20,20)~, sets the number of cells
    in each direction to $20$.
- The third block is the application of the transformation $T$ to the
  mesh. Observe that although the resulting mesh is stored in the same
  variable ~m~, the old mesh was not modified in the process. It is
  important to already have in mind that the ~pugs~ language *does not
  allow* the modification of values of *non-basic* types. This is
  discussed in section [[variable-types]].
- Finally, the last block saves the obtained mesh to a ~gnuplot~
  file. The result is shown in Figure [[fig:intro-example]].
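This immutability can be illustrated with a short sketch that reuses
only constructs from the example above (the file basenames ~"original"~
and ~"scaled"~ are arbitrary): since ~transform~ returns a new mesh and
leaves its argument untouched, the original mesh can still be written
out after the transformed one has been built.
#+BEGIN_SRC pugs
  import mesh;
  import writer;

  let m:mesh, m = cartesian2dMesh((-1,-1), (1,1), (20,20));
  let T:R^2 -> R^2, x -> x + x;
  let m2:mesh, m2 = transform(m, T);

  write_mesh(gnuplot_writer("original", 0), m);
  write_mesh(gnuplot_writer("scaled", 0), m2);
#+END_SRC
Here ~m~ still refers to the original Cartesian grid: only a rebinding
such as ~m = transform(m, T)~ makes the old value inaccessible, and even
then the old mesh itself is never modified.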

#+NAME: intro-transform-mesh-img
#+BEGIN_SRC gnuplot :exports results :file (substitute-in-file-name "${PUGS_SOURCE_DIR}/doc/intro-transform-mesh.png")
  reset
  unset grid
  unset border
  unset key
  unset xtics
  unset ytics
  set terminal png truecolor enhanced
  set size square
  plot '<(sed "" $PUGS_SOURCE_DIR/doc/transformed.0000.gnu)' w l
#+END_SRC

#+CAPTION: Obtained transformed mesh
#+NAME: fig:intro-example
#+ATTR_LATEX: :width 0.38\textwidth
#+ATTR_HTML: :width 300px
#+RESULTS: intro-transform-mesh-img

Even if this first example is very simple, some key aspects can
already be discussed.
- There are no predefined constants in ~pugs~. Here, a value is provided
  for ~pi~.
- There are two kinds of variables in ~pugs~: variables of basic types
  and variables of high-level types. These two kinds of variables
  behave almost the same, but one must know their differences to better
  understand the underlying mechanisms and the choices that we
  made. See [[variable-types]] for details.
- Also, there are two types of functions: *user-defined* functions and
  *builtin* functions. In this example, ~theta~, ~M~ and ~T~ are user-defined
  functions. All other functions (~cos~, ~cartesian2dMesh~,...) are
  builtin functions and are generally defined when importing a
  module. These functions behave similarly; one should refer to
  [[functions]] for details.
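For instance, the distinction can be sketched as follows (assuming
that scalar function types such as ~R -> R~ behave like the ~R^2 -> R~
functions of the example): ~acos~ and ~cos~ are builtin functions
provided by the ~math~ module, while ~f~ is user-defined, and both are
called with the same syntax.
#+BEGIN_SRC pugs
  import math;

  let pi:R, pi = acos(-1);
  let f:R -> R, x -> cos(pi*x);
  let y:R, y = f(0.5);
#+END_SRC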

** Concepts and design

As already stated, ~pugs~ can be viewed as a collection of numerical
methods and utilities that can be assembled together, using a
user-friendly language, to build simulation scenarios.

Utilities are tools that are often needed in numerical
simulations. Examples of such utilities are mesh manipulation, the
definition of initial conditions, post-processing, error
calculation,...

*** A C++ toolbox driven by a user friendly language

Numerical simulation packages are software of a particular
kind. Generally, in order to run a calculation, one has to define a
set of data and parameters. This can be as simple as the definition of
a discretization parameter such as the mesh size. One can also specify
boundary conditions, equations of state, or source terms for a
specific model. Choosing a numerical method, or even setting the model
itself, is common in large codes.

In ~pugs~, all these "parameters" are set through a
DSL[fn:DSL-def]. Thus, when ~pugs~ is launched, it actually executes the
provided script. A ~C++~ function is associated with each instruction of
the script. The ~C++~ components of ~pugs~ are completely unaware of one
another. The ~pugs~ interpreter is responsible for the data flow between
the components: it manages the data transfers between those ~C++~
components and ensures that the workflow is properly defined.

**** Why?

In this section we motivate the choice of a language over more
standard approaches.

***** Data files are evil

There are lots of reasons not to use data files. By data file, we
refer to a set of options that describe physical models, numerical
methods or their settings.

- Data files are not flexible. This implies, on the one hand, that
  application scenarios must be known somewhat precisely to reflect
  possible option combinations, and on the other hand that even
  defining a specific initial datum may require the creation of a new
  option and the associated code (in ~C++~ for instance). \\
  It is quite common to fix the last point by adding a local
  interpreter to evaluate user functions, for instance.
- Data files may contain irrelevant information. Actually, it is quite
  common to allow the definition of options that are valid but
  irrelevant to a given scenario. This policy can be changed, but it is
  generally not an easy task and requires more work from the user
  (which can be a good thing).
- It is quite common for data files to become obsolete. An option was
  not the right one, or its type depended on a simpler context... This
  puts pressure on the user.
- Even worse, the meaning of options can depend on other
  options. Unfortunately, this happens quite easily. For instance, a
  global option can implicitly change the treatment associated with
  another one. This is quite dangerous since writing or reading the
  data file requires an intimate knowledge of the code internals.
- Another major difficulty when dealing with data files is to check
  the compatibility of provided options.

***** Embedded "data files" are not a solution

Directly using the general-purpose language (~C~, ~C++~, ~Fortran~,...) in
which the code is written can be tempting. It has the advantage that
no particular treatment is necessary to build a parser (to read data
files or a script), but it presents several drawbacks.

- The first one is probably that it allows too much freedom. While
  defining the model and numerical options, the user generally has
  access to the whole code and can change almost everything, even
  things that should not be changed.
- Again, one can easily access irrelevant options, and great knowledge
  of the code is required to find relevant ones.
- In that regard, defining a simulation properly can be a difficult
  task. For instance, in the early developments of ~pugs~ (when it was
  just a raw ~C++~ code) it was tricky to change boundary conditions for
  coupled physics.
- Another difficulty is related to the fact that the code's internal
  API is likely to change constantly in a research code. Thus valid
  constructions or settings may rapidly become obsolete. In other
  words, keeping an embedded "data file" up to date might be difficult.
- Finally, it requires recompiling pieces of code (which can be large
  in some cases) even when one is just changing a simple parameter.

***** Benefits of a DSL

Actually, an honest analysis cannot conclude that a DSL is the
solution to all problems. However, it offers some advantages.

- First, it allows fine control over what the user can or cannot
  do. In some sense, it offers a chosen level of flexibility.
- It helps structure the code, in the sense that new developments have
  to be designed focusing not only on the result but also on the way
  it should be used (its interactions with the scripting language).
- In the same spirit, it provides a framework that should ease the
  application of the precept "do simple things and do them well".
- There are no hidden dependencies between numerical options: the code
  is easier to read and less likely to become obsolete (this is not
  quite true in early development, since the language itself and some
  concepts are still likely to change).
- The simulation scenario is *defined* by the script; checking its
  meaning is, in some sense, the responsibility of the user rather than
  of the code.


***** ~pugs~ language purpose

The ~pugs~ language is used to assemble well-tested ~C++~ tools. These
tools should ideally be small pieces of ~C++~ code that do one single
thing and do it well.

Another purpose of the language is to allow performing high-level
calculations. In other words, the language defines a data flow and
checks that each ~C++~ piece of code is used correctly. Since each piece
of code acts as a pure function (arguments are unchanged by calls),
the context is quite easily checked.

Finally, it aims at simplifying the definition of new methods, since
common utilities are directly available in scripts.

***** The framework: divide and conquer

~pugs~ is research-oriented software, so the user is generally also a
developer. While this paragraph is indeed more dedicated to the
developer, reading it will help the user better understand the choices
and policies that drive the code.

Actually, the development framework imposed by the DSL tends to guide
the writing of new methods.

- A natural consequence is that the developer is encouraged to write
  small independent ~C++~ methods (even if nothing forbids writing big
  monolithic pieces of code). However, as already said, one should try
  to respect the following precept: write small pieces of code to ease
  their testing and validation; try to do simple things the right way.
- Also, in the process of writing a *new numerical method*, one must
  create *new functions in the language*. Moreover, if a new method is
  close to an existing one, it is generally *better* to use completely
  new underlying ~C++~ code than to patch existing methods. Starting
  from a *copy* of the existing code is *encouraged* for
  developments. This may sound weird, since classical development
  guidelines encourage inheritance or early redesign. Actually, this
  policy is the result of the following observations.
  - Using this approach, one clearly *separates* the *design* of a
    numerical method from the *design* of the code!
  - From the computer science point of view, early design for new
    numerical methods is generally wrong: usually one cannot anticipate
    precisely enough the possible problems or method corrections.
  - It is much more difficult to introduce bugs in existing methods,
    since previously validated methods are unchanged!
  - For the same reason, the performance of existing methods is
    naturally unchanged by new developments.
  - Also, when comparing methods, it is better to compare to the
    original existing code.
  - If the new method is abandoned or simply not kept, the existing
    code is not polluted.
  - Finally, when a method is validated and ready to be integrated into
    the mainline of the code, it is easier to see the differences with
    existing ones, and *at this time* one can redesign the ~C++~ code,
    checking that results are unchanged and performance is not
    degraded. At this point, it is likely that the numerical method
    design is finished, so (re)designing the code makes more sense.
- Another consequence is that utilities are not developed again and
  again.
  - This implies an /a priori/ safer code: utilities are well tested and
    validated.
  - It obviously saves development time.
  - Numerical-method code is not polluted by environment instructions
    (data initialization, error calculation, post-processing,...)
  - The counterpart is somewhat classical. On the one hand, knowledge
    of the existing utilities is required; this document tries to
    address a part of it. On the other hand, if the developer requires
    a new utility, a good practice is to discuss with the others to
    check whether it could benefit them, and to determine whether it
    should rapidly be integrated into the main development branch.

***** Why not Python or any other scripting language?

As already pointed out above, general-purpose languages offer too much
freedom: it is not easy to protect data. For instance, in the ~pugs~
DSL, non-basic variables are constant (see paragraph
[[variable-types]]). This is important since it prevents the user from
modifying data in an inconsistent way. Also, one must keep in mind that
constraining the expressiveness is actually a strength. As said
before, one can guarantee the coherence of the data, perform
calculations without paying attention to parallelism aspects,...
Observe that this is not a limitation: if the DSL's field of
application needs to be extended, it is always possible. But these
extensions should never break the rule that a DSL must not become a
general-purpose language.

Providing a language close to the application (here, in particular,
close to the language of mathematics) reduces the gap between the
application and its expression (code): Domain Specific Languages are
made for that.

Finally, Python is ugly.

*** TODO A high-level language

Defining a suitable language

**** Keep it simple

**** Performances and parallelism

- The language *does not allow* low-level manipulation of high-level
  type variables. This means that, for instance, one cannot access or
  modify a specific cell or node value. A consequence of such a choice
  is that this *constrained framework* allows writing algorithms
  (within the ~pugs~ language) that can naturally be executed in
  parallel. To achieve that, there should *never* be parallel
  instructions or instructions that require *explicit parallel* actions
  in the ~pugs~ language.
- Not providing low-level instructions simplifies the code and reduces
  the risk of errors.
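The first point can be sketched with the introductory example in mind:
a script only ever manipulates whole high-level objects, so the loops
over cells or nodes (and their parallelization) live entirely on the
~C++~ side and never appear in the script.
#+BEGIN_SRC pugs
  import mesh;

  let m:mesh, m = cartesian2dMesh((0,0), (1,1), (100,100));
  let T:R^2 -> R^2, x -> x + x;
  m = transform(m, T);
#+END_SRC
There is, for instance, no syntax to read or write the position of a
single node of ~m~; ~transform~ is thus free to process all nodes in
parallel.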

* TODO Language

** Variables

*** Types<<variable-types>>

**** Basic types

**** High-level types

**** Tuples

*** Lifetime

** Statements

** Functions<<functions>>

*** User-defined functions

*** Builtin functions

[fn:pugs-def] ~pugs~: Parallel Unstructured Grid Solvers
[fn:MPI-def] ~MPI~: Message Passing Interface
[fn:DSL-def] ~DSL~: Domain Specific Language