#+SETUPFILE: ../packages/org-themes/src/bigblow_inline/bigblow_inline.theme #+STARTUP: org-pretty-entities entitiespretty #+PROPERTY: header-args :comments no #+OPTIONS: timestamp:nil #+OPTIONS: h:3 num:t toc:3 #+TITLE: The pugs user manual #+OPTIONS: author:nil date:nil #+OPTIONS: tex:t #+LANGUAGE: en #+ATTR_LATEX: :width 4cm #+HTML_HEAD_EXTRA: <style> pre.src-pugs:before { content: 'pugs'; } </style> #+LATEX_CLASS_OPTIONS: [10pt] #+LATEX_HEADER: \usepackage[hmargin=2.5cm,vmargin=1.5cm]{geometry} #+LATEX_COMPILER: pdflatex --shell-escape #+LATEX_HEADER_EXTRA: \usepackage{amsmath} #+LATEX_HEADER_EXTRA: \usepackage{amsthm} #+LATEX_HEADER_EXTRA: \usepackage{amssymb} #+LATEX_HEADER_EXTRA: \usepackage{xcolor} #+LATEX_HEADER_EXTRA: \usepackage{mdframed} #+LATEX_HEADER_EXTRA: \BeforeBeginEnvironment{minted}{\begin{mdframed}[linecolor=blue,backgroundcolor=blue!5]} #+LATEX_HEADER_EXTRA: \AfterEndEnvironment{minted}{\end{mdframed}} #+LATEX_HEADER_EXTRA: \BeforeBeginEnvironment{verbatim}{\begin{mdframed}[linecolor=gray,backgroundcolor=green!5]} #+LATEX_HEADER_EXTRA: \AfterEndEnvironment{verbatim}{\end{mdframed}} * Introduction ~pugs~[fn:pugs-def] is a general purpose solver collection built to approximate solutions of partial differential equations. It is mainly (but not only) designed to deal with hyperbolic problems using finite-volume methods. ~pugs~ is a parallel software that uses ~MPI~[fn:MPI-def] and multi-threading for parallelism. Multi-threading is achieved through an encapsulation of some [[https:github.com/kokkos/kokkos][Kokkos]] mechanisms. The philosophy of ~pugs~ is to provide "simple" numerical tools that are assembled together through a high-level language (a DSL[fn:DSL-def] close to the mathematics) to build more complex solvers. This approach is inspired by the success of [[http://freefem.org][FreeFEM]], which use a similar approach. Before detailing the leading concepts and choices that we have made to develop ~pugs~, we give a simple example that should illustrate the capabilities of the code. ** An example For instance, the following code builds a uniform Cartesian grid of $]-1,1[^2$ made of $20\times20$ cells and transforms it by displacing the mesh nodes according to a user defined vector field $T: \mathbb{R}^2 \to \mathbb{R}^2$. #+NAME: introduction-example #+BEGIN_SRC pugs :exports both :results output import mesh; import math; import writer; let pi:R, pi = acos(-1); let theta:R^2 -> R, x -> 0.5*pi*(x[0]*x[0]-1)*(x[1]*x[1]-1); let M:R^2 -> R^2x2, x -> (cos(theta(x)), -sin(theta(x)), sin(theta(x)), cos(theta(x))); let T: R^2 -> R^2, x -> x + M(x)*x; let m:mesh, m = cartesian2dMesh((-1,-1), (1,1), (20,20)); m = transform(m, T); write_mesh(gnuplot_writer("transformed", 0), m); #+END_SRC The example is quite easy to read. - First, some *modules* are loaded: the ~mesh~ module, which contains some mesh manipulation functions. The ~math~ module provides a set of classical mathematical functions ($\sin$, $\cos$, ...). The ~writer~ module is used to save meshes or discrete functions to files using various formats. - The second block of data defines variables of different kind - ~pi~ is a real value that is initialized by an approximation of $\pi$. - ~theta~ is the real value function $\theta$ defined by \begin{equation*} \theta: \mathbb{R}^2 \to \mathbb{R},\quad\mathbf{x} \mapsto \frac{\pi}{2} (x_0^2-1)(x_1^2-1) \end{equation*} - ~M~ is the $\mathbb{R}^{2\times2}$ matrix field $M$ defined by \begin{equation*} M: \mathbb{R}^2 \to \mathbb{R}^{2\times2},\quad\mathbf{x} \mapsto \begin{pmatrix} \cos(\theta(\mathbf{x})) & -\sin(\theta(\mathbf{x})) \\ \sin(\theta(\mathbf{x})) & \cos(\theta(\mathbf{x})) \end{pmatrix} \end{equation*} - ~T~ is the vector field $T$ defined by \begin{equation*} T: \mathbb{R}^2 \to \mathbb{R}^2, \quad\mathbf{x} \mapsto (I+M(\mathbf{x}))\mathbf{x} \end{equation*} - Finally ~m~ is defined as the uniform Cartesian mesh grid of $]-1,1[^2$. The last argument: ~(20,20)~ sets the number of cells in each direction: $20$. - The third block is the application of the transformation $T$ to the mesh. Observe that if the resulting mesh is stored in the same variable ~m~, the old one was not modified in the process. It is important to already have in mind that the ~pugs~ language *does not allow* the modifications of values of *non-basic* types. This is discussed in the section [[variable-types]]. - Finally, the last block consists in saving the obtained mesh in a ~gnuplot~ file. The result is shown on Figure [[fig:intro-example]]. #+NAME: intro-transform-mesh-img #+BEGIN_SRC gnuplot :exports results :file (substitute-in-file-name "${PUGS_SOURCE_DIR}/doc/intro-transform-mesh.png") reset unset grid unset border unset key unset xtics unset ytics set terminal png truecolor enhanced set size square plot '<(sed "" $PUGS_SOURCE_DIR/doc/transformed.0000.gnu)' w l #+END_SRC #+CAPTION: Obtained transformed mesh #+NAME: fig:intro-example #+ATTR_LATEX: :width 0.38\textwidth #+ATTR_HTML: :width 300px; #+RESULTS: intro-transform-mesh-img Even if this first example is very simple, some key aspects can already be discussed. - There is no predefined constant in ~pugs~. Here a value is provided for ~pi~. - There are two kinds of variable in ~pugs~: variables of basic types and variable of high-level types. This two kinds of variable behave almost the same but one must know their differences to understand better the underlying mechanisms and choices that we made. See [[variable-types]] for details. - Also, there are two types of function: *user-defined* functions and *builtin* functions. In this example, ~theta~, ~M~ and ~T~ are user-defined functions. All other functions (~cos~, ~cartesian2dMesh~,...) are builtin functions and are generally defined when importing a module. These functions behave similarly, one should refer to [[functions]] for details. ** Concepts and design As it was already stated ~pugs~ can be viewed as a collection of numerical methods or utilities that can be assembled together, using a user friendly language, to build simulation scenarios. Utilities are tools that are often used in numerical simulations. Examples of such utilities are mesh manipulations, definition of initial conditions, post-processing, error calculations,... *** A C++ toolbox driven by a user friendly language Numerical simulation packages are software of a particular kind. Generally, in order to run a calculation, one has to define a set of data and parameters. This can simply be definition of a discretization parameter such as the mesh size. One can also specify boundary conditions, equations of state, source terms for a specific model. Choosing a numerical method or even more setting the model itself is common in large code. In ~pugs~, all these "parameters" are set through a DSL[fn:DSL-def]. Thus, when ~pugs~ is launched, it actually executes the provided script. A ~C++~ function is associated to each instruction of the script. The ~C++~ components of ~pugs~ are completely unaware of the other ones. ~pugs~ interpreter is responsible of data flow between the components: it manages the data transfer between those ~C++~ components and ensure that the workflow is properly defined. **** Why? In this section we motivate the choice of a language and not of a more standard approach. ***** Data files are evil There are lots of reasons not to use data files. By data file, we refer to a set of options that describe physical models, numerical methods or their settings. - Data files are not flexible. This implies in one hand that application scenarios must be known somehow precisely to reflect possible options combinations and on the other hand even defining a specific initial data may require the creation of a new option and the associated code (in ~C++~ for instance). \\ It is quite common to fix the last point by adding a local interpreter to evaluate user functions for instance. - Data files may contain irrelevant information. Actually, it is quite common to allow to define options that are valid but irrelevant to a given scenario. This policy can be changed but it is generally not an easy task and require more work from the user (which can be a good thing). - It is quite common that data files become obsolete. An option was not the right one, or its type depended of a simpler context... This puts pressure on the user. - Even worst options meaning can depend on other options. Unfortunately, it happens quite easily. For instance, a global option can change implicitly the treatment associated to another one. This is quite dangerous since writing or reading the data file requires an important knowledge of the code internals. - Another major difficulty when dealing with data files is to check the compatibility of provided options. ***** Embedded "data files" are not a solution Using directly the general purpose language (~C~, ~C++~, ~Fortran~,...) used to write the code can be tempting. It has the advantage that no particular treatment is necessary to build a parser (to read data files or a script), but it presents several drawbacks. - The first one is probably that it allows to much freedom. While defining the model and numerical options, the user has generally access to the whole code and can change almost everything, even things that should not be changed. - Again, easily one can have access to irrelevant options and it requires a great knowledge of the code to find relevant ones. - With that regard, defining a simulation properly can be a difficult task. For instance, in the early developments of ~pugs~ (when it was just a raw ~C++~ code) it was tricky to change boundary conditions for coupled physics. - Another difficulty is related to the fact that code's internal API is likely to change permanently in a research code. Thus valid constructions or setting may become rapidly obsolete. In other words keeping up to date embedded "data file" might be difficult. - Finally it requires recompilation of pieces of code (which can be large in some cases) even if one is just changing a simple parameter. ***** Benefits of a DSL Actually, an honest analysis cannot conclude that a DSL is the solution to all problems. However, it offers some advantages. - First it allows a fine control on what the user can or cannot performed. In some sense it offers a chosen level of flexibility. - It allows to structure the code in the sense that new developments have to be designed not only focusing on the result but also on the way it should be used (its interactions with the scripting language). - In the same idea, it provides a framework that should ease the definition of "do simple things and do it well". - There are no hidden dependencies between numerical options: the code is easier to read and it is more difficult to get obsolete (this is not that true in early development since the language itself and some concepts are still likely to change). - The simulation scenario is *defined* by the script, it is the responsibility of the user in some sense and not to the charge of the code to check its meaning. ***** ~pugs~ language purpose ~pugs~ language is used to assemble ~C++~ tools which are well tested. These tools should ideally be small pieces of ~C++~ code that do one single thing and do it well. Another purpose of the language is to allow perform high-level calculations. In other words, the language defines a data flow and checks that each ~C++~ piece of code is used correctly. Since each piece of code acts as a pure function (arguments are unchanged by calls), the context is quite easily checked. Finally it aims at simplifying the definition of new methods since common utilities are available directly in scripts. ***** The framework: divide and conquer ~pugs~ is a research oriented software, thus generally the user is also a developer. If this paragraph is indeed more dedicated to the developer, reading it, the user will understand better the choices and policy that drive the code. Actually the development framework imposed by the DSL tends to guide writing of new methods. - A natural consequence is that the developer is encourage to write small independent ~C++~ methods (even it does not forbid to write big monolithic pieces of code). However as already said, one should try to respect the following precept: write small piece of code to ease their testing and validation, try to do simple things the right way. - Also, in the process of writing a *new numerical methods*, one must create *new functions in the language*. Moreover, if a new method is close to an existing one, it is generally *better* to use completely new underlying ~C++~ code than to patch existing methods. Starting from a *copy* of the existing code is *encouraged* for developments. This may sound weird since classical development guidelines encourage inheritance or early redesign. Actually, this policy is the result of the following discussion. - Using this approach, one *separates* clearly the *design* of a numerical method and the *design* of the code! - From the computer science point of view, early design for new numerical methods is generally wrong: usually one cannot anticipate precisely enough eventual problems or method corrections. - It is much more difficult to introduce bugs in existing methods, since previously validated methods are unchanged! - For the same reason, existing methods performances are naturally unchanged by new developments. - Also, when comparing methods, it is better to compare to the original existing code. - If the new method is abandoned or simply not kept, the existing code is not polluted. - Finally when a method is validated and ready to integrate the mainline of the code, it is easier to see differences with existing ones and *at this time* one can redesign the ~C++~ code checking that results are unchanged and performances not deteriorated. At this time, it is likely that the numerical method design if finished, thus (re)designing the code makes more sense. - Another consequence is that utilities are not be developed again and again. - This implies an /a priori/ safer code: utilities are well tested and validated. - It saves development time obviously. - Numerical methods code is not polluted by environment instructions (data initialization, error calculation, post-processing,...) - The counterpart is somehow classical. In the one hand, the knowledge of existing utilities is required, this document tries to address a part of it. In the other hand, if the developer requires a new utility, a good practice is to discuss with the others to check if it could benefit to the others to determine if it should integrate rapidly or not the main development branch. ***** Why not python or any other scripting language? As it was already pointed out above, general purpose languages offer to much freedom: it is not easy to protect data. For instance in the ~pugs~ DSL, non basic variables are constant (see paragraph [[variable-types]]). It is important since its prevent the user to modify data in an inconsistent way. Also, one must keep in mind that constraining the expressiveness is actually a strength. As said before, one can warranty coherence of the data, perform calculations without paying attention to the parallelism aspects,... Observe that it is not a limitation: if the DSL's field of application needs to be extended, it is always possible. But these extensions should never break the rule that a DSL must not become a general purpose language. Providing a language close to the application (or here in particular close to the Mathematics language) reduces the gap between the application and its expression (code): Domain Specific Languages are made for that. Finally, python is ugly. *** TODO A high-level language Defining a suitable language **** Keep it simple **** Performances and parallelism - The language *does not allow* low-level manipulations of high-level type variables. This means that for instance, one cannot access or modify specific cell or node value. A consequence of such a choice is that this *constrained framework* allows to write algorithms (within the pugs language) that can be executed in parallel naturally. To achieve that, there should *never* be parallel instructions or instructions that require *explicit parallel* actions in the ~pugs~ language. - Not providing low-level instruction simplifies the code and reduces the risks of errors * TODO Language ** Variables *** Types<<variable-types>> **** Basic types **** High-level types **** Tuples *** Lifetime ** Statements ** Functions<<functions>> *** User-defined functions *** Builtin functions [fn:pugs-def] ~pugs~: Parallel Unstructured Grid Solvers [fn:MPI-def] ~MPI~: Message Passing Interface [fn:DSL-def] ~DSL~: Domain Specific Language