Parallelization over multiple nodes of a cluster

Running distributed Monolix on a cluster

To distribute calculations over multiple nodes, a dedicated Monolix executable must be used. This executable, named distMonolix, is included in the Linux installation of MonolixSuite. A separate cluster license is available and needs to be obtained in order to use distMonolix. All settings applicable to the monolix executable described on this page can also be used with distMonolix on a cluster.

Monolix installation

To run MonolixSuite on a cluster, each cluster node must have access to the MonolixSuite directory and to the user home directory. Thus, there are two possibilities.

  1. MonolixSuite is installed on each node.

  2. The MonolixSuite installation is shared: MonolixSuite is installed on a master server and each cluster node accesses it through a shared directory (via CIFS, a network drive, NFS, …), as sketched below for NFS.
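
In the shared case, the MonolixSuite directory could be exported by the master server and mounted on each node over NFS. A minimal sketch, where the server name and installation path are placeholders to adapt to your environment:

CODE
# on each cluster node: mount the MonolixSuite directory exported by the master server
sudo mount -t nfs master-server:/opt/Lixoft/MonolixSuite2024R1 /opt/Lixoft/MonolixSuite2024R1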

License management

On a cluster, the usage of our applications is managed with the license management system described here.
The license management server runs on a physical machine and manages the application through its license file. The license file has to be placed in the folder {MonolixSuite install path}/config/system/access (and also in {MonolixSuite install path}/bin/Monolix_mcr/runtime/config/system/access for MonolixSuite2016R1): on all nodes in installation case 1, or only on the master server in configuration 2.
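
For example, with a shared installation (configuration 2), the license file only needs to be copied once on the master server. A sketch, where the file name and installation path are placeholders:

CODE
cp your_license_file.lic "$HOME/Lixoft/MonolixSuite2024R1/config/system/access/"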

Running Monolix on a single node

If a Monolix run is performed on a single node, it is possible to run Monolix using its executable in the lib folder (typically $HOME/Lixoft/MonolixSuite2024R1/lib/):

CODE
monolix --no-gui -p mlxtran_project_path

where mlxtran_project_path is a Monolix project with a .mlxtran extension.

Running Monolix on multiple nodes using MPI

To run Monolix on multiple nodes, Open MPI needs to be installed on all nodes. To run through MPI directly, using the distMonolix executable in the lib folder (typically $HOME/Lixoft/MonolixSuite2024R1/lib/), you can use the following command. It distributes Monolix over the 4 nodes listed in hostfile.txt, with hostfile.txt specifying that each host (i.e., node) has 1 slot (i.e., core):

CODE
mpirun -n 4 -hostfile hostfile.txt distMonolix -p mlxtran_project_path --thread 16
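
For reference, a hostfile of this form could look as follows (the host names are placeholders for your own node names):

CODE
node1 slots=1
node2 slots=1
node3 slots=1
node4 slots=1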

The arguments that can be provided to distMonolix are the same as for monolix. This includes --tool to select a multi-run task (model building, convergence assessment, bootstrap) and --config to provide the settings for this task. The --thread argument indicates the number of threads for each MPI process; see details below.
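
As an illustration, a multi-run task could be launched with a command of the following shape, where the <...> values are placeholders (see the Monolix command-line documentation for the exact task identifiers and settings format):

CODE
mpirun -n 4 -hostfile hostfile.txt distMonolix -p mlxtran_project_path --tool <multi-run-task> --config <settings_file> --thread 16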

MPI and multithreading

The distMonolix executable is multithreaded, meaning that each MPI process uses multiple threads.

Multithreading is generally faster than distributing computation across multiple MPI processes on the same node, because threads share memory space and avoid the overhead of inter-process communication. MPI, on the other hand, is designed for distributed memory environments, and using it for core-level parallelism on a single node can introduce unnecessary complexity and reduce performance.

The number of CPU cores used for multithreading by distMonolix is determined either by the --thread argument or, if this argument is not provided, by the value specified in the config.ini file. By default, the config.ini file sets the number of threads to the number of available cores on the node. To control thread usage explicitly, you can pass --thread <number> when launching distMonolix.

It is strongly recommended to run only one MPI process per node and use multithreading for parallelism within that node. Running multiple MPI processes per node can lead to resource contention and significantly reduce performance.

We strongly recommend using the --thread argument when running distMonolix to explicitly control the number of threads per MPI process. If --thread is not specified, distMonolix will default to the value in config.ini, which may not always align with the allocated resources, potentially leading to suboptimal performance.
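
For example, on 4 nodes with 8 cores each, the recommended setup (one MPI process per node, multithreading within each node) could look as follows, assuming hostfile.txt defines 1 slot per host:

CODE
mpirun -n 4 -hostfile hostfile.txt distMonolix -p mlxtran_project_path --thread 8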

MPI troubleshooting

Different versions of distributed Monolix were built using different versions of Open MPI. If a more recent version was installed on the cluster, the following error may appear when trying to run distributed Monolix:

distMonolix: error while loading shared libraries: libmpi_cxx.so.YY: cannot open shared object file: No such file or directory

To resolve the error, you have to create symbolic links from your installation:

  • from your installation of libmpi.so (usually in /usr/lib64/openmpi/lib/libmpi.so.XX) to libmpi.so.YY (in the MonolixSuite lib folder):

    CODE
    sudo ln -s your_installation_of_openmpi/lib/libmpi.so.XX installation_of_MonolixSuiteXXXX/lib/libmpi.so.YY

  • from your installation of libmpi_cxx.so (usually in /usr/lib64/openmpi/lib/libmpi_cxx.so.XX) to libmpi_cxx.so.YY (in the MonolixSuite lib folder):

    CODE
    sudo ln -s your_installation_of_openmpi/lib/libmpi_cxx.so.XX installation_of_MonolixSuiteXXXX/lib/libmpi_cxx.so.YY
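
To identify which MPI libraries distMonolix expects and which Open MPI libraries are present on your system, the following commands can help (the paths are indicative and may differ on your cluster):

CODE
ldd $HOME/Lixoft/MonolixSuite2024R1/lib/distMonolix | grep libmpi
ls /usr/lib64/openmpi/lib/ | grep libmpi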

Distributed calculation

How the distribution is done differs between tasks:

  • in MCMC (SAEM, Fisher by Stochastic Approximation, Conditional Distribution): pools of ids are created and distributed over the MPI processes,

  • in Importance Sampling: the same is done with simulation pools,

  • in multi-run tasks (bootstrap, convergence assessment): each run is distributed over all processes and the runs are performed one after the other.

Using distributed Monolix with a scheduler

Usually, runs on clusters are scheduled using a job scheduling application (e.g., Torque, PBS, GridEngine, Slurm, LSF, …). After submitting a Monolix run with the job scheduling application, the run waits in a queue until enough resources become available, and is then performed.

Generally, a run is submitted to the cluster using a specific command, e.g. qsub in the case of Torque, PBS or GridEngine (former SGE). This command runs a script, provided as a parameter, on a cluster node chosen by the cluster scheduler.
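
As an illustration, for Torque/PBS a submission script could take a shape similar to the following. This is only a sketch: the directive syntax, the paths, and the way Open MPI picks up the scheduler allocation vary between sites and versions, so it must be adapted to your cluster:

BASH
#!/bin/bash
#PBS -N monolixRun
#PBS -l nodes=4:ppn=8
cd "$PBS_O_WORKDIR"
# one MPI process per node, 8 threads each (assumes Open MPI was built with Torque/PBS support)
mpirun --map-by ppr:1:node ~/Lixoft/MonolixSuite2024R1/lib/distMonolix -p mlxtran_project_path --thread 8

The script would then be submitted with a command such as qsub run_pbs.sh.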

Scheduling Monolix runs with Slurm Workload Manager

When using Slurm Workload Manager on a cluster, the runs are submitted using the sbatch command. With the command, a path to a batch script needs to be provided. A simple example of a batch script that can be used to run Monolix is shown here (note that there is no need to provide the number of nodes directly to the mpirun command in the script, since Slurm will automatically pass that information to Open MPI):

BASH
#!/bin/bash
mpirun --bind-to core --map-by slot:PE=$2 ~/Lixoft/MonolixSuite2024R1/lib/distMonolix -p $1 --thread $2

While the examples provided use mpirun within sbatch scripts, launching your MPI application directly via srun (e.g., srun ./your_executable) is generally the preferred and often more efficient method on Slurm. srun typically offers better integration with Slurm's resource allocation (--cpus-per-task, task distribution) and process management. We recommend using srun for launching MPI tasks whenever feasible.
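
As a sketch of such an srun-based setup (the script name and installation path are placeholders, and --cpus-per-task is passed explicitly to srun to make sure each task gets the intended number of cores):

BASH
#!/bin/bash
#SBATCH --nodes=4
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
# one distMonolix task per allocated node, each multithreaded over its allocated cores
srun --cpus-per-task="$SLURM_CPUS_PER_TASK" ~/Lixoft/MonolixSuite2024R1/lib/distMonolix -p "$1" --thread "$SLURM_CPUS_PER_TASK"

If this script is saved as run_srun.sh, it could be submitted with sbatch run_srun.sh mlxtran_project_path.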

If the script is saved as run.sh, we can schedule a Monolix project to run with the following command (the command will distribute the run across 4 tasks on 4 nodes, with each task running 8 threads):

CODE
$ sbatch --nodes 4 --ntasks-per-node 1 --cpus-per-task 8 run.sh mlxtran_project_path 8

Additional arguments, such as time limit or job name, can be provided to the sbatch command either through the command line or through the batch script. All the available options are listed on this page.

The --ntasks-per-node option indicates the number of tasks (i.e., MPI processes) per node. Within a node, parallelization using multi-threading is more efficient than parallelization using several processes. It is thus recommended to set --ntasks-per-node to 1 and to use the --thread argument of distMonolix to define the number of threads.

The --cpus-per-task option specifies the number of CPU cores allocated per task. When running distMonolix, each MPI process is considered a task, so setting --cpus-per-task=<N> ensures that each MPI process gets N CPU cores for multithreading. Thus, the value of --cpus-per-task and --thread should be the same.

Here is how the run.sh file should look if we want to assign a job name through the file:

BASH
#!/bin/bash
#SBATCH --job-name=monolixRun

mpirun --bind-to core --map-by slot:PE=$2 ~/Lixoft/MonolixSuite2024R1/lib/distMonolix -p $1 --thread $2

After submitting the job using the sbatch command, we can use the squeue command to check the status of the run:

CODE
$ sbatch --nodes 4 --ntasks-per-node 1 --cpus-per-task 8 run.sh mlxtran_project_path 8
$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                86     debug monolixR   lixoft  R       0:02      4 slave[1-4]

We can cancel the run using scancel and specifying the job ID:

CODE
$ scancel 86

Scheduling Monolix runs with IBM Spectrum LSF

When using IBM Spectrum LSF on a cluster, the runs are submitted using the bsub command. With the command, a path to a batch script needs to be provided. A simple example of a batch script that can be used to run Monolix is shown here (note that there is no need to provide the number of processes directly to the mpirun command in the script, since LSF and the MPI library communicate to determine this from the resource request):

BASH
#!/bin/bash
# Basic IBM Spectrum LSF execution script (resource requests via bsub command line)
mpirun ~/Lixoft/MonolixSuite2024R1/lib/distMonolix -p "$1" --thread "$2"

If the script is saved as run_lsf.sh, we can schedule a Monolix project to run with the following command. This command aims to replicate the Slurm example's resource allocation (distribute the run across 4 tasks, i.e., MPI processes, on 4 nodes, with each task running 8 threads). In IBM Spectrum LSF, this is achieved by requesting 4 processes (-n 4), placing 1 process per host (span[ptile=1]), and binding 8 cores to each process (affinity[core(8)]).

CODE
$ bsub -n 4 -R "span[ptile=1] affinity[core(8)]" < run_lsf.sh mlxtran_project_path 8

  • bsub: The LSF job submission command.

  • -n 4: Requests 4 MPI processes (tasks) in total for the job.

  • -R "span[ptile=1] affinity[core(8)]": A resource requirement string specifying:

    • span[ptile=1]: Places each of the 4 MPI processes onto a separate host (i.e., node).

    • affinity[core(8)]: Tells LSF to allocate and bind 8 CPU cores to each of the 4 processes. This ensures that each multithreaded Monolix process has dedicated resources. Note: The exact affinity syntax can sometimes vary based on IBM Spectrum LSF version and configuration. Consult your local IBM Spectrum LSF documentation.

  • < run_lsf.sh: Redirects the script content to the bsub command.

  • mlxtran_project_path: The path to the Monolix project file (becomes $1).

  • 8: The number of threads for each Monolix task (becomes $2, used by Monolix via --thread and requested from LSF via affinity[core(8)]).

The number of threads indicated via --thread should be the same as the number of cores bound to each MPI process with affinity[core()].

The span[ptile=xx] option indicates the number of MPI processes per node. Within a node, parallelization using multi-threading is more efficient than parallelization using several processes. It is thus recommended to set span[ptile=1] (1 MPI process per node) and to use the --thread argument of distMonolix to define the number of threads.

Additional arguments, such as time limit (-W HH:MM), job name (-J jobname), or queue (-q queuename), can be provided to the bsub command either through the command line or through #BSUB directives within the batch script. All available options are listed in the IBM Spectrum LSF documentation (man bsub).

Here is how the run_lsf.sh file should look if we want to assign a job name through the file:

BASH
#!/bin/bash
#BSUB -J monolixRun
mpirun ~/Lixoft/MonolixSuite2024R1/lib/distMonolix -p "$1" --thread "$2"

After submitting the job using the bsub command, we can use the bjobs command to check the status of the run:

CODE
$ bsub -n 4 -R "span[ptile=1] affinity[core(8)]" < run_lsf.sh mlxtran_project_path 8
Job <98766> is submitted to default queue <normal>.

$ bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
98766   lixoft  PEND  normal     login01                 monolixRun Apr  3 16:28:00
# --- A short time later ---
$ bjobs
JOBID   USER    STAT  QUEUE      FROM_HOST   EXEC_HOST   JOB_NAME   SUBMIT_TIME
98766   lixoft  RUN   normal     login01     host1       monolixRun Apr  3 16:28:15

We can cancel the run using bkill and specifying the job ID:

CODE
$ bkill 98766
Job <98766> is being terminated

For further assistance, contact us.
