Parallelization over multiple nodes of a cluster

Running distributed Monolix on a cluster

To distribute calculations over multiple nodes, a dedicated Monolix executable named distMonolix must be used. It comes with the Linux installation of MonolixSuite. A separate cluster license needs to be obtained in order to use distMonolix. All settings described on this page can also be used with distMonolix on a cluster.

Monolix installation

To run MonolixSuite on a cluster, each cluster node must have access to the MonolixSuite directory and to the user home directory. Thus, there are two possibilities.

  1. MonolixSuite is installed on each node.

  2. MonolixSuite installation is shared. MonolixSuite is installed on a master server and each cluster node accesses MonolixSuite through a shared directory (via CIFS, a network drive, NFS, …).

License management

On a cluster, the usage of our applications is managed with the license management system described here.
The license management server runs on a physical machine and manages the applications through their license file. The license file has to be placed in the folder {MonolixSuite install path}/config/system/access (and also in {MonolixSuite install path}/bin/Monolix_mcr/runtime/config/system/access for MonolixSuite2016R1): on every node in installation case 1, or only on the master server in configuration 2.

Running Monolix on a single node

To run Monolix on a single node, you can call its executable directly from the lib folder (typically $HOME/Lixoft/MonolixSuite2024R1/lib/):

CODE
monolix --no-gui -p mlxtran_project_path

where mlxtran_project_path is a Monolix project with a .mlxtran extension.
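
For long runs, it may be convenient to launch the executable in the background so that it keeps running after the shell session ends. A minimal sketch, assuming the default installation path and a hypothetical project location:

CODE
cd $HOME/Lixoft/MonolixSuite2024R1/lib/
# hypothetical project path; redirect output to a log file and run in the background
nohup ./monolix --no-gui -p /data/projects/warfarin_project.mlxtran > monolix_run.log 2>&1 &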

Running Monolix on multiple nodes using MPI

To run Monolix on multiple nodes, Open MPI needs to be installed on all nodes. To run with MPI directly, using the distMonolix executable in the lib folder (typically $HOME/Lixoft/MonolixSuite2024R1/lib/), use the following command, which distributes Monolix over 4 nodes listed in hostfile.txt:

CODE
mpirun -n 4 -hostfile hostfile.txt distMonolix -p mlxtran_project_path
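
The hostfile is a standard Open MPI hostfile listing the machines to use and, optionally, the number of slots on each one. The node names and slot counts below are hypothetical and must match your cluster:

CODE
# hostfile.txt (hypothetical node names)
node01 slots=4
node02 slots=4
node03 slots=4
node04 slots=4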

Arguments that can be provided to distMonolix are the same ones as with Monolix. This includes --tool to select a multi-run task (model building, convergence assessment, bootstrap) and --config to provide settings for this task.
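
As an illustration, a bootstrap could be distributed with a command along the following lines. The values passed to --tool and --config are assumptions; check the Monolix command-line documentation for the exact names accepted by your version:

CODE
# hypothetical distributed bootstrap run; the --tool and --config values are assumptions
mpirun -n 4 -hostfile hostfile.txt distMonolix -p mlxtran_project_path --tool bootstrap --config bootstrap_settings.txt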

MPI troubleshooting

Different versions of distributed Monolix were built using different versions of Open MPI. If a more recent version was installed on the cluster, the following error may appear when trying to run distributed Monolix:

distMonolix: error while loading shared libraries: libmpi_cxx.so.YY: cannot open shared object file: No such file or directory

To resolve the error, you have to create two symbolic links from your Open MPI installation:

  • from your installation of libmpi.so (usually in /usr/lib64/openmpi/lib/libmpi.so.XX) to libmpi.so.YY (in the MonolixSuite lib folder):

    CODE
    sudo ln -s your_installation_of_openmpi/lib/libmpi.so.XX installation_of_MonolixSuiteXXXX/lib/libmpi.so.YY

  • from your installation of libmpi_cxx.so (usually in /usr/lib64/openmpi/lib/libmpi_cxx.so.XX) to libmpi_cxx.so.YY (in the MonolixSuite lib folder):

    CODE
    sudo ln -s your_installation_of_openmpi/lib/libmpi_cxx.so.XX installation_of_MonolixSuiteXXXX/lib/libmpi_cxx.so.YY
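
If you are not sure which library versions (the YY above) distMonolix expects, inspecting the executable with ldd lists its Open MPI dependencies; libraries reported as "not found" are the ones that need a symbolic link:

CODE
cd $HOME/Lixoft/MonolixSuite2024R1/lib/
ldd ./distMonolix | grep libmpi
# entries ending in "=> not found" show the libmpi*.so.YY versions to link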

Distributed calculation

How the calculations are distributed differs between tasks:

  • in MCMC (SAEM, Fisher by Stochastic Approximation, Conditional Distribution): pools of ids are created and distributed across processes,

  • in Importance Sampling: the same is done with simulation pools,

  • in multi-run tasks (bootstrap, convergence assessment): each run is distributed over all processes.

Using distributed Monolix with a scheduler

Usually, runs on clusters are scheduled using a job scheduling application (e.g., Torque, PBS, GridEngine, Slurm, LSF, …). After a Monolix run is submitted through the job scheduler, it waits in a queue until the requested resources become available, and then the run is performed.

Generally, a task is submitted to the cluster using a scheduler-specific command, e.g. qsub in the case of Torque, PBS or GridEngine (formerly SGE). This command runs a script, provided as a parameter, on cluster nodes chosen by the scheduler.
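
For instance, with Torque/PBS a submission script could look like the sketch below. The resource request syntax varies between PBS variants and the node counts are arbitrary; the MonolixSuite path is the default installation path used elsewhere on this page:

CODE
#!/bin/bash
#PBS -N monolixRun
#PBS -l nodes=4:ppn=4

cd $PBS_O_WORKDIR
# $PBS_NODEFILE lists the nodes allocated by the scheduler
mpirun -hostfile $PBS_NODEFILE ~/Lixoft/MonolixSuite2024R1/lib/distMonolix -p mlxtran_project_path

The script would then be submitted with qsub run_pbs.sh.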

Scheduling Monolix runs with Slurm Workload Manager

When using the Slurm Workload Manager on a cluster, runs are submitted with the sbatch command, which takes a path to a batch script. A simple example of a batch script that can be used to run Monolix is shown below (note that there is no need to provide the number of nodes directly to the mpirun command in the script, since Slurm will automatically pass that information to Open MPI):

CODE
#!/bin/bash
mpirun ~/Lixoft/MonolixSuite2024R1/lib/distMonolix -p $1

If the script is saved as run.sh, we can schedule a Monolix project to run with the following command (the command will distribute the run across 16 tasks on 4 nodes):

CODE
$ sbatch -n 16 --nodes 4 run.sh mlxtran_project_path

Additional arguments, such as time limit or job name, can be provided to the sbatch command either through the command line or through the batch script. All the available options are listed on this page.
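
For example, a time limit and a job name can be passed directly on the command line (the values below are arbitrary):

CODE
$ sbatch -n 16 --nodes 4 --time=02:00:00 --job-name=monolixRun run.sh mlxtran_project_path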

Here is what the run.sh file should look like if we want to assign the job name through the file instead:

CODE
#!/bin/bash
#SBATCH --job-name=monolixRun

mpirun ~/Lixoft/MonolixSuite2024R1/lib/distMonolix -p $1

After submitting the job using the sbatch command, we can use the squeue command to check the status of the run:

CODE
$ sbatch -n 16 --nodes 4 run.sh mlxtran_project_path
$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
                86     debug monolixR   lixoft  R       0:02      4 slave[1-4]

We can cancel the run using scancel and specifying the job ID:

CODE
$ scancel 86
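
If more details about a pending or running job are needed (for example, the reason why it is waiting in the queue), scontrol can be used with the job ID from the example above:

CODE
$ scontrol show job 86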

For further assistance, contact us.
