Tips for running ampsci on HPC systems using SLURM.

HPC / SLURM

ampsci runs well on HPC systems via SLURM. It is a shared-memory (OpenMP) code and does not use MPI, so all parallelism is within a single node.

On HPC systems, both compilation and calculations are typically run as jobs submitted via SLURM – see below for example scripts. Do not run anything, including compiling ampsci, directly on the login node of the HPC. You must submit all jobs via the queueing system.

SLURM is a widely used open-source job scheduler for HPC clusters; other schedulers exist (PBS, LSF, etc.), but SLURM is the most common.
- Jobs are submitted to a queue via sbatch, and run when the required resources become available
- The job script contains #SBATCH directives specifying resource requests, followed by the shell commands to run
- You must request resources carefully: too little memory and your job will be killed; too much and you'll wait longer in the queue
- Several example SLURM job scripts are given below
- You can monitor the queue with squeue, and cancel jobs with scancel
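
The submit, monitor, cancel cycle described above can be sketched as a short shell helper. This is only a sketch, not part of ampsci: `parse_jobid` and `submit_job` are hypothetical helper names and `job.slurm` is a placeholder script name. It relies on `sbatch --parsable`, which prints just the job ID, optionally followed by `;cluster`.

```shell
# Hypothetical helper (not part of ampsci): extract the job ID from the
# output of `sbatch --parsable`, which has the form "jobid" or "jobid;cluster".
parse_jobid() {
  printf '%s\n' "${1%%;*}"
}

# Submit a job script and print its job ID.
# Returns an error when sbatch is not available (e.g., off-cluster).
submit_job() {
  if ! command -v sbatch >/dev/null 2>&1; then
    echo "sbatch not found: not on a SLURM system?" >&2
    return 1
  fi
  parse_jobid "$(sbatch --parsable "$1")"
}

# Typical use on a cluster (job.slurm is a placeholder name):
#   jobid=$(submit_job job.slurm)
#   squeue -j "$jobid"     # show the state of this job only
#   scancel "$jobid"       # cancel it if something is wrong
```

Keeping the job ID in a variable avoids copying it by hand from the sbatch message.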

Key SLURM commands

Command	Description
`sbatch job.slurm`	submit a job script to the queue
`squeue -u $USER`	list your queued/running jobs
`scancel <jobid>`	cancel a job
`sinfo`	show available partitions and node status
`sacct -j <jobid>`	show accounting info for a completed job
`seff <jobid>`	show CPU and memory efficiency for a completed job

Key SLURM directives

Directive	Meaning
`--nodes=1`	single node (ampsci does not use MPI)
`--ntasks-per-node=1`	one task per node
`--cpus-per-task=N`	number of OpenMP threads; set to match `make -jN` and `OMP_NUM_THREADS`
`--mem=XG`	memory per node
`--time=D-H:MM:SS`	wall time (time limit)
`--partition=...`	queue/partition name (site-specific)
`--account=...`	account to charge (site-specific)
`--constraint=...`	request specific CPU architecture (optional, site-specific)

Using #!/bin/bash --login as the first line of a job script is recommended: it starts a login shell, which sources the user's login environment and ensures the module command is available.
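
To confirm that the directives took effect, a job script can print the resources it was actually given. The sketch below assumes only the standard environment variables that SLURM sets inside a job; `print_job_resources` is a hypothetical helper name. Note that `SLURM_CPUS_PER_TASK` and `SLURM_MEM_PER_NODE` are only defined when `--cpus-per-task` and `--mem` are specified.

```shell
# Print the resources SLURM granted to this job.
# The SLURM_* variables are set by SLURM inside a running job; outside a
# job (or when the option was not given) a fallback string is printed.
print_job_resources() {
  echo "job id:        ${SLURM_JOB_ID:-not set}"
  echo "node list:     ${SLURM_JOB_NODELIST:-not set}"
  echo "cpus per task: ${SLURM_CPUS_PER_TASK:-not set}"
  echo "mem per node:  ${SLURM_MEM_PER_NODE:-not set}"
}

print_job_resources
```

Placing this near the top of a job script records the allocation in the job's output file, which helps when diagnosing slow or killed jobs later.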

Compiling ampsci on HPC systems

On HPC systems, you will typically need to module load the required dependencies:
- Requirements: a C++ compiler, LAPACK, BLAS, and GSL
- Often, most of these come bundled in a "toolchain" (e.g., foss)
Load the required modules before running configure.sh.
On most systems (e.g., Friday), unversioned module names work:

module load foss gsl
./configure.sh -y
make

On others (e.g., Bunya), explicit versions are required and must be consistent with each other. The GSL module name typically encodes the GCC version of the matching foss toolchain, for example (specifics may differ between HPC systems):

foss	GCC	Matching GSL (typical)
2022a	11.3.0	gsl/2.7-gcc-11.3.0
2023a	12.3.0	gsl/2.7-gcc-12.3.0
2024a	13.3.0	gsl/2.8-gcc-13.3.0

So, we would do something like:

module load foss/2024a gsl/2.8-gcc-13.3.0
./configure.sh -y
make

Load the required modules before running either configure.sh or ampsci. It is a good idea to run module purge first, which avoids conflicts with previously loaded modules.

module purge
module load foss/2024a gsl/2.8-gcc-13.3.0
module list

Use module avail foss, module spider gsl, or similar to find the available versions.
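
After loading modules, a quick check that the build tools are actually on the PATH can save a failed compile job. A minimal sketch follows; `check_build_tools` is a hypothetical helper, and it assumes the GCC compiler (`g++`, as provided by foss) and the `gsl-config` script that is installed alongside GSL.

```shell
# Report whether the tools needed to build ampsci are on PATH.
# Run after `module load ...`. Assumes g++ is the C++ compiler in use.
check_build_tools() {
  missing=0
  for tool in g++ make gsl-config; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool -> $(command -v "$tool")"
    else
      echo "missing: $tool"
      missing=1
    fi
  done
  return "$missing"
}

check_build_tools || echo "Load the missing modules before running configure.sh"
```

This only checks that the tools can be found; it does not verify that their versions are consistent with each other.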

configure.sh attempts to auto-detect the BLAS/OpenBLAS version (via $EBROOTOPENBLAS) and sets LDLIBS accordingly. If auto-detection fails, you will have to set LDLIBS manually in the Makefile, e.g.:

LDLIBS ?= -lgsl -lgslcblas -lopenblas

Note: -lgfortran may also be required on some older configurations.
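
Before editing the Makefile, it can help to check whether the variable used for auto-detection is defined at all. The sketch below assumes an EasyBuild-style module system, where loading an OpenBLAS module sets `EBROOTOPENBLAS`; `report_openblas` is a hypothetical helper name.

```shell
# Show whether the variable used for OpenBLAS auto-detection is defined.
# EBROOTOPENBLAS is set by EasyBuild-generated OpenBLAS modules;
# on sites that build their modules differently it may never be set.
report_openblas() {
  if [ -n "${EBROOTOPENBLAS:-}" ]; then
    echo "OpenBLAS root: $EBROOTOPENBLAS"
  else
    echo "EBROOTOPENBLAS is not set"
    return 1
  fi
}

report_openblas || echo "Auto-detection may fail; set LDLIBS manually"
```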

If configure.sh does not produce a working build, refer to Compilation Details in the manual.

BLAS threading (recommended for CI calculations)

The foss toolchain includes OpenBLAS, which is already multi-threaded. Set OPENBLAS_NUM_THREADS in your job script to match --cpus-per-task:

export OPENBLAS_NUM_THREADS=$SLURM_CPUS_PER_TASK

Intel MKL is an alternative to OpenBLAS and may be faster on Intel nodes. Load the MKL module (name is site-specific – try module spider mkl or module spider imkl):

module load imkl # name varies: intel-mkl, imkl, mkl, ...

Then set in the Makefile:

LDLIBS ?= -lgsl -lgslcblas -lmkl_rt

And in your job script:

export MKL_THREADING_LAYER=GNU # required with GNU OpenMP
export MKL_NUM_THREADS=$SLURM_CPUS_PER_TASK

See Compilation Details for more on BLAS options.
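
The thread settings above can be collected in one place. The sketch below is an illustration only: `set_thread_env` is a hypothetical helper, and the fallback to a single thread is an assumption chosen so that nothing is exported empty when `SLURM_CPUS_PER_TASK` is undefined (SLURM only sets it when `--cpus-per-task` is given).

```shell
# Set all thread-count variables from the SLURM allocation.
# Falls back to 1 thread when SLURM_CPUS_PER_TASK is not defined,
# rather than exporting an empty value.
set_thread_env() {
  threads="${SLURM_CPUS_PER_TASK:-1}"
  export OMP_NUM_THREADS="$threads"
  export OPENBLAS_NUM_THREADS="$threads"
  export MKL_NUM_THREADS="$threads"   # only read if linked against MKL
}

set_thread_env
echo "Using $OMP_NUM_THREADS thread(s)"
```

Setting the variable for a BLAS library that is not linked is harmless, so the same lines work for both OpenBLAS and MKL builds.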

Example scripts

Four example SLURM scripts are provided in doc/examples/:

compile.slurm – compile ampsci (OpenBLAS)
compile_mkl.slurm – compile ampsci with Intel MKL
singlejob.slurm – run a single ampsci job
arrayjob.slurm – run an array of jobs (e.g. over a parameter range)

Compile job

#!/bin/bash --login
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=9G
#SBATCH --job-name=CompileAMPSCI
#SBATCH --time=0:05:00
#SBATCH --partition=general
## Replace with your group's account string:
#SBATCH --account=a_your_account
 
## Uncomment and set for specific CPU architecture (Bunya etc.):
##SBATCH --constraint=epyc3
 
module purge
module load foss/2024a
module load gsl/2.8-gcc-13.3.0
 
./configure.sh -y
 
make -j8 ampsci |tee -a compile-log.out

Compile job (Intel MKL)

#!/bin/bash --login
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=8
#SBATCH --mem=9G
#SBATCH --job-name=CompileAMPSCI_MKL
#SBATCH --time=0:05:00
#SBATCH --partition=general
## Replace with your group's account string:
#SBATCH --account=a_your_account
 
## Uncomment and set for specific CPU architecture (Bunya etc.):
##SBATCH --constraint=epyc3
 
module purge
module load intel/2024a          # provides MKL
module load foss/2024a     # gets GCC + OpenBLAS (for GSL dependency)
module load gsl/2.8-gcc-13.3.0  # GSL (C library; compatible with GCC or Intel compiler)
 
./configure.sh -y
 
## -lmkl_rt :: link against MKL version of lapack/blas
make LDLIBS="-lgsl -lgslcblas -lmkl_rt" -j8 ampsci |tee -a compile-log-mkl.out
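
After compiling, it may be worth confirming which BLAS library the executable was actually linked against. A minimal sketch using the standard Linux `ldd` tool follows; `linked_blas` is a hypothetical helper, and the pattern only matches library names containing mkl, blas, or lapack.

```shell
# List the BLAS/LAPACK shared libraries a binary is linked against.
# Prints nothing if the binary is statically linked or if the library
# names do not match the pattern.
linked_blas() {
  if [ ! -x "$1" ]; then
    echo "no executable at: $1" >&2
    return 1
  fi
  ldd "$1" | grep -i -E 'mkl|blas|lapack'
}

# Typical use, from the directory where ampsci was built:
#   linked_blas ./ampsci
```

For an MKL build one would expect a line mentioning libmkl_rt; for the default build, a line mentioning OpenBLAS.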

Single job

#!/bin/bash --login
 
## Computational Resources to request:
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=16G
#SBATCH --partition=general
 
## "Wall time" (time limit), Days-HH:MM:SS
#SBATCH --time=0-1:00:00
 
## Optional job name
#SBATCH --job-name=SingleAmpsciJob
 
## Replace with your group's account string:
#SBATCH --account=a_your_account_string
 
module purge
module load foss/2024a
module load gsl/2.8-gcc-13.3.0
 
## Set number of OMP threads:
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
## Set BLAS threads to match (OpenBLAS):
export OPENBLAS_NUM_THREADS=$SLURM_CPUS_PER_TASK
## If using MKL instead of OpenBLAS, replace above with:
## export MKL_THREADING_LAYER=GNU
## export MKL_NUM_THREADS=$SLURM_CPUS_PER_TASK
 
## Path to where ampsci was compiled
AMPSCI=$HOME/ampsci/ampsci
 
## Input and output file names:
input=inputfile.in
## Strip the trailing ".in" and append ".out":
output="${input%.in}.out"
 
"$AMPSCI" "$input" | tee -a "$output"
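
The output file name in these scripts is derived from the input file name by swapping the .in extension for .out. The sketch below shows that derivation using suffix removal, so only a trailing .in is affected; a pattern substitution such as `${input/.in/.out}` would instead replace the first `.in` found anywhere in the name. `output_name` is a hypothetical helper, and the file names are made up for illustration.

```shell
# Derive the output file name from the input file name by replacing
# a trailing ".in" with ".out" (names without ".in" just gain ".out").
output_name() {
  printf '%s\n' "${1%.in}.out"
}

output_name "Cs_basis.in"      # prints Cs_basis.out
output_name "run.input.in"     # prints run.input.out
```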

Array job

#!/bin/bash --login
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --cpus-per-task=16
#SBATCH --mem=16G
#SBATCH --job-name=ArrayAmpsciJob
#SBATCH --time=0-1:00:00
#SBATCH --partition=general
## Replace with your group's account string:
#SBATCH --account=a_your_account
 
## Set array range to match number of input files (0-indexed):
#SBATCH --array=0-24
 
module purge
module load foss/2024a
module load gsl/2.8-gcc-13.3.0
 
## Set number of OMP threads:
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
## Set BLAS threads to match (OpenBLAS):
export OPENBLAS_NUM_THREADS=$SLURM_CPUS_PER_TASK
## If using MKL instead of OpenBLAS, replace above with:
## export MKL_THREADING_LAYER=GNU
## export MKL_NUM_THREADS=$SLURM_CPUS_PER_TASK
 
## Path to where ampsci was compiled
AMPSCI=$HOME/ampsci/ampsci
 
## Build list of input files (all matching "*.in" in current directory)
input_file_list=(*.in)
 
## Input and output file names - one for each array job:
input="${input_file_list[${SLURM_ARRAY_TASK_ID}]}"
## Strip the trailing ".in" and append ".out":
output="${input%.in}.out"
 
"$AMPSCI" "$input" | tee -a "$output"
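
Rather than editing the `--array` line by hand, the range can be computed from the number of input files at submission time. This is a sketch under the assumption that options given on the sbatch command line take precedence over `#SBATCH` lines in the script; `array_range` is a hypothetical helper name.

```shell
# Print the 0-indexed --array range covering every *.in file in a directory.
array_range() {
  set -- "$1"/*.in
  # If nothing matched, the pattern is left unexpanded and names no file.
  if [ ! -e "$1" ]; then
    echo "no .in files found" >&2
    return 1
  fi
  echo "0-$(($# - 1))"
}

# Typical use on a cluster, from the directory holding the input files:
#   sbatch --array="$(array_range .)" arrayjob.slurm
```

With 25 input files this prints 0-24, matching the range used in the script above.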

Tips

Threads

Set OMP_NUM_THREADS to match --cpus-per-task, or ampsci will default to using all available cores on the node:

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./ampsci input.in

Memory

ampsci memory use depends on the basis size.
8–16 GB is often sufficient for small calculations; large MBPT calculations may need more.
Use ampsci -z <Basis> to estimate memory requirements (the estimate can be very rough), e.g.:

./ampsci -z 35spdfgh
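
Once a job has finished, the estimate can be compared with what the job actually used, which helps in choosing `--mem` for the next run. The sketch below wraps `sacct` with a set of standard format fields; `job_memory` is a hypothetical helper name.

```shell
# Show requested memory (ReqMem) and peak memory (MaxRSS) for a finished job.
# MaxRSS is reported per job step, so look at the largest value listed.
job_memory() {
  if ! command -v sacct >/dev/null 2>&1; then
    echo "sacct not found: not on a SLURM system?" >&2
    return 1
  fi
  sacct -j "$1" --format=JobID,JobName,ReqMem,MaxRSS,Elapsed,State
}

# Typical use, with the ID that sbatch printed at submission:
#   job_memory <jobid>
```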

Output

Redirect output to a file for later inspection; using tee is recommended.
tee writes output both to the screen and to a text file; -a means append:

./ampsci input.in |tee -a output.out
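
One caveat when piping into tee: by default the exit status of a pipeline is that of its last command, so a failed run can look successful to the job script. The sketch below shows one way around this with bash's `pipefail` option; `run_logged` is a hypothetical helper, and it assumes the script is run by bash (as in the examples above).

```shell
# Run a command, append its stdout and stderr to a log via tee,
# and return the command's exit status rather than tee's.
# Requires bash (or another shell that supports `set -o pipefail`).
run_logged() {
  log="$1"
  shift
  (
    # pipefail is set inside a subshell so it does not affect the caller
    set -o pipefail
    "$@" 2>&1 | tee -a "$log"
  )
}

# Typical use in a job script:
#   run_logged output.out ./ampsci input.in || echo "ampsci failed" >&2
```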