Compiling VASP in Modal with GPU acceleration

Continuing to share some templates for getting DFT software running in Modal and taking advantage of their GPU hardware. This time we'll be looking at setting up a Modal Image to run VASP. I'll be working with VASP v6.3.0, but I expect this approach to work for any version v6.x.x. In the last post, we looked at setting up ABACUS. Check that post if you want to run DFT in a serverless environment but don't have a VASP license:

Compiling ABACUS for GPU acceleration in Modal

A simple guide for compiling ABACUS to run with GPU acceleration in Modal. The post explains how to build ABACUS with CUDA support and run DFT calculations in a serverless environment. It covers why Modal’s on‑demand GPUs (like A100) can help, and which ABACUS setup (plane waves with basis_type pw and ks_solver bpcg) tends to work best on GPUs in version 3.9.0.

The process is fairly similar to last time: pick a custom Docker base image, install the required packages, do some compiler linking, then compile the software from source.

I mostly followed VASP's OpenACC build setup for the NVIDIA HPC SDK (their nvhpc_acc makefile.include template). I had never heard of OpenACC before, but this is what it is:

OpenACC is an open standard for parallel programming that uses compiler directives in C, C++, and Fortran to offload code to accelerators like GPUs. It allows developers to add directives to existing code to manage parallel execution and data movement, making it easier to accelerate applications for heterogeneous systems with both CPUs and GPUs.

Kinda cool: you build once and automatically get both CPU and GPU acceleration. Much like hybrid web/mobile apps built with a framework like React Native, though, I wonder how much performance you leave on the table with this approach. Anyway, I'll leave my personal opinions for another post and get back to the instructions.


We're going to make the environment setup easy on ourselves by using an NVIDIA HPC SDK container, offered directly by NVIDIA. This is the one I chose:

https://catalog.ngc.nvidia.com/orgs/nvidia/containers/nvhpc?version=24.11-devel-cuda12.6-ubuntu22.04

It's got all sorts of CUDA tooling preinstalled and still uses a familiar distro, Ubuntu 22.04. From the container page, key features of the NVIDIA HPC SDK for Linux include:

  • Support for NVIDIA Blackwell architecture GPUs

  • Support for NVIDIA Ampere Architecture GPUs with FP16, TF32 and FP64 tensor cores and MIG

  • Support for x86-64 and Arm Server multicore CPUs

  • NVC++ ISO C++17 compiler with Parallel Algorithms acceleration on GPUs, OpenACC and OpenMP

  • NVFORTRAN ISO Fortran 2003 compiler with array intrinsics acceleration on GPUs, CUDA Fortran, OpenACC and OpenMP

  • NVC ISO C11 compiler with OpenACC and OpenMP

  • NVCC NVIDIA CUDA C++ compiler

  • cuBLAS GPU-accelerated basic linear algebra subroutine (BLAS) library

And a lot more. Check out the container page for more info. The key points for us are that it has everything we need for OpenACC and that it supports both x86-64 and Arm server CPUs, so it covers Modal's hardware either way (the paths later in this post assume x86-64).

Before Modal can pull the image we need, we have to set up an NVIDIA NGC account. This is just a requirement NVIDIA imposes for pulling from their registry (nvcr.io). It doesn't cost anything, but it is kind of annoying.

See their guide on how to get set up. I followed it, and you should too, but I'll paraphrase the steps here. Specifically, we want the instructions for pulling a container with the Docker CLI, since Modal follows the same flow.

  • Make an account with NGC (NVIDIA GPU Cloud).

  • Create an API key. It should look like nvapi-***.

  • Create a Secret in Modal for pulling images from the NGC Catalog. The keys must be named REGISTRY_USERNAME and REGISTRY_PASSWORD: REGISTRY_USERNAME must equal the literal value $oauthtoken, and REGISTRY_PASSWORD is the API key (nvapi-***) you generated from your NGC account. Pay close attention to these names and values; they are not arbitrary. (See the sketch just after this list for how the Secret gets used when pulling the image.)

  • We are ready to move on to the Modal Image setup now.
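
If Modal hits an authentication error when pulling from nvcr.io, the Secret can be handed to from_registry directly. A minimal sketch, assuming a Secret name of "ngc-registry" (use whatever you called yours); if the nvhpc image happens to be publicly pullable you may not need it at all:

python
import modal

# Hypothetical Secret name; use whatever you named yours in Modal.
# It must contain REGISTRY_USERNAME=$oauthtoken and REGISTRY_PASSWORD=nvapi-***.
ngc_secret = modal.Secret.from_name("ngc-registry")

base_image = modal.Image.from_registry(
    "nvcr.io/nvidia/nvhpc:24.11-devel-cuda12.6-ubuntu22.04",
    secret=ngc_secret,  # lets Modal authenticate against nvcr.io when pulling
)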

We build the image like so:

python
vasp_image = (
    modal.Image.from_registry("nvcr.io/nvidia/nvhpc:24.11-devel-cuda12.6-ubuntu22.04")
    # Only essentials; FFTW from OS. Use NVHPC for MPI/BLAS/LAPACK/ScaLAPACK.
    .apt_install(
        "git",
        "make",
        "cmake",
        "build-essential",
        "binutils",
        "rsync",
        "software-properties-common",
        "libfftw3-dev",
        "python3",
        "python3-dev",
        "python3-pip",
    )
    .run_commands("python3 -m pip install --upgrade pip")
    # Add VASP sources and your files
    .add_local_dir("vasp.6.3.0", "/opt/vasp.6.3.0", copy=True)
    .add_local_file(
        "makefile.include.gpu", "/opt/vasp.6.3.0/makefile.include", copy=True
    )
    .add_local_file("build_vasp.sh", "/opt/vasp.6.3.0/build_vasp.sh", copy=True)
    # Build using the script (ensures SDK toolchain + MPI are used)
    # This builds all VASP versions: vasp_std, vasp_gam, vasp_ncl
    # For MAE calculations, vasp_ncl (noncollinear) is required
    .run_commands(
        "bash -lc 'chmod +x /opt/vasp.6.3.0/build_vasp.sh && /opt/vasp.6.3.0/build_vasp.sh'"
    )
    # Create python symlink after VASP build (preserves cache) but before pip install (where needed)
    .run_commands("ln -sf /usr/bin/python3 /usr/bin/python")
    # Python deps for your service layer
    .pip_install(
        "pymatgen>=2023.8.0",
        "numpy>=1.24.0",
        "ase>=3.22.0",
        "fastapi>=0.104.0",
        "python-multipart>=0.0.6",
    )
    # Runtime env: ensure HPC-X is first on PATH; avoid Ubuntu OpenMPI entirely
    .env(
        {
            "PATH": (
                "/opt/nvidia/hpc_sdk/Linux_x86_64/24.11/comm_libs/mpi/bin:"
                "/opt/nvidia/hpc_sdk/Linux_x86_64/24.11/compilers/bin:"
                "/opt/vasp/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin"
            ),
            "LD_LIBRARY_PATH": (
                "/opt/nvidia/hpc_sdk/Linux_x86_64/24.11/comm_libs/mpi/lib:"
                "/opt/nvidia/hpc_sdk/Linux_x86_64/24.11/cuda/12.6/lib64:"
                "/opt/nvidia/hpc_sdk/Linux_x86_64/24.11/compilers/lib:"
                "/opt/nvidia/hpc_sdk/Linux_x86_64/24.11/comm_libs/nccl/lib:"
                "/opt/nvidia/hpc_sdk/Linux_x86_64/24.11/compilers/extras/qd/lib"
            ),
            "VASP_CMD": "/opt/vasp/bin/vasp_ncl",  # Use noncollinear version for MAE
            "VASP_PP_PATH": "/data/potcars",
            "OMPI_ALLOW_RUN_AS_ROOT": "1",
            "OMPI_ALLOW_RUN_AS_ROOT_CONFIRM": "1",
            "OMPI_CC": "nvc",
            "OMPI_CXX": "nvc++",
            "OMPI_FC": "nvfortran",
        }
    )
    # Your Python helper modules
    # .add_local_file("vasp_mae_calculator.py", "/root/vasp_mae_calculator.py")
    # .add_local_file("vasp_parser.py", "/root/vasp_parser.py")
)

Locally, in the same folder as this Python file, you'll need your VASP source (mine is in a folder called vasp.6.3.0), the makefile.include (mine is called makefile.include.gpu), and a build script to pull everything together (called build_vasp.sh here).

You'll have to download the VASP source from their website yourself (it requires a license), but the rest I can share here:

makefile.include.gpu

plaintext
# VASP makefile.include for NVIDIA HPC SDK (OpenACC GPU build)
# Target: A100 (cc80), CUDA 12.6

# ---- Precompiler options (aligned with VASP nvhpc_acc template) ----
CPP_OPTIONS = -DHOST=\"LinuxNV\" \
              -DMPI -DMPI_INPLACE -DMPI_BLOCK=8000 -Duse_collective \
              -DscaLAPACK \
              -DCACHE_SIZE=4000 \
              -Davoidalloc \
              -Dvasp6 \
              -Dtbdyn \
              -Dqd_emulate \
              -Dfock_dblbuf \
              -DACC_OFFLOAD \
              -DNVCUDA \
              -DUSENCCL \
              -D_OPENACC

CPP = nvfortran -Mpreprocess -Mfree -Mextend -E $(CPP_OPTIONS) $*$(FUFFIX) > $*$(SUFFIX)

# ---- GPU/Compiler flags ----
GPU ?= -gpu=cc80,cuda12.6
CC   = mpicc  -acc $(GPU)
FC   = mpif90 -acc $(GPU)
FCL  = mpif90 -acc $(GPU) -c++libs

FREE   = -Mfree
FFLAGS = -Mbackslash -Mlarge_arrays
OFLAG  = -fast
DEBUG  = -Mfree -O0 -traceback

# CUDA/NCCL libs via NVHPC convenience switches
LLIBS  = -cudalib=cublas,cusolver,cufft,nccl -cuda

# FFT objects (mandatory for OpenACC GPU builds)
OBJECTS = fftmpiw.o fftmpi_map.o fftw3d.o fft3dlib.o

# Redefine the standard list of O1 and O2 objects
SOURCE_O1  := pade_fit.o
SOURCE_O2  := pead.o

# ---- NVHPC root + QD ----
NVROOT ?= $(shell which nvfortran | awk -F /compilers/bin/nvfortran '{ print $$1 }')
QD     ?= $(NVROOT)/compilers/extras/qd
LLIBS  += -L$(QD)/lib -lqdmod -lqd
INCS   += -I$(QD)/include/qd

# ---- BLAS/LAPACK/ScaLAPACK from NVHPC ----
BLAS      = -lblas
LAPACK    = -llapack
SCALAPACK = -Mscalapack
LLIBS    += $(SCALAPACK) $(LAPACK) $(BLAS)

# ---- FFTW (only library that must come from the OS) ----
FFTW_ROOT ?= /usr
LLIBS     += -L$(FFTW_ROOT)/lib -lfftw3
INCS      += -I$(FFTW_ROOT)/include

# ---- For what used to be vasp.5.lib ----
CPP_LIB    = $(CPP)
FC_LIB     = nvfortran
CC_LIB     = nvc -w
CFLAGS_LIB = -O
FFLAGS_LIB = -O1 -Mfixed
FREE_LIB   = $(FREE)

OBJECTS_LIB = linpack_double.o

# ---- Parser library ----
CXX_PARS = nvc++ --no_warnings

# ---- Optional (SDK >= 24.1): multi-process GPU dense LA ----
#CPP_OPTIONS += -DCUSOLVERMP -DCUBLASMP
#LLIBS      += -cudalib=cusolvermp,cublasmp -lnvhpcwrapcal

Given all the complexity going on here, I'm well aware there could be further optimizations to these flags and settings that would make a significant performance difference. If you find any, please share them here!
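
One knob worth calling out: the GPU ?= -gpu=cc80,cuda12.6 line above targets the A100. If you request a different GPU type from Modal, the compute capability should match the card. Here's a small reference sketch; the cc values are standard NVIDIA compute capabilities, but double-check them against NVIDIA's documentation for your exact card:

python
# Sketch: map Modal GPU types to the compute-capability flag used in
# makefile.include.gpu. Verify the cc values before relying on them.
GPU_CC = {
    "T4": "cc75",
    "A10G": "cc86",
    "L4": "cc89",
    "L40S": "cc89",
    "A100": "cc80",
    "H100": "cc90",
}

def gpu_flag(gpu_type: str, cuda_version: str = "12.6") -> str:
    """Build the value for the GPU variable in makefile.include.gpu."""
    return f"-gpu={GPU_CC[gpu_type]},cuda{cuda_version}"

print(gpu_flag("A100"))  # -> -gpu=cc80,cuda12.6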

build_vasp.sh

plaintext
#!/usr/bin/env bash
set -euo pipefail

# Build all VASP versions for GPU (NVHPC + OpenACC)
#
# This script builds three VASP executables:
#   - vasp_std: Standard version with NGZhalf optimization (collinear magnetism only)
#   - vasp_gam: Gamma-point only version (faster for single k-point calculations)
#   - vasp_ncl: Noncollinear version (required for SOC and MAE calculations)
#
# For MAE calculations, you MUST use vasp_ncl as it supports noncollinear
# magnetism and spin-orbit coupling without the NGXhalf/NGZhalf flags.

# Discover NVHPC root from nvfortran location
NVHPC_BIN="$(command -v nvfortran)"
NVHPC_ROOT="${NVHPC_BIN%/compilers/bin/nvfortran}"

# Prefer the SDK’s CUDA-aware OpenMPI (HPC-X) and compilers
export PATH="$NVHPC_ROOT/comm_libs/mpi/bin:$NVHPC_ROOT/compilers/bin:$PATH"

# Try to locate CUDA libdir under the SDK tree (e.g., .../cuda/12.6/lib64)
CUDA_LIBDIR="$(ls -d "$NVHPC_ROOT"/cuda/*/lib64 2>/dev/null | head -n1 || true)"

export LD_LIBRARY_PATH="$NVHPC_ROOT/comm_libs/mpi/lib${LD_LIBRARY_PATH+:$LD_LIBRARY_PATH}"
export LD_LIBRARY_PATH="$NVHPC_ROOT/compilers/lib:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="$NVHPC_ROOT/comm_libs/nccl/lib:$LD_LIBRARY_PATH"
export LD_LIBRARY_PATH="$NVHPC_ROOT/compilers/extras/qd/lib:$LD_LIBRARY_PATH"
if [[ -n "${CUDA_LIBDIR:-}" ]]; then
  export LD_LIBRARY_PATH="$CUDA_LIBDIR:$LD_LIBRARY_PATH"
fi

# Ensure MPI wrappers use NVHPC compilers
export OMPI_CC=nvc
export OMPI_CXX=nvc++
export OMPI_FC=nvfortran

# Allow mpirun inside container
export OMPI_ALLOW_RUN_AS_ROOT=1
export OMPI_ALLOW_RUN_AS_ROOT_CONFIRM=1

cd /opt/vasp.6.3.0

echo "=== Using MPI wrapper ==="
which mpif90 || true
mpif90 --version || true

echo "=== Clean & build all VASP versions ==="
rm -rf build bin/vasp_*

# Build all three main versions
echo "=== Building vasp_std (standard collinear version) ==="
make std -j"$(nproc)" 2>&1 | tee build_std.log

echo "=== Building vasp_gam (gamma-point only version) ==="
make gam -j"$(nproc)" 2>&1 | tee build_gam.log

echo "=== Building vasp_ncl (noncollinear version - required for MAE) ==="
make ncl -j"$(nproc)" 2>&1 | tee build_ncl.log

# Copy all to /opt/vasp/bin
mkdir -p /opt/vasp/bin
for vasp_exe in vasp_std vasp_gam vasp_ncl; do
    if [ -f "bin/$vasp_exe" ]; then
        cp "bin/$vasp_exe" "/opt/vasp/bin/"
        strip "/opt/vasp/bin/$vasp_exe" || true
        chmod +x "/opt/vasp/bin/$vasp_exe"
        echo "=== $vasp_exe details ==="
        ls -lh "/opt/vasp/bin/$vasp_exe"
        file "/opt/vasp/bin/$vasp_exe"
    else
        echo "Warning: bin/$vasp_exe not found"
    fi
done

echo ""
echo "=== All VASP binaries ==="
ls -lh /opt/vasp/bin/

echo ""
echo "=== ldd for vasp_ncl (MAE version) ==="
ldd /opt/vasp/bin/vasp_ncl | head -40

Note that this script builds three different binaries: vasp_std, vasp_gam, and vasp_ncl. For many use cases you'll only need vasp_std; if so, I'd recommend removing the make commands for the others, since each one adds substantially to the compile time. Expect to wait roughly 30 minutes to an hour for compilation regardless.

And that's all there is to it. We have our main Modal Python script pulling in the VASP source, our makefile.include, and finally a build script to bring it all together.
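
To sanity-check the build, here's a minimal sketch of a Modal function that runs one of the compiled binaries on a GPU. It assumes the vasp_image defined earlier is in scope and that the INCAR/POSCAR/KPOINTS/POTCAR inputs have already been written to the working directory (for example via a modal.Volume, or by generating them with pymatgen inside the function); adjust to taste:

python
import subprocess

import modal

app = modal.App("vasp-gpu")

@app.function(image=vasp_image, gpu="A100", timeout=2 * 60 * 60)
def run_vasp(workdir: str = "/tmp/calc") -> str:
    # Assumes the VASP input files already exist in workdir. One MPI rank
    # per GPU is the usual arrangement for the OpenACC build.
    result = subprocess.run(
        ["mpirun", "-np", "1", "/opt/vasp/bin/vasp_std"],
        cwd=workdir,
        capture_output=True,
        text=True,
    )
    # Return the tail of stdout so `modal run` gives a quick signal of success.
    return result.stdout[-2000:]

# Invoke with: modal run this_file.py::run_vasp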

Good luck with it! This enables some really cool use cases: first-principles simulation apps that you can share with others and let them run on their own structures and data. If you make something cool, share it here as an API! Modal already makes that very easy with their FastAPI integration, sketched below.
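
A minimal sketch of that web-facing piece using Modal's ASGI support; the /run route and the reuse of the run_vasp function from the sketch above are illustrative assumptions, not a prescribed API:

python
import modal

@app.function(image=vasp_image)
@modal.asgi_app()
def api():
    # FastAPI is imported inside the function so the import happens in the
    # container, where it was pip-installed into vasp_image.
    from fastapi import FastAPI

    web_app = FastAPI()

    @web_app.post("/run")
    def run(workdir: str = "/tmp/calc"):
        # Hypothetical endpoint: kick off a GPU VASP run asynchronously and
        # hand back the call ID so the client can poll for results.
        call = run_vasp.spawn(workdir)
        return {"call_id": call.object_id}

    return web_app

Deploy it with modal deploy and Modal gives you a URL to hit. Happy building.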
