How autojob structures directories 💭

This document summarizes the requirements for successfully using the CLI command autojob advance.

Legacy vs. Normal Mode

Legacy mode differs from normal mode in its data model, directory structure, and metadata. These differences derive from autojob's first use case: a VASP relaxation calculation of an adsorbate complex followed by a vibrational calculation to determine the Gibbs free energy.

Legacy Mode

In legacy mode, directory trees must adhere to the first structure outlined in setup.

Task (Job) Metadata files are JSON files which can be parsed into dictionaries with the following keys:

  • "Name"

  • "Notes"

  • "Study Group ID"

  • "Study ID"

  • "Calculation ID"

  • "Job ID": The unique ID for the job step, which corresponds to an atomic compute job. Must be a 10-character alphanumeric string beginning with “j”.

  • "Study Type"

  • "Calculation Type"

  • "Calculator Type"

Such files are produced when calling Task.to_directory(..., legacy_mode=True, ...).
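
The "Job ID" constraint above can be checked with a short regular expression. This is a minimal sketch rather than autojob's own validation: the helper name is_legacy_job_id is hypothetical, and it assumes that the leading “j” counts toward the length of 10 and that the other nine positions are ASCII letters or digits.

```python
import re

# Assumption: "j" followed by nine ASCII letters or digits (10 in total).
_LEGACY_JOB_ID = re.compile(r"j[A-Za-z0-9]{9}")


def is_legacy_job_id(value: str) -> bool:
    """Return True if value looks like a legacy-mode job ID (hypothetical helper)."""
    return _LEGACY_JOB_ID.fullmatch(value) is not None
```

If autojob counts the length differently (for example, “j” plus ten further characters), only the repetition count in the pattern needs to change.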

The legacy directory structure is as follows:

  • Study Group

    • study_group.json

    • Study

      • parameterizations.json

      • record.txt

      • study.json

      • workflow.json

      • Calculation

        • calculation.json

        • Job

          • job.json

Here, a “Job” is the atomic unit and corresponds roughly to a job submission in a job scheduler. A “Calculation” is the ultimate goal of the initial submission (e.g., structure relaxation, adsorption calculation, etc.). The motivation for this structure stems from the early use of autojob solely as a way of parametrizing VASP calculations. Because calculations frequently required restarting (whether due to time limit constraints, VASP errors, or parametrization optimization), grouping each “Job” under a “Calculation” made it easy to navigate to previous iterations of the “Calculation”, restart jobs, and retrace one’s steps. The calculation.json file served as a record of all jobs.

For other computational chemistry tasks (e.g., slab generation, adsorbate placement, etc.), however, the additional level of nesting adds unnecessary complexity. Furthermore, jobs can be linked simply by retaining a unique identifier, analogous to the calculation ID, that ties them together. These factors led to the conception of the second supported directory structure.

Normal Mode

Task (Job) Metadata files are JSON files which can be parsed into dictionaries with the following keys:

  • "Name"

  • "Notes"

  • "Study Group ID"

  • "Study ID"

  • "Task ID": The unique ID for the job step, which corresponds to an atomic compute job. Must be a valid UUID.

  • "@class"

  • "module"

Such files are produced when calling Task.to_directory(..., legacy_mode=False, ...).
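
The UUID requirement on "Task ID" can be checked with the standard library. This is a minimal sketch, not autojob's own validation; the helper name is_valid_task_id is hypothetical.

```python
import uuid


def is_valid_task_id(value: str) -> bool:
    """Return True if value parses as a UUID (hypothetical helper)."""
    try:
        uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        # Raised for malformed strings or non-string input.
        return False
    return True
```

Note that uuid.UUID is lenient about formatting (it also accepts strings without hyphens or wrapped in braces), so a stricter check may be needed if autojob expects the canonical hyphenated form.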

Note that the following fields are missing in comparison to legacy mode:

  • "Study Type"

  • "Calculation Type"

  • "Calculator Type"

  • "Calculation ID"
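
Because these four keys appear only in legacy metadata, their presence can serve as a rough indicator of which mode produced a metadata file. The sketch below is an illustration under that assumption; looks_like_legacy is a hypothetical helper and not part of autojob.

```python
# Keys that, per the lists above, appear only in legacy-mode metadata.
LEGACY_ONLY_KEYS = frozenset(
    {"Study Type", "Calculation Type", "Calculator Type", "Calculation ID"}
)


def looks_like_legacy(metadata: dict) -> bool:
    """Guess whether a parsed metadata dictionary came from legacy mode."""
    return any(key in metadata for key in LEGACY_ONLY_KEYS)
```

The dictionary passed in would typically be the result of json.load on a job.json or task.json file.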

The modern directory structure is as follows:

  • Study Group

    • study_group.json

    • Study

      • parameterizations.json

      • record.txt

      • study.json

      • workflow.json

      • Task

        • task.json
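
Given this layout, task directories can be located by searching for task.json two levels below the study group directory. This is a minimal sketch that assumes nothing about directory names beyond the nesting shown above; find_task_dirs is a hypothetical helper.

```python
from pathlib import Path


def find_task_dirs(study_group: Path) -> list[Path]:
    """Return directories matching <study group>/<study>/<task>/task.json."""
    return sorted(path.parent for path in study_group.glob("*/*/task.json"))
```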

In the future, the top-level directory may be dropped, so that the final supported directory structure takes the form:

  • Study

    • parameterizations.json

    • record.txt

    • study.json

    • workflow.json

    • Task

      • task.json

Data Files

Study Files

parameterizations.json

a dictionary mapping a workflow step ID to a Step

record.txt

a text file in which each line lists a task ID of a completed task
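
Reading such a file amounts to collecting its non-empty lines. The sketch below assumes one ID per line with no additional columns; read_completed_task_ids is a hypothetical helper, and skipping blank lines is an assumption about the format.

```python
from pathlib import Path


def read_completed_task_ids(record_file: Path) -> set[str]:
    """Collect the task IDs listed in a record.txt-style file."""
    with record_file.open(encoding="utf-8") as handle:
        # Strip surrounding whitespace and ignore blank lines.
        return {line.strip() for line in handle if line.strip()}
```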

study.json

contains metadata about the study (e.g., Study Group ID, Study ID, Study Type, Date Created) and a list of calculation IDs (or task IDs) of calculations (or tasks) belonging to the Study

workflow.json

a dictionary mapping a workflow step ID to a list of workflow step IDs; a directed acyclic graph representing the study’s workflow
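
A mapping of this shape can be turned into an execution order with a topological sort. The sketch below uses the standard-library graphlib module and assumes that each step ID maps to the steps that follow it (its successors); if the file instead lists prerequisites, the edge direction should be reversed. The step names and the execution_order helper are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step ID maps to the steps that follow it.
workflow = {
    "relax": ["vibration"],
    "vibration": [],
}


def execution_order(workflow: dict[str, list[str]]) -> list[str]:
    """Return step IDs ordered so that every step precedes its successors."""
    sorter = TopologicalSorter()
    for step, successors in workflow.items():
        sorter.add(step)  # register steps that have no edges
        for successor in successors:
            # TopologicalSorter.add takes a node followed by its predecessors.
            sorter.add(successor, step)
    # static_order raises graphlib.CycleError if the graph is not acyclic.
    return list(sorter.static_order())
```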

Task Files

run.py

a Python script containing the logic for executing the task; can be set with the --python-script CLI argument for autojob or the AUTOJOB_PYTHON_SCRIPT environment variable

  • Requirements:

    • The structure of the script must match that of the template file

    • Minimal, notable features of the template file:

      • ASE calculator imported

      • structure file read using ase.io.read

      • the ASE calculator configuration format

vasp.sh

a Bash script containing the logic for running the computing job; can be set with the --slurm-script CLI argument for autojob or the AUTOJOB_SLURM_SCRIPT environment variable

  • Requirements:

    • The structure of the script must match that of the template file

    • Minimal, notable features of the template file:

      • SLURM configuration directives

      • files to delete

      • files to copy

job.json

a JSON-serialized dictionary of task metadata; can be set with the --job-file CLI argument for autojob or the AUTOJOB_JOB_FILE environment variable. Generally, this file should not be edited directly; it is modified indirectly when a new job is created by one of the utility functions (e.g., advance(), restart_relaxation(), create_vibration()).

task.json

a JSON-serialized dictionary summarizing a completed task

Legacy Files

calculation.json

a JSON-serialized dictionary of calculation metadata; can be set with the --calculation-file CLI argument for autojob or the AUTOJOB_CALCULATION_FILE environment variable. Generally, this file should not be edited directly; it is modified indirectly when a new job is created by one of the utility functions (e.g., advance(), restart_relaxation(), create_vibration()).