How autojob structures directories 💭¶
This document summarizes the requirements for successfully using the CLI command autojob advance.
Legacy vs. Normal Mode¶
Legacy mode describes a set of differences in the data model, directory
structure, and metadata in autojob. These differences derive from the first
use-case: a VASP relaxation calculation of an adsorbate complex followed by a
vibrational calculation to determine the Gibbs free energy.
Legacy Mode¶
In legacy mode, directory trees must adhere to the first structure outlined in setup.
Task (Job) Metadata files are JSON files which can be parsed into dictionaries with the following keys:
"Name"
"Notes"
"Study Group ID"
"Study ID"
"Calculation ID"
"Job ID": The unique ID for the job step; corresponds to an atomic compute job. Must be a 10-digit alphanumeric string beginning with “j”
"Study Type"
"Calculation Type"
"Calculator Type"
Such files are produced when calling Task.to_directory(..., legacy_mode=True, ...).
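As a rough illustration of the key set above, the following sketch builds a legacy-style metadata dictionary and checks the job ID format. The helper is_legacy_job_id and all values are hypothetical and not part of autojob. The sketch also assumes that the leading “j” counts toward the 10 characters; autojob's own validation may differ.

```python
import json
import re

# Assumption: the leading "j" counts toward the 10 characters of the ID.
_JOB_ID_PATTERN = re.compile(r"j[A-Za-z0-9]{9}")

# Keys listed in this document for legacy-mode metadata files.
LEGACY_KEYS = (
    "Name",
    "Notes",
    "Study Group ID",
    "Study ID",
    "Calculation ID",
    "Job ID",
    "Study Type",
    "Calculation Type",
    "Calculator Type",
)


def is_legacy_job_id(value: str) -> bool:
    """Return True if ``value`` looks like a legacy job ID (hypothetical check)."""
    return _JOB_ID_PATTERN.fullmatch(value) is not None


# Placeholder values only; real files are written by autojob itself.
metadata = {key: "" for key in LEGACY_KEYS}
metadata["Job ID"] = "jA1b2C3d4E"

# Round-trip through JSON, as a metadata file would be stored on disk.
serialized = json.dumps(metadata, indent=4)
restored = json.loads(serialized)
```

A check like this can be useful before handing a hand-edited directory back to autojob, although editing metadata files directly is generally discouraged.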
The legacy directory structure is as follows:
Study Group
    study_group.json
    Study
        parameterizations.json
        record.txt
        study.json
        workflow.json
        Calculation
            calculation.json
            Job
                job.json
Here, a “Job” is the atomic unit and corresponds roughly to a job submission
in a job scheduler. A “Calculation” is then the ultimate goal of the initial
submission (e.g., structure relaxation, adsorption calculation, etc.). The
motivation for this structure stems from the early use of autojob as solely a
way of parametrizing VASP calculations. Because calculations would
frequently require restarting (whether due to time limit constraints, VASP
errors, or parametrization optimization), grouping each “Job”
under a “Calculation” made it easy to navigate to previous iterations of the
“Calculation”, restart jobs, and retrace one’s steps. The calculation.json
file served as a record of all jobs.
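The nesting described above can be sketched with pathlib. This is only an illustration: the directory names ("study_group", "study", "calculation", "job") are placeholders chosen here, the files are left empty, and the helpers build_legacy_tree and sibling_jobs are hypothetical rather than autojob functions.

```python
import tempfile
from pathlib import Path


def build_legacy_tree(root: Path) -> Path:
    """Create an empty legacy-style tree under ``root``; return the job directory."""
    study_group = root / "study_group"
    study = study_group / "study"
    calculation = study / "calculation"
    job = calculation / "job"
    job.mkdir(parents=True)

    # One metadata file per level, plus the study-level bookkeeping files.
    (study_group / "study_group.json").write_text("{}")
    for name in ("parameterizations.json", "study.json", "workflow.json"):
        (study / name).write_text("{}")
    (study / "record.txt").write_text("")
    (calculation / "calculation.json").write_text("{}")
    (job / "job.json").write_text("{}")
    return job


def sibling_jobs(job_dir: Path) -> list[str]:
    """Names of all job directories that share ``job_dir``'s calculation."""
    return sorted(p.parent.name for p in job_dir.parent.glob("*/job.json"))


with tempfile.TemporaryDirectory() as tmp:
    job_dir = build_legacy_tree(Path(tmp))
    # A restarted job would sit next to the original one under the same calculation.
    restart = job_dir.parent / "job_restart"
    restart.mkdir()
    (restart / "job.json").write_text("{}")
    jobs_in_calculation = sibling_jobs(job_dir)
```

Grouping jobs under one calculation directory is what makes the earlier iterations easy to find: they are simply the sibling directories of the current job.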
For other computational chemistry tasks (e.g., slab generation, adsorbate placement, etc.), however, the additional nested structure adds unnecessary complexity. Furthermore, jobs can be linked simply by retaining a unique identifier analogous to the calculation ID. These factors led to the conception of the second supported directory structure.
Normal Mode¶
Task (Job) Metadata files are JSON files which can be parsed into dictionaries with the following keys:
"Name"
"Notes"
"Study Group ID"
"Study ID"
"Task ID": The unique ID for the job step; corresponds to an atomic compute job. Must be a valid UUID
"@class"
"module"
Such files are produced when calling Task.to_directory(..., legacy_mode=False, ...).
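A minimal sketch of checking a normal-mode metadata dictionary is shown here, using only the standard library. The helpers is_valid_uuid and missing_keys are hypothetical names, and the metadata values are placeholders. Note that uuid.UUID is fairly lenient about formatting (for example, it accepts hex strings without hyphens), so autojob's own check may be stricter.

```python
import uuid

# Keys listed in this document for normal-mode metadata files.
NORMAL_KEYS = (
    "Name",
    "Notes",
    "Study Group ID",
    "Study ID",
    "Task ID",
    "@class",
    "module",
)


def is_valid_uuid(value: str) -> bool:
    """Return True if ``value`` can be parsed as a UUID by the standard library."""
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


def missing_keys(metadata: dict) -> list[str]:
    """List the expected normal-mode keys that are absent from ``metadata``."""
    return [key for key in NORMAL_KEYS if key not in metadata]


# Placeholder metadata; real files are written by autojob itself.
metadata = {key: "" for key in NORMAL_KEYS}
metadata["Task ID"] = str(uuid.uuid4())
```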
Note that the following fields are missing in comparison to legacy mode:
"Study Type"
"Calculation Type"
"Calculator Type"
"Calculation ID"
The modern directory structure is as follows:
Study Group
    study_group.json
    Study
        parameterizations.json
        record.txt
        study.json
        workflow.json
        Task
            task.json
In the future, the top-level directory may be forgone, in which case the final supported directory structure would take the form:
Study
    parameterizations.json
    record.txt
    study.json
    workflow.json
    Task
        task.json
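Because a task directory is identified by its task.json file in both the current and the possible future layout, tooling that only needs to locate tasks can be written without depending on the top-level directory. The sketch below illustrates this. The function find_task_dirs and the directory names are hypothetical and not part of autojob.

```python
import tempfile
from pathlib import Path


def find_task_dirs(root: Path) -> list[Path]:
    """Return every directory at or below ``root`` that contains a task.json file."""
    return sorted(p.parent for p in root.rglob("task.json"))


with tempfile.TemporaryDirectory() as tmp:
    study = Path(tmp) / "study_group" / "study"
    for name in ("task_a", "task_b"):  # placeholder directory names
        (study / name).mkdir(parents=True)
        (study / name / "task.json").write_text("{}")

    # Searching from the study group or from the study finds the same tasks.
    from_group = [p.name for p in find_task_dirs(Path(tmp) / "study_group")]
    from_study = [p.name for p in find_task_dirs(study)]
```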
Data Files¶
Study Files¶
parameterizations.json: a dictionary mapping a workflow step ID to a Step
record.txt: a text file in which each line lists a task ID of a completed task
study.json: contains metadata about the study (e.g., Study Group ID, Study ID, Study Type, Date Created) and a list of calculation IDs (or task IDs) of calculations (or tasks) belonging to the Study
workflow.json: a dictionary mapping a workflow step ID to a list of workflow step IDs; a directed acyclic graph representing the study’s workflow
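To make the roles of record.txt and workflow.json more concrete, the sketch below reads a record and derives a valid execution order from a workflow dictionary using the standard-library graphlib module (Python 3.9+). It assumes that each key in workflow.json maps to the steps that follow it; the document does not state the edge direction, so this may need to be inverted. The step IDs and helper names are hypothetical.

```python
from graphlib import TopologicalSorter


def completed_task_ids(record_text: str) -> list[str]:
    """Parse the contents of record.txt: one task ID per non-empty line."""
    return [line.strip() for line in record_text.splitlines() if line.strip()]


def execution_order(workflow: dict[str, list[str]]) -> list[str]:
    """Return the step IDs in an order that respects the workflow graph.

    Assumption: ``workflow[step]`` lists the steps that come AFTER ``step``.
    Raises graphlib.CycleError if the mapping is not acyclic.
    """
    sorter = TopologicalSorter()
    for step, successors in workflow.items():
        sorter.add(step)
        for successor in successors:
            # TopologicalSorter.add(node, *predecessors): successor depends on step.
            sorter.add(successor, step)
    return list(sorter.static_order())


# Hypothetical two-step study: a relaxation followed by a vibrational calculation.
workflow = {"relax-step": ["vib-step"], "vib-step": []}
order = execution_order(workflow)
done = completed_task_ids("task-1\ntask-2\n\n")
```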
Task Files¶
run.py: a Python script containing the logic for executing the task; can be set with the --python-script CLI argument for autojob or the AUTOJOB_PYTHON_SCRIPT environment variable
Requirements:
The structure of the script must match that of the template file
Minimal, notable features of the template file:
ASE calculator imported
structure file read using ase.io.read
the ASE calculator configuration format
vasp.sh: a Bash script containing the logic for running the computing job; can be set with the --slurm-script CLI argument for autojob or the AUTOJOB_SLURM_SCRIPT environment variable
Requirements:
The structure of the script must match that of the template file
Minimal, notable features of the template file:
SLURM configuration directives
files to delete
files to copy
job.json: a JSON-serialized dictionary of task metadata; can be set with the --job-file CLI argument for autojob or the AUTOJOB_JOB_FILE environment variable. Generally, this file should not be edited directly but should be modified indirectly when a new job is created by one of the utility functions (e.g., advance(), restart_relaxation(), create_vibration())
task.json: a JSON-serialized dictionary summarizing a completed task
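Each of the file names above can be supplied either as a CLI argument or as an environment variable. The sketch below shows one way such a lookup could be resolved. It rests on two assumptions that this document does not confirm: that an explicit CLI value takes precedence over the environment variable, and that the file names used in this section (run.py, vasp.sh, job.json) are the fallbacks. The function resolve_filename is hypothetical.

```python
import os
from typing import Mapping, Optional

# Assumption: the file names used in this document act as the defaults.
DEFAULTS = {
    "AUTOJOB_PYTHON_SCRIPT": "run.py",
    "AUTOJOB_SLURM_SCRIPT": "vasp.sh",
    "AUTOJOB_JOB_FILE": "job.json",
}


def resolve_filename(
    env_var: str,
    cli_value: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> str:
    """Pick a file name from the CLI value, the environment, or the default.

    Assumption: CLI argument > environment variable > default.
    """
    if cli_value:
        return cli_value
    return environ.get(env_var, DEFAULTS[env_var])


# Example with an explicit (fake) environment instead of the real one.
fake_env = {"AUTOJOB_SLURM_SCRIPT": "gaussian.sh"}
slurm_script = resolve_filename("AUTOJOB_SLURM_SCRIPT", environ=fake_env)
python_script = resolve_filename("AUTOJOB_PYTHON_SCRIPT", environ=fake_env)
```

Passing the environment as a parameter keeps the lookup easy to test without modifying the real process environment.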
Legacy Files¶
calculation.json: a JSON-serialized dictionary of calculation metadata; can be set with the --calculation-file CLI argument for autojob or the AUTOJOB_CALCULATION_FILE environment variable. Generally, this file should not be edited directly but should be modified indirectly when a new job is created by one of the utility functions (e.g., advance(), restart_relaxation(), create_vibration())