How `autojob` structures directories 💭¶

This document summarizes the requirements for successfully using the CLI command autojob advance.

Legacy vs. Normal Mode¶

Legacy mode describes a set of differences in the data model, directory structure, and metadata in autojob. These differences derive from the first use-case: a VASP relaxation calculation of an adsorbate complex followed by a vibrational calculation to determine the Gibbs free energy.

Legacy Mode¶

In legacy mode, directory trees must adhere to the first structure outlined in setup.

Task (Job) Metadata files are JSON files which can be parsed into dictionaries with the following keys:

"Name"
"Notes"
"Study Group ID"
"Study ID"
"Calculation ID"
"Job ID": The unique ID for the job step, which corresponds to an atomic compute job. Must be a 10-character alphanumeric string beginning with “j”.
"Study Type"
"Calculation Type"
"Calculator Type"

Such files are produced when calling Task.to_directory(..., legacy_mode=True, ...).
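The key list above can be checked mechanically. The following is a minimal sketch, not part of autojob: `check_legacy_metadata` and `JOB_ID_PATTERN` are hypothetical names, and the pattern assumes that the Job ID rule means a leading "j" followed by nine ASCII letters or digits.

```python
import json
import re

# Keys listed above for legacy-mode job metadata.
LEGACY_KEYS = {
    "Name", "Notes", "Study Group ID", "Study ID", "Calculation ID",
    "Job ID", "Study Type", "Calculation Type", "Calculator Type",
}

# Assumption: a leading "j" plus nine ASCII letters or digits (10 characters).
JOB_ID_PATTERN = re.compile(r"j[A-Za-z0-9]{9}")


def check_legacy_metadata(text: str) -> list[str]:
    """Return a list of problems found in a legacy job metadata document."""
    metadata = json.loads(text)
    missing = sorted(LEGACY_KEYS - set(metadata))
    problems = [f"missing key: {key}" for key in missing]
    job_id = metadata.get("Job ID")
    # Only judge the format when a string Job ID is actually present.
    if isinstance(job_id, str) and not JOB_ID_PATTERN.fullmatch(job_id):
        problems.append(f"malformed Job ID: {job_id!r}")
    return problems
```

An empty list means the document has every listed key and a well-formed Job ID; the check says nothing about whether the values themselves are meaningful.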

The legacy directory structure is as follows:

Study Group
- study_group.json
- Study
  - parameterizations.json
  - record.txt
  - study.json
  - workflow.json
  - Calculation
    - calculation.json
    - Job
      - job.json

Here, a “Job” is the atomic unit and corresponds roughly to a job submission in a job scheduler. A “Calculation” is then the ultimate goal of the initial submission (e.g., structure relaxation or adsorption calculation). The motivation for this structure stems from the early use of autojob solely as a way of parametrizing VASP calculations. Because calculations would frequently require restarting (whether due to time limit constraints, VASP errors, or parametrization optimization), grouping each “Job” under a “Calculation” made it easy to navigate to previous iterations of the “Calculation”, restart jobs, and retrace one’s steps. The calculation.json file served as a record of all jobs.
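To illustrate how the nesting groups jobs under a calculation, here is a minimal sketch that walks a legacy tree with `pathlib`. `jobs_by_calculation` is a hypothetical helper, not an autojob function, and it assumes the directory depth shown in the listing above (Study Group / Study / Calculation / Job).

```python
from pathlib import Path


def jobs_by_calculation(study_group: Path) -> dict[str, list[str]]:
    """Map each calculation directory name to the names of its job directories."""
    grouped: dict[str, list[str]] = {}
    # Expected layout: Study Group / Study / Calculation / Job / job.json
    for job_file in sorted(study_group.glob("*/*/*/job.json")):
        job_dir = job_file.parent
        calculation_dir = job_dir.parent
        # Only count job directories whose parent holds a calculation.json.
        if (calculation_dir / "calculation.json").is_file():
            grouped.setdefault(calculation_dir.name, []).append(job_dir.name)
    return grouped
```

Each value in the returned dictionary lists the iterations of one calculation, which is the navigation convenience the legacy structure was designed for.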

For other computational chemistry tasks (e.g., slab generation or adsorbate placement), however, the additional nested structure adds unnecessary complexity. Furthermore, jobs can be linked simply by retaining a unique identifier analogous to the calculation ID. These factors led to the conception of the second supported directory structure.

Normal Mode¶

Task (Job) Metadata files are JSON files which can be parsed into dictionaries with the following keys:

"Name"
"Notes"
"Study Group ID"
"Study ID"
"Task ID": The unique ID for the job step, which corresponds to an atomic compute job. Must be a valid UUID.
"@class"
"module"

Such files are produced when calling Task.to_directory(..., legacy_mode=False, ...).

Note that the following fields are missing in comparison to legacy mode:

"Study Type"
"Calculation Type"
"Calculator Type"
"Calculation ID"
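The difference between the two key sets can be used to tell the metadata flavors apart. The sketch below is an illustration only: `is_valid_task_id` and `guess_mode` are hypothetical helpers, and inferring the mode from the keys is an assumption based on the lists in this section, not documented autojob behavior.

```python
import uuid

# Keys that appear only in legacy-mode metadata, per the list above.
LEGACY_ONLY_KEYS = {
    "Study Type", "Calculation Type", "Calculator Type", "Calculation ID",
}


def is_valid_task_id(value: object) -> bool:
    """Return True if value is a string that uuid.UUID can parse."""
    if not isinstance(value, str):
        return False
    try:
        uuid.UUID(value)
    except ValueError:
        return False
    return True


def guess_mode(metadata: dict) -> str:
    """Guess which mode produced a metadata dictionary from its keys."""
    if "Job ID" in metadata or LEGACY_ONLY_KEYS & set(metadata):
        return "legacy"
    if "Task ID" in metadata:
        return "normal"
    return "unknown"
```

Note that `uuid.UUID` is lenient about formatting (it also accepts hyphen-less hex and a `urn:uuid:` prefix), so a stricter check may be appropriate if the canonical hyphenated form is required.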

The modern directory structure is as follows:

Study Group
- study_group.json
- Study
  - parameterizations.json
  - record.txt
  - study.json
  - workflow.json
  - Task
    - task.json

In the future, the top-level directory may be dropped, so that the final supported directory structure takes the form:

Study
- parameterizations.json
- record.txt
- study.json
- workflow.json
- Task
  - task.json

Data Files¶

Study Files¶

parameterizations.json: a dictionary mapping a workflow step ID to a Step
record.txt: a text file in which each line lists a task ID of a completed task
study.json: contains metadata about the study (e.g., Study Group ID, Study ID, Study Type, Date Created) and a list of calculation IDs (or task IDs) of calculations (or tasks) belonging to the Study
workflow.json: a dictionary mapping a workflow step ID to a list of workflow step IDs; a directed acyclic graph representing the study’s workflow
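Because workflow.json encodes a directed acyclic graph, a valid execution order can be derived with a topological sort. The sketch below uses the standard-library `graphlib` module (Python 3.9+). It assumes that each key maps to its downstream steps; the source does not state the edge direction, so invert the logic if the lists hold upstream steps instead. `execution_order` and `completed_task_ids` are hypothetical names.

```python
import json
from graphlib import TopologicalSorter


def execution_order(workflow_text: str) -> list[str]:
    """Return workflow step IDs in an order that respects the graph's edges."""
    workflow = json.loads(workflow_text)
    sorter = TopologicalSorter()
    for step, downstream in workflow.items():
        sorter.add(step)
        for child in downstream:
            # Assumption: each listed ID is a step that runs after `step`.
            sorter.add(child, step)
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(sorter.static_order())


def completed_task_ids(record_text: str) -> set[str]:
    """Parse record.txt content: one completed task ID per non-blank line."""
    return {line.strip() for line in record_text.splitlines() if line.strip()}
```

Steps with no ordering constraint between them may appear in either order, so callers should rely only on the guarantee that a step precedes everything downstream of it.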

Task Files¶

run.py: a Python script containing the logic for executing the task; can be set with the --python-script CLI argument for autojob or the AUTOJOB_PYTHON_SCRIPT environment variable

Requirements:
- The structure of the script must match that of the template file
- Minimal, notable features of the template file:
  - ASE calculator imported
  - structure file read using ase.io.read
  - the ASE calculator configuration format

vasp.sh: a Bash script containing the logic for running the compute job; can be set with the --slurm-script CLI argument for autojob or the AUTOJOB_SLURM_SCRIPT environment variable

Requirements:
- The structure of the script must match that of the template file
- Minimal, notable features of the template file:
  - SLURM configuration directives
  - files to delete
  - files to copy

job.json: a JSON-serialized dictionary of task metadata; can be set with the --job-file CLI argument for autojob or the AUTOJOB_JOB_FILE environment variable. Generally, this file should not be edited directly; instead, it is modified indirectly when a new job is created by one of the utility functions (e.g., advance(), restart_relaxation(), create_vibration()).
task.json: a JSON-serialized dictionary summarizing a completed task
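Each task file name above can be supplied either through a CLI argument or through an environment variable. The following sketch shows one way such a lookup could be wired up with `argparse`. It is not autojob's actual parser: `resolve_filenames` is a hypothetical function, and both the precedence (CLI argument, then environment variable, then default) and the use of the listed file names as defaults are assumptions.

```python
import argparse
import os

# (CLI flag, environment variable, assumed default), as named in this section.
SETTINGS = [
    ("--python-script", "AUTOJOB_PYTHON_SCRIPT", "run.py"),
    ("--slurm-script", "AUTOJOB_SLURM_SCRIPT", "vasp.sh"),
    ("--job-file", "AUTOJOB_JOB_FILE", "job.json"),
]


def resolve_filenames(argv, environ=None):
    """Resolve each file name from CLI arguments, the environment, or a default."""
    environ = os.environ if environ is None else environ
    parser = argparse.ArgumentParser(prog="autojob-sketch")
    for flag, env_var, default in SETTINGS:
        # The environment variable replaces the default; an explicit flag wins.
        parser.add_argument(flag, default=environ.get(env_var, default))
    return vars(parser.parse_args(argv))
```

For example, `resolve_filenames(["--job-file", "cli.json"], {"AUTOJOB_JOB_FILE": "env.json"})` resolves `job_file` to `"cli.json"` under the assumed precedence.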

Legacy Files¶

calculation.json: a JSON-serialized dictionary of calculation metadata; can be set with the --calculation-file CLI argument for autojob or the AUTOJOB_CALCULATION_FILE environment variable. Generally, this file should not be edited directly; instead, it is modified indirectly when a new job is created by one of the utility functions (e.g., advance(), restart_relaxation(), create_vibration()).
