How atomic compute jobs are represented and organized

The Task class that autojob defines represents an atomic compute job. A task has inputs, outputs, and metadata, all of which are stored as Task attributes. Importantly, autojob also defines two subclasses of Task: Action (WIP) and Calculation. An Action is a procedure that does not require heavy computational resources. Examples of procedures that may be represented as actions include defect structure generation, adsorbate placement, and slab creation. In contrast, a Calculation requires heavier computational resources and may involve submission to a job scheduler. As the name suggests, Calculation instances represent calculations of various kinds (e.g., DFT, CCSD, QM/MM).
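To make the relationship concrete, here is a minimal sketch of how such a hierarchy could be modeled with Python dataclasses. It is not autojob's implementation: the attribute names loosely mirror those used in the example on this page (task_inputs, task_metadata, calculation_inputs, scheduler_inputs), while the plain-dict types, the task_outputs and calculation_outputs fields, and the overall layout are simplifying assumptions.

```python
from dataclasses import dataclass, field
from typing import Any


# Hypothetical sketch; not autojob's actual class definitions.
@dataclass
class Task:
    """An atomic compute job with inputs, outputs, and metadata."""

    task_inputs: dict[str, Any] = field(default_factory=dict)
    task_outputs: dict[str, Any] = field(default_factory=dict)
    task_metadata: dict[str, Any] = field(default_factory=dict)


@dataclass
class Action(Task):
    """A lightweight step run locally (e.g., slab creation)."""


@dataclass
class Calculation(Task):
    """A heavier job that may be submitted to a job scheduler."""

    # Calculations carry their own inputs/outputs on top of the task-level ones
    calculation_inputs: dict[str, Any] = field(default_factory=dict)
    calculation_outputs: dict[str, Any] = field(default_factory=dict)
    # Settings for the job scheduler (e.g., a job name)
    scheduler_inputs: dict[str, Any] = field(default_factory=dict)


# Both subclasses are Tasks, so code written against Task accepts either one.
slab = Action(task_metadata={"task_id": "j123456789"})
dft = Calculation(calculation_inputs={"xc": "PBE"})
for task in (slab, dft):
    print(type(task).__name__, isinstance(task, Task))
# prints:
# Action True
# Calculation True
```

In this sketch, a shared base class is what lets containers such as studies hold any mix of actions and calculations.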

More facts about using Tasks:

  • In addition to task inputs/outputs and metadata, Calculation instances also have calculation inputs and outputs defined.

  • Task inputs can be written to a directory with the Task.to_directory() method.

  • Task results can be retrieved with the Task.from_directory() method.

  • Studies can be constructed from several Task instances.

  • Study groups can be constructed from several Study instances.

  • You can write input directories for all tasks of a study group using the StudyGroup.to_directory() method.
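The following sketch illustrates how these pieces might fit together on disk. Everything in it is an assumption made for illustration: the simplified Task, Study, and StudyGroup classes, the nested <study_group_id>/<study_id>/<task_id> layout, and the inputs.json/outputs.json file names are hypothetical and do not reflect autojob's actual file formats or directory structure.

```python
import json
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


# Hypothetical, simplified stand-ins for autojob's Task, Study, and StudyGroup.
@dataclass
class Task:
    task_id: str
    inputs: dict[str, Any] = field(default_factory=dict)
    outputs: dict[str, Any] = field(default_factory=dict)

    def to_directory(self, dst: Path) -> Path:
        """Write this task's inputs to dst/<task_id>/inputs.json."""
        task_dir = dst / self.task_id
        task_dir.mkdir(parents=True, exist_ok=True)
        (task_dir / "inputs.json").write_text(json.dumps(self.inputs))
        return task_dir

    @classmethod
    def from_directory(cls, src: Path) -> "Task":
        """Rebuild a task, including any results a finished job wrote."""
        inputs = json.loads((src / "inputs.json").read_text())
        outputs_file = src / "outputs.json"
        outputs = json.loads(outputs_file.read_text()) if outputs_file.exists() else {}
        return cls(task_id=src.name, inputs=inputs, outputs=outputs)


@dataclass
class Study:
    study_id: str
    tasks: list[Task] = field(default_factory=list)


@dataclass
class StudyGroup:
    study_group_id: str
    studies: list[Study] = field(default_factory=list)

    def to_directory(self, dst: Path) -> None:
        """Write inputs for every task of every study in this group."""
        for study in self.studies:
            study_dir = dst / self.study_group_id / study.study_id
            for task in study.tasks:
                task.to_directory(study_dir)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    study = Study("s1", [Task("j1", {"xc": "PBE"}), Task("j2", {"xc": "B3LYP"})])
    StudyGroup("g1", [study]).to_directory(root)

    # Pretend the job in j2 finished and wrote results (placeholder value)
    job_dir = root / "g1" / "s1" / "j2"
    (job_dir / "outputs.json").write_text(json.dumps({"energy": -1.0}))

    finished = Task.from_directory(job_dir)
    print(finished.inputs, finished.outputs)  # {'xc': 'B3LYP'} {'energy': -1.0}
```

Keeping the directory tree parallel to the object hierarchy means a single call on the study group writes every task, and any individual task directory can later be read back on its own.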

For example, you could retrieve the results stored in a calculation directory,

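from pathlib import Path

from autojob.calculation.calculation import Calculation

# Retrieve the calculation (and its results) from the current working directory
task = Calculation.from_directory(Path())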

and then use the task's output structure to generate a new set of tasks to submit (here, one calculation per exchange-correlation functional):

from copy import deepcopy
from pathlib import Path

from shortuuid import uuid

from autojob.study import Study
from autojob.study_group import StudyGroup

functionals = ["PBEPBE", "PBE1PBE", "wB97xD", "B3LYP"]

calculations = []

study_group_id = "g" + uuid()[:9]
study_id = "s" + uuid()[:9]

# modify parameters and metadata
for functional in functionals:
    # deep copy so that mutable inputs (e.g., the parameters dict) are not
    # shared between the new calculations
    new_calc = deepcopy(task)

    # copies output atoms to input atoms
    new_calc.prepare_inputs_atoms()
    new_calc.calculation_inputs.parameters["xc"] = functional
    new_calc.task_metadata.study_group_id = study_group_id
    new_calc.task_metadata.study_id = study_id
    new_calc.task_metadata.calculation_id = "c" + uuid()[:9]
    new_calc.task_metadata.task_id = "j" + uuid()[:9]
    atoms = new_calc.task_inputs.atoms
    assert atoms is not None  # noqa: S101
    structure = atoms.info["structure"]
    calculation_id = new_calc.task_metadata.calculation_id
    new_calc.scheduler_inputs.job_name = (
        f"{structure}-{calculation_id}"
    )
    calculations.append(new_calc)

# create study
study = Study(
    name="Study",
    tasks=calculations,
    study_id=study_id,
)

# create study group
study_group = StudyGroup(
    name="Study Group",
    studies=[study],
    study_group_id=study_group_id,
)

# Write input directories
study_group_dir = Path(study_group_id)
study_group_dir.mkdir()
study_group.to_directory(Path())
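The example builds each ID by hand from a one-letter prefix (g for the study group, s for the study, c for calculations, j for tasks) followed by the first nine characters of a shortuuid string. A small helper can keep that convention in one place. The following is a hypothetical, standard-library-only sketch: new_id and PREFIXES are not part of autojob, and instead of calling shortuuid it draws characters from an alphabet modeled on shortuuid's default one.

```python
import secrets

# Alphabet modeled on shortuuid's default: digits and letters without the
# easily confused characters 0, 1, I, O, and l.
ALPHABET = "23456789ABCDEFGHJKLMNPQRSTUVWXYZabcdefghijkmnopqrstuvwxyz"

# One-letter prefixes, as used in the example above.
PREFIXES = {"study_group": "g", "study": "s", "calculation": "c", "task": "j"}


def new_id(kind: str, length: int = 9) -> str:
    """Return PREFIXES[kind] followed by `length` random characters."""
    body = "".join(secrets.choice(ALPHABET) for _ in range(length))
    return PREFIXES[kind] + body


study_group_id = new_id("study_group")
study_id = new_id("study")
print(study_group_id, study_id)  # random each run
```

With such a helper, the lines inside the loop could read, for example, new_calc.task_metadata.task_id = new_id("task").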

Useful Definitions

Study

A study is a collection of workflows.

Workflow

A workflow is a directed acyclic graph of actions and calculations.
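Because a workflow is a directed acyclic graph, its steps can always be put in an order in which every step comes after the steps it depends on. The sketch below shows this with the standard library's graphlib module (Python 3.9+). The step names are hypothetical examples drawn from the kinds of actions and calculations described on this page; this is not how autojob represents workflows internally.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each step maps to the set of steps it depends on.
workflow = {
    "create_slab": set(),                          # action
    "find_adsorption_sites": {"create_slab"},      # action
    "place_adsorbate": {"find_adsorption_sites"},  # action
    "relax": {"place_adsorbate"},                  # calculation
    "single_point": {"relax"},                     # calculation
    "vibrations": {"relax"},                       # calculation
}

# static_order() yields every step after all of its dependencies and raises
# graphlib.CycleError if the graph is not acyclic.
order = list(TopologicalSorter(workflow).static_order())
print(order)
```

Here "single_point" and "vibrations" depend only on "relax", so they could run concurrently; TopologicalSorter's prepare(), get_ready(), and done() methods support that kind of incremental scheduling.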

Action

An action is a locally run step in a workflow, such as determining all non-equivalent adsorption sites on a metal surface or permuting a defect within a structure.

Calculation

A calculation is an atomic compute job that may be submitted to a scheduler. Calculations often require parallelization and submission to a workload manager such as Slurm. Examples of calculations include single-point calculations, relaxation calculations, and ab initio molecular dynamics calculations.
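Submitting a calculation to Slurm typically amounts to writing a batch script and passing it to sbatch. The sketch below renders such a script as a string. The function name, the command it runs, and the resource defaults are hypothetical and do not reflect how autojob builds its scheduler inputs; only the #SBATCH options themselves (--job-name, --nodes, --ntasks-per-node, --time) are standard Slurm directives.

```python
def render_sbatch(
    job_name: str,
    command: str,
    nodes: int = 1,
    ntasks_per_node: int = 1,
    time: str = "01:00:00",
) -> str:
    """Return the text of a minimal Slurm batch script for one calculation."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={ntasks_per_node}",
        f"#SBATCH --time={time}",
        "",
        command,  # hypothetical program that runs the calculation
    ]
    return "\n".join(lines) + "\n"


# Job name follows the "{structure}-{calculation_id}" pattern used earlier on this page.
script = render_sbatch("Cu-c23456789A", "srun ./run_calculation", nodes=2)
print(script)
```

The resulting text would be saved to a file in the calculation directory and submitted with sbatch.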

Task

A task is the general concept that encompasses both actions and calculations (Action and Calculation are both subclasses of Task). A task can be thought of as a generic step in a workflow.