autojob.harvest package¶

Harvest the results of a completed task.

Submodules¶

autojob.harvest.archive module¶

I/O functions for the utility scripts (compiled here for consistency).

autojob.harvest.archive.archive(filename: str, archive_mode: Literal['csv', 'json', 'both'], harvested: list[Calculation], *, dest: Path | None = None) None[source]¶

Archive completed calculations with the given format.

Parameters:
  • filename – The filename with which to archive the calculations.

  • archive_mode – The format with which to archive the calculations. Must be one of "csv", "json", or "both".

  • harvested – The list of calculations to archive.

  • dest – The directory in which to save the archives.

autojob.harvest.archive.archive_csv(calculations: list[Calculation], dest: Path | None = None) None[source]¶

Archive a list of calculations in CSV format.

Parameters:
  • calculations – A list of calculations to archive.

  • dest – The filename to use to archive the calculations. Defaults to "database_<TIME_STAMP>.csv" where TIME_STAMP is the current time in ISO format.

autojob.harvest.archive.archive_json(calculations: list[Calculation], dest: Path | None = None) None[source]¶

Archive a list of calculations in JSON format.

Parameters:
  • calculations – A list of calculations to archive.

  • dest – The filename to use to archive the calculations. Defaults to "database_<TIME_STAMP>.json" where TIME_STAMP is the current time in ISO format.
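The default archive names described above can be sketched with the standard library alone. This is a minimal illustration, not autojob's implementation: default_archive_name is a hypothetical helper, and it assumes that the time stamp is the output of datetime.isoformat().

```python
from __future__ import annotations

from datetime import datetime
from pathlib import Path


def default_archive_name(suffix: str, now: datetime | None = None) -> Path:
    """Build a name of the form database_<TIME_STAMP>.<suffix>.

    Hypothetical helper for illustration only; not part of autojob.
    """
    # Assumption: the time stamp is datetime.isoformat() output.
    stamp = (now or datetime.now()).isoformat()
    return Path(f"database_{stamp}.{suffix}")


# A fixed time is used here so that the resulting names are reproducible.
fixed = datetime(2024, 1, 2, 3, 4, 5)
csv_name = default_archive_name("csv", fixed)
json_name = default_archive_name("json", fixed)
print(csv_name.name)
print(json_name.name)
```

ISO time stamps contain colons, which are not valid in Windows filenames, so passing an explicit dest may be preferable on that platform.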

autojob.harvest.archive.flatten_calculations(calculations: list[Calculation]) list[dict[str, Any]][source]¶

Flatten each calculation into a CSV-friendly format.

Parameters:

calculations – The calculations to flatten.

Returns:

A list of dictionaries mapping calculation fields (e.g., energy, forces, zpe_correction) to their values. The keys of nested dictionaries such as the calculation parameters are also accessible.
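The kind of flattening described above can be sketched as follows. This is an illustration under stated assumptions rather than autojob's code: flatten is a hypothetical helper, the dotted key names are an assumed convention, and the field values are made up.

```python
from __future__ import annotations

from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Recursively lift the keys of nested dictionaries to the top level.

    Hypothetical helper; the "." separator is an assumption.
    """
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Descend into nested dictionaries, extending the key prefix.
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            # Non-dictionary values (including lists) are kept as they are.
            flat[name] = value
    return flat


record = {"energy": -1.5, "parameters": {"encut": 450, "kpts": [3, 3, 1]}}
flat = flatten(record)
print(flat)
```

Each resulting dictionary has only top-level keys, so it maps directly onto one row of a CSV file.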

autojob.harvest.cli module¶

CLI function for harvesting task results.

autojob.harvest.harvest module¶

Harvest data from the directories of completed calculations.

Example

Harvest the results in the current working directory as vibrational calculations

from pathlib import Path

from autojob.calculation.vibration import Vibration
from autojob.harvest.harvest import harvest

harvest(dir_name=Path.cwd(), strictness="relaxed", preferred=Vibration)

Important

Always verify the units of harvested quantities.

autojob.harvest.harvest.harvest(dir_name: str | Path, *, strictness: Literal['strict', 'relaxed', 'atomic'] | None = None, whitelists: list[str] | list[Path] | None = None, blacklists: list[str] | list[Path] | None = None, preferred: type[Task] | None = None) list[Task][source]¶

Collect all data in subdirectories of the given directory.

Parameters:
  • dir_name – The directory under which to collect data.

  • strictness – How to treat tasks for which errors are thrown during harvesting. If "strict", all harvesting will abort. If "atomic", only calculations for which errors are not thrown will be harvested. If "relaxed", every attempt will be made to harvest all calculations. The default behaviour is controlled by the value of SETTINGS.STRICT_MODE. If SETTINGS.STRICT_MODE=True, the default behaviour will be that of strictness="strict". Otherwise, the default behaviour will be that of strictness="relaxed".

  • whitelists – A list of strings or paths representing whitelist filenames, where each whitelist points to a list of task IDs that should be harvested. When specified, only tasks with task IDs matching these IDs will be harvested. Defaults to None in which case all tasks are eligible for harvesting.

  • blacklists – A list of strings or paths representing blacklist filenames, where each blacklist points to a list of task IDs that should not be harvested. When specified, no tasks with task IDs in these lists will be harvested. Defaults to None in which case all tasks will be harvested.

  • preferred – A preferred Task type to use to harvest each calculation. Defaults to autojob.task.Task.

Returns:

A list of Task objects containing the data within dir_name.
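The whitelist and blacklist semantics described above can be sketched as a simple filter on task IDs. This is not how autojob is known to implement it: is_eligible is a hypothetical helper, the task IDs are made up, and letting the blacklist take precedence when an ID appears in both lists is an assumption.

```python
from __future__ import annotations


def is_eligible(
    task_id: str,
    whitelist: set[str] | None = None,
    blacklist: set[str] | None = None,
) -> bool:
    """Decide whether a task ID may be harvested (hypothetical helper)."""
    # With a whitelist, only the listed IDs are eligible.
    if whitelist is not None and task_id not in whitelist:
        return False
    # Assumption: a blacklisted ID is excluded even if it is whitelisted.
    return blacklist is None or task_id not in blacklist


task_ids = ["t1", "t2", "t3"]
unfiltered = [t for t in task_ids if is_eligible(t)]
filtered = [t for t in task_ids if is_eligible(t, {"t1", "t2"}, {"t2"})]
print(unfiltered)
print(filtered)
```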

autojob.harvest.mechanism module¶

Associate calculations with a reaction mechanism.

class autojob.harvest.mechanism.ElementaryStep(net_hydrogens: int = 0, net_electrons: int = 0, net_water_lost: int = 0, reference: str | None = None)[source]¶

Bases: NamedTuple

An elementary reaction step relative to a reference.

Variables:
  • net_hydrogens (int) – The net number of hydrogens transferred from the reference to the ElementaryStep. Defaults to 0.

  • net_electrons (int) – The net number of electrons transferred from the reference to the ElementaryStep. Defaults to 0.

  • net_water_lost (int) – The net number of water molecules lost from the reference to the ElementaryStep. Defaults to 0.

  • reference (str | None) – The reference state for the ElementaryStep. Defaults to None. If provided, reference can be used to relate ElementaryStep objects defined for different references.

Note

net_water_lost is defined opposite to how net_hydrogens and net_electrons are defined so that simple tuple comparisons can be used to order ElementaryStep objects according to typical electrochemical reactions. However, this ordering is not absolute as there may be reaction mechanisms for which electron transfer precedes proton transfer.
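The tuple ordering mentioned in the note can be illustrated with a stand-in class. Step below is not the autojob class; it is a local NamedTuple that only mirrors the documented fields, and the "CO2" reference label is made up.

```python
from __future__ import annotations

from typing import NamedTuple


class Step(NamedTuple):
    """Stand-in with the same fields as ElementaryStep (illustration only)."""

    net_hydrogens: int = 0
    net_electrons: int = 0
    net_water_lost: int = 0
    reference: str | None = None


steps = [
    Step(2, 2, 1, "CO2"),
    Step(0, 0, 0, "CO2"),
    Step(2, 2, 0, "CO2"),
    Step(1, 1, 0, "CO2"),
]

# NamedTuples compare field by field, so sorting orders the steps by
# hydrogens gained, then electrons gained, then water lost.
ordered = sorted(steps)
for step in ordered:
    print(step)
```

Because comparison falls through to the last field on ties, mixing None and string references among otherwise equal steps raises a TypeError in plain Python tuple comparison.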

Create new instance of ElementaryStep(net_hydrogens, net_electrons, net_water_lost, reference)

apply_comp_hydrogen_model(energy_h2: float, energy_h2o: float, *, initial: float, final: float, applied_potential: float = 0.0) float[source]¶

Calculate an energy using the computational hydrogen electrode (CHE).

This method follows the formalism outlined in:

J. K. Nørskov, J. Rossmeisl, A. Logadottir, L. Lindqvist, J. R. Kitchin, T. Bligaard, and H. Jónsson The Journal of Physical Chemistry B 2004 108 (46), 17886-17892 DOI: 10.1021/jp047349j

Parameters:
  • energy_h2 – The energy of gas phase hydrogen to use for the calculation.

  • energy_h2o – The energy of gas phase water to use for the calculation.

  • initial – The energy of the reference state to use for the calculation. This should be the energy of the species identified by autojob.harvest.mechanism.ElementaryStep.reference.

  • final – The energy of the final state to use for the calculation.

  • applied_potential – The applied potential in Volts.

Returns:

The energy under the CHE formalism.
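For orientation, the textbook form of the CHE bookkeeping can be sketched as below. This is not autojob's implementation: che_energy is a hypothetical function, all energies are assumed to be in eV, the numbers are made up, and the way the three net counts enter the expression is an assumption based on the reaction reference + n(H+ + e-) → final + m H2O, where each proton-electron pair contributes half the energy of H2 minus the applied potential.

```python
def che_energy(
    initial: float,
    final: float,
    energy_h2: float,
    energy_h2o: float,
    *,
    net_hydrogens: int = 0,
    net_electrons: int = 0,
    net_water_lost: int = 0,
    applied_potential: float = 0.0,
) -> float:
    """Reaction energy under the CHE (hypothetical sketch, energies in eV)."""
    # Each transferred hydrogen is referenced to half an H2 molecule.
    hydrogen_term = 0.5 * energy_h2 * net_hydrogens
    # Water released by the step appears on the product side.
    water_term = energy_h2o * net_water_lost
    # Each transferred electron shifts the energy by +U (in eV per volt).
    bias_term = net_electrons * applied_potential
    return (final + water_term) - (initial + hydrogen_term) + bias_term


# Made-up energies for a single proton-electron transfer.
at_zero = che_energy(-10.0, -13.0, -6.8, -14.2, net_hydrogens=1, net_electrons=1)
at_bias = che_energy(
    -10.0, -13.0, -6.8, -14.2,
    net_hydrogens=1, net_electrons=1, applied_potential=-0.4,
)
print(at_zero, at_bias)
```

In this sketch, a step that is uphill by 0.4 eV at zero potential becomes thermoneutral at an applied potential of -0.4 V.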

net_electrons: int¶

Alias for field number 1

net_hydrogens: int¶

Alias for field number 0

net_water_lost: int¶

Alias for field number 2

reference: str | None¶

Alias for field number 3

class autojob.harvest.mechanism.MechanisticEntry(elementary_step: ElementaryStep, name: str, energy: float)[source]¶

Bases: NamedTuple

Calculated thermodynamic data for an elementary step.

Variables:
  • elementary_step (autojob.harvest.mechanism.ElementaryStep) – The ElementaryStep with which the MechanisticEntry is associated.

  • name (str) – A string labeling the entry. For example, the catalyst or molecule name.

  • energy (float) – A float representing the calculated energy for the entry.

Create new instance of MechanisticEntry(elementary_step, name, energy)

elementary_step: ElementaryStep¶

Alias for field number 0

energy: float¶

Alias for field number 2

name: str¶

Alias for field number 1

autojob.harvest.mechanism.aggregate_mechanism_data(all_data: list[dict[str, Any]]) list[list[dict[str, Any]]][source]¶

Group all data deemed to belong to the same mechanism entry.

Entries are grouped together when the ID of another calculation appears under their “Calculation Notes” key.

Parameters:

all_data – A list of dictionaries containing the data. Each dictionary must contain the following keys:

  • “Calculation Notes”

  • “Job Notes”

Returns:

A list of lists of dictionaries grouped by mechanism entry.

autojob.harvest.mechanism.build_job_graph(all_data: list[dict[str, Any]]) dict[str, tuple[str | None, str | None]][source]¶

Build a directed acyclic graph representing the connectivity of the jobs.

Parameters:

all_data – A list of dictionaries representing job data. Job connectivity is determined based on the “Job Notes” key.

Returns:

A dictionary mapping jobs to their ancestor (job, calculation). If the ancestor of a job cannot be found, its ancestor is (None, None).

autojob.harvest.mechanism.find_ancestor(graph: dict[str, str], data: dict[str, Any], all_data: list[dict[str, Any]]) str[source]¶

Find the ancestor calculation of a given calculation.

autojob.harvest.mechanism.find_ancestors(jobs: list[dict[str, Any]], data: dict[str, Any], ancestor_calculation: str, ancestor_job: str) list[dict[str, Any]][source]¶

Find the ancestor calculations of a given calculation.

autojob.harvest.patch module¶

Supplement harvested data with data patches.

Oftentimes, you may have additional data that you either

  1. can’t determine a priori (and thus can’t mark the task with prior to submission), or

  2. can’t extract programmatically (for example, analyses that require fuzzy intuition),

but nonetheless want to store with your data. This module defines some simple routines and classes to facilitate the latter use-case.

A Patch is just that, a “patch”: it fills in gaps that may exist in the data. To define one, you specify a feature of the data to which it should be applied and what data should be added when it is applied.

from ase import Atoms
from autojob.harvest.patch import Patch

pch = Patch(
    match_path=["study_id"],
    match_value="123456789",
    patch_path=["atoms", "positions"],
    patch_value=[0.0, 0.0, 0.0],
)

datapoint1 = {
    "study_id": None,
    "atoms": None
}

atoms = Atoms("C", positions=[[0.0, 1.0, 2.0]])
datapoint2 = {
    "study_id": None,
    "atoms": atoms
}

pch.apply(datapoint1)
print(datapoint1["atoms"])
None

pch.apply(datapoint2)
print(datapoint2["atoms"].positions)
[0.0, 0.0, 0.0]

The data to which the Patch applies is specified by match_path and match_value, while what is applied is specified by patch_path and patch_value.

Note

Patch applies to both dictionaries and objects alike!
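One way to treat dictionaries and objects uniformly is to resolve each path element as a key or as an attribute, depending on what is being traversed. The sketch below is not autojob's implementation: get_by_path, set_by_path, and the Meta class are hypothetical, and how the real Patch handles missing keys or attributes is not covered.

```python
from typing import Any


def get_by_path(data: Any, path: list[str]) -> Any:
    """Follow a path through dictionary keys and object attributes."""
    current = data
    for name in path:
        if isinstance(current, dict):
            current = current[name]
        else:
            current = getattr(current, name)
    return current


def set_by_path(data: Any, path: list[str], value: Any) -> None:
    """Set the final path element on its parent, in place."""
    parent = get_by_path(data, path[:-1])
    if isinstance(parent, dict):
        parent[path[-1]] = value
    else:
        setattr(parent, path[-1], value)


class Meta:
    """Hypothetical object holding an identifier."""

    def __init__(self, study_id: str) -> None:
        self.study_id = study_id


record = {"meta": Meta("123"), "tags": {"label": None}}

# Patch only when the value found at the match path equals the match value.
if get_by_path(record, ["meta", "study_id"]) == "123":
    set_by_path(record, ["tags", "label"], "patched")
print(record["tags"]["label"])
```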

Example

Apply a set of patches in batch

from autojob.task import Task

tasks = [Task(...), Task(...), ...]
patches = [Patch(...), Patch(...), ...]

for task in tasks:
    for patch in patches:
        patch.apply(task)

class autojob.harvest.patch.Patch(match_path: list[str], match_value: Any, patch_path: list[str], patch_value: Any)[source]¶

Bases: NamedTuple

A data patch.

Variables:
  • match_path (list[str]) – A list of attribute/key names used to identify which data are to be patched by the patch.

  • match_value (Any) – The value of the attribute/key that must match.

  • patch_path (list[str]) – A list of attribute/key names identifying the attribute/key to be patched.

  • patch_value (Any) – The value of the attribute/key to be set.

Create new instance of Patch(match_path, match_value, patch_path, patch_value)

apply(data: object) None[source]¶

Apply a patch to an object.

Parameters:

data – The data to which the patch will be applied. Note that this method may or may not end up modifying data, but if it does, it will do so in place.

match_path: list[str]¶

Alias for field number 0

match_value: Any¶

Alias for field number 1

patch_path: list[str]¶

Alias for field number 2

patch_value: Any¶

Alias for field number 3

autojob.harvest.patch.build_metadata_patches(dir_name: Path, *, metadata_type: Literal['study_group', 'study', 'calculation'] = 'study_group', strict_mode: bool = True, legacy_mode: bool = False) list[Patch][source]¶

Create patches from metadata files.

Parameters:
  • dir_name – The name of the directory under which to search for metadata. Defaults to the current working directory.

  • strict_mode – Whether or not to abort metadata collection if metadata cannot be found. Defaults to SETTINGS.STRICT_MODE.

  • metadata_type – The type of metadata file from which patches are to be built. Must be one of "study_group", "study", "calculation". Defaults to "study_group".

  • legacy_mode – Whether or not to assume the legacy format for the directory. Defaults to False.

Returns:

A list of Patch objects which will add metadata to TaskMetadata.__pydantic_extra__. Further, patch paths are defined such that study group, study, and calculation metadata will be added under the "study_group_metadata", "study_metadata", and "calculation_metadata" keys, respectively.

Example

Patch study group and study metadata for all tasks in a subdirectory.

from pathlib import Path

from autojob.harvest.harvest import harvest
from autojob.harvest.patch import build_metadata_patches
from autojob.harvest.patch import patch_tasks

dir_name = Path.cwd()
tasks = harvest(dir_name)
patches = build_metadata_patches(dir_name)
patch_tasks(patches, tasks)

autojob.harvest.patch.patch_tasks(patches: list[Patch], tasks: list[Task]) None[source]¶

Patch a list of tasks.

This method modifies tasks in place.

Parameters:
  • patches – The patches to apply.

  • tasks – The tasks to which the patches will be applied.

autojob.harvest.structure module¶

Extract structural metadata.

autojob.harvest.structure.load_structural_data(dir_name: str | Path) dict[str, Any][source]¶

Extract the structural data from the input structure of the directory.

Parameters:

dir_name – The directory of the completed calculation.

Returns:

The structural data. The following keys are guaranteed to be present in the returned dictionary:

  • "Structure"

  • "Base Structure": only assigned if extracted from adsorbate complex name

  • "Adsorbate"

  • "Site"

  • "Orientation"

If no data is found, every value will be None.
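Because every value is None when nothing is found, a caller can test for that case explicitly. The sketch below uses only the keys listed above; has_structural_data is a hypothetical helper and the "Pt111" value is made up.

```python
from typing import Any

# The keys listed in the description above.
STRUCTURE_KEYS = ("Structure", "Base Structure", "Adsorbate", "Site", "Orientation")


def has_structural_data(data: dict[str, Any]) -> bool:
    """Return True if at least one structural field is set (hypothetical helper)."""
    return any(data.get(key) is not None for key in STRUCTURE_KEYS)


# The shape of the result when no data is found: every value is None.
empty = dict.fromkeys(STRUCTURE_KEYS)
# A partially populated result with a made-up structure name.
partial = {**empty, "Structure": "Pt111"}
print(has_structural_data(empty), has_structural_data(partial))
```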