autojob.harvest package

Harvest the results of a completed task.

Submodules

autojob.harvest.archive module

I/O functions for the utility scripts (compiled here for consistency).

autojob.harvest.archive.archive(filename: str, archive_mode: Literal['csv', 'json', 'both'], harvested: list[Calculation], *, dest: Path | None = None, time_stamp: bool = False) → None

Archive completed calculations with the given format.

Parameters:
  • filename – The filename (without extension) with which to archive the calculations.

  • archive_mode – The format with which to archive the calculations. Must be one of "csv", "json", or "both".

  • harvested – The list of calculations to archive.

  • dest – The directory in which to save the archives.

  • time_stamp – Whether or not to time stamp the archive file.

autojob.harvest.archive.archive_csv(calculations: list[Calculation], dest: Path | None = None) → None

Archive a list of calculations in CSV format.

Parameters:
  • calculations – A list of calculations to archive.

  • dest – The filename to use to archive the calculations. Defaults to "database_<TIME_STAMP>.csv" where TIME_STAMP is the current time in ISO format.

autojob.harvest.archive.archive_json(tasks: list[TaskBase], dest: Path | None = None) → None

Archive a list of calculations in JSON format.

Parameters:
  • tasks – A list of tasks to archive.

  • dest – The filename to use to archive the tasks. Defaults to "database_<TIME_STAMP>.json" where TIME_STAMP is the current time in ISO format.
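The default naming scheme and the two output formats described above can be sketched with the standard library alone. The helpers default_dest and write_records are hypothetical names, not part of autojob, and the exact timestamp format that autojob writes is an assumption here.

```python
import csv
import json
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Any, Optional


def default_dest(extension: str, now: Optional[datetime] = None) -> Path:
    """Build a name of the form database_<TIME_STAMP>.<extension>.

    Assumption: the time stamp is an ISO 8601 string with second precision.
    """
    stamp = (now or datetime.now()).isoformat(timespec="seconds")
    return Path(f"database_{stamp}.{extension}")


def write_records(records: list[dict[str, Any]], dest: Path) -> None:
    """Write flat records as JSON or CSV, chosen by the file suffix."""
    if dest.suffix == ".json":
        dest.write_text(json.dumps(records, indent=2))
        return
    # Use the union of all keys as the header so no record loses a column.
    fieldnames = sorted({key for record in records for key in record})
    with dest.open("w", newline="") as file:
        writer = csv.DictWriter(file, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)


# Write the same records in both formats inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    records = [{"id": "a", "energy": -1.5}, {"id": "b", "energy": -2.0}]
    csv_path = Path(tmp) / "example.csv"
    json_path = Path(tmp) / "example.json"
    write_records(records, csv_path)
    write_records(records, json_path)
    csv_header = csv_path.read_text().splitlines()[0]
    json_roundtrip = json.loads(json_path.read_text())
```

ISO time stamps contain colons, which some file systems reject, so a real implementation may substitute another separator.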

autojob.harvest.archive.flatten_calculations(calculations: list[Calculation]) → list[dict[str, Any]]

Flatten each calculation into a CSV-friendly format.

Parameters:

calculations – The calculations to flatten.

Returns:

A list of dictionaries mapping calculation fields (e.g., energy, forces, zpe_correction) to their values. The keys of nested dictionaries such as the calculation parameters are also accessible.
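As an illustration of what "CSV-friendly" flattening can look like, the sketch below collapses nested dictionaries into a single level. The flatten helper and the dotted key naming are assumptions made for this example; the key names that flatten_calculations actually produces are not specified here.

```python
from typing import Any


def flatten(record: dict[str, Any], parent: str = "", sep: str = ".") -> dict[str, Any]:
    """Collapse nested dictionaries into one level of joined keys."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse so that nested keys become top-level columns.
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


# A stand-in for one harvested calculation (values are made up).
calculation = {
    "energy": -12.3,
    "zpe_correction": 0.21,
    "parameters": {"encut": 450, "kpts": [3, 3, 1]},
}
row = flatten(calculation)
```

Each resulting dictionary has only scalar or list values, so it maps directly onto one CSV row.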

autojob.harvest.cli module

CLI function for harvesting task results.

autojob.harvest.harvest module

Harvest data from the directories of completed tasks.

Example

Harvest the results in the current working directory as calculations

from pathlib import Path

from autojob.tasks.calculation import Calculation
from autojob.harvest.harvest import harvest

harvest(dir_name=Path.cwd(), strictness="relaxed", preferred="calculation")

Important

Always verify the units of harvested quantities.

autojob.harvest.harvest.harvest(dir_name: str | Path, *, strictness: Literal['strict', 'relaxed', 'atomic'] | None = None, whitelists: list[str] | list[Path] | None = None, blacklists: list[str] | list[Path] | None = None, preferred: str | None = None, use_cache: bool = False) → list[TaskBase]

Collect all data in subdirectories of the given directory.

Parameters:
  • dir_name – The directory under which to collect data.

  • strictness – How to treat tasks for which errors are thrown during harvesting. If "strict", all harvesting will abort. If "atomic", only calculations for which no errors are thrown will be harvested. If "relaxed", every attempt will be made to harvest all calculations. The default behaviour is controlled by the value of SETTINGS.STRICT_MODE. If SETTINGS.STRICT_MODE=True, the default behaviour will be that of strictness="strict". Otherwise, the default behaviour will be that of strictness="relaxed".

  • whitelists – A list of strings or paths representing whitelist filenames, where each whitelist points to a list of task IDs that should be harvested. When specified, only tasks with task IDs matching these IDs will be harvested. Defaults to None in which case all tasks are eligible for harvesting.

  • blacklists – A list of strings or paths representing blacklist filenames, where each blacklist points to a list of task IDs that should not be harvested. When specified, no tasks with task IDs in these lists will be harvested. Defaults to None, in which case no tasks are excluded from harvesting.

  • preferred – The name of the preferred TaskBase type to use to harvest each calculation. Defaults to SETTINGS.DEFAULT_TASK.

  • use_cache – Whether or not to use cached results. If False, then cached results will be ignored and overwritten. Otherwise, outputs will be read from an existing cache.

Returns:

A list of TaskBase objects containing the data within dir_name.
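The whitelist and blacklist semantics described above can be sketched on plain task IDs. The function select_task_ids is hypothetical, and so is the rule that a blacklist entry overrides a whitelist entry when an ID appears in both; how harvest parses the list files is not covered here.

```python
from typing import Iterable, Optional


def select_task_ids(
    found: Iterable[str],
    whitelist: Optional[set[str]] = None,
    blacklist: Optional[set[str]] = None,
) -> list[str]:
    """Return the IDs eligible for harvesting, preserving their order."""
    selected = []
    for task_id in found:
        # With a whitelist, only listed IDs are eligible.
        if whitelist is not None and task_id not in whitelist:
            continue
        # Assumption: a blacklisted ID is skipped even if it is whitelisted.
        if blacklist is not None and task_id in blacklist:
            continue
        selected.append(task_id)
    return selected


found_ids = ["t1", "t2", "t3", "t4"]
everything = select_task_ids(found_ids)
only_listed = select_task_ids(found_ids, whitelist={"t1", "t3"})
without_t2 = select_task_ids(found_ids, blacklist={"t2"})
```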

autojob.harvest.mechanism module

Associate calculations with a reaction mechanism.

class autojob.harvest.mechanism.ElectrochemicalState(net_hydrogens: int = 0, net_electrons: int = 0, net_water_lost: int = 0, reference: str | None = None, stoich: float = 1.0)

Bases: NamedTuple

An electrochemical state along a reaction, relative to a reference.

Variables:
  • net_hydrogens (int) – The net number of hydrogens transferred from the reference to the ElectrochemicalState. Defaults to 0.

  • net_electrons (int) – The net number of electrons transferred from the reference to the ElectrochemicalState. Defaults to 0.

  • net_water_lost (int) – The net number of water molecules lost from the reference to the ElectrochemicalState. Defaults to 0.

  • reference (str | None) – The reference state for the ElectrochemicalState. Defaults to None. If provided, reference can be used to relate ElectrochemicalState instances defined for different references.

  • stoich (float) – The absolute value of the stoichiometric coefficient for the reference state. Defaults to 1.0.

Note

net_water_lost is defined opposite to how net_hydrogens and net_electrons are defined so that simple tuple comparisons can be used to order ElementaryStep instances according to typical electrochemical reactions. However, this ordering is not absolute as there may be reaction mechanisms for which electron transfer precedes proton transfer.
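The tuple-comparison ordering mentioned in the note can be seen with a stand-in NamedTuple. State below is a simplified, hypothetical class that keeps only the three integer fields; it is not the autojob class itself.

```python
from typing import NamedTuple


class State(NamedTuple):
    """Simplified stand-in with only the integer fields."""

    net_hydrogens: int = 0
    net_electrons: int = 0
    net_water_lost: int = 0


states = [
    State(2, 2, 1),  # two proton-electron pairs added, one water lost
    State(0, 0, 0),  # the reference itself
    State(2, 2, 0),  # two proton-electron pairs added, no water lost yet
    State(1, 1, 0),  # one proton-electron pair added
]

# NamedTuples compare field by field, so sorting orders the states by
# hydrogens first, then electrons, then water lost.
ordered = sorted(states)
```

Because water loss counts upward, a state that has released water sorts after the otherwise identical state that has not.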

Create new instance of ElectrochemicalState(net_hydrogens, net_electrons, net_water_lost, reference, stoich)

apply_comp_hydrogen_model(energy_h2: float, energy_h2o: float, *, initial: float, final: float, applied_potential: float = 0.0) → float

Calculate an energy using the computational hydrogen electrode (CHE).

This method follows the formalism outlined in:

J. K. Nørskov, J. Rossmeisl, A. Logadottir, L. Lindqvist, J. R. Kitchin, T. Bligaard, and H. Jónsson The Journal of Physical Chemistry B 2004 108 (46), 17886-17892 DOI: 10.1021/jp047349j

Parameters:
  • energy_h2 – The energy of gas phase hydrogen to use for the calculation.

  • energy_h2o – The energy of gas phase water to use for the calculation.

  • initial – The energy of the reference state to use for the elementary step. For catalyzed reactions, this could include the catalyst energy.

  • final – The energy of the final state of the elementary step.

  • applied_potential – The applied potential in Volts.

Returns:

The free energy of the species resulting from the given elementary step under the CHE formalism.
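A minimal sketch of a CHE-style energy balance follows, assuming energies in eV, a reduction step in which each proton-electron pair is referenced to half an H2 molecule, and a potential shift of one eV per electron per volt. The function che_reaction_energy is hypothetical, it returns a reaction energy difference, and its sign conventions may differ from those used by apply_comp_hydrogen_model.

```python
import math


def che_reaction_energy(
    initial: float,
    final: float,
    energy_h2: float,
    energy_h2o: float,
    *,
    net_hydrogens: int = 0,
    net_electrons: int = 0,
    net_water_lost: int = 0,
    applied_potential: float = 0.0,
) -> float:
    """Energy change for reference + n(H+ + e-) -> state + m H2O (in eV)."""
    # Under CHE, (H+ + e-) is in equilibrium with 1/2 H2 at 0 V.
    reactants = initial + 0.5 * net_hydrogens * energy_h2
    # Water released by the step is counted on the product side.
    products = final + net_water_lost * energy_h2o
    # Each electron consumed shifts the energy by +e*U (numerically U in eV).
    return products - reactants + net_electrons * applied_potential


# Made-up energies for * + H+ + e- -> *H.
at_zero = che_reaction_energy(
    -100.0, -103.5, -6.8, -14.2, net_hydrogens=1, net_electrons=1
)
at_half_volt = che_reaction_energy(
    -100.0, -103.5, -6.8, -14.2,
    net_hydrogens=1, net_electrons=1, applied_potential=0.5,
)
```

With these illustrative numbers the step is slightly downhill at 0 V and becomes uphill once a positive potential is applied.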

net_electrons: int

Alias for field number 1

net_hydrogens: int

Alias for field number 0

net_water_lost: int

Alias for field number 2

reference: str | None

Alias for field number 3

stoich: float

Alias for field number 4

class autojob.harvest.mechanism.MechanisticEntry(elementary_step: ElectrochemicalState, name: str, energy: float)

Bases: NamedTuple

Calculated thermodynamic data for an elementary step.

Variables:
  • elementary_step (autojob.harvest.mechanism.ElectrochemicalState) – The ElementaryStep with which the MechanisticEntry is associated.

  • name (str) – A string labeling the entry. For example, the catalyst or molecule name.

  • energy (float) – A float representing the calculated energy for the entry.

Create new instance of MechanisticEntry(elementary_step, name, energy)

elementary_step: ElectrochemicalState

Alias for field number 0

energy: float

Alias for field number 2

name: str

Alias for field number 1

autojob.harvest.mechanism.aggregate_mechanism_data(all_data: list[dict[str, Any]]) → list[list[dict[str, Any]]]

Group all data deemed to belong to the same mechanism entry.

Entries are grouped together when the ID of another calculation appears under their “Calculation Notes” key.

Parameters:

all_data – A list of dictionaries containing the data. Each dictionary must contain the following keys:

  • “Calculation Notes”

  • “Job Notes”

Returns:

A list of lists of dictionaries grouped by mechanism entry.

autojob.harvest.mechanism.build_job_graph(all_data: list[dict[str, Any]]) → dict[str, tuple[str | None, str | None]]

Build a directed acyclic graph representing the connectivity of the jobs.

Parameters:

all_data – A list of dictionaries representing job data. Job connectivity is determined based on the “Job Notes” key.

Returns:

A dictionary mapping jobs to their ancestor (job, calculation). If the ancestor of a job cannot be found, its ancestor is (None, None).
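A graph of the shape described above can be traversed to recover the full lineage of a job. The sketch below assumes the mapping job → (ancestor job, ancestor calculation) and uses made-up IDs; lineage is a hypothetical helper, not an autojob function.

```python
from typing import Optional

Ancestor = tuple[Optional[str], Optional[str]]


def lineage(graph: dict[str, Ancestor], job: str) -> list[tuple[str, Optional[str]]]:
    """Walk from a job up to its root, collecting (job, calculation) pairs."""
    chain: list[tuple[str, Optional[str]]] = []
    seen = {job}
    ancestor_job, ancestor_calc = graph.get(job, (None, None))
    while ancestor_job is not None:
        # A repeated job would mean the graph is not acyclic.
        if ancestor_job in seen:
            raise ValueError(f"cycle detected at job {ancestor_job!r}")
        seen.add(ancestor_job)
        chain.append((ancestor_job, ancestor_calc))
        ancestor_job, ancestor_calc = graph.get(ancestor_job, (None, None))
    return chain


# Made-up IDs: j3 was started from j2, which was started from j1.
job_graph: dict[str, Ancestor] = {
    "j3": ("j2", "c2"),
    "j2": ("j1", "c1"),
    "j1": (None, None),
}
history = lineage(job_graph, "j3")
```

A job whose ancestor is (None, None) terminates the walk, which matches the fallback value described for unknown ancestors.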

autojob.harvest.mechanism.find_ancestor(graph: dict[str, str], data: dict[str, Any], all_data: list[dict[str, Any]]) → str

Find the ancestor calculation of a given calculation.

autojob.harvest.mechanism.find_ancestors(jobs: list[dict[str, Any]], data: dict[str, Any], ancestor_calculation: str, ancestor_job: str) → list[dict[str, Any]]

Find the ancestor calculations of a given calculation.

autojob.harvest.patch module

Supplement harvested data with data patches.

Oftentimes, you may have additional data that you either

  1. can’t determine a priori (and thus can’t mark the task with it prior to submission), or

  2. can’t extract programmatically (these may be analyses that require fuzzy intuition),

but nonetheless want to store with your data. This module defines some simple routines and classes to facilitate the latter use case.

A Patch is just that, a “patch”: it fills in gaps that may exist in the data. To define one, you specify a feature of the data to which it should be applied and what data should be added when it is applied.

from ase import Atoms
from autojob.harvest.patch import Patch

pch = Patch(
    match_path=["study_id"],
    match_value="123456789",
    patch_path=["atoms", "positions"],
    patch_value=[0.0, 0.0, 0.0],
)

datapoint1 = {
    "study_id": None,
    "atoms": None
}

atoms = Atoms("C", positions=[[0.0, 1.0, 2.0]])
datapoint2 = {
    "study_id": "123456789",
    "atoms": atoms
}

pch.apply(datapoint1)
print(datapoint1["atoms"])
None

pch.apply(datapoint2)
print(datapoint2["atoms"].positions)
[0.0, 0.0, 0.0]

match_path and match_value specify which data the Patch applies to, while patch_path and patch_value specify what is applied.

Note

Patch applies to both dictionaries and objects alike!
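One way a path-based patch can treat dictionaries and objects uniformly is to fall back from key access to attribute access at every step of the path. The sketch below illustrates that idea only; the helpers resolve and apply_patch are hypothetical, and the actual matching rules of Patch.apply may differ.

```python
from types import SimpleNamespace
from typing import Any


def get_item(container: Any, name: str) -> Any:
    """Read a key from a dict or an attribute from any other object."""
    if isinstance(container, dict):
        return container[name]
    return getattr(container, name)


def set_item(container: Any, name: str, value: Any) -> None:
    """Write a key on a dict or an attribute on any other object."""
    if isinstance(container, dict):
        container[name] = value
    else:
        setattr(container, name, value)


def resolve(data: Any, path: list[str]) -> Any:
    """Follow a list of key/attribute names into nested data."""
    for name in path:
        data = get_item(data, name)
    return data


def apply_patch(data, match_path, match_value, patch_path, patch_value) -> bool:
    """Patch data in place if it matches; report whether it was modified."""
    try:
        if resolve(data, match_path) != match_value:
            return False
        parent = resolve(data, patch_path[:-1])
        set_item(parent, patch_path[-1], patch_value)
    except (KeyError, AttributeError, TypeError):
        # Assumption: a path that cannot be followed leaves data untouched.
        return False
    return True


as_dict = {"study_id": "123", "meta": {"label": None}}
as_object = SimpleNamespace(study_id="123", meta=SimpleNamespace(label=None))
unmatched = {"study_id": "999", "meta": {"label": None}}

results = [
    apply_patch(item, ["study_id"], "123", ["meta", "label"], "reviewed")
    for item in (as_dict, as_object, unmatched)
]
```

Returning a flag instead of raising keeps batch application simple, since data points that do not match are skipped silently.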

Example

Apply a set of patches in batch

from autojob.task import Task

tasks = [Task(...), Task(...), ...]
patches = [Patch(...), Patch(...), ...]

for task in tasks:
    for patch in patches:
        patch.apply(task)

class autojob.harvest.patch.Patch(match_path: list[str], match_value: Any, patch_path: list[str], patch_value: Any)

Bases: NamedTuple

A data patch.

Variables:
  • match_path (list[str]) – A list of attribute/key names leading to the value that is compared against match_value to decide whether the patch applies.

  • match_value (Any) – The value of the attribute/key that must match.

  • patch_path (list[str]) – A list of attribute/key names leading to the attribute/key to be patched.

  • patch_value (Any) – The value of the attribute/key to be set.

Create new instance of Patch(match_path, match_value, patch_path, patch_value)

apply(data: object) → None

Apply a patch to an object.

Parameters:

data – The data to which the patch will be applied. Note that this method may or may not modify data; if it does, it does so in place.

match_path: list[str]

Alias for field number 0

match_value: Any

Alias for field number 1

patch_path: list[str]

Alias for field number 2

patch_value: Any

Alias for field number 3

autojob.harvest.patch.build_metadata_patches(dir_name: Path, *, metadata_type: Literal['study_group', 'study', 'task_group'] = 'study_group', strict_mode: bool | None = None) → list[Patch]

Create patches from metadata files.

Parameters:
  • dir_name – The name of the directory under which to search for metadata. Defaults to the current working directory.

  • strict_mode – Whether or not to abort metadata collection if metadata cannot be found. Defaults to SETTINGS.STRICT_MODE.

  • metadata_type – The type of metadata file from which patches are to be built. Must be one of "study_group", "study", or "task_group", as given in the signature. Defaults to "study_group".

Returns:

A list of Patch objects which will add metadata to TaskMetadata.__pydantic_extra__. Further, patch paths are defined such that study group, study, and calculation metadata will be added under the "study_group_metadata", "study_metadata", and "calculation_metadata" keys, respectively.

Example

Patch study group and study metadata for all tasks in a subdirectory.

from pathlib import Path

from autojob.harvest.harvest import harvest
from autojob.harvest.patch import build_metadata_patches
from autojob.harvest.patch import patch_tasks

dir_name = Path.cwd()
tasks = harvest(dir_name)
patches = build_metadata_patches(dir_name)
patch_tasks(patches, tasks)

autojob.harvest.patch.patch_tasks(patches: list[Patch], tasks: list[TaskBase]) → None

Patch a list of tasks.

This method modifies tasks in place.

Parameters:
  • patches – The patches to apply.

  • tasks – The tasks to which the patches will be applied.