autojob.harvest package¶
Harvest the results of a completed task.
Submodules¶
autojob.harvest.archive module¶
I/O functions for the utility scripts (compiled here for consistency).
- autojob.harvest.archive.archive(filename: str, archive_mode: Literal['csv', 'json', 'both'], harvested: list[Calculation], *, dest: Path | None = None) None[source]¶
Archive completed calculations with the given format.
- Parameters:
filename – The filename with which to archive the calculations.
archive_mode – The format with which to archive the calculations. Must be one of "csv", "json", or "both".
harvested – The list of calculations to archive.
dest – The directory in which to save the archives.
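The following is a minimal sketch of how the three archive modes could be dispatched, using only the standard library. It is not the package's implementation: `default_name` and `write_records` are hypothetical helpers, the records are assumed to be already-flattened dictionaries, and the timestamped names follow the "database_<TIME_STAMP>" pattern documented for archive_csv and archive_json.

```python
import csv
import json
from datetime import datetime
from pathlib import Path
from typing import Any


def default_name(extension: str) -> str:
    """Build a name like database_<TIME_STAMP>.csv with an ISO-format timestamp."""
    # Note: ISO timestamps contain colons, which some filesystems reject.
    return f"database_{datetime.now().isoformat()}.{extension}"


def write_records(
    records: list[dict[str, Any]], mode: str, dest_dir: Path
) -> list[Path]:
    """Write flat records as CSV, JSON, or both and return the files written."""
    if mode not in ("csv", "json", "both"):
        raise ValueError(f"Unsupported archive mode: {mode!r}")
    written = []
    if mode in ("csv", "both"):
        path = dest_dir / default_name("csv")
        # Use the union of all keys, in first-seen order, as the CSV header.
        fields = list(dict.fromkeys(key for record in records for key in record))
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.DictWriter(handle, fieldnames=fields)
            writer.writeheader()
            writer.writerows(records)
        written.append(path)
    if mode in ("json", "both"):
        path = dest_dir / default_name("json")
        path.write_text(json.dumps(records, indent=2), encoding="utf-8")
        written.append(path)
    return written
```

Records that lack a field are written with an empty CSV cell, which is the default behaviour of `csv.DictWriter`.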
- autojob.harvest.archive.archive_csv(calculations: list[Calculation], dest: Path | None = None) None[source]¶
Archive a list of calculations in CSV format.
- Parameters:
calculations – A list of calculations to archive.
dest – The filename to use to archive the calculations. Defaults to "database_<TIME_STAMP>.csv", where TIME_STAMP is the current time in ISO format.
- autojob.harvest.archive.archive_json(calculations: list[Calculation], dest: Path | None = None) None[source]¶
Archive a list of calculations in JSON format.
- Parameters:
calculations – A list of calculations to archive.
dest – The filename to use to archive the calculations. Defaults to "database_<TIME_STAMP>.json", where TIME_STAMP is the current time in ISO format.
- autojob.harvest.archive.flatten_calculations(calculations: list[Calculation]) list[dict[str, Any]][source]¶
Flatten each calculation into a CSV-friendly format.
- Parameters:
calculations – The calculations to flatten.
- Returns:
A list of dictionaries mapping calculation fields (e.g., energy, forces, zpe_correction) to their values. The keys of nested dictionaries, such as the calculation parameters, are also accessible.
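As an illustration of what a CSV-friendly flattening can look like, the sketch below collapses nested dictionaries into a single level. The `flatten` helper and the dotted key names (for example "parameters.encut") are assumptions made for this example; the key naming actually used by flatten_calculations is not specified here.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Collapse nested dictionaries into one level with dotted key names."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            # Recurse so that the keys of nested dictionaries stay accessible.
            flat.update(flatten(value, prefix=f"{name}."))
        else:
            flat[name] = value
    return flat


calculation = {"energy": -1.5, "parameters": {"encut": 450, "kpts": [3, 3, 1]}}
row = flatten(calculation)
```

Each resulting dictionary has only scalar or list values, so it can be passed directly to a row-oriented writer.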
autojob.harvest.cli module¶
CLI function for harvesting task results.
autojob.harvest.harvest module¶
Harvest data from the directories of completed calculations.
Example
Harvest the results in the current working directory as vibrational calculations
from pathlib import Path
from autojob.calculation.vibration import Vibration
from autojob.harvest.harvest import harvest
harvest(dir_name=Path.cwd(), strictness="relaxed", preferred=Vibration)
Important
Always verify the units of harvested quantities.
- autojob.harvest.harvest.harvest(dir_name: str | Path, *, strictness: Literal['strict', 'relaxed', 'atomic'] | None = None, whitelists: list[str] | list[Path] | None = None, blacklists: list[str] | list[Path] | None = None, preferred: type[Task] | None = None) list[Task][source]¶
Collect all data in subdirectories of the given directory.
- Parameters:
dir_name – The directory under which to collect data.
strictness – How to treat tasks for which errors are thrown during harvesting. If "strict", all harvesting will abort. If "atomic", only calculations for which errors are not thrown will be harvested. If "relaxed", every attempt will be made to harvest all calculations. The default behaviour is controlled by the value of SETTINGS.STRICT_MODE. If SETTINGS.STRICT_MODE=True, the default behaviour will be that of strictness="strict". Otherwise, the default behaviour will be that of strictness="relaxed".
whitelists – A list of strings or paths representing whitelist filenames, where each whitelist points to a list of task IDs that should be harvested. When specified, only tasks with task IDs matching these IDs will be harvested. Defaults to None, in which case all tasks are eligible for harvesting.
blacklists – A list of strings or paths representing blacklist filenames, where each blacklist points to a list of task IDs that should not be harvested. When specified, no tasks with task IDs in these lists will be harvested. Defaults to None, in which case all tasks will be harvested.
preferred – A preferred Task type to use to harvest each calculation. Defaults to
autojob.task.Task.
- Returns:
A list of Tasks containing the data within dir_name.
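The whitelist and blacklist semantics described above can be sketched as a simple filter over task IDs. This is an illustration only: `is_eligible` and `select` are hypothetical helpers, the IDs are passed as in-memory sets rather than read from files (the file format is not described here), and letting the blacklist take precedence over the whitelist is an assumption.

```python
from __future__ import annotations


def is_eligible(
    task_id: str, whitelist: set[str] | None, blacklist: set[str] | None
) -> bool:
    """Return True if a task ID passes both filters."""
    # With a whitelist, only listed IDs are eligible.
    if whitelist is not None and task_id not in whitelist:
        return False
    # With a blacklist, listed IDs are excluded (assumed to win over the whitelist).
    return blacklist is None or task_id not in blacklist


def select(
    task_ids: list[str],
    whitelist: set[str] | None = None,
    blacklist: set[str] | None = None,
) -> list[str]:
    """Keep the eligible task IDs in their original order."""
    return [tid for tid in task_ids if is_eligible(tid, whitelist, blacklist)]
```

With both filters left as None, every task ID is kept, matching the documented defaults.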
autojob.harvest.mechanism module¶
Associate calculations with a reaction mechanism.
- class autojob.harvest.mechanism.ElementaryStep(net_hydrogens: int = 0, net_electrons: int = 0, net_water_lost: int = 0, reference: str | None = None)[source]¶
Bases: NamedTuple
An elementary reaction step relative to a reference.
- Variables:
net_hydrogens (int) – The net number of hydrogens transferred from the reference to the ElementaryStep. Defaults to 0.
net_electrons (int) – The net number of electrons transferred from the reference to the ElementaryStep. Defaults to 0.
net_water_lost (int) – The net number of water molecules lost from the reference to the ElementaryStep. Defaults to 0.
reference (str | None) – The reference state for the ElementaryStep. Defaults to None. If provided, reference can be used to relate ElementarySteps defined for different references.
Note
net_water_lost is defined opposite to how net_hydrogens and net_electrons are defined so that simple tuple comparisons can be used to order ElementarySteps according to typical electrochemical reactions. However, this ordering is not absolute, as there may be reaction mechanisms for which electron transfer precedes proton transfer.
Create new instance of ElementaryStep(net_hydrogens, net_electrons, net_water_lost, reference)
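The tuple-comparison ordering mentioned in the note can be illustrated with a stand-in class. `Step` below is a hypothetical NamedTuple that only mirrors the documented fields; it is not the library's ElementaryStep, and the example values are made up.

```python
from typing import NamedTuple, Optional


class Step(NamedTuple):
    """Stand-in with the same fields and defaults as documented above."""

    net_hydrogens: int = 0
    net_electrons: int = 0
    net_water_lost: int = 0
    reference: Optional[str] = None


steps = [
    Step(2, 2, 1, "CO2"),
    Step(0, 0, 0, "CO2"),
    Step(2, 2, 0, "CO2"),
    Step(1, 1, 0, "CO2"),
]
# Tuples compare field by field, so sorting orders by hydrogens, then
# electrons, then water lost. All references are equal here; comparing a
# None reference with a string would raise TypeError if earlier fields tie.
ordered = sorted(steps)
```

Because net_water_lost counts losses rather than gains, a step that has released water sorts after the otherwise identical step that has not.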
- apply_comp_hydrogen_model(energy_h2: float, energy_h2o: float, *, initial: float, final: float, applied_potential: float = 0.0) float[source]¶
Calculate an energy using the computational hydrogen electrode (CHE).
This method follows the formalism outlined in:
J. K. Nørskov, J. Rossmeisl, A. Logadottir, L. Lindqvist, J. R. Kitchin, T. Bligaard, and H. Jónsson The Journal of Physical Chemistry B 2004 108 (46), 17886-17892 DOI: 10.1021/jp047349j
- Parameters:
energy_h2 – The energy of gas phase hydrogen to use for the calculation.
energy_h2o – The energy of gas phase water to use for the calculation.
initial – The energy of the reference state to use for the calculation. This should be the energy of the species identified by autojob.harvest.mechanism.ElementaryStep.reference.
final – The energy of the final state to use for the calculation.
applied_potential – The applied potential in Volts.
- Returns:
The energy under the CHE formalism.
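For orientation, the sketch below evaluates a reaction energy in the usual CHE form, where each transferred hydrogen is referenced to half a gas phase H2 molecule and each transferred electron shifts the energy by the applied potential. The standalone function `che_energy`, its sign conventions, and the units (energies in eV, potential in V) are assumptions for illustration and may differ from what apply_comp_hydrogen_model does.

```python
def che_energy(
    energy_h2: float,
    energy_h2o: float,
    *,
    initial: float,
    final: float,
    net_hydrogens: int = 0,
    net_electrons: int = 0,
    net_water_lost: int = 0,
    applied_potential: float = 0.0,
) -> float:
    """Reaction energy from the reference to the final state (assumed eV)."""
    return (
        final
        + net_water_lost * energy_h2o  # water released alongside the final state
        - initial
        - 0.5 * net_hydrogens * energy_h2  # hydrogens referenced to 1/2 H2
        + net_electrons * applied_potential  # e*U per electron consumed
    )


# Made-up energies for a single proton-electron transfer.
step_energy = che_energy(
    -6.8, -14.2, initial=-100.0, final=-103.5, net_hydrogens=1, net_electrons=1
)
```

Under these assumptions, a more negative applied potential lowers the energy of a step that consumes electrons.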
- class autojob.harvest.mechanism.MechanisticEntry(elementary_step: ElementaryStep, name: str, energy: float)[source]¶
Bases: NamedTuple
Calculated thermodynamic data for an elementary step.
- Variables:
elementary_step (autojob.harvest.mechanism.ElementaryStep) – The ElementaryStep with which the MechanisticEntry is associated.
name (str) – A string labeling the entry. For example, the catalyst or molecule name.
energy (float) – A float representing the calculated energy for the entry.
Create new instance of MechanisticEntry(elementary_step, name, energy)
- elementary_step: ElementaryStep¶
Alias for field number 0
- autojob.harvest.mechanism.aggregate_mechanism_data(all_data: list[dict[str, Any]]) list[list[dict[str, Any]]][source]¶
Group all data deemed to belong to the same mechanism entry.
Entries are grouped together when the ID of another calculation appears in their “Calculation Notes” key.
- Parameters:
all_data – A list of dictionaries containing the data. Each dictionary must contain the following keys:
- “Calculation Notes”
- “Job Notes”
- Returns:
A list of lists of dictionaries grouped by mechanism entry.
- autojob.harvest.mechanism.build_job_graph(all_data: list[dict[str, Any]]) dict[str, tuple[str | None, str | None]][source]¶
Build a directed acyclic graph representing the connectivity of the jobs.
- Parameters:
all_data – A list of dictionaries representing job data. Job connectivity is determined based on the “Job Notes” key.
- Returns:
A dictionary mapping jobs to their ancestor (job, calculation). If the ancestor of a job cannot be found, its ancestor is (None, None).
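Given a mapping of the shape returned here, the ancestry of a job can be traced by following ancestor links until (None, None) is reached. The `lineage` helper and the job and calculation IDs below are hypothetical; only the dictionary shape is taken from the description above.

```python
from typing import Optional

JobGraph = dict[str, tuple[Optional[str], Optional[str]]]


def lineage(graph: JobGraph, job: str) -> list[str]:
    """Return the job followed by its ancestors, nearest first."""
    chain = [job]
    seen = {job}
    while True:
        parent, _calculation = graph.get(chain[-1], (None, None))
        # Stop at a root, or defensively if the data contains a cycle.
        if parent is None or parent in seen:
            break
        chain.append(parent)
        seen.add(parent)
    return chain


graph: JobGraph = {
    "j3": ("j2", "c2"),
    "j2": ("j1", "c1"),
    "j1": (None, None),
}
```

Jobs missing from the mapping are treated like roots, so the walk always terminates.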
autojob.harvest.patch module¶
Supplement harvested data with data patches.
Oftentimes, you may have additional data that you either
- can’t determine a priori (and thus can’t mark the task with it prior to submission), or
- can’t extract programmatically (these may be analyses that require fuzzy intuition),
but nonetheless want to store with your data. This module defines some simple routines and classes to facilitate this use case.
A Patch is just that, a “patch”: it fills in gaps that may exist in the data.
To define one, you specify a feature of the data
to which it should be applied and what data should be added when it is
applied.
from ase import Atoms
from autojob.harvest.patch import Patch
pch = Patch(
    match_path=["study_id"],
    match_value="123456789",
    patch_path=["atoms", "positions"],
    patch_value=[0.0, 0.0, 0.0],
)
datapoint1 = {
"study_id": None,
"atoms": None
}
atoms = Atoms("C", positions=[[0.0, 1.0, 2.0]])
datapoint2 = {
"study_id": None,
"atoms": atoms
}
pch.apply(datapoint1)
print(datapoint1["atoms"])
None
pch.apply(datapoint2)
print(datapoint2["atoms"].positions)
[0.0, 0.0, 0.0]
The data to which the Patch applies is specified by match_path and
match_value, while what is applied is specified by patch_path and
patch_value.
Note
Patch applies to both dictionaries and objects alike!
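To make the matching and patching steps concrete, here is a self-contained stand-in that walks a path through dictionaries and objects alike. `SketchPatch` and its helpers are hypothetical and written only for illustration. The boolean return value, the requirement that patch_path be non-empty, and the choice to skip data whose path cannot be followed are assumptions, not documented behaviour of Patch.

```python
from typing import Any, NamedTuple

_MISSING = object()


def _get(container: Any, name: str) -> Any:
    """Look up a key on a dictionary or an attribute on any other object."""
    if isinstance(container, dict):
        return container.get(name, _MISSING)
    return getattr(container, name, _MISSING)


def _resolve(data: Any, path: list[str]) -> Any:
    """Follow a path of names; return _MISSING if any step fails."""
    current = data
    for name in path:
        current = _get(current, name)
        if current is _MISSING:
            return _MISSING
    return current


class SketchPatch(NamedTuple):
    match_path: list[str]
    match_value: Any
    patch_path: list[str]
    patch_value: Any

    def apply(self, data: Any) -> bool:
        """Patch the data in place if it matches; report whether it was patched."""
        if _resolve(data, self.match_path) != self.match_value:
            return False
        parent = _resolve(data, self.patch_path[:-1])
        if parent is _MISSING or parent is None:
            return False  # nothing to attach the patched value to
        leaf = self.patch_path[-1]
        if isinstance(parent, dict):
            parent[leaf] = self.patch_value
        else:
            setattr(parent, leaf, self.patch_value)
        return True
```

In this sketch, a datapoint whose "atoms" entry is None is left untouched because there is no object on which to set "positions".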
Example
Apply a set of patches in batch
from autojob.harvest.patch import Patch
from autojob.task import Task
tasks = [Task(...), Task(...), ...]
patches = [Patch(...), Patch(...), ...]
for task in tasks:
    for patch in patches:
        patch.apply(task)
- class autojob.harvest.patch.Patch(match_path: list[str], match_value: Any, patch_path: list[str], patch_value: Any)[source]¶
Bases: NamedTuple
A data patch.
- Variables:
match_path (list[str]) – A list of attribute/key names used to identify which attributes are to be patched by the patch.
match_value (Any) – The value of the attribute/key that must match.
patch_path (list[str]) – A list of attribute/key names locating the attribute/key to be patched.
patch_value (Any) – The value of the attribute/key to be set.
Create new instance of Patch(match_path, match_value, patch_path, patch_value)
- autojob.harvest.patch.build_metadata_patches(dir_name: Path, *, metadata_type: Literal['study_group', 'study', 'calculation'] = 'study_group', strict_mode: bool = True, legacy_mode: bool = False) list[Patch][source]¶
Create patches from metadata files.
- Parameters:
dir_name – The name of the directory under which to search for metadata. Defaults to the current working directory.
strict_mode – Whether or not to abort metadata collection if metadata cannot be found. Defaults to SETTINGS.STRICT_MODE.
metadata_type – The type of metadata file from which patches are to be built. Must be one of "study_group", "study", or "calculation". Defaults to "study_group".
legacy_mode – Whether or not to assume the legacy format for the directory. Defaults to False.
- Returns:
A list of Patch objects which will add metadata to TaskMetadata.__pydantic_extra__. Further, patch paths are defined such that study group, study, and calculation metadata will be added under the "study_group_metadata", "study_metadata", and "calculation_metadata" keys, respectively.
Example
Patch study group and study metadata for all tasks in a subdirectory.
from pathlib import Path
from autojob.harvest.harvest import harvest
from autojob.harvest.patch import build_metadata_patches
from autojob.harvest.patch import patch_tasks
dir_name = Path.cwd()
tasks = harvest(dir_name)
patches = build_metadata_patches(dir_name)
patch_tasks(patches, tasks)
autojob.harvest.structure module¶
Extract structural metadata.
- autojob.harvest.structure.load_structural_data(dir_name: str | Path) dict[str, Any][source]¶
Extract the structural data from the input structure of the directory.
- Parameters:
dir_name – The directory of the completed calculation.
- Returns:
The structural data. The following keys are guaranteed to be present in the returned dictionary:
- "Structure"
- "Base Structure": only assigned if extracted from adsorbate complex name
- "Adsorbate"
- "Site"
- "Orientation"
If no data is found, every value will be None.
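Because every key is always present, callers can test for missing data by checking values rather than keys. The sketch below builds the all-None dictionary described above and checks whether anything was found; `STRUCTURAL_KEYS` and `has_structural_data` are hypothetical names used only for this example.

```python
from typing import Any

STRUCTURAL_KEYS = ("Structure", "Base Structure", "Adsorbate", "Site", "Orientation")


def has_structural_data(data: dict[str, Any]) -> bool:
    """Return True if at least one guaranteed key holds a value."""
    return any(data.get(key) is not None for key in STRUCTURAL_KEYS)


# The shape described for the case in which no data is found.
empty = dict.fromkeys(STRUCTURAL_KEYS)
```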