autojob.harvest package
Harvest the results of a completed task.
Submodules
autojob.harvest.archive module
I/O functions for the utility scripts (compiled here for consistency).
- autojob.harvest.archive.archive(filename: str, archive_mode: Literal['csv', 'json', 'both'], harvested: list[Calculation], *, dest: Path | None = None, time_stamp: bool = False) -> None
Archive completed calculations in the given format.
- Parameters:
filename – The filename (without extension) with which to archive the calculations.
archive_mode – The format with which to archive the calculations. Must be one of "csv", "json", or "both".
harvested – The list of calculations to archive.
dest – The directory in which to save the archives.
time_stamp – Whether or not to time stamp the archive file.
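archive_csv and archive_json (documented below) default to a file named database_<TIME_STAMP>.<ext>, with TIME_STAMP in ISO format. The sketch below illustrates that naming and one plausible way to write already-flattened rows to CSV using only the standard library. The helpers default_dest and write_rows_csv are hypothetical, not part of autojob, and using the union of all row keys as the header is an assumption rather than documented archive_csv behaviour.

```python
import csv
from datetime import datetime
from pathlib import Path
from typing import Any, Optional


def default_dest(extension: str) -> Path:
    # Hypothetical: mirrors the documented default "database_<TIME_STAMP>.<ext>",
    # where TIME_STAMP is the current time in ISO format.
    return Path(f"database_{datetime.now().isoformat()}.{extension}")


def write_rows_csv(rows: list[dict[str, Any]], dest: Optional[Path] = None) -> Path:
    """Write flat dictionaries to a CSV file (hypothetical helper)."""
    dest = dest or default_dest("csv")
    # Header = union of all keys in first-seen order, so rows with missing
    # fields still fit; DictWriter leaves the missing cells empty.
    fieldnames: list[str] = []
    for row in rows:
        for key in row:
            if key not in fieldnames:
                fieldnames.append(key)
    with dest.open("w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return dest
```

Note that datetime.isoformat() output contains colons, which are not permitted in Windows file names.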
- autojob.harvest.archive.archive_csv(calculations: list[Calculation], dest: Path | None = None) -> None
Archive a list of calculations in CSV format.
- Parameters:
calculations – A list of calculations to archive.
dest – The filename to use to archive the calculations. Defaults to "database_<TIME_STAMP>.csv", where TIME_STAMP is the current time in ISO format.
- autojob.harvest.archive.archive_json(tasks: list[TaskBase], dest: Path | None = None) -> None
Archive a list of tasks in JSON format.
- Parameters:
tasks – A list of tasks to archive.
dest – The filename to use to archive the tasks. Defaults to "database_<TIME_STAMP>.json", where TIME_STAMP is the current time in ISO format.
- autojob.harvest.archive.flatten_calculations(calculations: list[Calculation]) -> list[dict[str, Any]]
Flatten each calculation into a CSV-friendly format.
- Parameters:
calculations – The calculations to flatten.
- Returns:
A list of dictionaries mapping calculation fields (e.g., energy, forces, zpe_correction) to their values. The keys of nested dictionaries, such as the calculation parameters, are also accessible.
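A common way to make nested keys "accessible" in a flat row is to join the path of keys into a single column name. The recursive sketch below assumes a "." separator; flatten is a hypothetical helper, and the column names actually produced by flatten_calculations may differ.

```python
from typing import Any


def flatten(data: dict[str, Any], prefix: str = "", sep: str = ".") -> dict[str, Any]:
    """Recursively flatten nested dictionaries into one level (hypothetical helper)."""
    flat: dict[str, Any] = {}
    for key, value in data.items():
        name = f"{prefix}{sep}{key}" if prefix else str(key)
        if isinstance(value, dict):
            # Nested mappings (e.g., calculation parameters) become dotted keys
            flat.update(flatten(value, name, sep))
        else:
            flat[name] = value
    return flat


row = flatten({"energy": -1.2, "parameters": {"encut": 450, "kpts": [4, 4, 1]}})
print(row)  # {'energy': -1.2, 'parameters.encut': 450, 'parameters.kpts': [4, 4, 1]}
```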
autojob.harvest.cli module
CLI function for harvesting task results.
autojob.harvest.harvest module
Harvest data from the directories of completed tasks.
Example
Harvest the results in the current working directory as calculations:
from pathlib import Path
from autojob.harvest.harvest import harvest
# Harvest every task under the current directory, tolerating per-task errors
tasks = harvest(dir_name=Path.cwd(), strictness="relaxed", preferred="calculation")
Important
Always verify the units of harvested quantities.
- autojob.harvest.harvest.harvest(dir_name: str | Path, *, strictness: Literal['strict', 'relaxed', 'atomic'] | None = None, whitelists: list[str] | list[Path] | None = None, blacklists: list[str] | list[Path] | None = None, preferred: str | None = None, use_cache: bool = False) -> list[TaskBase]
Collect all data in subdirectories of the given directory.
- Parameters:
dir_name – The directory under which to collect data.
strictness – How to treat tasks for which errors are thrown during harvesting. If "strict", all harvesting will abort. If "atomic", only calculations for which no errors are thrown will be harvested. If "relaxed", every attempt will be made to harvest all calculations. The default behaviour is controlled by the value of SETTINGS.STRICT_MODE: if SETTINGS.STRICT_MODE=True, the default behaviour is that of strictness="strict"; otherwise, it is that of strictness="relaxed".
whitelists – A list of strings or paths representing whitelist filenames, where each whitelist points to a list of task IDs that should be harvested. When specified, only tasks with task IDs matching these IDs will be harvested. Defaults to None, in which case all tasks are eligible for harvesting.
blacklists – A list of strings or paths representing blacklist filenames, where each blacklist points to a list of task IDs that should not be harvested. When specified, no tasks with task IDs in these lists will be harvested. Defaults to None, in which case all tasks will be harvested.
preferred – The name of the preferred TaskBase type to use to harvest each calculation. Defaults to SETTINGS.DEFAULT_TASK.
use_cache – Whether or not to use cached results. If False, cached results will be ignored and overwritten. Otherwise, outputs will be read from an existing cache.
- Returns:
A list of TaskBase objects containing the data within dir_name.
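The strictness and whitelist/blacklist options above can be modelled with a small, self-contained function. This is a sketch of one reading of the documented semantics, not autojob's implementation: harvest_sketch and its readers mapping are hypothetical, the ID lists are passed as in-memory sets rather than read from files, and treating "relaxed" as "keep whatever fields could be read" is an assumption.

```python
from typing import Callable, Iterable, Literal, Optional

Strictness = Literal["strict", "relaxed", "atomic"]


def harvest_sketch(
    task_ids: Iterable[str],
    readers: dict[str, Callable[[str], object]],
    strictness: Strictness = "relaxed",
    whitelist: Optional[set[str]] = None,
    blacklist: Optional[set[str]] = None,
) -> list[dict[str, object]]:
    """Hypothetical model of the documented strictness and filter options."""
    results: list[dict[str, object]] = []
    for task_id in task_ids:
        # Whitelist: only listed IDs are eligible. Blacklist: listed IDs are skipped.
        if whitelist is not None and task_id not in whitelist:
            continue
        if blacklist is not None and task_id in blacklist:
            continue
        record: dict[str, object] = {"task_id": task_id}
        failed = False
        for field, read in readers.items():
            try:
                record[field] = read(task_id)
            except Exception:
                if strictness == "strict":
                    raise  # "strict": abort the whole harvest
                failed = True  # otherwise keep trying the remaining fields
        if failed and strictness == "atomic":
            continue  # "atomic": drop any task whose harvesting raised an error
        results.append(record)  # "relaxed": keep partially harvested tasks
    return results
```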
autojob.harvest.mechanism module
Associate calculations with a reaction mechanism.
- class autojob.harvest.mechanism.ElectrochemicalState(net_hydrogens: int = 0, net_electrons: int = 0, net_water_lost: int = 0, reference: str | None = None, stoich: float = 1.0)
Bases: NamedTuple
An electrochemical state along a reaction, relative to a reference.
- Variables:
net_hydrogens (int) – The net number of hydrogens transferred from the reference to the ElectrochemicalState. Defaults to 0.
net_electrons (int) – The net number of electrons transferred from the reference to the ElectrochemicalState. Defaults to 0.
net_water_lost (int) – The net number of water molecules lost from the reference to the ElectrochemicalState. Defaults to 0.
reference (str | None) – The reference state for the ElectrochemicalState. Defaults to None. If provided, reference can be used to relate ElectrochemicalStates defined for different references.
stoich (float) – The absolute value of the stoichiometric coefficient for the reference state. Defaults to 1.0.
Note
net_water_lost is defined opposite to how net_hydrogens and net_electrons are defined so that simple tuple comparisons can be used to order ElementarySteps according to typical electrochemical reactions. However, this ordering is not absolute, as there may be reaction mechanisms in which electron transfer precedes proton transfer.
Create new instance of ElectrochemicalState(net_hydrogens, net_electrons, net_water_lost, reference, stoich)
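To illustrate the tuple ordering described in the note, the sketch below uses a hypothetical stand-in, State, that mirrors the field order of ElectrochemicalState (it is not autojob's class). It sorts the intermediates of the associative four-electron oxygen reduction pathway, each expressed relative to adsorbed O2 (O2*).

```python
from typing import NamedTuple, Optional


class State(NamedTuple):
    # Hypothetical stand-in with the same field order as ElectrochemicalState
    net_hydrogens: int = 0
    net_electrons: int = 0
    net_water_lost: int = 0
    reference: Optional[str] = None
    stoich: float = 1.0


# O2* -> OOH* -> O* (+ H2O) -> OH*, each relative to O2*
steps = {
    "OH*": State(3, 3, 1),
    "O2*": State(0, 0, 0),
    "O*": State(2, 2, 1),
    "OOH*": State(1, 1, 0),
}
# Plain tuple comparison orders the states along the reaction
ordered = sorted(steps, key=steps.__getitem__)
print(ordered)  # ['O2*', 'OOH*', 'O*', 'OH*']
```

Because water is released as the reduction proceeds, counting it as water lost makes that field increase along the reaction, in the same direction as net_hydrogens and net_electrons.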
- apply_comp_hydrogen_model(energy_h2: float, energy_h2o: float, *, initial: float, final: float, applied_potential: float = 0.0) -> float
Calculate an energy using the computational hydrogen electrode (CHE).
This method follows the formalism outlined in:
J. K. Nørskov, J. Rossmeisl, A. Logadottir, L. Lindqvist, J. R. Kitchin, T. Bligaard, and H. Jónsson, The Journal of Physical Chemistry B, 2004, 108 (46), 17886-17892. DOI: 10.1021/jp047349j
- Parameters:
energy_h2 – The energy of gas phase hydrogen to use for the calculation.
energy_h2o – The energy of gas phase water to use for the calculation.
initial – The energy of the reference state to use for the elementary step. For catalyzed reactions, this could include the catalyst energy.
final – The energy of the final state of the elementary step.
applied_potential – The applied potential in volts.
- Returns:
The free energy of the species resulting from the given elementary step under the CHE formalism.
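Under the CHE, the free energy of a proton-electron pair at potential U versus the reversible hydrogen electrode is taken as G(H+ + e-) = ½G(H2) − eU. The sketch below applies that relation to a step of the form reference + n(H+ + e-) → final + n_water H2O. che_reaction_energy and its n_pairs and n_water parameters are hypothetical. The sketch assumes coupled proton-electron transfer, uses raw energies without the zero-point and entropy corrections the cited formalism normally adds, and returns a reaction energy relative to the reference, which is not necessarily the quantity apply_comp_hydrogen_model returns.

```python
def che_reaction_energy(
    energy_h2: float,
    energy_h2o: float,
    *,
    initial: float,
    final: float,
    n_pairs: int,
    n_water: int = 0,
    applied_potential: float = 0.0,
) -> float:
    """Energy (eV) of: reference + n_pairs (H+ + e-) -> final + n_water H2O.

    Hypothetical helper. CHE: G(H+ + e-) = 0.5 * G(H2) - eU, U in volts vs. RHE.
    """
    # Energy of the consumed proton-electron pairs at U = 0 V vs. RHE
    pairs_at_zero = n_pairs * 0.5 * energy_h2
    # Each consumed pair contributes -(-eU) = +eU, i.e. n_pairs * U in eV
    potential_shift = n_pairs * applied_potential
    return final + n_water * energy_h2o - initial - pairs_at_zero + potential_shift
```

With this sign convention, raising U makes reductive steps (n_pairs > 0) less favourable, which underlies the limiting-potential analysis in the cited paper.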
- class autojob.harvest.mechanism.MechanisticEntry(elementary_step: ElectrochemicalState, name: str, energy: float)
Bases: NamedTuple
Calculated thermodynamic data for an elementary step.
- Variables:
elementary_step (autojob.harvest.mechanism.ElectrochemicalState) – The ElementaryStep with which the MechanisticEntry is associated.
name (str) – A string labeling the entry (e.g., the catalyst or molecule name).
energy (float) – A float representing the calculated energy for the entry.
Create new instance of MechanisticEntry(elementary_step, name, energy)
- elementary_step: ElectrochemicalState
Alias for field number 0
- autojob.harvest.mechanism.aggregate_mechanism_data(all_data: list[dict[str, Any]]) -> list[list[dict[str, Any]]]
Group all data deemed to belong to the same mechanism entry.
Entries are grouped together when the calculation IDs of other entries appear in their "Calculation Notes" value.
- Parameters:
all_data – A list of dictionaries containing the data. Each dictionary must contain the keys "Calculation Notes" and "Job Notes".
- Returns:
A list of lists of dictionaries grouped by mechanism entry.
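Grouping entries whose notes mention each other is a connected-components problem, so a union-find structure handles chains of references (A mentions B, B mentions C) transitively. In the sketch below, group_by_notes is hypothetical, and the "Calculation ID" key is an assumption (the page only documents the "Calculation Notes" and "Job Notes" keys).

```python
from typing import Any


def group_by_notes(
    all_data: list[dict[str, Any]], id_key: str = "Calculation ID"
) -> list[list[dict[str, Any]]]:
    """Group entries whose "Calculation Notes" mention other entries' IDs (hypothetical)."""
    parent = list(range(len(all_data)))

    def find(i: int) -> int:
        # Union-find root lookup with path halving
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    index_by_id = {entry[id_key]: i for i, entry in enumerate(all_data)}
    for i, entry in enumerate(all_data):
        notes = entry.get("Calculation Notes") or ""
        for other_id, j in index_by_id.items():
            if j != i and other_id in notes:
                parent[find(i)] = find(j)  # merge the two groups

    groups: dict[int, list[dict[str, Any]]] = {}
    for i, entry in enumerate(all_data):
        groups.setdefault(find(i), []).append(entry)
    return list(groups.values())
```

Plain substring matching is a simplification: an ID that is a substring of another ID would produce false matches, so real IDs may need token-level comparison.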
- autojob.harvest.mechanism.build_job_graph(all_data: list[dict[str, Any]]) -> dict[str, tuple[str | None, str | None]]
Build a directed acyclic graph representing the connectivity of the jobs.
- Parameters:
all_data – A list of dictionaries representing job data. Job connectivity is determined based on the "Job Notes" key.
- Returns:
A dictionary mapping jobs to their ancestor (job, calculation). If the ancestor of a job cannot be found, its ancestor is (None, None).
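The return shape above (each job mapped to an ancestor pair, or (None, None)) can be sketched as follows. build_graph_sketch is hypothetical; the "Job ID" and "Calculation ID" keys are assumptions, each job is assumed to have at most one ancestor (the first job ID found in its "Job Notes"), and acyclicity is not checked.

```python
from typing import Any, Optional

Ancestor = tuple[Optional[str], Optional[str]]


def build_graph_sketch(all_data: list[dict[str, Any]]) -> dict[str, Ancestor]:
    """Map each job ID to its (ancestor job, ancestor calculation) (hypothetical)."""
    calc_of_job = {d["Job ID"]: d["Calculation ID"] for d in all_data}
    graph: dict[str, Ancestor] = {}
    for d in all_data:
        notes = d.get("Job Notes") or ""
        # First other job whose ID appears in this job's notes, if any
        ancestor = next(
            (j for j in calc_of_job if j != d["Job ID"] and j in notes), None
        )
        if ancestor is None:
            graph[d["Job ID"]] = (None, None)
        else:
            graph[d["Job ID"]] = (ancestor, calc_of_job[ancestor])
    return graph
```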
autojob.harvest.patch module
Supplement harvested data with data patches.
Oftentimes, you may have additional data that you can neither determine a priori (and thus mark the task with prior to submission) nor extract programmatically (e.g., analyses that require fuzzy intuition), but nonetheless want to store with your data. This module defines some simple routines and classes to facilitate this use case.
A Patch is just that: a "patch" that fills in gaps that may exist in the data. To define one, you specify the feature of the data to which it should be applied and the data that should be added when it is applied.
from ase import Atoms
from autojob.harvest.patch import Patch
# Set the positions of any datapoint whose "study_id" is "123456789"
pch = Patch(match_path=["study_id"],
            match_value="123456789",
            patch_path=["atoms", "positions"],
            patch_value=[0.0, 0.0, 0.0]
            )
# datapoint1 does not match (and has no Atoms to patch)
datapoint1 = {
    "study_id": None,
    "atoms": None
}
atoms = Atoms("C", positions=[[0.0, 1.0, 2.0]])
# datapoint2 matches, so its atomic positions are overwritten
datapoint2 = {
    "study_id": "123456789",
    "atoms": atoms
}
pch.apply(datapoint1)
print(datapoint1["atoms"])
# None
pch.apply(datapoint2)
print(datapoint2["atoms"].positions)
# [[0. 0. 0.]]
The data to which a Patch applies is specified by match_path and match_value, while what is applied is specified by patch_path and patch_value.
Note
Patch applies to both dictionaries and objects alike!
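Example
Apply a set of patches in batch:
from autojob.harvest.patch import Patch
from autojob.task import Task
tasks = [Task(...), Task(...), ...]
patches = [Patch(...), Patch(...), ...]
# Apply every patch to every task; a patch whose match fails leaves the task unchanged
for task in tasks:
    for patch in patches:
        patch.apply(task)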
- class autojob.harvest.patch.Patch(match_path: list[str], match_value: Any, patch_path: list[str], patch_value: Any)
Bases: NamedTuple
A data patch.
- Variables:
match_path (list[str]) – A list of attribute/key names locating the value that must equal match_value for the patch to be applied.
match_value (Any) – The value of the attribute/key that must match.
patch_path (list[str]) – A list of attribute/key names locating the attribute/key to be patched.
patch_value (Any) – The value to which the patched attribute/key is set.
Create new instance of Patch(match_path, match_value, patch_path, patch_value)
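To make the match/patch semantics concrete, the dependency-free sketch below walks match_path and patch_path across both dictionaries (by key) and other objects (by attribute). MiniPatch, _get and _set are hypothetical names; this is not autojob's implementation, and returning a bool from apply is an addition for illustration.

```python
from typing import Any, NamedTuple


def _get(obj: Any, name: str) -> Any:
    # Dictionaries are indexed by key; any other object by attribute
    return obj.get(name) if isinstance(obj, dict) else getattr(obj, name, None)


def _set(obj: Any, name: str, value: Any) -> None:
    if isinstance(obj, dict):
        obj[name] = value
    else:
        setattr(obj, name, value)


class MiniPatch(NamedTuple):
    """Hypothetical re-implementation of the documented Patch semantics."""

    match_path: list[str]
    match_value: Any
    patch_path: list[str]
    patch_value: Any

    def apply(self, target: Any) -> bool:
        # Follow match_path and compare the value found against match_value
        current = target
        for name in self.match_path:
            current = _get(current, name)
            if current is None:
                break
        if current != self.match_value:
            return False
        # Follow all but the last element of patch_path, then set the final key/attribute
        parent = target
        for name in self.patch_path[:-1]:
            parent = _get(parent, name)
            if parent is None:
                return False  # nothing to patch (e.g., "atoms" is None)
        _set(parent, self.patch_path[-1], self.patch_value)
        return True
```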
- autojob.harvest.patch.build_metadata_patches(dir_name: Path, *, metadata_type: Literal['study_group', 'study', 'task_group'] = 'study_group', strict_mode: bool | None = None) -> list[Patch]
Create patches from metadata files.
- Parameters:
dir_name – The name of the directory under which to search for metadata. Defaults to the current working directory.
strict_mode – Whether or not to abort metadata collection if metadata cannot be found. Defaults to SETTINGS.STRICT_MODE.
metadata_type – The type of metadata file from which patches are to be built. Must be one of "study_group", "study", or "calculation". Defaults to "study_group".
- Returns:
A list of Patch objects which will add metadata to TaskMetadata.__pydantic_extra__. Further, patch paths are defined such that study group, study, and calculation metadata will be added under the "study_group_metadata", "study_metadata", and "calculation_metadata" keys, respectively.
Example
Patch study group and study metadata for all tasks in a subdirectory.
from pathlib import Path
from autojob.harvest.harvest import harvest
from autojob.harvest.patch import build_metadata_patches
from autojob.harvest.patch import patch_tasks
dir_name = Path.cwd()
tasks = harvest(dir_name)
patches = build_metadata_patches(dir_name)
patch_tasks(patches, tasks)