How autojob structures directories 💭

Because of autojob's commitment to file-based persistence, it imposes certain requirements on the structure of the directories it works with. autojob supports both structured and unstructured directories. The type of directory that autojob finds determines how tasks are harvested and whether workflows can be run.

Structured Directories

The first and more complete directory structure is outlined below:

study_group/
├─ study_group.json
└─ study/
   ├─ study.json
   ├─ parameterizations.json
   ├─ record.txt
   ├─ workflow.json
   └─ task_group/
      ├─ task_group.json
      └─ task/
         ├─ task.json
         └─ inputs.json

Each study group is confined to a single directory. As the tree shows, this directory structure mirrors autojob's intrinsic data hierarchy: study groups contain studies, studies contain task groups, and task groups contain tasks. Structured directories are created by StudyGroup.to_directory() and Study.to_directory() and are populated by functions such as create_task_group_tree(), create_task_tree(), and Task.write_inputs(). The purpose of each file is described below:

study_group.json

A JSON dictionary containing study group metadata such as the study group ID, the study IDs of its constituent studies, and the name of the study group. The file is a JSON-serialized version of an instance of StudyGroup whose creation can be replicated with:

import json
from pathlib import Path

from autojob import SETTINGS
from autojob.study_group import StudyGroup

sg = StudyGroup()
metadata_file = SETTINGS.STUDY_GROUP_METADATA_FILE

with Path(metadata_file).open(mode="w", encoding="utf-8") as f:
    json.dump(sg.model_dump(mode="json"), f)

study.json

A JSON dictionary containing study metadata such as the study group ID, the study ID, the task group IDs of its constituent task groups, and the name of the study. The file is a JSON-serialized version of an instance of Study whose creation can be replicated with:

import json
from pathlib import Path

from autojob import SETTINGS
from autojob.study import Study

study = Study()
metadata_file = SETTINGS.STUDY_METADATA_FILE

with Path(metadata_file).open(mode="w", encoding="utf-8") as f:
    json.dump(study.model_dump(mode="json"), f)

parameterizations.json

A JSON dictionary mapping a workflow step ID to a Step (not implemented).

record.txt

A text file in which each line lists the task ID of a completed task (not implemented).
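
Because record.txt is not implemented yet, the following is only a hedged sketch of reading and appending to a file in the described format, one task ID per line. The helper names read_completed_task_ids and record_completed are hypothetical and are not part of autojob.

```python
from pathlib import Path


def read_completed_task_ids(record_file: Path) -> set[str]:
    """Return the task IDs listed in a record.txt-style file.

    Assumes one task ID per line. Blank lines are ignored, and a missing
    file is treated as "no task has completed yet".
    """
    if not record_file.exists():
        return set()
    lines = record_file.read_text(encoding="utf-8").splitlines()
    return {line.strip() for line in lines if line.strip()}


def record_completed(record_file: Path, task_id: str) -> None:
    """Append one completed task ID on its own line (hypothetical helper)."""
    with record_file.open(mode="a", encoding="utf-8") as f:
        f.write(f"{task_id}\n")
```

Appending one line per task keeps the file valid even if a run is interrupted between two writes, which suits file-based persistence.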

workflow.json

A JSON dictionary mapping a workflow step ID to a list of workflow step IDs. Together, these entries form a directed acyclic graph representing the study's workflow (not implemented).
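
workflow.json is not implemented yet, so this is only a sketch of how a graph in the described format could be put into run order. It assumes that each listed step ID is a step that runs after the key step (a successor), a direction the description above does not pin down, and it uses the standard library's graphlib.TopologicalSorter (Python 3.9+). The step names in the example are hypothetical.

```python
from graphlib import TopologicalSorter


def execution_order(workflow: dict[str, list[str]]) -> list[str]:
    """Return the step IDs of a workflow in a dependency-respecting order.

    Assumes the mapping is step ID -> successor step IDs. graphlib expects
    node -> predecessors, so the edges are inverted first.
    """
    predecessors: dict[str, set[str]] = {step: set() for step in workflow}
    for step, successors in workflow.items():
        for successor in successors:
            predecessors.setdefault(successor, set()).add(step)
    # static_order() raises graphlib.CycleError if the graph has a cycle.
    return list(TopologicalSorter(predecessors).static_order())


# Hypothetical workflow: one relaxation step feeding two follow-up steps.
workflow = {"relax": ["vib", "dos"], "vib": [], "dos": []}
print(execution_order(workflow))
```

Relying on TopologicalSorter also validates the "acyclic" requirement, because a cyclic mapping raises CycleError instead of returning an order.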

task_group.json

A JSON dictionary containing task group metadata such as the study ID, the task group ID, and the name of the task group. The file is a JSON-serialized instance of TaskMetadataBase that includes only the fields listed in task_base.TASK_GROUP_FIELDS. In addition, the dictionary contains a "tasks" key that lists the tasks belonging to the task group. The metadata portion of this file can be replicated with:

import json
from pathlib import Path

from autojob import SETTINGS
from autojob.bases.task_base import TASK_GROUP_FIELDS
from autojob.tasks.task import TaskMetadata

metadata = TaskMetadata()
metadata_file = SETTINGS.TASK_GROUP_METADATA_FILE

with Path(metadata_file).open(mode="w", encoding="utf-8") as f:
    json.dump(metadata.model_dump(mode="json", include=TASK_GROUP_FIELDS), f)

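
The snippet above writes only the metadata fields and does not add the "tasks" key. A plain-dictionary sketch of the complete document follows. The field names in GROUP_FIELDS are hypothetical stand-ins for the real TASK_GROUP_FIELDS, the helper task_group_document is illustrative, and the entries under "tasks" are assumed to be task IDs.

```python
import json

# Hypothetical field names; autojob defines the real set in
# autojob.bases.task_base.TASK_GROUP_FIELDS.
GROUP_FIELDS = {"study_id", "task_group_id", "task_group_name"}


def task_group_document(task_metadata: dict, task_ids: list[str]) -> dict:
    """Build a task_group.json-style dictionary.

    Keeps only the group-level fields of one task's metadata and adds a
    "tasks" key listing the member tasks (assumed to be task IDs).
    """
    document = {k: v for k, v in task_metadata.items() if k in GROUP_FIELDS}
    document["tasks"] = list(task_ids)
    return document


metadata = {"study_id": "s1", "task_group_id": "g1", "task_id": "t1", "task_group_name": "relax"}
print(json.dumps(task_group_document(metadata, ["t1", "t2"])))
# {"study_id": "s1", "task_group_id": "g1", "task_group_name": "relax", "tasks": ["t1", "t2"]}
```

Task-level fields such as task_id are dropped because a task group spans many tasks; the member tasks are recorded only through the "tasks" list.
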
task.json

A JSON dictionary containing task metadata such as the study group ID, the study ID, the task group ID, the task ID, and the name of the task. The file is a JSON-serialized version of an instance of TaskMetadataBase that is written when Task.write_inputs() is called. This file can be created with:

import json
from pathlib import Path

from autojob import SETTINGS
from autojob.tasks.task import TaskMetadata

metadata = TaskMetadata()
metadata_file = SETTINGS.TASK_METADATA_FILE

with Path(metadata_file).open(mode="w", encoding="utf-8") as f:
    json.dump(metadata.model_dump(mode="json"), f)

This file is read when loading tasks from a directory. It determines which type of task to load and is used to construct the task's TaskMetadataBase.
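
A minimal sketch of this dispatch step, assuming task.json records the task type under a hypothetical "task_class" key and that a registry maps type names to classes. BaseTask, ExampleCalculationTask, TASK_TYPES, and task_from_metadata_file are illustrative names, not autojob's API.

```python
import json
from pathlib import Path


class BaseTask:
    """Hypothetical stand-in for an autojob task class."""

    def __init__(self, metadata: dict) -> None:
        self.metadata = metadata


class ExampleCalculationTask(BaseTask):
    """Hypothetical specialised task type."""


# Hypothetical registry: class name recorded in task.json -> class.
TASK_TYPES = {cls.__name__: cls for cls in (BaseTask, ExampleCalculationTask)}


def task_from_metadata_file(task_dir: Path, type_key: str = "task_class") -> BaseTask:
    """Read task.json and construct the task class it names.

    Falls back to BaseTask when the type key is missing or unknown.
    """
    metadata = json.loads((task_dir / "task.json").read_text(encoding="utf-8"))
    task_type = TASK_TYPES.get(metadata.get(type_key), BaseTask)
    return task_type(metadata)
```

Keeping the type name in the metadata file means the directory alone is enough to rebuild the right object, which is the point of file-based persistence.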

inputs.json

A JSON dictionary containing the task inputs. The keys in this file depend on the type of task for which the inputs were created. This file is written when Task.write_inputs() is called and can be created with:

import json
from pathlib import Path

from autojob import SETTINGS
from autojob.tasks.task import TaskInputs

inputs = TaskInputs()
inputs_file = SETTINGS.INPUTS_FILE

with Path(inputs_file).open(mode="w", encoding="utf-8") as f:
    json.dump(inputs.model_dump(mode="json"), f)

This file is read when loading tasks from a directory and is used to determine the task's inputs and to construct its TaskInputsBase.
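
To summarize the structured layout, the following standard-library sketch creates the nested directories and placeholder files from the tree at the top of this section. It is not how StudyGroup.to_directory() or create_task_tree() work internally. The directory names, the empty JSON contents, and the helper make_skeleton are assumptions, and the study-level files that are not implemented yet are omitted.

```python
import json
import tempfile
from pathlib import Path

# Directory and metadata file names copied from the tree above. Real autojob
# directories may be named differently (for example, by ID), and autojob
# takes file names from SETTINGS; both are simplifications here.
LEVELS = [
    ("study_group", "study_group.json"),
    ("study", "study.json"),
    ("task_group", "task_group.json"),
    ("task", "task.json"),
]


def make_skeleton(root: Path) -> Path:
    """Create a study group/study/task group/task tree of placeholder files.

    Every level gets its own directory and an empty JSON metadata file, and
    the task directory also gets an empty inputs.json. Returns the task
    directory.
    """
    current = root
    for dirname, metadata_name in LEVELS:
        current = current / dirname
        current.mkdir(parents=True, exist_ok=True)
        (current / metadata_name).write_text(json.dumps({}), encoding="utf-8")
    (current / "inputs.json").write_text(json.dumps({}), encoding="utf-8")
    return current


with tempfile.TemporaryDirectory() as tmp:
    task_dir = make_skeleton(Path(tmp))
    print(task_dir.relative_to(tmp).as_posix())  # study_group/study/task_group/task
```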

Unstructured Directories

If the required metadata files are not found, autojob can instead operate in an unstructured mode. In this mode, metadata is maintained only for tasks, and any missing task metadata files are created. Tasks can still be harvested and restarted, but workflows cannot be run in unstructured mode. This mode is useful for quick scratch work that still benefits from autojob's metadata tracking, data harvesting, and task infrastructure.

To support this mode of use, autojob provides the autojob new CLI command, which can quickly clone tasks from existing directories and create unstructured task directories.
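
The fallback between the two modes could be sketched as follows. The detection rule used here (a task directory counts as structured when its parent, grandparent, and great-grandparent hold task_group.json, study.json, and study_group.json, respectively) and the minimal task.json written for unstructured tasks are assumptions for illustration, not autojob's actual logic. detect_mode and ensure_task_metadata are hypothetical names.

```python
import json
import uuid
from pathlib import Path

# Metadata files expected in the parent, grandparent, and great-grandparent
# of a task directory, following the structured tree shown earlier.
EXPECTED_ANCESTOR_FILES = ["task_group.json", "study.json", "study_group.json"]


def detect_mode(task_dir: Path) -> str:
    """Return "structured" if every ancestor level has its metadata file."""
    ancestors = list(task_dir.resolve().parents)[: len(EXPECTED_ANCESTOR_FILES)]
    if len(ancestors) == len(EXPECTED_ANCESTOR_FILES) and all(
        (parent / name).is_file()
        for parent, name in zip(ancestors, EXPECTED_ANCESTOR_FILES)
    ):
        return "structured"
    return "unstructured"


def ensure_task_metadata(task_dir: Path) -> dict:
    """Create a minimal task.json if it is missing, then return its contents."""
    metadata_file = task_dir / "task.json"
    if not metadata_file.exists():
        # Hypothetical minimal content: only a freshly generated task ID.
        metadata = {"task_id": str(uuid.uuid4())}
        metadata_file.write_text(json.dumps(metadata), encoding="utf-8")
    return json.loads(metadata_file.read_text(encoding="utf-8"))
```

Because ensure_task_metadata only writes when task.json is absent, calling it repeatedly on the same directory returns the same task ID.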

Other Common Files

Other common files that autojob creates are:

archive.json

This is the default filename for saving autojob archives.

run.py

This is the default filename for calculation scripts that are used by Calculation.

run.sh

This is the default filename for task scripts that are used by Task.