Task structure
Before defining a new task in OpenProblems, it’s important to understand the typical structure of an OpenProblems task (Figure 1).
A task typically consists of a dataset processor, methods, control methods and metrics. Each component has a well-defined input-output interface, for which the file formats in the resulting AnnData are also described.
File and component formats
Path: src/tasks/<task_id>/api
This folder contains task-specific file formats and component interfaces. More specifically:
task_info.yaml
: The task name, description and other metadata.file_*.yaml
: File format specifications for the task.comp_*.yaml
: Component interface specifications for the task.thumbnail.svg
: A thumbnail image for the task.
Dataset processor
Path: src/tasks/<task_id>/process_dataset
The data processor transforms a common dataset into task-specific dataset objects. In supervised tasks, this component will usually output a solution, a training dataset and a test dataset. In unsupervised tasks, this component usually output a solution and a masked dataset.
Methods
Path: src/tasks/<task_id>/methods
This folder contains method components. Each method component outputs a prediction given the training and test datasets (when applicable).
Control methods
Path: src/tasks/<task_id>/control_methods
This folder contains control methods for the task. These components have the same interface as the regular methods but also receive the solution object as input. It serves as a starting point to test the relative accuracy of new methods in the task, and also as a quality control for the metrics defined in the task. A control method can either be a positive control or a negative control, which set a maximum and minimum threshold for performance, so any new method should perform better than the negative control methods and worse than the positive control method.
A positive control is a method where the expected results are known, thus resulting in the best possible value for any metric outcome measure.
A negative control is a simple, naive, or random method that does not rely on any sophisticated techniques or domain knowledge.
Metrics
Path: src/tasks/<task_id>/metrics
This folder contains metric components. Each metric component outputs one or more metric results given a solution object and a method output object.
Benchmarking pipeline
Path: src/tasks/<task_id>/workflows
This folder contains a Nextflow workflow defining the benchmarking workflow for this task.
Resource generation scripts
Path: src/tasks/<task_id>/resources_scripts
This folder contains scripts for generating benchmarking resources required for the task.
Test resource generation scripts
Path: src/tasks/<task_id>/resources_test_scripts
This folder contains scripts for generating test resources for the task.