dataset

A Dataset is the collection of data produced during one contiguous period of time (or, informally, a “session”).

It can include multiple Recordings, each of which is the data produced by a single device during the collection of the dataset. A Recording may consist of several modalities of data, or “streams”, such as a Video, a raw binary stream, and an accompanying metadata CSV.

The items within a recording are assumed to share the same timebase. If a device produces multiple streams of data in different timebases (e.g. video and electrophysiology), those should be treated as separate recordings.

A dataset might consist of multiple recordings from different devices that need to be aligned (e.g. multiple cameras from multiple angles, multiple sensors receiving the same stream, etc.). The dataset can contain an alignment_map that maps a common, contiguous, monotonic index onto the indexes of individual recordings.

Recordings may be related to, or derived from, other recordings: e.g. a video can be indicated as being derived from a binary stream, a preprocessed (denoised, etc.) video can be derived from the raw video, and so on. A derivation is indicated by a reference from the derived recording to its source recording, along with the transformation that was applied.

Timestamps within a dataset are assumed to be in the same unit (e.g. datetimes or unix epoch floats) and in the same timezone, but not necessarily exactly equal across recordings (e.g. multiple machines whose system clocks are synchronized with NTP).
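In practice this means alignment must tolerate small clock offsets, typically by matching each frame to the nearest timestamp in the other stream. A self-contained sketch of that idea (`nearest_indices` is a hypothetical helper for illustration, not part of mio):

```python
import bisect

def nearest_indices(reference: list[float], other: list[float]) -> list[int]:
    """For each reference timestamp, find the index of the closest
    timestamp in `other` (assumed sorted ascending)."""
    out = []
    for t in reference:
        i = bisect.bisect_left(other, t)
        # Compare the neighbor on each side of the insertion point.
        candidates = [j for j in (i - 1, i) if 0 <= j < len(other)]
        out.append(min(candidates, key=lambda j: abs(other[j] - t)))
    return out

# Two streams whose NTP-synced clocks differ by a few milliseconds:
a = [0.000, 0.033, 0.066, 0.100]
b = [0.002, 0.034, 0.068, 0.099, 0.133]
print(nearest_indices(a, b))  # [0, 1, 2, 3]
```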

A dataset is assumed to live on disk, and only small, text-based streams are loaded into memory. The recordings within a dataset are therefore represented primarily as paths, but provide iterators and other accessors that expose their contents via slicing syntax.

class mio.models.dataset.Dataset(*, path: Path, recordings: dict[str, ~mio.models.dataset.Recording]=<factory>, alignment_map: DataFrame | None = None)

A single capture from a mio device, including any videos, metadata tables, and other byproducts

align(recordings: list[Recording] | list[str], write: bool = False) → Self

Create an alignment map, or return an already-existing alignment map

alignment_map: DataFrame | None

A dataframe with a column “index” containing the common index for frames across recordings, and one column per recording name containing the recording-local index that the common index maps to, such that all frames within a row were captured at the same time.

Stored as alignment_map.csv in the dataset directory

E.g. if a dataset contains two videos “a” and “b”, and “b” started 5 frames before “a”, then the alignment map would look like:

index | a | b
----- | - | -
0 | 0 | 5
1 | 1 | 6
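The alignment map above can be built directly as a pandas DataFrame. A minimal sketch (the column names follow the example; the construction itself is illustrative, not mio's own code):

```python
import pandas as pd

# Rebuild the example: "b" started 5 frames before "a", so common
# index 0 maps to a's frame 0 and b's frame 5.
n_common = 2
alignment_map = pd.DataFrame(
    {
        "index": range(n_common),
        "a": range(0, n_common),
        "b": range(5, 5 + n_common),
    }
)
# Row-wise: frames a[0] and b[5] were captured at the same time.
print(alignment_map.to_dict("list"))
```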
classmethod from_directory(path: Path) → Dataset

Read a dataset from a directory

classmethod from_recordings(recordings: list[Recording]) → Dataset

Instantiate a dataset from recordings, loading any alignment map found.

get_stitched(recordings: list[Recording] | list[str]) → StitchedRecording

Get a stitched recording for a set of recordings if one exists; otherwise raise a KeyError

model_config = {'arbitrary_types_allowed': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

path: Path

The directory where the files within the dataset are contained

recordings: dict[str, Recording]

Recordings within this dataset

stitch(recordings: list[Recording] | list[str]) → Self

Combine multiple recordings from the same device into a single recording by selecting the best matching frame from each.

See stitch() for more details.
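The frame-selection idea behind stitching can be illustrated with per-frame noise scores: for each common index, keep the frame from the least noisy recording. This is a sketch of the concept, not mio's `stitch()` implementation, and the scores are invented:

```python
import pandas as pd

# Hypothetical per-frame noise scores for two aligned recordings
# "a" and "b"; rows are common frame indices.
noise = pd.DataFrame({"a": [0.1, 0.9, 0.2], "b": [0.3, 0.2, 0.8]})

# For each common index, select the recording with the lowest noise.
selected = noise.idxmin(axis=1)
print(selected.tolist())  # ['a', 'b', 'a']
```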

class mio.models.dataset.RawVideoRecording(*, name: str, type: Literal['raw'] = 'raw', video: VideoProxy, metadata: DataFrame | None = None, timestamps: DataFrame | None = None, noise: DataFrame | None = None, binary: Path | None = None, derived_from: RecordingDerivation | None = None)

A raw video

model_config = {'arbitrary_types_allowed': True, 'validate_default': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

type: Literal['raw']

What type of recording this is

class mio.models.dataset.Recording(*, name: str, type: Literal['raw', 'stitched'], video: VideoProxy, metadata: DataFrame | None = None, timestamps: DataFrame | None = None, noise: DataFrame | None = None, binary: Path | None = None, derived_from: RecordingDerivation | None = None)

A single set of matching data streams from a device within a dataset.

binary: Path | None

Path to any raw binary version of the data in the video

derived_from: RecordingDerivation | None

How this recording was derived from other recordings, if any

classmethod from_video(path: Path) → RawVideoRecording | StitchedRecording

Find the adjoining files from the video path

metadata: DataFrame | None

Metadata for frames within the video

model_config = {'arbitrary_types_allowed': True, 'validate_default': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

name: str

The name of the recording, used in filenames to group them together

noise: DataFrame | None

Framewise noise measurements (created with score_noise() ).

property paths: RecordingPaths

Given some video, the expected paths for its related components

score_noise(config: NoisePatchConfig | None = None, progress: bool = False, force: bool = False) → DataFrame

Score the noise level of each frame with score_noise(), saving the result as {name}_noise.csv

timestamps: DataFrame | None

Timestamps table, (currently) stored as {video_name}_timestamps.csv next to the video. When instantiating a recording, if a metadata file exists but timestamps do not, they are automatically generated.

type: Literal['raw', 'stitched']

What type of recording this is

video: VideoProxy

A video created as part of this recording

class mio.models.dataset.RecordingDerivation(*, type: Literal['stitched'], sources: set[str])

How a recording was derived from other recordings

model_config = {}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

sources: set[str]

Which other recordings this recording was derived from

type: Literal['stitched']

What type of recording this is

class mio.models.dataset.RecordingPaths

Filenames for potential parts of a recording

binary: Path

{stem}.bin

metadata: Path

{stem}.csv

noise: Path

{stem}_noise.csv

timestamps: Path

{stem}_timestamps.csv

video: Path

{stem}.avi
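These naming conventions can be reproduced with pathlib. Below, `sibling_paths` is a hypothetical helper mirroring the documented scheme (mio itself exposes this via `paths_from_video()` and `Recording.paths`):

```python
from pathlib import Path

def sibling_paths(video: Path) -> dict[str, Path]:
    """Derive the expected component paths from a video path,
    following the {stem}-based naming scheme documented above."""
    stem = video.with_suffix("")  # drop ".avi" to get the shared stem
    return {
        "video": video,
        "binary": stem.with_suffix(".bin"),
        "metadata": stem.with_suffix(".csv"),
        "noise": stem.parent / f"{stem.name}_noise.csv",
        "timestamps": stem.parent / f"{stem.name}_timestamps.csv",
    }

paths = sibling_paths(Path("session/rec0.avi"))
print(paths["timestamps"].name)  # rec0_timestamps.csv
```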

class mio.models.dataset.StitchedRecording(*, name: str, type: Literal['stitched'] = 'stitched', video: VideoProxy, metadata: DataFrame, timestamps: DataFrame | None = None, noise: DataFrame | None = None, binary: Path | None = None, derived_from: RecordingDerivation, scores: DataFrame, debug_video: VideoProxy | None = None)

Multiple video recordings stitched together, picking one best aligned frame from each

debug_video: VideoProxy | None

An optional debug video that shows the source videos side by side with differences marked

derived_from: RecordingDerivation

A derivation reference that indicates which videos this stitch was derived from

classmethod from_video(path: Path) → StitchedRecording

Determine which videos we were derived from using the path name

metadata: DataFrame

Metadata for frames within the video

model_config = {'arbitrary_types_allowed': True, 'validate_default': True}

Configuration for the model; should be a dictionary conforming to pydantic.config.ConfigDict.

scores: DataFrame

A table (stored as a csv) indicating which source recording each stitched frame was selected from

type: Literal['stitched']

What type of recording this is

mio.models.dataset.paths_from_video(video: Path) → RecordingPaths

Given some path to a root video, create the expected paths for its components