dataset.py

Module for reading and analyzing Mide Instrumentation Data Exchange (MIDE) files.

Created on Sep 26, 2013

author

dstokes

Classes

class idelib.dataset.Dataset(stream, name=None, quiet=True, attributes=None)

A collection of sensor data and associated configuration info. Typically represents a single MIDE EBML file.

Dictionary attributes are all keyed by the relevant ID (sensor ID, channel ID, etc.).

Variables
  • loading – Boolean; True if a file is still loading (or has not yet been loaded).

  • fileDamaged – Boolean; True if the file ended prematurely.

  • loadCancelled – Boolean; True if the file loading was aborted part way through.

  • sessions – A list of individual Session objects in the data set. A valid file will have at least one, even if there are no Session elements in the data.

  • sensors – A dictionary of Sensors.

  • channels – A dictionary of individual Sensor channels.

  • plots – A dictionary of individual Plots, the modified output of a Channel (or even another plot).

  • transforms – A dictionary of functions (or function-like objects) for adjusting/calibrating sensor data.

Constructor. Typically, these objects will be instantiated by functions in the importer module.

Parameters
  • stream – A file-like stream object containing EBML data.

  • name – An optional name for the Dataset. Defaults to the base name of the file (if applicable).

  • quiet – If True, non-fatal errors (e.g. schema/file version mismatches) are suppressed.

  • attributes – A dictionary of arbitrary attributes, e.g. Attribute elements parsed from the file. Typically used for diagnostic data.

addChannel(channelId=None, parser=None, channelClass=None, **kwargs)

Add a Channel to a Sensor. Note that the channelId and parser keyword arguments are not optional.

Parameters
  • channelId – A unique ID number for the channel.

  • parser – The Channel’s data parser

  • channelClass – An alternate (sub)class of channel. Defaults to None, which creates a standard Channel.

addSensor(sensorId=None, name=None, sensorClass=None, traceData=None, transform=None, attributes=None, bandwidthLimitId=None)

Create a new Sensor object, add it to the dataset, and return it. If the given sensor ID already exists, the existing sensor is returned instead. To modify a sensor or add a sensor object created elsewhere, use Dataset.sensors[sensorId] directly.

Note that the sensorId keyword argument is not optional.

Parameters
  • sensorId – The ID of the new sensor.

  • name – The new sensor’s name

  • sensorClass – An alternate (sub)class of sensor. Defaults to None, which creates a Sensor.

Returns

The new sensor.
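
The get-or-create behaviour described for addSensor() can be sketched with a plain dictionary. This is only an illustration of the logic, not idelib's implementation: TinySensor and TinyDataset are hypothetical stand-ins, and raising TypeError for a missing ID is an assumption.

```python
class TinySensor:
    """Hypothetical stand-in for a Sensor; not the idelib class."""
    def __init__(self, sensorId, name=None):
        self.id = sensorId
        self.name = name


class TinyDataset:
    """Hypothetical stand-in showing only the get-or-create logic."""
    def __init__(self):
        self.sensors = {}  # keyed by sensor ID

    def addSensor(self, sensorId=None, name=None):
        if sensorId is None:
            # Assumption: the real method may signal this differently.
            raise TypeError("sensorId is required")
        # Return the existing sensor if the ID is already known.
        if sensorId in self.sensors:
            return self.sensors[sensorId]
        sensor = TinySensor(sensorId, name)
        self.sensors[sensorId] = sensor
        return sensor


ds = TinyDataset()
first = ds.addSensor(sensorId=1, name="Accelerometer")
second = ds.addSensor(sensorId=1, name="Ignored name")
```

In this sketch the second call returns the sensor created by the first, so the new name is ignored.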

addSession(startTime=None, endTime=None, utcStartTime=None)

Create a new session, add it to the Dataset, and return it. Part of the import process.

addTransform(transform)

Add a transform (calibration, etc.) to the dataset. Various child objects will reference them by ID. Note: unlike the other add methods, this does not instantiate new objects.

addWarning(warningId=None, channelId=None, subchannelId=None, low=None, high=None, **kwargs)

Add a WarningRange to the dataset, which indicates when a sensor is reporting values outside of a given range.

Parameters
  • warningId – A unique numeric ID for the WarningRange.

  • channelId – The channel ID of the source being monitored.

  • subchannelId – The monitored source’s subchannel ID.

  • low – The minimum value of the acceptable range.

  • high – The maximum value of the acceptable range.

Returns

The new WarningRange instance.

property channels

A dictionary of individual Sensor channels.

close()

Close the recording file.

property closed

Has the recording file been closed?

endSession()

Set the current session’s start/end times. Part of the import process.

property exitCondition

The numeric code number for the condition that stopped the recording.

getPlots(subchannels=True, plots=True, debug=True, sort=True)

Get all plottable data sources: sensor SubChannels and/or Plots.

Parameters
  • subchannels – Include subchannels if True.

  • plots – Include Plots if True.

  • debug – If False, exclude debugging/diagnostic channels.

  • sort – Sort the plots by name if True.

hasSession(sessionId)

Does the Dataset contain a specific session number?

hierarchy()

Get a list of parents/grandparents all the way back to the root. The root is the first item in the list.

property lastSession

Retrieve the latest Session.

path()

Get the combined names of all the object’s parents/grandparents.
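
The relationship between hierarchy() and path() can be sketched with a small parent-linked class. Node is hypothetical, and both the ":" separator and the inclusion of the object's own name in the path are assumptions made for this illustration.

```python
class Node:
    """Hypothetical parent-linked object; not an idelib class."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent

    def hierarchy(self):
        """Return [root, ..., self] by walking up the parent links."""
        chain = []
        node = self
        while node is not None:
            chain.append(node)
            node = node.parent
        chain.reverse()  # put the root first
        return chain

    def path(self, sep=":"):
        """Join the names along the hierarchy, root first.
        The separator is an assumption of this sketch."""
        return sep.join(n.name for n in self.hierarchy())


recording = Node("recording")
channel = Node("Channel 8", parent=recording)
subchannel = Node("X", parent=channel)
```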

updateTransforms()

Update the transforms (e.g. the calibration functions) in this dataset. This should be called before utilizing data in the set.

class idelib.dataset.Channel(dataset, channelId=None, parser=None, sensor=None, name=None, units=None, transform=None, displayRange=None, sampleRate=None, cache=False, singleSample=None, attributes=None)

Output from a Sensor, containing one or more SubChannels. A Sensor contains one or more Channels. SubChannels of a Channel can be accessed by index like a list or tuple.

Variables
  • types – A tuple with the type of data in each of the Channel’s Subchannels.

  • displayRange – The possible ranges of each subchannel, dictated by the parser. Not necessarily the same as the range of actual values recorded in the file!

Constructor. This should generally be done indirectly via Dataset.addChannel().

Parameters
  • sensor – The parent sensor, if this Channel contains only data from a single sensor.

  • channelId – The channel’s ID, unique within the file.

  • parser – The channel’s EBML data parser.

  • name – A custom name for this channel.

  • units – The units measured in this channel, used if units are not explicitly indicated in the Channel’s SubChannels.

  • transform – A Transform object for adjusting sensor readings at the Channel level.

  • displayRange – A ‘hint’ to the minimum and maximum values of data in this channel.

  • cache – If True, this channel’s data will be kept in memory rather than lazy-loaded.

  • singleSample – A ‘hint’ that the data blocks for this channel each contain only a single sample (e.g. temperature/pressure on an SSX). If None, this will be determined from the sample data.

  • attributes – A dictionary of arbitrary attributes, e.g. Attribute elements parsed from the file.

addSubChannel(subchannelId=None, channelClass=None, **kwargs)

Create a new SubChannel of the Channel.

getSession(sessionId=None)

Retrieve data recorded in a Session.

Parameters

sessionId – The ID of the session to retrieve.

Returns

The recorded data.

Return type

EventArray

getSubChannel(subchannelId)

Retrieve one of the Channel’s SubChannels. All Channels have at least one. A SubChannel object will be automatically generated if one hasn’t already explicitly been defined.

Parameters

subchannelId

Returns

The SubChannel matching the given ID.
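
The on-demand creation and list-like indexing described above might look roughly like the following. TinyChannel is a hypothetical stand-in, and representing a subchannel as a dictionary is purely for illustration.

```python
class TinyChannel:
    """Hypothetical stand-in; shows lazy subchannel creation only."""
    def __init__(self, names):
        self.names = list(names)
        # One slot per subchannel; None until first requested.
        self.subchannels = [None] * len(self.names)

    def getSubChannel(self, subchannelId):
        sub = self.subchannels[subchannelId]
        if sub is None:
            # Generate the subchannel the first time it is asked for.
            sub = {"id": subchannelId, "name": self.names[subchannelId]}
            self.subchannels[subchannelId] = sub
        return sub

    def __getitem__(self, idx):
        # Index access delegates to getSubChannel().
        return self.getSubChannel(idx)


accel = TinyChannel(["X", "Y", "Z"])
```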

getTransforms(id_=None, _tlist=None)

Get a list of all transforms applied to the data, from first (the lowest-level parent) to last (the transform, if any, on the object itself).
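
The ordering matters because the transforms are applied in sequence. As a minimal sketch, assuming each transform is a plain callable (the two calibration functions below are invented for illustration):

```python
def apply_transforms(value, transforms):
    """Apply each transform in list order: parent-level first, own last."""
    for transform in transforms:
        value = transform(value)
    return value


def channel_calibration(v):
    # Hypothetical channel-level transform: scale by 2.
    return v * 2.0


def subchannel_calibration(v):
    # Hypothetical subchannel-level transform: add an offset of 1.
    return v + 1.0


result = apply_transforms(3.0, [channel_calibration, subchannel_calibration])
reversed_result = apply_transforms(3.0, [subchannel_calibration, channel_calibration])
```

Reversing the list gives a different result, which is why the list runs from the lowest-level parent to the object itself.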

hierarchy()

Get a list of parents/grandparents all the way back to the root. The root is the first item in the list.

parseBlock(block, start=None, end=None, step=1, subchannel=None)

Parse subsamples out of a data block. Used internally.

Parameters
  • block – The data block from which to parse subsamples.

  • start – The first block index to retrieve.

  • end – The last block index to retrieve.

  • step – The number of steps between samples.

  • subchannel – If supplied, return only the values for a specific subchannel (i.e. the method is being called by a SubChannel).

Returns

A list of tuples, one for each subsample.

parseBlockByIndex(block, indices, subchannel=None)

Convert raw data into a set of subchannel values, returning only specific items from the result by index.

Parameters
  • block – The data block element to parse.

  • indices – A list of sample index numbers to retrieve.

  • subchannel – If supplied, return only the values for a specific subchannel

Returns

A list of tuples, one for each subsample.

path()

Get the combined names of all the object’s parents/grandparents.

setTransform(transform, update=True)

Set the transforming function/object. This does not change the value of raw, however; the new transform will not be applied unless it is True.

updateTransforms()

Recompute cached transform functions.

class idelib.dataset.SubChannel(parent, subchannelId, name=None, units=('', ''), transform=None, displayRange=None, sensorId=None, warningId=None, axisName=None, attributes=None, color=None)

Output from a sensor, derived from a channel containing multiple pieces of data (e.g. the Y from an accelerometer’s XYZ). Looks like a ‘real’ channel.

Constructor. This should generally be done indirectly via Channel.addSubChannel().

Parameters
  • sensor – The parent sensor.

  • channelId – The channel’s ID, unique within the file.

  • parser – The channel’s payload data parser.

  • name – A custom name for this channel.

  • units – The units measured in this channel, used if units are not explicitly indicated in the Channel’s SubChannels. A tuple containing the ‘axis name’ (e.g. ‘Acceleration’) and the unit symbol (‘g’).

  • transform – A Transform object for adjusting sensor readings at the Channel level.

  • displayRange – A ‘hint’ to the minimum and maximum values of data in this channel.

  • sensorId – The ID of the sensor that generates this SubChannel’s data.

  • warningId – The ID of the WarningRange that indicates conditions that may adversely affect data recorded in this SubChannel.

  • axisName – The name of the axis this SubChannel represents. Use if the name contains additional text (e.g. “X” if the name is “Accelerometer X (low-g)”).

  • attributes – A dictionary of arbitrary attributes, e.g. Attribute elements parsed from the file.

addSubChannel(*args, **kwargs)

Create a new SubChannel of the Channel.

getSession(sessionId=None)

Retrieve a session by ID. If none is provided, the last session in the Dataset is returned.

getSubChannel(*args, **kwargs)

Retrieve one of the Channel’s SubChannels. All Channels have at least one. A SubChannel object will be automatically generated if one hasn’t already explicitly been defined.

Parameters

subchannelId

Returns

The SubChannel matching the given ID.

getTransforms(id_=None, _tlist=None)

Get a list of all transforms applied to the data, from first (the lowest-level parent) to last (the transform, if any, on the object itself).

hierarchy()

Get a list of parents/grandparents all the way back to the root. The root is the first item in the list.

parseBlock(block, start=None, end=None, step=1)

Parse subsamples out of a data block. Used internally.

Parameters
  • block – The data block from which to parse subsamples.

  • start – The first block index to retrieve.

  • end – The last block index to retrieve.

  • step – The number of steps between samples.

parseBlockByIndex(block, indices)

Parse specific subsamples out of a data block. Used internally.

Parameters
  • block – The data block from which to parse subsamples.

  • indices – A list of individual index numbers to get.

path()

Get the combined names of all the object’s parents/grandparents.

setTransform(transform, update=True)

Set the transforming function/object. This does not change the value of raw, however; the new transform will not be applied unless it is True.

updateTransforms()

Recompute cached transform functions.

class idelib.dataset.EventArray(parentChannel, session=None, parentList=None)

A list-like object containing discrete time/value pairs. Data is dynamically read from the underlying EBML file.

Constructor. This should almost always be done indirectly via the getSession() method of Channel and SubChannel objects.

append(block)

Add one data block’s contents to the Channel’s list of data. Note that this doesn’t double-check the channel ID specified in the data, but it is inadvisable to include data from different channels.

Attention

Added elements must be in chronological order!

arrayJitterySlice(start=None, end=None, step=1, jitter=0.5, display=False)

Create an array of events within a range of indices.

Parameters
  • start – The first index in the range, or a slice.

  • end – The last index in the range. Not used if start is a slice.

  • step – The step increment. Not used if start is a slice.

  • jitter – The amount by which to vary the sample time, as a normalized percentage of the regular time between samples.

  • display – If True, the EventArray transform (i.e. the ‘display’ transform) will be applied to the data.

Returns

a structured array of events in the specified index range.
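
One way to read the jitter parameter is as a symmetric random offset, scaled by the regular time between samples. The sketch below assumes a uniform distribution, which the documentation does not specify; jittered_times is a hypothetical helper.

```python
import random


def jittered_times(times, sample_time, jitter=0.5, rng=None):
    """Offset each time by up to +/- jitter * sample_time.

    Assumption: offsets are drawn uniformly; the real distribution
    used by the library is not documented here.
    """
    rng = rng or random.Random()
    return [t + rng.uniform(-jitter, jitter) * sample_time for t in times]


times = [0, 1000, 2000, 3000]  # microseconds
out = jittered_times(times, sample_time=1000, jitter=0.5, rng=random.Random(0))
```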

arrayMinMeanMax(startTime=None, endTime=None, padding=0, times=True, display=False, iterator=<built-in function iter>)

Get the minimum, mean, and maximum values for blocks within a specified interval.

Todo

Remember what padding was for, and either implement or remove it completely. Related to plotting; see plots.

Parameters
  • startTime – The first time (in microseconds by default), None to start at the beginning of the session.

  • endTime – The second time, or None to use the end of the session.

  • times – If True (default), the results include the block’s starting time.

  • display – If True, the final ‘display’ transform (e.g. unit conversion) will be applied to the results.

Returns

A structured array of data block statistics (min, mean, and max, respectively).
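
The per-block statistics can be illustrated in pure Python. This sketch assumes each block is a (start time, list of values) pair and returns plain tuples rather than a structured array; block_min_mean_max is a hypothetical helper.

```python
from statistics import fmean


def block_min_mean_max(blocks, times=True):
    """Return one (min, mean, max) row per block.

    If times is True, each row is prefixed with the block's start time.
    """
    rows = []
    for start, values in blocks:
        stats = (min(values), fmean(values), max(values))
        rows.append((start,) + stats if times else stats)
    return rows


blocks = [(0, [1.0, 2.0, 3.0]), (3000, [4.0, 6.0])]
with_times = block_min_mean_max(blocks)
without_times = block_min_mean_max(blocks, times=False)
```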

arrayRange(startTime=None, endTime=None, step=1, display=False)

Get a set of data occurring in a given time interval.

Parameters
  • startTime – The first time (in microseconds by default), None to start at the beginning of the session.

  • endTime – The second time, or None to use the end of the session.

Returns

a structured array of events in the specified time interval.

arrayResampledRange(startTime, stopTime, maxPoints, padding=0, jitter=0, display=False)

Retrieve the events occurring within a given interval, undersampled so as not to exceed a given length (e.g. the size of the data viewer’s screen width).

Todo

Optimize iterResampledRange(); not very efficient, particularly not with single-sample blocks.
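
Undersampling to a maximum length can be sketched as simple stride-based decimation. This is an assumption about the general approach, not a description of how the library selects events; undersample is a hypothetical helper.

```python
import math


def undersample(events, maxPoints):
    """Keep every step-th event so the result has at most maxPoints items."""
    if maxPoints < 1:
        raise ValueError("maxPoints must be at least 1")
    # Smallest integer stride that brings the count under the limit.
    step = max(1, math.ceil(len(events) / maxPoints))
    return events[::step]


events = list(range(10))
reduced = undersample(events, 4)
unchanged = undersample([1, 2, 3], 10)
```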

arraySlice(start=None, end=None, step=1, display=False)

Create an array of events within a range of indices.

Parameters
  • start – The first index in the range, or a slice.

  • end – The last index in the range. Not used if start is a slice.

  • step – The step increment. Not used if start is a slice.

  • display – If True, the EventArray transform (i.e. the ‘display’ transform) will be applied to the data.

Returns

a structured array of events in the specified index range.

arrayValues(start=None, end=None, step=1, subchannels=True, display=False)

Get all values in the given index range (w/o times).

Parameters
  • start – The first index in the range, or a slice.

  • end – The last index in the range. Not used if start is a slice.

  • step – The step increment. Not used if start is a slice.

  • subchannels – A list of subchannel IDs or Boolean. True will return all subchannels in native order.

  • display – If True, the EventArray transform (i.e. the ‘display’ transform) will be applied to the data.

Returns

a structured array of values in the specified index range.

copy(newParent=None)

Create a shallow copy of the event list.

exportCsv(stream, start=None, stop=None, step=1, subchannels=True, callback=None, callbackInterval=0.01, timeScalar=1, raiseExceptions=False, dataFormat='%.6f', delimiter=', ', useUtcTime=False, useIsoFormat=False, headers=False, removeMean=None, meanSpan=None, display=False, noBivariates=False)

Export events as CSV to a stream (e.g. a file).

Parameters
  • stream – The stream object to which to write CSV data.

  • start – The first event index to export.

  • stop – The last event index to export.

  • step – The number of events between exported lines.

  • subchannels – A sequence of individual subchannel numbers to export. Only applicable to objects with subchannels. True (default) exports them all.

  • callback – A function (or function-like object) to notify as work is done. It should take four keyword arguments: count (the current line number), total (the total number of lines), error (an exception, if raised during the export), and done (will be True when the export is complete). If the callback object has a cancelled attribute that is True, the CSV export will be aborted. The default callback is None (nothing will be notified).

  • callbackInterval – The frequency of update, as a normalized percent of the total lines to export.

  • timeScalar – A scaling factor for the event times. The default is 1 (microseconds).

  • raiseExceptions

  • dataFormat – A printf-style format string applied to each data value; the default, '%.6f', writes six decimal places. This is the same format as used when formatting floats.

  • useUtcTime – If True, times are written as the UTC timestamp. If False, times are relative to the recording.

  • useIsoFormat – If True, the time column is written as the standard ISO date/time string. Only applies if useUtcTime is True.

  • headers – If True, the first line of the CSV will contain the names of each column.

  • removeMean – Overrides the EventArray’s mean removal for the export.

  • meanSpan – The span of the mean removal for the export. -1 removes the total mean.

  • display – If True, export using the EventArray’s ‘display’ transform (e.g. unit conversion).

Returns

Tuple: The number of rows exported and the elapsed time.
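
A callback object matching the description of the callback parameter might be written as follows. ExportProgress is a hypothetical class; the calls at the end only simulate the notifications an exporter could send and do not invoke exportCsv() itself.

```python
class ExportProgress:
    """Hypothetical progress callback for a long-running export."""

    def __init__(self):
        self.cancelled = False  # set to True to request that the export stop
        self.percent = 0.0
        self.finished = False
        self.error = None

    def __call__(self, count=0, total=None, error=None, done=False):
        if error is not None:
            self.error = error  # remember an exception raised during export
        if total:
            self.percent = 100.0 * count / total
        if done:
            self.finished = True


progress = ExportProgress()
# Simulated notifications, using the four documented keyword arguments.
progress(count=50, total=200)
halfway = progress.percent
progress(count=200, total=200, done=True)
```

Using an object rather than a bare function leaves room for the cancelled attribute, which the exporter is documented to check.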

getEventIndexBefore(t)

Get the index of an event occurring on or immediately before the specified time.

Parameters

t – The time (in microseconds)

Returns

The index of the event preceding the given time, -1 if the time occurs before the first event.
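
The documented return values (including -1 for a time before the first event) match what a binary search over sorted event times produces. A minimal sketch, assuming the times are available as a sorted list; event_index_before is a hypothetical helper:

```python
from bisect import bisect_right


def event_index_before(times, t):
    """Index of the last event at or before t, or -1 if there is none."""
    # bisect_right counts the events with time <= t.
    return bisect_right(times, t) - 1


times = [0, 1000, 2000]  # microseconds, in chronological order
```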

getEventIndexNear(t)

Get the event occurring closest to a specific time.

Parameters

t – The time (in microseconds)

Returns

getInterval()

Get the first and last event times in the set.

getMax(startTime=None, endTime=None, display=False, iterator=<built-in function iter>)

Get the event with the maximum value, optionally within a specified time range. For Channels, returns the maximum among all Subchannels.

Parameters
  • startTime – The starting time. Defaults to the start.

  • endTime – The ending time. Defaults to the end.

  • display – If True, the final ‘display’ transform (e.g. unit conversion) will be applied to the results.

Returns

The event with the maximum value.

getMean(startTime=None, endTime=None, display=False, iterator=<built-in function iter>)

Get the mean value of all events, optionally within a specified time range. For Channels, returns the mean among all Subchannels.

Parameters
  • startTime – The starting time. Defaults to the start.

  • endTime – The ending time. Defaults to the end.

  • display – If True, the final ‘display’ transform (e.g. unit conversion) will be applied to the results.

Returns

The mean value.

getMeanNear(t, outOfRange=False)

Retrieve the mean value near a given time.

getMin(startTime=None, endTime=None, display=False, iterator=<built-in function iter>)

Get the event with the minimum value, optionally within a specified time range. For Channels, returns the minimum among all Subchannels.

Parameters
  • startTime – The starting time. Defaults to the start.

  • endTime – The ending time. Defaults to the end.

  • display – If True, the final ‘display’ transform (e.g. unit conversion) will be applied to the results.

Returns

The event with the minimum value.

getMinMeanMax(startTime=None, endTime=None, padding=0, times=True, display=False, iterator=<built-in function iter>)

Get the minimum, mean, and maximum values for blocks within a specified interval. (Currently an alias of arrayMinMeanMax.)

Todo

Remember what padding was for, and either implement or remove it completely. Related to plotting; see plots.

Parameters
  • startTime – The first time (in microseconds by default), None to start at the beginning of the session.

  • endTime – The second time, or None to use the end of the session.

  • times – If True (default), the results include the block’s starting time.

  • display – If True, the final ‘display’ transform (e.g. unit conversion) will be applied to the results.

Returns

A structured array of data block statistics (min, mean, and max, respectively).

getRange(startTime=None, endTime=None, display=False)

Get a set of data occurring in a given time interval. (Currently an alias of arrayRange.)

Parameters
  • startTime – The first time (in microseconds by default), None to start at the beginning of the session.

  • endTime – The second time, or None to use the end of the session.

Returns

a collection of events in the specified time interval.

getRangeIndices(startTime, endTime)

Get the first and last event indices that fall within the specified interval.

Parameters
  • startTime – The first time (in microseconds by default), None to start at the beginning of the session.

  • endTime – The second time, or None to use the end of the session.

getRangeMinMeanMax(startTime=None, endTime=None, subchannel=None, display=False, iterator=<built-in function iter>)

Get the single minimum, mean, and maximum value for blocks within a specified interval. Note: Using this with a parent channel without specifying a subchannel number can produce meaningless data if the channels use different units or are on different scales.

Parameters
  • startTime – The first time (in microseconds by default), None to start at the beginning of the session.

  • endTime – The second time, or None to use the end of the session.

  • subchannel – The subchannel ID to retrieve, if the EventArray’s parent has subchannels.

  • display – If True, the final ‘display’ transform (e.g. unit conversion) will be applied to the results.

Returns

A namedtuple of aggregated event statistics (min, mean, and max, respectively).

getSampleRate(idx=None)

Get the channel’s sample rate. This is either supplied as part of the channel definition or calculated from the actual data and cached.

Parameters

idx – Because it is possible for sample rates to vary within a channel, an event index can be specified; the sample rate for that event and its siblings will be returned.

Returns

The sample rate, as samples per second (float)

getSampleTime(idx=None)

Get the time between samples.

Parameters

idx – Because it is possible for sample rates to vary within a channel, an event index can be specified; the time between samples for that event and its siblings will be returned.

Returns

The time between samples (us)
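
Sample rate (samples per second) and sample time (microseconds) are reciprocals of each other, scaled by one million. The sketch below also shows one plausible way to estimate a rate from timestamps; both helpers are hypothetical and assume evenly spaced samples.

```python
US_PER_SECOND = 1_000_000.0


def rate_from_times(times_us):
    """Estimate samples per second from evenly spaced timestamps (us)."""
    if len(times_us) < 2:
        raise ValueError("need at least two timestamps")
    span = times_us[-1] - times_us[0]
    # n timestamps cover n - 1 sample intervals.
    return (len(times_us) - 1) * US_PER_SECOND / span


def time_between_samples(rate_hz):
    """Convert a sample rate in Hz to a sample interval in microseconds."""
    return US_PER_SECOND / rate_hz


rate = rate_from_times([0, 200, 400, 600, 800])
interval = time_between_samples(rate)
```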

getTransforms(id_=None, _tlist=None)

Get a list of all transforms applied to the data, from first (the lowest-level parent) to last (the transform, if any, on the object itself).

getValueAt(at, outOfRange=False, display=False)

Retrieve the value at a specific time, interpolating between existing events.

Todo

Optimize. This creates a bottleneck in the calibration.

Parameters
  • at – The time at which to take the sample.

  • outOfRange – If False, times before the first sample or after the last will raise an IndexError. If True, the first or last time, respectively, is returned.
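
The interpolation and the outOfRange behaviour can be sketched as follows. Linear interpolation is an assumption (the documentation says only "interpolating"), as is the reading that out-of-range times yield the value of the first or last event; value_at is a hypothetical helper.

```python
from bisect import bisect_right


def value_at(times, values, at, outOfRange=False):
    """Linearly interpolate a value at time `at` from sorted events."""
    if at < times[0] or at > times[-1]:
        if not outOfRange:
            raise IndexError("time %r is outside the recorded range" % (at,))
        # Assumption: clamp to the first or last recorded value.
        return values[0] if at < times[0] else values[-1]
    i = bisect_right(times, at) - 1  # last event at or before `at`
    if i == len(times) - 1:
        return values[i]  # exactly on the final event
    t0, t1 = times[i], times[i + 1]
    frac = (at - t0) / (t1 - t0)
    return values[i] + frac * (values[i + 1] - values[i])


times = [0, 1000, 2000]
values = [0.0, 10.0, 30.0]
```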

hierarchy()

Get a list of parents/grandparents all the way back to the root. The root is the first item in the list.

iterJitterySlice(start=None, end=None, step=1, jitter=0.5, display=False)

Create an iterator producing events for a range of indices.

Parameters
  • start – The first index in the range, or a slice.

  • end – The last index in the range. Not used if start is a slice.

  • step – The step increment. Not used if start is a slice.

  • jitter – The amount by which to vary the sample time, as a normalized percentage of the regular time between samples.

  • display – If True, the EventArray transform (i.e. the ‘display’ transform) will be applied to the data.

Returns

an iterable of events in the specified index range.

iterMinMeanMax(startTime=None, endTime=None, padding=0, times=True, display=False)

Get the minimum, mean, and maximum values for blocks within a specified interval.

Todo

Remember what padding was for, and either implement or remove it completely. Related to plotting; see plots.

Parameters
  • startTime – The first time (in microseconds by default), None to start at the beginning of the session.

  • endTime – The second time, or None to use the end of the session.

  • times – If True (default), the results include the block’s starting time.

  • display – If True, the final ‘display’ transform (e.g. unit conversion) will be applied to the results.

Returns

An iterator producing sets of three events (min, mean, and max, respectively).

iterRange(startTime=None, endTime=None, step=1, display=False)

Get a set of data occurring in a given interval.

Parameters
  • startTime – The first time (in microseconds by default), None to start at the beginning of the session.

  • endTime – The second time, or None to use the end of the session.

iterResampledRange(startTime, stopTime, maxPoints, padding=0, jitter=0, display=False)

Retrieve the events occurring within a given interval, undersampled so as not to exceed a given length (e.g. the size of the data viewer’s screen width).

Todo

Optimize iterResampledRange(); not very efficient, particularly not with single-sample blocks.

iterSlice(start=None, end=None, step=1, display=False)

Create an iterator producing events for a range of indices.

Parameters
  • start – The first index in the range, or a slice.

  • end – The last index in the range. Not used if start is a slice.

  • step – The step increment. Not used if start is a slice.

  • display – If True, the EventArray transform (i.e. the ‘display’ transform) will be applied to the data.

Returns

an iterable of events in the specified index range.

itervalues(start=None, end=None, step=1, subchannels=True, display=False)

Iterate all values in the given index range (w/o times).

Parameters
  • start – The first index in the range, or a slice.

  • end – The last index in the range. Not used if start is a slice.

  • step – The step increment. Not used if start is a slice.

  • subchannels – A list of subchannel IDs or Boolean. True will return all subchannels in native order.

  • display – If True, the EventArray transform (i.e. the ‘display’ transform) will be applied to the data.

Returns

an iterable of structured array value blocks in the specified index range.

path()

Get the combined names of all the object’s parents/grandparents.

setTransform(transform, update=True)

Set the transforming function/object. This does not change the value of raw, however; the new transform will not be applied unless it is True.

updateTransforms(recurse=True)

(Re-)Build and (re-)apply the transformation functions.

class idelib.dataset.WarningRange(dataset, warningId=None, channelId=None, subchannelId=None, low=None, high=None, attributes=None)

An object for indicating when a set of events goes outside of a given range. Originally created for flagging periods of extreme temperatures that will affect accelerometer readings.

For efficiency, the source data should have relatively few samples (e.g. a low sample rate).

Constructor.

property displayName

A nice, human-readable description of this warning range, for use with user interfaces.

getRange(start=None, end=None, sessionId=None, iterator=<built-in function iter>)

Retrieve the invalid periods within a given range of events.

Returns

A list of invalid periods’ [start, end] times.
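
Finding the invalid periods amounts to scanning the source events and recording where values leave and re-enter the acceptable range. The sketch below assumes (time, value) pairs in chronological order and closes each period at the first in-range event; how the library chooses period boundaries is not documented here, and invalid_periods is a hypothetical helper.

```python
def invalid_periods(events, low, high):
    """Return [start, end] times during which values fall outside low..high."""
    periods = []
    start = None      # start time of the period currently open, if any
    last_time = None
    for time, value in events:
        outside = value < low or value > high
        if outside and start is None:
            start = time  # value just left the acceptable range
        elif not outside and start is not None:
            periods.append([start, time])  # value came back in range
            start = None
        last_time = time
    if start is not None:
        # Still out of range when the data ended.
        periods.append([start, last_time])
    return periods


temperatures = [(0, 20), (10, -25), (20, -30), (30, 15), (40, 90), (50, 95)]
periods = invalid_periods(temperatures, low=-20, high=60)
```

A single pass like this is cheap when the source has few samples, which is consistent with the efficiency note for WarningRange.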

getSessionSource(sessionId=None)

getValueAt(at, sessionId=None, source=None)

Retrieve the value at a specific time.