rhythmic_segments.segments
Utility functions for working with rhythmic segments.
This module provides helpers for constructing and manipulating n-gram segments that represent consecutive rhythmic events.
Classes
RhythmicSegments | Immutable container for rhythmic segment matrices.
Functions
extract_segments | Return a vectorized sliding-window matrix of interval segments.
normalize_segments | Normalize each segment to sum to one and return scaling factors.
process_input_data | Return numeric data and processed metadata extracted from data.
Module Contents
- rhythmic_segments.segments.extract_segments(intervals: Iterable[float], length: int, *, warn_on_short: bool = True, copy: bool = True, check_zero_intervals: bool = True, check_nan_intervals: bool = True) → numpy.ndarray
Return a vectorized sliding-window matrix of interval segments.
Parameters
- intervals : Iterable[float]
  Contiguous numeric intervals. Inputs containing np.nan must be pre-split via rhythmic_segments.helpers.split_into_blocks().
- length : int
  Window size of each produced segment.
- warn_on_short : bool, optional
  Emit a UserWarning when the data is shorter than length and no segments can be formed.
- copy : bool, optional
  Return a copy of the data (default) instead of a view.
- check_zero_intervals, check_nan_intervals : bool, optional
  Enable validation that forbids zero or NaN intervals, respectively.
Returns
- np.ndarray
  Matrix of shape (n_segments, length) containing the extracted segments.
Examples
>>> import numpy as np
>>> extract_segments(np.arange(1, 6, dtype=float), 3)
array([[1., 2., 3.],
       [2., 3., 4.],
       [3., 4., 5.]])
>>> extract_segments([1, 0, 2], 2, check_zero_intervals=False)
array([[1., 0.],
       [0., 2.]])
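To make "vectorized sliding window" concrete, here is a minimal NumPy sketch of the same idea. It is not the library's implementation: the name sliding_segments is hypothetical, it performs none of the zero/NaN validation, and it returns an empty matrix instead of warning when the input is too short.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def sliding_segments(intervals, length, copy=True):
    """Hypothetical stand-in for extract_segments(): an (n_segments, length) matrix."""
    data = np.asarray(intervals, dtype=float)
    if data.ndim != 1:
        raise ValueError("intervals must be one-dimensional")
    if len(data) < length:
        # No complete window fits, so return an empty matrix of the right width.
        return np.empty((0, length), dtype=float)
    # Row i is a read-only view onto data[i:i + length]; nothing is copied yet.
    windows = sliding_window_view(data, length)
    # Materialize an independent, writable array unless a view was requested.
    return windows.copy() if copy else windows


print(sliding_segments(np.arange(1, 6, dtype=float), 3))
```

With copy=False the rows share memory with the input, which is what makes a stride-based view cheap; the price is that the view is read-only.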
- rhythmic_segments.segments.normalize_segments(segments: Iterable[Iterable[float]]) → Tuple[numpy.ndarray, numpy.ndarray]
Normalize each segment to sum to one and return scaling factors.
>>> normalize_segments([[1, 1], [2, 1]])
(array([[0.5       , 0.5       ],
       [0.66666667, 0.33333333]]), array([2., 3.]))
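The normalization amounts to a row-wise division by each segment's total, with the totals returned as scaling factors. A minimal sketch of the idea follows; normalize_rows is a hypothetical name, and it does no validation, so a row summing to zero would divide by zero.

```python
import numpy as np


def normalize_rows(segments):
    """Hypothetical sketch: divide each row by its sum; return (patterns, scales)."""
    arr = np.asarray(segments, dtype=float)
    scales = arr.sum(axis=1)  # one scaling factor (the row total) per segment
    # Reshaping the totals to (n, 1) lets broadcasting divide each row by its own sum.
    patterns = arr / scales[:, np.newaxis]
    return patterns, scales


patterns, scales = normalize_rows([[1, 1], [2, 1]])
print(scales)  # [2. 3.]
```

Multiplying each pattern row by its scale recovers the original segment, which is why the two arrays are useful together.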
- rhythmic_segments.segments.process_input_data(data: Any, *, column: str | None, meta: Any | None, meta_columns: Iterable[str] | None, meta_constants: collections.abc.Mapping[str, Any] | None, data_label: str) → Tuple[numpy.ndarray, pandas.DataFrame | None]
Return numeric data and processed metadata extracted from data.
When column is provided and meta is supplied, the explicit metadata takes precedence over the inferred DataFrame columns. Metadata selection via meta_columns and constant assignments from meta_constants are applied before returning the DataFrame.
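A rough sketch of the column-selection path, assuming a DataFrame-like input: one column becomes the numeric data, the remaining columns become metadata, then meta_columns and meta_constants are applied. split_data_and_meta is a hypothetical helper; it ignores meta, data_label, and whatever validation the real process_input_data() performs.

```python
import pandas as pd


def split_data_and_meta(data, column, meta_columns=None, meta_constants=None):
    """Hypothetical sketch: one numeric column becomes the data, the rest metadata."""
    df = pd.DataFrame(data)
    values = df[column].to_numpy(dtype=float)
    meta = df.drop(columns=[column])
    if meta_columns is not None:
        # Keep only the requested columns; copy so later assignments are unambiguous.
        meta = meta[list(meta_columns)].copy()
    for name, value in (meta_constants or {}).items():
        meta[name] = value  # a scalar broadcasts to every row
    return values, meta


values, meta = split_data_and_meta(
    {"ioi": [0.5, 1.0], "beat": [1, 2], "bar": [1, 1]},
    column="ioi",
    meta_columns=["beat"],
    meta_constants={"piece": "demo"},
)
print(values)              # [0.5 1. ]
print(list(meta.columns))  # ['beat', 'piece']
```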
- class rhythmic_segments.segments.RhythmicSegments
Immutable container for rhythmic segment matrices.
>>> rs = RhythmicSegments.from_intervals([0.5, 1.0, 0.75, 1.25], length=2)
>>> rs.segments.shape
(3, 2)
>>> rs.durations
array([1.5 , 1.75, 2.  ], dtype=float32)
- segments: numpy.ndarray
- patterns: numpy.ndarray
- durations: numpy.ndarray
- length: int
- meta: pandas.DataFrame
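The page does not say how immutability is enforced. The sketch below shows one common approach under two assumptions: a frozen dataclass holding a read-only NumPy array, and durations and patterns derived as row sums and row-normalized segments (consistent with the from_intervals() example values). FrozenSegments is a hypothetical name covering only three of the attributes.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class FrozenSegments:
    """Hypothetical minimal analogue of an immutable segment container."""

    segments: np.ndarray

    def __post_init__(self):
        arr = np.array(self.segments, dtype=np.float32)  # private copy of the input
        arr.setflags(write=False)  # reject in-place writes such as arr[0, 0] = 1
        # frozen=True blocks normal attribute assignment, so set the field once here.
        object.__setattr__(self, "segments", arr)

    @property
    def durations(self) -> np.ndarray:
        return self.segments.sum(axis=1)  # total duration of each segment

    @property
    def patterns(self) -> np.ndarray:
        return self.segments / self.durations[:, np.newaxis]  # rows sum to one


fs = FrozenSegments([[0.5, 1.0], [1.0, 0.75], [0.75, 1.25]])
print(fs.durations)
```

Freezing the dataclass blocks rebinding attributes, while the read-only flag blocks mutating the array in place; both are needed for the container to behave as immutable.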
- static from_segments(segments: Iterable[Iterable[float]], *, length: int | None = None, meta: Any | None = None, meta_columns: Iterable[str] | None = None, meta_constants: collections.abc.Mapping[str, Any] | None = None, dtype=np.dtype('float32')) → RhythmicSegments
Create an instance from a precomputed segment matrix.
Parameters
- segments : Iterable[Iterable[float]]
  Matrix of segment data.
- length : Optional[int]
  Expected segment length. Required when segments is empty; must be at least 2.
- meta : Optional[Any]
  Per-segment metadata; anything convertible to pandas.DataFrame with one row per segment.
- meta_columns : Optional[Iterable[str]], optional
  Names of metadata columns to retain. When None, all columns are kept.
- meta_constants : Optional[Mapping[str, Any]], optional
  Constant metadata columns to add.
- dtype : data-type, optional
  Target dtype for the internal arrays. Defaults to np.float32.
Examples
>>> rs = RhythmicSegments.from_segments([[1, 2], [3, 4]], meta={'label': ['a', 'b']})
>>> rs.segments
array([[1., 2.],
       [3., 4.]], dtype=float32)
>>> list(rs.meta['label'])
['a', 'b']
- static from_intervals(intervals: Iterable[Any], length: int, *, split_at_nan: bool = True, warn_on_short: bool = True, copy: bool = True, check_zero_intervals: bool = True, column: str | None = None, meta: Any | None = None, meta_columns: Iterable[str] | None = None, meta_constants: collections.abc.Mapping[str, Any] | None = None, meta_agg: rhythmic_segments.meta.Aggregator | None = _AGG_COPY, **from_segments_kwargs: Any) → RhythmicSegments
Create an instance from sequential interval data.
Parameters
- intervals : Iterable[Any]
  Contiguous numeric intervals to window. Inputs containing np.nan delimiters can be handled by enabling split_at_nan.
- length : int
  Segment length. Must be at least 2.
- split_at_nan : bool, optional
  If True (default), split the interval stream on np.nan boundaries before extraction.
- warn_on_short, copy, check_zero_intervals : bool, optional
  Forwarded to extract_segments() (see that function for details).
- column : Optional[str], optional
  When provided, intervals must be DataFrame-like and the selected column supplies the numeric intervals. All remaining columns are treated as metadata.
- meta : Optional[Any]
  Optional metadata with one row per input interval. Anything that can be converted to pandas.DataFrame is accepted. Ignored when column is provided. Rows corresponding to np.nan boundaries are dropped automatically when split_at_nan is True.
- meta_columns : Optional[Iterable[str]], optional
  Names of metadata columns to retain. When None, all columns are kept.
- meta_constants : Optional[Mapping[str, Any]], optional
  Constant metadata columns to add to each resulting segment.
- meta_agg : Aggregator
  Aggregation function that converts per-interval metadata into a single record for each produced segment. Defaults to get_aggregator("copy")().
- **from_segments_kwargs : Any
  Additional keyword arguments forwarded to from_segments().
Examples
>>> rs = RhythmicSegments.from_intervals([0.5, 1.0, 0.75, 1.25], length=2)
>>> rs.segments
array([[0.5 , 1.  ],
       [1.  , 0.75],
       [0.75, 1.25]], dtype=float32)
>>> rs.patterns
array([[0.33333334, 0.6666667 ],
       [0.5714286 , 0.42857143],
       [0.375     , 0.625     ]], dtype=float32)
>>> rs.durations
array([1.5 , 1.75, 2.  ], dtype=float32)
By default, np.nan values are treated as boundaries between blocks of intervals, and segments never cross such a boundary, as the following example shows. Passing split_at_nan=False disables the splitting; intervals that still contain NaN values are then rejected with a ValueError.
>>> intervals = [1, 2, 3, np.nan, 4, 5, np.nan, 6, 7, 8]
>>> rs = RhythmicSegments.from_intervals(intervals, length=2)
>>> rs.segments
array([[1., 2.],
       [2., 3.],
       [4., 5.],
       [6., 7.],
       [7., 8.]], dtype=float32)
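The block rule can be sketched in plain NumPy: cut the stream at every NaN, then window each block on its own, so no row mixes values from two blocks. This is an illustration, not the library's rhythmic_segments.helpers.split_into_blocks(); segments_within_blocks is a hypothetical name.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view


def segments_within_blocks(intervals, length):
    """Hypothetical sketch: window each NaN-delimited block separately."""
    data = np.asarray(intervals, dtype=float)
    # np.split cuts *before* each NaN, so every block after the first starts with one.
    blocks = np.split(data, np.flatnonzero(np.isnan(data)))
    rows = []
    for block in blocks:
        block = block[~np.isnan(block)]  # drop the leading NaN marker, if any
        if len(block) >= length:  # blocks shorter than one window contribute nothing
            rows.append(sliding_window_view(block, length))
    if not rows:
        return np.empty((0, length))
    return np.concatenate(rows, axis=0)  # concatenation also copies the views


print(segments_within_blocks([1, 2, 3, np.nan, 4, 5, np.nan, 6, 7, 8], 2))
```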
You can also pass metadata with one row per input interval; rows corresponding to NaN intervals are dropped. An aggregator function specifies how the metadata rows of the intervals in a segment are combined into a single record for that segment. In the following example, the labels of the intervals in each segment are joined with dashes to form a segment label.
>>> intervals = [0.5, 1.0, np.nan, 0.75, 1.0]
>>> meta = {'label': ['a', 'b', 'nan', 'c', 'd']}
>>> agg = lambda df: {'labels': '-'.join(df['label'])}
>>> rs = RhythmicSegments.from_intervals(intervals, length=2, meta=meta, meta_agg=agg)
>>> rs.segments
array([[0.5 , 1.  ],
       [0.75, 1.  ]], dtype=float32)
>>> list(rs.meta['labels'])
['a-b', 'c-d']
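Conceptually, an aggregator receives the metadata rows of one window and returns one record for the segment. A minimal pandas sketch for a single NaN-free block follows; aggregate_window_meta is a hypothetical helper, and the real pipeline additionally drops NaN rows and treats each block separately.

```python
import pandas as pd


def aggregate_window_meta(meta, length, agg):
    """Hypothetical sketch: call `agg` on every run of `length` consecutive rows."""
    meta = pd.DataFrame(meta).reset_index(drop=True)
    n_segments = max(len(meta) - length + 1, 0)
    # Row window i covers the same intervals as segment i.
    records = [agg(meta.iloc[i:i + length]) for i in range(n_segments)]
    return pd.DataFrame.from_records(records)


joined = aggregate_window_meta(
    {"label": ["a", "b", "c"]}, 2, lambda df: {"labels": "-".join(df["label"])}
)
print(joined["labels"].tolist())  # ['a-b', 'b-c']
```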
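If the number of intervals is smaller than the segment length, no segment can be formed and a ValueError is raised, as the first example below shows; the associated warning can be turned off with the warn_on_short flag. The second example uses exactly as many intervals as the segment length and yields a single segment: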
>>> RhythmicSegments.from_intervals([1, 2], length=3)
Traceback (most recent call last):
    ...
ValueError: Not enough intervals to form a segment of length 3; requires at least 3 intervals.
>>> RhythmicSegments.from_intervals([1, 2], length=2)
RhythmicSegments(segment_length=2, n_segments=1, n_meta_cols=0, segments=[[1., 2.]])
- static from_events(events: Iterable[Any], length: int, *, drop_nan: bool = False, column: str | None = None, meta: Any | None = None, meta_columns: Iterable[str] | None = None, meta_constants: collections.abc.Mapping[str, Any] | None = None, interval_meta_agg: rhythmic_segments.meta.Aggregator | None = _AGG_FIRST, segment_meta_agg: rhythmic_segments.meta.Aggregator | None = _AGG_COPY, **from_intervals_kwargs: Any) → RhythmicSegments
Create an instance from timestamped event data.
Parameters
- events : Iterable[Any]
  Monotonic (or at least ordered) series of onset timestamps. Must be convertible to float.
- length : int
  Segment length passed to from_intervals(). Must be at least 2.
- drop_nan : bool, optional
  Remove NaN timestamps before differencing. When False (default), the resulting interval stream contains NaN markers wherever the original event data did, and these act as block boundaries for from_intervals().
- column : Optional[str], optional
  When provided, events must be DataFrame-like and the specified column supplies the timestamp values. All remaining columns are treated as metadata.
- meta : Optional[Any]
  Optional metadata aligned with the input events. Anything that can be converted to pandas.DataFrame is accepted. Ignored when column is provided. When drop_nan=True, the rows corresponding to dropped events are removed automatically.
- meta_columns : Optional[Iterable[str]], optional
  Names of metadata columns to retain. When None, all columns are preserved.
- meta_constants : Optional[Mapping[str, Any]], optional
  Constant metadata columns to add to each resulting segment.
- interval_meta_agg : Aggregator
  Aggregation function that combines the metadata of each pair of consecutive events into a per-interval record. Defaults to get_aggregator("first")().
- segment_meta_agg : Aggregator
  Forwarded to from_intervals() to transform per-interval metadata into per-segment records. Defaults to get_aggregator("copy")().
- **from_intervals_kwargs : Any
  Additional keyword arguments forwarded to from_intervals(), such as split_at_nan or dtype.
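The step from events to intervals can be sketched as follows, under an assumption the page does not spell out: that the "first" aggregator gives each interval the metadata row of the event that starts it. events_to_intervals is a hypothetical helper, not part of the library.

```python
import numpy as np
import pandas as pd


def events_to_intervals(events, meta=None):
    """Hypothetical sketch: onsets -> inter-onset intervals plus per-interval metadata."""
    onsets = np.asarray(events, dtype=float)
    intervals = np.diff(onsets)  # intervals[i] = onsets[i + 1] - onsets[i]
    interval_meta = None
    if meta is not None:
        # Interval i spans events i and i + 1; keep event i's row (assumed "first").
        # The last event starts no interval, so its row is dropped.
        interval_meta = pd.DataFrame(meta).iloc[:-1].reset_index(drop=True)
    return intervals, interval_meta


intervals, interval_meta = events_to_intervals(
    [0.0, 0.5, 1.5], meta={"note": ["C", "D", "E"]}
)
print(intervals)                       # [0.5 1. ]
print(interval_meta["note"].tolist())  # ['C', 'D']
```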
Examples
>>> events = [0.0, 0.5, 1.0, np.nan, 1.5, 2.0, 2.5]
>>> rs = RhythmicSegments.from_events(events, length=2)
>>> rs.segments
array([[0.5, 0.5],
       [0.5, 0.5]], dtype=float32)
Segments never span the np.nan boundary. To discard the boundary entirely, enable drop_nan=True:
>>> RhythmicSegments.from_events(events, length=2, drop_nan=True).segments
array([[0.5, 0.5],
       [0.5, 0.5],
       [0.5, 0.5],
       [0.5, 0.5]], dtype=float32)
Note that passing split_at_nan=False while retaining the NaN intervals raises an error, because from_intervals() forbids segments crossing the boundary:
>>> RhythmicSegments.from_events(events, length=2, split_at_nan=False)
Traceback (most recent call last):
    ...
ValueError: Intervals contain NaN values; enable split_at_nan or preprocess via split_into_blocks().
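Why a single missing onset produces a boundary follows from plain differencing: both intervals that touch the NaN onset become NaN. The NumPy snippet below illustrates this with the same events as in the examples, and contrasts it with dropping the NaN onset first, which is what drop_nan=True is described as doing. It is an illustration only, not the library code.

```python
import numpy as np

events = np.array([0.0, 0.5, 1.0, np.nan, 1.5, 2.0, 2.5])

# Any difference involving the NaN onset is NaN, giving two NaN markers.
with_markers = np.diff(events)

# Removing NaN onsets before differencing leaves one uninterrupted stream.
without_markers = np.diff(events[~np.isnan(events)])

print(with_markers)     # [0.5 0.5 nan nan 0.5 0.5]
print(without_markers)  # [0.5 0.5 0.5 0.5 0.5]
```

With the markers kept, the stream splits into [0.5, 0.5], an empty block, and [0.5, 0.5], which yields two segments of length 2; without them, the five intervals yield four segments.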
- static concat(*segments: RhythmicSegments, source_col: str | None = None) → RhythmicSegments
Concatenate multiple RhythmicSegments objects.
Metadata columns are merged using pandas.concat(); missing values are filled with NaN, as pandas does by default.
Parameters
- segments : RhythmicSegments
  Objects to concatenate.
- source_col : Optional[str]
  Name of a metadata column storing the positional index of the source object. None disables the column.
Examples
>>> rs1 = RhythmicSegments.from_segments([[1, 2]], meta=dict(label=['a']))
>>> rs2 = RhythmicSegments.from_segments([[3, 4]], meta=dict(label=['b']))
>>> merged = RhythmicSegments.concat(rs1, rs2, source_col='source')
>>> merged.segments
array([[1., 2.],
       [3., 4.]], dtype=float32)
>>> list(merged.meta['source'])
[0, 1]
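The concatenation rule can be sketched with pandas directly: stack the segment matrices, tag each metadata frame with its position, and let pandas.concat() fill columns missing from some inputs with NaN. concat_with_source and its list of (segments, meta) pairs are hypothetical, used only for illustration.

```python
import numpy as np
import pandas as pd


def concat_with_source(parts, source_col=None):
    """Hypothetical sketch: stack (segments, meta) pairs, optionally tagging origins."""
    segments = np.concatenate([seg for seg, _ in parts], axis=0)
    frames = []
    for position, (seg, meta) in enumerate(parts):
        # One metadata row per segment; None becomes a frame with no columns.
        frame = pd.DataFrame(meta, index=range(len(seg)))
        if source_col is not None:
            frame[source_col] = position  # positional index of the source object
        frames.append(frame)
    # Columns absent from some frames are filled with NaN by pandas.concat.
    return segments, pd.concat(frames, ignore_index=True)


segs, meta = concat_with_source(
    [(np.array([[1.0, 2.0]]), {"label": ["a"]}), (np.array([[3.0, 4.0]]), {"tempo": [90]})],
    source_col="source",
)
print(meta)
```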
- property count: int
Number of stored segments.
- take(idx: numpy.ndarray | List[int]) → RhythmicSegments
Return a new instance containing only the segments at idx.
>>> rs = RhythmicSegments.from_segments([[1, 2], [3, 4]], meta=dict(id=[0, 1]))
>>> rs.take([1]).segments
array([[3., 4.]], dtype=float32)
>>> list(rs.take([1]).meta['id'])
[1]
- filter(mask: numpy.ndarray | pandas.Series) → RhythmicSegments
Return a new instance containing the segments where mask is True.
>>> rs = RhythmicSegments.from_segments([[1, 2], [3, 4]], meta=dict(id=[0, 1]))
>>> rs.filter([True, False]).segments
array([[1., 2.]], dtype=float32)
- filter_by_duration(min_value: float | None = None, max_value: float | None = None, min_quantile: float | None = None, max_quantile: float | None = None) → RhythmicSegments
Return a new instance filtered by duration thresholds.
Parameters
- min_value, max_value : Optional[float], optional
  Absolute duration bounds (inclusive). When supplied, these override the corresponding quantile parameters.
- min_quantile, max_quantile : Optional[float], optional
  Quantile-based bounds (inclusive) used when explicit min_value or max_value are not provided. Pass None to disable a bound.
Examples
>>> rs = RhythmicSegments.from_segments([[1, 2], [3, 4], [5, 6]])
>>> rs.durations
array([ 3.,  7., 11.], dtype=float32)
>>> short = rs.filter_by_duration(max_quantile=0.5)
>>> short.durations
array([3., 7.], dtype=float32)
>>> rs.filter_by_duration(min_value=8.0).durations
array([11.], dtype=float32)
>>> rs.filter_by_duration(min_value=3.0, max_value=8.0).durations
array([3., 7.], dtype=float32)
>>> rs.filter_by_duration()
Traceback (most recent call last):
    ...
ValueError: At least one duration bound must be specified
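The threshold rules can be sketched as a boolean mask over the durations: explicit values take priority over quantiles, and both bounds are inclusive. Computing quantiles with np.quantile and its default linear interpolation is an assumption about the library; duration_mask and _resolve are hypothetical names.

```python
import numpy as np


def _resolve(durations, value, quantile):
    """An explicit value wins; otherwise fall back to the quantile, if any."""
    if value is not None:
        return value
    if quantile is not None:
        return np.quantile(durations, quantile)
    return None


def duration_mask(durations, min_value=None, max_value=None,
                  min_quantile=None, max_quantile=None):
    """Hypothetical sketch: True where a duration lies within the inclusive bounds."""
    durations = np.asarray(durations, dtype=float)
    lower = _resolve(durations, min_value, min_quantile)
    upper = _resolve(durations, max_value, max_quantile)
    if lower is None and upper is None:
        raise ValueError("At least one duration bound must be specified")
    mask = np.ones(durations.shape, dtype=bool)
    if lower is not None:
        mask &= durations >= lower
    if upper is not None:
        mask &= durations <= upper
    return mask


print(duration_mask([3.0, 7.0, 11.0], max_quantile=0.5))  # [ True  True False]
```

With three durations the 0.5 quantile is the median, 7.0, so the inclusive upper bound keeps the first two segments, matching the max_quantile example above.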
- with_meta(**cols: Any) → RhythmicSegments
Return a new instance with additional metadata columns.
>>> rs = RhythmicSegments.from_segments([[1, 2], [3, 4]])
>>> rs.with_meta(label=['a', 'b']).meta['label'].tolist()
['a', 'b']
- query(expr: str, **query_kwargs: Any) → RhythmicSegments
Return a new instance filtered by evaluating expr on the metadata.
>>> rs = RhythmicSegments.from_segments([[1, 2], [3, 4]], meta={'id': [0, 1]})
>>> rs.query('id == 1').segments
array([[3., 4.]], dtype=float32)
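One way to express a metadata query as positional selection is to run pandas.DataFrame.query() on the metadata and reuse the positions of the matching rows for the segment matrix. query_rows is a hypothetical helper, and forwarding the extra keyword arguments to DataFrame.query() is an assumption about what query_kwargs are for.

```python
import numpy as np
import pandas as pd


def query_rows(segments, meta, expr, **query_kwargs):
    """Hypothetical sketch: filter segments by a pandas query on their metadata."""
    # A fresh RangeIndex makes the surviving index labels equal to row positions.
    positions = meta.reset_index(drop=True).query(expr, **query_kwargs).index.to_numpy()
    return segments[positions], meta.iloc[positions].reset_index(drop=True)


segments = np.array([[1.0, 2.0], [3.0, 4.0]], dtype=np.float32)
meta = pd.DataFrame({"id": [0, 1]})
selected, selected_meta = query_rows(segments, meta, "id == 1")
print(selected)  # [[3. 4.]]
```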
- shuffle(random_state: int | None = None) → RhythmicSegments
Return a new instance with rows shuffled uniformly at random.
>>> rs = RhythmicSegments.from_segments([[1, 2], [3, 4]])
>>> rs.shuffle(random_state=3).segments
array([[3., 4.],
       [1., 2.]], dtype=float32)
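Shuffling must apply one permutation to both the segment matrix and the metadata so that rows stay aligned. A minimal sketch with NumPy's Generator API follows; shuffle_rows is hypothetical, and its seed-to-order mapping need not match the library's, so random_state=3 here will not necessarily reproduce the order shown above.

```python
import numpy as np
import pandas as pd


def shuffle_rows(segments, meta, random_state=None):
    """Hypothetical sketch: permute segments and metadata with one shared order."""
    rng = np.random.default_rng(random_state)
    order = rng.permutation(len(segments))  # uniformly random permutation of row indices
    return segments[order], meta.iloc[order].reset_index(drop=True)


segments = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
meta = pd.DataFrame({"id": [0, 1, 2]})
shuffled, shuffled_meta = shuffle_rows(segments, meta, random_state=3)
print(shuffled)
```

Passing the same random_state again yields the same order, which keeps experiments reproducible.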