Operations

RhythmicSegments supports several convenience operations, such as selecting subsets, filtering, querying metadata, and shuffling. Each of these returns a new instance with the metadata kept in sync.

from rhythmic_segments import RhythmicSegments

Taking subsets

intervals = [1, 2, 3, 4, 5, 6, 7, 8, 9]
meta = { "label": list('abcdefghi') }
rs = RhythmicSegments.from_intervals(intervals, length=2, meta=meta)
rs.segments
array([[1., 2.],
       [2., 3.],
       [3., 4.],
       [4., 5.],
       [5., 6.],
       [6., 7.],
       [7., 8.],
       [8., 9.]], dtype=float32)
# Basic positional indexing: select the first three segments
head = rs.take([0, 1, 2])
head.segments
array([[1., 2.],
       [2., 3.],
       [3., 4.]], dtype=float32)

Metadata is preserved as well:

head.meta
  label_1 label_2
0       a       b
1       b       c
2       c       d
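The segments and metadata shown above are consistent with a window of length 2 sliding over the intervals and the labels one step at a time. The following is a minimal pure-Python sketch of that idea, not the library's implementation; `sliding_windows` and `take` are hypothetical helpers introduced only for illustration.

```python
# Illustrative sketch only: reproduces the outputs shown above with plain lists.
intervals = [1, 2, 3, 4, 5, 6, 7, 8, 9]
labels = list("abcdefghi")


def sliding_windows(values, length):
    """Return all consecutive windows of `length` items (hypothetical helper)."""
    return [tuple(values[i:i + length]) for i in range(len(values) - length + 1)]


def take(rows, indices):
    """Select rows by position (hypothetical stand-in for positional indexing)."""
    return [rows[i] for i in indices]


segments = sliding_windows(intervals, 2)    # 8 segments from 9 intervals
label_pairs = sliding_windows(labels, 2)    # one (label_1, label_2) pair per segment

# Applying the same indices to both lists keeps segments and metadata aligned.
head_segments = take(segments, [0, 1, 2])
head_labels = take(label_pairs, [0, 1, 2])
print(head_segments)
print(head_labels)
```

Because the same index list is applied to both lists, each selected segment keeps its own pair of labels.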

Filtering & querying

You can also filter segments with a boolean condition. For example, to select all segments whose duration exceeds 10:

rs.filter(rs.durations > 10).segments
array([[5., 6.],
       [6., 7.],
       [7., 8.],
       [8., 9.]], dtype=float32)
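The result above can be reproduced by hand with a boolean mask. This sketch assumes that a segment's duration is the sum of its intervals, which matches the output shown (5 + 6 = 11 passes, 4 + 5 = 9 does not); it uses plain lists rather than the library itself.

```python
# Illustrative sketch only: boolean-mask filtering with plain Python lists.
segments = [(i, i + 1) for i in range(1, 9)]    # (1, 2), (2, 3), ..., (8, 9)

# Assumption: duration of a segment = sum of its intervals.
durations = [sum(segment) for segment in segments]

# One True/False entry per segment, like the condition `durations > 10`.
mask = [duration > 10 for duration in durations]

# Keep only the segments whose mask entry is True.
kept = [segment for segment, keep in zip(segments, mask) if keep]
print(durations)
print(kept)
```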

Since this is a common use case, filter_by_duration lets you filter on duration directly, either with absolute bounds (min_value, max_value) or with quantile bounds (min_quantile, max_quantile):

# Remove all segments with a duration below 6
rs.filter_by_duration(min_value=6).segments
array([[3., 4.],
       [4., 5.],
       [5., 6.],
       [6., 7.],
       [7., 8.],
       [8., 9.]], dtype=float32)
# Keep only segments whose duration does not exceed the 80% quantile
rs.filter_by_duration(max_quantile=.8).segments
array([[1., 2.],
       [2., 3.],
       [3., 4.],
       [4., 5.],
       [5., 6.],
       [6., 7.]], dtype=float32)
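To see why six segments survive the quantile filter, it helps to compute the threshold by hand. The sketch below assumes a linearly interpolated quantile and an inclusive upper bound; both are assumptions that happen to agree with the output above, not confirmed details of the library. The `quantile` helper is hypothetical.

```python
# Illustrative sketch only: quantile-based duration filter with plain lists.
segments = [(i, i + 1) for i in range(1, 9)]
durations = [sum(segment) for segment in segments]    # 3, 5, ..., 17


def quantile(values, q):
    """Linearly interpolated quantile of `values` (hypothetical helper)."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


# The 0.8 quantile falls between the durations 13 and 15.
threshold = quantile(durations, 0.8)

# Assumption: segments at or below the threshold are kept.
kept = [s for s, d in zip(segments, durations) if d <= threshold]
print(round(threshold, 3))
print(kept)
```

Under these assumptions the threshold lies between two observed durations, so the filter keeps six of the eight segments rather than exactly 80% of them.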

You can also query the metadata using pandas query syntax. For example, to select all segments where either label contains the letter 'b':

rs.query("label_1.str.contains('b') | label_2.str.contains('b')").meta
  label_1 label_2
0       a       b
1       b       c
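The same expression can be tried on a plain pandas DataFrame to see which rows it selects. This is a sketch under the assumption that the metadata behaves like an ordinary DataFrame with columns `label_1` and `label_2`; the `engine="python"` argument is added here because string methods inside a query are not supported by every evaluation engine.

```python
import pandas as pd

# Illustrative sketch only: a stand-in for the metadata table shown above.
meta = pd.DataFrame({
    "label_1": list("abcdefgh"),
    "label_2": list("bcdefghi"),
})

expression = "label_1.str.contains('b') | label_2.str.contains('b')"

# DataFrame.query evaluates the expression against the column names.
selected = meta.query(expression, engine="python")

# The surviving index values identify which segments would be kept.
kept_rows = selected.index.tolist()
print(selected)
print(kept_rows)
```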

Shuffling

Finally, you can shuffle the segments. Passing random_state makes the resulting order reproducible:

rs.shuffle(random_state=1).segments
array([[6., 7.],
       [1., 2.],
       [2., 3.],
       [5., 6.],
       [3., 4.],
       [7., 8.],
       [4., 5.],
       [8., 9.]], dtype=float32)
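Conceptually, shuffling with metadata kept in sync amounts to drawing one random permutation and applying it to both the segments and their metadata rows. The sketch below illustrates this with the standard library; it does not use the library's random number generator, so the order it produces will generally differ from the output above.

```python
import random

# Illustrative sketch only: shuffle segments and metadata with one shared permutation.
segments = [(i, i + 1) for i in range(1, 9)]
label_pairs = list(zip("abcdefgh", "bcdefghi"))

rng = random.Random(1)              # a fixed seed makes the order repeatable
order = list(range(len(segments)))
rng.shuffle(order)                  # one permutation of the row positions

# Applying the same permutation to both lists keeps each segment with its labels.
shuffled_segments = [segments[i] for i in order]
shuffled_labels = [label_pairs[i] for i in order]
print(shuffled_segments)
print(shuffled_labels)
```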