Operations
RhythmicSegments supports several convenience operations, such as selecting subsets, filtering, querying metadata, and shuffling. Each of these returns a new instance with the metadata kept in sync.
from rhythmic_segments import RhythmicSegments
Taking subsets
intervals = [1, 2, 3, 4, 5, 6, 7, 8, 9]
meta = { "label": list('abcdefghi') }
rs = RhythmicSegments.from_intervals(intervals, length=2, meta=meta)
rs.segments
array([[1., 2.],
       [2., 3.],
       [3., 4.],
       [4., 5.],
       [5., 6.],
       [6., 7.],
       [7., 8.],
       [8., 9.]], dtype=float32)
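The output above suggests that from_intervals slides a window of `length` intervals over the input, one step at a time. Below is a minimal plain-Python sketch of that idea, assuming overlapping windows with a step of one. The helper `sliding_segments` is hypothetical and not part of the library.

```python
def sliding_segments(intervals, length):
    """Return all overlapping windows of `length` consecutive intervals."""
    if length < 1:
        raise ValueError("length must be at least 1")
    n_windows = max(len(intervals) - length + 1, 0)
    return [
        [float(value) for value in intervals[start:start + length]]
        for start in range(n_windows)
    ]


segments = sliding_segments([1, 2, 3, 4, 5, 6, 7, 8, 9], length=2)
print(len(segments))              # 8 windows from 9 intervals
print(segments[0], segments[-1])  # [1.0, 2.0] [8.0, 9.0]
```

With n intervals and windows of length k, this yields n - k + 1 segments, which matches the eight rows shown above.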
# Basic indexing: select the first three segments by position
head = rs.take([0, 1, 2])
head.segments
array([[1., 2.],
       [2., 3.],
       [3., 4.]], dtype=float32)
Metadata is preserved as well:
head.meta
|   | label_1 | label_2 |
|---|---|---|
| 0 | a | b |
| 1 | b | c |
| 2 | c | d |
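The label_1 and label_2 columns suggest that per-interval metadata is expanded into one column per position within the segment. Below is a rough sketch of that bookkeeping, assuming this naming scheme. `segment_meta` and `take_rows` are hypothetical helpers for illustration only, not library functions.

```python
def segment_meta(labels, length):
    """Expand per-interval labels into one row per segment, one key per position."""
    rows = []
    for start in range(len(labels) - length + 1):
        window = labels[start:start + length]
        rows.append({f"label_{pos + 1}": value for pos, value in enumerate(window)})
    return rows


def take_rows(segments, meta, indices):
    """Select the same positions from segments and metadata so they stay aligned."""
    return [segments[i] for i in indices], [meta[i] for i in indices]


intervals = [1, 2, 3, 4, 5, 6, 7, 8, 9]
segments = [intervals[i:i + 2] for i in range(len(intervals) - 1)]
meta = segment_meta(list("abcdefghi"), length=2)

head_segments, head_meta = take_rows(segments, meta, [0, 1, 2])
print(head_segments)  # [[1, 2], [2, 3], [3, 4]]
print(head_meta[0])   # {'label_1': 'a', 'label_2': 'b'}
```

Applying one index list to both structures is what keeps each segment paired with its own metadata row.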
Filtering & querying
You can also filter segments with a boolean condition. For example, to keep only the segments with a duration greater than 10:
rs.filter(rs.durations > 10).segments
array([[5., 6.],
       [6., 7.],
       [7., 8.],
       [8., 9.]], dtype=float32)
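The result is consistent with a segment's duration being the sum of its intervals: the first segment kept, [5, 6], has duration 11. Below is a plain-Python sketch of mask-based filtering under that assumption. It mirrors the idea, not the library's implementation.

```python
intervals = [1, 2, 3, 4, 5, 6, 7, 8, 9]
segments = [intervals[i:i + 2] for i in range(len(intervals) - 1)]

# Assumed definition: duration = sum of the intervals in a segment.
durations = [sum(segment) for segment in segments]

# Boolean mask with one entry per segment, analogous to `rs.durations > 10`.
mask = [duration > 10 for duration in durations]
kept = [segment for segment, keep in zip(segments, mask) if keep]

print(durations)  # [3, 5, 7, 9, 11, 13, 15, 17]
print(kept)       # [[5, 6], [6, 7], [7, 8], [8, 9]]
```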
Since this is a common use case, filter_by_duration lets you filter by duration directly, using either absolute bounds (min_value, max_value) or quantile bounds (min_quantile, max_quantile):
# Remove all segments with a duration below 6
rs.filter_by_duration(min_value=6).segments
array([[3., 4.],
       [4., 5.],
       [5., 6.],
       [6., 7.],
       [7., 8.],
       [8., 9.]], dtype=float32)
# Keep only the shortest 80% of the segments
rs.filter_by_duration(max_quantile=.8).segments
array([[1., 2.],
       [2., 3.],
       [3., 4.],
       [4., 5.],
       [5., 6.],
       [6., 7.]], dtype=float32)
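To see why max_quantile=.8 keeps six of the eight segments, it helps to compute the cutoff by hand. The sketch below assumes linearly interpolated quantiles (NumPy's default method), an inclusive upper bound, and durations equal to the sum of each segment. Whether the library does exactly this is an assumption, but it reproduces the output above. `linear_quantile` is a hypothetical helper.

```python
def linear_quantile(values, q):
    """Quantile with linear interpolation between the two nearest order statistics."""
    ordered = sorted(values)
    position = q * (len(ordered) - 1)
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


durations = [3, 5, 7, 9, 11, 13, 15, 17]
cutoff = linear_quantile(durations, 0.8)
kept = [d for d in durations if d <= cutoff]

print(round(cutoff, 6))  # 14.2
print(kept)              # [3, 5, 7, 9, 11, 13]
```

The cutoff of 14.2 falls between the durations 13 and 15, so the segments [7, 8] and [8, 9] are dropped.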
You can also query metadata using pandas query syntax. For example, selecting all segments whose label contains the letter ‘b’:
rs.query("label_1.str.contains('b') | label_2.str.contains('b')").meta
|   | label_1 | label_2 |
|---|---|---|
| 0 | a | b |
| 1 | b | c |
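For readers unfamiliar with pandas query strings, the expression above amounts to a row-wise predicate over the metadata. Below is a plain-Python equivalent, with the metadata rows rebuilt by hand for illustration. This is a sketch of the logic only and does not use pandas or the library.

```python
# One row per segment, mirroring the label_1 / label_2 columns shown above.
meta = [
    {"label_1": first, "label_2": second}
    for first, second in zip("abcdefgh", "bcdefghi")
]

# Keep the positions of rows where either label contains "b".
matches = [
    index
    for index, row in enumerate(meta)
    if "b" in row["label_1"] or "b" in row["label_2"]
]

print(matches)  # [0, 1]
```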
Shuffling
Finally, you can shuffle the order of the segments:
rs.shuffle(random_state=1).segments
array([[6., 7.],
       [1., 2.],
       [2., 3.],
       [5., 6.],
       [3., 4.],
       [7., 8.],
       [4., 5.],
       [8., 9.]], dtype=float32)
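Shuffling only keeps metadata in sync if the same permutation is applied to both segments and metadata. Below is a minimal sketch of that idea using the standard library. `shuffle_together` is a hypothetical helper, and its seeded order will generally differ from what rs.shuffle(random_state=1) produces.

```python
import random


def shuffle_together(segments, meta, seed=None):
    """Apply one random permutation to segments and metadata alike."""
    order = list(range(len(segments)))
    random.Random(seed).shuffle(order)
    return [segments[i] for i in order], [meta[i] for i in order]


intervals = [1, 2, 3, 4, 5, 6, 7, 8, 9]
segments = [intervals[i:i + 2] for i in range(len(intervals) - 1)]
meta = [{"label_1": a, "label_2": b} for a, b in zip("abcdefgh", "bcdefghi")]

shuffled, shuffled_meta = shuffle_together(segments, meta, seed=1)
again, _ = shuffle_together(segments, meta, seed=1)

print(shuffled == again)             # True: same seed, same order
print(sorted(shuffled) == segments)  # True: nothing added or lost
```

Fixing the seed makes the order reproducible, which is presumably the role of random_state in the call above.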