Handling metadata
RhythmicSegments is particularly useful when your segments carry metadata. Internally, it holds a pandas DataFrame of segment metadata in which every row corresponds to one segment. The metadata always stays aligned with the segments, including when you take subsets, filter, or query. This notebook demonstrates how to handle metadata with RhythmicSegments.
import numpy as np
import pandas as pd
from rhythmic_segments import RhythmicSegments
You can pass metadata using the meta keyword; it will accept anything that can be turned into a DataFrame.
segments = [[2, 8], [.3, .6], [1, 1]]
meta = dict(label=["a", "b", "c"])
rs = RhythmicSegments.from_segments(segments, meta=meta)
rs.meta
|  | label |
|---|---|
| 0 | a |
| 1 | b |
| 2 | c |
You can choose which columns in the metadata you want to copy using meta_columns, or pass metadata that’s constant for all segments using meta_constants:
meta = dict(label=["a", "b", "c"], intensity=[.1, .2, .3], foo=['A', 'B', 'C'])
rs = RhythmicSegments.from_segments(
    segments,
    meta=meta,
    meta_columns=['label', 'intensity'],
    meta_constants=dict(instrument="guitar")
)
rs.meta
|  | label | intensity | instrument |
|---|---|---|---|
| 0 | a | 0.1 | guitar |
| 1 | b | 0.2 | guitar |
| 2 | c | 0.3 | guitar |
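As a rough mental model, selecting columns and adding constants yields the same table as the plain pandas operations below. This is a sketch using only pandas, not a description of how RhythmicSegments builds its metadata internally; the variable names `selected` and `expected_meta` are illustrative.

```python
import pandas as pd

meta = dict(label=["a", "b", "c"], intensity=[.1, .2, .3], foo=["A", "B", "C"])

# Keep only the requested columns (the role played by meta_columns) ...
selected = pd.DataFrame(meta)[["label", "intensity"]]

# ... and broadcast a constant value to every row (the role of meta_constants).
expected_meta = selected.assign(instrument="guitar")
print(expected_meta)
```

The `foo` column is dropped because it is not listed, and `instrument` is repeated for each of the three segments.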
Aggregating metadata
When using from_segments, handling metadata is simple: you already have one metadata entry per segment. However, when using from_intervals or from_events things get more complicated. Suppose individual intervals carry annotations (for example, performer labels) and you want to derive segment-level metadata from them. You then have to aggregate interval-level metadata to segment-level metadata, for example by concatenating the labels of all intervals to get a label for the segment.
You can supply an aggregator that receives the per-interval metadata for each segment and returns a dictionary describing the segment metadata. Every key in that dictionary becomes a column in the final meta DataFrame.
Suppose we need to aggregate the following interval-level metadata to get the metadata of the corresponding segment:
interval_meta = pd.DataFrame(dict(label=["a", "b", "c"], color=["r", "g", "b"]))
interval_meta
|  | label | color |
|---|---|---|
| 0 | a | r |
| 1 | b | g |
| 2 | c | b |
We define a custom aggregator that will create three metadata columns:
- segment_label: a concatenation of the labels of the intervals in the segment
- first_label: the label of the first interval in the segment
- last_color: the color of the last interval
def my_aggregator(interval_meta):
    return {
        "segment_label": "-".join(interval_meta['label']),
        "first_label": interval_meta.iloc[0]['label'],
        "last_color": interval_meta.iloc[-1]['color']
    }
my_aggregator(interval_meta)
{'segment_label': 'a-b-c', 'first_label': 'a', 'last_color': 'b'}
If you now pass both the metadata and the aggregator to from_intervals, you get segment-level metadata. Note that it respects block boundaries: segments do not cross the np.nan values.
intervals = [1, 2, 3, 4, np.nan, 5, 6, 7]
meta = dict(
    label=['a', 'b', 'c', 'd', np.nan, 'e', 'f', 'g'],
    color=['A', 'B', 'C', 'D', np.nan, 'E', 'F', 'G']
)
rs = RhythmicSegments.from_intervals(intervals, length=3, meta=meta, meta_agg=my_aggregator)
rs.meta
|  | segment_label | first_label | last_color |
|---|---|---|---|
| 0 | a-b-c | a | C |
| 1 | b-c-d | b | D |
| 2 | e-f-g | e | G |
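To make the block-boundary behaviour concrete, here is a minimal pure-Python sketch of the idea: split the intervals at NaN separators, slide a window of the requested length over each block, and hand the metadata rows of each window to an aggregator. The helpers `split_blocks`, `aggregate_windows`, and `join_labels` are hypothetical names for illustration, and the library's actual implementation may differ.

```python
import math

def split_blocks(intervals, rows):
    """Split (interval, metadata row) pairs into blocks at NaN separators."""
    blocks, current = [], []
    for value, row in zip(intervals, rows):
        if math.isnan(value):
            if current:
                blocks.append(current)
            current = []
        else:
            current.append((value, row))
    if current:
        blocks.append(current)
    return blocks

def aggregate_windows(intervals, rows, length, aggregator):
    """Apply the aggregator to every window of `length` rows within each block."""
    results = []
    for block in split_blocks(intervals, rows):
        for start in range(len(block) - length + 1):
            window_rows = [row for _, row in block[start:start + length]]
            results.append(aggregator(window_rows))
    return results

def join_labels(window_rows):
    """Aggregator over plain dict rows: concatenate the labels with dashes."""
    return {"segment_label": "-".join(row["label"] for row in window_rows)}

intervals = [1, 2, 3, 4, float("nan"), 5, 6, 7]
labels = ["a", "b", "c", "d", None, "e", "f", "g"]
rows = [{"label": label} for label in labels]

segment_meta = aggregate_windows(intervals, rows, length=3, aggregator=join_labels)
print(segment_meta)
```

No window contains rows from both sides of the NaN, which is why the result has three segments rather than six.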
Default aggregators
You can define your own aggregators as above, or use one of the predefined aggregators returned by get_aggregator. The "copy" aggregator is the default: it copies all metadata of all intervals and adds a suffix indicating the position of the interval within the segment (e.g., color_3 is the color of the third interval in a segment).
from rhythmic_segments import get_aggregator
agg_copy = get_aggregator("copy")
agg_copy(interval_meta)
{'label_1': 'a',
'color_1': 'r',
'label_2': 'b',
'color_2': 'g',
'label_3': 'c',
'color_3': 'b'}
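The suffix naming can be reproduced with a short hand-written aggregator. The function below, `copy_like`, is a hypothetical stand-in written for illustration only; it mimics the output shown above but is not the library's implementation of "copy".

```python
import pandas as pd

def copy_like(interval_meta):
    """Copy every column of every interval, suffixing the 1-based interval position."""
    out = {}
    for position, (_, row) in enumerate(interval_meta.iterrows(), start=1):
        for column, value in row.items():
            out[f"{column}_{position}"] = value
    return out

interval_meta = pd.DataFrame(dict(label=["a", "b", "c"], color=["r", "g", "b"]))
copied = copy_like(interval_meta)
print(copied)
```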
The "first" aggregator returns the metadata of the first interval in the segment, and the "last" aggregator returns that of the last interval. The "index" aggregator returns the metadata of the interval at a given position (0 = first, -1 = last), and "list" combines the metadata of all intervals into a list. All of them accept a columns argument that restricts the aggregation to those columns, and a names argument that renames the resulting columns.
agg_first = get_aggregator("first", columns=["color"], names=['first_color'])
agg_first(interval_meta)
{'first_color': 'r'}
agg_middle = get_aggregator("index", index=1, columns=["label"], names=['middle_label'])
agg_middle(interval_meta)
{'middle_label': 'b'}
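If none of the predefined aggregators fits, a closure is one way to write your own parametrized aggregator. The sketch below builds an aggregator that picks one interval by position and renames the selected columns; `make_index_aggregator` is a hypothetical helper, not part of rhythmic_segments.

```python
import pandas as pd

def make_index_aggregator(index, columns, names):
    """Return an aggregator that reads `columns` from the interval at `index`."""
    def aggregator(interval_meta):
        row = interval_meta.iloc[index]
        # Pair each source column with the name it should get in the segment metadata.
        return {name: row[column] for column, name in zip(columns, names)}
    return aggregator

interval_meta = pd.DataFrame(dict(label=["a", "b", "c"], color=["r", "g", "b"]))
agg_middle_like = make_index_aggregator(1, columns=["label"], names=["middle_label"])
middle = agg_middle_like(interval_meta)
print(middle)
```

Because it takes the per-interval metadata and returns a dictionary, such a function has the same shape as my_aggregator above.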
Interval aggregators for from_events
Handling metadata using from_events requires two aggregators: interval_meta_agg to turn event-level metadata into interval-level metadata, and segment_meta_agg to turn that into segment-level metadata. By default, interval_meta_agg will use metadata of the first event as the metadata for the interval.
events = [1, 2, 4, 8, 16, 32]
meta = dict(event_label=["a", "b", "c", "d", "e", "f"])
rs = RhythmicSegments.from_events(events, length=3, meta=meta)
rs.meta
|  | event_label_1 | event_label_2 | event_label_3 |
|---|---|---|---|
| 0 | a | b | c |
| 1 | b | c | d |
| 2 | c | d | e |
If you want to combine the metadata of both events that delimit an interval, specify an interval-level aggregator. In the example below, each interval label is the concatenation of its two event labels, and each segment label joins the interval labels with a dash:
interval_meta_agg = get_aggregator('join', separator="")
segment_meta_agg = get_aggregator('join', separator="-", names=["segment_label"])
rs = RhythmicSegments.from_events(
    events,
    length=3,
    meta=meta,
    interval_meta_agg=interval_meta_agg,
    segment_meta_agg=segment_meta_agg)
rs.meta
|  | segment_label |
|---|---|
| 0 | ab-bc-cd |
| 1 | bc-cd-de |
| 2 | cd-de-ef |
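The two aggregation stages can be traced by hand in plain Python. This sketch assumes, as the outputs above suggest, that each interval spans two consecutive events and that segments are sliding windows over the intervals; it only reproduces the labels and does not use the library.

```python
event_labels = ["a", "b", "c", "d", "e", "f"]
length = 3

# Stage 1: events -> intervals. Each interval is bounded by two consecutive events.
event_pairs = list(zip(event_labels, event_labels[1:]))
first_event_labels = [first for first, _ in event_pairs]   # default: first event only
joined_labels = ["".join(pair) for pair in event_pairs]    # join with an empty separator

# Stage 2: intervals -> segments, using sliding windows of `length` intervals.
segment_labels = [
    "-".join(joined_labels[start:start + length])
    for start in range(len(joined_labels) - length + 1)
]
print(first_event_labels)
print(segment_labels)
```

Six events give five intervals, and five intervals give three segments of length three, which matches the three rows in the tables above.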