Handling metadata

RhythmicSegments is particularly useful when your data carries metadata. Internally, it holds a pandas DataFrame of segment metadata in which every row corresponds to one segment. The metadata always stays aligned with the segments, including when you take subsets, filter, or query. This notebook demonstrates how to handle metadata with RhythmicSegments.

import numpy as np
import pandas as pd

from rhythmic_segments import RhythmicSegments

You can pass metadata using the meta keyword; it will accept anything that can be turned into a DataFrame.

segments = [[2, 8], [.3, .6], [1, 1]]
meta = dict(label=["a", "b", "c"])
rs = RhythmicSegments.from_segments(segments, meta=meta)
rs.meta
  label
0     a
1     b
2     c

You can choose which columns in the metadata you want to copy using meta_columns, or pass metadata that’s constant for all segments using meta_constants:

meta = dict(label=["a", "b", "c"], intensity=[.1, .2, .3], foo=['A', 'B', 'C'])
rs = RhythmicSegments.from_segments(
    segments, 
    meta=meta, 
    meta_columns=['label', 'intensity'],
    meta_constants=dict(instrument="guitar")
)
rs.meta
  label  intensity instrument
0     a        0.1     guitar
1     b        0.2     guitar
2     c        0.3     guitar
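
Conceptually, meta_columns selects a subset of the supplied columns and meta_constants repeats a single value for every segment. The sketch below mimics that behaviour with plain dictionaries of lists. Note that build_meta is a hypothetical helper written only for illustration; it is not part of rhythmic_segments and may differ from what the library does internally.

```python
def build_meta(meta, n_segments, meta_columns=None, meta_constants=None):
    """Hypothetical helper (not library code): select columns, then add constants."""
    columns = meta_columns if meta_columns is not None else list(meta)
    out = {}
    for name in columns:
        values = list(meta[name])
        # Every kept column needs exactly one entry per segment.
        if len(values) != n_segments:
            raise ValueError(
                f"column {name!r} has {len(values)} rows, expected {n_segments}"
            )
        out[name] = values
    # A constant is repeated once per segment so that every row carries it.
    for name, value in (meta_constants or {}).items():
        out[name] = [value] * n_segments
    return out


meta = dict(label=["a", "b", "c"], intensity=[0.1, 0.2, 0.3], foo=["A", "B", "C"])
table = build_meta(
    meta,
    n_segments=3,
    meta_columns=["label", "intensity"],
    meta_constants=dict(instrument="guitar"),
)
print(table)
```

The foo column is dropped because it is not listed in meta_columns, and instrument appears once per segment.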

Aggregating metadata

When using from_segments, handling metadata is simple: you already have one metadata entry per segment. However, when using from_intervals or from_events things get more complicated. Suppose individual intervals carry annotations (for example, performer labels) and you want to derive segment-level metadata from them. You then have to aggregate interval-level metadata to segment-level metadata, for example by concatenating the labels of all intervals to get a label for the segment. You can supply an aggregator that receives the per-interval metadata for each segment and returns a dictionary describing the segment metadata. Every key in that dictionary becomes a column in the final meta DataFrame.

Suppose we need to aggregate the following interval-level metadata to get the metadata of the corresponding segment:

interval_meta = pd.DataFrame(dict(label=["a", "b", "c"], color=["r", "g", "b"]))
interval_meta
  label color
0     a     r
1     b     g
2     c     b

We define a custom aggregator that will create three metadata columns:

  • segment_label: a concatenation of the labels of the intervals in the segment

  • first_label: the label of the first interval in the segment

  • last_color: the color of the last interval

def my_aggregator(interval_meta):
    return {
        "segment_label": "-".join(interval_meta['label']), 
        "first_label": interval_meta.iloc[0]['label'],
        "last_color": interval_meta.iloc[-1]['color']
    }

my_aggregator(interval_meta)
{'segment_label': 'a-b-c', 'first_label': 'a', 'last_color': 'b'}

If you now pass both the metadata and the aggregator to from_intervals, you get segment-level metadata. Note that it respects block boundaries: segments do not cross the np.nan values.

intervals = [1, 2, 3, 4, np.nan, 5, 6, 7]
meta = dict(
    label=['a', 'b', 'c', 'd', np.nan, 'e', 'f', 'g'],
    color=['A', 'B', 'C', 'D', np.nan, 'E', 'F', 'G'],
)
rs = RhythmicSegments.from_intervals(
    intervals, length=3, meta=meta, meta_agg=my_aggregator
)
rs.meta
  segment_label first_label last_color
0         a-b-c           a          C
1         b-c-d           b          D
2         e-f-g           e          G
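
To make the windowing explicit, here is a minimal plain-Python sketch of how segment-level metadata could be derived from interval-level metadata while respecting NaN boundaries. It is not the library's implementation: segment_meta is a hypothetical function, and the aggregator here receives a list of dictionaries rather than a DataFrame.

```python
import math


def segment_meta(intervals, rows, length, aggregator):
    """Hypothetical sketch: aggregate metadata over every window of `length`
    consecutive intervals, never letting a window cross a NaN boundary."""
    result = []
    block = []  # metadata rows of the current uninterrupted block
    # A trailing NaN sentinel flushes the final block.
    padded_intervals = list(intervals) + [math.nan]
    padded_rows = list(rows) + [None]
    for value, row in zip(padded_intervals, padded_rows):
        if math.isnan(value):
            # Boundary reached: emit each full window in the block, then reset.
            for start in range(len(block) - length + 1):
                result.append(aggregator(block[start:start + length]))
            block = []
        else:
            block.append(row)
    return result


def my_aggregator(rows):
    return {
        "segment_label": "-".join(r["label"] for r in rows),
        "first_label": rows[0]["label"],
        "last_color": rows[-1]["color"],
    }


intervals = [1, 2, 3, 4, math.nan, 5, 6, 7]
labels = ["a", "b", "c", "d", None, "e", "f", "g"]
colors = ["A", "B", "C", "D", None, "E", "F", "G"]
rows = [dict(label=l, color=c) for l, c in zip(labels, colors)]

result = segment_meta(intervals, rows, length=3, aggregator=my_aggregator)
for entry in result:
    print(entry)
```

The first block of four intervals yields two overlapping windows and the second block of three yields one, which matches the three rows shown above.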

Default aggregators

You can define your own aggregators as above, but you can also obtain predefined ones with get_aggregator. The "copy" aggregator is the default: it copies all metadata of all intervals and adds a suffix to indicate the corresponding interval (e.g., color_3 is the color of the third interval in a segment).

from rhythmic_segments import get_aggregator

agg_copy = get_aggregator("copy")
agg_copy(interval_meta)
{'label_1': 'a',
 'color_1': 'r',
 'label_2': 'b',
 'color_2': 'g',
 'label_3': 'c',
 'color_3': 'b'}
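
A copy-style aggregator is easy to write by hand, which can help when you need a variation of it. The following sketch reproduces the naming scheme shown above using a list of dictionaries in place of a DataFrame; copy_aggregator is a hypothetical stand-in, not the function returned by get_aggregator.

```python
def copy_aggregator(rows):
    """Sketch of a copy-style aggregator: one output key per (column, position)."""
    out = {}
    for position, row in enumerate(rows, start=1):  # suffixes are 1-based
        for column, value in row.items():
            out[f"{column}_{position}"] = value
    return out


rows = [
    {"label": "a", "color": "r"},
    {"label": "b", "color": "g"},
    {"label": "c", "color": "b"},
]
copied = copy_aggregator(rows)
print(copied)
```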

The "first" aggregator returns the metadata of the first interval in the segment, and "last" returns that of the last interval. "index" returns the metadata of the interval at a given index (0 = first, -1 = last), and "list" combines the metadata of all intervals into a list. All of them accept a columns argument; if set, only those columns are used. The names argument lets you rename the resulting columns.

agg_first = get_aggregator("first", columns=["color"], names=["first_color"])
agg_first(interval_meta)
{'first_color': 'r'}

agg_middle = get_aggregator("index", index=1, columns=["label"], names=["middle_label"])
agg_middle(interval_meta)
{'middle_label': 'b'}
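
The options described above (index, columns, names) can be pictured as a small factory that returns a configured function. The sketch below is an assumption about how such aggregators might be structured, written with lists of dictionaries. make_index_aggregator and make_list_aggregator are hypothetical names, and the exact output format of the library's "list" aggregator may differ.

```python
def make_index_aggregator(index, columns=None, names=None):
    """Hypothetical factory: pick one interval's metadata, optionally renamed."""
    def aggregate(rows):
        row = rows[index]
        selected = columns if columns is not None else list(row)
        renamed = names if names is not None else selected
        if len(renamed) != len(selected):
            raise ValueError("names must match columns one-to-one")
        return {new: row[old] for old, new in zip(selected, renamed)}
    return aggregate


def make_list_aggregator(columns=None):
    """Hypothetical factory: collect each column's values into a list."""
    def aggregate(rows):
        selected = columns if columns is not None else list(rows[0])
        return {col: [row[col] for row in rows] for col in selected}
    return aggregate


rows = [
    {"label": "a", "color": "r"},
    {"label": "b", "color": "g"},
    {"label": "c", "color": "b"},
]

# "first" and "last" are special cases of "index" with index 0 and -1.
first_color = make_index_aggregator(0, columns=["color"], names=["first_color"])(rows)
last_row = make_index_aggregator(-1)(rows)
middle_label = make_index_aggregator(1, columns=["label"], names=["middle_label"])(rows)
as_lists = make_list_aggregator()(rows)

print(first_color, last_row, middle_label, as_lists)
```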

Interval aggregators for from_events

Handling metadata using from_events requires two aggregators: interval_meta_agg to turn event-level metadata into interval-level metadata, and segment_meta_agg to turn that into segment-level metadata. By default, interval_meta_agg will use metadata of the first event as the metadata for the interval.

events = [1, 2, 4, 8, 16, 32]
meta = dict(event_label=["a", "b", "c", "d", "e", "f"])
rs = RhythmicSegments.from_events(events, length=3, meta=meta)
rs.meta
  event_label_1 event_label_2 event_label_3
0             a             b             c
1             b             c             d
2             c             d             e

If you wish to combine the metadata of both events that bound an interval, specify an aggregator. In the example below, the "join" aggregator is used twice: interval labels are formed by concatenating the two event labels, and segment labels are formed by joining the interval labels with a dash.

interval_meta_agg = get_aggregator('join', separator="")
segment_meta_agg = get_aggregator('join', separator="-", names=["segment_label"])
rs = RhythmicSegments.from_events(
    events,
    length=3,
    meta=meta,
    interval_meta_agg=interval_meta_agg,
    segment_meta_agg=segment_meta_agg,
)

rs.meta
  segment_label
0      ab-bc-cd
1      bc-cd-de
2      cd-de-ef
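
The two aggregation stages can be traced by hand. The following plain-Python sketch first turns event metadata into interval metadata (each interval spans two consecutive events) and then turns interval metadata into segment metadata. The helpers join_aggregator and windows are hypothetical and only illustrate the data flow; they are not the library's code.

```python
def join_aggregator(separator, column, name=None):
    """Hypothetical join-style aggregator for a single metadata column."""
    target = name or column

    def aggregate(rows):
        return {target: separator.join(row[column] for row in rows)}

    return aggregate


def windows(items, size):
    """All runs of `size` consecutive items, in order."""
    return [items[i:i + size] for i in range(len(items) - size + 1)]


events = [1, 2, 4, 8, 16, 32]
event_rows = [{"event_label": label} for label in "abcdef"]

# Stage 1: events -> intervals. An interval is the gap between two
# consecutive events, so its metadata is built from a pair of event rows.
intervals = [b - a for a, b in zip(events, events[1:])]
to_interval = join_aggregator("", "event_label")
interval_rows = [to_interval(pair) for pair in windows(event_rows, 2)]

# Stage 2: intervals -> segments of three consecutive intervals.
to_segment = join_aggregator("-", "event_label", name="segment_label")
segment_rows = [to_segment(chunk) for chunk in windows(interval_rows, 3)]

print(intervals)
print(interval_rows)
print(segment_rows)
```

Six events give five intervals, and five intervals give three segments of length 3, which is why the final table has three rows.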