Handling metadata¶

RhythmicSegments really shines when you need to handle metadata. Internally, it holds a pandas DataFrame of segment metadata in which every row corresponds to one segment. The metadata always stays aligned with the segments, including when you take subsets, filter, or query. This notebook demonstrates how to handle metadata with RhythmicSegments.

import numpy as np
import pandas as pd

from rhythmic_segments import RhythmicSegments

You can pass metadata using the meta keyword; it will accept anything that can be turned into a DataFrame.

segments = [[2, 8], [.3, .6], [1, 1]]
meta = dict(label=["a", "b", "c"])
rs = RhythmicSegments.from_segments(segments, meta=meta)
rs.meta

	label
0	a
1	b
2	c

You can choose which columns in the metadata you want to copy using meta_columns, or pass metadata that’s constant for all segments using meta_constants:

meta = dict(label=["a", "b", "c"], intensity=[.1, .2, .3], foo=['A', 'B', 'C'])
rs = RhythmicSegments.from_segments(
    segments, 
    meta=meta, 
    meta_columns=['label', 'intensity'],
    meta_constants=dict(instrument="guitar")
)
rs.meta

	label	intensity	instrument
0	a	0.1	guitar
1	b	0.2	guitar
2	c	0.3	guitar
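To make the effect of meta_columns and meta_constants concrete, here is a minimal sketch in plain Python. It is not the library's implementation: build_meta and its parameters are hypothetical names, and the table is represented as a dict of lists rather than a DataFrame.

```python
def build_meta(meta, n_segments, columns=None, constants=None):
    """Select columns from `meta` and broadcast constants to every segment.

    Hypothetical helper for illustration only.
    """
    selected = list(meta) if columns is None else list(columns)
    table = {}
    for name in selected:
        values = list(meta[name])
        if len(values) != n_segments:
            raise ValueError(f"column {name!r} needs {n_segments} entries")
        table[name] = values
    # A constant is repeated so that every segment gets the same value.
    for name, value in (constants or {}).items():
        table[name] = [value] * n_segments
    return table


meta = dict(label=["a", "b", "c"], intensity=[0.1, 0.2, 0.3], foo=["A", "B", "C"])
table = build_meta(
    meta,
    n_segments=3,
    columns=["label", "intensity"],
    constants={"instrument": "guitar"},
)
print(table)
```

The foo column is dropped because it is not listed, and instrument is filled with the same value in every row, mirroring the table above.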

Aggregating metadata¶

When using from_segments, handling metadata is simple: you already have one metadata entry per segment. However, when using from_intervals or from_events things get more complicated. Suppose individual intervals carry annotations (for example, performer labels) and you want to derive segment-level metadata from them. You then have to aggregate interval-level metadata to segment-level metadata, for example by concatenating the labels of all intervals to get a label for the segment. You can supply an aggregator that receives the per-interval metadata for each segment and returns a dictionary describing the segment metadata. Every key in that dictionary becomes a column in the final meta DataFrame.

Suppose we need to aggregate the following interval-level metadata to get the metadata of the corresponding segment:

interval_meta = pd.DataFrame(dict(label=["a", "b", "c"], color=["r", "g", "b"]))
interval_meta

	label	color
0	a	r
1	b	g
2	c	b

We define a custom aggregator that will create three metadata columns:

segment_label: a concatenation of the labels of the intervals in the segment
first_label: the label of the first interval in the segment
last_color: the color of the last interval

def my_aggregator(interval_meta):
    return {
        "segment_label": "-".join(interval_meta['label']), 
        "first_label": interval_meta.iloc[0]['label'],
        "last_color": interval_meta.iloc[-1]['color']
    }

my_aggregator(interval_meta)

{'segment_label': 'a-b-c', 'first_label': 'a', 'last_color': 'b'}

If you now pass both the metadata and the aggregator to from_intervals, you get segment-level metadata. Note that it respects block boundaries: segments do not cross the np.nan values.

intervals = [1, 2, 3, 4, np.nan, 5, 6, 7]
meta = dict(
    label=['a', 'b', 'c', 'd', np.nan, 'e', 'f', 'g'],
    color=['A', 'B', 'C', 'D', np.nan, 'E', 'F', 'G'],
)
rs = RhythmicSegments.from_intervals(intervals, length=3, meta=meta, meta_agg=my_aggregator)
rs.meta

	segment_label	first_label	last_color
0	a-b-c	a	C
1	b-c-d	b	D
2	e-f-g	e	G
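The windowing behind this result can be sketched in plain Python. This is an illustration under simplifying assumptions, not the library's code: split_into_blocks and windowed_meta are hypothetical helpers, and the metadata is reduced to a single list of labels.

```python
import math


def split_into_blocks(values):
    """Split a sequence at NaN entries into lists of (index, value) pairs."""
    blocks, current = [], []
    for i, v in enumerate(values):
        if isinstance(v, float) and math.isnan(v):
            # A NaN closes the current block and is itself discarded.
            if current:
                blocks.append(current)
            current = []
        else:
            current.append((i, v))
    if current:
        blocks.append(current)
    return blocks


def windowed_meta(intervals, labels, length, aggregator):
    """Apply `aggregator` to the labels of each window of `length` intervals."""
    rows = []
    for block in split_into_blocks(intervals):
        # Windows are taken inside one block, so they never span a NaN.
        for start in range(len(block) - length + 1):
            window = block[start:start + length]
            rows.append(aggregator([labels[i] for i, _ in window]))
    return rows


intervals = [1, 2, 3, 4, float("nan"), 5, 6, 7]
labels = ["a", "b", "c", "d", None, "e", "f", "g"]
rows = windowed_meta(intervals, labels, length=3, aggregator="-".join)
print(rows)  # ['a-b-c', 'b-c-d', 'e-f-g']
```

No window such as c-d-e appears, because the NaN between d and e ends the first block.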

Default aggregators¶

You can define your own aggregators as above, or obtain a predefined one with get_aggregator. The "copy" aggregator is the default: it copies all metadata of every interval in the segment and adds a suffix to indicate the corresponding interval (e.g., color_3 is the color of the third interval in a segment).

from rhythmic_segments import get_aggregator

agg_copy = get_aggregator("copy")
agg_copy(interval_meta)

{'label_1': 'a',
 'color_1': 'r',
 'label_2': 'b',
 'color_2': 'g',
 'label_3': 'c',
 'color_3': 'b'}
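The naming scheme in this output can be reproduced with a few lines of plain Python. The sketch below is an assumption about the behaviour shown above, not the library's source; copy_aggregator is a hypothetical name and the per-interval metadata is given as a list of dicts instead of a DataFrame.

```python
def copy_aggregator(interval_rows):
    """Flatten per-interval metadata into one dict with 1-based suffixes."""
    result = {}
    for position, row in enumerate(interval_rows, start=1):
        for column, value in row.items():
            # color of the third interval becomes "color_3", and so on.
            result[f"{column}_{position}"] = value
    return result


interval_rows = [
    {"label": "a", "color": "r"},
    {"label": "b", "color": "g"},
    {"label": "c", "color": "b"},
]
flat = copy_aggregator(interval_rows)
print(flat)
```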

The "first" aggregator returns the metadata of the first interval in the segment and the "last" aggregator that of the last interval. The "index" aggregator returns the metadata of the interval at a given index (0 = first, -1 = last), and "list" collects the metadata of all intervals in a list. All of them accept a columns argument; if it is set, only those columns are used. The names argument allows you to rename the resulting columns.

agg_first = get_aggregator("first", columns=["color"], names=['first_color'])
agg_first(interval_meta) 

{'first_color': 'r'}

agg_middle = get_aggregator("index", index=1, columns=["label"], names=['middle_label'])
agg_middle(interval_meta) 

{'middle_label': 'b'}
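An aggregator with these options is essentially a function that returns another function. The following sketch shows one way such a factory could look; make_index_aggregator is a hypothetical name, the rows are plain dicts, and the real get_aggregator may work differently.

```python
def make_index_aggregator(index, columns=None, names=None):
    """Return an aggregator that picks the metadata of one interval.

    Hypothetical factory for illustration only.
    """
    def aggregate(interval_rows):
        row = interval_rows[index]
        selected = list(row) if columns is None else list(columns)
        out_names = selected if names is None else list(names)
        if len(out_names) != len(selected):
            raise ValueError("names must have one entry per selected column")
        # Map each selected column to its (possibly renamed) output key.
        return {name: row[col] for name, col in zip(out_names, selected)}

    return aggregate


interval_rows = [
    {"label": "a", "color": "r"},
    {"label": "b", "color": "g"},
    {"label": "c", "color": "b"},
]
agg_last = make_index_aggregator(-1, columns=["color"], names=["last_color"])
print(agg_last(interval_rows))  # {'last_color': 'b'}
```

In this sketch, "first" and "last" are simply the special cases index=0 and index=-1.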

Interval aggregators for `from_events`¶

Handling metadata using from_events requires two aggregators: interval_meta_agg to turn event-level metadata into interval-level metadata, and segment_meta_agg to turn that into segment-level metadata. By default, interval_meta_agg will use metadata of the first event as the metadata for the interval.

events = [1, 2, 4, 8, 16, 32]
meta = dict(event_label=["a", "b", "c", "d", "e", "f"])
rs = RhythmicSegments.from_events(events, length=3, meta=meta)
rs.meta

	event_label_1	event_label_2	event_label_3
0	a	b	c
1	b	c	d
2	c	d	e

If you want the interval metadata to combine the metadata of both events that bound an interval, specify an interval_meta_agg. In the example below, the "join" aggregator builds each interval label by concatenating the labels of its two events, and each segment label by joining the interval labels with a dash:

interval_meta_agg = get_aggregator('join', separator="")
segment_meta_agg = get_aggregator('join', separator="-", names=["segment_label"])
rs = RhythmicSegments.from_events(
    events,
    length=3,
    meta=meta,
    interval_meta_agg=interval_meta_agg,
    segment_meta_agg=segment_meta_agg,
)

rs.meta

	segment_label
0	ab-bc-cd
1	bc-cd-de
2	cd-de-ef
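The two aggregation stages can be traced by hand in plain Python. This is a sketch of the idea only: pairwise_join and sliding_join are hypothetical helpers working on a list of labels, and they do not handle block boundaries.

```python
def pairwise_join(labels, separator=""):
    """Event level to interval level: join the labels of consecutive events."""
    return [separator.join(pair) for pair in zip(labels, labels[1:])]


def sliding_join(labels, length, separator="-"):
    """Interval level to segment level: join each window of `length` labels."""
    return [
        separator.join(labels[i:i + length])
        for i in range(len(labels) - length + 1)
    ]


event_labels = ["a", "b", "c", "d", "e", "f"]
interval_labels = pairwise_join(event_labels)
segment_labels = sliding_join(interval_labels, length=3)
print(interval_labels)  # ['ab', 'bc', 'cd', 'de', 'ef']
print(segment_labels)   # ['ab-bc-cd', 'bc-cd-de', 'cd-de-ef']
```

Six events give five intervals, and five intervals give three segments of length 3, which matches the three rows above.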
