Monday, September 2nd, 2024
Xarray now supports grouping by multiple variables (docs). 🎉 😱 🤯 🥳. Try it out!
Install xarray>=2024.09.0 and optionally flox for better performance with reductions.
Simple grouping by multiple categorical variables is easy:
```python
import numpy as np
import xarray as xr
from xarray.groupers import UniqueGrouper

da = xr.DataArray(
    np.array([1, 2, 3, 0, 2, np.nan]),
    dims="d",
    coords=dict(
        labels1=("d", np.array(["a", "b", "c", "c", "b", "a"])),
        labels2=("d", np.array(["x", "y", "z", "z", "y", "x"])),
    ),
)

gb = da.groupby(["labels1", "labels2"])
gb
```
```
<DataArrayGroupBy, grouped over 2 grouper(s), 9 groups in total:
    'labels1': 3 groups with labels 'a', 'b', 'c'
    'labels2': 3 groups with labels 'x', 'y', 'z'>
```
Reductions work as usual:
```python
gb.mean()
```
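The repr above reports 9 groups: every combination of the 3 labels1 values with the 3 labels2 values. The plain-NumPy sketch below mimics a mean over such a combined grouping by factorizing each label array and folding both codes into one group index. It is an illustration of the idea, not xarray's (or flox's) implementation, and it assumes a NaN-skipping mean, which is xarray's default for float data. In this sketch, combinations that never occur come out as NaN.

```python
import numpy as np

values = np.array([1, 2, 3, 0, 2, np.nan])
labels1 = np.array(["a", "b", "c", "c", "b", "a"])
labels2 = np.array(["x", "y", "z", "z", "y", "x"])

# Factorize each label array: sorted unique labels plus an integer code per element.
uniq1, codes1 = np.unique(labels1, return_inverse=True)
uniq2, codes2 = np.unique(labels2, return_inverse=True)

# One combined group index over the full cartesian product (3 x 3 = 9 groups).
shape = (uniq1.size, uniq2.size)
ngroups = shape[0] * shape[1]
group = np.ravel_multi_index((codes1, codes2), shape)

# NaN-skipping mean per group: sum and count only the non-NaN values.
valid = ~np.isnan(values)
sums = np.bincount(group[valid], weights=values[valid], minlength=ngroups)
counts = np.bincount(group[valid], minlength=ngroups)

# Empty groups give 0 / 0, i.e. NaN; silence the resulting warning.
with np.errstate(invalid="ignore", divide="ignore"):
    means = (sums / counts).reshape(shape)
```

Here means[i, j] holds the mean for (uniq1[i], uniq2[j]); only the (a, x), (b, y) and (c, z) cells are populated, because those are the only pairs present in the data.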
So does map:
```python
gb.map(lambda x: x[0])
```
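Here map applies the function to each group separately, so lambda x: x[0] picks the first element of every group. The split and apply steps can be sketched in plain Python; split_apply is a hypothetical helper, the sketch only visits label combinations that actually occur, and it does not reproduce how xarray combines the per-group results back into a DataArray.

```python
import math
from collections import defaultdict

values = [1.0, 2.0, 3.0, 0.0, 2.0, math.nan]
labels1 = ["a", "b", "c", "c", "b", "a"]
labels2 = ["x", "y", "z", "z", "y", "x"]


def split_apply(values, keys1, keys2, func):
    """Hypothetical helper: split values by (key1, key2) pairs, apply func to each group."""
    groups = defaultdict(list)
    for value, k1, k2 in zip(values, keys1, keys2):
        # Members keep their original order within each group.
        groups[(k1, k2)].append(value)
    return {key: func(members) for key, members in groups.items()}


firsts = split_apply(values, labels1, labels2, lambda x: x[0])
```

For this data, firsts maps ("a", "x") to 1.0, ("b", "y") to 2.0 and ("c", "z") to 3.0.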
Grouping by multiple virtual variables like "time.month" is also supported:
```python
import xarray as xr

ds = xr.tutorial.open_dataset("air_temperature")
ds.groupby(["time.year", "time.month"]).mean()
```
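A name like "time.month" does not refer to a stored variable; xarray derives the labels from the datetime coordinate. The NumPy sketch below shows what the year and month labels amount to for a few hypothetical timestamps (not taken from the tutorial dataset). It relies on datetime64 unit casting truncating to the start of the year or month, and it is not how xarray itself computes these labels.

```python
import numpy as np

# Hypothetical timestamps standing in for a "time" coordinate.
time = np.array(
    ["2013-01-01T00", "2013-02-15T06", "2014-01-31T18"], dtype="datetime64[h]"
)

# Casting to year/month resolution truncates; the integer value counts
# years (or months) since 1970-01.
year = time.astype("datetime64[Y]").astype(int) + 1970
month = time.astype("datetime64[M]").astype(int) % 12 + 1

# Each element's group key when grouping by ["time.year", "time.month"].
keys = list(zip(year.tolist(), month.tolist()))
```

The resulting keys are (2013, 1), (2013, 2) and (2014, 1): each timestamp lands in one (year, month) group.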
The syntax da.groupby(["labels1", "labels2"]) used above is a shortcut for passing a UniqueGrouper object for each variable:
```python
da.groupby(labels1=UniqueGrouper(), labels2=UniqueGrouper())
```
Grouper objects allow you to express more complicated GroupBy problems.
For example, you can combine different grouper types: categorical grouping with UniqueGrouper, binning with BinGrouper, and resampling with TimeResampler.
```python
import numpy as np
import xarray as xr
from xarray.groupers import BinGrouper, UniqueGrouper

ds = xr.Dataset(
    {"foo": (("x", "y"), np.arange(12).reshape((4, 3)))},
    coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
)
gb = ds.groupby(x=BinGrouper(bins=[5, 15, 25]), letters=UniqueGrouper())
gb
```
```
<DatasetGroupBy, grouped over 2 grouper(s), 4 groups in total:
    'x_bins': 2 groups with labels (5,, 15], (15,, 25]
    'letters': 2 groups with labels 'a', 'b'>
```
Now reduce as usual:
```python
gb.mean()
```
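To see where the 4 groups in the BinGrouper example come from, here is a plain-NumPy sketch of the same mean. It assumes right-closed intervals, matching the (5, 15] and (15, 25] labels in the repr, and it drops x values that fall outside every bin (30 and 40 here). It illustrates the grouping logic only and is not xarray's implementation.

```python
import numpy as np

foo = np.arange(12).reshape((4, 3))  # dims ("x", "y")
x = np.array([10, 20, 30, 40])
letters = np.array(list("abba"))
bins = np.array([5, 15, 25])

# digitize(right=True) returns i with bins[i-1] < x <= bins[i];
# subtracting 1 gives a 0-based index into the intervals (5, 15], (15, 25].
bin_idx = np.digitize(x, bins, right=True) - 1
in_range = (bin_idx >= 0) & (bin_idx < len(bins) - 1)  # drops x = 30 and 40

# Factorize letters, then combine both codes into one index over 2 x 2 = 4 groups.
uniq_letters, letter_codes = np.unique(letters, return_inverse=True)
shape = (len(bins) - 1, uniq_letters.size)
ngroups = shape[0] * shape[1]
group = np.ravel_multi_index((bin_idx[in_range], letter_codes[in_range]), shape)

# Sum the rows of foo per group (unbuffered, so repeated groups accumulate),
# then divide by the group sizes; empty groups become NaN.
sums = np.zeros((ngroups, foo.shape[1]))
np.add.at(sums, group, foo[in_range])
counts = np.bincount(group, minlength=ngroups)[:, None]
with np.errstate(invalid="ignore", divide="ignore"):
    means = (sums / counts).reshape(shape + (foo.shape[1],))
```

In means, the first axis is the x bin, the second is the letter and the third is y. Only ((5, 15], "a") and ((15, 25], "b") contain data, since x = 10 has letter "a" and x = 20 has letter "b".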