Xarray now supports grouping by multiple variables (docs). 🎉 😱 🤯 🥳. Try it out!
Install `xarray>=2024.09.0`, and optionally `flox` for better performance with reductions.
Simple grouping by multiple categorical variables is easy:
```python
import numpy as np
import xarray as xr
from xarray.groupers import UniqueGrouper

da = xr.DataArray(
    np.array([1, 2, 3, 0, 2, np.nan]),
    dims="d",
    coords=dict(
        labels1=("d", np.array(["a", "b", "c", "c", "b", "a"])),
        labels2=("d", np.array(["x", "y", "z", "z", "y", "x"])),
    ),
)

gb = da.groupby(["labels1", "labels2"])
gb
```
```
<DataArrayGroupBy, grouped over 2 grouper(s), 9 groups in total:
    'labels1': 3 groups with labels 'a', 'b', 'c'
    'labels2': 3 groups with labels 'x', 'y', 'z'>
```
Reductions work as usual:
```python
gb.mean()
```
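Conceptually, grouping by two variables amounts to factorizing each label array into integer codes and combining them into a single code over the Cartesian product of the unique labels. That is why the repr above reports 3 × 3 = 9 groups even though only three (labels1, labels2) pairs actually occur. The NumPy sketch below illustrates that idea on the same data; it is not xarray's or flox's actual implementation. It computes a NaN-skipping mean per combination and leaves unobserved combinations as NaN, in a 3 × 3 layout with one axis per grouper.

```python
import numpy as np

# Same values and labels as the DataArray example above
values = np.array([1, 2, 3, 0, 2, np.nan])
labels1 = np.array(["a", "b", "c", "c", "b", "a"])
labels2 = np.array(["x", "y", "z", "z", "y", "x"])

# Factorize each label array into sorted unique labels plus integer codes
uniq1, codes1 = np.unique(labels1, return_inverse=True)
uniq2, codes2 = np.unique(labels2, return_inverse=True)

# Combine the two codes into one flat code over the Cartesian product
n1, n2 = len(uniq1), len(uniq2)
flat = codes1 * n2 + codes2

# NaN-skipping sums and counts for every combined group (observed or not)
valid = ~np.isnan(values)
sums = np.bincount(flat[valid], weights=values[valid], minlength=n1 * n2)
counts = np.bincount(flat[valid], minlength=n1 * n2)

# Empty combinations get NaN; reshape to one axis per grouper
with np.errstate(invalid="ignore", divide="ignore"):
    means = np.where(counts > 0, sums / counts, np.nan).reshape(n1, n2)

print(means)
```

Only the diagonal is filled here because each label in `labels1` always pairs with the same label in `labels2`.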
So does `map`:
```python
gb.map(lambda x: x[0])
```
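`map` follows the split-apply pattern: split the data into one piece per observed label combination, then run the same function on each piece. The plain-Python sketch below shows only the split and apply steps for `lambda x: x[0]` on the same data; xarray additionally combines the per-group results back into an array, which this sketch does not attempt to reproduce. The helper name `first` is hypothetical.

```python
from collections import defaultdict

import numpy as np

values = np.array([1, 2, 3, 0, 2, np.nan])
labels1 = ["a", "b", "c", "c", "b", "a"]
labels2 = ["x", "y", "z", "z", "y", "x"]

# Split: collect the positions of each observed (labels1, labels2) pair, in order
groups = defaultdict(list)
for i, key in enumerate(zip(labels1, labels2)):
    groups[key].append(i)

# Apply: the same function on every group (hypothetical helper: first element)
def first(x):
    return x[0]

applied = {key: first(values[idx]) for key, idx in groups.items()}
print(applied)
```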
Grouping by multiple *virtual* variables like `"time.month"` is also supported:
```python
import xarray as xr

ds = xr.tutorial.open_dataset("air_temperature")
ds.groupby(["time.year", "time.month"]).mean()
```
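Virtual variables such as `"time.year"` and `"time.month"` are derived from the datetime components of the `time` coordinate rather than stored in the dataset. For intuition, the sketch below shows the same idea in plain pandas (a different library than xarray) on a hypothetical daily series covering 2013 and 2014, not the tutorial dataset: year and month are computed from the datetime index, and grouping by both yields one group per (year, month) pair.

```python
import numpy as np
import pandas as pd

# Hypothetical daily series spanning two full years (not the tutorial data)
time = pd.date_range("2013-01-01", "2014-12-31", freq="D")
series = pd.Series(np.arange(len(time), dtype=float), index=time)

# Derive year and month from the datetime index, then group by both
monthly = series.groupby([time.year, time.month]).mean()

print(monthly.head())
```

Two years of data give 2 × 12 = 24 (year, month) groups.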
The syntax `da.groupby(["labels1", "labels2"])` above is a shortcut for using `Grouper` objects:
```python
da.groupby(labels1=UniqueGrouper(), labels2=UniqueGrouper())
```
`Grouper` objects let you express more complicated GroupBy problems. For example, you can combine different grouper types: categorical grouping with `UniqueGrouper`, binning with `BinGrouper`, and resampling with `TimeResampler`.
```python
import numpy as np
import xarray as xr
from xarray.groupers import BinGrouper, UniqueGrouper

ds = xr.Dataset(
    {"foo": (("x", "y"), np.arange(12).reshape((4, 3)))},
    coords={"x": [10, 20, 30, 40], "letters": ("x", list("abba"))},
)
gb = ds.groupby(x=BinGrouper(bins=[5, 15, 25]), letters=UniqueGrouper())
gb
```
```
<DatasetGroupBy, grouped over 2 grouper(s), 4 groups in total:
    'x_bins': 2 groups with labels (5, 15], (15, 25]
    'letters': 2 groups with labels 'a', 'b'>
```
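The `x_bins` labels are right-closed intervals. The sketch below reproduces the binning with `pandas.cut`, assuming `BinGrouper` follows the same default convention (which the `(5, 15]` labels suggest). It shows that `x = 10` and `x = 20` land in the two bins while `x = 30` and `x = 40` fall outside both, so only 2 bins × 2 letters = 4 groups remain.

```python
import pandas as pd

x = [10, 20, 30, 40]
bins = [5, 15, 25]

# Right-closed intervals (5, 15] and (15, 25] by default
binned = pd.cut(x, bins=bins)

print(binned.categories)  # the two intervals
print(binned.codes)       # -1 marks values that fall in no bin
```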
Now reduce as usual:

```python
gb.mean()
```
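As a hand check, the NumPy sketch below computes what each (bin, letter) group mean should contain for `foo`, averaging over the grouped `x` dimension and keeping `y`. The `(bin, letter, y)` array layout is a choice made for this sketch and is not necessarily the dimension order xarray returns; empty combinations are filled with NaN here as an assumption.

```python
import numpy as np

foo = np.arange(12).reshape((4, 3))  # dims ("x", "y"), as in the Dataset above
x = np.array([10, 20, 30, 40])
letters = np.array(list("abba"))

bin_edges = [(5, 15), (15, 25)]  # right-closed intervals: lo < x <= hi
letter_labels = ["a", "b"]

# expected[i, j, :] = mean over x of the rows that fall in bin i with letter j
expected = np.full((len(bin_edges), len(letter_labels), foo.shape[1]), np.nan)
for i, (lo, hi) in enumerate(bin_edges):
    for j, letter in enumerate(letter_labels):
        mask = (x > lo) & (x <= hi) & (letters == letter)
        if mask.any():
            expected[i, j] = foo[mask].mean(axis=0)

print(expected)
```

Only two of the four combinations are populated: `x = 10` (letter `a`) in the first bin and `x = 20` (letter `b`) in the second.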