1 year ago
#379784
lusev
xarray.Dataset assignement using .loc fails for DataArray containing non-standard calendar cftime data
I am working with several models from CMIP6 (data available here: https://esgf-node.llnl.gov/projects/cmip6/ ) which are stored under netcdf files on an archive and am manipulating them using xarray. I was first trying to open them with xarray.open_mfdataset()
using the following code:
datestart = '1980-10-01'
dateend = '2010-03-31'
dat = xr.open_mfdataset(filepath, coords="minimal", join="override", decode_times=True, use_cftime=True).sel(time=slice(datestart, dateend))
Which then led to some issues as some of them are using non-standard calendars (i.e. no leap years, 360 days,...). For instance, trying to read the sfcWindmax
variable from the r1i1p1f2
member from the UKESM1-0-LL
(a model with a 360Day-years calendar) raises the following ValueError
:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-54-8085c4e5f17d> in <module>
----> 1 dat = xr.open_mfdataset(filepath, coords="minimal", join="override", decode_times=True, use_cftime=True).sel(time=slice(datestart, dateend))
~/.local/lib/python3.9/site-packages/xarray/core/dataset.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
2499 """
2500 indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "sel")
-> 2501 pos_indexers, new_indexes = remap_label_indexers(
2502 self, indexers=indexers, method=method, tolerance=tolerance
2503 )
~/.local/lib/python3.9/site-packages/xarray/core/coordinates.py in remap_label_indexers(obj, indexers, method, tolerance, **indexers_kwargs)
419 }
420
--> 421 pos_indexers, new_indexes = indexing.remap_label_indexers(
422 obj, v_indexers, method=method, tolerance=tolerance
423 )
~/.local/lib/python3.9/site-packages/xarray/core/indexing.py in remap_label_indexers(data_obj, indexers, method, tolerance)
119 for dim, index in indexes.items():
120 labels = grouped_indexers[dim]
--> 121 idxr, new_idx = index.query(labels, method=method, tolerance=tolerance)
122 pos_indexers[dim] = idxr
123 if new_idx is not None:
~/.local/lib/python3.9/site-packages/xarray/core/indexes.py in query(self, labels, method, tolerance)
206
207 if isinstance(label, slice):
--> 208 indexer = _query_slice(self.index, label, coord_name, method, tolerance)
209 elif is_dict_like(label):
210 raise ValueError(
~/.local/lib/python3.9/site-packages/xarray/core/indexes.py in _query_slice(index, label, coord_name, method, tolerance)
95 "cannot use ``method`` argument if any indexers are slice objects"
96 )
---> 97 indexer = index.slice_indexer(
98 _sanitize_slice_element(label.start),
99 _sanitize_slice_element(label.stop),
/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/pandas/core/indexes/base.py in slice_indexer(self, start, end, step, kind)
5275 slice(1, 3, None)
5276 """
-> 5277 start_slice, end_slice = self.slice_locs(start, end, step=step, kind=kind)
5278
5279 # return a slice
/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/pandas/core/indexes/base.py in slice_locs(self, start, end, step, kind)
5480 end_slice = None
5481 if end is not None:
-> 5482 end_slice = self.get_slice_bound(end, "right", kind)
5483 if end_slice is None:
5484 end_slice = len(self)
/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_slice_bound(self, label, side, kind)
5384 # For datetime indices label may be a string that has to be converted
5385 # to datetime boundary according to its resolution.
-> 5386 label = self._maybe_cast_slice_bound(label, side, kind)
5387
5388 # we need to look up the label
~/.local/lib/python3.9/site-packages/xarray/coding/cftimeindex.py in _maybe_cast_slice_bound(self, label, side, kind)
478 return label
479
--> 480 parsed, resolution = _parse_iso8601_with_reso(self.date_type, label)
481 start, end = _parsed_string_to_bounds(self.date_type, resolution, parsed)
482 if self.is_monotonic_decreasing and len(self) > 1:
~/.local/lib/python3.9/site-packages/xarray/coding/cftimeindex.py in _parse_iso8601_with_reso(date_type, timestr)
135 replace[attr] = int(value)
136 resolution = attr
--> 137 return default.replace(**replace), resolution
138
139
cftime/_cftime.pyx in cftime._cftime.datetime.replace()
cftime/_cftime.pyx in cftime._cftime.Datetime360Day.__init__()
cftime/_cftime.pyx in cftime._cftime.assert_valid_date()
ValueError: invalid day number provided in cftime.Datetime360Day(2010, 3, 31, 0, 0, 0, 0)
I then tried to first standardize the calendar using xarray.Dataset.convert_calendar()
and then interpolate it using xaray.Dataset.interp
with a standard xarray.cftime_range
as the target index (following the xarray documentation: https://docs.xarray.dev/en/stable/user-guide/weather-climate.html ):
def read_cmip6(filepath,datestart,dateend):
#open netcdf dataset
dat = xr.open_mfdataset(filepath, coords="minimal", join="override", decode_times = True, use_cftime=True)
#convert calendar to standard, setting missing values as NaNs
dat = dat.convert_calendar("standard", use_cftime=True, align_on="date",missing=np.nan)
#interpolate the dataset using cftim_range
dateidx = xr.cftime_range(datestart,dateend,freq='D',calendar="standard")
dat = dat.interp(time=dateidx,method="nearest")
#take slice
dat = dat.sel(time=slice(datestart, dateend))
return dat
datUKESM1 = read_cmip6(filepath,datestart,dateend)
This did not raise any error and yielded the following dataset:
<xarray.Dataset>
Dimensions: (time: 10774, lat: 144, bnds: 2, lon: 192)
Coordinates:
* lat (lat) float64 -89.38 -88.12 -86.88 -85.62 ... 86.88 88.12 89.38
* lon (lon) float64 0.9375 2.812 4.688 6.562 ... 355.3 357.2 359.1
height float64 10.0
* time (time) object 1980-10-01 00:00:00 ... 2010-03-31 00:00:00
Dimensions without coordinates: bnds
Data variables:
lat_bnds (time, lat, bnds) float64 dask.array<chunksize=(10774, 144, 2), meta=np.ndarray>
lon_bnds (time, lon, bnds) float64 dask.array<chunksize=(10774, 192, 2), meta=np.ndarray>
sfcWindmax (time, lat, lon) float32 dask.array<chunksize=(10774, 144, 192), meta=np.ndarray>
time_bnds (time, bnds) object dask.array<chunksize=(10774, 2), meta=np.ndarray>
Attributes: (12/46)
Conventions: CF-1.7 CMIP-6.2
activity_id: CMIP
branch_method: standard
branch_time_in_child: 0.0
branch_time_in_parent: 144000.0
creation_date: 2019-06-24T15:04:56Z
... ...
title: UKESM1-0-LL output prepared for CMIP6
variable_id: sfcWindmax
variant_label: r1i1p1f2
license: CMIP6 model data produced by the Met Office Hadle...
cmor_version: 3.4.0
tracking_id: hdl:21.14100/3d7355e1-c91e-46fd-8dca-2b08da8c57d6
with the following time index:
<xarray.DataArray 'time' (time: 10774)>
array([cftime.DatetimeGregorian(1980, 10, 1, 0, 0, 0, 0),
cftime.DatetimeGregorian(1980, 10, 2, 0, 0, 0, 0),
cftime.DatetimeGregorian(1980, 10, 3, 0, 0, 0, 0), ...,
cftime.DatetimeGregorian(2010, 3, 29, 0, 0, 0, 0),
cftime.DatetimeGregorian(2010, 3, 30, 0, 0, 0, 0),
cftime.DatetimeGregorian(2010, 3, 31, 0, 0, 0, 0)], dtype=object)
Coordinates:
height float64 10.0
* time (time) object 1980-10-01 00:00:00 ... 2010-03-31 00:00:00
Attributes:
bounds: time_bnds
axis: T
long_name: time
standard_name: time
I then carried on, re-interpolated to a regular grid and tried to save it to a new empty dataset, in order to gather more than 1 member into a single dataset:
#constants
var = 'sfcWindmax' #variable
nmems = 3 # number of members
memout = np.arange(0,nmems) # member range
daysid = xr.cftime_range(datestart,dateend,freq='D',calendar='standard') #days index
ndays = len(daysid) #nb of days
#read netcdf
u = read_cmip6(filepath, datestart, dateend)
#u = xr.open_mfdataset(filepath, coords="minimal", join="override", decode_times = True, use_cftime=True).sel(time=slice(datestart, dateend))
#select variable
utemp = u[var]
#get lat and lon
latout = utemp.lat
lonout = utemp.lon
#initiate empty dataframe
uzmem = xr.DataArray(np.zeros([ ndays , latout.size, lonout.size, nmems]), coords=[daysid, latout, lonout, memout],
dims=['day','lat','lon','member'], name="historical")
uzmem.loc[dict(member=0)]=utemp
But the assignment using the .loc
keyword led to the following error:
---------------------------------------------------------------------------
UFuncTypeError Traceback (most recent call last)
<ipython-input-73-99f7ff0346e5> in <module>
21
22 #interpolate to a regular gri
---> 23 uzmem.loc[dict(member=0)]=utemp
~/.local/lib/python3.9/site-packages/xarray/core/dataarray.py in __setitem__(self, key, value)
206
207 pos_indexers, _ = remap_label_indexers(self.data_array, key)
--> 208 self.data_array[pos_indexers] = value
209
210
~/.local/lib/python3.9/site-packages/xarray/core/dataarray.py in __setitem__(self, key, value)
756 for k, v in self._item_key_to_dict(key).items()
757 }
--> 758 self.variable[key] = value
759
760 def __delitem__(self, key: Any) -> None:
~/.local/lib/python3.9/site-packages/xarray/core/variable.py in __setitem__(self, key, value)
854
855 indexable = as_indexable(self._data)
--> 856 indexable[index_tuple] = value
857
858 @property
~/.local/lib/python3.9/site-packages/xarray/core/indexing.py in __setitem__(self, key, value)
1164 array, key = self._indexing_array_and_key(key)
1165 try:
-> 1166 array[key] = value
1167 except ValueError:
1168 # More informative exception if read-only view
/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/array/core.py in __array__(self, dtype, **kwargs)
1474
1475 def __array__(self, dtype=None, **kwargs):
-> 1476 x = self.compute()
1477 if dtype and x.dtype != dtype:
1478 x = x.astype(dtype)
/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/base.py in compute(self, **kwargs)
282 dask.base.compute
283 """
--> 284 (result,) = compute(self, traverse=False, **kwargs)
285 return result
286
/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/base.py in compute(*args, **kwargs)
564 postcomputes.append(x.__dask_postcompute__())
565
--> 566 results = schedule(dsk, keys, **kwargs)
567 return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
568
/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, pool, **kwargs)
77 pool = MultiprocessingPoolExecutor(pool)
78
---> 79 results = get_async(
80 pool.submit,
81 pool._max_workers,
/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/local.py in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
512 _execute_task(task, data) # Re-execute locally
513 else:
--> 514 raise_exception(exc, tb)
515 res, worker_id = loads(res_info)
516 state["cache"][key] = res
/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/local.py in reraise(exc, tb)
323 if exc.__traceback__ is not tb:
324 raise exc.with_traceback(tb)
--> 325 raise exc
326
327
/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
221 try:
222 task, data = loads(task_info)
--> 223 result = _execute_task(task, data)
224 id = get_id()
225 result = dumps((result, id))
/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/core.py in _execute_task(arg, cache, dsk)
119 # temporaries by their reference count and can execute certain
120 # operations in-place.
--> 121 return func(*(_execute_task(a, cache) for a in args))
122 elif not ishashable(arg):
123 return arg
/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/core.py in <genexpr>(.0)
119 # temporaries by their reference count and can execute certain
120 # operations in-place.
--> 121 return func(*(_execute_task(a, cache) for a in args))
122 elif not ishashable(arg):
123 return arg
/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/core.py in _execute_task(arg, cache, dsk)
119 # temporaries by their reference count and can execute certain
120 # operations in-place.
--> 121 return func(*(_execute_task(a, cache) for a in args))
122 elif not ishashable(arg):
123 return arg
/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/optimization.py in __call__(self, *args)
961 if not len(args) == len(self.inkeys):
962 raise ValueError("Expected %d args, got %d" % (len(self.inkeys), len(args)))
--> 963 return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
964
965 def __reduce__(self):
/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/core.py in get(dsk, out, cache)
149 for key in toposort(dsk):
150 task = dsk[key]
--> 151 result = _execute_task(task, cache)
152 cache[key] = result
153 result = _execute_task(out, cache)
/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/core.py in _execute_task(arg, cache, dsk)
119 # temporaries by their reference count and can execute certain
120 # operations in-place.
--> 121 return func(*(_execute_task(a, cache) for a in args))
122 elif not ishashable(arg):
123 return arg
/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/utils.py in apply(func, args, kwargs)
32 def apply(func, args, kwargs=None):
33 if kwargs:
---> 34 return func(*args, **kwargs)
35 else:
36 return func(*args)
~/.local/lib/python3.9/site-packages/xarray/core/missing.py in _dask_aware_interpnd(var, interp_func, interp_kwargs, localize, *coords)
822 var = var.data
823
--> 824 return _interpnd(var, x, new_x, interp_func, interp_kwargs)
825
826
~/.local/lib/python3.9/site-packages/xarray/core/missing.py in _interpnd(var, x, new_x, func, kwargs)
778
779 def _interpnd(var, x, new_x, func, kwargs):
--> 780 x, new_x = _floatize_x(x, new_x)
781
782 if len(x) == 1:
~/.local/lib/python3.9/site-packages/xarray/core/missing.py in _floatize_x(x, new_x)
588 xmin = x[i].values.min()
589 x[i] = x[i]._to_numeric(offset=xmin, dtype=np.float64)
--> 590 new_x[i] = new_x[i]._to_numeric(offset=xmin, dtype=np.float64)
591 return x, new_x
592
~/.local/lib/python3.9/site-packages/xarray/core/variable.py in _to_numeric(self, offset, datetime_unit, dtype)
2484 See duck_array_ops.datetime_to_numeric
2485 """
-> 2486 numeric_array = duck_array_ops.datetime_to_numeric(
2487 self.data, offset, datetime_unit, dtype
2488 )
~/.local/lib/python3.9/site-packages/xarray/core/duck_array_ops.py in datetime_to_numeric(array, offset, datetime_unit, dtype)
432 # For np.datetime64, this can silently yield garbage due to overflow.
433 # One option is to enforce 1970-01-01 as the universal offset.
--> 434 array = array - offset
435
436 # Scalar is converted to 0d-array
UFuncTypeError: ufunc 'subtract' cannot use operands with types dtype('O') and dtype('<M8[ns]')
However, I get no errors when trying the same thing with a model which is using a standard calendar (e.g. member r10i1p1f2
from the CNRM-CM6-1
model) and calling directly
u = xr.open_mfdataset(filepath, coords="minimal", join="override", decode_times = True, use_cftime=True).sel(time=slice(datestart, dateend))
instead of my read_cmip6
function. It seems then that the .loc
is not happy with whatever datetime object that came out of the xarray.convert_calendar
function, even though the time indexes of the data look exactly similar:
<xarray.DataArray 'time' (time: 10774)>
array([cftime.DatetimeGregorian(1980, 10, 1, 12, 0, 0, 0),
cftime.DatetimeGregorian(1980, 10, 2, 12, 0, 0, 0),
cftime.DatetimeGregorian(1980, 10, 3, 12, 0, 0, 0), ...,
cftime.DatetimeGregorian(2010, 3, 29, 12, 0, 0, 0),
cftime.DatetimeGregorian(2010, 3, 30, 12, 0, 0, 0),
cftime.DatetimeGregorian(2010, 3, 31, 12, 0, 0, 0)], dtype=object)
Coordinates:
height float64 10.0
* time (time) object 1980-10-01 12:00:00 ... 2010-03-31 12:00:00
Attributes:
axis: T
standard_name: time
long_name: Time axis
time_origin: 1850-01-01 00:00:00
bounds: time_bounds
using u = xr.open_mfdataset(filepath, coords="minimal", join="override", decode_times = True, use_cftime=True).sel(time=slice(datestart, dateend))
.
Vs:
<xarray.DataArray 'time' (time: 10774)>
array([cftime.DatetimeGregorian(1980, 10, 1, 0, 0, 0, 0),
cftime.DatetimeGregorian(1980, 10, 2, 0, 0, 0, 0),
cftime.DatetimeGregorian(1980, 10, 3, 0, 0, 0, 0), ...,
cftime.DatetimeGregorian(2010, 3, 29, 0, 0, 0, 0),
cftime.DatetimeGregorian(2010, 3, 30, 0, 0, 0, 0),
cftime.DatetimeGregorian(2010, 3, 31, 0, 0, 0, 0)], dtype=object)
Coordinates:
height float64 10.0
* time (time) object 1980-10-01 00:00:00 ... 2010-03-31 00:00:00
Attributes:
axis: T
standard_name: time
long_name: Time axis
time_origin: 1850-01-01 00:00:00
bounds: time_bounds
using u = read_cmip6(filepath, datestart, dateend)
.
I tried converting to every other calendar type available within the cftime library and also to force the conversion to numpy.datetime64[ns]
using .astype('datetime64[ns]')
(see: https://bobhowto.wordpress.com/2020/04/02/why-i-hate-python-ufunc-subtract-cannot-use-operands-with-types-dtypem8ns-and-dtypeo/#8230) but this did not solve the error.
Does anyone have any idea where this issue is arising from and how I could fix this?
Any help would be much appreciated.
datetime
calendar
netcdf
python-xarray
weather
0 Answers
Your Answer