1 year ago

#379784

test-img

lusev

xarray.Dataset assignement using .loc fails for DataArray containing non-standard calendar cftime data

I am working with several models from CMIP6 (data available here: https://esgf-node.llnl.gov/projects/cmip6/ ) which are stored under netcdf files on an archive and am manipulating them using xarray. I was first trying to open them with xarray.open_mfdataset() using the following code:

datestart = '1980-10-01'
dateend = '2010-03-31'
dat = xr.open_mfdataset(filepath, coords="minimal", join="override", decode_times=True, use_cftime=True).sel(time=slice(datestart, dateend))

Which then led to some issues as some of them are using non-standard calendars (i.e. no leap years, 360 days,...). For instance, trying to read the sfcWindmax variable from the r1i1p1f2 member from the UKESM1-0-LL (a model with a 360Day-years calendar) raises the following ValueError:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-54-8085c4e5f17d> in <module>
----> 1 dat = xr.open_mfdataset(filepath, coords="minimal", join="override", decode_times=True, use_cftime=True).sel(time=slice(datestart, dateend))

~/.local/lib/python3.9/site-packages/xarray/core/dataset.py in sel(self, indexers, method, tolerance, drop, **indexers_kwargs)
   2499         """
   2500         indexers = either_dict_or_kwargs(indexers, indexers_kwargs, "sel")
-> 2501         pos_indexers, new_indexes = remap_label_indexers(
   2502             self, indexers=indexers, method=method, tolerance=tolerance
   2503         )

~/.local/lib/python3.9/site-packages/xarray/core/coordinates.py in remap_label_indexers(obj, indexers, method, tolerance, **indexers_kwargs)
    419     }
    420 
--> 421     pos_indexers, new_indexes = indexing.remap_label_indexers(
    422         obj, v_indexers, method=method, tolerance=tolerance
    423     )

~/.local/lib/python3.9/site-packages/xarray/core/indexing.py in remap_label_indexers(data_obj, indexers, method, tolerance)
    119     for dim, index in indexes.items():
    120         labels = grouped_indexers[dim]
--> 121         idxr, new_idx = index.query(labels, method=method, tolerance=tolerance)
    122         pos_indexers[dim] = idxr
    123         if new_idx is not None:

~/.local/lib/python3.9/site-packages/xarray/core/indexes.py in query(self, labels, method, tolerance)
    206 
    207         if isinstance(label, slice):
--> 208             indexer = _query_slice(self.index, label, coord_name, method, tolerance)
    209         elif is_dict_like(label):
    210             raise ValueError(

~/.local/lib/python3.9/site-packages/xarray/core/indexes.py in _query_slice(index, label, coord_name, method, tolerance)
     95             "cannot use ``method`` argument if any indexers are slice objects"
     96         )
---> 97     indexer = index.slice_indexer(
     98         _sanitize_slice_element(label.start),
     99         _sanitize_slice_element(label.stop),

/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/pandas/core/indexes/base.py in slice_indexer(self, start, end, step, kind)
   5275         slice(1, 3, None)
   5276         """
-> 5277         start_slice, end_slice = self.slice_locs(start, end, step=step, kind=kind)
   5278 
   5279         # return a slice

/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/pandas/core/indexes/base.py in slice_locs(self, start, end, step, kind)
   5480         end_slice = None
   5481         if end is not None:
-> 5482             end_slice = self.get_slice_bound(end, "right", kind)
   5483         if end_slice is None:
   5484             end_slice = len(self)

/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/pandas/core/indexes/base.py in get_slice_bound(self, label, side, kind)
   5384         # For datetime indices label may be a string that has to be converted
   5385         # to datetime boundary according to its resolution.
-> 5386         label = self._maybe_cast_slice_bound(label, side, kind)
   5387 
   5388         # we need to look up the label

~/.local/lib/python3.9/site-packages/xarray/coding/cftimeindex.py in _maybe_cast_slice_bound(self, label, side, kind)
    478             return label
    479 
--> 480         parsed, resolution = _parse_iso8601_with_reso(self.date_type, label)
    481         start, end = _parsed_string_to_bounds(self.date_type, resolution, parsed)
    482         if self.is_monotonic_decreasing and len(self) > 1:

~/.local/lib/python3.9/site-packages/xarray/coding/cftimeindex.py in _parse_iso8601_with_reso(date_type, timestr)
    135             replace[attr] = int(value)
    136             resolution = attr
--> 137     return default.replace(**replace), resolution
    138 
    139 

cftime/_cftime.pyx in cftime._cftime.datetime.replace()

cftime/_cftime.pyx in cftime._cftime.Datetime360Day.__init__()

cftime/_cftime.pyx in cftime._cftime.assert_valid_date()

ValueError: invalid day number provided in cftime.Datetime360Day(2010, 3, 31, 0, 0, 0, 0)

I then tried to first standardize the calendar using xarray.Dataset.convert_calendar() and then interpolate it using xaray.Dataset.interp with a standard xarray.cftime_range as the target index (following the xarray documentation: https://docs.xarray.dev/en/stable/user-guide/weather-climate.html ):

def read_cmip6(filepath,datestart,dateend):
    #open netcdf dataset
    dat = xr.open_mfdataset(filepath, coords="minimal", join="override", decode_times = True, use_cftime=True)  
    #convert calendar to standard, setting missing values as NaNs
    dat = dat.convert_calendar("standard", use_cftime=True, align_on="date",missing=np.nan)
    #interpolate the dataset using cftim_range
    dateidx = xr.cftime_range(datestart,dateend,freq='D',calendar="standard") 
    dat = dat.interp(time=dateidx,method="nearest") 
    #take slice
    dat = dat.sel(time=slice(datestart, dateend))
    return dat

datUKESM1 = read_cmip6(filepath,datestart,dateend)

This did not raise any error and yielded the following dataset:

<xarray.Dataset>
Dimensions:     (time: 10774, lat: 144, bnds: 2, lon: 192)
Coordinates:
  * lat         (lat) float64 -89.38 -88.12 -86.88 -85.62 ... 86.88 88.12 89.38
  * lon         (lon) float64 0.9375 2.812 4.688 6.562 ... 355.3 357.2 359.1
    height      float64 10.0
  * time        (time) object 1980-10-01 00:00:00 ... 2010-03-31 00:00:00
Dimensions without coordinates: bnds
Data variables:
    lat_bnds    (time, lat, bnds) float64 dask.array<chunksize=(10774, 144, 2), meta=np.ndarray>
    lon_bnds    (time, lon, bnds) float64 dask.array<chunksize=(10774, 192, 2), meta=np.ndarray>
    sfcWindmax  (time, lat, lon) float32 dask.array<chunksize=(10774, 144, 192), meta=np.ndarray>
    time_bnds   (time, bnds) object dask.array<chunksize=(10774, 2), meta=np.ndarray>
Attributes: (12/46)
    Conventions:            CF-1.7 CMIP-6.2
    activity_id:            CMIP
    branch_method:          standard
    branch_time_in_child:   0.0
    branch_time_in_parent:  144000.0
    creation_date:          2019-06-24T15:04:56Z
    ...                     ...
    title:                  UKESM1-0-LL output prepared for CMIP6
    variable_id:            sfcWindmax
    variant_label:          r1i1p1f2
    license:                CMIP6 model data produced by the Met Office Hadle...
    cmor_version:           3.4.0
    tracking_id:            hdl:21.14100/3d7355e1-c91e-46fd-8dca-2b08da8c57d6

with the following time index:

<xarray.DataArray 'time' (time: 10774)>
array([cftime.DatetimeGregorian(1980, 10, 1, 0, 0, 0, 0),
       cftime.DatetimeGregorian(1980, 10, 2, 0, 0, 0, 0),
       cftime.DatetimeGregorian(1980, 10, 3, 0, 0, 0, 0), ...,
       cftime.DatetimeGregorian(2010, 3, 29, 0, 0, 0, 0),
       cftime.DatetimeGregorian(2010, 3, 30, 0, 0, 0, 0),
       cftime.DatetimeGregorian(2010, 3, 31, 0, 0, 0, 0)], dtype=object)
Coordinates:
    height   float64 10.0
  * time     (time) object 1980-10-01 00:00:00 ... 2010-03-31 00:00:00
Attributes:
    bounds:         time_bnds
    axis:           T
    long_name:      time
    standard_name:  time

I then carried on, re-interpolated to a regular grid and tried to save it to a new empty dataset, in order to gather more than 1 member into a single dataset:

#constants
var = 'sfcWindmax' #variable
nmems = 3 # number of members
memout = np.arange(0,nmems) # member range
daysid = xr.cftime_range(datestart,dateend,freq='D',calendar='standard') #days index
ndays = len(daysid) #nb of days

#read netcdf
u = read_cmip6(filepath, datestart, dateend)
#u = xr.open_mfdataset(filepath, coords="minimal", join="override", decode_times = True, use_cftime=True).sel(time=slice(datestart, dateend))
#select variable 
utemp = u[var]

#get lat and lon
latout = utemp.lat
lonout = utemp.lon

#initiate empty dataframe
uzmem = xr.DataArray(np.zeros([ ndays , latout.size, lonout.size, nmems]), coords=[daysid, latout, lonout, memout],
          dims=['day','lat','lon','member'], name="historical")

uzmem.loc[dict(member=0)]=utemp

But the assignment using the .loc keyword led to the following error:

---------------------------------------------------------------------------
UFuncTypeError                            Traceback (most recent call last)
<ipython-input-73-99f7ff0346e5> in <module>
     21 
     22 #interpolate to a regular gri
---> 23 uzmem.loc[dict(member=0)]=utemp

~/.local/lib/python3.9/site-packages/xarray/core/dataarray.py in __setitem__(self, key, value)
    206 
    207         pos_indexers, _ = remap_label_indexers(self.data_array, key)
--> 208         self.data_array[pos_indexers] = value
    209 
    210 

~/.local/lib/python3.9/site-packages/xarray/core/dataarray.py in __setitem__(self, key, value)
    756                 for k, v in self._item_key_to_dict(key).items()
    757             }
--> 758             self.variable[key] = value
    759 
    760     def __delitem__(self, key: Any) -> None:

~/.local/lib/python3.9/site-packages/xarray/core/variable.py in __setitem__(self, key, value)
    854 
    855         indexable = as_indexable(self._data)
--> 856         indexable[index_tuple] = value
    857 
    858     @property

~/.local/lib/python3.9/site-packages/xarray/core/indexing.py in __setitem__(self, key, value)
   1164         array, key = self._indexing_array_and_key(key)
   1165         try:
-> 1166             array[key] = value
   1167         except ValueError:
   1168             # More informative exception if read-only view

/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/array/core.py in __array__(self, dtype, **kwargs)
   1474 
   1475     def __array__(self, dtype=None, **kwargs):
-> 1476         x = self.compute()
   1477         if dtype and x.dtype != dtype:
   1478             x = x.astype(dtype)

/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/base.py in compute(self, **kwargs)
    282         dask.base.compute
    283         """
--> 284         (result,) = compute(self, traverse=False, **kwargs)
    285         return result
    286 

/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/base.py in compute(*args, **kwargs)
    564         postcomputes.append(x.__dask_postcompute__())
    565 
--> 566     results = schedule(dsk, keys, **kwargs)
    567     return repack([f(r, *a) for r, (f, a) in zip(results, postcomputes)])
    568 

/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/threaded.py in get(dsk, result, cache, num_workers, pool, **kwargs)
     77             pool = MultiprocessingPoolExecutor(pool)
     78 
---> 79     results = get_async(
     80         pool.submit,
     81         pool._max_workers,

/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/local.py in get_async(submit, num_workers, dsk, result, cache, get_id, rerun_exceptions_locally, pack_exception, raise_exception, callbacks, dumps, loads, chunksize, **kwargs)
    512                             _execute_task(task, data)  # Re-execute locally
    513                         else:
--> 514                             raise_exception(exc, tb)
    515                     res, worker_id = loads(res_info)
    516                     state["cache"][key] = res

/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/local.py in reraise(exc, tb)
    323     if exc.__traceback__ is not tb:
    324         raise exc.with_traceback(tb)
--> 325     raise exc
    326 
    327 

/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/local.py in execute_task(key, task_info, dumps, loads, get_id, pack_exception)
    221     try:
    222         task, data = loads(task_info)
--> 223         result = _execute_task(task, data)
    224         id = get_id()
    225         result = dumps((result, id))

/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/core.py in _execute_task(arg, cache, dsk)
    119         # temporaries by their reference count and can execute certain
    120         # operations in-place.
--> 121         return func(*(_execute_task(a, cache) for a in args))
    122     elif not ishashable(arg):
    123         return arg

/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/core.py in <genexpr>(.0)
    119         # temporaries by their reference count and can execute certain
    120         # operations in-place.
--> 121         return func(*(_execute_task(a, cache) for a in args))
    122     elif not ishashable(arg):
    123         return arg

/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/core.py in _execute_task(arg, cache, dsk)
    119         # temporaries by their reference count and can execute certain
    120         # operations in-place.
--> 121         return func(*(_execute_task(a, cache) for a in args))
    122     elif not ishashable(arg):
    123         return arg

/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/optimization.py in __call__(self, *args)
    961         if not len(args) == len(self.inkeys):
    962             raise ValueError("Expected %d args, got %d" % (len(self.inkeys), len(args)))
--> 963         return core.get(self.dsk, self.outkey, dict(zip(self.inkeys, args)))
    964 
    965     def __reduce__(self):

/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/core.py in get(dsk, out, cache)
    149     for key in toposort(dsk):
    150         task = dsk[key]
--> 151         result = _execute_task(task, cache)
    152         cache[key] = result
    153     result = _execute_task(out, cache)

/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/core.py in _execute_task(arg, cache, dsk)
    119         # temporaries by their reference count and can execute certain
    120         # operations in-place.
--> 121         return func(*(_execute_task(a, cache) for a in args))
    122     elif not ishashable(arg):
    123         return arg

/usr/local/Miniconda3-envs/envs/2021/envs/iacpy3_2021/lib/python3.9/site-packages/dask/utils.py in apply(func, args, kwargs)
     32 def apply(func, args, kwargs=None):
     33     if kwargs:
---> 34         return func(*args, **kwargs)
     35     else:
     36         return func(*args)

~/.local/lib/python3.9/site-packages/xarray/core/missing.py in _dask_aware_interpnd(var, interp_func, interp_kwargs, localize, *coords)
    822         var = var.data
    823 
--> 824     return _interpnd(var, x, new_x, interp_func, interp_kwargs)
    825 
    826 

~/.local/lib/python3.9/site-packages/xarray/core/missing.py in _interpnd(var, x, new_x, func, kwargs)
    778 
    779 def _interpnd(var, x, new_x, func, kwargs):
--> 780     x, new_x = _floatize_x(x, new_x)
    781 
    782     if len(x) == 1:

~/.local/lib/python3.9/site-packages/xarray/core/missing.py in _floatize_x(x, new_x)
    588             xmin = x[i].values.min()
    589             x[i] = x[i]._to_numeric(offset=xmin, dtype=np.float64)
--> 590             new_x[i] = new_x[i]._to_numeric(offset=xmin, dtype=np.float64)
    591     return x, new_x
    592 

~/.local/lib/python3.9/site-packages/xarray/core/variable.py in _to_numeric(self, offset, datetime_unit, dtype)
   2484         See duck_array_ops.datetime_to_numeric
   2485         """
-> 2486         numeric_array = duck_array_ops.datetime_to_numeric(
   2487             self.data, offset, datetime_unit, dtype
   2488         )

~/.local/lib/python3.9/site-packages/xarray/core/duck_array_ops.py in datetime_to_numeric(array, offset, datetime_unit, dtype)
    432     # For np.datetime64, this can silently yield garbage due to overflow.
    433     # One option is to enforce 1970-01-01 as the universal offset.
--> 434     array = array - offset
    435 
    436     # Scalar is converted to 0d-array

UFuncTypeError: ufunc 'subtract' cannot use operands with types dtype('O') and dtype('<M8[ns]')

However, I get no errors when trying the same thing with a model which is using a standard calendar (e.g. member r10i1p1f2 from the CNRM-CM6-1 model) and calling directly

u = xr.open_mfdataset(filepath, coords="minimal", join="override", decode_times = True, use_cftime=True).sel(time=slice(datestart, dateend))

instead of my read_cmip6 function. It seems then that the .loc is not happy with whatever datetime object that came out of the xarray.convert_calendar function, even though the time indexes of the data look exactly similar:

<xarray.DataArray 'time' (time: 10774)>
array([cftime.DatetimeGregorian(1980, 10, 1, 12, 0, 0, 0),
       cftime.DatetimeGregorian(1980, 10, 2, 12, 0, 0, 0),
       cftime.DatetimeGregorian(1980, 10, 3, 12, 0, 0, 0), ...,
       cftime.DatetimeGregorian(2010, 3, 29, 12, 0, 0, 0),
       cftime.DatetimeGregorian(2010, 3, 30, 12, 0, 0, 0),
       cftime.DatetimeGregorian(2010, 3, 31, 12, 0, 0, 0)], dtype=object)
Coordinates:
    height   float64 10.0
  * time     (time) object 1980-10-01 12:00:00 ... 2010-03-31 12:00:00
Attributes:
    axis:           T
    standard_name:  time
    long_name:      Time axis
    time_origin:    1850-01-01 00:00:00
    bounds:         time_bounds

using u = xr.open_mfdataset(filepath, coords="minimal", join="override", decode_times = True, use_cftime=True).sel(time=slice(datestart, dateend)).

Vs:

<xarray.DataArray 'time' (time: 10774)>
array([cftime.DatetimeGregorian(1980, 10, 1, 0, 0, 0, 0),
       cftime.DatetimeGregorian(1980, 10, 2, 0, 0, 0, 0),
       cftime.DatetimeGregorian(1980, 10, 3, 0, 0, 0, 0), ...,
       cftime.DatetimeGregorian(2010, 3, 29, 0, 0, 0, 0),
       cftime.DatetimeGregorian(2010, 3, 30, 0, 0, 0, 0),
       cftime.DatetimeGregorian(2010, 3, 31, 0, 0, 0, 0)], dtype=object)
Coordinates:
    height   float64 10.0
  * time     (time) object 1980-10-01 00:00:00 ... 2010-03-31 00:00:00
Attributes:
    axis:           T
    standard_name:  time
    long_name:      Time axis
    time_origin:    1850-01-01 00:00:00
    bounds:         time_bounds

using u = read_cmip6(filepath, datestart, dateend).

I tried converting to every other calendar type available within the cftime library and also to force the conversion to numpy.datetime64[ns] using .astype('datetime64[ns]') (see: https://bobhowto.wordpress.com/2020/04/02/why-i-hate-python-ufunc-subtract-cannot-use-operands-with-types-dtypem8ns-and-dtypeo/#8230) but this did not solve the error.

Does anyone have any idea where this issue is arising from and how I could fix this?

Any help would be much appreciated.

datetime

calendar

netcdf

python-xarray

weather

0 Answers

Your Answer

Accepted video resources