1 year ago
#352118
TylerSingleton
Accessing geospatial raster data with limited memory
I am following the Rasterio documentation to access geospatial raster data downloaded from here -- a large TIFF image. Unfortunately, I do not have enough memory, so NumPy raises an ArrayMemoryError:
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 77.8 GiB for an array with shape (1, 226112, 369478) and data type uint8
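For reference, the size in the error message follows directly from the array shape: a uint8 array needs one byte per element, so the allocation is simply bands × rows × columns bytes. A quick check in plain Python (no assumptions beyond the shape and dtype reported above):

```python
# Shape and dtype taken from the error message: (bands, rows, cols), uint8.
shape = (1, 226112, 369478)
bytes_per_element = 1  # uint8

n_bytes = bytes_per_element
for dim in shape:
    n_bytes *= dim

gib = n_bytes / 2**30  # GiB = 2**30 bytes
print(f"{n_bytes} bytes = {gib:.1f} GiB")
```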
My code is as follows:
import rasterio
import rasterio.features
import rasterio.warp
file_path = r'Path\to\file\ESACCI-LC-L4-LC10-Map-10m-MEX.tif'
with rasterio.open(file_path) as dataset:
    # Read the dataset's valid data mask as a ndarray.
    mask = dataset.dataset_mask()
    # Extract feature shapes and values from the array.
    for geom, val in rasterio.features.shapes(
            mask, transform=dataset.transform):
        # Transform shapes from the dataset's own coordinate
        # reference system to CRS84 (EPSG:4326).
        geom = rasterio.warp.transform_geom(
            dataset.crs, 'EPSG:4326', geom, precision=6)
        # Print GeoJSON shapes to stdout.
        print(geom)
I need a way to store the NumPy array on disk, so I looked into numpy.memmap, but I do not understand how to apply it here. Additionally, I do not need the full geospatial data; I am only interested in the latitude, longitude, and land cover type, as I plan to merge this with another dataset.
Using python 3.9.
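To make the goal concrete, here is a minimal sketch of what I am after: reading the raster tile by tile so that only one tile is in memory at a time, and turning each tile into coordinate and class arrays. The helper names `iter_tiles` and `iter_land_cover` and the tile size of 1024 are hypothetical, and the sketch assumes the dataset CRS is already geographic (so x/y are longitude/latitude); otherwise the coordinates would still need reprojecting.

```python
from typing import Iterator, Tuple


def iter_tiles(height: int, width: int, tile: int = 1024) -> Iterator[Tuple[int, int, int, int]]:
    """Yield (col_off, row_off, tile_width, tile_height) covering a height x width raster."""
    for row_off in range(0, height, tile):
        tile_height = min(tile, height - row_off)
        for col_off in range(0, width, tile):
            tile_width = min(tile, width - col_off)
            yield col_off, row_off, tile_width, tile_height


def iter_land_cover(file_path: str, tile: int = 1024):
    """Yield flat (x, y, class_value) arrays for one tile at a time."""
    # Imported here so iter_tiles can be used without these packages installed.
    import numpy as np
    import rasterio
    from rasterio.windows import Window

    with rasterio.open(file_path) as dataset:
        for col_off, row_off, w, h in iter_tiles(dataset.height, dataset.width, tile):
            window = Window(col_off, row_off, w, h)
            values = dataset.read(1, window=window)  # shape (h, w); only this tile is loaded
            # Affine transform mapping (col, row) inside this tile to CRS coordinates.
            t = dataset.window_transform(window)
            cols, rows = np.meshgrid(np.arange(w) + 0.5, np.arange(h) + 0.5)  # pixel centres
            xs = t.c + t.a * cols + t.b * rows
            ys = t.f + t.d * cols + t.e * rows
            yield xs.ravel(), ys.ravel(), values.ravel()
```

Each yielded triple could then be filtered or appended to a file on disk before the next tile is read, which would avoid holding the full array (or a memmap of it) at all.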
Edit: I updated my code to try using a window.
from rasterio.windows import Window
with rasterio.open(file_path) as dataset:
    mask = dataset.read(1, window=Window(0, 0, 226112, 369478))
    ...
I can obviously adjust the window and load the file in sections now. However, I do not understand why this almost halved the memory required, from 77.8 GiB to 47.6 GiB.
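One possible explanation, offered as a guess rather than confirmed behaviour: `Window` takes its arguments as `(col_off, row_off, width, height)`, so the call above requests width 226112 and height 369478, which is swapped relative to the array shape in the error message (226112 rows, 369478 columns). Assuming a non-boundless read is clipped to the dataset extent, the array actually read would be 226112 × 226112 pixels, and the arithmetic then matches the new figure:

```python
# Dataset size from the original error message: (bands, rows, cols).
rows, cols = 226112, 369478

# Window(col_off, row_off, width, height) as passed in the call above.
req_width, req_height = 226112, 369478

# Assumption: the requested window is clipped to the dataset extent.
read_width = min(req_width, cols)
read_height = min(req_height, rows)

gib = read_width * read_height / 2**30  # uint8: one byte per pixel
print(f"{read_height} x {read_width} pixels -> {gib:.1f} GiB")
```

If that is what happens, the window covers only the left-hand square of the raster rather than the whole image, and a much smaller width and height would be needed to stay within memory.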
python-3.x
rasterio
numpy-memmap