import xarray as xr
import pandas as pd
import numpy as np
Adding new data
Written by Minh Phan
We can also add new data to our Zarr file, as long as the new dataset matches the original in every dimension except one. In other words, we can append along one dimension at a time, while the remaining dimensions and the set of variables (including metadata) must match. For example, if our dataset is 100 lat x 100 lon x 200 time with five variables, the dataset we append must contain the same five variables and have the same size along the two dimensions we are not appending along (in the most common case we append along time, so the new data must be 100 lat x 100 lon).
It is sometimes recommended to rechunk the data after appending, as uneven chunk sizes can slow down computation.
For demonstration purposes, I will not go through the process of creating another dataset again; instead, I provide an already cleaned dataset for us to practice on. Start by loading this cleaned dataset, as well as the original dataset that we already exported (so we can compare and double-check the metadata before exporting).
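The shape rule above can be checked before any data is written. The following is a minimal sketch, not part of the original workflow: the helper `check_appendable` and the toy datasets are hypothetical names introduced for illustration, and the check only covers the variable names and the sizes of the dimensions we are not appending along.

```python
import numpy as np
import xarray as xr


def check_appendable(base, new, append_dim="time"):
    """Return a list of problems; an empty list means the shapes line up.

    Hypothetical helper: compares data variable names and the size of
    every dimension except the one we append along.
    """
    problems = []
    base_vars, new_vars = set(base.data_vars), set(new.data_vars)
    if base_vars != new_vars:
        problems.append(f"data variables differ: {sorted(base_vars ^ new_vars)}")
    for dim, size in base.sizes.items():
        if dim == append_dim:
            continue  # this dimension is allowed to differ
        if new.sizes.get(dim) != size:
            problems.append(
                f"dimension '{dim}' has size {new.sizes.get(dim)}, expected {size}"
            )
    return problems


def make_toy(n_time, n_lat=4, n_lon=4, names=("sla", "adt")):
    """Build a small all-zero dataset for demonstration only."""
    shape = (n_time, n_lat, n_lon)
    return xr.Dataset(
        {name: (("time", "lat", "lon"), np.zeros(shape, dtype="float32")) for name in names}
    )


base = make_toy(10)
ok_problems = check_appendable(base, make_toy(5))            # same lat/lon, same variables
lat_problems = check_appendable(base, make_toy(5, n_lat=3))  # lat size mismatch
var_problems = check_appendable(base, make_toy(5, names=("sla",)))  # missing variable
```

With the real data, a call such as `check_appendable(og_ds, new_ds)` would be expected to return an empty list before appending.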
To keep our original dataset intact, I made a copy of our original Zarr file. Please load it instead.
Load in data
og_ds = xr.open_zarr('demonstrated data/final-sample-appending.zarr/')
new_ds = xr.open_zarr('demonstrated data/new-data-sample.zarr/')
Note that our new dataset does not have any metadata. As shown in the previous notebooks, metadata is added at the last step, so now we are going to copy all metadata from the original dataset to our new one.
Add metadata
# copy dataset metadata
new_ds.attrs = og_ds.attrs
# copy variables/dimensions metadata
# make sure that all vars in new_ds exist in og_ds
for var in new_ds.variables:
    new_ds[var].attrs = og_ds[var].attrs
# double-check
new_ds
<xarray.Dataset>
Dimensions:          (time: 2556, lat: 81, lon: 81)
Coordinates:
  * lat              (lat) float32 25.0 24.75 24.5 24.25 ... 5.75 5.5 5.25 5.0
  * lon              (lon) float32 60.0 60.25 60.5 60.75 ... 79.5 79.75 80.0
  * time             (time) datetime64[ns] 1993-01-01 1993-01-02 ... 1999-12-31
Data variables: (12/14)
    CHL              (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
    CHL_uncertainty  (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
    adt              (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
    air_temp         (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
    direction        (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
    sla              (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
    ...               ...
    u_curr           (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
    u_wind           (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
    ug_curr          (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
    v_curr           (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
    v_wind           (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
    vg_curr          (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
Attributes: (12/17)
    creator_email:         minhphan@uw.edu
    creator_name:          Minh Phan
    creator_type:          person
    date_created:          2023-11-11
    geospatial_lat_max:    25.0
    geospatial_lat_min:    5.0
    ...                    ...
    geospatial_lon_units:  degrees_east
    source:                OSCAR, ERA5 Reanalysis, Copernicus Climate Ch...
    summary:               Daily mean of 0.25 x 0.25 degrees gridded dat...
    time_coverage_end:     2002-12-31T23:59:59
    time_coverage_start:   2000-01-01T00:00:00
    title:                 Sample of Climate Data for Coastal Upwelling ...
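The metadata copy above assumes that every variable in new_ds also exists in og_ds; otherwise the lookup raises a KeyError. A slightly more defensive variant is sketched below, assuming we prefer to skip unknown variables and report them. The function name `copy_metadata` and the toy datasets are hypothetical and only serve the illustration.

```python
import numpy as np
import xarray as xr


def copy_metadata(source, target):
    """Copy dataset-level and per-variable attrs from source to target.

    Hypothetical helper: variables that exist only in target are left
    untouched and returned, so they can be inspected by hand.
    """
    target.attrs = dict(source.attrs)  # dict() makes a copy rather than sharing one object
    skipped = []
    for name in target.variables:
        if name in source.variables:
            target[name].attrs = dict(source[name].attrs)
        else:
            skipped.append(name)
    return skipped


# Toy stand-ins for og_ds and new_ds
toy_og = xr.Dataset(
    {"sla": (("time",), np.zeros(3), {"units": "m"})},
    attrs={"title": "Sample"},
)
toy_new = xr.Dataset(
    {
        "sla": (("time",), np.ones(2)),
        "extra": (("time",), np.ones(2)),  # not present in toy_og
    }
)

skipped = copy_metadata(toy_og, toy_new)
```

Here `skipped` lists the variables that received no metadata, which is a cue to check the new dataset before appending.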
Appending data
new_ds.to_zarr('demonstrated data/final-sample-appending.zarr/', append_dim='time', mode='a')
<xarray.backends.zarr.ZarrStore at 0x7f1455c51dd0>
Final result
xr.open_zarr('demonstrated data/final-sample-appending.zarr/')
<xarray.Dataset>
Dimensions:          (time: 3287, lat: 81, lon: 81)
Coordinates:
  * lat              (lat) float32 25.0 24.75 24.5 24.25 ... 5.75 5.5 5.25 5.0
  * lon              (lon) float32 60.0 60.25 60.5 60.75 ... 79.5 79.75 80.0
  * time             (time) datetime64[ns] 2000-01-01 2000-01-02 ... 1999-12-31
Data variables: (12/14)
    CHL              (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
    CHL_uncertainty  (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
    adt              (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
    air_temp         (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
    direction        (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
    sla              (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
    ...               ...
    u_curr           (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
    u_wind           (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
    ug_curr          (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
    v_curr           (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
    v_wind           (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
    vg_curr          (time, lat, lon) float32 dask.array<chunksize=(100, 81, 81), meta=np.ndarray>
Attributes: (12/17)
    creator_email:         minhphan@uw.edu
    creator_name:          Minh Phan
    creator_type:          person
    date_created:          2023-11-11
    geospatial_lat_max:    25.0
    geospatial_lat_min:    5.0
    ...                    ...
    geospatial_lon_units:  degrees_east
    source:                OSCAR, ERA5 Reanalysis, Copernicus Climate Ch...
    summary:               Daily mean of 0.25 x 0.25 degrees gridded dat...
    time_coverage_end:     2002-12-31T23:59:59
    time_coverage_start:   2000-01-01T00:00:00
    title:                 Sample of Climate Data for Coastal Upwelling ...
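In the final result, the time coordinate runs from 2000-01-01 to 1999-12-31: appending places the new block at the end of the dimension, so the 1993-1999 data now sits after the 2000-2002 data, and the time_coverage attributes still describe the original range only. If chronological order matters for later analysis, one option is to sort after loading and refresh those attributes. The sketch below uses a toy dataset in place of the real store; the variable names and the end-of-day convention for time_coverage_end are assumptions modeled on the attributes shown above.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Toy stand-in for the appended store: a later block followed by an earlier block
late = pd.date_range("2000-01-01", periods=3, freq="D")
early = pd.date_range("1999-12-29", periods=3, freq="D")
toy = xr.Dataset(
    {"sla": (("time",), np.arange(6, dtype="float32"))},
    coords={"time": late.append(early)},
)

is_sorted_before = bool(toy.indexes["time"].is_monotonic_increasing)

# Reorder all variables along time so the coordinate is chronological
sorted_ds = toy.sortby("time")
is_sorted_after = bool(sorted_ds.indexes["time"].is_monotonic_increasing)

# Refresh the coverage attributes from the sorted coordinate
# (assumed convention: coverage ends at the last second of the last day)
first_day = pd.Timestamp(sorted_ds["time"].values[0])
last_day = pd.Timestamp(sorted_ds["time"].values[-1])
coverage_end = last_day + pd.Timedelta(days=1) - pd.Timedelta(seconds=1)
sorted_ds.attrs["time_coverage_start"] = first_day.strftime("%Y-%m-%dT%H:%M:%S")
sorted_ds.attrs["time_coverage_end"] = coverage_end.strftime("%Y-%m-%dT%H:%M:%S")
```

Note that sorting an opened dataset only changes the in-memory view; the order stored on disk stays as appended unless the sorted dataset is written out again.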