import boto3
import botocore
import datetime
import matplotlib.pyplot as plt
import os
import xarray as xr
import numpy as np
import pandas as pd
import sys
Download Copernicus ERA5 Data from S3 without Logging In
Written by Minh Phan
In this tutorial, you will download Copernicus ERA5 data from S3 without logging in. ERA5 is one of the best-known reanalysis datasets, providing a numerical assessment of the modern climate. We mentioned previously that streaming data from S3 is time-consuming if you are not in the same AWS region as the data. Even so, this dataset downloaded quickly and with little extra code beyond slicing the data by month, because S3 handles large requests efficiently. Most of the code in this notebook is adapted from the original notebook here.
Variables
The table below lists the 18 ERA5 variables that are available on S3. All variables are surface or single-level parameters sourced from the HRES sub-daily forecast stream.
Variable name | File name | Type (an = analysis, fc = forecast) |
---|---|---|
10 metre U wind component | eastward_wind_at_10_metres.nc | an |
10 metre V wind component | northward_wind_at_10_metres.nc | an |
100 metre U wind component | eastward_wind_at_100_metres.nc | an |
100 metre V wind component | northward_wind_at_100_metres.nc | an |
2 metre dew point temperature | dew_point_temperature_at_2_metres.nc | an |
2 metre temperature | air_temperature_at_2_metres.nc | an |
2 metre maximum temperature since previous post-processing | air_temperature_at_2_metres_1hour_Maximum.nc | fc |
2 metre minimum temperature since previous post-processing | air_temperature_at_2_metres_1hour_Minimum.nc | fc |
Mean sea level pressure | air_pressure_at_mean_sea_level.nc | an |
Sea surface temperature | sea_surface_temperature.nc | an |
Mean wave period | sea_surface_wave_mean_period.nc | |
Mean direction of waves | sea_surface_wave_from_direction.nc | |
Significant height of combined wind waves and swell | significant_height_of_wind_and_swell_waves.nc | |
Snow density | snow_density.nc | an |
Snow depth | lwe_thickness_of_surface_snow_amount.nc | an |
Surface pressure | surface_air_pressure.nc | an |
Surface solar radiation downwards | integral_wrt_time_of_surface_direct_downwelling_shortwave_flux_in_air_1hour_Accumulation.nc | fc |
Total precipitation | precipitation_amount_1hour_Accumulation.nc | fc |
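Each file in the table sits under a year/month prefix in the `era5-pds` bucket. The download function in this notebook uses the key pattern `{year}/{month}/data/{var}.nc`. Below is a minimal sketch of that mapping from a table entry to an S3 key. It assumes only that pattern. The helper name `era5_s3_key` is hypothetical and does not come from the original notebook.

```python
def era5_s3_key(file_name: str, year: int, month: int) -> str:
    """Build the S3 object key for one month of an ERA5 variable.

    Assumes the '{year}/{month:02d}/data/{var}.nc' layout used in this notebook.
    `file_name` may be given with or without the '.nc' suffix.
    """
    # Strip a trailing '.nc' so both 'x.nc' and 'x' are accepted
    var = file_name[:-3] if file_name.endswith(".nc") else file_name
    return f"{year}/{month:02d}/data/{var}.nc"


print(era5_s3_key("air_temperature_at_2_metres.nc", 2003, 1))
# 2003/01/data/air_temperature_at_2_metres.nc
```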
For this dataset, we collect 2 m air temperature, sea surface temperature, and the 10 m u and v wind components so that we can compute wind speed and direction later.
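Wind speed and direction can be derived from the u (eastward) and v (northward) components. Below is a minimal sketch that assumes the usual meteorological convention: direction is where the wind blows *from*, in degrees clockwise from north. The helper `wind_speed_direction` is hypothetical and does not come from the original notebook.

```python
import numpy as np


def wind_speed_direction(u, v):
    """Return wind speed and meteorological direction from u/v components.

    u: eastward component (m/s), v: northward component (m/s).
    Direction is in degrees clockwise from north and gives where the wind
    blows FROM (0 = northerly, 90 = easterly, 270 = westerly).
    Works on scalars, NumPy arrays, or xarray DataArrays.
    """
    speed = np.hypot(u, v)  # sqrt(u**2 + v**2)
    # arctan2(-u, -v) points back toward the source of the wind
    direction = np.degrees(np.arctan2(-u, -v)) % 360
    return speed, direction


speed, direction = wind_speed_direction(np.array([3.0, 0.0]), np.array([4.0, -5.0]))
print(speed)  # [5. 5.]
print(direction)
```

The same function can be applied directly to the `eastward_wind_at_10_metres` and `northward_wind_at_10_metres` DataArrays once they are loaded.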
Import necessary libraries
Download data
from botocore import UNSIGNED
from botocore.config import Config

era5_bucket = 'era5-pds'
# anonymous (unsigned) S3 client: no AWS credentials are needed for this public bucket
client = boto3.client('s3', config=Config(signature_version=UNSIGNED))

def download_era5_s3(var_era5, month_start, month_end, lat1=5, lat2=25, lon1=60, lon2=80):
    """
    Download monthly ERA5 files from S3 and keep only a lat/lon box.

    var_era5: variable name (file name from the table above, without .nc)
    month_start: formatted as YYYY-MM
    month_end: formatted as YYYY-MM (right-exclusive)
    lat1, lat2: southern and northern latitude bounds (degrees)
    lon1, lon2: western and eastern longitude bounds (degrees east)
    """
    s3_data_ptrn = '{year}/{month}/data/{var}.nc'
    path_temp_folder = 'demonstrated data/era5/temp'         # full monthly files (deleted after slicing)
    path_var_folder = f'demonstrated data/era5/{var_era5}'   # sliced output files
    os.makedirs(path_temp_folder, exist_ok=True)
    os.makedirs(path_var_folder, exist_ok=True)
    data_file_ptrn = os.path.join(path_temp_folder, '{year}{month}_{var}.nc')
    sliced_data_file_ptrn = os.path.join(path_var_folder, '{year}{month}_{var}.nc')
    # month starts from month_start up to, but not including, month_end
    months = pd.date_range(month_start, month_end, freq='MS', inclusive='left')
    for month in months:
        fmt = dict(year=month.year, month='{:02d}'.format(month.month), var=var_era5)
        s3_data_key = s3_data_ptrn.format(**fmt)
        data_file = data_file_ptrn.format(**fmt)
        if not os.path.isfile(data_file):  # skip the download if the file already exists
            print("Downloading %s from S3..." % s3_data_key)
            client.download_file(era5_bucket, s3_data_key, data_file)
        export_file = sliced_data_file_ptrn.format(**fmt)
        # ERA5 latitudes are stored north to south, hence slice(lat2, lat1);
        # the context manager closes the file before it is deleted
        with xr.open_dataset(data_file) as ds:
            ds.sel(lat=slice(lat2, lat1), lon=slice(lon1, lon2)).to_netcdf(export_file)
        os.remove(data_file)  # free disk space once the subset is saved
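The right-exclusive behaviour of `month_end` comes from how the list of months is generated. Below is a minimal sketch of that step on its own. It uses the month-start frequency `'MS'` and assumes pandas 1.4 or later for the `inclusive=` argument. The helper `months_between` is hypothetical.

```python
import pandas as pd


def months_between(month_start: str, month_end: str) -> list:
    """List 'YYYY-MM' strings from month_start up to, but not including, month_end."""
    # 'MS' = month-start frequency; inclusive='left' drops month_end itself
    months = pd.date_range(month_start, month_end, freq="MS", inclusive="left")
    return [m.strftime("%Y-%m") for m in months]


print(months_between("2003-01", "2003-03"))  # ['2003-01', '2003-02']
```

This matches the download log in the next cell: `month_end='2003-03'` fetches January and February only.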
# download data for the variables we need (the two 10 m wind components are shown here;
# the other variables follow the same pattern)
# consult available names in the table above in the file name column (remove .nc)
# month_end is not included in dataset (right-exclusive)
download_era5_s3(var_era5='eastward_wind_at_10_metres', month_start='2003-01', month_end='2003-03')
download_era5_s3(var_era5='northward_wind_at_10_metres', month_start='2003-01', month_end='2003-03')
Downloading 2003/01/data/eastward_wind_at_10_metres.nc from S3...
Downloading 2003/02/data/eastward_wind_at_10_metres.nc from S3...
Downloading 2003/01/data/northward_wind_at_10_metres.nc from S3...
Downloading 2003/02/data/northward_wind_at_10_metres.nc from S3...
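The download function subsets with `slice(lat2, lat1)`, putting the larger latitude first. That ordering reflects ERA5 storing latitude from north to south. On a descending coordinate, label slicing must follow the axis order, and a `slice(low, high)` selects nothing. The sketch below demonstrates this on a synthetic 0.25-degree grid. The grid and the dummy `t2m` values are assumptions and do not come from the downloaded files.

```python
import numpy as np
import xarray as xr

# Synthetic stand-in for an ERA5 field: latitude runs north -> south
# (descending), longitude 0 -> 359.75 on a 0.25-degree grid. Values are dummy data.
lat = np.arange(90, -90.25, -0.25)
lon = np.arange(0, 360, 0.25)
ds = xr.Dataset(
    {"t2m": (("lat", "lon"), np.zeros((lat.size, lon.size)))},
    coords={"lat": lat, "lon": lon},
)

lat1, lat2, lon1, lon2 = 5, 25, 60, 80  # same defaults as download_era5_s3

box = ds.sel(lat=slice(lat2, lat1), lon=slice(lon1, lon2))  # north bound first
empty = ds.sel(lat=slice(lat1, lat2))  # wrong order for a descending axis

print(box.sizes["lat"], box.sizes["lon"])  # 81 81
print(empty.sizes["lat"])  # 0
```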