Tutorial
Preprocessing of daily observations
First, we need to obtain some daily weather observations that we will use to detect heat wave events.
We will use a CSV file with daily historical climatic data (1901–2020) from the Thissio station (Athens, Greece) of the National Observatory of Athens (Founda, 2011; Founda et al., 2013). The dataset is publicly available under CC-BY-SA 4.0.
After downloading our file, we should inspect its data and preprocess it so that it can be used with hotspell.
[1]:
import pandas as pd
raw_data = "/my_path/hcd_noa.csv" # Replace with your path
df = pd.read_csv(raw_data)
df.head()
[1]:
YEAR | MONTH | DAY | Tmax (oC) | Tmin (oC) | RH (%) | Rain (mm) | |
---|---|---|---|---|---|---|---|
0 | 1901 | 1 | 1 | 14.3 | 5.9 | 67.0 | 0.2 |
1 | 1901 | 1 | 2 | 15.0 | 8.9 | 80.0 | 5.8 |
2 | 1901 | 1 | 3 | 10.3 | 4.8 | 76.0 | 5.0 |
3 | 1901 | 1 | 4 | 7.1 | 4.8 | 78.0 | 3.2 |
4 | 1901 | 1 | 5 | 10.4 | 5.2 | 83.0 | 6.6 |
As we can see, the dataset includes 7 columns.
The first 3 columns correspond to the year, month and day of the observations, columns 4 and 5 to the daily maximum (Tmax) and daily minimum (Tmin) air temperature (in °C), column 6 to relative humidity (RH, in percent) and the last column to precipitation (Rain, in mm).
We need to drop humidity and rain and rearrange Tmax and Tmin:
[2]:
df = df[["YEAR", "MONTH", "DAY", "Tmin (oC)", "Tmax (oC)"]]
df.head()
[2]:
YEAR | MONTH | DAY | Tmin (oC) | Tmax (oC) | |
---|---|---|---|---|---|
0 | 1901 | 1 | 1 | 5.9 | 14.3 |
1 | 1901 | 1 | 2 | 8.9 | 15.0 |
2 | 1901 | 1 | 3 | 4.8 | 10.3 |
3 | 1901 | 1 | 4 | 4.8 | 7.1 |
4 | 1901 | 1 | 5 | 5.2 | 10.4 |
Suppose that the time series contains nodata values, encoded as -9999.
We should remove these measurements:
[3]:
nodata_value = -9999
df = df.loc[
    (df["Tmin (oC)"] != nodata_value) & (df["Tmax (oC)"] != nodata_value)
]
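As a minimal sketch of this filter, the snippet below applies it to a toy DataFrame. The values are made up for illustration, and -9999 is the assumed nodata marker from the text above. A row is kept only if neither temperature column holds the marker.

```python
import pandas as pd

nodata_value = -9999  # assumed nodata marker, as in the text above

# Toy data, invented for illustration only
toy = pd.DataFrame(
    {
        "YEAR": [1901, 1901, 1901, 1901],
        "MONTH": [1, 1, 1, 1],
        "DAY": [1, 2, 3, 4],
        "Tmin (oC)": [5.9, nodata_value, 4.8, 4.8],
        "Tmax (oC)": [14.3, 15.0, nodata_value, 7.1],
    }
)

temps = toy[["Tmin (oC)", "Tmax (oC)"]]
# Keep a row only if neither temperature equals the nodata marker
clean = toy.loc[(temps != nodata_value).all(axis=1)]
print(clean["DAY"].tolist())  # days 2 and 3 are dropped
```

Checking both columns at once with `.all(axis=1)` is equivalent to combining the two conditions with `&`, and it scales more easily if further variables are added.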
Now we are ready to save the processed file. The header and the index of the DataFrame must not be written out.
[4]:
processed_data = "/my_path/hcd_noa_processed.csv" # Replace with your path
df.to_csv(processed_data, header=False, index=False)
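To see what such a file looks like, the following sketch writes two toy rows to an in-memory buffer instead of a real path and reads them back. The buffer and the toy values are illustrative assumptions. Only the `header=False, index=False` arguments come from the step above.

```python
import io

import pandas as pd

# Toy rows in the column order used above: year, month, day, Tmin, Tmax
toy = pd.DataFrame(
    {
        "YEAR": [1901, 1901],
        "MONTH": [1, 1],
        "DAY": [1, 2],
        "Tmin (oC)": [5.9, 8.9],
        "Tmax (oC)": [14.3, 15.0],
    }
)

buffer = io.StringIO()  # in-memory stand-in for the output file
toy.to_csv(buffer, header=False, index=False)
text = buffer.getvalue()
print(text)

# Reading back with header=None confirms that every line is a data row
check = pd.read_csv(io.StringIO(text), header=None)
print(check.shape)
```

Each line of the output holds exactly five comma-separated values, with no column names and no leading row index.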
Detect heat waves
We have finished the preprocessing of our data and we are now ready to use hotspell.
Create a heat index
First we must initialize the heat wave index we want to use. For this first example we are going to use the index CTX90PCT.
[5]:
import hotspell
index_name = "ctx90pct"
ctx90pct = hotspell.index(name=index_name)
This index uses as its threshold the calendar-day 90th percentile of the maximum temperature, computed over a 15-day moving window. A heat wave occurs when the threshold is exceeded for at least 3 consecutive days.
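The idea of a calendar-day percentile threshold can be illustrated with a small, library-free sketch. This is not hotspell's implementation. The function names, the nearest-rank percentile definition, the wrap-around at the year boundary and the 365-day year are all assumptions made for illustration.

```python
import math


def nearest_rank_percentile(values, pct):
    """Nearest-rank percentile: smallest value with at least pct% of the data at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def calendar_day_thresholds(tmax_by_year, pct=90, window=15):
    """Return one threshold per calendar day.

    tmax_by_year is a list of per-year lists with one value per calendar day.
    For each day, values from a centred window are pooled across all years.
    """
    n_days = len(tmax_by_year[0])
    half = window // 2
    thresholds = []
    for day in range(n_days):
        pooled = [
            year[(day + offset) % n_days]  # wrap around the year boundary
            for year in tmax_by_year
            for offset in range(-half, half + 1)
        ]
        thresholds.append(nearest_rank_percentile(pooled, pct))
    return thresholds


# Synthetic base period: 3 "years" of 365 days where tmax equals the day number
base = [[float(d) for d in range(365)] for _ in range(3)]
thr = calendar_day_thresholds(base)
print(thr[100])
```

With a 15-day window and 3 years, each threshold is drawn from 45 pooled values. Over a 30-year base period the pool grows to 450 values per calendar day, which is what makes the percentile estimate stable.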
For the complete list of available heat wave indices, see the hotspell documentation.
Find heat wave events and compute annual metrics
Using our heat wave index and the output CSV from the first part of the tutorial we will find the heat wave events for the Thissio station.
[6]:
hw = hotspell.get_heatwaves(filename=processed_data, hw_index=ctx90pct)
Above we used the default arguments for the parameters of get_heatwaves:
the base period, used to calculate the percentile values, was 1961 to 1990
we restricted the analysis to the months June to August
we chose to compute the annual metrics and to export our results to CSV files
For a list of available choices, see the documentation for hotspell.get_heatwaves.
Let’s examine our results.
The hw.events attribute is a DataFrame that contains the dates of detected heat wave events, as well as their basic characteristics (duration and temperature statistics).
[7]:
hw.events.head()
[7]:
index | begin_date | end_date | duration | avg_tmax | std_tmax | max_tmax |
---|---|---|---|---|---|---|
1901-08-01 | 1901-08-01 | 1901-08-03 | 3 | 37.9 | 0.8 | 38.8 |
1902-07-22 | 1902-07-22 | 1902-07-24 | 3 | 38.2 | 1.8 | 40.3 |
1903-08-14 | 1903-08-14 | 1903-08-16 | 3 | 36.0 | 0.2 | 36.2 |
1904-08-09 | 1904-08-09 | 1904-08-12 | 4 | 36.5 | 0.3 | 36.8 |
1905-08-26 | 1905-08-26 | 1905-08-31 | 6 | 36.5 | 0.9 | 38.0 |
[8]:
hw.events.describe()
[8]:
duration | avg_tmax | std_tmax | max_tmax | |
---|---|---|---|---|
count | 198.000000 | 198.000000 | 198.000000 | 198.000000 |
mean | 4.717172 | 36.745455 | 1.121212 | 38.145455 |
std | 2.172932 | 1.538327 | 0.637774 | 2.020794 |
min | 3.000000 | 32.500000 | 0.100000 | 32.800000 |
25% | 3.000000 | 36.000000 | 0.700000 | 36.900000 |
50% | 4.000000 | 36.950000 | 1.000000 | 38.000000 |
75% | 6.000000 | 37.600000 | 1.400000 | 39.200000 |
max | 13.000000 | 41.000000 | 3.600000 | 44.800000 |
From the above we see that the CTX90PCT index resulted in 198 heat wave events between 1901 and 2020, with a mean duration of about 4.7 days and a mean maximum temperature of about 36.7 °C.
The hw.metrics attribute is a DataFrame with the annual heat wave properties.
hwn: number of events
hwf: number of days
hwd: duration of longest event
hwdm: mean duration of events
hwm: mean normalized magnitude
hwma: mean absolute magnitude
hwa: normalized magnitude of hottest day
hwaa: absolute magnitude of hottest day
For a more detailed description of the heat wave metrics, see the documentation.
[9]:
hw.metrics.head()
[9]:
year | hwn | hwf | hwd | hwdm | hwm | hwma | hwa | hwaa |
---|---|---|---|---|---|---|---|---|
1901 | 1 | 3 | 3.0 | 3.0 | 7.2 | 38.8 | 7.2 | 38.8 |
1902 | 1 | 3 | 3.0 | 3.0 | 8.7 | 40.3 | 8.7 | 40.3 |
1903 | 1 | 3 | 3.0 | 3.0 | 4.6 | 36.2 | 4.6 | 36.2 |
1904 | 1 | 4 | 4.0 | 4.0 | 5.2 | 36.8 | 5.2 | 36.8 |
1905 | 1 | 6 | 6.0 | 6.0 | 6.4 | 38.0 | 6.4 | 38.0 |
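The four duration-based metrics (hwn, hwf, hwd, hwdm) can be cross-checked against the events table with plain pandas. The sketch below rebuilds the five events shown in hw.events.head() and aggregates them by year. It assumes that each event is counted in the year of its begin_date, which may not match how hotspell handles every case. The magnitude metrics are left out because their computation is not described here.

```python
import pandas as pd

# The five events shown in hw.events.head() above
events = pd.DataFrame(
    {
        "begin_date": pd.to_datetime(
            ["1901-08-01", "1902-07-22", "1903-08-14", "1904-08-09", "1905-08-26"]
        ),
        "duration": [3, 3, 3, 4, 6],
    }
)

# Assumption: each event is counted in the year of its begin_date
by_year = events.groupby(events["begin_date"].dt.year)["duration"]
summary = pd.DataFrame(
    {
        "hwn": by_year.count(),  # number of events
        "hwf": by_year.sum(),    # total number of heat wave days
        "hwd": by_year.max(),    # duration of the longest event
        "hwdm": by_year.mean(),  # mean duration of events
    }
)
summary.index.name = "year"
print(summary)
```

For these five years the result agrees with the first four columns of the hw.metrics table above.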
Detect heat waves using a custom index
Let’s repeat the procedure with a custom index, which we will call extreme, that aims to capture only the most severe cases of heat. Under this index we define a heat wave as a period of at least 4 consecutive days with maximum temperatures above 40 °C.
[10]:
extreme = hotspell.index(
name="extreme",
var="tmax",
fixed_thres=40,
min_duration=4
)
hw_extreme = hotspell.get_heatwaves(filename=processed_data, hw_index=extreme)
We see that in this extreme case only two events satisfied the heat wave criteria.
[11]:
hw_extreme.events
[11]:
index | begin_date | end_date | duration | avg_tmax | std_tmax | max_tmax |
---|---|---|---|---|---|---|
1987-07-21 | 1987-07-21 | 1987-07-27 | 7 | 41.6 | 0.7 | 42.8 |
2007-07-22 | 2007-07-22 | 2007-07-25 | 4 | 41.4 | 0.5 | 41.9 |
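The rule behind a fixed-threshold index such as extreme (at least 4 consecutive days above 40 °C) can be sketched in a few lines of plain Python. The find_spells function is hypothetical and is not part of hotspell. It assumes a gap-free daily series and a strict "greater than" comparison, and the temperatures are made up.

```python
def find_spells(tmax, threshold, min_duration):
    """Return inclusive (start, end) index pairs of runs where
    tmax > threshold for at least min_duration consecutive days."""
    spells = []
    start = None  # index where the current run began, if any
    for i, value in enumerate(tmax):
        if value > threshold:
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_duration:
                spells.append((start, i - 1))
            start = None
    # Close a run that extends to the end of the series
    if start is not None and len(tmax) - start >= min_duration:
        spells.append((start, len(tmax) - 1))
    return spells


# Made-up daily maxima (°C), not taken from the Thissio record
tmax = [39.5, 40.2, 41.0, 40.8, 40.1, 38.0, 40.5, 40.6, 39.0, 41.0, 41.5, 42.0, 40.3]
print(find_spells(tmax, threshold=40, min_duration=4))  # [(1, 4), (9, 12)]
```

The two-day run at indices 6 and 7 exceeds the threshold but is too short, so it is not reported. The duration requirement filters out brief hot periods in the same way.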