Tutorial

Preprocessing of daily observations

First, we need to obtain some daily weather observations that we will use to detect heat wave events.

We will use a CSV file with daily historical climatic data (1901-2020) from the Thissio station (Athens, Greece) of the National Observatory of Athens (Founda, 2011; Founda et al., 2013). The dataset is public under CC-BY-SA 4.0.

After downloading our file, we should inspect its data and preprocess it so that it can be used with hotspell.

[1]:

import pandas as pd

raw_data = "/my_path/hcd_noa.csv"  # Replace with your path

df = pd.read_csv(raw_data)
df.head()

[1]:

	YEAR	MONTH	DAY	Tmax (oC)	Tmin (oC)	RH (%)	Rain (mm)
0	1901	1	1	14.3	5.9	67.0	0.2
1	1901	1	2	15.0	8.9	80.0	5.8
2	1901	1	3	10.3	4.8	76.0	5.0
3	1901	1	4	7.1	4.8	78.0	3.2
4	1901	1	5	10.4	5.2	83.0	6.6

As we can see, the dataset includes 7 columns.

The first 3 columns correspond to the year, month and day of the observations, columns 4 and 5 to the daily maximum (Tmax) and daily minimum (Tmin) air temperature (in °C), column 6 to relative humidity (RH, expressed as a percentage) and the last column to precipitation (Rain, in mm).
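Before going further, it can help to check that the date columns describe a clean daily sequence. The following is a minimal sketch on a small synthetic frame (the `sample` data is made up for illustration and is not part of the station file); it looks for repeated and missing days.

```python
import pandas as pd

# Small synthetic frame with the same date columns as the station file
sample = pd.DataFrame(
    {
        "YEAR": [1901, 1901, 1901, 1901],
        "MONTH": [1, 1, 1, 1],
        "DAY": [1, 2, 4, 4],  # 3 January is missing, 4 January is repeated
    }
)

# Assemble a datetime Series from the three date columns
parts = sample[["YEAR", "MONTH", "DAY"]].rename(columns=str.lower)
dates = pd.to_datetime(parts)

# Days that appear more than once
duplicated_days = dates[dates.duplicated()]

# Days absent between the first and the last observation
full_range = pd.date_range(dates.min(), dates.max(), freq="D")
missing_days = full_range.difference(pd.DatetimeIndex(dates))

print(len(duplicated_days), len(missing_days))
```

The same steps should apply to the real `df` by replacing `sample` with it.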

We need to drop humidity and rain and rearrange Tmax and Tmin:

[2]:

df = df[["YEAR", "MONTH", "DAY", "Tmin (oC)", "Tmax (oC)"]]
df.head()

[2]:

	YEAR	MONTH	DAY	Tmin (oC)	Tmax (oC)
0	1901	1	1	5.9	14.3
1	1901	1	2	8.9	15.0
2	1901	1	3	4.8	10.3
3	1901	1	4	4.8	7.1
4	1901	1	5	5.2	10.4

Suppose that the time series contains no-data values that have been encoded as -9999.

We should delete these measurements:

[3]:

nodata_value = -9999
df = df.loc[
    (df["Tmin (oC)"] != nodata_value) & (df["Tmax (oC)"] != nodata_value)
]
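An alternative way to express the same filtering is to convert the sentinel to NaN first and then drop the incomplete rows. This is only a sketch on synthetic values (the `sample` frame is hypothetical); it should give the same rows as the boolean mask above.

```python
import numpy as np
import pandas as pd

nodata_value = -9999

# Synthetic temperatures with one missing Tmin and one missing Tmax
sample = pd.DataFrame(
    {
        "Tmin (oC)": [5.9, nodata_value, 4.8],
        "Tmax (oC)": [14.3, 15.0, nodata_value],
    }
)

# Turn the sentinel into NaN, then drop rows where either temperature is missing
cleaned = sample.replace(nodata_value, np.nan).dropna(
    subset=["Tmin (oC)", "Tmax (oC)"]
)

print(cleaned)
```

Working with NaN has the side benefit that genuinely empty cells in the CSV are removed in the same step.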

Now, we are ready to save our processed file; we must not write out the header and the index of the DataFrame.

[4]:

processed_data = "/my_path/hcd_noa_processed.csv"  # Replace with your path

df.to_csv(processed_data, header=False, index=False)
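To see what a file written with `header=False` and `index=False` looks like, we can write a tiny synthetic frame to an in-memory buffer instead of the disk and read it back. This is an illustrative check only; the `sample` values are made up.

```python
import io

import pandas as pd

sample = pd.DataFrame(
    {
        "YEAR": [1901, 1901],
        "MONTH": [1, 1],
        "DAY": [1, 2],
        "Tmin (oC)": [5.9, 8.9],
        "Tmax (oC)": [14.3, 15.0],
    }
)

# Write without header and index, exactly as in the cell above
buffer = io.StringIO()
sample.to_csv(buffer, header=False, index=False)
first_line = buffer.getvalue().splitlines()[0]

# Read the text back, telling pandas that there is no header row
buffer.seek(0)
check = pd.read_csv(buffer, header=None)

print(first_line)
print(check.shape)
```

The first line holds data rather than column names, and the frame read back has five unnamed columns: year, month, day, Tmin and Tmax.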

Detect heat waves

We have finished the preprocessing of our data and we are now ready to use hotspell.

Create a heat wave index

First we must initialize the heat wave index we want to use. For this first example we are going to use the index CTX90PCT.

[5]:

import hotspell

index_name = "ctx90pct"
ctx90pct = hotspell.index(name=index_name)

This index uses as a threshold the calendar day 90th percentile value of the maximum temperature based on a 15-day moving window. A heat wave occurs when the threshold is exceeded for at least 3 consecutive days.
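To make the idea of a calendar-day percentile threshold more concrete, here is a rough sketch written with pandas and NumPy. It is not hotspell's implementation: the function `calendar_day_threshold`, the synthetic temperatures and the simplified treatment of leap years (day of year on a 366-day circle) are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def calendar_day_threshold(tmax, percentile=90, window=15):
    """Percentile of tmax for each day of year, pooled over a centered window.

    tmax: Series of daily maximum temperature indexed by date.
    """
    half = window // 2
    doy = tmax.index.dayofyear.to_numpy()
    values = tmax.to_numpy()
    thresholds = {}
    for day in range(1, 367):
        # Circular distance in days between each observation and the target day
        diff = np.abs(doy - day)
        diff = np.minimum(diff, 366 - diff)
        # Pool all base-period values that fall inside the window
        pooled = values[diff <= half]
        thresholds[day] = np.nanpercentile(pooled, percentile)
    return pd.Series(thresholds, name="threshold")


# Synthetic base period: a seasonal cycle plus random noise
dates = pd.date_range("1961-01-01", "1990-12-31", freq="D")
rng = np.random.default_rng(0)
seasonal = 25 + 10 * np.sin(2 * np.pi * (dates.dayofyear.to_numpy() - 105) / 365.25)
tmax = pd.Series(seasonal + rng.normal(0, 2, len(dates)), index=dates)

thresholds = calendar_day_threshold(tmax)
print(thresholds.loc[[15, 200]])
```

Each day of the year gets its own threshold, so a hot day in June is compared with other days around the same date rather than with the whole summer.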

For the complete list of available heat wave indices, see the hotspell documentation.

Find heat wave events and compute annual metrics

Using our heat wave index and the output CSV from the first part of the tutorial we will find the heat wave events for the Thissio station.

[6]:

hw = hotspell.get_heatwaves(filename=processed_data, hw_index=ctx90pct)

Above we used the default arguments for the parameters of get_heatwaves:

the base period, used to calculate the percentile values, was 1961 to 1990
only the months June to August were considered
the annual metrics were computed and the results were exported to CSV files

For a list of available choices see the documentation for hotspell.get_heatwaves.

Let’s examine our results.

The hw.events attribute is a DataFrame that contains the dates of detected heat wave events, as well as their basic characteristics (duration and temperature statistics).

[7]:

hw.events.head()

[7]:

	begin_date	end_date	duration	avg_tmax	std_tmax	max_tmax
index
1901-08-01	1901-08-01	1901-08-03	3	37.9	0.8	38.8
1902-07-22	1902-07-22	1902-07-24	3	38.2	1.8	40.3
1903-08-14	1903-08-14	1903-08-16	3	36.0	0.2	36.2
1904-08-09	1904-08-09	1904-08-12	4	36.5	0.3	36.8
1905-08-26	1905-08-26	1905-08-31	6	36.5	0.9	38.0

[8]:

hw.events.describe()

[8]:

	duration	avg_tmax	std_tmax	max_tmax
count	198.000000	198.000000	198.000000	198.000000
mean	4.717172	36.745455	1.121212	38.145455
std	2.172932	1.538327	0.637774	2.020794
min	3.000000	32.500000	0.100000	32.800000
25%	3.000000	36.000000	0.700000	36.900000
50%	4.000000	36.950000	1.000000	38.000000
75%	6.000000	37.600000	1.400000	39.200000
max	13.000000	41.000000	3.600000	44.800000

From the above we see that the CTX90PCT index resulted in 198 heat wave events between 1901 and 2020, with an average duration of nearly 5 days and an average daily maximum temperature of 36.7 °C.

The hw.metrics attribute is a DataFrame with the annual heat wave properties:

hwn: number of events
hwf: number of days
hwd: duration of longest event
hwdm: mean duration of events
hwm: mean normalized magnitude
hwma: mean absolute magnitude
hwa: normalized magnitude of hottest day
hwaa: absolute magnitude of hottest day

For a more detailed description of the heat wave metrics see the documentation.

[9]:

hw.metrics.head()

[9]:

	hwn	hwf	hwd	hwdm	hwm	hwma	hwa	hwaa
year
1901	1	3	3.0	3.0	7.2	38.8	7.2	38.8
1902	1	3	3.0	3.0	8.7	40.3	8.7	40.3
1903	1	3	3.0	3.0	4.6	36.2	4.6	36.2
1904	1	4	4.0	4.0	5.2	36.8	5.2	36.8
1905	1	6	6.0	6.0	6.4	38.0	6.4	38.0
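The duration-based metrics (hwn, hwf, hwd and hwdm) follow directly from the events table. As a rough sketch, and not hotspell's own code, they can be reproduced with a groupby on a synthetic `events` frame. We assume here that each event is assigned to the year of its begin date and that years without events are simply absent; hotspell may treat both cases differently.

```python
import pandas as pd

# Synthetic events: one in 1901 and two in 1902
events = pd.DataFrame(
    {
        "begin_date": pd.to_datetime(["1901-08-01", "1902-07-01", "1902-07-22"]),
        "duration": [3, 4, 6],
    }
)

annual = (
    events.groupby(events["begin_date"].dt.year)["duration"]
    .agg(
        hwn="count",  # number of events
        hwf="sum",    # number of heat wave days
        hwd="max",    # duration of the longest event
        hwdm="mean",  # mean duration of events
    )
    .rename_axis("year")
)

print(annual)
```

The magnitude metrics (hwm, hwma, hwa, hwaa) also need the daily temperatures and the threshold, so they are left out of this sketch.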

Detect heat waves using a custom index

Let’s repeat the procedure with a custom index, which we will call extreme, that aims to capture only the most severe cases of heat. We can define a heat wave under this index as a period of at least 4 consecutive days with maximum temperatures above 40 °C.

[10]:

extreme = hotspell.index(
    name="extreme",
    var="tmax",
    fixed_thres=40,
    min_duration=4
)

hw_extreme = hotspell.get_heatwaves(filename=processed_data, hw_index=extreme)

We see that in this extreme case only two events satisfied the heat wave criteria.

[11]:

hw_extreme.events

[11]:

	begin_date	end_date	duration	avg_tmax	std_tmax	max_tmax
index
1987-07-21	1987-07-21	1987-07-27	7	41.6	0.7	42.8
2007-07-22	2007-07-22	2007-07-25	4	41.4	0.5	41.9
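The rule behind a fixed-threshold index can be illustrated without hotspell. The sketch below is an assumption-laden illustration, not the library's algorithm: the helper `find_hot_spells` is hypothetical, the temperatures are synthetic, and the series is assumed to have one value per day with no gaps in the dates.

```python
import pandas as pd


def find_hot_spells(tmax, threshold=40.0, min_duration=4):
    """Return runs of consecutive days with tmax strictly above threshold."""
    hot = tmax > threshold
    # A new run starts whenever the hot / not-hot state changes
    run_id = (hot != hot.shift()).cumsum()
    spells = []
    for _, run in tmax[hot].groupby(run_id[hot]):
        if len(run) >= min_duration:
            spells.append(
                {
                    "begin_date": run.index[0],
                    "end_date": run.index[-1],
                    "duration": len(run),
                    "max_tmax": run.max(),
                }
            )
    return pd.DataFrame(spells)


# Synthetic series: a 5-day run above 40 °C followed by a 2-day run
dates = pd.date_range("2000-07-01", periods=9, freq="D")
tmax = pd.Series(
    [39.0, 40.5, 41.0, 41.9, 41.2, 40.1, 39.5, 40.3, 40.2], index=dates
)

spells = find_hot_spells(tmax)
print(spells)
```

Only the first run is kept, because the second one lasts 2 days and falls short of the 4-day minimum.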