Tutorial

Preprocessing of daily observations

First, we need to obtain some daily weather observations that we will use to detect heat wave events.

We will use a CSV file with daily historical climatic data (1901–2020) from the Thissio station (Athens, Greece) of the National Observatory of Athens (Founda, 2011; Founda et al., 2013). The dataset is publicly available under the CC-BY-SA 4.0 license.

After downloading our file, we should inspect its data and preprocess it so that it can be used with hotspell.

[1]:
import pandas as pd

raw_data = "/my_path/hcd_noa.csv"  # Replace with your path

df = pd.read_csv(raw_data)
df.head()
[1]:
   YEAR  MONTH  DAY  Tmax (oC)  Tmin (oC)  RH (%)  Rain (mm)
0  1901      1    1       14.3        5.9    67.0        0.2
1  1901      1    2       15.0        8.9    80.0        5.8
2  1901      1    3       10.3        4.8    76.0        5.0
3  1901      1    4        7.1        4.8    78.0        3.2
4  1901      1    5       10.4        5.2    83.0        6.6

As we can see, the dataset includes 7 columns.

The first 3 columns correspond to the year, month and day of each observation, columns 4 and 5 to the daily maximum (Tmax) and daily minimum (Tmin) air temperature (in °C), column 6 to relative humidity (RH, in percent) and the last column to precipitation (Rain, in mm).

We need to drop the humidity and rain columns and swap Tmax and Tmin, so that the columns are ordered as year, month, day, Tmin, Tmax:

[2]:
df = df[["YEAR", "MONTH", "DAY", "Tmin (oC)", "Tmax (oC)"]]
df.head()
[2]:
   YEAR  MONTH  DAY  Tmin (oC)  Tmax (oC)
0  1901      1    1        5.9       14.3
1  1901      1    2        8.9       15.0
2  1901      1    3        4.8       10.3
3  1901      1    4        4.8        7.1
4  1901      1    5        5.2       10.4

Suppose that the time series contains missing values (nodata) encoded as -9999.

We should remove these rows:

[3]:
nodata_value = -9999
df = df.loc[
    (df["Tmin (oC)"] != nodata_value) & (df["Tmax (oC)"] != nodata_value)
]
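One detail worth knowing: the != filter above keeps rows whose Tmin or Tmax field was simply empty in the CSV, because pandas reads an empty field as NaN and NaN != -9999 evaluates to True. A minimal alternative, sketched here on a small made-up DataFrame rather than the real file, is to convert the sentinel to NaN first and then drop every row that lacks either temperature.

```python
import numpy as np
import pandas as pd

# Made-up rows using the tutorial's column names; both -9999 and NaN mark gaps
df = pd.DataFrame(
    {
        "YEAR": [1901, 1901, 1901, 1901],
        "MONTH": [1, 1, 1, 1],
        "DAY": [1, 2, 3, 4],
        "Tmin (oC)": [5.9, -9999.0, 4.8, np.nan],
        "Tmax (oC)": [14.3, 15.0, -9999.0, 7.1],
    }
)

nodata_value = -9999
temp_cols = ["Tmin (oC)", "Tmax (oC)"]

cleaned = df.copy()
# Turn the sentinel into NaN so that a single dropna call handles both kinds of gap
cleaned[temp_cols] = cleaned[temp_cols].replace(nodata_value, np.nan)
cleaned = cleaned.dropna(subset=temp_cols)
print(cleaned)  # only the row for DAY 1 remains
```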

Now we are ready to save the processed file. Neither the header nor the index of the DataFrame should be written out:

[4]:
processed_data = "/my_path/hcd_noa_processed.csv"  # Replace with your path

df.to_csv(processed_data, header=False, index=False)
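To check that the written file has the intended layout (five unnamed columns, no header row, no index column), one can read it back with header=None. The sketch below writes to an in-memory buffer instead of /my_path/... so that it runs anywhere; the lowercase names passed to names= are our own labels for the check, not something hotspell requires.

```python
import io

import pandas as pd

# Two rows in the processed column order: year, month, day, Tmin, Tmax
df = pd.DataFrame(
    {
        "YEAR": [1901, 1901],
        "MONTH": [1, 1],
        "DAY": [1, 2],
        "Tmin (oC)": [5.9, 8.9],
        "Tmax (oC)": [14.3, 15.0],
    }
)

# Same call as in the tutorial, but targeting a string buffer instead of a path
buffer = io.StringIO()
df.to_csv(buffer, header=False, index=False)
text = buffer.getvalue()
print(text)

# header=None stops pandas from treating the first data row as column names
check = pd.read_csv(io.StringIO(text), header=None,
                    names=["year", "month", "day", "tmin", "tmax"])
print(check.shape)
```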

Detect heat waves

We have finished the preprocessing of our data and we are now ready to use hotspell.

Create a heat wave index

First, we must initialize the heat wave index we want to use. For this first example, we will use the CTX90PCT index.

[5]:
import hotspell

index_name = "ctx90pct"
ctx90pct = hotspell.index(name=index_name)

This index uses as its threshold the calendar-day 90th percentile of the daily maximum temperature, computed over a 15-day moving window. A heat wave occurs when this threshold is exceeded on at least 3 consecutive days.

For the complete list of available heat wave indices, see the hotspell documentation.
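The calendar-day percentile threshold can be sketched with plain pandas to show what the index computes. This is our own illustration, not hotspell's code: the function names calendar_day_percentile and exceedance_days are hypothetical, the data are random, and details such as the handling of 29 February, the window edges and the percentile interpolation method may differ from hotspell's implementation.

```python
import numpy as np
import pandas as pd


def calendar_day_percentile(tmax, base_years=(1961, 1990), window=15, q=90):
    """Sketch: for every calendar day (month, day), pool the Tmax values from a
    `window`-day moving window centered on that day in each base-period year,
    then take the q-th percentile of the pooled values."""
    half = window // 2
    start, end = base_years
    base = tmax[(tmax.index.year >= start) & (tmax.index.year <= end)]
    # shift(k, freq="D") relabels each value with the date k days later, so the
    # values stacked under date d are the observations from d - half to d + half
    pooled = pd.concat([base.shift(k, freq="D") for k in range(-half, half + 1)])
    # Keep only window centers that fall inside the base period
    pooled = pooled[(pooled.index.year >= start) & (pooled.index.year <= end)]
    threshold = pooled.groupby([pooled.index.month, pooled.index.day]).quantile(q / 100)
    return threshold.rename_axis(["month", "day"])


def exceedance_days(tmax, threshold):
    """Boolean array: True where Tmax is above its calendar-day threshold."""
    keys = pd.MultiIndex.from_arrays([tmax.index.month, tmax.index.day])
    return tmax.to_numpy() > threshold.reindex(keys).to_numpy()


# Synthetic data (random values, not the Thissio record) with a short base period
rng = np.random.default_rng(0)
dates = pd.date_range("1960-01-01", "1966-12-31", freq="D")
tmax = pd.Series(rng.normal(30.0, 3.0, len(dates)), index=dates)

threshold = calendar_day_percentile(tmax, base_years=(1961, 1965))
hot_days = exceedance_days(tmax, threshold)
print(threshold.loc[(7, 15)], hot_days.mean())
```

Detecting the heat wave events themselves then amounts to finding runs of at least 3 consecutive True values in the exceedance array.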

Find heat wave events and compute annual metrics

Using our heat wave index and the processed CSV file from the first part of the tutorial, we will find the heat wave events for the Thissio station.

[6]:
hw = hotspell.get_heatwaves(filename=processed_data, hw_index=ctx90pct)

Above, we used the default values for the other parameters of get_heatwaves:

  • the base period used to calculate the percentile values was 1961 to 1990

  • the analysis was limited to the months June to August

  • the annual metrics were computed and the results were exported to CSV files

For a list of the available options, see the documentation for hotspell.get_heatwaves.

Let’s examine our results.

The hw.events attribute is a DataFrame that contains the dates of detected heat wave events, as well as their basic characteristics (duration and temperature statistics).

[7]:
hw.events.head()
[7]:
            begin_date    end_date  duration  avg_tmax  std_tmax  max_tmax
index
1901-08-01  1901-08-01  1901-08-03         3      37.9       0.8      38.8
1902-07-22  1902-07-22  1902-07-24         3      38.2       1.8      40.3
1903-08-14  1903-08-14  1903-08-16         3      36.0       0.2      36.2
1904-08-09  1904-08-09  1904-08-12         4      36.5       0.3      36.8
1905-08-26  1905-08-26  1905-08-31         6      36.5       0.9      38.0
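The per-event columns can be illustrated by slicing a daily Tmax series between an event's begin and end dates (both inclusive) and summarizing the slice. The daily values and dates below are made up, the helper name event_stats is hypothetical, and the choice of the population standard deviation (ddof=0) is an assumption; hotspell may use a different convention.

```python
import pandas as pd

# Made-up daily maximum temperatures (°C) around a hypothetical 3-day event
tmax = pd.Series(
    [33.0, 36.0, 37.0, 38.0, 32.0],
    index=pd.date_range("2000-07-01", periods=5, freq="D"),
)


def event_stats(tmax, begin, end):
    """Summarize one event from its daily Tmax values (inclusive date range)."""
    days = tmax.loc[begin:end]
    return {
        "duration": len(days),
        "avg_tmax": float(round(days.mean(), 1)),
        "std_tmax": float(round(days.std(ddof=0), 1)),  # population std (assumed)
        "max_tmax": float(days.max()),
    }


print(event_stats(tmax, "2000-07-02", "2000-07-04"))
# {'duration': 3, 'avg_tmax': 37.0, 'std_tmax': 0.8, 'max_tmax': 38.0}
```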
[8]:
hw.events.describe()
[8]:
         duration    avg_tmax    std_tmax    max_tmax
count  198.000000  198.000000  198.000000  198.000000
mean     4.717172   36.745455    1.121212   38.145455
std      2.172932    1.538327    0.637774    2.020794
min      3.000000   32.500000    0.100000   32.800000
25%      3.000000   36.000000    0.700000   36.900000
50%      4.000000   36.950000    1.000000   38.000000
75%      6.000000   37.600000    1.400000   39.200000
max     13.000000   41.000000    3.600000   44.800000

From the above, we see that the CTX90PCT index detected 198 heat wave events between 1901 and 2020, with a mean duration of about 4.7 days and a mean event-averaged maximum temperature (avg_tmax) of about 36.7 °C.

The hw.metrics attribute is a DataFrame with the annual heat wave metrics:

  • hwn: number of events

  • hwf: number of days

  • hwd: duration of longest event

  • hwdm: mean duration of events

  • hwm: mean normalized magnitude

  • hwma: mean absolute magnitude

  • hwa: normalized magnitude of hottest day

  • hwaa: absolute magnitude of hottest day

For a more detailed description of the heat wave metrics, see the documentation.

[9]:
hw.metrics.head()
[9]:
      hwn  hwf  hwd  hwdm  hwm  hwma  hwa  hwaa
year
1901    1    3  3.0   3.0  7.2  38.8  7.2  38.8
1902    1    3  3.0   3.0  8.7  40.3  8.7  40.3
1903    1    3  3.0   3.0  4.6  36.2  4.6  36.2
1904    1    4  4.0   4.0  5.2  36.8  5.2  36.8
1905    1    6  6.0   6.0  6.4  38.0  6.4  38.0
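The duration-based metrics (hwn, hwf, hwd, hwdm) follow directly from the events table by grouping on year, which the sketch below reproduces from the five events shown in hw.events.head(). Two assumptions are ours: each event is assigned to the year of its begin_date, and hwaa is taken as the highest max_tmax among that year's events (consistent with the output above, where every year has a single event). The normalized metrics (hwm, hwa) and hwma are left out, since their exact definitions are given in the hotspell documentation.

```python
import pandas as pd

# Event rows copied from the hw.events.head() output above
events = pd.DataFrame(
    {
        "begin_date": pd.to_datetime(
            ["1901-08-01", "1902-07-22", "1903-08-14", "1904-08-09", "1905-08-26"]
        ),
        "duration": [3, 3, 3, 4, 6],
        "max_tmax": [38.8, 40.3, 36.2, 36.8, 38.0],
    }
)

# Assumption: an event belongs to the year in which it begins
by_year = events.groupby(events["begin_date"].dt.year)
metrics = pd.DataFrame(
    {
        "hwn": by_year.size(),               # number of events
        "hwf": by_year["duration"].sum(),    # number of heat wave days
        "hwd": by_year["duration"].max(),    # duration of longest event
        "hwdm": by_year["duration"].mean(),  # mean duration of events
        "hwaa": by_year["max_tmax"].max(),   # hottest day (assumed definition)
    }
)
metrics.index.name = "year"
print(metrics)
```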

Detect heat waves using a custom index

Let’s repeat the procedure with a custom index, which we will call extreme, that aims to capture only the most severe heat. Under this index, we define a heat wave as a period of at least 4 consecutive days with maximum temperatures above 40 °C.

[10]:
extreme = hotspell.index(
    name="extreme",
    var="tmax",
    fixed_thres=40,
    min_duration=4
)

hw_extreme = hotspell.get_heatwaves(filename=processed_data, hw_index=extreme)

We see that in this extreme case only two events satisfy the heat wave criteria:

[11]:
hw_extreme.events
[11]:
            begin_date    end_date  duration  avg_tmax  std_tmax  max_tmax
index
1987-07-21  1987-07-21  1987-07-27         7      41.6       0.7      42.8
2007-07-22  2007-07-22  2007-07-25         4      41.4       0.5      41.9
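For intuition, the criterion behind the extreme index (at least 4 consecutive days above 40 °C) can be sketched as a run-length search over a daily Tmax series. This is a hypothetical helper, not hotspell's API: it assumes a strict "above" comparison (hotspell may treat a value equal to the threshold differently), and it reindexes to a complete daily calendar so that days removed during preprocessing break a run instead of silently joining two separate periods.

```python
import pandas as pd


def find_runs(tmax, threshold=40.0, min_duration=4):
    """Return (begin, end, duration) for each run of at least `min_duration`
    consecutive days with tmax strictly above `threshold`."""
    # Reindex to a complete daily calendar so that missing days (NaN) break runs
    tmax = tmax.asfreq("D")
    hot = tmax > threshold  # NaN compares as False
    # The run id increases every time the hot/not-hot state flips
    run_id = (hot != hot.shift(fill_value=False)).cumsum()
    events = []
    for _, run in tmax[hot].groupby(run_id[hot]):
        if len(run) >= min_duration:
            events.append((run.index[0].date(), run.index[-1].date(), len(run)))
    return events


# Made-up daily maxima for 12 days in July 2000
tmax = pd.Series(
    [38.0, 41.0, 42.0, 41.5, 40.5, 39.0, 41.0, 41.0, 41.0, 39.0, 41.0, 42.0],
    index=pd.date_range("2000-07-01", periods=12, freq="D"),
)
events = find_runs(tmax)
print(events)  # one event: 2 to 5 July 2000, 4 days
```

Only the first hot spell qualifies; the later runs of 3 and 2 days are shorter than min_duration.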