This section presents the dataset used in the analysis. It describes the trial location, the general and blight-related weather conditions, and the disease outbreak data.


The packages needed for data preparation are loaded. Any that are not installed locally are installed first.
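The package-loading chunk itself is not shown. A minimal sketch of the install-if-missing pattern, using a hypothetical helper `ensure_packages()` (not part of any package), could look like this:

```r
# ensure_packages() is a hypothetical helper: it installs any package in
# `pkgs` that is not available locally, then attaches all of them.
ensure_packages <- function(pkgs) {
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  if (any(!available)) {
    install.packages(pkgs[!available])
  }
  for (p in pkgs) {
    library(p, character.only = TRUE)
  }
  invisible(pkgs)
}

# Example call; this package list is an assumption based on the functions
# used later in this section (dplyr verbs, here::here, kable, kableExtra).
# ensure_packages(c("dplyr", "here", "knitr", "kableExtra"))
```

`requireNamespace()` checks availability without attaching, so only the missing packages are passed to `install.packages()`.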

Primary disease outbreak data were acquired from the field trial records of the Teagasc breeding program at Oak Park, Carlow, Ireland.

Bio Data

The planting date and the date of the first disease observation are loaded.
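The bio data file is not reproduced here. As a sketch, assuming one row per year with hypothetical columns `year`, `planting` and `first_obs` holding ISO dates (for example as read with `read.csv()`), the dates can be parsed and the interval from planting to the first disease observation computed:

```r
# Column names (year, planting, first_obs) are hypothetical placeholders.
prepare_bio <- function(bio) {
  bio$planting  <- as.Date(bio$planting)
  bio$first_obs <- as.Date(bio$first_obs)
  # Days from planting to the first observed blight symptoms
  bio$days_to_outbreak <- as.integer(bio$first_obs - bio$planting)
  bio
}
```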

Weather Data

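Historical hourly weather data from the Met Éireann synoptic weather station at Oak Park was used for model evaluation. In all years, the trial sites were within 500 m of the station. In the first rows shown below, rain is in mm; temp, wetb (wet-bulb temperature) and dewpt (dew point) are in °C; vappr (vapour pressure) and msl (mean sea-level pressure) are in hPa; rhum (relative humidity) is in %; wdsp (wind speed) is in knots; and wddir (wind direction) is in degrees. The i_ columns are Met Éireann indicator fields.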

date short_date year month day i_rain rain i_temp temp i_wetb wetb dewpt vappr i_rhum rhum msl i_wdsp wdsp i_wddir wddir
2007-01-01 00:00:00 2007-01-01 2007 1 1 0 0.2 0 4.8 0 4.1 3.0 7.6 0 87 1006.9 2 11 2 250
2007-01-01 01:00:00 2007-01-01 2007 1 1 0 0.0 0 5.4 0 4.4 3.0 7.6 0 84 1007.6 2 7 2 240
2007-01-01 02:00:00 2007-01-01 2007 1 1 0 0.0 0 5.8 0 4.7 3.2 7.7 0 83 1008.4 2 6 2 250
2007-01-01 03:00:00 2007-01-01 2007 1 1 0 0.1 0 5.2 0 4.3 2.8 7.5 0 84 1009.7 2 15 2 260
2007-01-01 04:00:00 2007-01-01 2007 1 1 0 0.0 0 5.6 0 4.2 2.3 7.2 0 79 1010.9 2 15 2 260

Additional variables needed for the analysis are created.
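A minimal base-R sketch of deriving the calendar columns seen in the table above (`short_date`, `year`, `month`, `day`) from the hourly `date` timestamp. Whether the original chunk adds further variables is not shown, and treating the timestamps as UTC is an assumption:

```r
add_calendar_fields <- function(df, tz = "UTC") {
  # Parse the timestamp once; tz = "UTC" is an assumption about the raw data
  df$date <- as.POSIXct(df$date, tz = tz)
  df$short_date <- as.Date(df$date, tz = tz)
  df$year  <- as.integer(format(df$date, "%Y"))
  df$month <- as.integer(format(df$date, "%m"))
  df$day   <- as.integer(format(df$date, "%d"))
  df
}
```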

The data are subset to April to September (months 4 to 9), excluding the months not needed for the analysis.

OP <- subset(OP, month > 3 & month < 10)

Missing values

Number of missing hourly values per year for the variables of interest (rain, temperature and relative humidity):

Year NA_rain NA_temp NA_rhum
2007 0 0 0
2008 0 0 0
2009 6 6 6
2010 0 0 0
2011 0 0 0
2012 0 0 0
2013 0 0 0
2014 0 0 1
2015 0 0 0
2016 0 0 0
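The original chunk behind this table is not shown. A base-R sketch that produces a table of the same shape, counting `NA`s per year in `rain`, `temp` and `rhum`:

```r
count_missing <- function(df, vars = c("rain", "temp", "rhum")) {
  # aggregate()'s data-frame method passes NAs through to FUN, so
  # sum(is.na(x)) counts the missing hours in each year
  out <- aggregate(df[vars], by = list(Year = df$year),
                   FUN = function(x) sum(is.na(x)))
  names(out)[-1] <- paste0("NA_", vars)
  out
}
# count_missing(OP)
```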

Imputation with a cubic spline works very well for gaps of up to 8 consecutive missing values in variables with a regular cycle, which here are temperature and relative humidity. After imputation, only the missing rain values remain:

Year NA_rain NA_temp NA_rhum
2007 0 0 0
2008 0 0 0
2009 6 0 0
2010 0 0 0
2011 0 0 0
2012 0 0 0
2013 0 0 0
2014 0 0 0
2015 0 0 0
2016 0 0 0
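The imputation chunk is not shown; the zoo package, for example, provides `na.spline()` with a `maxgap` argument for this purpose. As a dependency-free sketch of the same idea, the function below fits a cubic spline (`stats::splinefun()`, method `"fmm"`) through the observed values and fills only interior gaps of at most `maxgap` consecutive values. It assumes the rows are consecutive hours in time order:

```r
spline_fill <- function(x, maxgap = 8) {
  idx <- seq_along(x)
  ok  <- !is.na(x)
  if (sum(ok) < 2) return(x)  # a spline needs at least two observed points
  # Cubic spline through the observed values
  f <- stats::splinefun(idx[ok], x[ok], method = "fmm")
  # Length of the NA run that each position belongs to
  runs    <- rle(is.na(x))
  run_len <- rep(runs$lengths, runs$lengths)
  # Fill only short gaps lying between observed values (no extrapolation)
  inside <- idx > min(idx[ok]) & idx < max(idx[ok])
  fill   <- !ok & run_len <= maxgap & inside
  x[fill] <- f(idx[fill])
  x
}
# OP$temp <- spline_fill(OP$temp); OP$rhum <- spline_fill(OP$rhum)
```

Gaps longer than `maxgap` are left as `NA`, so they still show up in the missing-value summary.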

Rain is harder to impute, but there are workarounds, especially when only a few values are missing. The model uses rain data only in certain, relatively rare, situations defined within the model, so the same conditions can be used to fill missing values outside that range. Rain is certainly irrelevant when relative humidity is below 88 % and temperature is below 8 °C, so missing values in those hours can be replaced with 0. Any rain values still missing afterwards therefore fall in the hours where the model needs them.
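A minimal sketch of this rule, assuming the thresholds exactly as stated above (relative humidity below 88 % and temperature below 8 °C, combined with a logical AND). If the model's own condition combines them differently, the `&` between the two tests should change accordingly:

```r
zero_irrelevant_rain <- function(df, rhum_limit = 88, temp_limit = 8) {
  # Missing rain in hours where, per the stated rule, rain cannot affect the model
  irrelevant <- which(is.na(df$rain) &
                      df$rhum < rhum_limit &
                      df$temp < temp_limit)
  df$rain[irrelevant] <- 0
  df
}
# Any rain still NA afterwards falls in hours where the model needs it:
# OP <- zero_irrelevant_rain(OP); sum(is.na(OP$rain))
```

Wrapping the condition in `which()` keeps the assignment safe even if `rhum` or `temp` still contains `NA`.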

Year NA_rain NA_temp NA_rhum
2007 0 0 0
2008 0 0 0
2009 0 0 0
2010 0 0 0
2011 0 0 0
2012 0 0 0
2013 0 0 0
2014 0 0 0
2015 0 0 0
2016 0 0 0

No missing values remain, so it is safe to proceed with the analysis.
The infilled weather data is saved for the analysis.

save(OP, file = here::here("data", "op_2007_16",  "OP_2007-2016_infilled.RData"))

Weather QC

We perform basic quality control on the weather data.

Histograms of variables used in the analysis.

The histograms show no suspicious values. Rain is frequent at Oak Park but rarely heavy, which makes conditions there very favorable for the spread of crop pathogens.
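The histograms are a visual check; a complementary numeric check counts values outside plausible limits. The limits below are illustrative assumptions, not Met Éireann's own quality-control thresholds:

```r
flag_out_of_range <- function(df, limits = list(
    temp = c(-20, 40),   # °C, illustrative bounds
    rhum = c(0, 100),    # %, physical bounds
    rain = c(0, 50))) {  # mm per hour, illustrative bound
  # Count the values in each variable that fall outside its limits
  vapply(names(limits), function(v) {
    x <- df[[v]]
    sum(x < limits[[v]][1] | x > limits[[v]][2], na.rm = TRUE)
  }, integer(1))
}
# flag_out_of_range(OP)
```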

Weather Summaries

We summarise the weather data to get an idea of the general and the potato late blight-related climate conditions at Oak Park.

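library(dplyr)
library(knitr)

# OP_rain is created in an earlier chunk (not shown); it holds the mean
# April-September rain total, matching the sum of the monthly rain averages below
OP %>%
  filter(month %in% 4:9) %>%    # the data were already subset to April-September
  summarize(temp = round(mean(temp), 1),
            rhum = round(mean(rhum), 1)) %>%
  bind_cols(OP_rain) %>%
  kable(caption = "10 Year Averages Weather at Oak Park", format = "html") %>%
  # bootstrap_options (not latex_options) styles HTML tables
  kableExtra::kable_styling(bootstrap_options = "striped", full_width = FALSE)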
10 Year Averages Weather at Oak Park
temp rhum rain
13 80.2 398.31

Night-time temperatures remain low in April and May.

Night-time Averages (8 PM to 6 AM)
month temp rhum
4 6.6 86.7
5 9.2 86.5
6 11.7 88.7
7 13.4 89.6
8 13.1 89.8
9 11.5 90.6
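The night-time table can be reproduced with a sketch like the following. It assumes `date` is a POSIXct timestamp and reads "8PM:6AM" as the hours 20:00 to 06:00 inclusive; the exact boundaries used in the original chunk are not shown:

```r
night_averages <- function(df) {
  hour  <- as.integer(format(df$date, "%H"))
  # Assumption: night = hours 20, 21, 22, 23 and 0 to 6
  night <- df[hour >= 20 | hour <= 6, ]
  out <- aggregate(cbind(temp, rhum) ~ month, data = night, FUN = mean)
  out$temp <- round(out$temp, 1)
  out$rhum <- round(out$rhum, 1)
  out
}
# night_averages(OP)
```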
Monthly Averages
month temp rhum min_temp rain
4 8.8 78.0 4.5 46.3
5 11.3 78.0 7.2 55.9
6 13.9 79.8 9.7 73.0
7 15.4 80.9 11.6 80.1
8 15.0 81.3 11.4 86.2
9 13.3 83.3 9.6 56.8
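The rain column of the monthly table appears to be an average of monthly totals. A sketch of that two-step calculation (sum within each month of each year, then average across years), assuming the column names shown earlier:

```r
monthly_rain <- function(df) {
  # Total rain for each month of each year
  per_year <- aggregate(rain ~ year + month, data = df, FUN = sum)
  # Average those monthly totals across years
  out <- aggregate(rain ~ month, data = per_year, FUN = mean)
  out$rain <- round(out$rain, 1)
  out
}
# monthly_rain(OP)
```

Summing the six monthly means gives the seasonal figure in the 10-year table: 46.3 + 55.9 + 73.0 + 80.1 + 86.2 + 56.8 = 398.3 mm.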

Mean night-time temperatures stay low in April (6.6 °C) and May (9.2 °C) and only rise above 10 °C from June (11.7 °C).

Let's take a closer look at the variation between seasons.


Minimum temperatures are interesting because they are a limiting factor for primary cycles of P. infestans early in the season.

Next, let us look at the total number of hours with blight-favorable conditions.

The number of wet hours within the biological range of potato late blight rises from April to July, and blight epidemics usually start in mid-July.
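The thresholds that define the biological range are set inside the model and are not repeated here. A sketch of the counting step, assuming for illustration that a favorable hour has relative humidity of at least 88 % and temperature of at least 8 °C (values borrowed from the rain rule above; both are assumptions for this example):

```r
favorable_hours <- function(df, rhum_min = 88, temp_min = 8) {
  # Flag hours meeting both (assumed) blight-favorable criteria
  df$fav <- as.integer(df$rhum >= rhum_min & df$temp >= temp_min)
  # Total favorable hours per month of each year, then averaged over years
  per_year <- aggregate(fav ~ year + month, data = df, FUN = sum)
  aggregate(fav ~ month, data = per_year, FUN = mean)
}
# favorable_hours(OP)
```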

Full Weather Plot

Finally, the full daily weather data is presented below.
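The plot itself is not reproduced here. A sketch of the daily aggregation such a plot would typically draw on, assuming the hourly columns shown earlier (`short_date`, `temp`, `rhum`, `rain`):

```r
daily_weather <- function(df) {
  # Daily mean temperature and relative humidity
  means  <- aggregate(cbind(temp, rhum) ~ short_date, data = df, FUN = mean)
  # Daily minimum temperature
  minima <- aggregate(temp ~ short_date, data = df, FUN = min)
  names(minima)[2] <- "min_temp"
  # Daily rain total
  totals <- aggregate(rain ~ short_date, data = df, FUN = sum)
  merge(merge(means, minima, by = "short_date"), totals, by = "short_date")
}
# daily_weather(OP)
```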

