6 Preparing data for analysis

There are multiple ways of preparing data for analysis.

  • create_custom_interpolation: The simplest approach is to merge the data together at a specified time resolution, with or without interpolation.
  • create_rolling_window: Summary statistics can also be derived using a rolling window, which progresses across the timeseries to make calculations.
  • create_summary_statistics: Where a particular pattern needs to be extracted from the data, such as sustained pressure change or activity, this function derives summary statistics for these periods.

6.1 Merge sensor data together

Because data from different sensors are collected at different temporal resolutions (e.g. 5 minutes, 30 minutes or 4 hours), reducePAM formats data to the same time intervals as a specified variable (e.g. pressure) by summarising finer resolution data (median, sum or skip) and interpolating (or not) lower resolution data.

6.1.1 Interpolation

Format the data at 30-minute intervals: variables recorded at longer intervals are interpolated, and variables recorded at shorter intervals are summarised by their median.

Table 6.1: A table of the first 10 rows of a reducePAM dataset.
date                 pressure  light  pit  act  temperature       gX         gY     gZ         mX     mY       mZ
2015-08-01 00:00:00      1004      0   24    0           33   796.00  -1993.000  -4741  -2016.000  11156  12528.0
2015-08-01 00:30:00      1004      0   24    0           33   854.75  -1630.875  -4661  -2759.125  10517  11372.5
2015-08-01 01:00:00      1004      0   24    0           33   913.50  -1268.750  -4581  -3502.250   9878  10217.0
2015-08-01 01:30:00      1004      0   24    0           33   972.25   -906.625  -4501  -4245.375   9239   9061.5
2015-08-01 02:00:00      1004      0   24    0           33  1031.00   -544.500  -4421  -4988.500   8600   7906.0
2015-08-01 02:30:00      1004      0   24    0           33  1089.75   -182.375  -4341  -5731.625   7961   6750.5
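
To make the two operations concrete, here is a minimal Python sketch of linear interpolation onto a 30-minute grid and of median summarising into 30-minute bins. It illustrates the idea only and is not the pamlr implementation: the helper names interpolate_linear and summarise_median are hypothetical, the two gX endpoints are taken from Table 6.1, the activity values are invented, and flooring timestamps to the start of each bin is an assumption.

```python
from datetime import datetime, timedelta
from statistics import median


def interpolate_linear(t0, v0, t1, v1, step):
    """Linearly interpolate between (t0, v0) and (t1, v1) at every `step`."""
    total = (t1 - t0).total_seconds()
    out = []
    t = t0
    while t <= t1:
        frac = (t - t0).total_seconds() / total
        out.append((t, v0 + frac * (v1 - v0)))
        t += step
    return out


def summarise_median(records, step_minutes):
    """Group (time, value) records into bins and take the median of each bin.

    Assumption: each timestamp is floored to the start of its bin, and
    step_minutes divides 60 evenly.
    """
    bins = {}
    for t, v in records:
        floored = t - timedelta(minutes=t.minute % step_minutes,
                                seconds=t.second,
                                microseconds=t.microsecond)
        bins.setdefault(floored, []).append(v)
    return {t: median(vs) for t, vs in sorted(bins.items())}


start = datetime(2015, 8, 1, 0, 0)

# Coarse variable: two gX values from Table 6.1 used as endpoints.
gx = interpolate_linear(start, 796.0, start + timedelta(hours=2), 1031.0,
                        timedelta(minutes=30))

# Fine variable: hypothetical activity values recorded every 5 minutes.
activity = [0, 0, 2, 5, 1, 0, 3, 3, 4, 0, 0, 1]
records = [(start + timedelta(minutes=5 * i), a) for i, a in enumerate(activity)]
medians = summarise_median(records, 30)

for t, v in gx:
    print(t, v)
for t, v in medians.items():
    print(t, v)
```

The interpolated gX values at 00:30, 01:00 and 01:30 match the corresponding rows of Table 6.1.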

6.1.2 No interpolation

Format the data at 5-minute intervals without interpolating anything.
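
As a rough illustration of what leaving data uninterpolated means, the Python sketch below places a coarser variable on a 5-minute grid and leaves every unmatched time empty (None here, NA in R). The helper align_without_interpolation and the temperature readings are hypothetical, and this is not the pamlr implementation.

```python
from datetime import datetime, timedelta


def align_without_interpolation(grid, records):
    """Place each (time, value) record on the grid; unmatched times get None."""
    lookup = dict(records)
    return [(t, lookup.get(t)) for t in grid]


start = datetime(2015, 8, 1, 0, 0)

# 5-minute grid from 00:00 to 00:30 (seven time points).
grid = [start + timedelta(minutes=5 * i) for i in range(7)]

# Hypothetical variable recorded only every 30 minutes.
temperature = [(start, 33), (start + timedelta(minutes=30), 33)]

aligned = align_without_interpolation(grid, temperature)
for t, v in aligned:
    print(t, v)
```

Only the first and last grid points carry a value; the five points in between stay empty rather than being filled with estimates.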

6.2 Rolling window

Interpolation is not always advisable (especially linear interpolation). An alternative way of formatting data for analysis is to use a rolling window with create_rolling_window, which progresses across the whole timeseries and calculates summary statistics for the data that fall within each window of a given duration.

Derived variables include:

  • median : Median
  • sd : Standard deviation
  • sum : Cumulative sum of values
  • min : Minimum
  • max : Maximum
  • range : Range (i.e. maximum - minimum)
  • cumu_diff : Cumulative difference (i.e. sum of absolute differences)
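
The derived variables listed above can be sketched in a few lines of Python. This is an illustration of the definitions, not the pamlr implementation: the helper names window_statistics and rolling_statistics are hypothetical, the pit values are invented, and the way windows are aligned to timestamps is not addressed. The values were chosen so that the first window (eight records, i.e. a 2-hour window at 15-minute resolution) gives the same pit summaries as the first row of Table 6.2.

```python
from statistics import median, stdev


def window_statistics(values):
    """Summary statistics for the values inside one window."""
    diffs = [abs(b - a) for a, b in zip(values, values[1:])]
    return {
        "median": median(values),
        "sd": stdev(values),               # sample standard deviation (n - 1)
        "sum": sum(values),
        "min": min(values),
        "max": max(values),
        "range": max(values) - min(values),
        "cumu_diff": sum(diffs),           # sum of absolute differences
    }


def rolling_statistics(values, size):
    """Slide a window of `size` records along the series, one record at a time."""
    return [window_statistics(values[i:i + size])
            for i in range(len(values) - size + 1)]


# Hypothetical pit values at 15-minute resolution.
pit = [24, 24, 24, 23, 24, 24, 24, 24, 24, 24]
windows = rolling_statistics(pit, size=8)
first = windows[0]
print(first)
```

A single record of 23 among 24s contributes 2 to cumu_diff (one step down and one step up) but only 1 to range, which is why the two measures differ.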

6.2.1 Interpolation

Create a 2-hour window with summary statistics every 15 minutes. Because sensors such as the magnetometer record only every 4 hours, gaps in the dataset can be avoided by interpolating linearly between points and then calculating summary statistics from these interpolated points.

Table 6.2: A table of the first 10 rows of a reducePAM dataset.
date pressure light pit act temperature gX gY gZ mX mY mZ median_pressure median_light median_pit median_act median_temperature median_gX median_gY median_gZ median_mX median_mY median_mZ sd_pressure sd_light sd_pit sd_act sd_temperature sd_gX sd_gY sd_gZ sd_mX sd_mY sd_mZ sum_pressure sum_light sum_pit sum_act sum_temperature sum_gX sum_gY sum_gZ sum_mX sum_mY sum_mZ min_pressure min_light min_pit min_act min_temperature min_gX min_gY min_gZ min_mX min_mY min_mZ max_pressure max_light max_pit max_act max_temperature max_gX max_gY max_gZ max_mX max_mY max_mZ cumu_diff_pressure cumu_diff_light cumu_diff_pit cumu_diff_act cumu_diff_temperature cumu_diff_gX cumu_diff_gY cumu_diff_gZ cumu_diff_mX cumu_diff_mY cumu_diff_mZ range_pressure range_light range_pit range_act range_temperature range_gX range_gY range_gZ range_mX range_mY range_mZ
2015-08-01 00:45:00 1004 0 24 0 33 884.125 -1449.8125 -4621 -3130.688 10197.5 10794.75 1004 0 24 0 33 898.8125 -1359.2812 -4601 -3316.469 10037.75 10505.875 0.0000000 0 0.3535534 0 0 71.95376 443.5107 97.97959 910.1385 782.612 1415.193 8032.0 0 191 0 264 7190.5 -10874.25 -36808 -26531.75 80302 84047 1004.0 0 23 0 33 796.000 -1993.000 -4741 -4616.938 8919.5 8483.75 1004 0 24 0 33 1001.625 -725.5625 -4461 -2016.000 11156.0 12528.00 0.0 0 2 0 0 205.625 1267.438 280 2600.938 2236.5 4044.25 0.0 0 1 0 0 205.625 1267.438 280 2600.938 2236.5 4044.25
2015-08-01 01:00:00 1004 0 23 0 33 913.500 -1268.7500 -4581 -3502.250 9878.0 10217.00 1004 0 24 0 33 928.1875 -1178.2188 -4561 -3688.031 9718.25 9928.125 0.0000000 0 0.3535534 0 0 71.95376 443.5107 97.97959 910.1385 782.612 1415.193 8032.0 0 191 0 264 7425.5 -9425.75 -36488 -29504.25 77746 79425 1004.0 0 23 0 33 825.375 -1811.938 -4701 -4988.500 8600.0 7906.00 1004 0 24 0 33 1031.000 -544.5000 -4421 -2387.562 10836.5 11950.25 0.0 0 2 0 0 205.625 1267.438 280 2600.938 2236.5 4044.25 0.0 0 1 0 0 205.625 1267.438 280 2600.938 2236.5 4044.25
2015-08-01 01:15:00 1004 0 24 0 33 942.875 -1087.6875 -4541 -3873.812 9558.5 9639.25 1004 0 24 0 33 957.5625 -997.1562 -4521 -4059.594 9398.75 9350.375 0.0000000 0 0.3535534 0 0 71.95376 443.5107 97.97959 910.1385 782.612 1415.193 8032.0 0 191 0 264 7660.5 -7977.25 -36168 -32476.75 75190 74803 1004.0 0 23 0 33 854.750 -1630.875 -4661 -5360.062 8280.5 7328.25 1004 0 24 0 33 1060.375 -363.4375 -4381 -2759.125 10517.0 11372.50 0.0 0 2 0 0 205.625 1267.438 280 2600.938 2236.5 4044.25 0.0 0 1 0 0 205.625 1267.438 280 2600.938 2236.5 4044.25
2015-08-01 01:30:00 1004 0 24 0 33 972.250 -906.6250 -4501 -4245.375 9239.0 9061.50 1004 0 24 0 33 986.9375 -816.0938 -4481 -4431.156 9079.25 8772.625 0.0000000 0 0.3535534 0 0 71.95376 443.5107 97.97959 910.1385 782.612 1415.193 8032.0 0 191 0 264 7895.5 -6528.75 -35848 -35449.25 72634 70181 1004.0 0 23 0 33 884.125 -1449.812 -4621 -5731.625 7961.0 6750.50 1004 0 24 0 33 1089.750 -182.3750 -4341 -3130.688 10197.5 10794.75 0.0 0 2 0 0 205.625 1267.438 280 2600.938 2236.5 4044.25 0.0 0 1 0 0 205.625 1267.438 280 2600.938 2236.5 4044.25
2015-08-01 01:45:00 1004 0 24 0 33 1001.625 -725.5625 -4461 -4616.938 8919.5 8483.75 1004 0 24 0 33 1016.3125 -635.0312 -4441 -4802.719 8759.75 8194.875 0.1767767 0 0.3535534 0 0 71.95376 443.5107 97.97959 910.1385 782.612 1415.193 8031.5 0 191 0 264 8130.5 -5080.25 -35528 -38421.75 70078 65559 1003.5 0 23 0 33 913.500 -1268.750 -4581 -6103.188 7641.5 6172.75 1004 0 24 0 33 1119.125 -1.3125 -4301 -3502.250 9878.0 10217.00 0.5 0 1 0 0 205.625 1267.438 280 2600.938 2236.5 4044.25 0.5 0 1 0 0 205.625 1267.438 280 2600.938 2236.5 4044.25
2015-08-01 02:00:00 1004 0 24 0 33 1031.000 -544.5000 -4421 -4988.500 8600.0 7906.00 1004 0 24 0 33 1045.6875 -453.9688 -4401 -5174.281 8440.25 7617.125 0.3720119 0 0.0000000 0 0 71.95376 443.5107 97.97959 910.1385 782.612 1415.193 8030.5 0 192 0 264 8365.5 -3631.75 -35208 -41394.25 67522 60937 1003.0 0 24 0 33 942.875 -1087.688 -4541 -6474.750 7322.0 5595.00 1004 0 24 0 33 1148.500 179.7500 -4261 -3873.812 9558.5 9639.25 1.0 0 0 0 0 205.625 1267.438 280 2600.938 2236.5 4044.25 1.0 0 0 0 0 205.625 1267.438 280 2600.938 2236.5 4044.25

6.2.2 No interpolation

However, interpolation makes many assumptions (e.g. that the data really do change linearly between measurements). One option is to make the window larger than the coarsest data resolution (in this case more than 4 hours). Another is simply to leave the NAs in the data using interp = FALSE.
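
The reasoning behind the first option can be checked with a small Python sketch, which counts how many windows contain no measurement at all when a sensor records every 4 hours. The helper empty_windows is hypothetical, the window is treated as the half-open interval from its start to its start plus its width, and this is not how pamlr defines its windows.

```python
def empty_windows(obs_minutes, window, step, end):
    """Count windows [start, start + window) that contain no observation.

    All arguments are in minutes. Returns (empty, total).
    """
    empty = 0
    total = 0
    start = 0
    while start + window <= end:
        total += 1
        if not any(start <= m < start + window for m in obs_minutes):
            empty += 1
        start += step
    return empty, total


day = 24 * 60

# Hypothetical sensor recording every 4 hours (240 minutes) over one day.
magnetometer = list(range(0, day + 1, 240))

# 2-hour window moved every 15 minutes: some windows hold no measurement.
empty_2h, total_2h = empty_windows(magnetometer, 120, 15, day)

# 5-hour window moved every 15 minutes: every window holds a measurement.
empty_5h, total_5h = empty_windows(magnetometer, 300, 15, day)

print(empty_2h, total_2h)
print(empty_5h, total_5h)
```

With the wider window no summary statistic is calculated from an empty set of values, at the cost of smoothing over a longer period.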

6.3 Extracting statistics for specific data patterns

If working with bird data, pamlr offers some predefined functions for classifying behaviour.

  • Flight bouts can be characterised by:

    • continuous high activity which can be extracted from the data using create_summary_statistics( ... ,method = "flap")
    • endurance activity using create_summary_statistics( ... ,method = "endurance")
    • a pressure change greater than the background pressure changes due to weather using create_summary_statistics( ... ,method = "pressure")
    • a period of continuous light using create_summary_statistics( ... ,method = "light")
  • Incubation bouts can be characterised by:

    • periods of darkness using create_summary_statistics( ... ,method = "darkness")
    • periods of resting using create_summary_statistics( ... ,method = "rest")
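
The idea of a sustained event can be illustrated with a short Python sketch that finds runs of consecutive records at or above an activity threshold. This is an assumption-laden illustration rather than the method used by create_summary_statistics: the helper find_bouts, the threshold, the minimum run length and the activity values are all hypothetical.

```python
from datetime import datetime, timedelta


def find_bouts(times, activity, threshold, min_records):
    """Return (start, end) of each run of at least `min_records` consecutive
    records whose activity is at or above `threshold`."""
    bouts = []
    run_start = None
    for i, a in enumerate(activity):
        if a >= threshold:
            if run_start is None:
                run_start = i          # a new run begins here
        else:
            if run_start is not None and i - run_start >= min_records:
                bouts.append((times[run_start], times[i - 1]))
            run_start = None
    # Close a run that lasts until the final record.
    if run_start is not None and len(activity) - run_start >= min_records:
        bouts.append((times[run_start], times[-1]))
    return bouts


# Hypothetical activity recorded every 5 minutes from 12:00.
first_time = datetime(2015, 8, 1, 12, 0)
activity = [0, 16, 20, 18, 25, 17, 19, 0, 0, 30, 0]
times = [first_time + timedelta(minutes=5 * i) for i in range(len(activity))]

bouts = find_bouts(times, activity, threshold=15, min_records=3)
for start, end in bouts:
    print(start, end, (end - start).total_seconds() / 3600)
```

The isolated high value at 12:45 is not reported because it does not last for three records, so only the bout from 12:05 to 12:30 remains.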
Table 6.3: A table of the first 10 rows of a reducePAM dataset.
date start end duration total_daily_duration total_daily_event_number cum_pressure_change cum_altitude_change cum_altitude_up total_daily_P_change P_dep_arr pressure_range altitude_range mean_night_P sd_night_P mean_nextnight_P sd_nextnight_P night_P_diff median_activity sum_activity prop_resting prop_active mean_night_act sd_night_act sum_night_act mean_nextnight_act sd_nextnight_act sum_nextnight_act night_act_diff median_pitch sd_pitch median_light nightime median_gX median_gY median_gZ median_mX median_mY median_mZ median_temp sd_temp cum_temp_change
2015-08-01 2015-08-01 12:05:00 2015-08-01 12:30:00 0.4166667 0.9166667 4 0 0 0 0 0 0 0 1001.938 0.4425306 1004.938 0.4425306 3.00 16 100 0.1666667 0.8333333 0.0210526 0.1443214 2 0.0107527 0.1036952 1 0.0102999 20 7.842194 9984 0 NA NA NA NA NA NA 41 NA 0
2015-08-01 2015-08-01 15:10:00 2015-08-01 15:20:00 0.1666667 0.9166667 4 0 0 0 0 1 NA NA 1001.938 0.4425306 1004.938 0.4425306 3.00 27 56 0.0000000 1.0000000 0.0210526 0.1443214 2 0.0107527 0.1036952 1 0.0102999 36 10.392305 9984 0 NA NA NA NA NA NA NA NA 0
2015-08-01 2015-08-01 04:30:00 2015-08-01 04:40:00 0.1666667 0.9166667 4 0 0 0 0 0 0 0 1001.938 0.4425306 1004.938 0.4425306 3.00 30 76 0.3333333 0.6666667 0.0210526 0.1443214 2 0.0107527 0.1036952 1 0.0102999 20 9.451631 424 0 NA NA NA NA NA NA 34 NA 0
2015-08-01 2015-08-01 10:00:00 2015-08-01 10:10:00 0.1666667 0.9166667 4 0 0 0 0 0 0 0 1001.938 0.4425306 1004.938 0.4425306 3.00 43 113 0.3333333 0.6666667 0.0210526 0.1443214 2 0.0107527 0.1036952 1 0.0102999 13 4.509250 9984 0 NA NA NA NA NA NA 39 NA 0
2015-08-02 2015-08-02 11:00:00 2015-08-02 11:10:00 0.1666667 1.4166667 6 0 0 0 0 0 0 0 1004.938 0.4425306 1001.188 0.5439056 3.75 19 39 0.3333333 0.6666667 0.0107527 0.1036952 1 0.0319149 0.2296387 3 0.0211622 21 8.144528 9984 0 NA NA NA NA NA NA 40 NA 0
2015-08-02 2015-08-02 11:30:00 2015-08-02 11:50:00 0.3333333 1.4166667 6 0 0 0 0 0 0 0 1004.938 0.4425306 1001.188 0.5439056 3.75 23 109 0.2000000 0.8000000 0.0107527 0.1036952 1 0.0319149 0.2296387 3 0.0211622 30 6.913754 9984 0 NA NA NA NA NA NA 39 NA 0

These functions also calculate summary statistics for each event (e.g. flight bout).

These include:

  • date : Date (without time)
  • start : Start time and date of the event, POSIXct format
  • end : Time and date that the event finished, POSIXct format
  • duration : How long it lasted (in hours)
  • total_daily_duration : The total duration of all the events that occurred that day (in hours)
  • total_daily_event_number : The total number of events which occurred that day
  • cum_pressure_change : The cumulative change in atmospheric pressure during that event (in hectopascals)
  • cum_altitude_change : The cumulative change in altitude during that event (in metres)
  • cum_altitude_up : The cumulative number of metres that the bird went upwards during that event
  • total_daily_P_change : The cumulative change in atmospheric pressure for all the events for that date (in hectopascals)
  • P_dep_arr : The difference between atmospheric pressure at the start of the event, and at the end (in hectopascals)
  • pressure_range : The total range of the atmospheric pressure during that event (maximum minus minimum, in hectopascals)
  • altitude_range : The total altitude range during that event (maximum minus minimum, in metres)
  • mean_night_P : The mean pressure during the night before the event took place (in hectopascals)
  • sd_night_P : The standard deviation of pressure the night before the event took place (in hectopascals)
  • mean_nextnight_P : The mean pressure the night after the event took place (in hectopascals)
  • sd_nextnight_P : The standard deviation of pressure the night after the event took place (in hectopascals)
  • night_P_diff : The difference between the mean pressures of the night before and the night after the event took place (in hectopascals)
  • median_activity : The median activity during that event
  • sum_activity : The sum of the activity during that event
  • prop_resting : The proportion of time during that event where activity = 0
  • prop_active : The proportion of time during that event where activity > 0
  • mean_night_act : The mean activity during the night before the event took place
  • sd_night_act : The standard deviation of activity the night before the event took place
  • sum_night_act : The summed activity during the night before the event took place
  • mean_nextnight_act : The mean activity the night after the event took place
  • sd_nextnight_act : The standard deviation of activity the night after the event took place
  • sum_nextnight_act : The summed activity the night after the event took place
  • night_act_diff : The difference between the mean activity of the night before and the night after the event took place
  • median_pitch : The median pitch during that event
  • sd_pitch : The standard deviation of pitch during that event
  • median_light : The median light recordings during that event
  • nightime : Whether or not it was night during the majority of the event (1 = night, 0 = day)
  • median_gX : Median raw acceleration on the x axis during the event
  • median_gY : Median raw acceleration on the y axis during the event
  • median_gZ : Median raw acceleration on the z axis during the event
  • median_mX : Median raw magnetic field on the x axis during the event
  • median_mY : Median raw magnetic field on the y axis during the event
  • median_mZ : Median raw magnetic field on the z axis during the event
  • median_temp : Median temperature during the event (in degrees Celsius)
  • sd_temp : Standard deviation of temperature during the event (in degrees Celsius)
  • cum_temp_change : Cumulative absolute difference in temperature during the event (in degrees Celsius)
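
A few of these per-event statistics can be sketched in Python directly from their definitions. This is an illustration under stated assumptions, not the pamlr implementation: the helper event_statistics and all input values are hypothetical, and the sign of P_dep_arr (start minus end) is an assumption because the definition above does not specify it. The six records were chosen so that duration and prop_resting agree with the first row of Table 6.3.

```python
from datetime import datetime, timedelta
from statistics import median


def event_statistics(times, pressure, activity):
    """Compute a subset of the per-event statistics for one event."""
    n = len(activity)
    resting = sum(1 for a in activity if a == 0)
    return {
        "start": times[0],
        "end": times[-1],
        "duration": (times[-1] - times[0]).total_seconds() / 3600,  # hours
        "cum_pressure_change": sum(abs(b - a)
                                   for a, b in zip(pressure, pressure[1:])),
        "P_dep_arr": pressure[0] - pressure[-1],  # assumed sign: start - end
        "pressure_range": max(pressure) - min(pressure),
        "median_activity": median(activity),
        "sum_activity": sum(activity),
        "prop_resting": resting / n,
        "prop_active": (n - resting) / n,
    }


# Hypothetical event: six records at 5-minute resolution, 12:05 to 12:30.
first_time = datetime(2015, 8, 1, 12, 5)
times = [first_time + timedelta(minutes=5 * i) for i in range(6)]
pressure = [1002, 1002, 1001, 1001, 1002, 1002]   # hectopascals
activity = [0, 10, 16, 16, 20, 38]

stats = event_statistics(times, pressure, activity)
for name, value in stats.items():
    print(name, value)
```

Note that cum_pressure_change (2 hPa here) and pressure_range (1 hPa here) differ whenever pressure moves away from and back towards its starting value, while P_dep_arr is 0 because the event starts and ends at the same pressure.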