7 Classification methods
Once a behavioural pattern has been identified, there are a number of methods available for separating them from the data. Not all these methods should be used exclusively. Some can be used to complement each other, while others can be used interchangeably. Here we highlight changepoint analyses, principle component analyses, cluster analyses, hidden Markov models,
7.1 Changepoint
Changepoint analyses can be used to find points in the data series which delineate a difference in variance, in mean or in both. There are a number of available methods for doing this and it is possible to constrain the number of changepoints allowed in the data. In pamlr, the default setting in classify_changepoint()
are set for finding differences between migratory and non-migratory periods, but the user can customise these to fit their needs.
7.2 Cluster Analysis
Clustering algorithms aim to partition a set of points into a k number of groups. Points are grouped (or clustered) together in a way that all the points in one cluster are more similar with each other, than with points in other clusters. pamlr integrates three types of clustering algorithms, k-means, hierarchical and expectation-minimisation binary clustering.
7.2.1 k-means clustering
k-means clustering minimises the within-cluster sum of squares of the points and is one of the commonly used clustering algorithms (Hartigan & Wong, 1979). It can be implemented by using classify_summary_statistics( … , method= “kmeans”)
.
7.2.2 Hierarchical clustering
Hierarchical clustering analyses can either be agglomerative or divisive. The first relies on putting each datapoint in a single cluster and successively merging them until a stopping criterion is satisfied and can be implemented in pamlr using classify_summary_statistics( … , method= “agnes”)
. The divisive hierarchical clustering method starts by allocating all datapoints to the same cluster and splitting each cluster until a stopping criterion is met. This can be implemented using classify_summary_statistics( … , method= “diana”)
.
7.2.3 Expectation-minimisation binary clustering
Expectation-minimisation binary clustering (EMbC), clusters data based on geometry alone. More specifically binary delimiters are used to segregate the data along an axis, forcing centroids to lie within these binary regions. Analysis was undertaken using classify_summary_statistics( … , method= “embc”)
.