Journal of Agricultural, Biological, and Environmental Statistics
A journal of applied statistics. Published by the American Statistical Association and the International Biometric Society.
Recent advances in technology for sampling diving behavior have enabled enormous datasets to be collected on a variety of diving animals. Methods used to analyze these data vary considerably across studies, complicating interspecific comparisons. The primary problem is that methods for analyzing large datasets of dive profiles have not been clearly defined. This study examines various algorithms for analyzing multivariate observations and assesses their suitability for classifying diving data. These include k-means and fuzzy c-means clustering techniques from the field of statistics, and Kohonen self-organizing map (SOM) and fuzzy adaptive resonance theory (ART) from the field of artificial neural networks. A Monte Carlo simulation was performed on artificially generated data, with known solutions, to test clustering performance under various conditions (i.e., well defined or overlapping groups, varying numbers of variables, varying numbers of groups, and autocorrelated or independent variables). Performance was also tested on real datasets from Adélie penguins (Pygoscelis adeliae), southern elephant seals (Mirounga leonina), and Weddell seals (Leptonychotes weddellii). K-means, fuzzy c-means, and SOM all performed equally well on the artificially generated data while fuzzy ART had misclassification rates that were twice as high. All techniques showed decreasing performance with increasing overlap among groups and increasing numbers of groups, but increasing performance with increasing numbers of variables. Fuzzy ART was the most sensitive to the varying simulation parameters. When clustering real data, both c-means and SOM classified observations into clusters that were closer together (relative to k-means) and hence had less distinct boundaries separating the clusters. K-means performed as well as c-means and SOM, but its classification of real data was more logical when compared to the actual dive profiles. 
K-means is also the most readily available technique in statistical software packages. Considering all of these factors, k-means appears to be the best method among those examined for grouping dive profiles.
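The simulation idea described above (cluster artificial data whose true groups are known, then count the misclassified observations) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The Gaussian group layout, the parameter values, the farthest-first choice of starting centres, and all function names are assumptions made for illustration only; the abstract does not specify any of them.

```python
import itertools
import random


def make_groups(n_groups, n_per_group, n_vars, spread, rng):
    """Draw Gaussian groups centred at 0, 10, 20, ... on every variable.

    Hypothetical generator: a larger `spread` gives more overlap among groups.
    """
    points, labels = [], []
    for g in range(n_groups):
        for _ in range(n_per_group):
            points.append([rng.gauss(10.0 * g, spread) for _ in range(n_vars)])
            labels.append(g)
    return points, labels


def sq_dist(a, b):
    """Squared Euclidean distance between two observations."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k, rng):
    """Pick k starting centres: one random point, then repeatedly the point
    farthest from the centres chosen so far (an assumed initialisation)."""
    centres = [rng.choice(points)]
    while len(centres) < k:
        centres.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centres)))
    return [list(c) for c in centres]


def kmeans(points, k, rng, n_iter=50):
    """Basic k-means: alternate nearest-centre assignment and centre update."""
    centres = farthest_first(points, k, rng)
    assign = [0] * len(points)
    for _ in range(n_iter):
        new_assign = [min(range(k), key=lambda c: sq_dist(p, centres[c])) for p in points]
        if new_assign == assign:
            break  # assignments stopped changing, so the centres are stable
        assign = new_assign
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:  # keep the old centre if a cluster becomes empty
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign


def misclassification_rate(true_labels, assign, k):
    """Fraction misclassified under the best matching of cluster ids to groups.

    Cluster ids are arbitrary, so every relabelling is tried (fine for small k).
    """
    best_hits = 0
    for perm in itertools.permutations(range(k)):
        hits = sum(1 for t, a in zip(true_labels, assign) if perm[a] == t)
        best_hits = max(best_hits, hits)
    return 1.0 - best_hits / len(true_labels)


# One replicate each for a well-defined and an overlapping configuration.
rng = random.Random(0)
for spread in (0.5, 6.0):
    pts, truth = make_groups(n_groups=3, n_per_group=40, n_vars=2, spread=spread, rng=rng)
    rate = misclassification_rate(truth, kmeans(pts, 3, rng), 3)
    print(f"spread={spread}: misclassification rate = {rate:.3f}")
```

A full Monte Carlo study would repeat each configuration many times and average the rates. It would also vary the number of groups and variables, and add autocorrelation among variables, which this sketch leaves out.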
Key Words
Adaptive resonance theory; Adélie penguin; Air-breathing vertebrate; ART; Cluster analysis; Diving behavior; Fuzzy c-means; k-means; Kohonen self-organizing map; Quantitative dive analysis; SOM; Southern elephant seal; Unsupervised learning; Weddell seal.
Jason F. Schreer is Post Doctoral Fellow, Department of Biology, University of Waterloo, Waterloo, ON, Canada N2L 3G1 (E-mail: jfschree@sciborg.uwaterloo.ca). R. J. O'Hara Hines is Associate Professor, Department of Statistics and Actuarial Science and Department of Biology, University of Waterloo. Kit M. Kovacs is Associate Professor, UNIS, 9170 Longyearbyen, Svalbard, Norway.
Copyright © 1998 American Statistical Association and the International Biometric Society. All rights reserved.