PajaroLoco was a Mathematica package for linguistic analysis of tagged bird songs using network theory and clustering techniques.
For around three years during my PhD, under Edgar Vallejo’s supervision, we collaborated with Charles Taylor’s lab at UCLA to develop software tools for understanding bird vocalizations. Specifically, we were interested in studying how bird songs are structured, using network theory and clustering techniques.
Tools and Techniques
Most of our work revolved around developing tools that allowed evolutionary biologists to automate their analyses of animal vocalizations. With that in mind, we developed a user interface and various tutorials so that researchers could use and extend our package. We also contributed to some of the analyses for scientific publications.
Before any analysis could start, field biologists went out to record bird songs. This was done in specific regions, and thorough notes were taken on the birds’ behaviors and their interactions with other individuals. Once recorded, the songs were taken back to the lab, where the biologists tagged them into “linguistic units” representing the different sounds the birds uttered in their songs. After this process, we were handed CSV files with the timing information for all the phrases sung by each individual.
Most of our analyses started with counting phrase frequencies, checking how often a song transitioned from one phrase to another, and examining the resulting probability distributions. This was usually an exploratory and data-cleaning phase to check that everything was in good shape for more complex analyses.
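As a rough illustration of this exploratory step, the sketch below counts phrase frequencies and phrase-to-phrase transitions for a single song. It is written in Python rather than Mathematica, and the function name `phrase_statistics` and the toy song are hypothetical; it is not the package's actual code.

```python
from collections import Counter


def phrase_statistics(song):
    """Count phrase frequencies and consecutive phrase transitions in one song.

    `song` is a list of phrase labels in the order they were sung.
    """
    frequencies = Counter(song)
    # Pair every phrase with the phrase that immediately follows it.
    transitions = Counter(zip(song, song[1:]))
    return frequencies, transitions


# Hypothetical tagged song: each letter stands for one phrase type.
song = ["a", "b", "a", "b", "c", "a", "b"]
frequencies, transitions = phrase_statistics(song)

print(frequencies.most_common())
print(transitions.most_common())
```

Unusually rare phrases or transitions in tables like these are a cheap way to spot tagging mistakes before moving on to the network analyses.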
One of our most promising features was the analysis of transition-probability networks built from all the phrases in individuals’ songs. To do this, we would calculate the transition probabilities among the phrases used by birds across time, and then run clustering analyses on the resulting networks. Back then, Mathematica did not yet have built-in algorithms for clustering weighted graphs, so we relied on calling sub-routines from iGraphR and kjahan’s implementation of Newman’s clustering algorithm.
With those sub-routines in place, we were able to identify the phrase patterns that birds of different species used when communicating within their social groups.
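The network-building step described above can be sketched as follows. This is a minimal Python illustration, assuming that transition probabilities are row-normalized counts of consecutive phrases (the exact normalization used in the package is not stated here). The names `transition_probabilities` and `weighted_edges` and the toy songs are hypothetical. The clustering itself is left out; the resulting weighted edge list is the kind of input a graph library's community-detection routine would take.

```python
from collections import Counter, defaultdict


def transition_probabilities(songs):
    """Estimate P(next phrase | current phrase) from a list of songs."""
    counts = defaultdict(Counter)
    for song in songs:
        for current, following in zip(song, song[1:]):
            counts[current][following] += 1

    probabilities = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        # Normalize each row so the outgoing probabilities sum to 1.
        probabilities[current] = {
            following: n / total for following, n in followers.items()
        }
    return probabilities


def weighted_edges(probabilities):
    """Flatten the nested mapping into (source, target, weight) edges."""
    return [
        (source, target, weight)
        for source, row in probabilities.items()
        for target, weight in row.items()
    ]


# Hypothetical songs from one individual; letters stand for phrase types.
songs = [["a", "b", "c", "a", "b"], ["a", "b", "a", "c"]]
probabilities = transition_probabilities(songs)
edges = weighted_edges(probabilities)

for source, target, weight in edges:
    print(f"{source} -> {target}: {weight:.2f}")
```

Transitions are counted within each song only, so the last phrase of one song is never linked to the first phrase of the next.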
Collocations and Alignments
We were also able to perform more advanced analyses, such as collocations and sequence alignments, from within our software package by calling two external libraries: NLTK and LingPy.
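To give a feel for what a collocation analysis measures, here is a small pure-Python sketch that scores adjacent phrase pairs by pointwise mutual information (PMI), one common association measure. It does not call NLTK or LingPy and is not the package's implementation. The function `bigram_pmi`, its `min_count` parameter, and the toy song are hypothetical, and the probability estimates used are one simple choice among several.

```python
import math
from collections import Counter


def bigram_pmi(sequence, min_count=1):
    """Score adjacent phrase pairs by pointwise mutual information.

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) ), where P(x, y) is the
    relative frequency of the pair among all adjacent pairs and P(x), P(y)
    are relative phrase frequencies. Pairs seen fewer than `min_count`
    times are skipped, since PMI is unreliable for rare events.
    """
    unigrams = Counter(sequence)
    bigrams = Counter(zip(sequence, sequence[1:]))
    n_tokens = len(sequence)
    n_bigrams = n_tokens - 1

    scores = {}
    for (first, second), count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bigrams
        p_first = unigrams[first] / n_tokens
        p_second = unigrams[second] / n_tokens
        scores[(first, second)] = math.log2(p_pair / (p_first * p_second))
    return scores


# Hypothetical tagged song; letters stand for phrase types.
song = ["a", "b", "a", "b", "c", "d", "c", "d", "a", "b"]
scores = bigram_pmi(song, min_count=2)

for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

A positive score means the two phrases occur together more often than their individual frequencies would suggest, which is what makes a pair a collocation candidate.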