Clustering Mosquito Movement


Still being written. Sorry about the stubs and errors!

Still being written. Sorry about the stubs and errors!


Code Description

Detect Clusters

Now, we need to cluster our data in two ways: migration-based, and spatially. We will be using different frameworks for each, and…

Network Communities

For the network community detection, we used the cdlib, which allows us to use a plaethora of algorithms such as Girvan-Newmann, Markov clustering, eigenvector, infomap, spectral; amonst many others.

Spatial Clusters

To do the spatial clustering, we decided to use scikit-learn, several alternatives are available, such as k-means, agglomerative, DBSCAN, and spectral (amongst others).

Clusters Alignment

Now, we would like to know how much do both algorithms overlap in their detections. Unfortunately, there is no way to know which cluster corresponds to which community between the two algoritms, so we needed to figure out a way to automatically “align” the output from both.

The way we did this was with the use of a genetic algorithm. We encoded the problem with a list defined as:

chromosome = [clustersIDs] + [communitiesIDs]

where the length of the chromosome is 2n where n is the number of communities (or clusters). Every element of these lists maps one cluster to one community in the form:

chromosome[1]   : ID of the first cluster identified
chromosome[n]   : ID of the last cluster identified
chromosome[n+1] : ID of the first community identified
chromosome[2n]  : ID of the last community identified

And where we can retrieve the indices of the nodes in the map that belong to each grouping by drawing them with their IDs from their detection dictionaries:

clusters    = {1: nodesInClst_1, 2: nodesInClst_2, ..., n: nodesInClst_n}
communities = {1: nodesInComm_1, 2: nodesInComm_2, ..., n: nodesInComm_n}

The general idea behind the optimization routine is to order these lists in the way that maximizes the matches between the paired elements of the entries.


The way we initialize our chromosome populations is fairly straightforward, we simply generate shuffled versions of the lists of indices for clusters and communities, and assemble them together:

chromosome = sample([1, 2, ..., n-1, n]) + sample([1, 2, ..., n-1, n])


The mutation operator is also quite straigtforward, we just have to do a swap mutation operation with the caveat that we must do each one of the sections of our chromosome independently (no inter-zone mutation allowed).

chromosome = [ 1,  2, ...,  n-1,  n| n+1, n+2, ..., 2n-1, 2n]
             |---- swap within ----|----  swap within ------|


Same thing applies with our crossover operator, we are using an OX1 crossover function to each part of our chromosomes:

chromosomeA = [1, 2, ..., n-1, n | n+1, n+2, ..., 2n-1, 2n]
chromosomeB = [1, 2, ..., n-1, n | n+1, n+2, ..., 2n-1, 2n]

offspring = OX1(chromosomeA[1:n], chromosomeB[1:n]) + OX1(chromosomeA[n:2n], chromosomeB[n:2n])


Now, to calculate the fitness of our solution, or how close are we to find a good alignment between the clusters and communities, we take each pairwise mapping of indices, then retrieve the set of nodes contained in the structures’ indices, and calculate the size of the intersection between the two sets:

alignment[i] = len(clusters[chromosome[i]]  clusters[chromosome[n+i]])

We repeat the same process for all the pairwise elements of the list and add them all together:

fitness = sum(alignment)


Code Repo