Emergent classes

Using the exposure as a measure of interaction between categories, we can aggregate the categories into classes based on their mutual attraction or repulsion.

Measures


cluster_categories (distribution, exposure)

At each step of the aggregation, we look for the pair $(\beta, \delta)$ of categories with the highest exposure (renormalised by the maximum possible value). We aggregate the pair into a new category $\gamma$, whose exposure with each of the other categories $\alpha$ is the size-weighted average

$$ E_{\alpha, \gamma} = \frac{1}{N_\beta + N_\delta} \left( N_\beta\, E_{\alpha, \beta} + N_\delta\, E_{\alpha, \delta} \right) $$
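
As an illustration of the update rule above, here is a minimal sketch of the weighted average in Python. The function name `merged_exposure` and the numbers are hypothetical and are not part of the library; the sketch only mirrors the formula.

```python
def merged_exposure(e_alpha_beta, e_alpha_delta, n_beta, n_delta):
    """Exposure between alpha and the new category gamma = beta + delta.

    Hypothetical helper: it is the average of the two exposures,
    weighted by the sizes N_beta and N_delta of the merged categories.
    """
    total = n_beta + n_delta
    if total == 0:
        raise ValueError("cannot merge two empty categories")
    return (n_beta * e_alpha_beta + n_delta * e_alpha_delta) / total


# Made-up values: beta has 300 individuals, delta has 100.
e_alpha_gamma = merged_exposure(1.2, 0.8, 300, 100)
print(e_alpha_gamma)  # approximately 1.1, closer to E(alpha, beta) since beta is larger
```

Because the weights are the category sizes, the merged exposure always lies between the two original exposures.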

The function returns a linkage matrix $L$ that encodes the hierarchical tree. At the $i$th iteration of the algorithm, the clusters $L[i,0]$ and $L[i,1]$ are aggregated to form the $(n+i)$th cluster, where $n$ is the number of original categories. The exposure between $L[i,0]$ and $L[i,1]$ is given by $L[i,3]$, and the variance in the corresponding unsegregated city is given by $L[i,4]$.

Parameters

  • distribution dictionary

    Takes a dictionary of dictionaries with
    distribution[areal_unit][category] = number

  • exposure dictionary

    Takes a dictionary of dictionaries, the result of the exposure function, with
    exposure[$\alpha$][$\beta$] = ($E_{\alpha \beta}$, $\mathrm{Var}(E_{\alpha \beta})$)
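
To make the two input shapes concrete, the sketch below builds toy dictionaries with the layout described above. All unit names, category names and numbers are invented for illustration. The helper `category_totals` is hypothetical and assumes that $N_\alpha$ is the total count of category $\alpha$ over all areal units.

```python
# distribution[areal_unit][category] = number (toy values)
distribution = {
    "unit_1": {"low": 120, "middle": 300, "high": 80},
    "unit_2": {"low": 200, "middle": 150, "high": 20},
}

# exposure[alpha][beta] = (E_alpha_beta, Var(E_alpha_beta)) (toy values)
exposure = {
    "low": {"low": (1.30, 0.0004), "middle": (0.95, 0.0002), "high": (0.60, 0.0009)},
    "middle": {"low": (0.95, 0.0002), "middle": (1.05, 0.0001), "high": (1.10, 0.0003)},
    "high": {"low": (0.60, 0.0009), "middle": (1.10, 0.0003), "high": (1.80, 0.0016)},
}


def category_totals(distribution):
    """Sum each category over all areal units (assumed meaning of N_alpha)."""
    totals = {}
    for counts in distribution.values():
        for category, number in counts.items():
            totals[category] = totals.get(category, 0) + number
    return totals


totals = category_totals(distribution)
value, variance = exposure["low"]["high"]  # each entry is a (value, variance) pair
```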

Output

  • linkage matrix

    Returns a list of lists. See above for the description of the linkage matrix's structure.


uncover_classes (distribution, exposure, ci_factor=10)

The classes are uncovered from the spatial distribution of individuals from different categories, using their relative exposure.

We only aggregate the pair $(\beta, \delta)$ into the same class if the two categories attract each other, that is, if their exposure satisfies

$$E_{\beta, \delta} > 1 + 10\; \sigma_{\beta, \delta}$$

(a confidence level of at least $99\%$ according to Chebyshev's inequality). The aggregation procedure may therefore stop before all categories are aggregated into a single class, in which case it outputs the partition of the original categories into classes.
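
The attraction test and the origin of the factor $10$ can be sketched as follows. Both function names are hypothetical, and the sketch assumes that $\sigma_{\beta, \delta}$ is the square root of the variance stored alongside the exposure value. Chebyshev's inequality bounds the probability of deviating from the mean by at least $k$ standard deviations by $1/k^2$, which gives $1\%$ for $k = 10$.

```python
import math


def chebyshev_tail_bound(k):
    """Upper bound on P(|X - mean| >= k * sigma) for any distribution."""
    if k <= 0:
        raise ValueError("k must be positive")
    return min(1.0, 1.0 / k**2)


def categories_attract(exposure_value, variance, ci_factor=10):
    """True if the exposure exceeds 1 by more than ci_factor standard deviations."""
    sigma = math.sqrt(variance)
    return exposure_value > 1 + ci_factor * sigma


print(chebyshev_tail_bound(10))            # 0.01, hence the 99% level
print(categories_attract(1.5, 0.0004))     # True: the threshold is about 1.2
print(categories_attract(1.1, 0.0004))     # False: 1.1 is below the threshold
```

A smaller `ci_factor` makes the test more permissive, so more pairs are merged and fewer classes remain.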

Parameters

  • distribution dictionary

    Takes a dictionary of dictionaries with
    distribution[areal_unit][category] = number

  • exposure dictionary

    Takes a dictionary of dictionaries, the result of the exposure function, with
    exposure[$\alpha$][$\beta$] = ($E_{\alpha \beta}$, $\mathrm{Var}(E_{\alpha \beta})$)

  • ci_factor float

    Number of standard deviations by which the exposure must exceed $1$ for two categories to be considered as attracting each other. The default value, $10$, corresponds to a confidence level of at least $99\%$ according to Chebyshev's inequality.

Output

  • classes nested lists

    Returns a list of lists. Each list corresponds to a class with
    classes[$i$] = [categories in class $i$]
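
When working with the output, it is often convenient to look up the class of a given category. A minimal sketch, using a hypothetical helper `category_to_class` and made-up category names, could be:

```python
def category_to_class(classes):
    """Map each category to the index of the class that contains it."""
    return {
        category: index
        for index, members in enumerate(classes)
        for category in members
    }


# Made-up output with the documented shape: classes[i] = [categories in class i]
classes = [["low"], ["middle", "upper_middle"], ["high"]]
lookup = category_to_class(classes)
print(lookup["upper_middle"])  # 1
```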