Presence in areal units
Introduction
Perhaps the most fundamental question in any study of segregation is how to quantify the presence of a category in areal units, since all other measures build on this measure of presence. Several indicators exist, and one needs to be aware of their meaning, their qualities and their shortcomings. If the goal is to identify areal units with high levels of segregation, everything else being equal, one should use the representation defined below. Other measures have their uses, but they are biased and thus cannot be used to assess segregation.
Measures
concentration (distribution, classes=None)
The concentration measures the proportion of individuals from category $\alpha$ that live in the areal unit $t$. Writing $n_\alpha(t)$ for the number of individuals of category $\alpha$ in unit $t$ and $N_\alpha$ for their total number in the area being studied, it reads $c_\alpha(t) = n_\alpha(t) / N_\alpha$.
The concentration has the nice property of being composition-invariant, that is, it does not depend on the relative proportion of category $\alpha$ in the total population.
However, it strongly depends on the total population of the areal unit we are studying: more populated areal units will mechanically have higher concentration values. Segregation measures based on the concentration (such as the dissimilarity index) will therefore be dominated by the values in highly populated areal units.
Parameters
distribution
dictionary. Takes a dictionary of dictionaries with distribution[areal_unit][category] = number
classes
dictionary, optional. Takes a dictionary of lists with classes[class] = [cat1, cat2, ...]
If not specified, the algorithm will use the categories found in distribution
Output
concentration
dictionary. Returns a dictionary of dictionaries with concentration[areal_unit][category] = value
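The definition above can be illustrated with a minimal sketch. It is not marble's implementation: `concentration_sketch` is a hypothetical helper that only assumes the input format described under Parameters (a dictionary of dictionaries) and ignores the optional `classes` argument.

```python
def concentration_sketch(distribution):
    """Share of each category's total population found in each areal unit."""
    # Total number of individuals per category over all areal units
    totals = {}
    for counts in distribution.values():
        for category, number in counts.items():
            totals[category] = totals.get(category, 0) + number

    # Count in the unit divided by the category total;
    # categories with no individuals at all are skipped
    return {
        unit: {
            category: number / totals[category]
            for category, number in counts.items()
            if totals[category] > 0
        }
        for unit, counts in distribution.items()
    }


city = {"A": {0: 10, 1: 0, 2: 23},
        "B": {0: 0, 1: 10, 2: 8}}
co = concentration_sketch(city)
print(co["A"][0])  # 1.0: every individual of category 0 lives in A
```

For a given category, the values sum to 1 over the areal units, which is why larger units tend to receive larger values.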
proportion (distribution, classes=None)
Sometimes, however, we prefer to know the proportion of people of a given category in a unit. In our notations, it is defined as $p_\alpha(t) = n_\alpha(t) / n(t)$, where $n_\alpha(t)$ is the number of individuals of category $\alpha$ in the areal unit $t$ and $n(t)$ the total population of $t$.
Although the values of the proportion are easier to interpret (`x% of the individuals living in this areal unit belong to this category'), they are not a good indicator of segregation. They strongly depend on the relative proportion of individuals of the category in the geographical area being studied.
Parameters
distribution
dictionary. Takes a dictionary of dictionaries with distribution[areal_unit][category] = number
classes
dictionary, optional. Takes a dictionary of lists with classes[class] = [cat1, cat2, ...]
If not specified, the algorithm will use the categories found in distribution
Output
proportion
dictionary. Returns a dictionary of dictionaries with proportion[areal_unit][category] = value
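As an illustration only, the proportion can be computed along the following lines. `proportion_sketch` is a hypothetical helper, not the library's code, and it leaves out the optional `classes` argument.

```python
def proportion_sketch(distribution):
    """Share of each areal unit's population that belongs to each category."""
    result = {}
    for unit, counts in distribution.items():
        unit_total = sum(counts.values())
        if unit_total == 0:
            # An empty areal unit has no defined proportions
            continue
        result[unit] = {
            category: number / unit_total
            for category, number in counts.items()
        }
    return result


city = {"A": {0: 10, 1: 0, 2: 23},
        "B": {0: 0, 1: 10, 2: 8}}
pr = proportion_sketch(city)
print(round(pr["A"][0], 3))  # 0.303, i.e. 10 out of the 33 residents of A
```

Within one areal unit the values sum to 1 over the categories, so a category that is rare in the whole area will have low proportions almost everywhere, segregated or not.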
representation (distribution, classes=None)
The representation solves the problems linked to both the concentration and the proportion. The idea behind the measure of representation is that segregation is a departure from the situation where all categories would be spatially distributed at random. The properties of such a `random', unsegregated city are well known: the number of individuals of each category in each areal unit is given by a binomial distribution. The representation is thus defined as the number

$r_\alpha(t) = \frac{n_\alpha(t) / n(t)}{N_\alpha / N}$

where $n_\alpha(t)$ is the number of individuals of category $\alpha$ in the areal unit $t$, $n(t)$ the total population of $t$, $N_\alpha$ the total number of individuals of category $\alpha$ and $N$ the total population of the area being studied.
In the perfectly unsegregated city (number of individuals equal to the mean of the binomial distribution), $r_\alpha(t) = 1$. We would like to know the areal units that depart from the random situation with 99% confidence. Therefore, writing $\sigma_\alpha(t)$ for the standard deviation of $r_\alpha(t)$ in the unsegregated city, we will say that

$\alpha$ is overrepresented in $t$ iff $r_\alpha(t) > 1 + 2.57\,\sigma_\alpha(t)$
$\alpha$ is underrepresented in $t$ iff $r_\alpha(t) < 1 - 2.57\,\sigma_\alpha(t)$
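The decision rule can be sketched as a small function. This is an illustration, not part of marble: the name `classify` and the label strings are hypothetical, and the sketch assumes that $\sigma_\alpha(t)$ is the square root of the variance returned alongside $r_\alpha(t)$. The threshold is symmetric around 1, with the lower bound at $1 - 2.57\,\sigma_\alpha(t)$.

```python
import math

Z_99 = 2.57  # threshold used in the text for 99% confidence


def classify(r, variance, z=Z_99):
    """Label a (representation, variance) pair with the 99% rule."""
    sigma = math.sqrt(variance)  # assumption: sigma is the square root of Var[r]
    if r > 1 + z * sigma:
        return "overrepresented"
    if r < 1 - z * sigma:
        return "underrepresented"
    # Compatible with the random, unsegregated situation
    return "not significant"


# Values taken from the Examples section: r = 1.55, Var[r] = 0.054
print(classify(1.55, 0.054))  # not significant
```

With these values the upper threshold is about 1.60, so a representation of 1.55 does not yet count as overrepresentation: in a small population, large fluctuations around 1 are expected by chance alone.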
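Beware
The knowledge of both $r_\alpha(t)$ and its variance is necessary to decide whether a category is over- or underrepresented in an areal unit; the value of $r_\alpha(t)$ alone is not sufficient.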
Parameters
distribution
dictionary. Takes a dictionary of dictionaries with distribution[areal_unit][category] = number
classes
dictionary, optional. Takes a dictionary of lists with classes[class] = [cat1, cat2, ...]
If not specified, the algorithm will use the categories found in distribution
Output
representation
dictionary. Returns a dictionary of dictionaries with representation[areal_unit][category] = ($r_\alpha(t)$, $\mathrm{Var}\left[r_\alpha(t)\right]$)
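The following sketch shows one way to obtain a (value, variance) pair of this kind. It is not marble's implementation, and `representation_sketch` is a hypothetical name. The value is the proportion of the category in the areal unit divided by its proportion in the whole area. The variance formula is an assumption: it treats each individual of the category as landing in the unit with probability equal to the unit's share of the total population (a binomial model), which reproduces the values shown in the Examples section.

```python
def representation_sketch(distribution):
    """Return {unit: {category: (representation, variance)}}."""
    unit_totals = {u: sum(c.values()) for u, c in distribution.items()}
    total = sum(unit_totals.values())  # total population of the area

    # Total number of individuals per category
    cat_totals = {}
    for counts in distribution.values():
        for category, number in counts.items():
            cat_totals[category] = cat_totals.get(category, 0) + number

    result = {}
    for unit, counts in distribution.items():
        n_t = unit_totals[unit]
        if n_t == 0:
            continue  # empty unit: representation undefined
        result[unit] = {}
        for category, number in counts.items():
            n_cat = cat_totals[category]
            if n_cat == 0:
                continue  # empty category: representation undefined
            # Proportion in the unit over proportion in the whole area
            r = (number / n_t) / (n_cat / total)
            # Assumed binomial variance of r in the unsegregated city
            variance = (total / n_t - 1) / n_cat
            result[unit][category] = (r, variance)
    return result


city = {"A": {0: 10, 1: 0, 2: 23},
        "B": {0: 0, 1: 10, 2: 8}}
rep = representation_sketch(city)
r, var = rep["A"][0]
print(round(r, 3), round(var, 4))  # 1.545 0.0545
```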
Examples
Let us look at how to compute the concentration, proportion and representation for the categories 0, 1 and 2 in a fictional region with two areal units A and B.
>>> import marble as mb
>>> city = {"A": {0: 10, 1: 0, 2: 23},
...         "B": {0: 0, 1: 10, 2: 8}}
>>> co = mb.concentration(city)
>>> print(co['A'][0])
1.0
>>> pr = mb.proportion(city)
>>> print(pr['A'][0])
0.303
>>> rep = mb.representation(city)
>>> print(rep['A'][0])
(1.55, 0.054)
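The optional `classes` argument groups several categories under one label. A plausible reading, sketched below, is that the counts of the member categories are summed before the measure is computed. This is an assumption about the behaviour, not marble's code, and the helper `regroup` and the class names `low` and `high` are hypothetical.

```python
def regroup(distribution, classes):
    """Sum category counts into classes, areal unit by areal unit."""
    return {
        unit: {
            name: sum(counts.get(category, 0) for category in members)
            for name, members in classes.items()
        }
        for unit, counts in distribution.items()
    }


city = {"A": {0: 10, 1: 0, 2: 23},
        "B": {0: 0, 1: 10, 2: 8}}
classes = {"low": [0, 1], "high": [2]}
grouped = regroup(city, classes)
print(grouped)
# {'A': {'low': 10, 'high': 23}, 'B': {'low': 10, 'high': 8}}
```

The regrouped dictionary has the same shape as `distribution`, so under this assumption it could be passed to any of the three measures in place of the original counts.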