Example: Location Estimates of Population and Murder Rates
In [1]:
from scipy.stats import trim_mean
import pandas as pd
import numpy as np
import wquantiles
In [2]:
state = pd.read_csv('../data/state.csv')
state.sort_values("Population", ascending=False).head(10)
Out[2]:
| | State | Population | Murder.Rate | Abbreviation |
|---|---|---|---|---|
| 4 | California | 37253956 | 4.4 | CA |
| 42 | Texas | 25145561 | 4.4 | TX |
| 31 | New York | 19378102 | 3.1 | NY |
| 8 | Florida | 18801310 | 5.8 | FL |
| 12 | Illinois | 12830632 | 5.3 | IL |
| 37 | Pennsylvania | 12702379 | 4.8 | PA |
| 34 | Ohio | 11536504 | 4.0 | OH |
| 21 | Michigan | 9883640 | 5.4 | MI |
| 9 | Georgia | 9687653 | 5.7 | GA |
| 32 | North Carolina | 9535483 | 5.1 | NC |
In [3]:
trim_mean(state["Population"], 0.1)
Out[3]:
np.float64(4783697.125)
In [4]:
state["Population"].median()
Out[4]:
np.float64(4436369.5)
In [5]:
np.average(state["Murder.Rate"], weights=state["Population"])
Out[5]:
np.float64(4.445833981123393)
- By using the weighted mean, we are actually answering the question, “If we picked a US resident at random, what would the murder rate be, on average, in the state where they live?”
- If we had only taken the plain mean of state["Murder.Rate"], for example, we would have treated Alaska and California as if they had the same population.
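The difference between the plain and the weighted mean can be sketched on a small made-up example. The two regions, their rates, and their populations below are hypothetical values chosen for illustration and are not taken from state.csv. The sketch computes the weighted mean by hand as sum(w * x) / sum(w) and checks that np.average with weights gives the same number.

```python
import numpy as np

# Hypothetical toy data (NOT from state.csv): a small region with a low
# rate and a large region with a high rate.
rates = np.array([2.0, 6.0])
population = np.array([1_000_000, 9_000_000])

# Plain mean: every region counts equally, regardless of its size.
plain_mean = rates.mean()

# Weighted mean by hand: sum(w_i * x_i) / sum(w_i).
manual_weighted = (rates * population).sum() / population.sum()

# np.average with the weights argument computes the same quantity.
numpy_weighted = np.average(rates, weights=population)

print(plain_mean, manual_weighted, numpy_weighted)
```

In this toy case the plain mean sits halfway between the two rates, while the weighted mean is pulled toward the rate of the more populous region.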
In [6]:
wquantiles.median(state["Murder.Rate"], weights=state["Population"])
Out[6]:
np.float64(4.4)
In this context, the weighted median means that about half of all Americans live in states with a murder rate at or below this value. Because the Population feature is used as the weight, each state counts in proportion to its number of residents, which avoids the mistake mentioned above of treating every state as equally populous.
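One simple way to picture a weighted median is sketched below: sort the values, accumulate their weights, and take the first value at which the running weight reaches half of the total. The helper weighted_median and the toy data are hypothetical and written only for illustration. This is not the wquantiles implementation, and wquantiles.median may return a different number on the same input (for instance, if it interpolates between neighbouring values).

```python
import numpy as np

def weighted_median(values, weights):
    """Hypothetical helper: the smallest value whose cumulative weight
    reaches half of the total weight (no interpolation)."""
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Sort the values and carry their weights along.
    order = np.argsort(values)
    sorted_values = values[order]
    cum_weights = np.cumsum(weights[order])

    # First position where the running weight reaches 50% of the total.
    cutoff = 0.5 * cum_weights[-1]
    idx = np.searchsorted(cum_weights, cutoff)
    return sorted_values[idx]

# Hypothetical toy data (NOT from state.csv).
rates = [2.0, 4.0, 6.0]
population = [1_000_000, 2_000_000, 7_000_000]

print(np.median(rates))                     # unweighted median
print(weighted_median(rates, population))   # population-weighted median
```

Here the unweighted median is the middle rate, whereas the weighted version moves to the rate of the region where most of the hypothetical population lives.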