In [1]:
import pandas as pd
import numpy as np
In [2]:
state = pd.read_csv('../data/state.csv')
murder = state["Murder.Rate"]
In [3]:
murder.quantile([0.05, 0.25, 0.5, 0.75, 0.95])
Out[3]:
0.05    1.600
0.25    2.425
0.50    4.000
0.75    5.550
0.95    6.510
Name: Murder.Rate, dtype: float64
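Percentiles summarize the whole distribution, not just its center: here the median murder rate is 4.0 per 100,000, while the 5th and 95th percentiles are 1.6 and 6.51.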
In [4]:
ax = (state['Population']/1_000_000).plot.box()
ax.set_ylabel('Population (millions)')
Out[4]:
Text(0, 0.5, 'Population (millions)')
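- The green line marks the median.
- The bottom and top of the box mark the 25th and 75th percentiles ($Q1$ and $Q3$), so the height of the box is the IQR.
- Whiskers extend to the most extreme data points that lie within $1.5 \times IQR$ of the box, that is, within $[Q1 - 1.5\times IQR,\ Q3 + 1.5\times IQR]$.
- Any data points beyond the whiskers are plotted as circles and are often considered outliers.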
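The whisker rule described above can be checked by hand. The following is a minimal sketch that assumes the default 1.5 multiplier and uses a small set of made-up values (hypothetical, not taken from `state.csv`); the names `lower_fence` and `upper_fence` are illustrative only.

```python
import pandas as pd

# Hypothetical values in millions, for illustration only (not from state.csv)
values = pd.Series([0.6, 1.3, 2.9, 4.4, 5.7, 6.5, 9.9, 12.8, 19.4, 37.3])

# Quartiles and interquartile range
q1, q3 = values.quantile([0.25, 0.75])
iqr = q3 - q1

# Limits beyond which a point is drawn as a circle
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers stop at the most extreme data points inside the fences
inside = values[(values >= lower_fence) & (values <= upper_fence)]
whisker_low, whisker_high = inside.min(), inside.max()

# Everything outside the fences is plotted individually
outliers = values[(values < lower_fence) | (values > upper_fence)]
print(whisker_low, whisker_high, outliers.tolist())
```

Note that the whiskers end at actual data points, so they are usually shorter than the full $1.5 \times IQR$ allowance.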
Frequency Tables and Histograms
In [5]:
binnedPop = pd.cut(state['Population'], 10)
binnedPop.value_counts()
Out[5]:
Population
(526935.67, 4232659.0]      24
(4232659.0, 7901692.0]      14
(7901692.0, 11570725.0]      6
(11570725.0, 15239758.0]     2
(15239758.0, 18908791.0]     1
(18908791.0, 22577824.0]     1
(22577824.0, 26246857.0]     1
(33584923.0, 37253956.0]     1
(26246857.0, 29915890.0]     0
(29915890.0, 33584923.0]     0
Name: count, dtype: int64
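The table above is ordered by count, which scatters the empty bins to the bottom. As a minimal sketch on made-up numbers (hypothetical, not the state populations), `retbins=True` exposes the equal-width bin edges that `pd.cut` chose, and `sort_index()` puts the counts back in bin order.

```python
import numpy as np
import pandas as pd

# Hypothetical values, for illustration only
values = pd.Series([1, 2, 3, 4, 5, 10, 20, 41])

# Split the range into 4 equal-width bins and also return the bin edges
binned, edges = pd.cut(values, 4, retbins=True)

# Count per bin, ordered by bin rather than by frequency
counts = binned.value_counts().sort_index()
print(counts)

# Bin widths; the first bin is widened slightly so the minimum is included
print(np.diff(edges))
```

Empty bins are kept in the result, which is useful: a gap in the table is information about the distribution.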
In [6]:
ax = state["Population"].plot.hist(figsize=(4,4))
ax.set_xlabel("Population")
Out[6]:
Text(0.5, 0, 'Population')
Density Plots and Estimates
In [7]:
ax = state["Murder.Rate"].plot.hist(density=True, xlim=[0,11], bins=range(0,12), figsize=[4,4])
state["Murder.Rate"].plot.density(ax=ax, bw_method=0.3)
ax.set_xlabel("Murder Rate (per 100,000)")
Out[7]:
Text(0.5, 0, 'Murder Rate (per 100,000)')
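In the cell above, `density=True` rescales the histogram so that the bars have a total area of 1, which puts it on the same vertical scale as the density curve. A minimal sketch of that normalization with `np.histogram`, using made-up rates (hypothetical, not the values in `state.csv`) and the same bin edges as the plot:

```python
import numpy as np

# Hypothetical murder rates per 100,000, for illustration only
rates = np.array([0.9, 1.6, 2.0, 2.4, 3.1, 4.0, 4.4, 5.6, 6.0, 10.3])

# Same unit-width bin edges as bins=range(0, 12) in the plot
edges = np.arange(0, 12)

# Raw counts per bin, and the density-normalized heights
counts, _ = np.histogram(rates, bins=edges)
heights, _ = np.histogram(rates, bins=edges, density=True)

# Each height is count / (n * bin_width), so the bar areas sum to 1
widths = np.diff(edges)
total_area = (heights * widths).sum()
print(heights, total_area)
```

Because the bins here have width 1, each bar height is simply the proportion of observations in that bin.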