Happy 2s Day.
I was chatting with my brother the other day, and he opined that I may live in Canada’s largest municipality that doesn’t have a municipal bus transit system. Of course I was interested to find out which is the biggest Canadian city without buses, so here we go.
I was very pleased to find that pandas will read an HTML page and return a list of DataFrames, one for each table it was able to parse. Pair this with Wikipedia keeping a list of Canadian municipalities with public transport and we have a good start:
import pandas as pd
import requests
transit_url = "https://en.wikipedia.org/wiki/Public_transport_in_Canada"
r = requests.get(transit_url)
df_list = pd.read_html(r.text)
transit_municipalities = df_list[0]
Name | Municipalstatus[3][6] | County[15] | Incorporationyear[16] | 2021 Census of Population[15] | |||||
---|---|---|---|---|---|---|---|---|---|
Name | Municipalstatus[3][6] | County[15] | Incorporationyear[16] | Population(2021) | Population(2016) | Change | Land area(km²) | Populationdensity | |
0 | Charlottetown | City | Queens | 1855 | 38809 | 36094 | +7.5% | 44.27 | 2.0 |
1 | Summerside | City | Prince | 1877[c] | 16001 | 14839 | +7.8% | 28.21 | 2.0 |
2 | Alberton | Town | Prince | 1913 | 1301 | 1145 | +13.6% | 4.70 | 2.0 |
3 | Borden-Carleton | Town | Prince | 1995[d] | 788 | 724 | +8.8% | 12.94 | 2.0 |
4 | Cornwall | Town | Queens | 1995 | 6574 | 5348 | +22.9% | 28.21 | 2.0 |
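To see what `read_html` hands back without hitting the network, here is a minimal sketch on a made-up scrap of HTML (the table contents are invented, not Wikipedia's). It assumes an HTML parser such as lxml is installed, which `read_html` needs. It wraps the text in `StringIO` because recent pandas versions warn about being passed a literal HTML string.

```python
from io import StringIO

import pandas as pd

# Made-up HTML standing in for a downloaded page (not real Wikipedia markup).
html = """
<table>
  <tr><th>Name</th><th>Population</th></tr>
  <tr><td>Alpha</td><td>1,200</td></tr>
  <tr><td>Beta</td><td>345</td></tr>
</table>
<table>
  <tr><th>System</th><th>City</th></tr>
  <tr><td>Alpha Transit</td><td>Alpha</td></tr>
</table>
"""

# read_html returns one DataFrame per <table> it can parse.
tables = pd.read_html(StringIO(html))
print(len(tables))
print(list(tables[0].columns))

# `match` keeps only the tables whose text matches the given string or regex.
transit_tables = pd.read_html(StringIO(html), match="System")
print(list(transit_tables[0].columns))
```

The `match` argument can be a handy alternative to guessing which position in the list holds the table you want.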
Next, use the same technique to find a list of municipalities in each province.
provinces = ['British Columbia', 'Alberta', 'Saskatchewan', 'Manitoba', 'Ontario', 'Quebec',
             'New Brunswick', 'Newfoundland and Labrador', 'Nova Scotia', 'Prince Edward Island']
p_dfs = {}
def p_url(p):
    return f"https://en.wikipedia.org/wiki/List_of_municipalities_in_{p.replace(' ', '_')}"
for province in provinces:
    print(f"getting municipalities for {province}")
    r = requests.get(p_url(province))
    df_list = pd.read_html(r.text)
    if province == 'Ontario' or province == 'Manitoba':  # different html for these two.
        p_dfs[province] = df_list[1]
    else:
        p_dfs[province] = df_list[0]
    print(f"{province} has {len(p_dfs[province])} municipalities")
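Hard-coding `df_list[0]` versus `df_list[1]` works until a page gets edited. A slightly sturdier option might be to pick the table by the columns it contains. This is only a sketch: `pick_table` is a hypothetical helper of mine, not a pandas function, and the two toy frames stand in for whatever `read_html` returns.

```python
import pandas as pd


def pick_table(df_list, required):
    """Return the first DataFrame whose column labels mention every name in `required`."""
    for df in df_list:
        # str() also copes with MultiIndex columns, whose labels are tuples.
        labels = [str(col) for col in df.columns]
        if all(any(name in label for label in labels) for name in required):
            return df
    raise ValueError(f"no table with columns matching {required}")


# Toy stand-ins for a page whose first table is not the one we want.
infobox = pd.DataFrame({"Key": ["Capital"], "Value": ["Somewhere"]})
municipalities = pd.DataFrame({"Name": ["Alpha", "Beta"], "Population (2021)": [1200, 345]})

chosen = pick_table([infobox, municipalities], ["Name", "Population"])
print(chosen)
```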
Clean that data up a bit, and work around the annoyingness of MultiIndex:
canada_dfs = []
def find_series_by_column(df, look_for):
    # Column labels are MultiIndex tuples; match on the second level and return that column.
    for x in df.columns:
        if isinstance(x, tuple):
            if look_for in x[1]:
                return df[x[0]][x[1]]
for province, df in p_dfs.items():
    prov_series = pd.Series([province for x in range(len(df.index))])
    pop_series = find_series_by_column(df, 'Population')
    name_series = find_series_by_column(df, 'Name')
    canada_dfs.append(pd.DataFrame({"Province": prov_series, "Name": name_series, "Population": pop_series}))
canada_df = pd.concat(canada_dfs)
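Another way around the MultiIndex, if you'd rather not hunt through tuples, is to flatten the column labels first. Here is a minimal sketch on a toy frame shaped like the Wikipedia tables. The column labels are my own approximation, and the two rows are borrowed from the Prince Edward Island output earlier.

```python
import pandas as pd

# Toy frame with two-level column labels, shaped like the Wikipedia tables.
df = pd.DataFrame(
    [["Charlottetown", 38809, 36094], ["Summerside", 16001, 14839]],
    columns=pd.MultiIndex.from_tuples([
        ("Name", "Name"),
        ("2021 Census of Population", "Population (2021)"),
        ("2021 Census of Population", "Population (2016)"),
    ]),
)

# Keep only the second level so that plain string lookups work.
flat = df.copy()
flat.columns = flat.columns.get_level_values(1)

print(flat["Population (2021)"])
```

This only helps when the second-level labels are unique. If they are not, joining both levels into one string would be the safer route.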
Gives data like this:
Province | Name | Population | |
---|---|---|---|
378 | Ontario | Timmins | 41788 |
890 | Quebec | Saint-Thomas | 3249 |
145 | Saskatchewan | Tisdale | 2962.0 |
827 | Quebec | Saint-Patrice-de-Sherrington | 1960 |
45 | British Columbia | Terrace | 12017 |
41 | Manitoba | Louise | 2025 |
806 | Quebec | Saint-Modeste | 1162 |
88 | Saskatchewan | Langenburg | 1228.0 |
395 | Quebec | Mont-Tremblant | 9646 |
52 | Saskatchewan | Davidson | 1044.0 |
Do an anti-join to find the municipalities without transit:
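# trans_df is not defined in the snippets above; it is presumably the transit table
# (transit_municipalities) reshaped to have 'Name' and 'Province' columns.
with_transit_df = canada_df.merge(trans_df, on=['Name', 'Province'], indicator=True, how='left')
without_transit_df = with_transit_df.loc[with_transit_df._merge == 'left_only', :].drop(columns='_merge')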
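If the anti-join trick is new to you, here it is on toy data. The place names and numbers are made up; only the `merge(..., how="left", indicator=True)` pattern matches what is used above.

```python
import pandas as pd

# Toy stand-ins for canada_df and the transit table; the rows are made up.
all_places = pd.DataFrame({
    "Province": ["Ontario", "Ontario", "Quebec"],
    "Name": ["Alpha", "Beta", "Gamma"],
    "Population": [5000, 3000, 1000],
})
with_transit = pd.DataFrame({
    "Province": ["Ontario"],
    "Name": ["Alpha"],
})

# indicator=True adds a _merge column saying where each row was found.
merged = all_places.merge(with_transit, on=["Name", "Province"], how="left", indicator=True)

# Rows found only on the left side have no match in the transit table.
no_transit = merged[merged["_merge"] == "left_only"].drop(columns="_merge")
print(no_transit)
```

The catch is that the match is exact: a name spelled differently in the two tables counts as "no transit".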
A little data cleanup:
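import re
# Strip footnote markers such as [12] from the raw population values.
without_transit_df['Population2'] = without_transit_df['Population'].fillna("0").map(lambda x:
    str(re.sub(r"\[\d+\]", "", str(x)))
)
# Drop the ".0" left by float columns, stray "nan" strings and thousands separators, then convert to int.
without_transit_df['Population2'] = without_transit_df['Population2'].map(
    lambda x: str(x).replace(".0", "").replace("nan", "0").replace(",", "")
).replace('', 0).astype('int32')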
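A possibly tidier variant of the same cleanup leans on `pd.to_numeric` instead of chained string replaces. This is a sketch on made-up values that cover the cases handled above (footnotes, floats, commas, blanks), not a drop-in replacement I have run against the real tables.

```python
import pandas as pd

# Made-up raw values: thousands separator, float-as-string, footnote, missing, blank.
raw = pd.Series(["41,788", "2962.0", "1044[12]", None, ""])

cleaned = (
    raw.fillna("")
    .astype(str)
    .str.replace(r"\[[^\]]*\]", "", regex=True)  # drop footnote markers like [12] or [g]
    .str.replace(",", "", regex=False)           # drop thousands separators
)

# Anything that still is not a number (blank, missing) becomes NaN, then 0.
population = pd.to_numeric(cleaned, errors="coerce").fillna(0).astype("int64")
print(population.tolist())
```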
And sort:
Province | Name | Population2 | |
---|---|---|---|
47 | British Columbia | Vancouver | 662248 |
44 | British Columbia | Surrey | 568322 |
1186 | Ontario | Markham | 328966 |
1356 | Ontario | Vaughan | 306233 |
2 | British Columbia | Burnaby | 249125 |
1156 | Ontario | Kitchener | 233222 |
41 | British Columbia | Richmond | 209937 |
1286 | Ontario | Richmond Hill | 195022 |
1251 | Ontario | Oshawa | 159458 |
7 | British Columbia | Coquitlam | 148625 |
19 | British Columbia | Kelowna | 144576 |
69 | British Columbia | Langley | 132603 |
1025 | Ontario | Cambridge | 129920 |
1371 | Ontario | Whitby | 128377 |
972 | Ontario | Ajax | 119677 |
85 | British Columbia | Saanich | 117735 |
2437 | Quebec | Terrebonne | 111575 |
2596 | Newfoundland and Labrador | St. John’s | 108860 |
11 | British Columbia | Delta | 108455 |
1360 | Ontario | Waterloo | 104986 |
2872 | Nova Scotia | Cape Breton | 94285 |
1047 | Ontario | Clarington | 92013 |
49 | British Columbia | Victoria | 91867 |
1268 | Ontario | Pickering | 91771 |
23 | British Columbia | Maple Ridge | 90990 |
78 | British Columbia | North Vancouver | 88168 |
1457 | Quebec | Brossard | 85721 |
1886 | Quebec | Repentigny | 84285 |
1221 | Ontario | Newmarket | 84224 |
2496 | New Brunswick | Moncton | 79470 |
28 | British Columbia | New Westminster | 78916 |
1145 | Ontario | Kawartha Lakes | 75423 |
2497 | New Brunswick | Saint John | 69895 |
1022 | Ontario | Caledon | 66502 |
1226 | Ontario | Norfolk | 64044 |
34 | British Columbia | Port Coquitlam | 61498 |
1113 | Ontario | Halton Hills | 61161 |
29 | British Columbia | North Vancouver | 58120 |
1436 | Quebec | Blainville | 56863 |
990 | Ontario | Aurora | 55445 |
1766 | Quebec | Mirabel | 50513 |
1526 | Quebec | Dollard-des-Ormeaux | 48899 |
2881 | Nova Scotia | Kings[g] | 47404 |
1751 | Quebec | Mascouche | 46692 |
21 | British Columbia | Langford | 46584 |
2472 | Quebec | Victoriaville | 46130 |
1372 | Ontario | Whitchurch-Stouffville | 45837 |
1112 | Ontario | Haldimand | 45608 |
1097 | Ontario | Georgina | 45418 |
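For completeness, the sort itself is presumably a one-liner along these lines. The sketch uses three rows copied from the table above rather than the real `without_transit_df`.

```python
import pandas as pd

# Three rows copied from the table above; the real frame is without_transit_df.
sample = pd.DataFrame({
    "Province": ["Ontario", "British Columbia", "Ontario"],
    "Name": ["Markham", "Vancouver", "Kawartha Lakes"],
    "Population2": [328966, 662248, 75423],
})

# Largest first; sort_values keeps the original index labels, which is why
# the index column in the table is out of order.
top = sample.sort_values("Population2", ascending=False)[["Province", "Name", "Population2"]]
print(top.head())
```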
To be honest, this really isn’t a satisfactory answer for me. Most of the entries on this list were either missed in the join (e.g. Metro Vancouver not matching Vancouver, with a similar issue for York Region), or are municipalities that are part of a larger municipality served by public transit (e.g. Markham). Fixing those issues would be possible, but a lot of work. It was time for the Pareto principle, so I looked through the list by hand. This isn’t definitive, but I think that Kawartha Lakes, Ontario (population ~75,000), is the largest municipality in Canada without a municipal bus service. They do have a municipal service with these little half-bus, half-van things though, so maybe that counts.
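If I ever did want to fix the join properly, one approach might be a hand-maintained lookup from each municipality to the regional name the transit table uses. Here is a rough sketch. `REGION_OF` and `transit_names` are hypothetical and deliberately tiny, and a real version would need an entry for every affected municipality.

```python
import pandas as pd

# Hypothetical, hand-written lookup from municipality to the regional name
# used in a transit table.
REGION_OF = {
    "Vancouver": "Metro Vancouver",
    "Surrey": "Metro Vancouver",
    "Markham": "York Region",
    "Vaughan": "York Region",
}

candidates = pd.DataFrame({"Name": ["Vancouver", "Markham", "Kawartha Lakes"]})

# Made-up stand-in for the names that appear in the transit table.
transit_names = {"Metro Vancouver", "York Region"}

# Fall back to the municipality's own name when it has no regional entry.
candidates["JoinName"] = candidates["Name"].map(REGION_OF).fillna(candidates["Name"])

still_unmatched = candidates[~candidates["JoinName"].isin(transit_names)]
print(still_unmatched["Name"].tolist())
```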