The third dataset we’ll use is Natural Earth’s road dataset. We can obtain this dataset by downloading the shapefile
We will combine these datasets to create the following map of infrastructure in Alaska:
For these notes, all the data is inside a data/ directory at the same level as the notebook.
Import data
Let’s start by loading our libraries and then importing the datasets we will use.
import osimport pandas as pdimport matplotlib.pyplot as pltimport geopandas as gpdfrom shapely.geometry import box # To create polygon bounding boxpd.set_option("display.max.columns", None)# -------------------------------------# Import and simplify states polygonsstates = gpd.read_file(os.path.join('data','tl_2022_us_state','tl_2022_us_state.shp') )# Import Natural Earth populated places pointsplaces = gpd.read_file(os.path.join('data','ne_50m_populated_places_simple','ne_50m_populated_places_simple.shp') )# Import ferry routes linesroads = gpd.read_file(os.path.join('data','ne_10m_roads','ne_10m_roads.shp') )
Check-in
Use a for loop to iterate over the three geo-dataframes we imported and change their column names to lower caps.
Prepare Alaska multipolygon
Let’s start by taking taking a look at our states geo-dataframe. Since this is a geospatial dataset, exploration should include at least checking the head of the dataset, plotting the data, and looking at its CRS.
print(f"CRS: {states.crs}")states.head(3)
CRS: EPSG:4269
region
division
statefp
statens
geoid
stusps
name
lsad
mtfcc
funcstat
aland
awater
intptlat
intptlon
geometry
0
3
5
54
01779805
54
WV
West Virginia
00
G4000
A
62266456923
489045863
+38.6472854
-080.6183274
POLYGON ((-77.75438 39.33346, -77.75422 39.333...
1
3
5
12
00294478
12
FL
Florida
00
G4000
A
138962819934
45971472526
+28.3989775
-082.5143005
MULTIPOLYGON (((-83.10874 24.62949, -83.10711 ...
2
2
3
17
01779784
17
IL
Illinois
00
G4000
A
143778515726
6216539665
+40.1028754
-089.1526108
POLYGON ((-87.89243 38.28285, -87.89334 38.282...
states.plot()
For this lesson, we are intersted in plotting data only Alaska, se let’s select this data:
Notice that the way the Alaska multipolygon is plotted under the NAD83/EPSG:4269 CRS separates the islands and unnaturally elongates the map. To fix this, we will reproject the Alaska geo-dataframe to the EPSG:3338 CRS. This CRS is a projected CRS, better suited for working with data from Alaska:
# Reproject to CRS optimized for Alaskaalaska = alaska.to_crs('epsg:3338')# Inspect the new CRSprint('Is this CRS projected? ', alaska.crs.is_projected)alaska.crs
Is this CRS projected? True
<Projected CRS: EPSG:3338>
Name: NAD83 / Alaska Albers
Axis Info [cartesian]:
- X[east]: Easting (metre)
- Y[north]: Northing (metre)
Area of Use:
- name: United States (USA) - Alaska.
- bounds: (172.42, 51.3, -129.99, 71.4)
Coordinate Operation:
- name: Alaska Albers (meters)
- method: Albers Equal Area
Datum: North American Datum 1983
- Ellipsoid: GRS 1980
- Prime Meridian: Greenwich
alaska.plot()
Prepare populated places points
Let’s now explore the populated places data.
print(f"CRS: {places.crs}")places.head(3)
CRS: EPSG:4326
scalerank
natscale
labelrank
featurecla
name
namepar
namealt
nameascii
adm0cap
capin
worldcity
megacity
sov0name
sov_a3
adm0name
adm0_a3
adm1name
iso_a2
note
latitude
longitude
pop_max
pop_min
pop_other
rank_max
rank_min
meganame
ls_name
max_pop10
max_pop20
max_pop50
max_pop300
max_pop310
max_natsca
min_areakm
max_areakm
min_areami
max_areami
min_perkm
max_perkm
min_permi
max_permi
min_bbxmin
max_bbxmin
min_bbxmax
max_bbxmax
min_bbymin
max_bbymin
min_bbymax
max_bbymax
mean_bbxc
mean_bbyc
timezone
un_fid
pop1950
pop1955
pop1960
pop1965
pop1970
pop1975
pop1980
pop1985
pop1990
pop1995
pop2000
pop2005
pop2010
pop2015
pop2020
pop2025
pop2050
min_zoom
wikidataid
wof_id
capalt
name_en
name_de
name_es
name_fr
name_pt
name_ru
name_zh
label
name_ar
name_bn
name_el
name_hi
name_hu
name_id
name_it
name_ja
name_ko
name_nl
name_pl
name_sv
name_tr
name_vi
ne_id
name_fa
name_he
name_uk
name_ur
name_zht
geonamesid
fclass_iso
fclass_us
fclass_fr
fclass_ru
fclass_es
fclass_cn
fclass_tw
fclass_in
fclass_np
fclass_pk
fclass_de
fclass_gb
fclass_br
fclass_il
fclass_ps
fclass_sa
fclass_eg
fclass_ma
fclass_pt
fclass_ar
fclass_jp
fclass_ko
fclass_vn
fclass_tr
fclass_id
fclass_pl
fclass_gr
fclass_it
fclass_nl
fclass_se
fclass_bd
fclass_ua
fclass_tlc
geometry
0
10
1
5
Admin-1 region capital
Bombo
None
None
Bombo
0
None
0
0
Uganda
UGA
Uganda
UGA
Bamunanika
UG
None
0.583299
32.533300
75000
21000
0.0
8
7
None
None
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
None
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
7.0
Q4940747
1141906025
0
Bombo
Bombo
Bombo
Bombo
Bombo
Бомбо
邦博
None
بومبو
বোম্বো
Μπόμπο
बॉम्बो
Bombo
Bombo
Bombo
ボンボ
봄보
Bombo
Bombo
Bombo
Bombo
Bombo
1159113923
بمبو
בומבו
Бомбо
بومبو
邦博
NaN
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
POINT (32.53330 0.58330)
1
10
1
5
Admin-1 region capital
Fort Portal
None
None
Fort Portal
0
None
0
0
Uganda
UGA
Uganda
UGA
Kabarole
UG
None
0.671004
30.275002
42670
42670
0.0
7
7
None
None
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
Africa/Kampala
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
7.0
Q500107
421174009
0
Fort Portal
Fort Portal
Fort Portal
Fort Portal
Fort Portal
Форт-Портал
波特爾堡
None
فورت بورتال
ফোর্ট পোর্টাল
Φορτ Πορτάλ
फोर्ट पोर्टल
Fort Portal
Fort Portal
Fort Portal
フォート・ポータル
포트포털
Fort Portal
Fort Portal
Fort Portal
Fort Portal
Fort Portal
1159113959
فورت پورتال
פורט פורטל
Форт-Портал
فورٹ پورٹل
波特爾堡
233476.0
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
POINT (30.27500 0.67100)
2
10
1
3
Admin-1 region capital
Potenza
None
None
Potenza
0
None
0
0
Italy
ITA
Italy
ITA
Basilicata
IT
None
40.642002
15.798997
69060
69060
0.0
8
8
None
None
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
Europe/Rome
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
0.0
7.0
Q3543
101752567
0
Potenza
Potenza
Potenza
Potenza
Potenza
Потенца
波坦察
None
بوتنسا
পোটেঞ্জা
Ποτέντσα
पोटेंजा
Potenza
Potenza
Potenza
ポテンツァ
포텐차
Potenza
Potenza
Potenza
Potenza
Potenza
1159117259
پوتنزا
פוטנצה
Потенца
پوتینتسا
波坦察
3170027.0
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
None
POINT (15.79900 40.64200)
places.plot()
This dataset has the EPSG:4326 CRS. Remember this is the EPSG code for the WGS 84 CRS. This is not a surprise since the places data is global and EPSG:4326/WGS84 is the most widely used CRS for such data.
Let’s see what happens when we try to plot this data on top of Alaska:
This is better: we can see there the Alaska poygons and some of the places points on top of it. Our next step is to select these points.
Clipping
Clipping means using a polygon (or polygons) to only select geospatial data within them. Clipping a geopandas.GeoDataFrame is simple using the geopandas clip() function. The general syntax is:
updated_geodf = geopandas.clip(geodf, mask)
where:
updated_geodf is the output of the method: the intersection of the geometries in geodf with mask,
geodf is the geopandas.GeoDataFrame we want to clip,
mask is a geopandas.GeoDataFrame with the polygon(s) we want to use for clipping. This mask must be in the same CRS as geodf!
You may have already noticed that the roads data is not in the same CRS as the alaska polygons, so these geo-datasets shound’t interact until they’re in the same CRS. Before jumping right into reprojecting and clipping, we will subset the data to select only US roads:
Reduce your tabular data before reducing via geometries
Geospatial operations are usually costly in terms of computing power. The more detailed our geometries are, the longer in takes to do geospatial computations. It’s a good practice to reduce your data as much as possible before applying any geospatial transformation.
We will now compose functions to clip usa_roads using the alaska multipolygon. Notice we are using the ouput of usa_roads.to_crs(alaska.crs) directly and thus not changing the usa_roads geo-dataframe or creating new variables:
Notice how the lines break on the small islands? However, in the usa_roads there are no broken lines. This should make us suspect we are leaving data out and clipping exactly to the polygons in alaska is not quite what we want.
Clipping with a bounding box
We will clip the usa_roads geo-dataframe with the bounding box of alaska instead of its polygons. To create a bounding box, we first use the box() function we imported from shapely.geometry. The syntax for box() is:
box(minx, miny, maxx, maxy)
the output is a polygon representing a box constructed like this:
If we want to create a shapely polygon from the bounds of a geo-dataframe gdf, a more straightforward syntax is:
In the last syntax we used the asterisk * as an unpacking operator on the array alaska.total_bounds. Think about it as unpacking the elements of alaska.total_bounds and assigning them one-by-one to the paremeters minx, miny, maxx, maxy of the box() function.
This is a good article explaining more about unpacking with * in Python: https://geekflare.com/python-unpacking-operators/
Notice that the bounding box is not a geodataframe, it is a stand alone, abstract polygon without any geospatial information. To interpret this polygon as something on the Earth’s surface we need to wrap it into a geo-datfrane abd assign it a CRS:
Notice the difference between the two clipping methods:
fig, (ax1, ax2) = plt.subplots(2, 1, figsize=(10,10))ak_roads.plot(ax=ax1)ax1.set_title('Roads clipped with AK multipolygon')ak_complete_roads.plot(ax=ax2)ax2.set_title('Roads clipped with AK bounding box')plt.show()
Plot
Finally, we can put all our data together in the same map:
Code
fig, ax = plt.subplots(figsize=(11,5))ax.axis('off')alaska.plot(ax=ax, color='whitesmoke', edgecolor='0.7')ak_complete_roads.plot(ax=ax, zorder=1, # Specify layer plotting order column='type', legend=True, legend_kwds={'title': "Road Types", 'loc': 'upper left','bbox_to_anchor':(0,0.9),'fontsize':'small'} )ak_places.plot(ax=ax, zorder=2, # Specify layer plotting order color='red', marker='s'# Square marker )# Add city names as text annotationsfor x, y, name inzip(ak_places.geometry.x, ak_places.geometry.y, ak_places['name']): ax.text(x-30000, y+20000, name, fontsize=8, ha='right')ax.set_title("Road Networks and Major Cities in Alaska", fontsize=14, fontweight='bold')plt.show()
Exercise
Notice the overlaying labels for Anchorage and Valdez:
Update the map so these labels do not overlap. One way to do it is using an if when iterating over the Alaska populated places.