All About GeoSeries and Spatial Relationships

In this chapter we’ll explore the attributes and methods of a GeoPandas GeoSeries. Another aspect of geospatial data is how they relate to each other in space. Thus, you will learn the different types of spatial relationships, and how to use them in Python to query the data or to perform spatial joins.

Preparation

Let’s first do what we always do:

# import pandas and matplotlib
import pandas as pd
import matplotlib.pyplot as plt

# import geospatial libraries
import geopandas as gpd
from shapely.geometry import Point
# loading the restaurants DataFrame
restaurants = pd.read_csv('Data/Cleansed_Data/Berlin_Restaurants')

# creating a GeoDataFrame
geometry = restaurants.apply(lambda x: Point((x.lng, x.lat)), axis=1)
crs = {'init':'epsg:4326'}
restaurants_geodf = gpd.GeoDataFrame(restaurants, crs=crs, geometry=geometry)
type(restaurants_geodf)
Out[2]: geopandas.geodataframe.GeoDataFrame

GeoSeries Attributes and Methods

Now we’ll explore the attributes and methods of a GeoPandas GeoSeries, which you can think of as the geometry column of the GeoDataFrame:

berlin_districs = gpd.read_file('Data/Cleansed_Data/Berlin_Districts.shp')
berlin_districs.head(2)
type(berlin_districs.geometry)
Out[4]: geopandas.geoseries.GeoSeries

The Area Attribute

GeoPandas inherits a number of useful methods and attributes from the Shapely package. For example, the area attribute returns the calculated area of a geometry:

berlin_districs.geometry.area
0     0.011846
1     0.008561
2     0.022161
3     0.013689
4     0.005938
5     0.006902
6     0.008184
7     0.012165
8     0.013552
9     0.005218
10    0.002700
11    0.007011
dtype: float64

Let’s print the sorted areas:

# calculate area of each district
district_area = berlin_districs.geometry.area

# print the areas sorted ...
print(district_area.sort_values(ascending=False))

# ... and the crs in use
print()
print(berlin_districs.crs)
2     0.022161
3     0.013689
8     0.013552
7     0.012165
0     0.011846
1     0.008561
6     0.008184
11    0.007011
5     0.006902
4     0.005938
9     0.005218
10    0.002700
dtype: float64

{'init': 'epsg:4326'}

The third row (with index 2) is the largest district. Recall that the distance unit for a geometry is dependent on its Coordinate Reference System (CRS). The CRS for the Berlin district data is epsg:4326. Since this uses decimal degrees for distance, the area units here are decimal degrees squared.

Let’s find the area in a way that’s a little more comprehensible to us: kilometers squared. We can change the CRS to one that uses meters for distance: epsg 3857 using .to_crs() and then convert meters^2 to kilometers^2:

# create a copy of berlin_districts that uses EPSG:3857
berlin_districts_3857 = berlin_districs.to_crs(epsg=3857)

# define a variable for m^2 to km^2
sqm_to_sqkm = 10**6

# get area in kilometers squared
district_area_km = berlin_districts_3857.geometry.area / sqm_to_sqkm
print(district_area_km.sort_values(ascending=False))
print()
print(berlin_districts_3857.crs)
2     450.397219
3     279.266143
8     275.457357
7     247.776994
0     241.658604
1     174.273771
6     166.680203
11    142.522777
5     140.619878
4     120.712189
9     106.288207
10     54.961010
dtype: float64

{'init': 'epsg:3857', 'no_defs': True}

The Centroid Method

The centroid method returns the point at the center of each geometry in a GeoSeries:

# centroid of first polygon
print(berlin_districs.geometry.centroid[0])
POINT (13.29133831905174 52.59568176222349)

… which could also be stored in a new column:

# create center column from the centroid
berlin_districs['center'] = berlin_districs.geometry.centroid

berlin_districs.head(2)
# store the centroids in a GeoSeries
berlin_centroids = berlin_districs.geometry.centroid

# plot districts as ax and add the centroids
ax = berlin_districs.plot(color='lightsteelblue', figsize=(16,16))
berlin_centroids.plot(ax=ax, color='black', markersize=19)

# show the plot
plt.show()

The Distance Method

The distance method of a GeoSeries finds the minimum distance from the geometry it is called on to a geometry passed in as an argument:

# defining centroid of Mitte
centroid_mitte = berlin_districts_3857.geometry.centroid[9]

# create a geometry
brandenburg_gate = Point(13.377704, 52.516275)

# distance from brandenburg_gate to centroid of Mitte
brandenburg_gate.distance(other=centroid_mitte)
Out[11]: 7055667.0846656505

The distance unit depends on the CRS which is epsg:3857 in our case, i.e. measured in meters.

Spatial Relationships within a GeoSeries

Shapely also allows us to explore spatial relationships between individual geometries. Imagine that we’ve created polygons for each district in Berlin:

# districts
reinickendorf = berlin_districs.loc[0, 'geometry']
char_wilm = berlin_districs.loc[1, 'geometry']
trep_koep = berlin_districs.loc[2, 'geometry']
pankow = berlin_districs.loc[3, 'geometry']
neukoelln = berlin_districs.loc[4, 'geometry']
lichtenberg = berlin_districs.loc[5, 'geometry']
marz_heller = berlin_districs.loc[6, 'geometry']
spandau = berlin_districs.loc[7, 'geometry']
steg_zehl = berlin_districs.loc[8, 'geometry']
mitte = berlin_districs.loc[9, 'geometry']
fried_kreuz = berlin_districs.loc[10, 'geometry']
temp_schoen = berlin_districs.loc[11, 'geometry']

Since Shapely has no method to visualize multiple geometries, we quickly put some of our geometries in a GeoSeries and plot that:

gpd.GeoSeries([mitte, char_wilm, neukoelln, brandenburg_gate]).plot(cmap='YlOrRd', figsize=(10,8));

We see that the Brandenburg Gate is located in the Mitte district (in bright yellow).

Mitte (in bright yellow) and Charlottenburg (in dark yellow) are neighbors, while Mitte (in bright yellow) and Neukölln (in bright red) are not.

Now let’s try methods that translate relationships into code:

# contains-method to check whether Mitte contains the famous landmark
mitte.contains(brandenburg_gate)
Out[16]: True
# contains-method to check whether Neukölln contains the famous landmark
neukoelln.contains(brandenburg_gate)
Out[17]: False
# now the same with the within-method --> the order changes!
brandenburg_gate.within(mitte)
Out[18]: True
# touches-method
mitte.touches(char_wilm)
Out[19]: True

Spatial Relationships in GeoPandas

We just learned about certain spatial relationships between two individual geometry objects. The GeoDataFrame has similar methods applicable to all of its geometries.

In the next example, we’re going to check which restaurants are within the polygon of the Mitte district. To perform this operation, we need to make sure both data are using the same Coordinate Reference System (CRS):

# check crs of restaurants
restaurants_geodf.crs
Out[20]: {'init': 'epsg:4326'}
# check crs of berlin_districts
berlin_districs.crs
Out[21]: {'init': 'epsg:4326'}
# checking which restaurants are within the polygon of Mitte
restaurants_geodf.within(mitte).tail()
Out[22]:
3976     True
3977    False
3978    False
3979    False
3980    False
dtype: bool

The result of this operation is a Boolean Series that can be directly used as a mask to filter the original dataframe:

mitte_restaurants = restaurants_geodf[restaurants_geodf.within(mitte)]
mitte_restaurants.shape[0]
Out[23]: 477

For our dataset, there are 477 restaurants located in Mitte.

Spatial Joins

It would be interesting to know in which district each restaurant is located in. Unfortunately, the districts’ names are in another GeoDataFrame. So we need to combine – or: join – both datasets. Joining on location is called a spatial join.

To spatially join two GeoDataFrames, GeoPandas provides the .sjoin() method. It takes an argument op (short for operation) to specify the type of spatial join:

gpd.sjoin(first_gdf, second_gdf, op=<operation>)
  • The first_gdf is the GeoDataFrame we want to add information to.
  • The second_gdf is the GeoDataFrame that contains the information we want to add.

Note, that you need to make sure both GeoDataFrames use the same CRS before you join them! In our case, both GeoDataFrames have epsg:4326.

  • Finally we specify which spatial operation we want to use. The operation can be one of three types: intersects, contains, or within

Here, we will use within as we want GeoPandas to check whether or not rows in the restaurants dataset (left GeoDataFrame) are within the rows of the districts dataset(right GeoDataFrame). Note how the order of the arguments is important here!

joined = gpd.sjoin(restaurants_geodf, berlin_districs, op='within')

And here we’ve chained several functions to group, count, and sort the joined dataframe to see how many restaurants are within each district:

joined[['name', 'DISTRICT']].groupby('DISTRICT').agg('count').sort_values('name', ascending=False)

Advanced Visualization

Let’s again turn to the Folium package, which is a Python library for creating and styling interactive maps. If you need a refresher, check this article.

import folium

# slice the centroid of Berlin and save the variable as point
point = berlin_districs.center[9]

# reverse the order for folium location array
folium_location = [point.y, point.x]

# construct a folium map for Berlin
berlin_mitte_map = folium.Map(location=folium_location, zoom_start=12)
# slice all restaurants in Mitte
rest_in_mitte = joined[joined['DISTRICT'] == 'Mitte']
rest_in_mitte.shape
Out[34]: (477, 13)
# create a marker for each restaurant
for row in rest_in_mitte.iterrows():
    row_values = row[1] 
    location = [row_values['lat'], row_values['lng']]
    popup = popup = '<strong>' + row_values['name'] + '</strong>'
    marker = folium.Marker(location=location, popup=popup)
    marker.add_to(berlin_mitte_map)

display(berlin_mitte_map)

. . . . . . . . . . . . . .

I hope this little tutorial helps to work with geospatial data! If you think I’ve missed something or have made an error somewhere, please let me know: bb@brittabettendorf.com