## In this chapter we’ll explore the attributes and methods of a GeoPandas GeoSeries. Another aspect of geospatial data is how they relate to each other in space. Thus, you will learn the different types of spatial relationships, and how to use them in Python to query the data or to perform spatial joins.

### Preparation

Let’s first do what we always do:

```
# import pandas and matplotlib
import pandas as pd
import matplotlib.pyplot as plt
# import geospatial libraries
import geopandas as gpd
from shapely.geometry import Point
```

```
# loading the restaurants DataFrame
restaurants = pd.read_csv('Data/Cleansed_Data/Berlin_Restaurants')
# creating a GeoDataFrame
geometry = restaurants.apply(lambda x: Point((x.lng, x.lat)), axis=1)
crs = {'init':'epsg:4326'}
restaurants_geodf = gpd.GeoDataFrame(restaurants, crs=crs, geometry=geometry)
type(restaurants_geodf)
```

`Out[2]: geopandas.geodataframe.GeoDataFrame`

### GeoSeries Attributes and Methods

Now we’ll explore the attributes and methods of a GeoPandas GeoSeries, which you can think of as the geometry column of the GeoDataFrame:

```
berlin_districs = gpd.read_file('Data/Cleansed_Data/Berlin_Districts.shp')
berlin_districs.head(2)
```

`type(berlin_districs.geometry)`

`Out[4]: geopandas.geoseries.GeoSeries`

*The Area Attribute*

GeoPandas inherits a number of useful methods and attributes from the Shapely package. For example, the `area`

attribute returns the calculated area of a geometry:

`berlin_districs.geometry.area`

```
0 0.011846
1 0.008561
2 0.022161
3 0.013689
4 0.005938
5 0.006902
6 0.008184
7 0.012165
8 0.013552
9 0.005218
10 0.002700
11 0.007011
dtype: float64
```

Let’s print the sorted areas:

```
# calculate area of each district
district_area = berlin_districs.geometry.area
# print the areas sorted ...
print(district_area.sort_values(ascending=False))
# ... and the crs in use
print()
print(berlin_districs.crs)
```

```
2 0.022161
3 0.013689
8 0.013552
7 0.012165
0 0.011846
1 0.008561
6 0.008184
11 0.007011
5 0.006902
4 0.005938
9 0.005218
10 0.002700
dtype: float64
{'init': 'epsg:4326'}
```

The third row (with index 2) is the largest district. Recall that the distance unit for a geometry is dependent on its *Coordinate Reference System (CRS)*. The CRS for the Berlin district data is *epsg:4326*. Since this uses decimal degrees for distance, the area units here are decimal degrees squared.

Let’s find the area in a way that’s a little more comprehensible to us: kilometers squared. We can change the CRS to one that uses meters for distance: *epsg 3857* using `.to_crs()`

and then convert meters^2 to kilometers^2:

```
# create a copy of berlin_districts that uses EPSG:3857
berlin_districts_3857 = berlin_districs.to_crs(epsg=3857)
# define a variable for m^2 to km^2
sqm_to_sqkm = 10**6
# get area in kilometers squared
district_area_km = berlin_districts_3857.geometry.area / sqm_to_sqkm
print(district_area_km.sort_values(ascending=False))
print()
print(berlin_districts_3857.crs)
```

```
2 450.397219
3 279.266143
8 275.457357
7 247.776994
0 241.658604
1 174.273771
6 166.680203
11 142.522777
5 140.619878
4 120.712189
9 106.288207
10 54.961010
dtype: float64
{'init': 'epsg:3857', 'no_defs': True}
```

*The Centroid Method*

The `centroid`

method returns the point at the center of each geometry in a GeoSeries:

```
# centroid of first polygon
print(berlin_districs.geometry.centroid[0])
```

`POINT (13.29133831905174 52.59568176222349)`

… which could also be stored in a new column:

```
# create center column from the centroid
berlin_districs['center'] = berlin_districs.geometry.centroid
berlin_districs.head(2)
```

```
# store the centroids in a GeoSeries
berlin_centroids = berlin_districs.geometry.centroid
# plot districts as ax and add the centroids
ax = berlin_districs.plot(color='lightsteelblue', figsize=(16,16))
berlin_centroids.plot(ax=ax, color='black', markersize=19)
# show the plot
plt.show()
```

*The Distance Method*

The `distance`

method of a GeoSeries finds the minimum distance from the geometry it is called on to a geometry passed in as an argument:

```
# defining centroid of Mitte
centroid_mitte = berlin_districts_3857.geometry.centroid[9]
# create a geometry
brandenburg_gate = Point(13.377704, 52.516275)
# distance from brandenburg_gate to centroid of Mitte
brandenburg_gate.distance(other=centroid_mitte)
```

`Out[11]: 7055667.0846656505`

The distance unit depends on the CRS which is *epsg:3857* in our case, i.e. measured in meters.

### Spatial Relationships within a GeoSeries

Shapely also allows us to explore spatial **relationships** between individual geometries. Imagine that we’ve created polygons for each district in Berlin:

```
# districts
reinickendorf = berlin_districs.loc[0, 'geometry']
char_wilm = berlin_districs.loc[1, 'geometry']
trep_koep = berlin_districs.loc[2, 'geometry']
pankow = berlin_districs.loc[3, 'geometry']
neukoelln = berlin_districs.loc[4, 'geometry']
lichtenberg = berlin_districs.loc[5, 'geometry']
marz_heller = berlin_districs.loc[6, 'geometry']
spandau = berlin_districs.loc[7, 'geometry']
steg_zehl = berlin_districs.loc[8, 'geometry']
mitte = berlin_districs.loc[9, 'geometry']
fried_kreuz = berlin_districs.loc[10, 'geometry']
temp_schoen = berlin_districs.loc[11, 'geometry']
```

Since Shapely has no method to visualize multiple geometries, we quickly put some of our geometries in a GeoSeries and plot that:

`gpd.GeoSeries([mitte, char_wilm, neukoelln, brandenburg_gate]).plot(cmap='YlOrRd', figsize=(10,8));`

We see that the Brandenburg Gate is located in the Mitte district (in bright yellow).

Mitte (in bright yellow) and Charlottenburg (in dark yellow) are neighbors, while Mitte (in bright yellow) and Neukölln (in bright red) are not.

Now let’s try methods that translate relationships into code:

```
# contains-method to check whether Mitte contains the famous landmark
mitte.contains(brandenburg_gate)
```

`Out[16]: True`

```
# contains-method to check whether Neukölln contains the famous landmark
neukoelln.contains(brandenburg_gate)
```

`Out[17]: False`

```
# now the same with the within-method --> the order changes!
brandenburg_gate.within(mitte)
```

`Out[18]: True`

```
# touches-method
mitte.touches(char_wilm)
```

`Out[19]: True`

### Spatial Relationships in GeoPandas

We just learned about certain spatial relationships between two individual geometry objects. The GeoDataFrame has similar methods applicable to *all* of its geometries.

In the next example, we’re going to check which restaurants are within the polygon of the Mitte district. To perform this operation, we need to make sure both data are using the same Coordinate Reference System (CRS):

```
# check crs of restaurants
restaurants_geodf.crs
```

`Out[20]: {'init': 'epsg:4326'}`

```
# check crs of berlin_districts
berlin_districs.crs
```

`Out[21]: {'init': 'epsg:4326'}`

```
# checking which restaurants are within the polygon of Mitte
restaurants_geodf.within(mitte).tail()
```

```
Out[22]:
3976 True
3977 False
3978 False
3979 False
3980 False
dtype: bool
```

The result of this operation is a Boolean Series that can be directly used as a mask to filter the original dataframe:

```
mitte_restaurants = restaurants_geodf[restaurants_geodf.within(mitte)]
mitte_restaurants.shape[0]
```

`Out[23]: 477`

For our dataset, there are 477 restaurants located in Mitte.

### Spatial Joins

It would be interesting to know in which district each restaurant is located in. Unfortunately, the districts’ names are in another GeoDataFrame. So we need to combine – or: join – both datasets. Joining on location is called a *spatial join*.

To spatially join two GeoDataFrames, GeoPandas provides the `.sjoin()`

method. It takes an argument `op`

(short for operation) to specify the type of spatial join:

```
gpd.sjoin(first_gdf, second_gdf, op=<operation>)
```

- The first_gdf is the GeoDataFrame we want to add information to.
- The second_gdf is the GeoDataFrame that contains the information we want to add.

Note, that you need to make sure both GeoDataFrames use the same CRS before you join them! In our case, both GeoDataFrames have *epsg:4326*.

- Finally we specify which spatial operation we want to use. The operation can be one of three types:
*intersects*,*contains*, or*within*

Here, we will use *within* as we want GeoPandas to check whether or not rows in the restaurants dataset (left GeoDataFrame) are within the rows of the districts dataset(right GeoDataFrame). Note how the order of the arguments is important here!

`joined = gpd.sjoin(restaurants_geodf, berlin_districs, op='within')`

And here we’ve chained several functions to group, count, and sort the joined dataframe to see how many restaurants are within each district:

`joined[['name', 'DISTRICT']].groupby('DISTRICT').agg('count').sort_values('name', ascending=False)`

### Advanced Visualization

Let’s again turn to the Folium package, which is a Python library for creating and styling interactive maps. If you need a refresher, check this article.

```
import folium
# slice the centroid of Berlin and save the variable as point
point = berlin_districs.center[9]
# reverse the order for folium location array
folium_location = [point.y, point.x]
# construct a folium map for Berlin
berlin_mitte_map = folium.Map(location=folium_location, zoom_start=12)
```

```
# slice all restaurants in Mitte
rest_in_mitte = joined[joined['DISTRICT'] == 'Mitte']
rest_in_mitte.shape
```

`Out[34]: (477, 13)`

```
# create a marker for each restaurant
for row in rest_in_mitte.iterrows():
row_values = row[1]
location = [row_values['lat'], row_values['lng']]
popup = popup = '<strong>' + row_values['name'] + '</strong>'
marker = folium.Marker(location=location, popup=popup)
marker.add_to(berlin_mitte_map)
display(berlin_mitte_map)
```

. . . . . . . . . . . . . .

I hope this little tutorial helps to work with geospatial data! If you think I’ve missed something or have made an error somewhere, please let me know: bb@brittabettendorf.com