Spatial Data: Reducing Folium map file size with Python
How to reduce the file size of a Choropleth Folium map using the Python library topojson and a simplification algorithm.
Introduction
Among all types of geospatial data visualizations, Choropleth maps are probably one of the most popular. The reason is that they are useful in telling a story about your data.
According to Wikipedia: “Choropleth maps provide an easy way to visualize how a variable varies across a geographic area or show the level of variability within a region.”
I was working on a Data Analysis project on Avian Influenza (Bird Flu) in Ireland and mapping how this disease spread throughout the island. The main goal was to provide insights into possible spots and species that might need extra attention from scientists investigating this constant threat to resident birds.
Read the full Analysis here: https://pessini.me/avian-flu-wild-birds-ireland/
Problem
At some point, I needed to answer the question “What is the proportion of birds targeted with Avian Flu in each County?”, and a Choropleth map seemed to be the best way to answer it. The image below shows what the map looks like.
If you do not know how to create a Choropleth map with Folium, here are a few articles to get you started:
The map was appealing and the message was being delivered. The problem began when I exported it to HTML and realized it was taking ages to load, or not loading at all, on mobile devices. When I checked the file size, I was surprised to find it was more than 70 MB.
Solution
After a few minutes on Google and Stack Overflow, I discovered that GeoJSON files tend to be large and that TopoJSON is a more compact alternative.
What is TopoJSON?
TopoJSON is an extension of GeoJSON which introduces a new type, “Topology”, that contains GeoJSON objects.
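The advantage of TopoJSON over GeoJSON is that it encodes topology: a boundary shared by two areas is stored only once instead of once per area. This eliminates redundancy, allows related geometries to be stored efficiently in the same file, and results in a smaller file.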
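To make the idea concrete, here is a minimal sketch that uses only the standard library. The two squares, their zig-zag border, and the helper decode_ring are hypothetical toy data and names invented for illustration, not part of the project. The sketch stores the same two polygons once as GeoJSON and once as a hand-written TopoJSON topology, then compares the serialized sizes.

```python
import json

# Hypothetical toy data: two unit squares that share one detailed border.
# The shared border runs from (1, 0) up to (1, 1) with a slight zig-zag.
border = [[1.0, 0.0]]
border += [[round(1.0 + 0.01 * (-1) ** i, 2), round(i / 50, 2)] for i in range(1, 50)]
border.append([1.0, 1.0])

# GeoJSON: every polygon carries its full ring, so the border appears twice.
ring_a = border + [[0.0, 1.0], [0.0, 0.0], [1.0, 0.0]]
ring_b = [[1.0, 0.0], [2.0, 0.0], [2.0, 1.0]] + border[::-1]
geojson = {
    "type": "FeatureCollection",
    "features": [
        {"type": "Feature", "properties": {"name": "A"},
         "geometry": {"type": "Polygon", "coordinates": [ring_a]}},
        {"type": "Feature", "properties": {"name": "B"},
         "geometry": {"type": "Polygon", "coordinates": [ring_b]}},
    ],
}

# TopoJSON: rings are lists of arc indexes, and the shared border is one arc.
topology = {
    "type": "Topology",
    "arcs": [
        border,                                            # arc 0: shared border
        [[1.0, 1.0], [0.0, 1.0], [0.0, 0.0], [1.0, 0.0]],  # arc 1: rest of A
        [[1.0, 0.0], [2.0, 0.0], [2.0, 1.0], [1.0, 1.0]],  # arc 2: rest of B
    ],
    "objects": {
        "data": {
            "type": "GeometryCollection",
            "geometries": [
                {"type": "Polygon", "properties": {"name": "A"}, "arcs": [[0, 1]]},
                {"type": "Polygon", "properties": {"name": "B"}, "arcs": [[2, -1]]},
            ],
        }
    },
}


def decode_ring(arc_indexes, arcs):
    """Rebuild a ring from arc indexes; a negative index ~i means arc i reversed."""
    ring = []
    for index in arc_indexes:
        arc = arcs[index] if index >= 0 else arcs[~index][::-1]
        ring.extend(arc if not ring else arc[1:])  # skip the duplicated joint
    return ring


geo_size = len(json.dumps(geojson))
topo_size = len(json.dumps(topology))
print(f"GeoJSON: {geo_size} characters, TopoJSON: {topo_size} characters")
```

Decoding the arcs gives back exactly the rings stored in the GeoJSON version, so nothing is lost; the saving comes purely from writing the shared border once. On real administrative boundaries, where most borders are shared between neighbouring areas, the same effect applies at a much larger scale.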
“Talk is cheap. Show me the code.” ― Linus Torvalds
Data Source: Administrative Areas — OSi National Statutory Boundaries — 2019 | OSi Open Data Portal
Python packages
The first step is to convert the GeoJSON to TopoJSON by computing its topology. This may take a while to run.
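import topojson as tp  # geo_areas is assumed to hold the boundaries loaded from the data source above
topo = tp.Topology(geo_areas, prequantize=False, topology=True)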
Apply toposimplify to remove unnecessary points from arcs after the topology is constructed. This will simplify the constructed arcs without altering the topological relations.
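# Simplify the arcs of the topology built above with Douglas-Peucker.
# epsilon is the tolerance, in the units of the input coordinates.
simple = topo.toposimplify(
    epsilon=0.001,
    simplify_with='shapely',
    simplify_algorithm='dp'
)
# Optional visual check of the result (requires Altair).
simple.to_alt().properties(title='Douglas-Peucker simplification')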
You can choose between two simplification algorithms: Visvalingam-Whyatt and Douglas-Peucker. In my case, Douglas-Peucker simplification yielded a better result.
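For comparison, Visvalingam-Whyatt repeatedly drops the point that forms the smallest triangle with its two neighbours. The sketch below is a simple, unoptimized illustration in plain Python; the function names and the min_area threshold are my own choices for this example, and the implementation inside topojson may differ in its details.

```python
def triangle_area(a, b, c):
    """Area of the triangle formed by three (x, y) points."""
    return abs((b[0] - a[0]) * (c[1] - a[1]) - (c[0] - a[0]) * (b[1] - a[1])) / 2


def visvalingam_whyatt(points, min_area):
    """Drop interior points until every remaining triangle has area >= min_area."""
    pts = list(points)
    while len(pts) > 2:
        # Effective area of each interior point with its current neighbours.
        areas = [
            triangle_area(pts[i - 1], pts[i], pts[i + 1])
            for i in range(1, len(pts) - 1)
        ]
        smallest = min(areas)
        if smallest >= min_area:
            break
        # areas[0] belongs to pts[1], hence the + 1 offset.
        del pts[areas.index(smallest) + 1]
    return pts


# Hypothetical line: nearly flat, with one large spike in the middle.
line = [(0, 0), (1, 0.1), (2, 0), (3, 3), (4, 0), (5, -0.1), (6, 0)]
print(visvalingam_whyatt(line, min_area=0.5))
```

On this toy line the two small wiggles are removed and the spike is kept, because the triangles around the spike are far larger than the threshold.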
The Douglas–Peucker algorithm simplifies polylines (lines composed of straight segments) by reducing the number of points. The simplified curve preserves the rough shape of the original curve but consists only of a subset of the points that defined it. The tolerance (epsilon) controls how far a point may deviate from the simplified line before it has to be kept.
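The description above can be sketched in a few lines of plain Python. This is a minimal recursive version written for illustration, not the code that shapely or topojson run internally; the function names are my own, and it measures each point's perpendicular distance to the line through the two endpoints, which is one common formulation.

```python
import math


def perpendicular_distance(point, start, end):
    """Distance from point to the line through start and end."""
    (x, y), (x1, y1), (x2, y2) = point, start, end
    dx, dy = x2 - x1, y2 - y1
    if dx == 0 and dy == 0:  # start and end coincide
        return math.hypot(x - x1, y - y1)
    return abs(dy * (x - x1) - dx * (y - y1)) / math.hypot(dx, dy)


def douglas_peucker(points, epsilon):
    """Return a subset of points that stays within epsilon of the original line."""
    if len(points) < 3:
        return list(points)

    # Find the interior point farthest from the line between the endpoints.
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        dist = perpendicular_distance(points[i], points[0], points[-1])
        if dist > max_dist:
            index, max_dist = i, dist

    if max_dist > epsilon:
        # That point must be kept: simplify both halves and join them,
        # dropping the split point from the left half so it appears once.
        left = douglas_peucker(points[:index + 1], epsilon)
        right = douglas_peucker(points[index:], epsilon)
        return left[:-1] + right

    # Every interior point is within epsilon: keep only the endpoints.
    return [points[0], points[-1]]


# Hypothetical line: nearly flat, with one large spike in the middle.
line = [(0, 0), (1, 0.1), (2, 0), (3, 3), (4, 0), (5, -0.1), (6, 0)]
print(douglas_peucker(line, epsilon=0.5))
```

A larger epsilon removes more points and flattens more detail, while a smaller one keeps the shape closer to the original. That is the same knob as the epsilon passed to toposimplify, so it is worth trying a few values and checking the map visually.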
Conclusion
When you create a map that will be deployed to the web, it is important to take the file size into account. Adding extra layers and clusters to the map will inevitably increase the size of the file, but there are ways to reduce it, as presented in this article.
Simplification trades geometric detail for a smaller file, and for this project I considered that quality versus size trade-off worthwhile.
Thanks for reading!
References
[1] Leandro Pessini, Spatial analysis on the occurrence of Bird Flu in Ireland (2022).
[2] Ordnance Survey Ireland, Administrative Areas — OSi National Statutory Boundaries (2019).
[3] Douglas, D. H., & Peucker, T. K., Algorithms for the reduction of the number of points required to represent a digitized line or its caricature (1973), Cartographica: The International Journal for Geographic Information and Geovisualization, 10(2), 112–122.
[4] Mattijn van Hoek, Encode spatial data as topology in Python!
[5] Kelsey Jordahl, About GeoPandas — GeoPandas (2013), Scipy Conference
—
Leandro is a Data Scientist with a background in Software Engineering and Project Management. You can learn more about him here: https://pessini.me/