Geospatial data: finding pubs along cycling routes

Part 1: Geospatial fundamentals

“Data is everywhere” is a tagline you will often hear or read, especially when people talk about big data. And it is definitely true: an estimated 120 zettabytes of data were created in 2023. A zettabyte is 10^21 bytes, a 1 followed by 21 zeros. However, the sentence can also be read differently, with the focus on ‘everywhere’. All data is created somewhere, and more than ever this location information is not only stored but also available in a form that encourages further analysis. Furthermore, the general availability of GPS receivers in smartphones has made everyone a potential land surveyor. The ability to easily collect, analyze and use geodata has already resulted in numerous new applications and may have a profound impact on decision-making processes in both commercial and non-commercial settings. In this first geospatial blog, we will explore the fundamental concepts that make up geospatial data.

Defining geodata

Geospatial data, or geodata, can be simply defined as all data linked to a geographical location. This can be very broad: the address of your favorite cat café, a high-resolution satellite image of a war zone, the location of the picture you upload to Instagram, the heatmap of Lionel Messi during the World Cup final, the tracking link that shows you where your delayed Christmas present is, or the location of the internet line in your street. These geodata are analyzed in what are called Geographic Information Systems (GIS). The power of GIS lies in visualizing geodata and detecting patterns that would otherwise go unnoticed, using overlays, topology rules and processing algorithms.

Public geodata

Interested in geodata, but you do not have any in your current data products? No problem: you can probably still do a lot of interesting and beneficial analysis using existing public geodata sources. For countries in Europe you can browse the Inspire Geoportal to find metadata and locations of existing geodatasets (census data, data on company locations, agricultural parcels, flood risk plans, etc.). Most countries and regions also have their own geoportal where you can preview and download the data and find API references. For instance, the Belgian region of Flanders has Geopunt, and Luxembourg has published 3D renderings of all buildings on its geoportal. Furthermore, the Copernicus program of the European Commission and ESA aims to provide high-quality Earth observation data. Within that scope, the Sentinel-2 satellite program provides free multispectral imagery with a revisit time of a few days. Another interesting dataset comes from the OpenStreetMap project. This community project focuses on using the knowledge of locals to create an up-to-date, openly licensed global geodata product. OpenStreetMap data can also be accessed using API calls.
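As an illustration of what such an API call can look like, the sketch below builds a query for the Overpass API, one of the public read-only interfaces to OpenStreetMap data. The bounding box (roughly the centre of Brussels) and the helper names build_pub_query and fetch_pubs are assumptions made for this example. The HTTP request sits in a separate function, so the query can be inspected without network access.

```python
import json
import urllib.parse
import urllib.request

# Public Overpass API endpoint; other mirrors exist and usage policies apply.
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def build_pub_query(south, west, north, east, timeout=25):
    """Return an Overpass QL query for all nodes tagged amenity=pub in a bounding box.

    Overpass expects the bounding box as (south, west, north, east) in WGS 84 degrees.
    """
    bbox = f"{south},{west},{north},{east}"
    return f'[out:json][timeout:{timeout}];node["amenity"="pub"]({bbox});out body;'


def fetch_pubs(query):
    """Send the query to the Overpass API and return the list of OSM elements."""
    data = urllib.parse.urlencode({"data": query}).encode("utf-8")
    with urllib.request.urlopen(OVERPASS_URL, data=data, timeout=60) as response:
        return json.load(response)["elements"]


# Hypothetical bounding box around the centre of Brussels.
query = build_pub_query(50.84, 4.34, 50.86, 4.37)
print(query)
```

Calling fetch_pubs(query) would then return one dictionary per pub, each with its coordinates and OSM tags. For larger areas it is usually better to download a regional extract than to query the public API repeatedly.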

Geodata formats

Generally, geodata is split into two big categories: raster and vector data. Raster data represents the world as a continuous grid, giving each cell of the grid a specific feature or value to represent the spatial variation. A thermal image of a roof, a digital elevation model of a mountain or the different land uses in a city are only a few examples. These data are typically stored as GeoTIFF, a Tagged Image File Format with embedded georeferencing information (.tif/.tiff), but classic .jpg or .asc formats are also possible. Raster data have the benefit of being continuous and mathematically easy to use in models (such as AI pattern recognition), and they are very straightforward to visualize. On the other hand, rasters can quickly require a large storage capacity, since every pixel has a value that needs to be stored. How much depends on the spatial resolution (the size of each pixel) and the temporal resolution (how often the same area is recorded) you want to acquire. The spatial resolution of the grid also has a big impact on its usability. For example, a grid of 30 by 30 meters can be sufficient for analyzing soil types in a flood risk analysis, but it is unusable for locating sewage pipes in a city.
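The effect of spatial resolution on storage can be made concrete with a little arithmetic. The sketch below assumes a hypothetical study area of 30 by 30 km, a single band and uncompressed 32-bit values. Real formats such as GeoTIFF usually apply compression, so actual file sizes will differ.

```python
import math


def raster_cells(width_m, height_m, resolution_m):
    """Number of (columns, rows) needed to cover an area at a given cell size."""
    cols = math.ceil(width_m / resolution_m)
    rows = math.ceil(height_m / resolution_m)
    return cols, rows


def raster_size_bytes(width_m, height_m, resolution_m, bytes_per_cell=4, bands=1):
    """Uncompressed size of a raster: every cell in every band stores one value."""
    cols, rows = raster_cells(width_m, height_m, resolution_m)
    return cols * rows * bands * bytes_per_cell


# Hypothetical study area of 30 km by 30 km, stored as 32-bit values.
for resolution in (30, 10, 1):
    size_mb = raster_size_bytes(30_000, 30_000, resolution) / 1e6
    print(f"{resolution:>2} m cells: {size_mb:,.0f} MB")
```

Going from 30 meter to 1 meter cells multiplies the number of cells, and therefore the uncompressed size, by 900. Every extra band or time step multiplies it again.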

The alternative to raster data is vector data. Vectors define the world as features built from points, with corresponding x-, y- and sometimes z-coordinates. Points can be combined to create multipoints or line features, which in turn can be aggregated into multilines or polygons. Think of the locations of products in a warehouse as points, your running track in Strava as a line or the contours of a lake as a polygon. These data come in a wide variety of forms, but the best known are shapefiles (.shp/.shx/.dbf/.prj), geospatial JavaScript Object Notation (.geojson), Keyhole Markup Language (.kml) and classic comma-separated values (.csv). Because they are feature driven, vector data typically lend themselves well to integration with relational databases and link well to other existing datasets. They also generally require less storage capacity, since only points of interest are stored in an otherwise ‘empty’ world. It is worth noting that although some data are better suited to being stored as a raster or as a vector, in general all features can be stored in both systems. Because of their flexibility, scalability and their integration in data warehouses, vector data are currently the more prominent data format.
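To show how points, lines and polygons look in one of these formats, the sketch below writes one feature of each type as GeoJSON using only the Python standard library. All coordinates and names are made up for illustration, and the feature helper is a hypothetical convenience function rather than part of any library.

```python
import json


def feature(geometry_type, coordinates, **properties):
    """Wrap a geometry and its attributes in a GeoJSON Feature."""
    return {
        "type": "Feature",
        "geometry": {"type": geometry_type, "coordinates": coordinates},
        "properties": properties,
    }


# GeoJSON stores positions as [longitude, latitude] in WGS 84.
# All coordinates and names below are made up for illustration.
pub = feature("Point", [4.3517, 50.8466], name="Example pub")
route = feature(
    "LineString",
    [[4.35, 50.84], [4.36, 50.85], [4.38, 50.85]],
    name="Example route",
)
lake = feature(
    "Polygon",
    # A polygon is a list of rings; each ring closes on its first position.
    [[[4.40, 50.80], [4.42, 50.80], [4.42, 50.82], [4.40, 50.82], [4.40, 50.80]]],
    name="Example lake",
)

collection = {"type": "FeatureCollection", "features": [pub, route, lake]}
geojson_text = json.dumps(collection, indent=2)
print(geojson_text)
```

The properties of each feature behave like the columns of a database row, which is one reason vector data link so easily to relational databases.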

Table 1: Comparison of raster and vector data

 
| | Raster | Vector |
| --- | --- | --- |
| Outline | Grid cells (pixels) | Points, lines, polygons |
| Coverage | Continuous data | Discrete objects |
| Data size | Generally larger data size | Generally smaller data size |
| Data formats | .geotiff, .jpc, .asc | .shp, .geojson, .kml, .csv, .gpx |
| Speed | Faster for simpler, area-wide processes | Faster for complex queries |
| Analysis | Suitable for surface analysis, pattern recognition | Suitable for network analysis, proximity, and adjacency |

Figure 1: Belgium UCI World Tour races as vector line elements.
Figure 2: Cafes, bars, pubs and restaurants from OpenStreetMap as vector point locations.

It’s a web service

To facilitate the exchange of geospatial data over the internet, the Open Geospatial Consortium (OGC) has developed different standards over the years. The goal of OGC is to facilitate the integration of geospatial data with all other types of data. Historically, a Web Map Service (WMS) delivers maps as images. WMS layers are mainly used as background maps, as they offer no access to the underlying data. In situations where quick loading of smaller map tiles is required, for instance in applications with high user loads, Web Map Tile Service (WMTS) is the go-to standard. If the user needs to manipulate and analyze the data, the choice of service depends on the underlying data format. Web Feature Service (WFS) is the standard for vector data, while Web Coverage Service (WCS) is made for distributing raster data. While these standards (WMS, WFS, WMTS and WCS) can still be found for many products in existing data libraries, OGC is continuously designing modern methods that follow current trends in the data universe. Currently, most new and upcoming standards use APIs as their building blocks.
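In practice, a WMS request is just a URL with a fixed set of query parameters. The sketch below builds a GetMap request for WMS version 1.3.0. The endpoint and layer name are hypothetical placeholders, since a real service publishes both in its GetCapabilities response.

```python
from urllib.parse import urlencode

# Hypothetical endpoint and layer name; a real service lists both in its
# GetCapabilities response.
WMS_ENDPOINT = "https://example.com/geoserver/wms"


def wms_getmap_url(layer, bbox, width, height, crs="EPSG:3857", image_format="image/png"):
    """Build a WMS 1.3.0 GetMap URL that returns a rendered map image.

    bbox is (min_x, min_y, max_x, max_y) in the units of the chosen CRS.
    """
    params = {
        "SERVICE": "WMS",
        "VERSION": "1.3.0",
        "REQUEST": "GetMap",
        "LAYERS": layer,
        "STYLES": "",  # empty value requests the server's default style
        "CRS": crs,
        "BBOX": ",".join(str(value) for value in bbox),
        "WIDTH": width,
        "HEIGHT": height,
        "FORMAT": image_format,
    }
    return f"{WMS_ENDPOINT}?{urlencode(params)}"


# Approximate Web Mercator extent around Brussels, in meters.
url = wms_getmap_url("orthophoto_2022", (470_000, 6_580_000, 500_000, 6_610_000), 512, 512)
print(url)
```

The response to such a request is an image, which is why WMS works as a background layer but not for analysis. Note that in WMS 1.3.0 the axis order of the bounding box follows the CRS definition, so with EPSG:4326 the latitude comes first.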

Figure 3: WMTS of the 2022 orthophoto of Belgium, zoomed in on Brussels and showing the Atomium and King Baudouin Stadium.

Somewhere on Earth

If you are asked to meet someone ‘in New York’, you run into multiple problems. They are most likely referring to the city of New York in the state of New York in the USA. But it could also be New York in Lincolnshire, UK, or even the city of New York in Ukraine. To get a unique location for each place, river or tree, you need a more precise system. Therefore, different coordinate reference systems (CRS) have been developed over the years. Assuming that Earth is approximately an ellipsoid, latitude and longitude can be measured as angles from the equator and the prime meridian respectively. Although this is a very efficient system, measuring a distance or surface area on an ellipsoid is not workable in everyday operations. Therefore, the three-dimensional ellipsoid is projected onto a 2D plane. However, as you can see in this video, a sphere cannot be flattened perfectly without distortion, so every projection is a trade-off. For example, the famous Mercator map we all know from high school and Google Maps preserves angles and local shapes, which makes it well suited for navigation, but it fails at preserving the relative size of countries: those far from the equator appear much larger than they are in reality. For smaller regions, projections can be made with much less distortion, resulting in a wide variety of coordinate systems around the world. For standardization, all these projections and systems have received a unique EPSG code. For example, the default global latitude and longitude coordinates you see in Google Maps are in the WGS 84 system with EPSG code 4326. If you do not like using numbers for coordinates or want to confuse your mailman, you can also have a look at what3words, which uses a combination of three words to define every place on Earth at a 3 by 3 meter resolution. In English, laptop.helpful.survivor will bring you straight to Aivix HQ!
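The Mercator trade-off can be put into numbers. The sketch below implements the spherical Web Mercator projection (EPSG:3857) used by most web maps, which converts WGS 84 longitude and latitude into meters, together with the factor by which areas are inflated at a given latitude. The function names are chosen for this example. In a real project a dedicated library such as pyproj would normally handle the conversion between EPSG codes.

```python
import math

EARTH_RADIUS_M = 6_378_137.0  # semi-major axis of the WGS 84 ellipsoid


def wgs84_to_web_mercator(lon_deg, lat_deg):
    """Project longitude/latitude (EPSG:4326) to Web Mercator meters (EPSG:3857)."""
    x = EARTH_RADIUS_M * math.radians(lon_deg)
    y = EARTH_RADIUS_M * math.log(math.tan(math.pi / 4 + math.radians(lat_deg) / 2))
    return x, y


def mercator_area_exaggeration(lat_deg):
    """Factor by which Mercator inflates areas at a given latitude.

    Lengths are stretched by 1 / cos(latitude) in both directions, so areas
    are stretched by the square of that factor.
    """
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


for name, lat in [("Equator", 0.0), ("Brussels", 50.85), ("Northern Greenland", 80.0)]:
    print(f"{name}: areas appear {mercator_area_exaggeration(lat):.1f}x larger")
```

At 60 degrees latitude the area factor is exactly 4, which is why Scandinavia looks so large on a Mercator map compared with countries near the equator.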

Figure 4: All OSM pubs within 50 meters of the routes of the Belgian World Tour cycling races.
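A proximity selection like the one in Figure 4 combines the concepts above: vector points, vector lines and a coordinate system in which distances are expressed in meters. The sketch below is one possible pure-Python approach and not necessarily how the figure was produced. It assumes made-up coordinates, a route of at least two points and a simple local flat-Earth approximation that is only reasonable over short distances. A GIS library working in a projected CRS would be the more robust choice.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius


def to_local_meters(lon, lat, lon0, lat0):
    """Approximate (lon, lat) in degrees as planar meters around (lon0, lat0)."""
    x = EARTH_RADIUS_M * math.radians(lon - lon0) * math.cos(math.radians(lat0))
    y = EARTH_RADIUS_M * math.radians(lat - lat0)
    return x, y


def distance_to_segment(p, a, b):
    """Distance from point p to the segment a-b, all in planar meters."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length_sq = dx * dx + dy * dy
    if length_sq == 0:
        return math.hypot(px - ax, py - ay)
    # Position of the closest point along the segment, clamped to its ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_to_route(point, route):
    """Shortest distance in meters from a (lon, lat) point to a (lon, lat) polyline."""
    lon0, lat0 = point
    p = (0.0, 0.0)  # the point itself is the origin of the local plane
    local = [to_local_meters(lon, lat, lon0, lat0) for lon, lat in route]
    return min(distance_to_segment(p, a, b) for a, b in zip(local, local[1:]))


# Made-up route and pub locations as (longitude, latitude).
route = [(4.000, 50.000), (4.010, 50.000), (4.020, 50.005)]
pubs = {"Near pub": (4.005, 50.0003), "Far pub": (4.005, 50.0010)}

nearby = [name for name, location in pubs.items() if distance_to_route(location, route) <= 50]
print(nearby)
```

Only the first pub, about 33 meters from the route, passes the 50 meter threshold. The second one lies roughly 111 meters away and is dropped.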

Conclusion

In this first geospatial blog, we focused on the key concepts that make up the geospatial data environment. Processed by GIS, geodata exists in raster and vector formats, each suited for different types of analysis and storage needs. Web services facilitate the sharing of this data, while diverse coordinate reference systems ensure precise location mapping. Publicly available geodata sources, like the Inspire Geoportal, Copernicus program and OpenStreetMap project, offer vast amounts of freely available data for various applications.

In part 2, we will have a look at how these concepts can be integrated into an AWS environment.

Stay tuned!

Figure 5: Number of pubs in proximity to routes of Belgian World Tour races in 2023.

References

https://ogcapi.ogc.org/

https://epsg.io/

Images from https://urstudio.sec.sg/courses/gis-knowledge-basics/ and https://map-projections.net/imglist.php

Data from

Bjorn Rombouts

Hi, I’m Bjorn Rombouts. With my background in Earth observation and bioscience engineering, I have a passion for (geo)data, GIS and remote sensing. As a data scientist and engineer, I look forward to getting in touch and finding useful insights in your data.