Geospatial data: finding pubs along cycling routes
Part 1: Geospatial fundamentals
“Data is everywhere”, a tagline you will often hear or read, especially when people are talking about big data. And it is definitely true: an estimated, mind-blowing 120 zettabytes of data were created in 2023. A zettabyte is 10²¹ bytes, a 1 followed by 21 zeros. However, the sentence can also be interpreted differently, with the focus on ‘everywhere’. All data is created somewhere, and more than ever this information is not only stored but also made available in ways that encourage further analysis. Furthermore, the widespread availability of GPS receivers in smartphones has turned everyone into a potential land surveyor. The ability to easily collect, analyze and use geodata has already led to numerous new applications and may have a profound impact on decision-making processes in both commercial and non-commercial settings. In this first geospatial blog, we will explore the fundamental concepts behind geospatial data.
Geospatial data, or geodata, can simply be defined as all data linked to a geographical location. This is a very broad definition: the address of your favorite cat café, a high-resolution satellite image of a war zone, the location of the picture you upload to Instagram, the heatmap of Lionel Messi during the World Cup final, the tracking link that shows where your delayed Christmas present is, or the location of the internet cable in your street. Geodata are analyzed in so-called Geographical Information Systems (GIS). The power of a GIS lies in visualizing geodata and detecting patterns that would otherwise go unnoticed, by overlaying data layers and applying topology rules and processing algorithms.
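To make the idea of overlaying layers concrete, here is a minimal sketch of one of the most basic overlay operations: checking which points (say, pubs) fall inside a polygon (say, a municipality). It uses the classic ray-casting test on hypothetical planar coordinates in meters. The names `point_in_polygon` and `overlay` are made up for this illustration; a real GIS workflow would typically rely on a library such as Shapely or GeoPandas instead.

```python
# Minimal sketch of a GIS overlay: which points (e.g. pubs) fall inside a polygon
# (e.g. a municipality)? Coordinates are hypothetical planar x/y values in metres.

Point = tuple[float, float]


def point_in_polygon(point: Point, polygon: list[Point]) -> bool:
    """Ray casting: toggle 'inside' each time a ray from the point to the right crosses an edge."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the polygon
        # Only edges that straddle the horizontal line through the point can be crossed
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge crosses that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def overlay(points: dict[str, Point], polygon: list[Point]) -> list[str]:
    """Return the names of the points that lie inside the polygon."""
    return [name for name, p in points.items() if point_in_polygon(p, polygon)]


# Hypothetical rectangular municipality and three hypothetical pubs
municipality = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 800.0), (0.0, 800.0)]
pubs = {"pub_a": (250.0, 300.0), "pub_b": (1200.0, 400.0), "pub_c": (999.0, 799.0)}
print(overlay(pubs, municipality))  # ['pub_a', 'pub_c']
```

An odd number of crossings means the point is inside. Points lying exactly on the boundary are an edge case that this sketch does not handle consistently.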
Interested in geodata, but you do not have any in your current data products? No problem: you can probably still do a lot of interesting and valuable analysis using existing public geodata sources. For European countries, you can browse the Inspire Geoportal to find metadata and locations of existing geodatasets (census data, company locations, agricultural parcels, flood risk plans, etc.). Most countries and regions also have their own geoportal where you can preview and download the data and find API references. For instance, the Belgian region of Flanders has Geopunt, and Luxembourg has published 3D renderings of all its buildings on its geoportal. Furthermore, the Copernicus program of the European Commission and ESA aims to provide high-quality Earth observation data. Within that scope, the Sentinel-2 satellites provide free multispectral imagery with a revisit time of about five days. Another interesting source is the open-source OpenStreetMap project. This community project relies on the knowledge of locals to create an up-to-date global geodata product. OpenStreetMap data can also be accessed through API calls.
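As an illustration of such an API call, the sketch below builds a request to the Overpass API, a widely used read-only query service on top of OpenStreetMap data, for all nodes tagged `amenity=pub` inside a bounding box. The endpoint, the bounding box around Leuven and the helper names are assumptions for this example; only `fetch_pubs` actually touches the network, and it is not called here.

```python
# Minimal sketch: build (and optionally send) an Overpass API request for all
# OpenStreetMap nodes tagged amenity=pub inside a bounding box.
import json
import urllib.parse
import urllib.request

# A public Overpass instance (assumption: other Overpass endpoints work the same way)
OVERPASS_URL = "https://overpass-api.de/api/interpreter"


def pub_query(south: float, west: float, north: float, east: float) -> str:
    """Overpass QL query; the bounding box order is (south, west, north, east) in WGS 84 degrees."""
    return (
        "[out:json][timeout:25];"
        f'node["amenity"="pub"]({south},{west},{north},{east});'
        "out;"
    )


def build_request_url(query: str) -> str:
    """Encode the query as the 'data' parameter of a GET request."""
    return OVERPASS_URL + "?" + urllib.parse.urlencode({"data": query})


def fetch_pubs(url: str) -> list[dict]:
    """Send the request and return the OSM elements (requires network access)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.load(response)["elements"]


# Hypothetical bounding box roughly around the centre of Leuven, Belgium
q = pub_query(50.87, 4.69, 50.89, 4.71)
url = build_request_url(q)
print(url)
```

Each returned node element carries its `lat`, `lon` and `tags` (for example the pub's `name`). Note that pubs mapped as building outlines (ways) are missed by a node-only query, and public Overpass instances have usage limits, so large extracts are better downloaded as files.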
Generally, geodata is split into two big categories: raster and vector data. Raster data represent the world as a continuous grid, giving each cell of the grid a specific class or value to represent spatial variation. Examples include a thermal image of a roof, a digital elevation model of a mountain or a land-use map of a city. These data are typically stored as GeoTIFF files (.tif or .tiff), TIFF images with embedded georeferencing information, but classic .jpg or .asc (ASCII grid) formats are also possible. Raster data have the benefit of being continuous, mathematically easy to use in models (such as AI pattern recognition) and very straightforward to visualize. On the other hand, rasters can quickly require large storage capacity depending on the spatial and temporal resolution (the size of each pixel and how often the grid is acquired), since every pixel has a value that needs to be stored. The spatial resolution of the grid also has a big impact on its usability. For example, a grid of 30 by 30 meters can be sufficient for analyzing soil types in a flood risk analysis, but it is useless for locating sewage pipes in a city. Vector data, in contrast, represent discrete features as points, lines and polygons, such as a pub, a cycling route or a municipal border, each with its own attributes.
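The sketch below puts two ideas from this paragraph into code: a raster is essentially a grid of values plus a georeference (the map coordinates of the top-left corner and the cell size, similar in spirit to the geotransform stored in a GeoTIFF), and its uncompressed size grows with the number of cells. The `Raster` class, the 3 by 3 elevation values and the 100 by 100 km area are hypothetical.

```python
# Minimal sketch of a raster: a grid of cell values plus a georeference
# (top-left corner and cell size), similar in spirit to a GeoTIFF geotransform.
import math
from dataclasses import dataclass


@dataclass
class Raster:
    origin_x: float            # map x-coordinate of the top-left corner (metres)
    origin_y: float            # map y-coordinate of the top-left corner (metres)
    cell_size: float           # width and height of one square cell (metres)
    values: list[list[float]]  # values[row][col]; row 0 is the northernmost row

    def cell_center(self, row: int, col: int) -> tuple[float, float]:
        """Map coordinates of a cell centre; y decreases as the row index grows."""
        x = self.origin_x + (col + 0.5) * self.cell_size
        y = self.origin_y - (row + 0.5) * self.cell_size
        return x, y

    def value_at(self, x: float, y: float) -> float:
        """Value of the cell that contains map coordinate (x, y)."""
        col = math.floor((x - self.origin_x) / self.cell_size)
        row = math.floor((self.origin_y - y) / self.cell_size)
        return self.values[row][col]


def raster_size_bytes(width_m: float, height_m: float, cell_size_m: float,
                      bytes_per_cell: int = 4, n_bands: int = 1) -> int:
    """Uncompressed storage: every cell of every band stores one value."""
    cols = math.ceil(width_m / cell_size_m)
    rows = math.ceil(height_m / cell_size_m)
    return cols * rows * bytes_per_cell * n_bands


# A hypothetical 3 x 3 elevation grid with 30 m cells
dem = Raster(origin_x=0.0, origin_y=90.0, cell_size=30.0,
             values=[[12.0, 14.0, 15.0], [11.0, 13.0, 16.0], [10.0, 12.0, 18.0]])
print(dem.cell_center(0, 0))     # (15.0, 75.0)
print(dem.value_at(70.0, 20.0))  # 18.0

# Storage for a 100 km x 100 km area, one band of 4-byte floats
print(raster_size_bytes(100_000, 100_000, 30))  # 44462224 bytes (about 44 MB)
print(raster_size_bytes(100_000, 100_000, 10))  # 400000000 bytes (400 MB)
```

Going from 30 m to 10 m cells multiplies the number of cells, and thus the uncompressed storage, by roughly nine; extra spectral bands or repeated acquisitions multiply it further. In practice, formats like GeoTIFF also support compression, so files on disk are often smaller.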
Table 1: Comparison of raster and vector data

| | Raster | Vector |
|---|---|---|
| Data model | Grid cells (pixels) | Points, lines, polygons |
| Data size | Generally larger | Generally smaller |
| Common file formats | .geotiff, .jpc, .asc | .shp, .geojson, .kml, .csv, .gpx |
| Processing | Faster for simpler, area-wide processes | Faster for complex queries |
| Typical use | Surface analysis, pattern recognition | Network analysis, proximity and adjacency |
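Proximity analysis on vector data is exactly what finding pubs along cycling routes boils down to. A minimal sketch, assuming the pub points and the route line are already in a projected CRS with coordinates in meters (so Euclidean distance is meaningful), returns the pubs within a given distance of the route. All coordinates and names are hypothetical; libraries such as Shapely offer the same operation through geometry `distance` and `buffer` methods.

```python
# Minimal sketch of a vector proximity query: which pubs (points) lie within a
# given distance of a cycling route (a line)? Coordinates are hypothetical
# projected x/y values in metres, so plain Euclidean distance is meaningful.
import math

Point = tuple[float, float]


def distance_point_segment(p: Point, a: Point, b: Point) -> float:
    """Shortest distance from point p to the line segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len_sq = dx * dx + dy * dy
    if seg_len_sq == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Project p onto the segment and clamp the projection parameter to [0, 1]
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len_sq))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def distance_point_line(p: Point, line: list[Point]) -> float:
    """Distance from p to a polyline: the minimum over all of its segments."""
    return min(distance_point_segment(p, a, b) for a, b in zip(line, line[1:]))


def pubs_near_route(pubs: dict[str, Point], route: list[Point], max_dist: float) -> list[str]:
    """Names of the pubs within max_dist metres of the route."""
    return [name for name, p in pubs.items() if distance_point_line(p, route) <= max_dist]


route = [(0.0, 0.0), (1000.0, 0.0), (1000.0, 1000.0)]  # a hypothetical L-shaped route
pubs = {"pub_a": (500.0, 40.0), "pub_b": (1100.0, 500.0), "pub_c": (300.0, 400.0)}
print(pubs_near_route(pubs, route, 150.0))  # ['pub_a', 'pub_b']
```

Checking every pub against every segment is fine for small examples; for large datasets a spatial index (for example an R-tree) is usually added so that only nearby segments are tested.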
It’s a web service
To facilitate the exchange of geospatial data over the internet, the Open Geospatial Consortium (OGC) has developed several standards over the years. The goal of the OGC is to facilitate the integration of geospatial data with all other types of data. Historically, maps were shared through the Web Map Service (WMS), which delivers maps as images. WMS layers are mainly used as background maps, as they offer no access to the underlying data. In situations where smaller map tiles need to load quickly, for instance in applications with high user loads, the Web Map Tile Service (WMTS) is the go-to standard. If users need to manipulate and analyze the data themselves, the appropriate service depends on the underlying data format: the Web Feature Service (WFS) is the standard for vector data, while the Web Coverage Service (WCS) is designed for distributing raster data. While these standards (WMS, WMTS, WFS and WCS) are still available for many products in existing data libraries, the OGC keeps designing new, modern methods that follow current trends in the data universe. Currently, most new products, both released and in development, are built on web APIs, such as the OGC API family of standards (for example OGC API - Features, the successor of WFS).
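To show the practical difference between these services, the sketch below builds request URLs for a WMS 1.3.0 GetMap request (returns a rendered image), a WFS 2.0.0 GetFeature request (returns vector features) and an OGC API - Features items request (a RESTful path, typically answered with GeoJSON). The base URLs and the layer and collection names are hypothetical placeholders; real services advertise their layers in a GetCapabilities response or an API landing page.

```python
# Minimal sketch: build request URLs for three OGC-style services.
# Base URLs and layer/collection names are hypothetical placeholders.
from urllib.parse import urlencode


def wms_getmap_url(base: str, layer: str, bbox: tuple[float, float, float, float],
                   width: int, height: int, crs: str = "EPSG:3857") -> str:
    """WMS 1.3.0 GetMap: the server returns a rendered image, not the underlying data.

    Note: with CRS=EPSG:4326, WMS 1.3.0 expects the BBOX in latitude,longitude
    order; with EPSG:3857 it is x,y (easting, northing).
    """
    params = {
        "SERVICE": "WMS", "VERSION": "1.3.0", "REQUEST": "GetMap",
        "LAYERS": layer, "STYLES": "", "CRS": crs,
        "BBOX": ",".join(str(v) for v in bbox),
        "WIDTH": width, "HEIGHT": height, "FORMAT": "image/png",
    }
    return f"{base}?{urlencode(params)}"


def wfs_getfeature_url(base: str, type_name: str, count: int = 100) -> str:
    """WFS 2.0.0 GetFeature: the server returns the vector features themselves."""
    params = {
        "SERVICE": "WFS", "VERSION": "2.0.0", "REQUEST": "GetFeature",
        "TYPENAMES": type_name, "COUNT": count,
    }
    return f"{base}?{urlencode(params)}"


def ogcapi_features_url(base: str, collection: str, limit: int = 100) -> str:
    """OGC API - Features: items of a collection under /collections/{id}/items."""
    return f"{base}/collections/{collection}/items?{urlencode({'limit': limit})}"


print(wms_getmap_url("https://example.com/wms", "orthophoto",
                     (440_000.0, 6_580_000.0, 450_000.0, 6_590_000.0), 512, 512))
print(wfs_getfeature_url("https://example.com/wfs", "pubs"))
print(ogcapi_features_url("https://example.com/api", "pubs", 10))
```

The WMS and WFS requests are key-value parameter lists on a single endpoint, while OGC API - Features exposes resources as URL paths, which is what makes it feel like any other modern web API.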
Somewhere on Earth
If you are asked to meet someone ‘in New York’, you run into multiple problems. They are most likely referring to the city of New York in the state of New York, USA. But it could also be New York in Lincolnshire, UK, or even the town of New York in Ukraine. To give each place, river or tree a unique location, you need a more precise system. Therefore, different coordinate reference systems (CRS) have been developed over the years. Assuming that the Earth is approximately an ellipsoid, a location can be described by its latitude and longitude: the angles measured north or south of the equator and east or west of the prime meridian, respectively. Although this is a very efficient system, measuring distances or areas on an ellipsoid is impractical in everyday operations. Therefore, the 3-dimensional ellipsoid is projected onto a 2D plane. However, it is impossible to flatten a sphere perfectly without introducing distortions, so every projection makes trade-offs. For example, the famous Mercator map we all know from high school and Google Maps preserves angles and local shapes, but fails to preserve the size of countries: countries far from the equator appear much larger than they are in reality. For smaller regions, projections can be made with much less distortion, resulting in a wide variety of coordinate systems around the world.
For standardization, all these projections and systems have received a unique EPSG code. For example, the global latitude and longitude coordinates you see in Google Maps are in the WGS 84 system, with EPSG code 4326. If you do not like using numbers for coordinates, or want to confuse your mailman, you can also have a look at what3words, which uses a combination of 3 words to identify every 3 by 3 meter square on Earth. In English, laptop.helpful.survivor will bring you straight to Aivix HQ!
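The area distortion of the Mercator projection can be made concrete with a few lines of math. The sketch below implements the spherical Web Mercator projection (EPSG:3857), which web maps such as Google Maps and OpenStreetMap use for display, together with its scale factor 1/cos(latitude). At 60° latitude, lengths on the map are stretched by a factor of 2 and areas by a factor of 4. The function names are chosen for this example; for real reprojection between EPSG codes, a library such as pyproj (`Transformer.from_crs("EPSG:4326", "EPSG:3857", always_xy=True)`) is the safer choice.

```python
# Minimal sketch of the spherical ("Web") Mercator projection used by most web maps
# (EPSG:3857), showing why areas far from the equator look too large.
import math

R = 6_378_137.0  # sphere radius in metres used by EPSG:3857 (the WGS 84 semi-major axis)


def wgs84_to_web_mercator(lon_deg: float, lat_deg: float) -> tuple[float, float]:
    """Project longitude/latitude in degrees (EPSG:4326) to x/y in metres (EPSG:3857)."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    x = R * lon
    y = R * math.log(math.tan(math.pi / 4 + lat / 2))
    return x, y


def mercator_scale(lat_deg: float) -> float:
    """Linear scale factor at a latitude: map lengths are stretched by 1/cos(lat)."""
    return 1.0 / math.cos(math.radians(lat_deg))


# Areas are stretched by the square of the linear scale factor
for lat in (0, 51, 60, 75):
    k = mercator_scale(lat)
    print(f"lat {lat:>2}: lengths x{k:.2f}, areas x{k * k:.1f}")
```

Because the scale factor grows without bound towards the poles, Web Mercator maps are usually cut off at about 85° latitude.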
In this first geospatial blog, we focused on the key concepts that make up the geospatial data environment. Geodata, analyzed in GIS, comes in raster and vector formats, each suited to different types of analysis and storage needs. Web services facilitate the sharing of these data, while coordinate reference systems ensure precise location mapping. Public geodata sources, like the Inspire Geoportal, the Copernicus program and the OpenStreetMap project, offer vast amounts of free data for various applications.
In part 2, we will have a look at how these concepts can be integrated into an AWS environment.
- Administrative borders and orthophoto Belgium: NGI (https://www.ngi.be/website/)
- Cycling routes: La FlammeRouge (https://www.la-flamme-rouge.eu/)
- Pub locations: OpenStreetMap (https://www.openstreetmap.org)
Hi, I’m Bjorn Rombouts. With my background in Earth observation and bioscience engineering, I have a passion for (geo)data, GIS and remote sensing. As a data scientist and engineer, I look forward to getting in touch and finding useful insights in your data.