8 Vector data formats

Vector data is one of the two primary types of geospatial data (the other one being raster data). In this section we will review what vector data is and go over some common data formats for it.

Points, lines, and polygons

Vector data represents specific features on the Earth’s surface. There are three core types of vector data:

  • Points: each point has a single \(x\),\(y\) location. Examples of data that can be represented as point vector data are sampling locations or animal sightings.

  • Lines: a line is composed of at least two points that are connected. Roads and streams are commonly depicted as line vector data.

  • Polygons: polygons are sets of three or more vertices that are connected and form a closed region. Political boundaries (outlines of countries, states, cities, etc) are examples of polygon vector data.

Each item in the vector data is usually referred to as a feature. So each point would be a feature, each polygon is a feature, etc.

Image Source: National Ecological Observatory Network (NEON)

Besides points, lines, and polygons, we can also encounter an additional type of vector data:

  • Multipolygons: represent features made up of several polygons. Each polygon can be a separate, disconnected area, but together they form one feature. For example, a country with both mainland and island territories is often represented as a multi-polygon. Multi-polygons are useful for geographic entities that consist of multiple regions but need to be treated as a single unit in analysis.

In addition to the geospatial information stored, vector data can include attributes that describe each feature. For example, a vector dataset where each feature is a polygon representing the boundary of a state could have as attributes the population and area of the state.

Image Source: National Ecological Observatory Network (NEON)

Shapefiles

One of the most popular formats to store vector data is the shapefile data format. The shapefile format is developed and maintained by the Environmental Systems Research Institute (Esri).

So far we have been working with data that comes stored in a single file, like a CSV or .txt file for tabular data. A shapefile is actually a collection of files that interact together to create a single data file. All the files that make up a shapefile need to have the same name (with different extensions) and be in the same directory. For our shapefiles to work we need at least these three files:

  • .shp: shape format, this file has the geometries for all features.
  • .shx: shape index format, this file indexes the features
  • .dbf: attribute format, this file stores the attributes for features as a table

Sometimes shapefiles will have additional files, including:

  • .prj: a file containing information about the projection and coordinate reference system
  • .sbn and .sbx: files that contain a spatial index of the features
  • .shp.xml: geospatial metadata in XML format.

Check the Wikipedia page about shapefiles to see a more extensive list of files associated to shapefiles.

File management for shapefiles

Remember: when working with a shapefile all the associated files must have the same name (with different extensions) and must be located in the same directory.

A single shapefile can only have one vector type

Each shapefile can only hold one type of vector data. You cannot have, for example, lines and points in the same file. Only points, only lines, or only polygons can be inside a single shapefile.

GeoJSON

GeoJSON, which stands for Geographic JavaScript Object Notation, is an open format for encoding vector data (points, lines, polygons, and multipolygons) and their attributes. It is a popular format for web mapping applications. The GeoJSON format uses a single file, with extension .json or .geojson.

While shapefiles can be in any coordinate reference system, the GeoJSON specification requires GeoJSON files to use the World Geodetic System 1984 (EPSG:4326/WGS84) CRS and all points must be expressed in longitude and latitude units of decimal degrees.

Data in a GeoJSON is stored as attribute-value pairs (similar to Python dictionaries!) and lists. The following are examples of how points, lines, and polygons are represented as GeoJSON features:

Source: Wikipedia

The followng is an example of a full GeoJSON file. Notice that multiple types of geometries can be mixed within the same file:

# Source: Wikipedia
{
  "type": "FeatureCollection",
  "features": [
    {
      "type": "Feature",
      "geometry": {
        "type": "Point",
        "coordinates": [102.0, 0.5]
      },
      "properties": {
        "prop0": "value0"
      }
    },
    {
      "type": "Feature",
      "geometry": {
        "type": "LineString",
        "coordinates": [
          [102.0, 0.0],
          [103.0, 1.0],
          [104.0, 0.0],
          [105.0, 1.0]
        ]
      },
      "properties": {
        "prop0": "value0",
        "prop1": 0.0
      }
    },
    {
      "type": "Feature",
      "geometry": {
        "type": "Polygon",
        "coordinates": [
          [
            [100.0, 0.0],
            [101.0, 0.0],
            [101.0, 1.0],
            [100.0, 1.0],
            [100.0, 0.0]
          ]
        ]
      },
      "properties": {
        "prop0": "value0",
        "prop1": { "this": "that" }
      }
    }
  ]
}

The website https://geojson.io/ is a nice tool to easily create GeoJSON files.