Vector Data in R

Introduction

This chapter briefly explains the fundamental vector model. You will become familiar with the theory behind the vector model and the disciplines in which it predominates, and then see how it is implemented in R.

A vector is the most basic data structure in R. It is a sequence of elements of the same data type. If the elements are of different data types, they are coerced to a common type that can accommodate all of them. Vectors are generally created with the c() function, commonly called concatenate. Other methods also exist, depending on the type of vector being created.
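You can see the coercion rule directly by mixing types inside c(). This is a small base R sketch, and the variable names are illustrative only.

```r
# Integer + double: the integer is promoted, so the result is double ("numeric")
mixed_num <- c(1L, 2.5)
# Any character element forces every element to character
mixed_chr <- c(1, "two", TRUE)
# Logical + integer: TRUE/FALSE become 1L/0L
mixed_int <- c(TRUE, 0L)

class(mixed_num)  # "numeric"
mixed_chr         # "1" "two" "TRUE"
class(mixed_int)  # "integer"
```

The result takes the most general type present. The order is logical < integer < double < character, and the rarely used complex type sits between double and character.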

Numeric Vector

We create a numeric vector with the c() function. Any function that creates a sequence of numbers, such as seq(), also works.

sst = c(25.4, 26, 28, 27.8, 29, 24.8, 22.3)

We can use the is.vector() function to check whether sst is a vector, and the class() function to check its data type.

is.vector(sst); class(sst)
FALSE [1] TRUE
FALSE [1] "numeric"
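Here is a minimal base R sketch of other ways to build a numeric vector. The values are made up. Note that the colon operator returns integers, which is.numeric() still treats as numbers.

```r
sst_c   <- c(25.4, 26, 28, 27.8, 29, 24.8, 22.3)  # c(): explicit values
sst_seq <- seq(from = 22, to = 30, by = 2)         # seq(): 22 24 26 28 30
sst_rep <- rep(26.5, times = 3)                    # rep(): 26.5 26.5 26.5
idx     <- 1:5                                     # colon operator: 1 2 3 4 5

class(sst_seq)   # "numeric"
class(idx)       # "integer": the colon operator with whole numbers gives integers
is.numeric(idx)  # TRUE: integers also count as numeric
```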

Integer Vector

Creating an integer vector is similar to creating a numeric vector. The difference is that we must instruct R to store the values as integers rather than as doubles, which is R's default numeric type. To do so, we append the suffix L to each element.

depth = c(5L, 10L, 15L, 20L, 25L, 30L)
is.vector(depth); class(depth)
FALSE [1] TRUE
FALSE [1] "integer"
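The typeof() function shows the storage type more directly than class(). This short base R sketch compares vectors written with and without the L suffix. The values are illustrative.

```r
depth_dbl <- c(5, 10, 15)        # no suffix: stored as double
depth_int <- c(5L, 10L, 15L)     # L suffix: stored as integer
typeof(depth_dbl)                # "double"
typeof(depth_int)                # "integer"

# as.integer() converts an existing double vector
depth_conv <- as.integer(depth_dbl)
identical(depth_conv, depth_int) # TRUE
as.integer(12.9)                 # 12: the fractional part is dropped, not rounded
```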

Character Vector

A character vector may contain single characters, words or groups of words. Each element must be enclosed in single or double quotation marks.

sites = c("Pemba Channel", "Zanzibar Channel", "Pemba Channel")
is.vector(sites); class(sites)
FALSE [1] TRUE
FALSE [1] "character"
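A few base R functions are handy for summarising character vectors like sites. This sketch redefines the vector so it runs on its own.

```r
sites <- c("Pemba Channel", "Zanzibar Channel", "Pemba Channel")

unique(sites)          # distinct site names
site_counts <- table(sites)
site_counts            # how many times each site occurs
nchar(sites)           # number of characters in each element: 13 16 13
```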

Logical Vector

A logical vector contains only the values TRUE and FALSE, and possibly NA for missing values.

presence = c(TRUE,TRUE, FALSE, TRUE, FALSE)
is.vector(presence);class(presence)
FALSE [1] TRUE
FALSE [1] "logical"
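Logical vectors usually come from comparisons, and they are useful for counting and subsetting. This minimal base R sketch reuses the presence and sst values from this section. The threshold of 27 is an arbitrary illustration.

```r
presence <- c(TRUE, TRUE, FALSE, TRUE, FALSE)
sum(presence)    # 3: TRUE counts as 1, FALSE as 0
mean(presence)   # 0.6: proportion of TRUE values

sst  <- c(25.4, 26, 28, 27.8, 29, 24.8, 22.3)
warm <- sst > 27 # a comparison returns a logical vector
warm             # FALSE FALSE TRUE TRUE TRUE FALSE FALSE
sst[warm]        # keeps only the elements where warm is TRUE: 28.0 27.8 29.0
```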

Vector Data

The geographic vector model is based on points located within a coordinate reference system (CRS). Points can represent self-standing features (e.g., the locations where research samples were taken), or they can be linked together to form more complex geometries such as lines and polygons. Most point geometries have only two dimensions, longitude and latitude (x and y), which are stored together with their attribute information. Three-dimensional points, however, contain an additional \(z\) value for a third dimension such as elevation or bathymetry.
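In sf, these ideas map onto constructor functions. st_point() builds a point from a coordinate vector, and st_linestring() links points in order. The sketch below assumes sf is installed. It reuses two coordinate pairs and one depth value from the shapefile printed later in this chapter. Using depth as the z value is only an illustrative choice.

```r
library(sf)

# XY points: longitude first, then latitude
p1 <- st_point(c(39.50958, -6.438159))
p2 <- st_point(c(39.63180, -6.621774))

# Linking the same coordinates in order gives a line
transect <- st_linestring(rbind(c(39.50958, -6.438159),
                                c(39.63180, -6.621774)))

# A third value turns the point into an XYZ point (here a depth in metres)
p3d <- st_point(c(39.63180, -6.621774, -604))
class(p3d)   # "XYZ" "POINT" "sfg"

# A geometry column (sfc) with the WGS 84 CRS attached
pts <- st_sfc(p1, p2, crs = 4326)
pts
```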

The shapefile is the standard and most widely implemented spatial format for vector data. It is a popular geospatial vector data format for geographic information system (GIS) software, and it is developed and maintained by Esri. Despite what its name may imply, a "single" shapefile is actually made up of at least three files (.shp, .shx and .dbf) and as many as eight. All the files that make up a shapefile share a common filename but have different extensions. Table 1 lists these files, and each plays a specific role in defining the shapefile.

Table 1: Eight common files that make up a shapefile
Description Extension
Attribute information .dbf
Feature geometry .shp
Feature geometry index .shx
Attribute index .aih
Attribute index .ain
Coordinate system information .prj
Spatial index file .sbn
Spatial index file .sbx
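A quick way to see the multi-file nature of a shapefile is to write a small one and list what appears on disk. The sketch below assumes sf is installed. The two points, the id attribute and the temporary file name are made up. GDAL decides which companion files to write, so the listing may include files beyond those checked here.

```r
library(sf)

# Two made-up points with one attribute, in WGS 84
pts <- st_sf(id = 1:2,
             geometry = st_sfc(st_point(c(39.51, -6.44)),
                               st_point(c(39.63, -6.62)), crs = 4326))

shp <- tempfile(fileext = ".shp")   # the .shp extension selects the ESRI Shapefile driver
st_write(pts, shp, quiet = TRUE)

# Every file sharing the base name belongs to this one "shapefile"
base <- tools::file_path_sans_ext(basename(shp))
shp_parts <- list.files(dirname(shp), pattern = paste0("^", base, "\\."))
shp_parts
```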

Until recently, the shapefile was the de facto vector data format for libraries such as GDAL. R has well-supported classes for storing spatial data and interfacing with the shapefile format. Until the sf package arrived, however, R lacked a complete implementation of simple features, which made conversions at times convoluted, inefficient or incomplete [@sf].

Simple features

@sf describes simple features as a hierarchical data model that represents real-world objects in computers, with emphasis on the spatial geometry of these objects. Of the 17 simple feature types, only the seven described in Table 2 are commonly used. sf can represent the common vector geometry types: points, lines, polygons and their respective 'multi' versions. It also supports geometry collections, which can contain multiple geometry types in a single object.

Table 2: Common simple features
Type Description
Point zero-dimensional geometry containing a single point
Linestring sequence of points connected by straight, non-self intersecting line pieces; one-dimensional geometry
Polygon geometry with a positive area (two-dimensional); sequence of points form a closed, non-self intersecting ring; the first ring denotes the exterior ring, zero or more subsequent rings denote holes in this exterior ring
Multipoint set of points; a MULTIPOINT is simple if no two Points in the MULTIPOINT are equal
Multilinestring set of linestrings
Multipolygon set of polygons
Geometrycollection set of geometries of any type, with the exception of geometrycollection
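Each type in Table 2 has a matching constructor in sf. The sketch below assumes sf is installed and uses made-up planar coordinates with no CRS. The polygon has one hole, so its area is the 4 x 4 exterior minus the 1 x 1 hole.

```r
library(sf)

# Made-up coordinates, chosen only to illustrate each geometry type
pt   <- st_point(c(0, 0))
ls   <- st_linestring(rbind(c(0, 0), c(1, 1), c(2, 0)))
ring <- rbind(c(0, 0), c(4, 0), c(4, 4), c(0, 4), c(0, 0))  # closed: first point == last
hole <- rbind(c(1, 1), c(1, 2), c(2, 2), c(2, 1), c(1, 1))
pol  <- st_polygon(list(ring, hole))                         # exterior ring + one hole
mpt  <- st_multipoint(rbind(c(0, 0), c(1, 1)))
mls  <- st_multilinestring(list(rbind(c(0, 0), c(1, 1)),
                                rbind(c(2, 2), c(3, 3))))
mpol <- st_multipolygon(list(list(ring), list(ring + 5)))    # second polygon shifted by 5
gc   <- st_geometrycollection(list(pt, ls, pol))

geoms <- st_sfc(pt, ls, pol, mpt, mls, mpol, gc)
as.character(st_geometry_type(geoms))
as.numeric(st_area(pol))   # 16 - 1 = 15
```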

These core geometry types are fully supported by the R package sf [@sf]. sf provides a class system for geographic vector data [@geocomputation] and supersedes sp, a package of classes and methods for spatial data [@sp]. It also provides a consistent interface to GEOS and GDAL. In doing so it supersedes the rgdal package for reading and writing data [@rgdal] and the rgeos package for spatial operations [@rgeos].

Reading vector data

We will use the sf package to work with vector data in R [@sf]. Note that sf does not rely on rgdal. When sf is loaded, it reports the versions of the GEOS, GDAL and PROJ libraries it links to directly. Its st_read() function reads many vector formats into sf objects.

library(sf)
library(tidyverse)  # provides read_csv(), mutate(), glimpse() and %>% used later in this chapter
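To see which external libraries and file formats your own sf installation supports, you can query sf directly. The version numbers and the driver list depend on how GDAL was built on your machine, so this sketch prints no fixed output.

```r
library(sf)

# Versions of the external libraries sf links to (vary by installation)
sf_extSoftVersion()[c("GEOS", "GDAL", "PROJ")]

# Vector drivers available to st_read() and st_write()
drv <- st_drivers(what = "vector")
c("ESRI Shapefile", "GPX", "GPKG") %in% drv$name
```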

Reading shapefiles

The shapefile is a widely used vector format in GIS software, and the st_read() function imports it into R. For example, the chunk below imports sampling locations stored as a shapefile into a simple feature object in R's workspace.

location = st_read("data/simple_feature.shp", quiet = TRUE)
location
FALSE Simple feature collection with 11 features and 4 fields
FALSE Geometry type: POINT
FALSE Dimension:     XY
FALSE Bounding box:  xmin: 39.50958 ymin: -8.425115 xmax: 42.00623 ymax: -6.414011
FALSE Geodetic CRS:  WGS 84
FALSE First 10 features:
FALSE     id   type depth      sst                   geometry
FALSE 1  294 marker    29 27.87999 POINT (39.50958 -6.438159)
FALSE 2  300 marker  -604 27.97999  POINT (39.6318 -6.621774)
FALSE 3  306 marker  -569 27.97999 POINT (39.65447 -6.746649)
FALSE 4  312 marker  -485 28.03999 POINT (39.62563 -6.805321)
FALSE 5  318 marker  -325 28.03999 POINT (39.58374 -6.833973)
FALSE 6  326 marker  -461 28.03999 POINT (39.66476 -6.837384)
FALSE 7  414 marker  -505 28.02999 POINT (39.95728 -7.843535)
FALSE 8  428 marker  -132 28.23999 POINT (39.67712 -8.136846)
FALSE 9  434 marker  -976 28.16999 POINT (39.74853 -8.425115)
FALSE 10 456 marker -3311 28.33999 POINT (42.00623 -7.025368)

When we print this simple feature, it tells us that it has 11 features and 4 attribute fields, of which only the first 10 features are shown. The features span longitude 39.50958°E to 42.00623°E and latitude 8.425115°S to 6.414011°S, in the WGS 84 geographic coordinate system.
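The same header information can be pulled out programmatically. The chapter's data folder may not be at hand, so this sketch uses the North Carolina counties shapefile bundled with sf. With location, the same calls would report the 11 features and 4 fields shown above.

```r
library(sf)

nc_path <- system.file("shape/nc.shp", package = "sf")  # sample shapefile shipped with sf
nc <- st_read(nc_path, quiet = TRUE)

nrow(nc)                       # number of features
ncol(st_drop_geometry(nc))     # number of attribute fields (geometry excluded)
st_geometry_type(nc)[1]        # geometry type of the first feature
st_bbox(nc)                    # bounding box
```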

Reading GPX files

The st_read() function can also read files from GPS devices with the .gpx extension.

track = st_read("data/Track-180911-063740.gpx", quiet  = TRUE)
track
FALSE Simple feature collection with 1 feature and 24 fields
FALSE Geometry type: POINT
FALSE Dimension:     XY
FALSE Bounding box:  xmin: 39.44527 ymin: -6.907095 xmax: 39.44527 ymax: -6.907095
FALSE Geodetic CRS:  WGS 84
FALSE    ele                time magvar geoidheight                    name  cmt
FALSE 1 14.4 2018-09-11 07:42:07     NA          NA Track Recording Stopped <NA>
FALSE                                                                                 desc
FALSE 1 Recording stopped at 33'00" because the user stopped it after 6.58km (0.50m gain).
FALSE    src link1_href link1_text link1_type link2_href link2_text link2_type  sym
FALSE 1 <NA>       <NA>       <NA>       <NA>       <NA>       <NA>       <NA> <NA>
FALSE   type  fix sat hdop vdop pdop ageofdgpsdata dgpsid  x_speed
FALSE 1 <NA> <NA>  NA   NA   NA   NA            NA     NA 0.385527
FALSE                     geometry
FALSE 1 POINT (39.44527 -6.907095)
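The output above contains a single POINT named "Track Recording Stopped". This suggests that st_read() picked up a waypoint rather than the recorded path. A GPX file holds several layers (waypoints, routes, tracks, route_points, track_points), and st_read() reads the first one unless the layer argument is given. The sketch below writes a tiny GPX file with made-up coordinates so that it runs on its own. For the chapter's file, the equivalent call would be st_read("data/Track-180911-063740.gpx", layer = "tracks"), which has not been run here.

```r
library(sf)

# A tiny, made-up GPX file with one track of three points
gpx_txt <- c(
  '<?xml version="1.0" encoding="UTF-8"?>',
  '<gpx version="1.1" creator="demo" xmlns="http://www.topografix.com/GPX/1/1">',
  '  <trk><name>demo</name><trkseg>',
  '    <trkpt lat="-6.900" lon="39.440"><ele>10</ele></trkpt>',
  '    <trkpt lat="-6.905" lon="39.443"><ele>12</ele></trkpt>',
  '    <trkpt lat="-6.907" lon="39.445"><ele>14</ele></trkpt>',
  '  </trkseg></trk>',
  '</gpx>')
gpx <- tempfile(fileext = ".gpx")
writeLines(gpx_txt, gpx)

st_layers(gpx)                                                  # layer names in the file
trk_line <- st_read(gpx, layer = "tracks", quiet = TRUE)        # whole track as a line
trk_pts  <- st_read(gpx, layer = "track_points", quiet = TRUE)  # one row per GPS fix
```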

We can assess the geographical extent of the simple feature track with the st_bbox() function.

track %>% st_bbox()
FALSE      xmin      ymin      xmax      ymax 
FALSE 39.445274 -6.907095 39.445274 -6.907095

We can also check its coordinate reference system with the st_crs() function.

track %>% st_crs()
FALSE Coordinate Reference System:
FALSE   User input: WGS 84 
FALSE   wkt:
FALSE GEOGCRS["WGS 84",
FALSE     DATUM["World Geodetic System 1984",
FALSE         ELLIPSOID["WGS 84",6378137,298.257223563,
FALSE             LENGTHUNIT["metre",1]]],
FALSE     PRIMEM["Greenwich",0,
FALSE         ANGLEUNIT["degree",0.0174532925199433]],
FALSE     CS[ellipsoidal,2],
FALSE         AXIS["geodetic latitude (Lat)",north,
FALSE             ORDER[1],
FALSE             ANGLEUNIT["degree",0.0174532925199433]],
FALSE         AXIS["geodetic longitude (Lon)",east,
FALSE             ORDER[2],
FALSE             ANGLEUNIT["degree",0.0174532925199433]],
FALSE     ID["EPSG",4326]]
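A CRS can also be built from an EPSG code and queried without printing the full WKT. This sketch assumes sf is installed. It contrasts WGS 84 (EPSG:4326), shown above, with the UTM zone 37S projection (EPSG:32737) used later in this chapter.

```r
library(sf)

wgs84  <- st_crs(4326)     # geographic CRS, coordinates in degrees
utm37s <- st_crs(32737)    # projected CRS, coordinates in metres

wgs84$epsg                 # 4326
st_is_longlat(wgs84)       # TRUE: longitude/latitude
st_is_longlat(utm37s)      # FALSE: projected
```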

Making simple features from tabular data

Sometimes geographical information comes in tabular form, and you need to convert it into a simple feature for spatial analysis and mapping. The sf package provides the st_as_sf() function, which creates a simple feature from the location information in a table. To illustrate, let us first load the file that contains the geographical information into the workspace.

location = read_csv("data/kimbiji_kizimkazi_transect.csv")

Looking at the internal structure of the location object, we find eighteen observations, each with longitude and latitude information.

location %>% glimpse()
FALSE Rows: 18
FALSE Columns: 2
FALSE $ lon <dbl> 39.45336, 39.45336, 39.46751, 39.47458, 39.47812, 39.49226, 39.485~
FALSE $ lat <dbl> -6.850945, -6.822652, -6.787286, -6.758993, -6.730700, -6.713016, ~

The file contains only the geographical information, so we add a third column of station names with the mutate() function from the dplyr package. Because the stations are numbered sequentially, paste() combines the word "station" with the numbers 1 to 18.

location = location %>% 
  mutate(name = paste("station", 1:18))
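paste() separates its pieces with a space by default. This base R sketch compares it with two common alternatives. Zero-padded names sort correctly as text, whereas "station 10" sorts before "station 2".

```r
with_space <- paste("station", 1:3)          # "station 1" "station 2" "station 3"
no_space   <- paste0("station", 1:3)         # "station1" "station2" "station3"
padded     <- sprintf("station_%02d", 1:3)   # "station_01" "station_02" "station_03"
```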

Now that we know the dataset contains longitude and latitude, we can use these columns to create a simple feature object with the st_as_sf() function from the sf package.

location.sf = location %>% 
  st_as_sf(coords = c("lon", "lat"))

location.sf
FALSE Simple feature collection with 18 features and 1 field
FALSE Geometry type: POINT
FALSE Dimension:     XY
FALSE Bounding box:  xmin: 39.45336 ymin: -6.850945 xmax: 39.55239 ymax: -6.461915
FALSE CRS:           NA
FALSE # A tibble: 18 x 2
FALSE    name                   geometry
FALSE    <chr>                   <POINT>
FALSE  1 station 1  (39.45336 -6.850945)
FALSE  2 station 2  (39.45336 -6.822652)
FALSE  3 station 3  (39.46751 -6.787286)
FALSE  4 station 4  (39.47458 -6.758993)
FALSE  5 station 5    (39.47812 -6.7307)
FALSE  6 station 6  (39.49226 -6.713016)
FALSE  7 station 7  (39.48519 -6.695333)
FALSE  8 station 8  (39.49226 -6.659967)
FALSE  9 station 9   (39.50641 -6.64582)
FALSE 10 station 10 (39.51702 -6.631674)
FALSE 11 station 11  (39.52056 -6.61399)
FALSE 12 station 12 (39.52763 -6.578624)
FALSE 13 station 13 (39.52763 -6.557404)
FALSE 14 station 14  (39.5347 -6.539721)
FALSE 15 station 15 (39.54178 -6.518501)
FALSE 16 station 16 (39.54531 -6.497281)
FALSE 17 station 17 (39.54531 -6.483135)
FALSE 18 station 18 (39.55239 -6.461915)

The coords argument takes the names of the columns that hold the coordinates locating each record, in x, y order: longitude first, then latitude. We now have a simple feature with 18 points. However, it lacks a coordinate reference system, as the line "CRS: NA" in the printout shows. We can define the CRS with the st_set_crs() function by passing 4326, the EPSG code of WGS 84.

location.sf = location.sf %>% 
  st_set_crs(4326)
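st_as_sf() also accepts a crs argument, so the two steps above can be combined into one call. This sketch assumes sf is installed and uses the first three stations from the printout above. By default st_as_sf() drops the lon and lat columns once they become the geometry.

```r
library(sf)

# First three stations from the transect table printed above
stations <- data.frame(
  lon  = c(39.45336, 39.45336, 39.46751),
  lat  = c(-6.850945, -6.822652, -6.787286),
  name = paste("station", 1:3)
)

# coords = c(x, y): longitude first, latitude second; CRS set in the same call
stations_sf <- st_as_sf(stations, coords = c("lon", "lat"), crs = 4326)

st_crs(stations_sf)$epsg     # 4326
st_coordinates(stations_sf)  # matrix with X (longitude) and Y (latitude) columns
names(stations_sf)           # "name" "geometry"
```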

Let us print location.sf again to check that it is now a spatial object with a defined CRS.

location.sf
FALSE Simple feature collection with 18 features and 1 field
FALSE Geometry type: POINT
FALSE Dimension:     XY
FALSE Bounding box:  xmin: 39.45336 ymin: -6.850945 xmax: 39.55239 ymax: -6.461915
FALSE Geodetic CRS:  WGS 84
FALSE # A tibble: 18 x 2
FALSE    name                   geometry
FALSE  * <chr>               <POINT [°]>
FALSE  1 station 1  (39.45336 -6.850945)
FALSE  2 station 2  (39.45336 -6.822652)
FALSE  3 station 3  (39.46751 -6.787286)
FALSE  4 station 4  (39.47458 -6.758993)
FALSE  5 station 5    (39.47812 -6.7307)
FALSE  6 station 6  (39.49226 -6.713016)
FALSE  7 station 7  (39.48519 -6.695333)
FALSE  8 station 8  (39.49226 -6.659967)
FALSE  9 station 9   (39.50641 -6.64582)
FALSE 10 station 10 (39.51702 -6.631674)
FALSE 11 station 11  (39.52056 -6.61399)
FALSE 12 station 12 (39.52763 -6.578624)
FALSE 13 station 13 (39.52763 -6.557404)
FALSE 14 station 14  (39.5347 -6.539721)
FALSE 15 station 15 (39.54178 -6.518501)
FALSE 16 station 16 (39.54531 -6.497281)
FALSE 17 station 17 (39.54531 -6.483135)
FALSE 18 station 18 (39.55239 -6.461915)

Let us check the class of the simple feature.

location.sf %>% 
  class()
FALSE [1] "sf"         "tbl_df"     "tbl"        "data.frame"

Note that the object has four classes: sf, tbl_df, tbl and data.frame. The contents of the data frame were carried over into the attribute table of the simple feature. Apart from lon and lat, which were converted into the geometry column, the tabular data had only one attribute, name.
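Because an sf object is also a data frame, ordinary subsetting works on it, and the geometry column is "sticky": it stays attached when you select attribute columns. This sketch assumes sf is installed and uses two stations from the table above. st_drop_geometry() returns the plain attribute table.

```r
library(sf)

stations_sf <- st_as_sf(
  data.frame(lon  = c(39.45336, 39.46751),
             lat  = c(-6.850945, -6.787286),
             name = c("station 1", "station 3")),
  coords = c("lon", "lat"), crs = 4326)

one_row   <- stations_sf[stations_sf$name == "station 3", ]  # row subset, still sf
name_only <- stations_sf[, "name"]                           # geometry kept automatically
names(name_only)                                             # "name" "geometry"

attr_only <- st_drop_geometry(stations_sf)                   # plain data.frame
class(attr_only)                                             # "data.frame"
```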

The printout confirms that the CRS is now WGS 84. We can further transform these geographic coordinates, which are in degrees, into the Universal Transverse Mercator (UTM) projection, which is in metres. The st_transform() function from the sf package handles coordinate system transformations [@sf]. The EPSG code for WGS 84 / UTM zone 37S is 32737, and we pass it to the function.

location.utm = location.sf %>% 
  st_transform(32737)

location.utm
FALSE Simple feature collection with 18 features and 1 field
FALSE Geometry type: POINT
FALSE Dimension:     XY
FALSE Bounding box:  xmin: 550090.3 ymin: 9242705 xmax: 561079.7 ymax: 9285701
FALSE Projected CRS: WGS 84 / UTM zone 37S
FALSE # A tibble: 18 x 2
FALSE    name                 geometry
FALSE  * <chr>             <POINT [m]>
FALSE  1 station 1  (550090.3 9242705)
FALSE  2 station 2  (550093.2 9245833)
FALSE  3 station 3  (551660.2 9249741)
FALSE  4 station 4  (552444.8 9252868)
FALSE  5 station 5  (552838.7 9255995)
FALSE  6 station 6  (554404.1 9257949)
FALSE  7 station 7  (553624.3 9259904)
FALSE  8 station 8    (554410 9263813)
FALSE  9 station 9  (555975.3 9265376)
FALSE 10 station 10 (557149.7 9266938)
FALSE 11 station 11 (557542.7 9268893)
FALSE 12 station 12 (558328.7 9272802)
FALSE 13 station 13 (558331.2 9275147)
FALSE 14 station 14 (559115.3 9277101)
FALSE 15 station 15 (559899.8 9279446)
FALSE 16 station 16 (560293.4 9281792)
FALSE 17 station 17 (560295.1 9283356)
FALSE 18 station 18 (561079.7 9285701)
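Projected coordinates in metres make distance calculations straightforward. The sketch below assumes sf and a working PROJ installation. It takes the first two stations from the table above, reprojects them to EPSG:32737 and measures the gap between them. Unlike st_set_crs(), which only attaches a CRS label, st_transform() recomputes the coordinates.

```r
library(sf)

# Stations 1 and 2 in WGS 84 (degrees)
pts <- st_sfc(st_point(c(39.45336, -6.850945)),
              st_point(c(39.45336, -6.822652)), crs = 4326)

# Reproject to UTM zone 37S: the coordinate values change to metres
pts_utm <- st_transform(pts, 32737)
st_coordinates(pts_utm)          # should match stations 1 and 2 in the output above

# Planar distance between the two stations, in metres (a units object)
d <- st_distance(pts_utm[1], pts_utm[2])
as.numeric(d)
```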

Export simple feature as shapefile

Once the simple feature is created, you may want to export it as a shapefile for use in other GIS software such as QGIS and Esri ArcGIS. The st_write() function in the sf package exports a simple feature from the workspace to a file, and it guesses the output format from the file extension. The chunk below exports the simple feature object location.sf to location.shp in the working directory, denoted by ./.

location.sf %>% st_write("./location.shp")
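By default, running st_write() a second time on an existing file fails instead of overwriting it. Passing append = FALSE replaces the existing layer. This sketch assumes sf is installed. It writes two stations to a temporary GeoPackage, an alternative format to the shapefile, rewrites the file, and reads it back to confirm the round trip.

```r
library(sf)

pts <- st_sf(name = paste("station", 1:2),
             geometry = st_sfc(st_point(c(39.45336, -6.850945)),
                               st_point(c(39.45336, -6.822652)), crs = 4326))

out <- tempfile(fileext = ".gpkg")                  # driver guessed from the .gpkg extension
st_write(pts, out, quiet = TRUE)                    # first write creates the file
st_write(pts, out, append = FALSE, quiet = TRUE)    # re-run: replace the layer, do not append

back <- st_read(out, quiet = TRUE)
nrow(back)                                          # 2, not 4: the layer was replaced
```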