Howdy Neighbor! Proximity Analysis in R

March 1, 2021

Proximity analysis is one of the cornerstones of spatial analysis. It refers to the ways in which we use spatial methods to ask "what is happening near here?" It is at the heart of Tobler's first law of geography, which I paraphrase as "everything is related, but near things are more related than distant things." In practice, proximity analysis is often implemented with buffer polygons. A buffer polygon delineates the area (or areas) within a defined radius of a feature of interest. This necessitates defining the buffer distance and clearly explaining why that distance was selected. Another tool for exploring proximity is the Voronoi diagram, also called a Voronoi tessellation, a Voronoi decomposition, a Voronoi partition, or a Dirichlet tessellation. A Voronoi diagram partitions the area around a set of features into regions such that every location within a region is closer to the contained feature than to any other feature. The resultant polygons, called Thiessen polygons, are shown in the image below.

 

Source: https://commons.wikimedia.org/wiki/File:Euclidean_Voronoi_diagram.svg
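
Before working with real data, here is a minimal sketch (on a few made-up points, not the hospital data used below) of how the sf functions st_buffer and st_voronoi express these two ideas:

library(sf)

# three made-up points in a projected CRS whose units are meters

toy_pts <- st_sfc(st_point(c(0, 0)), st_point(c(100, 50)), st_point(c(40, 120)), crs = 3310)

# buffer: all locations within 30 meters of each point

toy_buf <- st_buffer(toy_pts, dist = 30)

# Voronoi: one region per point, containing the locations closest to that point

toy_vor <- st_collection_extract(st_voronoi(st_union(toy_pts)))

plot(toy_vor)

plot(toy_buf, add=T)

plot(toy_pts, add=T, pch=20)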

Let's take a look at some of the many ways in which proximity analysis can be implemented in R. Keep in mind that this is an illustrative example, not a real analysis workflow, which would be more complex.

OVERVIEW

In this example we look at service areas around hospitals in Alameda County, CA. We use the California Healthcare Facility Listing data that we downloaded from the CA Health & Human Services Open Data Portal. For geographic context, we download the 2019 USA counties cartographic boundary file from the Census website and subset to get the border of Alameda County. We use buffers and Voronoi polygons to model the service area for each hospital, considering distance alone. Then we vary these approaches by weighting the analysis based on the number of available hospital beds.

GETTING STARTED

First, let’s load the R libraries we will use. I encourage you to look at the online documentation for each of these important R spatial libraries.

library(sf)

library(data.table)

library(raster)

library(rgeos)

library(tmap)

library(classInt)

Don’t forget to set your working directory to the folder in which your data resides!

 # Set working directory

setwd("~/Documents/proximity_analysis")

Let’s identify our study polygon as Alameda County, CA. We will load the USA County boundary data downloaded from the Census and then subset to keep just Alameda County.

 # Census USA Counties geography data, cartographic boundary file

poly_file <- "cb_2019_us_county_500k.shp"

polys <- st_read(poly_file)

# Subset County Polygons to Keep only Alameda County

my_poly <- polys[polys$STATEFP=='06'& polys$COUNTYFP=='001',]

# Map it!

plot(my_poly$geometry)

 

We will need to transform all the spatial data we use in this analysis to a planar (2-dimensional) coordinate reference system. This allows our spatial methods to use Euclidean geometry operations, which is required by the libraries we are using. Here we choose the CA Albers coordinate reference system (CRS) as it is appropriate for area analysis within the state of CA.

# Transform the data to CA Albers, using its EPSG code, 3310

my_crs_code <- 3310

my_poly <- st_transform(my_poly, crs=my_crs_code)

Now we are ready to read in the healthcare facility data. We will subset it to keep only hospitals, i.e., where the type is "General Acute Care Hospital" and the number of beds in the facility is greater than 1 (to remove administrative offices).

# CA Hospital facility points

points_file <- "data_chhs_ca_gov_lic_heathcare_fac_20210215.csv"

pts_df <- read.csv(points_file)

head(pts_df)

# keep only facilities with hospital beds

nrow(pts_df)

pts_df <- pts_df[pts_df$TOTAL_NUMBER_BEDS > 1,]

nrow(pts_df)

# and where the type of facility is equal to "General Acute Care Hospital"

unique(pts_df$LICENSE_CATEGORY_DESC)

pts_df <- pts_df[pts_df$LICENSE_CATEGORY_DESC=="General Acute Care Hospital",]

nrow(pts_df)

The hospital data are originally in a CSV file, which we read into an R dataframe. In order to make these data spatial points we need to convert them using the sf library's function st_as_sf, which in this case returns an sf object with point geometries. The inputs to this function are (1) the name of the dataframe, (2) the columns in that dataframe that contain the x and y coordinate data (in that order), and (3) the CRS of the data as it is encoded in the dataframe. We then transform the data to CA Albers to use in our analysis.

# convert the dataframe to an sf object with point geometries

pts = st_as_sf(pts_df,

                coords = c('LONGITUDE', 'LATITUDE'),

                crs = 4326)

plot(pts$geometry)

# transform the CRS to CA Albers

my_pts <- st_transform(pts, crs=my_crs_code)

# Subset - select only those points in Alameda

my_pts <- st_intersection(my_pts, my_poly)

# Map the points on top of Alameda County

plot(my_poly$geometry)

plot(my_pts$geometry, add=T)

 

BUFFER ANALYSIS

The sf library makes it relatively straightforward to create distance-based buffers with the st_buffer operation, where the two key inputs are the sf object containing the features around which to buffer and the buffer distance in the units of that object's CRS. For the CA Albers CRS the units are meters. In the code below we define a buffer distance of 2000 meters, or 2 km. This delineates the area within 2 km of each point. We may wish to do this if our assumption is that people use the closest hospital if it is within 2 km. We know that to be an oversimplification!

# Identify buffer distance in units of the CRS, here meters

bufdist <- 2000  # or 2km

# Implement the buffer operation on the points

# which outputs polygons for each point

pts_buf <- st_buffer(my_pts, dist=bufdist)

# Plot results

plot(my_poly$geometry)

plot(pts_buf$geometry, add=T)

plot(my_pts$geometry, col="red", add=T)

 

Now those buffers are all the same size regardless of the number of hospital beds. So let's vary the size of the buffers based on the number of beds. The way we do this is as follows.

  • First we use the classInt::classIntervals function to create a set of break points that bin the points into 5 quantiles based on the number of beds.
  • We then use those class breaks to create a bed_weight variable that ranges from 1 to 5, one value per quantile.
  • We set the base buffer size to 1000 meters and multiply it by the bed_weight to get a buffer distance that ranges from 1 to 5 km.

# Take a look at the range of values for number of beds

summary(as.numeric(my_pts$TOTAL_NUMBER_BEDS))

# make the number of beds a numeric variable

my_pts$TOTAL_NUMBER_BEDS <- as.numeric(my_pts$TOTAL_NUMBER_BEDS)

# Now let's use the classInt package to classify the number of beds into weights

# depending on what quantile they fall into

my_cats <- classIntervals(my_pts$TOTAL_NUMBER_BEDS, n=5, style="quantile")

my_cats

names(my_cats)

# Make a weights vector and check that its length matches the number of points

my_weights <- cut(my_pts$TOTAL_NUMBER_BEDS, breaks = my_cats$brks, labels=as.numeric(1:5))

length(my_weights)

nrow(my_pts)

my_pts$bed_weights <- cut(my_pts$TOTAL_NUMBER_BEDS, breaks = my_cats$brks, labels=F)

# Check the distribution of the bed_weights variable

summary(my_pts$bed_weights)

# cut() excludes the minimum value by default, so set any NA weights to one

my_pts$bed_weights[is.na(my_pts$bed_weights)] <- 1
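
# Quick check (my addition, not part of the original workflow):

# how many hospitals fall into each weight class?

table(my_pts$bed_weights)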

# Now create buffers by multiplying our base distance by each point's weight

bufdist <- 1000

pts_bufw <- st_buffer(my_pts, (bufdist*my_pts$bed_weights))

# Finally, map the output

plot(my_poly$geometry)

plot(pts_bufw$geometry, add=T)

plot(my_pts$geometry, col="red", add=T)

 

Our variable distance polygons take into account the ability of a hospital with a lot of beds to service a larger area.
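
As a quick sanity check (a small addition of my own, not part of the original workflow), we can confirm that the buffers scale with the weights by computing their areas with st_area. Because area grows with the square of the radius, a weight of k should give roughly k squared times the base buffer area.

# Buffer areas in square km, alongside the bed weights

pts_bufw$buffer_km2 <- as.numeric(st_area(pts_bufw)) / 1e6

head(pts_bufw[, c("bed_weights", "buffer_km2")])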

VORONOI DIAGRAM

Our buffer polygons do not account for all of the area around a hospital; i.e., if an area isn't within a buffer it is not assigned to any hospital. But we know that everyone may need to go to a hospital at some point. So let's consider how a Voronoi diagram can be used to model proximity.

In the code below, adapted from code by Dan Houghton on GitHub, we partition all of Alameda County so that each location within it is assigned to the nearest hospital.

voronoi <- 

  my_pts %>% 

  st_geometry() %>%

  st_union() %>%

  st_voronoi() %>%

  st_collection_extract()

# Put the output polygons back in the original point order

# so we can easily rejoin them with the source attributes

voronoi <-

  voronoi[unlist(st_intersects(my_pts,voronoi))]

# Plot the point voronoi polygons

plot(voronoi)

# clip the voronoi polys to the boundary

voronoi2 <- st_intersection(my_poly$geometry, voronoi)

# plot the clipped voronoi polygons

plot(voronoi2)

plot(my_pts$geometry, add=T, col="red", pch=20)

# Make the voronoi polygon data an sf object

# that has the voronoi geometry but the input attributes

voronoi_sf = st_sf(my_pts, geometry = voronoi2)    # sf object

voronoi_sf <- st_set_crs(voronoi_sf, my_crs_code)

st_crs(voronoi_sf)

# Plot it to check

plot(voronoi_sf$geometry)

plot(my_pts$geometry, add=T, col="red", pch=20)
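
One property worth verifying (again, a quick check of my own) is that the clipped Voronoi polygons tile the county exactly, so their areas should sum to the area of Alameda County:

sum(st_area(voronoi_sf))   # total area of the service-area polygons

st_area(my_poly)           # area of Alameda County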

 

COMBINING BUFFERS AND VORONOI POLYGONS

We can combine these two approaches as a way of implementing the assumption that an area will be served by the closest hospital, but only within a specific distance.

Here we implement this by intersecting the Voronoi polygons with fixed-distance buffers, again following code by Dan Houghton:

st_buffer_without_overlap <- function(centroids, dist) {

  # Voronoi tessellation

  voronoi <- 

    centroids %>% 

    st_geometry() %>%

    st_union() %>%

    st_voronoi() %>%

    st_collection_extract()

  # Put them back in their original order

  voronoi <-

    voronoi[unlist(st_intersects(centroids,voronoi))]

  # Keep the attributes

  result <- centroids

  # Intersect voronoi zones with buffer zones

  st_geometry(result) <-

    mapply(function(x,y) st_intersection(x,y),

           st_buffer(st_geometry(centroids),dist), 

           voronoi,

           SIMPLIFY=FALSE) %>%

    st_sfc(crs=st_crs(centroids))

  result

}

buffer_distance_meters <- 2000 # 2000 meters or 2 km

pts_buffered <- st_buffer_without_overlap(my_pts, buffer_distance_meters)

# Plot the buffered polygons

plot(my_poly$geometry)

plot(pts_buffered$geometry, add=T)

# Clip the buffered polygons to the county boundary

pts_buffered_clipped <- st_intersection(my_poly, pts_buffered)

# plot the clipped polygons

plot(my_poly$geometry) 

plot(pts_buffered_clipped$geometry, col="beige", add=T)

plot(my_pts$geometry, pch=20, col="red", add=T)

Here we implement the same approach with variable-width buffers:

st_buffer_without_overlap_weighted <- function(centroids, dist, weight_var) {

  # Voronoi tessellation

  voronoi <- 

    centroids %>% 

    st_geometry() %>%

    st_union() %>%

    st_voronoi() %>%

    st_collection_extract()

  # Put them back in their original order

  voronoi <-

    voronoi[unlist(st_intersects(centroids,voronoi))]

  # Keep the attributes

  result <- centroids

  # Intersect voronoi zones with buffer zones

  st_geometry(result) <-

    mapply(function(x,y) st_intersection(x,y),

           st_buffer(st_geometry(centroids), (dist*centroids[[weight_var]])), 

           voronoi,

           SIMPLIFY=FALSE) %>%

    st_sfc(crs=st_crs(centroids))

  result

}

buffer_distance_meters <- 1000 # base distance of 1000 meters, or 1 km

pts_buffered2 <- st_buffer_without_overlap_weighted(my_pts, buffer_distance_meters, 'bed_weights')

# Plot the buffered polygons

plot(my_pts$geometry)

plot(pts_buffered2$geometry, add=T)

# clip 

pts_buffered_clipped2 <- st_intersection(my_poly, pts_buffered2)

# plot the clipped polygons

plot(my_poly$geometry) 

plot(pts_buffered_clipped2$geometry, col="beige", add=T)

plot(my_pts$geometry, pch=20, col="red", add=T)

 

WEIGHTED VORONOI DIAGRAM

For our final example, we are going to implement code to create a weighted Voronoi diagram as a way to partition the study area while giving more weight to hospitals with a larger bed capacity. This code is borrowed from two Stack Overflow posts, one by Dr. William Huber that describes Thiessen polygons in detail, and one by Simone Bianchi who provides code for creating them in R. I recommend that you read both of these posts. I provide the code here so that you can see how to implement it with sf objects in R using data that is available online. In this example, we use the bed weights to multiplicatively vary the size of the Voronoi polygons around each point: each raster cell is assigned to the hospital that minimizes the squared distance divided by the hospital's weight, so heavily weighted hospitals claim larger regions. Following Bianchi, we use a raster-based implementation because of the computational complexity of generating a vector weighted Voronoi partition. This assumes some knowledge of raster data structures and methods in R.

################

# weighted voronoi

################

rez <- 1000 # the cell size in CRS units (meters); depends on the size of the study area

maxlen <- 603 # number of cells along each side of the raster; this should be the study-area extent divided by rez
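
# An alternative to hard-coding maxlen (my suggestion, not part of the original code):

# derive it from the study-area bounding box and the cell size

bbox <- sf::st_bbox(my_poly)

maxlen_alt <- ceiling(max(bbox[['xmax']] - bbox[['xmin']], bbox[['ymax']] - bbox[['ymin']]) / rez)

maxlen_alt  # compare with the hard-coded value above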

my_id_var <- 'OSHPD_ID'

my_weight_var <- 'bed_weights'

r <- raster(xmn=sf::st_bbox(my_poly)[['xmin']], 

            xmx=(sf::st_bbox(my_poly)[['xmin']] + (maxlen*rez)), 

            ymn=sf::st_bbox(my_poly)[['ymin']], 

            ymx=(sf::st_bbox(my_poly)[['ymin']] + (maxlen*rez)), 

            crs = sf::st_crs(my_poly)$proj4string, resolution=rez)

pts_dt <- data.table(

  ptID = paste0('x',my_pts[[my_id_var]]),  # the ID must be a unique string

  x = st_coordinates(my_pts)[,1] ,

  y = st_coordinates(my_pts)[,2],

  size=my_pts[[my_weight_var]]

)

head(pts_dt)

# create a data.table with one row per raster cell ((maxlen*rez)^2*(1/rez)^2 simplifies to maxlen^2 cells)

dt <- data.table(V1=seq(1, length.out = (maxlen*rez)^2*(1/rez)^2))

# calculate a distance grid for each hospital, apply the weight, and store the result

for (i in pts_dt[, unique(ptID)]){

  #print(i)

  # 1) standard Euclidean distance between each cell and each point

  d <- distanceFromPoints(r, pts_dt[ptID==i,.(x,y)])

  # 2) apply the weight: divide the squared distance by the point's weight

  # (multiplicatively weighted Voronoi; larger weights shrink the effective distance)

  d <- ((d@data@values)^2)/pts_dt[ptID==i,size]

  # 3) store the results with the correct name

  dt <- cbind(dt,d)

  #print(i)

  setnames(dt, "d", i)

}

# carry out comparison

# find the column with the lowest value (+1 to account for column V1 at the beginning),

# i.e., the point with the smallest weighted distance to the cell

dt2 <- copy(dt)  # data.table objects are modified by reference, so use copy() to keep a snapshot

dt[,minval:=which.min(.SD)+1, by=V1, .SDcols=2:length(names(dt))]

# find the name of that column, that is the id of the point

dt[,minlab:=names(dt)[minval]]

# create resulting raster

out <- raster(xmn=sf::st_bbox(my_poly)[['xmin']], 

              xmx=(sf::st_bbox(my_poly)[['xmin']] + (maxlen*rez)), 

              ymn=sf::st_bbox(my_poly)[['ymin']], 

              ymx=(sf::st_bbox(my_poly)[['ymin']] + (maxlen*rez)), 

              crs = sf::st_crs(my_poly)$proj4string, resolution=rez,

              vals=dt[,as.factor(minlab)]

)

plot(out)

out_masked <- mask(out, my_poly)

plot(out_masked)

plot(my_poly$geometry, add=T)

# Transform into polygons

# Note the output is an sp-class object

outpoly <- rasterToPolygons(out_masked, dissolve=T)

outpoly_sf <- st_as_sf(outpoly)

# add the point ids to the polys

x <- dt[,as.factor(minlab)]

outpoly_sf$nearpt_id <- levels(x)[outpoly_sf$layer]

# Plot results with starting points

plot(outpoly_sf$geometry, col="beige")

plot(my_pts$geometry, col="red",add=T)
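
If you want to attach an approximate service-area size to each hospital (a sketch of my own layered on the code above), you can count the raster cells assigned to each hospital and convert the counts to area:

# Count cells per hospital in the masked raster and convert to square km

cell_counts <- raster::freq(out_masked, useNA="no")

service_area <- data.frame(nearpt_id = levels(x)[cell_counts[, "value"]], area_km2 = cell_counts[, "count"] * rez^2 / 1e6)

head(service_area)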

 

Voronoi and weighted Voronoi polygons partition space into neighborhoods with similar estimated characteristics, in this example hospital service areas. As such, Voronoi diagrams are a method of spatial interpolation. These polygons can also serve as inputs to adjacency-based methods of proximity analysis such as spatial autocorrelation, which considers patterns and trends in the data. Spatial analysis provides many ways to consider neighborliness.
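
For example, here is a minimal sketch (assuming the spdep package, which is not loaded above) of turning the Voronoi service areas into a spatial neighbors list, the usual starting point for spatial autocorrelation measures:

library(spdep)

# two service areas are neighbors if their polygons share a boundary

nb <- poly2nb(voronoi_sf)

# row-standardized spatial weights, ready for functions such as moran.test

lw <- nb2listw(nb, style="W")

summary(nb)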

CONCLUDING THOUGHTS

That’s a lot of proximity analysis code to chew on! I hope that there is some useful code in this post, whatever type of neighborhood you are trying to model.  I also hope you can see from this post just how powerful R is as a computing environment for spatial analysis. The R sf and raster packages and related libraries provide a fantastic complement to desktop GIS software like ArcGIS and QGIS.  If you want to learn more, register for the D-Lab’s R Geospatial Data Fundamentals workshop, one iteration of which will be held in March 2021. I also strongly (and often) recommend the great online book Geocomputation with R by Robin Lovelace, Jakub Nowosad, and Jannes Muenchow for those interested in geospatial data and analysis in R.