Simple way to subset SpatialPolygonsDataFrame (i.e. delete polygons) by attribute in R

I would like simply delete some polygons from a SpatialPolygonsDataFrame object based on corresponding attribute values in the @data data frame so that I can plot a simplified/subsetted shapefile. So far I haven't found a way to do this.

For example, let's say I want to delete all polygons from this world shapefile that have an area of less than 30000. How would I go about doing this?

Or, similarly, how can I delete Antartica?

require(maptools)

getinfo.shape("TM_WORLD_BORDERS_SIMPL-0.3.shp") 
# Shapefile type: Polygon, (5), # of Shapes: 246
world.map <- readShapeSpatial("TM_WORLD_BORDERS_SIMPL-0.3.shp")

class(world.map)
# [1] "SpatialPolygonsDataFrame"
# attr(,"package")
# [1] "sp"

head(world.map@data)
#   FIPS ISO2 ISO3 UN                NAME   AREA  POP2005 REGION SUBREGION     LON     LAT
# 0   AC   AG  ATG 28 Antigua and Barbuda     44    83039     19        29 -61.783  17.078
# 1   AG   DZ  DZA 12             Algeria 238174 32854159      2        15   2.632  28.163
# 2   AJ   AZ  AZE 31          Azerbaijan   8260  8352021    142       145  47.395  40.430
# 3   AL   AL  ALB  8             Albania   2740  3153731    150        39  20.068  41.143
# 4   AM   AM  ARM 51             Armenia   2820  3017661    142       145  44.563  40.534
# 5   AO   AO  AGO 24              Angola 124670 16095214      2        17  17.544 -12.296

If I do something like this, the plot does not reflect any changes.

world.map@data = world.map@data[world.map@data$AREA > 30000,]
plot(world.map)

same result if I do this:

world.map@data = world.map@data[world.map@data$NAME != "Antarctica",]
plot(world.map)

Any help is appreciated!

Answers


looks like you're overwriting the data, but not removing the polygons. If you want to cut down the dataset including both data and polygons, try e.g.

world.map <- world.map[world.map$AREA > 30000,]
plot(world.map)

[[Edit 19 April, 2016]] That solution used to work, but @Bonnie reports otherwise for a newer R version (though perhaps the data has changed too?): world.map <- world.map[world.map@data$AREA > 30000, ] Upvote @Bonnie's answer if that helped.


When I tried to do this in R 3.2.1, tim riffe's technique above did not work for me, although modifying it slightly fixed the problem. I found that I had to specifically reference the data slot as well before specifying the attribute to subset on, as below:

world.map <- world.map[world.map@data$AREA > 30000, ]
plot(world.map)

Adding this as an alternative answer in case others come across the same issue.


Just to mention that subset also makes the work avoiding to write the data's name in the condition.

world.map <- subset(world.map, AREA > 30000)
plot(world.map)

I used the above technique to make a map of just Australia:

australia.map < - world.map[world.map$NAME == "Australia",]
plot(australia.map)

The comma after "Australia" is important, as it turns out.

One flaw with this method is that it appears to retain all of the attribute columns and rows for all of the other countries, and just populates them with zeros. I found that if I wrote out a .shp file, then read it back in using readOGR (rgdal package), it automatically removes the null geographic data. Then I could write another shape file with only the data I want.

writeOGR(australia.map,".","australia",driver="ESRI Shapefile")
australia.map < - readOGR(".","australia")
writeOGR(australia.map,".","australia_small",driver="ESRI Shapefile")

On my system, at least, it's the "read" function that removes the null data, so I have to write the file after reading it back once (and if I try to re-use the filename, I get an error). I'm sure there's a simpler way, but this seems to work good enough for my purposes anyway.


As a second pointer: this does not work for shapefiles with "holes" in the shapes, because it is subsetting by index.


Need Your Help

Why use hex values instead of normal base 10 numbers?

java c hex math.sqrt

I was looking over this code to calculate math.sqrt in Java. Why did they use hex values in some of the loops and normal values for variables? What benefits are there to use hex?