Simple way to subset SpatialPolygonsDataFrame (i.e. delete polygons) by attribute in R
I would like simply delete some polygons from a SpatialPolygonsDataFrame object based on corresponding attribute values in the @data data frame so that I can plot a simplified/subsetted shapefile. So far I haven't found a way to do this.
For example, let's say I want to delete all polygons from this world shapefile that have an area of less than 30000. How would I go about doing this?
Or, similarly, how can I delete Antartica?
require(maptools) getinfo.shape("TM_WORLD_BORDERS_SIMPL-0.3.shp") # Shapefile type: Polygon, (5), # of Shapes: 246 world.map <- readShapeSpatial("TM_WORLD_BORDERS_SIMPL-0.3.shp") class(world.map) #  "SpatialPolygonsDataFrame" # attr(,"package") #  "sp" head(world.map@data) # FIPS ISO2 ISO3 UN NAME AREA POP2005 REGION SUBREGION LON LAT # 0 AC AG ATG 28 Antigua and Barbuda 44 83039 19 29 -61.783 17.078 # 1 AG DZ DZA 12 Algeria 238174 32854159 2 15 2.632 28.163 # 2 AJ AZ AZE 31 Azerbaijan 8260 8352021 142 145 47.395 40.430 # 3 AL AL ALB 8 Albania 2740 3153731 150 39 20.068 41.143 # 4 AM AM ARM 51 Armenia 2820 3017661 142 145 44.563 40.534 # 5 AO AO AGO 24 Angola 124670 16095214 2 17 17.544 -12.296
If I do something like this, the plot does not reflect any changes.
world.map@data = world.map@data[world.map@data$AREA > 30000,] plot(world.map)
same result if I do this:
world.map@data = world.map@data[world.map@data$NAME != "Antarctica",] plot(world.map)
Any help is appreciated!
looks like you're overwriting the data, but not removing the polygons. If you want to cut down the dataset including both data and polygons, try e.g.
world.map <- world.map[world.map$AREA > 30000,] plot(world.map)
[[Edit 19 April, 2016]] That solution used to work, but @Bonnie reports otherwise for a newer R version (though perhaps the data has changed too?): world.map <- world.map[world.map@data$AREA > 30000, ] Upvote @Bonnie's answer if that helped.
When I tried to do this in R 3.2.1, tim riffe's technique above did not work for me, although modifying it slightly fixed the problem. I found that I had to specifically reference the data slot as well before specifying the attribute to subset on, as below:
world.map <- world.map[world.map@data$AREA > 30000, ] plot(world.map)
Adding this as an alternative answer in case others come across the same issue.
Just to mention that subset also makes the work avoiding to write the data's name in the condition.
world.map <- subset(world.map, AREA > 30000) plot(world.map)
I used the above technique to make a map of just Australia:
australia.map < - world.map[world.map$NAME == "Australia",] plot(australia.map)
The comma after "Australia" is important, as it turns out.
One flaw with this method is that it appears to retain all of the attribute columns and rows for all of the other countries, and just populates them with zeros. I found that if I wrote out a .shp file, then read it back in using readOGR (rgdal package), it automatically removes the null geographic data. Then I could write another shape file with only the data I want.
writeOGR(australia.map,".","australia",driver="ESRI Shapefile") australia.map < - readOGR(".","australia") writeOGR(australia.map,".","australia_small",driver="ESRI Shapefile")
On my system, at least, it's the "read" function that removes the null data, so I have to write the file after reading it back once (and if I try to re-use the filename, I get an error). I'm sure there's a simpler way, but this seems to work good enough for my purposes anyway.
As a second pointer: this does not work for shapefiles with "holes" in the shapes, because it is subsetting by index.