Sequentially replace multiple places matching single pattern in a string with different replacements
Using the stringr package, it is easy to perform regex replacement in a vectorized manner.
Question: How can I do the following:
Replace every word in
x <- "hello,world??your,make|world,hello,pos"
with a different replacement, e.g. increasing numbers:
"1,2??3,4|5,6,7"
Note that simple separators cannot be assumed; the practical use case is more complicated.
stringr::str_replace_all does not seem to work here. Passing a vector of replacements, as in
str_replace_all(x, "(\\w+)", 1:7)
produces a vector with one element per replacement, each applied to all words. A named vector of replacements, as in
str_replace_all(x, c("hello" = "1", "world" = "2", ...))
will not work either, because the input words are not known in advance and may contain duplicates.
Here's another idea using gsubfn. The pre function is run before the substitutions and the fun function is run for each substitution:
library(gsubfn)
x <- "hello,world??your,make|world,hello,pos"
p <- proto(pre = function(t) t$v <- 0,        # initialize the counter to 0
           fun = function(t, x) t$v <- v + 1) # increment the counter for each match
gsubfn("\\w+", p, x)
# [1] "1,2??3,4|5,6,7"
This variation would give the same answer since gsubfn maintains a count variable for use in proto functions:
pp <- proto(fun = function(...) count)
gsubfn("\\w+", pp, x)
See the gsubfn vignette for examples of using count.
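The counter idea does not depend on gsubfn. As a rough sketch in base R, a closure can hold the running count and be called once per match. The names make_counter and next_label are hypothetical and not part of any package; this only mirrors the pre/fun roles described above and is not how gsubfn is implemented.

```r
# Hypothetical helper: returns a function that yields "1", "2", ... on successive calls.
make_counter <- function() {
  v <- 0                        # same role as `pre`: start the counter at 0
  function(match) {
    v <<- v + 1                 # same role as `fun`: bump the counter for each match
    as.character(v)
  }
}

x <- "hello,world??your,make|world,hello,pos"
m <- gregexpr("\\w+", x)        # locations of every word in x
words <- regmatches(x, m)[[1]]  # the matched words, in order of appearance

next_label <- make_counter()
labels <- vapply(words, next_label, character(1), USE.NAMES = FALSE)

result <- x
regmatches(result, m) <- list(labels)  # write the labels back in place of the words
result
```

Calling make_counter() again gives a fresh counter, which is the equivalent of pre resetting v before each string.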
I would suggest the "ore" package for something like this. Of particular note would be ore.search and ore.subst, the latter of which can accept a function as the replacement value.
library(ore)
x <- "hello,world??your,make|world,hello,pos"
## Match all and replace with the sequence in which they are found
ore.subst("(\\w+)", function(i) seq_along(i), x, all = TRUE)
# [1] "1,2??3,4|5,6,7"
## Create a cool ore object with details about what was extracted
ore.search("(\\w+)", x, all = TRUE)
#   match: hello world  your make world hello pos
# context:      ,     ??    ,    |     ,     ,
#  number: 1==== 2====  3=== 4=== 5==== 6==== 7==
Here is a base R solution. As written it processes a single string, but it can be wrapped in a function and applied over a vector.
x <- "hello,world??your,make|world,hello,pos"
# split x into single characters
x_split <- strsplit(x, "")[[1]]
# find all word-character positions and replace them with "a"
x_split[gregexpr("\\w", x)[[1]]] <- "a"
# find all runs of "a"
rle_res <- rle(x_split)
# set the length of each run of "a" to 1
rle_res$lengths[rle_res$values == "a"] <- 1
# replace the run values with increasing numbers
rle_res$values[rle_res$values == "a"] <- 1:sum(rle_res$values == "a")
# use inverse.rle on the modified rle object and collapse into a string
paste0(inverse.rle(rle_res), collapse = "")
# [1] "1,2??3,4|5,6,7"
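A shorter base R variant, offered as a sketch rather than as part of the answer above, uses gregexpr together with the replacement form of regmatches. The function name replace_in_order is hypothetical. It assumes that replacements holds at least as many entries as the largest number of matches in any one string.

```r
# Hypothetical helper: replace the i-th match in each string with replacements[i].
replace_in_order <- function(x, pattern, replacements) {
  m <- gregexpr(pattern, x)                 # match positions, one list entry per string
  regmatches(x, m) <- lapply(regmatches(x, m), function(w) {
    as.character(replacements[seq_along(w)])  # first length(w) replacements, in order
  })
  x
}

x <- c("hello,world??your,make|world,hello,pos", "foo bar")
replace_in_order(x, "\\w+", 1:7)
# [1] "1,2??3,4|5,6,7" "1 2"
```

Because gregexpr and regmatches both work on character vectors, this handles several input strings at once, and the replacements need not be numbers: replace_in_order("a-b-c", "\\w", c("x", "y", "z")) gives "x-y-z".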