Regex to remove non letters

I'm trying to remove non-letters from a string. Would this do it:

c = o.replace(o.gsub!(/\W+/, ''))

Answers


Just gsub! is sufficient:

o.gsub!(/\W+/, '')

Note that gsub! modifies the original o object. Also, if o does not contain any non-word characters, gsub! returns nil, so using its return value as the modified string is unreliable.

You probably want this instead:

c = o.gsub(/\W+/, '')
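A minimal sketch of the difference between the two methods, using made-up sample strings (the variable names here are only for illustration):

```ruby
# gsub! mutates the receiver and returns nil when nothing was replaced;
# gsub leaves the receiver alone and always returns a String.
original = "abc123"                            # only word characters
result_bang  = original.dup.gsub!(/\W+/, '')   # nothing to replace
result_plain = original.gsub(/\W+/, '')        # unchanged copy

mixed = "a-b c!"                               # has non-word characters
mixed_bang = mixed.dup.gsub!(/\W+/, '')        # replacements were made

p result_bang    # nil
p result_plain   # "abc123"
p mixed_bang     # "abc"
```

This is why assigning the return value of gsub! to c is only safe when you know a substitution happened.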

Remove anything that is not a letter:

> " sd  190i.2912390123.aaabbcd".gsub(/[^a-zA-Z]/, '')
"sdiaaabbcd"

EDIT: as ikegami points out, this doesn't take into account accented characters, umlauts, and other similar characters. The solution to this problem will depend on what exactly you are referring to as "not a letter", and on what your input will be.
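As a rough illustration of that caveat, assuming UTF-8 input and that "letter" means any Unicode letter, the \p{L} property can be used in place of the ASCII range. The sample text is made up:

```ruby
# "naïve café 42", written with escapes so the encoding is unambiguous.
text = "na\u00efve caf\u00e9 42"

ascii_letters   = text.gsub(/[^a-zA-Z]/, '')   # drops the accented letters too
unicode_letters = text.gsub(/[^\p{L}]/, '')    # keeps every Unicode letter

puts ascii_letters     # navecaf
puts unicode_letters   # naïvecafé
```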


That will work in most cases, except when o initially does not contain any non-word character, in which case gsub! will return nil and o.replace(nil) will raise a TypeError.

If you just want a replaced string, it can be simpler:

c = o.gsub(/\W+/, '')

Using \W or \w to select or delete only letters won't work. \w matches A-Z, a-z, 0-9, and "_":

irb(main):002:0> characters = (' ' .. "\x7e").to_a.join('')
=> " !\"\#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~"
irb(main):003:0> characters.gsub(/\W+/, '')
=> "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ_abcdefghijklmnopqrstuvwxyz"

So, stripping using \W preserves digits and underscores.

If you want to match letters, use /[A-Za-z]+/, or the POSIX character class [:alpha:], i.e. /[[:alpha:]]+/, or /\p{ALPHA}/.

The last form is a Unicode property: it covers 'A'..'Z' and 'a'..'z' in ASCII and extends to letters beyond ASCII when dealing with Unicode, so if you have multibyte characters you should probably use that.
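A small comparison of these patterns on a made-up UTF-8 string. In current Ruby versions the POSIX bracket class is also Unicode-aware, so it behaves like the property form here:

```ruby
# "café naïve 42", written with escapes so the encoding is unambiguous.
sample = "caf\u00e9 na\u00efve 42"

ascii_runs = sample.scan(/[A-Za-z]+/)      # splits at accented letters
posix_runs = sample.scan(/[[:alpha:]]+/)   # whole words
prop_runs  = sample.scan(/\p{Alpha}+/)     # whole words

p ascii_runs   # ["caf", "na", "ve"]
p posix_runs   # ["café", "naïve"]
p prop_runs    # ["café", "naïve"]
```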


Keep in mind that Ruby considers the underscore _ to be a word character. So if you want to keep underscores (and digits) as well, this should do it:

string.gsub!(/\W+/, '')

Otherwise, you need to do this:

string.gsub!(/[^a-zA-Z]/, '')

Use Regexp.union to build one pattern that matches every allowed character:

allowed = Regexp.union(/[a-zA-Z0-9]/, " ", "-", ":", ")", "(", ".")
cleanstring = dirty_string.chars.select {|c| c =~ allowed}.join("")
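A runnable sketch of the same idea with a hypothetical dirty_string, next to an equivalent negated character class passed to gsub. The second form is an alternative shown for comparison, not part of the answer above:

```ruby
# Regexp.union escapes the plain strings, so "." and "(" match literally.
allowed = Regexp.union(/[a-zA-Z0-9]/, " ", "-", ":", ")", "(", ".")

dirty_string = "a(b)-c: d.e#f!"   # made-up input

# Keep each character that matches the whitelist.
via_select = dirty_string.chars.select { |c| c =~ allowed }.join

# Same whitelist as a negated character class, removed in one pass.
via_gsub = dirty_string.gsub(/[^a-zA-Z0-9 \-:().]/, '')

puts via_select   # a(b)-c: d.ef
puts via_gsub     # a(b)-c: d.ef
```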

I don't see what that o.replace is in there for. If you have a string:

string = 't = 4 6 ^'

And you do:

string.gsub!(/\W+/, '')

You get:

t46

If you want to get rid of the digits too, you can do:

string.gsub!(/\W+|\d+/, '')

And you get:

t
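One caveat worth checking: because \W treats the underscore as a word character, this pattern leaves underscores in place. A short sketch with a made-up input that contains one:

```ruby
sample = "t_1 = 4 6 ^"

non_word_or_digit = sample.gsub(/\W+|\d+/, '')     # underscore survives
letters_only      = sample.gsub(/[^a-zA-Z]/, '')   # underscore removed

puts non_word_or_digit   # t_
puts letters_only        # t
```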
