Removing Parts of String With Sed

I have lines of data that look like this:


How can I use sed to delete everything after the 4th column (_ separated) on each line? The result should be:



cut is a better fit.

cut -d_ -f 1-4 old_file

This simply means: use _ as the delimiter and keep fields 1-4.
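A quick way to check this is to run it on a single line. The sample line below is made up for illustration (the real data is not shown in the question); it only assumes eight _ separated fields.

```shell
# Hypothetical sample line; the real data is not shown in the question.
line='AB_12_x7_sample_1054_+_contigs_full.fasta'

# Keep fields 1-4, using "_" as the delimiter.
first_four=$(printf '%s\n' "$line" | cut -d_ -f1-4)
printf '%s\n' "$first_four"    # AB_12_x7_sample
```

Lines with fewer than four fields are printed in full, and lines that contain no _ at all are passed through unchanged unless you add -s.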

If you insist on sed:

sed 's/\(_[^_]*\)\{4\}$//'

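The left-hand side matches exactly four repetitions of a group consisting of an underscore followed by zero or more non-underscores, and the match must end at the end of the line. The whole match is replaced by nothing. Note that this removes the last four fields rather than keeping the first four, so it gives the desired result only when every line has exactly eight fields.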
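Because the pattern is anchored at the end of the line, the result depends on how many fields the line has. A small sketch with two made-up lines (the helper name strip_last_four and both sample lines are assumptions for illustration):

```shell
# Hypothetical helper wrapping the end-anchored sed command.
strip_last_four() {
  sed 's/\(_[^_]*\)\{4\}$//'
}

# Made-up sample lines with eight and nine "_"-separated fields.
eight='AB_12_x7_sample_1054_+_contigs_full.fasta'
nine='AB_12_x7_sample_extra_1054_+_contigs_full.fasta'

from_eight=$(printf '%s\n' "$eight" | strip_last_four)
from_nine=$(printf '%s\n' "$nine" | strip_last_four)

printf '%s\n' "$from_eight"   # AB_12_x7_sample
printf '%s\n' "$from_nine"    # AB_12_x7_sample_extra
```

With nine fields, five are left over, so this form is only safe when the number of trailing fields is fixed.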

sed -e 's/\([^_]*\)_\([^_]*\)_\([^_]*\)_\([^_]*\)_.*/\1_\2_\3_\4/' infile > outfile

Match any number of non-'_' characters, saving what was matched between \( and \), followed by '_'. Do this four times, then match the rest of the line (which is discarded). Replace the whole line with the four saved matches joined by '_'.

Here's another possibility:

sed -E -e 's|^([^_]+(_[^_]+){3}).*$|\1|'

where -E, like -r in GNU sed, turns on extended regular expressions for readability.

Just because you can do it in sed, though, doesn't mean you should. I like cut much much better for this.

AWK likes to play in the fields:

awk 'BEGIN{FS=OFS="_"}{print $1,$2,$3,$4}' inputfile

or, more generally:

awk -v count=4 'BEGIN{FS="_"}{sep="";for(i=1;i<=count;i++){printf "%s%s",sep,$i;sep=FS};printf "\n"}'
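Here is the general loop spelled out on two made-up lines. The wrapper name first_fields and the input are assumptions for illustration. Note that sep is cleared at the start of every record; awk variables persist across lines, so without the reset every line after the first would start with a stray _.

```shell
# Hypothetical wrapper: $1 is the number of leading fields to keep.
first_fields() {
  awk -v count="$1" 'BEGIN { FS = "_" }
  {
    sep = ""                       # reset the separator for each record
    for (i = 1; i <= count; i++) {
      printf "%s%s", sep, $i       # no separator before the first field
      sep = FS
    }
    printf "\n"
  }'
}

# Made-up input: two lines of six fields each.
result=$(printf '%s\n' 'a_b_c_d_e_f' 'g_h_i_j_k_l' | first_fields 3)
printf '%s\n' "$result"
# a_b_c
# g_h_i
```

If a line has fewer than count fields, the loop still prints a separator for each missing field, so the simpler print $1,$2,$3,$4 form and this one behave alike in that respect.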

sed -e 's/_[0-9][0-9]*_[+-]_contigs_full\.fasta$//'

Still, the cut answer is probably faster and just generally better.

Yes, cut is way better, and yes, matching the back of each line is easier.

I finally got a match using the beginning of each line:

sed -r 's/(([^_]*_){3}([^_]*)).*/\1/' oldFile > newFile
