Print Valid words with _ in between them

I have done my research but was not able to find a solution to my problem. I am trying to extract all valid words (those starting with a letter) from a string and join them with underscores ("_"). I am looking for a solution using awk, sed, grep, or a similar tool.

Something like:

echo "The string under consideration" | (awk/grep/sed) (pattern match)

Example 1

Input:

1.2.3::L2 Traffic-house seen during ABCD from 2.2.4/5.2.3a to 1.2.3.X11

Desired output:

L2_Traffic_house_seen_during_ABCD_from

Example 2

Input:

XYZ-2-VRECYY_FAIL: Verify failed - Client 0x880016, Reason: Object exi

Desired output:

XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi

Example 3

Input:

ABCMGR-2-SERVICE_CRASHED: Service "abcmgr" (PID 7582) during UPGRADE

Desired output:

ABCMGR_SERVICE_CRASHED_Service_abcmgr_PID_during_UPGRADE

Answers


This might work for you (GNU sed):

sed 's/[[:punct:]]/ /g;s/\<[[:alpha:]]/\n&/g;s/[^\n]*\n//;s/ [^\n]*//g;y/\n/_/' file
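The one-liner is easier to follow with each command in its own -e expression. The sketch below is the same program split up and commented. It assumes GNU sed (for \< and for \n in the replacement), the function name join_words is only an illustration, and the sample string is Example 2 from the question.

```shell
# Hypothetical wrapper around the same five sed commands (assumes GNU sed).
join_words() {
    # 1. turn every punctuation character (including "_") into a space
    # 2. put a newline in front of each word that starts with a letter
    # 3. delete everything up to and including the first newline
    # 4. delete from each space up to the next newline
    # 5. turn the remaining newlines into underscores
    sed -e 's/[[:punct:]]/ /g' \
        -e 's/\<[[:alpha:]]/\n&/g' \
        -e 's/[^\n]*\n//' \
        -e 's/ [^\n]*//g' \
        -e 'y/\n/_/'
}

echo 'XYZ-2-VRECYY_FAIL: Verify failed - Client 0x880016, Reason: Object exi' | join_words
# prints: XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi
```

Because step 1 also replaces the underscore, VRECYY_FAIL is split into two words and then rejoined with "_", so the result looks the same.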

A Perl one-liner. It searches for an alphabetic character followed by any number of word characters, with the whole match enclosed in word boundaries. The /g flag collects every match on each line.

Content of infile:

1.2.3::L2 Traffic-house seen during ABCD from 2.2.4/5.2.3a to 1.2.3.X11
XYZ-2-VRECYY_FAIL: Verify failed - Client 0x880016, Reason: Object exi
ABCMGR-2-SERVICE_CRASHED: Service "abcmgr" (PID 7582) during UPGRADE

Perl command:

perl -ne 'printf qq|%s\n|, join qq|_|, (m/\b([[:alpha:]]\w*)\b/g)' infile

Output:

L2_Traffic_house_seen_during_ABCD_from_to_X11
XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi
ABCMGR_SERVICE_CRASHED_Service_abcmgr_PID_during_UPGRADE

One way using awk, with the contents of script.awk:

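BEGIN {
    # Split on anything that is not a letter, digit or underscore
    FS="[^[:alnum:]_]"
}

{
    # sep is empty before the first word and "_" afterwards, so no
    # trailing underscore is left when the last field is skipped
    sep = ""
    for (i=1; i<=NF; i++) {
        # Keep non-empty fields that do not start with a digit
        if ($i !~ /^[0-9]/ && $i != "") {
            printf "%s%s", sep, $i
            sep = "_"
        }
    }
    # End the output line for this input line
    print ""
}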

Run like:

awk -f script.awk file.txt

Alternatively, here is the one-liner:

awk -F "[^[:alnum:]_]" '{ sep = ""; for (i=1; i<=NF; i++) { if ($i !~ /^[0-9]/ && $i != "") { printf "%s%s", sep, $i; sep = "_" } } print "" }' file.txt

Results:

L2_Traffic_house_seen_during_ABCD_from_to_X11
XYZ_VRECYY_FAIL_Verify_failed_Client_Reason_Object_exi
ABCMGR_SERVICE_CRASHED_Service_abcmgr_PID_during_UPGRADE

This solution requires some tuning, and I think one needs gawk to use a regexp as the record separator (see http://www.gnu.org/software/gawk/manual/html_node/Records.html#Records):

gawk -v ORS='_' -v RS='[-: \"()]' '/^[a-zA-Z]/' file.dat

