Parsing XML in bash

I need to parse a given XML file for specific content. Unfortunately I only have xmllint WITHOUT xpath on my system (and I'm not allowed to install / update any other sources). The XML would contain:

<?xml version="1.0"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <CreateIncidentResponse xmlns="http://schemas.hp.com/SM/7" xmlns:cmn="http://schemas.hp.com/SM/7/Common" xmlns:xmime="http://www.w3.org/2005/05/xmlmime" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" message="Success" returnCode="0" schemaRevisionDate="2016-02-16" schemaRevisionLevel="2" status="SUCCESS" xsi:schemaLocation="http://schemas.hp.com/SM/7 /Incident.xsd">
      <model>
        <keys>
          <IncidentID type="String">IM0832268</IncidentID>
        </keys>
        <instance recordid="IM0832268 - Paul test 3 incident via soap" uniquequery="number=&quot;IM0832268&quot;">
          <IncidentID type="String">IM0832268</IncidentID>
          <Category type="String">request for change</Category>
          <OpenTime type="DateTime">2016-03-18T16:06:28+00:00</OpenTime>
          <OpenedBy type="String">Harlass, Alexander</OpenedBy>
          <Priority type="String">4</Priority>
          <Urgency type="String">medium</Urgency>
          <UpdatedTime type="DateTime">2016-03-18T16:06:28+00:00</UpdatedTime>
          <AssignmentGroup type="String">TS3-AOS</AssignmentGroup>
          <Description type="Array">
            <Description type="String">RH test incident description via soap row 1</Description>
            <Description type="String">RH test incident description via soap row 2</Description>
          </Description>
          <Contact type="String">Harlass, Rudolf</Contact>
          <Title type="String">Paul test 3 incident via soap</Title>
          <TicketOwner type="String">INTEGRATION.OVO</TicketOwner>
          <UpdatedBy type="String">INTEGRATION.OVO</UpdatedBy>
          <Status type="String">Open</Status>
          <Area type="String">it products</Area>
          <Subarea type="String">utilization</Subarea>
          <ProblemType type="String">request for change</ProblemType>
          <Impact type="String">low</Impact>
          <Service type="String">PI Automation and Orchestration Service</Service>
          <VIP type="Boolean">false</VIP>
          <TargetResolutionDate type="DateTime">2016-03-25T15:00:00+00:00</TargetResolutionDate>
          <SOD type="String">OML</SOD>
          <SourceId type="String">4711</SourceId>
          <UserIncident type="Boolean">false</UserIncident>
          <AlertId type="String">4712</AlertId>
          <MonitoredId type="String">MI4713</MonitoredId>
        </instance>
      </model>
      <messages>
        <cmn:message type="String">Audit Record successfully recorded and added.</cmn:message>
      </messages>
    </CreateIncidentResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>

In the end I would need an output like this:

Create SUCCESS
Messages:
    Audit Record successfully recorded and added.
Incident ID: IM0832268
    Status: Open
    Severity: 4
    Brief Description: RH test incident description via soap row 1
    Opened by: integration.ovo
    Opened time: March 20, 2016 11:54:08 PM CET

I do know how to create a string containing the output, but unfortunately I'm not that familiar with sed or similar tools. Any help on how to extract the needed strings from the xml would be appreciated. Thanks in advance

Answers


Most systems have Python, Perl, or some other language with actual XML processing capabilities. That would yield a far better solution than attempting to produce a nicely formatted report from a large chunk of XML in bash. Having said that, here are some ideas for extracting this data with bash.

Given a string like:

<IncidentID type="String">IM0832268</IncidentID>

You can get the value using awk like this (assuming your data is in a file called data.xml):

awk -F'[<>]' '/IncidentID/ {print $3}' data.xml

The -F'[<>]' sets the awk field separator to be either < or >, so that the given line is split into fields like this:

| 1  |  2                     |  3      |  4        |  5 |
|    |IncidentID type="String"|IM0832268|/IncidentID|    |
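
If you want to see the split for yourself, a quick sketch like the following prints every field with its number (the variable names line and fields are just for this example):

```shell
# The sample element from above, held in a shell variable.
line='<IncidentID type="String">IM0832268</IncidentID>'

# Split on < or > and print each field together with its index.
fields=$(printf '%s\n' "$line" |
    awk -F'[<>]' '{ for (i = 1; i <= NF; i++) printf "%d: [%s]\n", i, $i }')

printf '%s\n' "$fields"
```

Fields 1 and 5 come out empty because the line starts with < and ends with >.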

The above example will actually return two lines (because there are two IncidentID tags in your data):

IM0832268
IM0832268

If you know these will always be the same, you can just take the first one:

awk -F'[<>]' '/IncidentID/ {print $3; exit}' data.xml
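
One thing to watch out for: a pattern like /Description/ matches anywhere on the line, so it would hit the wrapper line <Description type="Array"> first, where field 3 is empty. A slightly stricter variant compares the tag name itself and requires the closing tag on the same line. This is only a sketch: the function name xml_value and the temporary sample file are made up for illustration, and it assumes each element sits on a single line as in your data.

```shell
# Hypothetical sample file standing in for data.xml (excerpt from the question).
xml=$(mktemp)
trap 'rm -f "$xml"' EXIT
cat > "$xml" <<'EOF'
<instance>
  <IncidentID type="String">IM0832268</IncidentID>
  <Description type="Array">
    <Description type="String">RH test incident description via soap row 1</Description>
    <Description type="String">RH test incident description via soap row 2</Description>
  </Description>
  <Status type="String">Open</Status>
</instance>
EOF

# xml_value TAG FILE: print the text of the first <TAG ...>text</TAG>
# that opens and closes on one line.
xml_value() {
    awk -F'[<>]' -v tag="$1" '
        { split($2, parts, " ") }    # parts[1] is the tag name without attributes
        parts[1] == tag && $4 == "/" tag { print $3; exit }
    ' "$2"
}

incident=$(xml_value IncidentID "$xml")
description=$(xml_value Description "$xml")
echo "Incident ID: $incident"
echo "Brief Description: $description"
```

Here the Array wrapper is skipped because it has no closing tag on its own line, so the first Description row is returned.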

To extract an attribute from a line like:

<CreateIncidentResponse xmlns="http://schemas.hp.com/SM/7" xmlns:cmn="http://schemas.hp.com/SM/7/Common" xmlns:xmime="http://www.w3.org/2005/05/xmlmime" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" message="Success" returnCode="0" schemaRevisionDate="2016-02-16" schemaRevisionLevel="2" status="SUCCESS" xsi:schemaLocation="http://schemas.hp.com/SM/7 /Incident.xsd">

You can first split it into one line per attribute, like this:

grep '<CreateIncidentResponse' data.xml | tr ' ' '\n'

Which will give you:

<CreateIncidentResponse
xmlns="http://schemas.hp.com/SM/7"
xmlns:cmn="http://schemas.hp.com/SM/7/Common"
xmlns:xmime="http://www.w3.org/2005/05/xmlmime"
xmlns:xsd="http://www.w3.org/2001/XMLSchema"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
message="Success"
returnCode="0"
schemaRevisionDate="2016-02-16"
schemaRevisionLevel="2"
status="SUCCESS"
xsi:schemaLocation="http://schemas.hp.com/SM/7
/Incident.xsd">

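Note that xsi:schemaLocation ends up on two lines because its value contains a space, so this trick only works for attribute values without spaces. You can then pass the result to awk to extract attribute values. For example, to get the value of the message attribute: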

grep '<CreateIncidentResponse' data.xml | tr ' ' '\n' |
awk -F'"' '/message/ {print $2}'

Which would yield:

Success
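
Putting the element and attribute techniques together, the report from the question could be assembled roughly like this. Treat it as a sketch under a few assumptions: the helper names xml_attr and xml_value and the trimmed sample file are invented for the example, "Severity" is guessed to map to the Priority element, and "Brief Description" to the first Description row. The "Opened by" and "Opened time" lines are left out because the values shown in the question do not match the XML.

```shell
# Hypothetical, trimmed stand-in for data.xml.
xml=$(mktemp)
trap 'rm -f "$xml"' EXIT
cat > "$xml" <<'EOF'
<?xml version="1.0"?>
<SOAP-ENV:Envelope xmlns:SOAP-ENV="http://schemas.xmlsoap.org/soap/envelope/">
  <SOAP-ENV:Body>
    <CreateIncidentResponse xmlns="http://schemas.hp.com/SM/7" message="Success" returnCode="0" status="SUCCESS">
      <model>
        <instance>
          <IncidentID type="String">IM0832268</IncidentID>
          <Priority type="String">4</Priority>
          <Description type="Array">
            <Description type="String">RH test incident description via soap row 1</Description>
          </Description>
          <Status type="String">Open</Status>
        </instance>
      </model>
      <messages>
        <cmn:message type="String">Audit Record successfully recorded and added.</cmn:message>
      </messages>
    </CreateIncidentResponse>
  </SOAP-ENV:Body>
</SOAP-ENV:Envelope>
EOF

# xml_attr NAME: value of attribute NAME on the <CreateIncidentResponse> line.
# Matches the attribute name exactly; values must not contain spaces.
xml_attr() {
    grep '<CreateIncidentResponse' "$xml" | tr ' ' '\n' |
    awk -F'"' -v name="$1" '$1 == name "=" { print $2; exit }'
}

# xml_value TAG: text of the first <TAG ...>text</TAG> found on a single line.
xml_value() {
    awk -F'[<>]' -v tag="$1" '
        { split($2, parts, " ") }    # parts[1] is the tag name without attributes
        parts[1] == tag && $4 == "/" tag { print $3; exit }
    ' "$xml"
}

report=$(printf 'Create %s\nMessages:\n    %s\nIncident ID: %s\n    Status: %s\n    Severity: %s\n    Brief Description: %s\n' \
    "$(xml_attr status)" \
    "$(xml_value cmn:message)" \
    "$(xml_value IncidentID)" \
    "$(xml_value Status)" \
    "$(xml_value Priority)" \
    "$(xml_value Description)")

printf '%s\n' "$report"
```

Comparing the attribute name exactly (rather than using a pattern like /message/) avoids accidentally picking up another attribute whose name or value happens to contain the same word.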

Hopefully this is enough to get you started.

