How to convert XML to Dict

I'm looking for a good way to convert XML to a dict, and vice versa, in Python.

Answers


The following recipe should be helpful:


xmltodict (full disclosure: I wrote it) does exactly that, following this "standard". It is Expat-based, so it's very fast and doesn't need to load the whole XML tree in memory.

>>> import json, xmltodict
>>> print(json.dumps(xmltodict.parse("""
...  <mydocument has="an attribute">
...    <and>
...      <many>elements</many>
...      <many>more elements</many>
...    </and>
...    <plus a="complex">
...      element as well
...    </plus>
...  </mydocument>
...  """), indent=4))
{
    "mydocument": {
        "@has": "an attribute",
        "and": {
            "many": [
                "elements",
                "more elements"
            ]
        },
        "plus": {
            "@a": "complex",
            "#text": "element as well"
        }
    }
}
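To make the mapping above concrete, here is a minimal sketch of the same convention (attributes as `@name` keys, text as `#text`, repeated tags as lists) using only the standard library's `xml.etree.ElementTree`. It is an illustration, not xmltodict's implementation: `element_to_dict` and `xml_to_dict` are hypothetical helper names, and it assumes no namespaces and no mixed content (only the text before an element's first child is kept; tail text between children is dropped).

```python
import xml.etree.ElementTree as ET


def element_to_dict(elem):
    """Convert an Element using the @attr / #text / repeated-tag-as-list convention."""
    # Attributes become keys prefixed with '@'.
    result = {"@" + name: value for name, value in elem.attrib.items()}
    # Child elements become keys; a tag seen more than once is collected into a list.
    for child in elem:
        value = element_to_dict(child)
        if child.tag in result:
            if not isinstance(result[child.tag], list):
                result[child.tag] = [result[child.tag]]
            result[child.tag].append(value)
        else:
            result[child.tag] = value
    text = (elem.text or "").strip()
    if not result:
        # A leaf with no attributes is just its stripped text, or None if empty.
        return text or None
    if text:
        result["#text"] = text
    return result


def xml_to_dict(xml_string):
    """Parse an XML string and wrap the converted root under its tag name."""
    root = ET.fromstring(xml_string.strip())
    return {root.tag: element_to_dict(root)}


doc = """
<mydocument has="an attribute">
  <and>
    <many>elements</many>
    <many>more elements</many>
  </and>
  <plus a="complex">
    element as well
  </plus>
</mydocument>
"""
print(xml_to_dict(doc))
# {'mydocument': {'@has': 'an attribute', 'and': {'many': ['elements', 'more elements']}, 'plus': {'@a': 'complex', '#text': 'element as well'}}}
```

Note that two `<many>` elements become a list but a single `<many>` becomes a plain string, so code consuming the result has to handle both shapes. xmltodict's `parse` accepts a `force_list` argument to pin specific tags to lists.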

When converting between XML and Python dictionaries, there are some interesting corner cases that make this non-trivial (attributes? lists? anonymous lists? single-entry lists? content eval?). Take a look at the "XML to Dict conversions" document from the PicklingTools distribution: http://www.picklingtools.com

The docs cover this in detail, but here's a simple example. Put the following XML in a file named ‘example.xml’:

<top>
  <a>1</a>
  <b>2.2</b>
  <c>three</c>
</top>

To process this file and turn it into a dictionary:

>>> from xmlloader import *
>>> example = open('example.xml', 'r')
>>> xl = StreamXMLLoader(example, 0)  # 0 = All defaults on options
>>> result = xl.expectXML()
>>> print(result)
{'top': {'a': '1', 'c': 'three', 'b': '2.2'}}
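With default options the loader returns every leaf as a string ('1', '2.2'), which is the "content eval?" question from the list of corner cases. If you want numbers back, one option is to try `ast.literal_eval` on each leaf. The sketch below does that with the stdlib `xml.etree.ElementTree` rather than xmlloader, so it says nothing about xmlloader's own options; `leaf_value` and `load_flat` are hypothetical names, and it only handles a flat, one-level document like example.xml.

```python
import ast
import xml.etree.ElementTree as ET


def leaf_value(text):
    """Interpret leaf text as a Python literal if possible, else keep the string."""
    text = (text or "").strip()
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError, TypeError):
        # e.g. 'three' is not a literal, and '' is not valid syntax.
        return text


def load_flat(xml_string):
    """Load a one-level document like example.xml into {root: {child: value}}."""
    root = ET.fromstring(xml_string.strip())
    return {root.tag: {child.tag: leaf_value(child.text) for child in root}}


example = """
<top>
  <a>1</a>
  <b>2.2</b>
  <c>three</c>
</top>
"""
print(load_flat(example))  # {'top': {'a': 1, 'b': 2.2, 'c': 'three'}}
```

Evaluating content is a trade-off: `literal_eval` will also turn text such as `True` or `(1, 2)` into a bool or a tuple, which may not be what the XML author meant.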

I might suggest taking a look at declxml to see if it fits your use case (full disclosure: I am the author). With declxml, you create objects called processors which declaratively define the structure of your XML document. Processors are used both to parse XML into Python values and to serialize Python values back to XML; supported values include objects, dictionaries, and namedtuples.

import declxml as xml

some_xml = """
<mydocument has="an attribute">
  <and>
    <many>elements</many>
    <many>more elements</many>
  </and>
  <plus a="complex">
    element as well
  </plus>
</mydocument>
"""

processor = xml.dictionary('mydocument', [
    xml.string('.', attribute='has'),
    xml.array(xml.string('many'), nested='and'),
    xml.dictionary('plus', [
        xml.string('.', attribute='a'),
        xml.string('.', alias='plus')
    ])
])

xml.parse_from_string(processor, some_xml)

This returns the following dictionary:

{'has': 'an attribute',
 'and': ['elements', 'more elements'],
 'plus': {'a': 'complex', 'plus': 'element as well'}}
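If declxml isn't an option, the same idea (fix the output shape up front instead of inferring it from the document) can be written by hand with the stdlib `xml.etree.ElementTree`. This is a minimal sketch, not declxml's API; `parse_mydocument` is a hypothetical name, and its only validation is raising `ValueError` when `<plus>` is missing.

```python
import xml.etree.ElementTree as ET


def parse_mydocument(xml_string):
    """Hand-written stdlib equivalent of the processor above (illustrative only)."""
    root = ET.fromstring(xml_string.strip())
    plus = root.find("plus")
    if plus is None:
        raise ValueError("missing required <plus> element")
    return {
        "has": root.get("has"),
        # findall always returns a list, so a single <many> still yields a one-item list.
        "and": [(many.text or "").strip() for many in root.findall("and/many")],
        "plus": {
            "a": plus.get("a"),
            "plus": (plus.text or "").strip(),
        },
    }
```

Because the code, not the document, decides that `and` is a list, a document with a single `<many>` still produces a one-item list. That sidesteps the single-entry-list ambiguity at the cost of spelling out the mapping.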

Use the xmltodict library. The following snippet works nicely:

import xmltodict

with open('example.xml') as fd:  # path to your XML file
    xml_text = fd.read()
xml_dict = xmltodict.parse(xml_text)
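The snippet above only goes from XML to a dict; the question also asks for the reverse. xmltodict itself provides `xmltodict.unparse` for that direction. To show what such a serializer has to do, here is a minimal stdlib sketch that assumes the same `@attribute` / `#text` / list-means-repeated-tag convention xmltodict uses; `build_element` and `dict_to_xml` are hypothetical names, not xmltodict functions.

```python
import xml.etree.ElementTree as ET


def build_element(tag, value):
    """Build an Element from a value using the @attr / #text convention."""
    elem = ET.Element(tag)
    if isinstance(value, dict):
        for key, child in value.items():
            if key.startswith("@"):
                elem.set(key[1:], str(child))
            elif key == "#text":
                elem.text = str(child)
            elif isinstance(child, list):
                # A list means the tag is repeated once per item.
                for item in child:
                    elem.append(build_element(key, item))
            else:
                elem.append(build_element(key, child))
    elif value is not None:
        elem.text = str(value)
    return elem


def dict_to_xml(data):
    """Serialize a single-root dict such as {'root': {...}} to an XML string."""
    # Raises ValueError unless the dict has exactly one key (the root element).
    [(tag, value)] = data.items()
    return ET.tostring(build_element(tag, value), encoding="unicode")


print(dict_to_xml({"top": {"a": 1, "b": 2.2}}))  # <top><a>1</a><b>2.2</b></top>
```

Non-string values are written with `str()`, so a round trip through XML loses types: `1` comes back as `'1'`.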

I think the best way would be to roll your own to suit your needs. Get lxml, read up on the docs, and you should be good to go. In case you have doubts, come right back :)
