Retrieving raw XML for items with feedparser

I'm trying to use feedparser to retrieve some specific information from feeds, but I also want the raw XML of each entry (i.e. the <item> elements for RSS and the <entry> elements for Atom), and I can't see how to do that. Obviously I could parse the XML by hand, but that's not very elegant, it would require separate support for RSS and Atom, and I imagine it could fall out of sync with feedparser for ill-formed feeds. Is there a better way?

Thanks!

Answers


I'm the current developer of feedparser. At the moment, one way to get that information is to monkeypatch feedparser._FeedParserMixin (or edit a local copy of feedparser.py). The methods you'll want to modify are:

  • feedparser._FeedParserMixin.unknown_starttag
  • feedparser._FeedParserMixin.unknown_endtag

At the top of each method, you can insert a call to a routine of your own that captures the elements and their attributes as feedparser encounters them.
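Here is a minimal sketch of that approach. It is not part of feedparser itself. It wraps the two methods named above so that a callback runs before the original code. The helper install_tag_hooks, the EntryRecorder class and the FakeParser stand-in are hypothetical names used only for illustration. The sketch assumes, as described above, that each element passes through unknown_starttag(tag, attrs) and unknown_endtag(tag). It also assumes that RSS and Atom entries are reported under the tag names item and entry, although namespaced feeds may report prefixed names. Character data does not pass through these two methods, so this records element structure and attributes only, not a byte-for-byte copy of the XML.

```python
import functools


def install_tag_hooks(cls, on_start, on_end):
    """Hypothetical helper: wrap cls.unknown_starttag / cls.unknown_endtag so
    that on_start(tag, attrs) and on_end(tag) run before the original code.
    Returns a function that restores the original methods."""
    orig_start = cls.unknown_starttag
    orig_end = cls.unknown_endtag

    @functools.wraps(orig_start)
    def unknown_starttag(self, tag, attrs):
        on_start(tag, attrs)  # our callback runs at the top of the method
        return orig_start(self, tag, attrs)

    @functools.wraps(orig_end)
    def unknown_endtag(self, tag):
        on_end(tag)
        return orig_end(self, tag)

    cls.unknown_starttag = unknown_starttag
    cls.unknown_endtag = unknown_endtag

    def uninstall():
        cls.unknown_starttag = orig_start
        cls.unknown_endtag = orig_end

    return uninstall


class EntryRecorder:
    """Hypothetical recorder: collects ("start"/"end", tag, attrs) events
    for each <item>/<entry> element, including the elements nested inside it."""

    ENTRY_TAGS = {"item", "entry"}  # assumption: unprefixed tag names

    def __init__(self):
        self.entries = []     # one list of events per completed entry
        self._current = None  # events of the entry being recorded, if any
        self._depth = 0       # nesting depth inside the current entry

    def on_start(self, tag, attrs):
        if self._current is None and tag in self.ENTRY_TAGS:
            self._current = []
            self._depth = 0
        if self._current is not None:
            self._depth += 1
            self._current.append(("start", tag, list(attrs)))

    def on_end(self, tag):
        if self._current is None:
            return
        self._current.append(("end", tag, None))
        self._depth -= 1
        if self._depth == 0:  # the entry element itself just closed
            self.entries.append(self._current)
            self._current = None


class FakeParser:
    """Hypothetical stand-in with the same two method names, for demonstration."""

    def __init__(self):
        self.seen = []

    def unknown_starttag(self, tag, attrs):
        self.seen.append(tag)

    def unknown_endtag(self, tag):
        self.seen.append("/" + tag)


if __name__ == "__main__":
    rec = EntryRecorder()
    undo = install_tag_hooks(FakeParser, rec.on_start, rec.on_end)
    p = FakeParser()
    p.unknown_starttag("channel", [])
    p.unknown_starttag("item", [])
    p.unknown_starttag("title", [])
    p.unknown_endtag("title")
    p.unknown_endtag("item")
    p.unknown_endtag("channel")
    undo()
    print(rec.entries)
```

To try this against feedparser itself, you would pass the mixin class instead of the stand-in. For example, call install_tag_hooks(feedparser._FeedParserMixin, rec.on_start, rec.on_end) before feedparser.parse(url), then call the returned function to remove the patch. The class name follows the answer above but may differ between feedparser versions. Because the patch applies to the whole class, it is not safe to use while other parses run concurrently. Returning an undo function keeps the monkeypatch scoped to the parse you care about.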

