My Podcasts

Creating a simple XML file using python

What are my options if I want to create a simple XML file in python? (library wise) The xml I want looks like:

6 Answers 6

These days, the most popular (and very simple) option is the ElementTree API, which has been included in the standard library since Python 2.5.

The available options for that are:

  • ElementTree (Basic, pure-Python implementation of ElementTree. Part of the standard library since 2.5)
  • cElementTree (Optimized C implementation of ElementTree. Also offered in the standard library since 2.5. Deprecated and folded into the regular ElementTree as an automatic thing as of 3.3.)
  • LXML (Based on libxml2. Offers a rich superset of the ElementTree API as well XPath, CSS Selectors, and more)

Here’s an example of how to generate your example document using the in-stdlib cElementTree:

import xml.etree.cElementTree as ET root = ET.Element("root") doc = ET.SubElement(root, "doc") ET.SubElement(doc, "field1", name="blah").text = "some value1" ET.SubElement(doc, "field2", name="asdfasd").text = "some vlaue2" tree = ET.ElementTree(root) tree.write("filename.xml") 

I’ve tested it and it works, but I’m assuming whitespace isn’t significant. If you need «prettyprint» indentation, let me know and I’ll look up how to do that. (It may be an LXML-specific option. I don’t use the stdlib implementation much)

For further reading, here are some useful links:

  • API docs for the implementation in the Python standard library
  • Introductory Tutorial (From the original author’s site)
  • LXML etree tutorial. (With example code for loading the best available option from all major ElementTree implementations)

As a final note, either cElementTree or LXML should be fast enough for all your needs (both are optimized C code), but in the event you’re in a situation where you need to squeeze out every last bit of performance, the benchmarks on the LXML site indicate that:

  • LXML clearly wins for serializing (generating) XML
  • As a side-effect of implementing proper parent traversal, LXML is a bit slower than cElementTree for parsing.

Источник

PyMOTW

If you find this information useful, consider picking up a copy of my book, The Python Standard Library By Example.

Page Contents

This Page

Examples

The output from all the example programs from PyMOTW has been generated with Python 2.7.8, unless otherwise noted. Some of the features described here may not be available in earlier versions of Python.

If you are looking for examples that work under Python 3, please refer to the PyMOTW-3 section of the site.

Creating XML Documents¶

In addition to its parsing capabilities, xml.etree.ElementTree also supports creating well-formed XML documents from Element objects constructed in an application. The Element class used when a document is parsed also knows how to generate a serialized form of its contents, which can then be written to a file or other data stream.

Читайте также:  Python error types valueerror

Building Element Nodes¶

There are three helper functions useful for creating a hierarchy of Element nodes. Element() creates a standard node, SubElement() attaches a new node to a parent, and Comment() creates a node that serializes using XML’s comment syntax.

from xml.etree.ElementTree import Element, SubElement, Comment, tostring top = Element('top') comment = Comment('Generated for PyMOTW') top.append(comment) child = SubElement(top, 'child') child.text = 'This child contains text.' child_with_tail = SubElement(top, 'child_with_tail') child_with_tail.text = 'This child has regular text.' child_with_tail.tail = 'And "tail" text.' child_with_entity_ref = SubElement(top, 'child_with_entity_ref') child_with_entity_ref.text = 'This & that' print tostring(top) 

The output contains only the XML nodes in the tree, not the XML declaration with version and encoding.

$ python ElementTree_create.py This child contains text.This child has regular text.A nd "tail" text.This & that 

Pretty-Printing XML¶

ElementTree makes no effort to “pretty print” the output produced by tostring() , since adding extra whitespace changes the contents of the document. To make the output easier to follow for human readers, the rest of the examples below will use a tip I found online and re-parse the XML with xml.dom.minidom then use its toprettyxml() method.

from xml.etree import ElementTree from xml.dom import minidom def prettify(elem): """Return a pretty-printed XML string for the Element. """ rough_string = ElementTree.tostring(elem, 'utf-8') reparsed = minidom.parseString(rough_string) return reparsed.toprettyxml(indent=" ") 

The updated example now looks like:

from xml.etree.ElementTree import Element, SubElement, Comment from ElementTree_pretty import prettify top = Element('top') comment = Comment('Generated for PyMOTW') top.append(comment) child = SubElement(top, 'child') child.text = 'This child contains text.' child_with_tail = SubElement(top, 'child_with_tail') child_with_tail.text = 'This child has regular text.' child_with_tail.tail = 'And "tail" text.' child_with_entity_ref = SubElement(top, 'child_with_entity_ref') child_with_entity_ref.text = 'This & that' print prettify(top) 

and the output is easier to read:

$ python ElementTree_create_pretty.py    This child contains text. This child has regular text. And "tail" text. This & that  

In addition to the extra whitespace for formatting, the xml.dom.minidom pretty-printer also adds an XML declaration to the output.

Setting Element Properties¶

The previous example created nodes with tags and text content, but did not set any attributes of the nodes. Many of the examples from Parsing XML Documents worked with an OPML file listing podcasts and their feeds. The outline nodes in the tree used attributes for the group names and podcast properties. ElementTree can be used to construct a similar XML file from a CSV input file, setting all of the element attributes as the tree is constructed.

import csv from xml.etree.ElementTree import Element, SubElement, Comment, tostring import datetime from ElementTree_pretty import prettify generated_on = str(datetime.datetime.now()) # Configure one attribute with set() root = Element('opml') root.set('version', '1.0') root.append(Comment('Generated by ElementTree_csv_to_xml.py for PyMOTW')) head = SubElement(root, 'head') title = SubElement(head, 'title') title.text = 'My Podcasts' dc = SubElement(head, 'dateCreated') dc.text = generated_on dm = SubElement(head, 'dateModified') dm.text = generated_on body = SubElement(root, 'body') with open('podcasts.csv', 'rt') as f: current_group = None reader = csv.reader(f) for row in reader: group_name, podcast_name, xml_url, html_url = row if current_group is None or group_name != current_group.text: # Start a new group current_group = SubElement(body, 'outline', 'text':group_name>) # Add this podcast to the group, # setting all of its attributes at # once. podcast = SubElement(current_group, 'outline', 'text':podcast_name, 'xmlUrl':xml_url, 'htmlUrl':html_url, >) print prettify(root) 

The attribute values can be configured one at a time with set() (as with the root node), or all at once by passing a dictionary to the node factory (as with each group and podcast node).

$ python ElementTree_csv_to_xml.py      2013-02-21 06:38:01.494066 2013-02-21 06:38:01.494066                               

Building Trees from Lists of Nodes¶

Multiple children can be added to an Element instance with the extend() method. The argument to extend() is any iterable, including a list or another Element instance.

from xml.etree.ElementTree import Element, tostring from ElementTree_pretty import prettify top = Element('top') children = [ Element('child', num=str(i)) for i in xrange(3) ] top.extend(children) print prettify(top) 

When a list is given, the nodes in the list are added directly to the new parent.

$ python ElementTree_extend.py      

When another Element instance is given, the children of that node are added to the new parent.

from xml.etree.ElementTree import Element, SubElement, tostring, XML from ElementTree_pretty import prettify top = Element('top') parent = SubElement(top, 'parent') children = XML(''' ''') parent.extend(children) print prettify(top) 

In this case, the node with tag root created by parsing the XML string has three children, which are added to the parent node. The root node is not part of the output tree.

$ python ElementTree_extend_node.py        

It is important to understand that extend() does not modify any existing parent-child relationships with the nodes. If the values passed to extend exist somewhere in the tree already, they will still be there, and will be repeated in the output.

from xml.etree.ElementTree import Element, SubElement, tostring, XML from ElementTree_pretty import prettify top = Element('top') parent_a = SubElement(top, 'parent', id='A') parent_b = SubElement(top, 'parent', id='B') # Create children children = XML(''' ''') # Set the id to the Python object id of the node to make duplicates # easier to spot. for c in children: c.set('id', str(id(c))) # Add to first parent parent_a.extend(children) print 'A:' print prettify(top) print # Copy nodes to second parent parent_b.extend(children) print 'B:' print prettify(top) print 

Setting the id attribute of these children to the Python unique object identifier exposes the fact that the same node objects appear in the output tree more than once.

$ python ElementTree_extend_node_copy.py A:               

Serializing XML to a Stream¶

tostring() is implemented to write to an in-memory file-like object and then return a string representing the entire element tree. When working with large amounts of data, it will take less memory and make more efficient use of the I/O libraries to write directly to a file handle using the write() method of ElementTree .

import sys from xml.etree.ElementTree import Element, SubElement, Comment, ElementTree top = Element('top') comment = Comment('Generated for PyMOTW') top.append(comment) child = SubElement(top, 'child') child.text = 'This child contains text.' child_with_tail = SubElement(top, 'child_with_tail') child_with_tail.text = 'This child has regular text.' child_with_tail.tail = 'And "tail" text.' child_with_entity_ref = SubElement(top, 'child_with_entity_ref') child_with_entity_ref.text = 'This & that' empty_child = SubElement(top, 'empty_child') ElementTree(top).write(sys.stdout) 

The example uses sys.stdout to write to the console, but it could also write to an open file or socket.

$ python ElementTree_write.py This child contains text.This child has regular text.A nd "tail" text.This & that 

The last node in the tree contains no text or sub-nodes, so it is written as an empty tag, . write() takes a method argument to control the handling for empty nodes.

import sys from xml.etree.ElementTree import Element, SubElement, ElementTree top = Element('top') child = SubElement(top, 'child') child.text = 'This child contains text.' empty_child = SubElement(top, 'empty_child') for method in [ 'xml', 'html', 'text' ]: print method ElementTree(top).write(sys.stdout, method=method) print '\n' 

Three methods are supported:

xml The default method, produces . html Produce the tag pair, as is required in HTML documents ( ). text Prints only the text of nodes, and skips empty tags entirely.

$ python ElementTree_write_method.py xml This child contains text. html This child contains text.  text This child contains text.

Outline Processor Markup Language, OPML Dave Winer’s OPML specification and documentation.

© Copyright Doug Hellmann. | | Last updated on Jul 11, 2020. | Created using Sphinx. | Design based on «Leaves» by SmallPark |

Источник

Оцените статью