Reading UTF-8 XML in Java

How to read and write XML using Java

In my previous articles, I covered how to read and write JSON in Java as well as in Spring Boot. In this article, you’ll learn how to read and write XML using different Java APIs.

Let us first look at an XML document and how it is structured.

An XML document consists of elements (also known as tags), similar to HTML. Each element has an opening and a closing tag (or a single self-closing tag such as <admin/>) with its content in between. Every XML document must have exactly one root element: one tag that wraps all the other tags. Tag names are case-sensitive, which means XML distinguishes between uppercase and lowercase letters. Each element can have any number of nested child elements. Unlike HTML, XML doesn't have a predefined set of tags, so developers are free to define whatever tags they need. A well-formed XML document follows these syntax rules; a valid XML document is well-formed and also conforms to a DTD or an XML schema. Let us look at the following XML document, user.xml, which contains user information:

<?xml version="1.0" encoding="UTF-8"?>
<user id="1">
    <name>John Doe</name>
    <email>john.doe@example.com</email>
    <roles>
        <role>Member</role>
        <role>Admin</role>
    </roles>
    <admin>true</admin>
</user>

As you can see above, user.xml starts with an <?xml ...?> declaration, known as the XML prolog. Another thing to notice is that each element is wrapped in its own tag, e.g. <name>John Doe</name>. Since roles is an array, each array item is written as a nested <role> element.
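To see the well-formedness rules above in action, here is a minimal sketch that uses the JDK's built-in DOM parser to accept or reject small XML strings. The class WellFormedCheck and its isWellFormed() helper are hypothetical names for this illustration, and the helper only checks well-formedness; it does not validate against a schema or DTD.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class WellFormedCheck {

    // Returns true if the string parses as well-formed XML (no schema validation).
    public static boolean isWellFormed(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            // DefaultHandler rethrows fatal errors instead of printing them to stderr
            builder.setErrorHandler(new DefaultHandler());
            builder.parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (SAXException e) {
            return false; // the parser rejected the markup
        } catch (Exception e) {
            throw new IllegalStateException(e); // configuration or I/O problem
        }
    }

    public static void main(String[] args) {
        // matching opening and closing tags: well-formed
        System.out.println(isWellFormed("<user id=\"1\"><name>John Doe</name></user>")); // true
        // tag names are case-sensitive, so <name> is not closed by </Name>
        System.out.println(isWellFormed("<user id=\"1\"><name>John Doe</Name></user>")); // false
        // two top-level elements: a document must have exactly one root
        System.out.println(isWellFormed("<user/><user/>")); // false
    }
}
```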

Before we discuss JAXB (Java Architecture for XML Binding) marshalling and unmarshalling in detail, let us first create a simple Java class named User.java that represents the user described in the above user.xml file:

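import java.util.Arrays;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlElementWrapper;
import javax.xml.bind.annotation.XmlRootElement;

// Note: JAXB (javax.xml.bind) was removed from the JDK in Java 11, so newer JDKs
// need the JAXB API and a runtime added as separate dependencies.
@XmlRootElement
public class User {

    private int id;
    private String name;
    private String email;
    private String[] roles;
    private boolean admin;

    // JAXB needs a no-arg constructor to create instances while unmarshalling
    public User() {
    }

    public User(int id, String name, String email, String[] roles, boolean admin) {
        this.id = id;
        this.name = name;
        this.email = email;
        this.roles = roles;
        this.admin = admin;
    }

    public int getId() {
        return id;
    }

    // map `id` to an attribute of <user>
    @XmlAttribute
    public void setId(int id) {
        this.id = id;
    }

    public String getName() {
        return name;
    }

    @XmlElement
    public void setName(String name) {
        this.name = name;
    }

    public String getEmail() {
        return email;
    }

    @XmlElement
    public void setEmail(String email) {
        this.email = email;
    }

    public String[] getRoles() {
        return roles;
    }

    // wrap the array in <roles> and write each item as <role>
    @XmlElementWrapper(name = "roles")
    @XmlElement(name = "role")
    public void setRoles(String[] roles) {
        this.roles = roles;
    }

    public boolean isAdmin() {
        return admin;
    }

    @XmlElement
    public void setAdmin(boolean admin) {
        this.admin = admin;
    }

    @Override
    public String toString() {
        return "User{" +
                "id=" + id +
                ", name='" + name + '\'' +
                ", email='" + email + '\'' +
                ", roles=" + Arrays.toString(roles) +
                ", admin=" + admin +
                '}';
    }
}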

As you can see above, we have annotated the setter methods of the class with different JAXB annotations. These annotations tell JAXB how each property is mapped when converting a Java object to and from XML.

  • @XmlRootElement: This annotation specifies the root element of the XML document. It maps a class or an enum type to an XML element. By default, the element name is derived from the class name (User becomes user). However, you can customize the name by setting the name attribute, e.g. @XmlRootElement(name = "person").
  • @XmlAttribute: This annotation maps a Java object property to an XML attribute derived from the property name. To use a different attribute name, pass the name parameter to the annotation.
  • @XmlElement: This annotation maps a Java object property to an XML element derived from the property name. The name of the mapped XML element can be customized using the name parameter.
  • @XmlElementWrapper: This annotation generates a wrapper element around the XML representation of a collection, an array of String in our case. Combine it with @XmlElement(name = "role") to name the individual items inside the wrapper.

Marshalling in JAXB refers to converting a Java object to an XML document. JAXB provides the Marshaller class for this purpose. All you need to do is create a new instance of JAXBContext by calling the static newInstance() method with a reference to the User class. You can then call the createMarshaller() method to create an instance of Marshaller. The Marshaller class provides several overloaded marshal() methods that write a Java object to a file, an output stream, or directly to the console. Here is an example that demonstrates how to convert a User object into an XML document called user2.xml:

import java.io.File;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;

try {
    // create XML file
    File file = new File("user2.xml");

    // create an instance of `JAXBContext`
    JAXBContext context = JAXBContext.newInstance(User.class);

    // create an instance of `Marshaller`
    Marshaller marshaller = context.createMarshaller();

    // enable pretty-print XML output
    marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, true);

    // create user object
    User user = new User(2, "Tom Deo", "tom.doe@example.com",
            new String[]{"Member", "Moderator"}, false);

    // convert user object to XML file
    marshaller.marshal(user, file);
} catch (JAXBException ex) {
    ex.printStackTrace();
}

Now if you run the above code, you should see a file called user2.xml created in the current working directory with the following contents:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<user id="2">
    <admin>false</admin>
    <email>tom.doe@example.com</email>
    <name>Tom Deo</name>
    <roles>
        <role>Member</role>
        <role>Moderator</role>
    </roles>
</user>

The Marshaller class also provides an overloaded marshal() method that prints the generated XML document to the console, as shown below:

// print XML to console
marshaller.marshal(user, System.out);

Unmarshalling is very similar to the marshalling process discussed above, except that this time we use the Unmarshaller class to convert an XML document to a Java object. The following example demonstrates JAXB's ability to read the above user.xml file and create a User object:

import java.io.File;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;

try {
    // XML file path
    File file = new File("user.xml");

    // create an instance of `JAXBContext`
    JAXBContext context = JAXBContext.newInstance(User.class);

    // create an instance of `Unmarshaller`
    Unmarshaller unmarshaller = context.createUnmarshaller();

    // convert XML file to user object
    User user = (User) unmarshaller.unmarshal(file);

    // print user object
    System.out.println(user);
} catch (JAXBException ex) {
    ex.printStackTrace();
}

Output:
User{id=1, name='John Doe', email='john.doe@example.com', roles=[Member, Admin], admin=true}

The unmarshal() method returns an Object, so we have to explicitly cast it to the correct type (User in our case). Unmarshaller also provides several other overloaded unmarshal() methods for reading an XML document from different sources, such as a URL, an input stream, a reader, or an InputSource.

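The DOM (Document Object Model) parser, available through the javax.xml.parsers package, represents an XML document as a tree of nodes. Its main interfaces, from the org.w3c.dom package, are:

  • Node: The base datatype of the DOM.
  • Element: Represents an individual element in the DOM.
  • Attr: Represents an attribute of an element.
  • Text: The textual content of an Element or Attr.
  • Document: Represents the entire XML document. A Document object is often referred to as a DOM tree.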
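A short sketch of how these interfaces fit together: the hypothetical DomTreeWalk class below parses a string and recursively describes each Element, its Attr nodes, and any non-blank Text nodes. Attributes are not children of an element in the DOM, so the sketch reaches them through getAttributes() rather than getChildNodes().

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomTreeWalk {

    // Appends one line per Document, Element, Attr and non-blank Text node.
    static void walk(Node node, String indent, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.DOCUMENT_NODE:
                out.append(indent).append("Document\n");
                break;
            case Node.ELEMENT_NODE:
                Element element = (Element) node;
                out.append(indent).append("Element ").append(element.getTagName()).append('\n');
                // attributes live in a NamedNodeMap, not in the child list
                NamedNodeMap attrs = element.getAttributes();
                for (int i = 0; i < attrs.getLength(); i++) {
                    Attr attr = (Attr) attrs.item(i);
                    out.append(indent).append("  Attr ").append(attr.getName())
                       .append('=').append(attr.getValue()).append('\n');
                }
                break;
            case Node.TEXT_NODE:
                String text = node.getTextContent().trim();
                if (!text.isEmpty()) { // skip whitespace-only text between tags
                    out.append(indent).append("Text \"").append(text).append("\"\n");
                }
                return; // text nodes have no children
            default:
                break;
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            walk(children.item(i), indent + "  ", out);
        }
    }

    public static String describe(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            walk(dom, "", out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        System.out.print(describe("<user id=\"1\"><name>John Doe</name></user>"));
    }
}
```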

To create an XML file using the DOM parser, you first create a Document object using DocumentBuilder. Then you define all the XML content (elements, attributes, and values) with the Element and Attr interfaces. Finally, you use the Transformer class to write the entire XML document to an output stream, usually a file or a string. Here is an example that creates a simple XML file using the DOM parser:

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Attr;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

try {
    // create new `Document`
    DocumentBuilder builder = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder();
    Document dom = builder.newDocument();

    // first create root element
    Element root = dom.createElement("user");
    dom.appendChild(root);

    // set `id` attribute to root element
    Attr attr = dom.createAttribute("id");
    attr.setValue("1");
    root.setAttributeNode(attr);

    // now create child elements (name, email, phone)
    Element name = dom.createElement("name");
    name.setTextContent("John Deo");
    Element email = dom.createElement("email");
    email.setTextContent("john.doe@example.com");
    Element phone = dom.createElement("phone");
    phone.setTextContent("800 456-4578");

    // add child nodes to root node
    root.appendChild(name);
    root.appendChild(email);
    root.appendChild(phone);

    // write DOM to XML file
    Transformer tr = TransformerFactory.newInstance().newTransformer();
    tr.setOutputProperty(OutputKeys.INDENT, "yes");
    tr.transform(new DOMSource(dom), new StreamResult(new File("file.xml")));
} catch (Exception ex) {
    ex.printStackTrace();
}

Now, if you execute the above code, you should see the following file.xml created, encoded in UTF-8 (the Transformer's default output encoding):

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<user id="1">
    <name>John Deo</name>
    <email>john.doe@example.com</email>
    <phone>800 456-4578</phone>
</user>
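Since this article is about UTF-8, it may be worth stating the output encoding explicitly rather than relying on the default. The sketch below (DomUtf8Output is a hypothetical class name) sets OutputKeys.ENCODING on the Transformer and writes to a byte array, which makes it easy to check that a non-ASCII character such as © ends up as the two UTF-8 bytes 0xC2 0xA9.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomUtf8Output {

    // Builds <address>text</address> and serializes it with the given output encoding.
    public static byte[] write(String text, String encoding) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element address = dom.createElement("address");
            address.setTextContent(text);
            dom.appendChild(address);

            Transformer tr = TransformerFactory.newInstance().newTransformer();
            // state the output encoding explicitly instead of relying on the default
            tr.setOutputProperty(OutputKeys.ENCODING, encoding);
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            tr.transform(new DOMSource(dom), new StreamResult(bytes));
            return bytes.toByteArray();
        } catch (Exception e) {
            throw new IllegalStateException("could not write XML", e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8 = write("34 Stanley St.\u00A9", "UTF-8");
        // decode with the same charset that was used to encode
        System.out.println(new String(utf8, StandardCharsets.UTF_8));
    }
}
```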

If you want to output the XML document to the console, just pass StreamResult with System.out as an argument, as shown below:

// output XML document to console
tr.transform(new DOMSource(dom), new StreamResult(System.out));
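Besides a file or the console, the Transformer can also produce a string. One way to do that, sketched below with a hypothetical DomToString helper, is to wrap a StringWriter in the StreamResult. The OMIT_XML_DECLARATION property is optional; it simply leaves out the <?xml ...?> prolog, which is often unwanted when the XML is embedded in another string.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class DomToString {

    // Serializes any DOM document to a String instead of a file or System.out.
    public static String toXmlString(Document dom) {
        try {
            Transformer tr = TransformerFactory.newInstance().newTransformer();
            tr.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter writer = new StringWriter();
            tr.transform(new DOMSource(dom), new StreamResult(writer));
            return writer.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("could not serialize DOM", e);
        }
    }

    // Builds a small <user id="1"><name>John Deo</name></user> document.
    public static Document buildUser() {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            Element root = dom.createElement("user");
            root.setAttribute("id", "1");
            Element name = dom.createElement("name");
            name.setTextContent("John Deo");
            root.appendChild(name);
            dom.appendChild(root);
            return dom;
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(toXmlString(buildUser())); // <user id="1"><name>John Deo</name></user>
    }
}
```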

A DOM parser can also read and parse an XML file in Java. The DOM parser reads the entire XML file into memory and parses it into a tree structure for easy traversal or manipulation. The example below uses the DOM parser to read and parse the file.xml file we just created:

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

try {
    // parse XML file to build DOM
    DocumentBuilder builder = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder();
    Document dom = builder.parse(new File("file.xml"));

    // normalize XML structure
    dom.normalizeDocument();

    // get root element
    Element root = dom.getDocumentElement();

    // print attributes
    System.out.println("ID: " + root.getAttribute("id"));

    // print elements
    System.out.println("Name: " + root.getElementsByTagName("name").item(0).getTextContent());
    System.out.println("Email: " + root.getElementsByTagName("email").item(0).getTextContent());
    System.out.println("Phone: " + root.getElementsByTagName("phone").item(0).getTextContent());
} catch (Exception ex) {
    ex.printStackTrace();
}

Output:
ID: 1
Name: John Deo
Email: john.doe@example.com
Phone: 800 456-4578
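getElementsByTagName() returns a NodeList, so reading repeated elements such as the <role> items from user.xml means looping over that list instead of calling item(0). A minimal sketch, with DomReadRoles and readRoles() as hypothetical names and the XML passed in as a string for brevity:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DomReadRoles {

    // Collects the text of every <role> element in document order.
    public static List<String> readRoles(String xml) {
        try {
            Document dom = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList roles = dom.getElementsByTagName("role");
            List<String> result = new ArrayList<>();
            for (int i = 0; i < roles.getLength(); i++) {
                result.add(roles.item(i).getTextContent().trim());
            }
            return result;
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<user id=\"1\"><name>John Doe</name>"
                + "<roles><role>Member</role><role>Admin</role></roles></user>";
        System.out.println(readRoles(xml)); // [Member, Admin]
    }
}
```

For a file on disk, the same loop works after builder.parse(new File(...)).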

Note: The DOM parser is best suited to reading and parsing small XML files, as it loads the whole file into memory. For larger XML files that contain a lot of data, you should consider using the SAX (Simple API for XML) parser. SAX processes the document as a stream of events instead of loading it into memory, so it uses much less memory and is usually faster on large files.
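To illustrate the event-based model, here is a minimal SAX sketch (SaxElementCounter and CountingHandler are hypothetical names) that counts elements with a given tag name. The handler reacts to each start tag as it streams past and keeps only a single counter, rather than building a tree. To read a file instead of a string, SAXParser also has a parse(File, DefaultHandler) overload.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxElementCounter {

    // Receives start-tag events; it stores a counter, not the document.
    static class CountingHandler extends DefaultHandler {
        private final String elementName;
        int count = 0;

        CountingHandler(String elementName) {
            this.elementName = elementName;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attributes) {
            // without namespace awareness, qName holds the tag name as written
            if (qName.equals(elementName)) {
                count++;
            }
        }
    }

    public static int count(String xml, String elementName) {
        CountingHandler handler = new CountingHandler(elementName);
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.count;
    }

    public static void main(String[] args) {
        String xml = "<user id=\"1\"><roles><role>Member</role><role>Admin</role></roles></user>";
        System.out.println(count(xml, "role")); // 2
    }
}
```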

Although XML is less widely used than JSON as a data exchange format in modern systems, many older web services still rely on it as their primary data exchange format, and many file formats store data in XML. Java provides multiple ways to read and write XML files. In this article, we looked at JAXB and the DOM parser for reading and writing XML data to and from a file. JAXB offers a higher-level alternative to low-level parsers like DOM and SAX: it reads and writes Java objects to and from XML directly, and its annotations make it easy to define the mapping between XML elements and object properties.

If you want to read and write JSON files, check out the how to read and write JSON in Java guide for examples.


Read UTF-8 XML File in Java using SAX parser example

In the previous SAX parser tutorial, we saw how to parse and read a simple XML file. If your file uses UTF-8 encoding, the parser may throw a MalformedByteSequenceException. To solve this, read the file through a reader with an explicit UTF-8 charset and set the InputSource encoding to UTF-8.

You can do this with the following code:

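import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import org.xml.sax.InputSource;

InputStream inputStream = new FileInputStream(xmlFile);
InputStreamReader inputReader = new InputStreamReader(inputStream, "UTF-8");
InputSource inputSource = new InputSource(inputReader);
// setEncoding is an instance method, so call it on inputSource
inputSource.setEncoding("UTF-8");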
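One detail worth knowing: according to the InputSource Javadoc, setEncoding() has no effect when the InputSource wraps a character stream, so in the code above it is the InputStreamReader's "UTF-8" charset that actually decodes the bytes. setEncoding() matters when you hand the parser a byte stream. The sketch below (SaxByteStreamEncoding and parseText() are hypothetical names) encodes the same text as ISO-8859-1 bytes with no XML declaration: without a hint the parser assumes UTF-8 and rejects the lone © byte, while setEncoding("ISO-8859-1") lets it decode the bytes correctly.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class SaxByteStreamEncoding {

    // Parses raw XML bytes and returns all character data the handler receives.
    // `encoding` may be null, in which case the parser detects the encoding itself.
    public static String parseText(byte[] xmlBytes, String encoding) {
        StringBuilder text = new StringBuilder();
        DefaultHandler handler = new DefaultHandler() {
            @Override
            public void characters(char[] ch, int start, int length) {
                text.append(ch, start, length);
            }
        };
        // a byte stream, so setEncoding() controls how the bytes are decoded
        InputSource source = new InputSource(new ByteArrayInputStream(xmlBytes));
        if (encoding != null) {
            source.setEncoding(encoding);
        }
        try {
            SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
            parser.parse(source, handler);
        } catch (Exception e) {
            // e.g. a malformed-byte error when the bytes are invalid in the assumed encoding
            throw new IllegalStateException("could not parse XML", e);
        }
        return text.toString();
    }

    public static void main(String[] args) {
        String xml = "<address>34 Stanley St.\u00A9</address>"; // no XML declaration
        byte[] latin1 = xml.getBytes(StandardCharsets.ISO_8859_1); // © is the single byte 0xA9

        // with an explicit hint the bytes decode correctly
        System.out.println(parseText(latin1, "ISO-8859-1"));

        // without a hint the parser assumes UTF-8 and rejects the lone 0xA9 byte
        try {
            parseText(latin1, null);
        } catch (IllegalStateException e) {
            System.out.println("failed: " + e.getCause());
        }
    }
}
```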

Here is the XML file we are going to use for the demo. It contains the special character ©, which takes two bytes (0xC2 0xA9) in UTF-8.

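<?xml version="1.0" encoding="UTF-8"?>
<company>
    <employee id="1">
        <firstname>Jeremy</firstname>
        <lastname>Harley</lastname>
        <email>james@example.org</email>
        <department>Human Resources</department>
        <salary>2000000</salary>
        <address>34 Stanley St.©</address>
    </employee>
    <employee id="2">
        <firstname>John</firstname>
        <lastname>May</lastname>
        <email>john@example.org</email>
        <department>Logistics</department>
        <salary>400</salary>
        <address>123 Stanley St.</address>
    </employee>
</company>
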
package com.javacodegeeks.java.core;

import org.xml.sax.Attributes;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

public class MyHandler extends DefaultHandler {

    // flags that mark which element's text should be printed next
    boolean tagFname = false;
    boolean tagLname = false;
    boolean tagEmail = false;
    boolean tagDep = false;
    boolean tagSalary = false;
    boolean tagAddress = false;

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes)
            throws SAXException {
        // print the opening tag, including its attributes if it has any
        if (attributes.getLength() > 0) {
            String tag = "<" + qName;
            for (int i = 0; i < attributes.getLength(); i++) {
                tag += " " + attributes.getQName(i) + "=\"" + attributes.getValue(i) + "\"";
            }
            tag += ">";
            System.out.println(tag);
        } else {
            System.out.println("<" + qName + ">");
        }

        if (qName.equalsIgnoreCase("firstname")) {
            tagFname = true;
        }
        if (qName.equalsIgnoreCase("lastname")) {
            tagLname = true;
        }
        if (qName.equalsIgnoreCase("email")) {
            tagEmail = true;
        }
        if (qName.equalsIgnoreCase("department")) {
            tagDep = true;
        }
        if (qName.equalsIgnoreCase("salary")) {
            tagSalary = true;
        }
        if (qName.equalsIgnoreCase("address")) {
            tagAddress = true;
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) throws SAXException {
        // print the text of the flagged element, then clear its flag
        if (tagFname) {
            System.out.println(new String(ch, start, length));
            tagFname = false;
        }
        if (tagLname) {
            System.out.println(new String(ch, start, length));
            tagLname = false;
        }
        if (tagEmail) {
            System.out.println(new String(ch, start, length));
            tagEmail = false;
        }
        if (tagDep) {
            System.out.println(new String(ch, start, length));
            tagDep = false;
        }
        if (tagSalary) {
            System.out.println(new String(ch, start, length));
            tagSalary = false;
        }
        if (tagAddress) {
            System.out.println(new String(ch, start, length));
            tagAddress = false;
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) throws SAXException {
        System.out.println("</" + qName + ">");
    }
}

package com.javacodegeeks.java.core;

import java.io.File;
import java.io.FileInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;

public class ParseUTF8XMLFileWithSAX {

    private static final String xmlFilePath =
            "C:\\Users\\nikos7\\Desktop\\filesForExamples\\testFile.xml";

    public static void main(String[] args) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            SAXParser saxParser = factory.newSAXParser();

            File xmlFile = new File(xmlFilePath);
            // try-with-resources closes the file once parsing is done
            try (InputStream inputStream = new FileInputStream(xmlFile);
                 InputStreamReader inputReader = new InputStreamReader(inputStream, "UTF-8")) {
                // the reader decodes the file as UTF-8 before the parser sees it
                InputSource inputSource = new InputSource(inputReader);
                inputSource.setEncoding("UTF-8");
                saxParser.parse(inputSource, new MyHandler());
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
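A caveat about MyHandler: per the SAX ContentHandler documentation, a parser may deliver the text of one element in several characters() calls, and the flag-based handler above prints only the first chunk because it clears the flag immediately. A common alternative, sketched below with the hypothetical class BufferedTextHandler, is to collect text in a StringBuilder and use it in endElement(). The inline XML is a small hypothetical sample containing the same address values as the file above.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

public class BufferedTextHandler extends DefaultHandler {

    private final StringBuilder buffer = new StringBuilder();
    private final List<String> addresses = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes attributes) {
        buffer.setLength(0); // start collecting text for the new element
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        buffer.append(ch, start, length); // may be called several times per element
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (qName.equalsIgnoreCase("address")) {
            addresses.add(buffer.toString().trim()); // the full text is available here
        }
        buffer.setLength(0);
    }

    public List<String> getAddresses() {
        return addresses;
    }

    public static List<String> readAddresses(String xml) {
        BufferedTextHandler handler = new BufferedTextHandler();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
        return handler.getAddresses();
    }

    public static void main(String[] args) {
        String xml = "<company><employee id=\"1\"><address>34 Stanley St.\u00A9</address></employee>"
                + "<employee id=\"2\"><address>123 Stanley St.</address></employee></company>";
        System.out.println(readAddresses(xml));
    }
}
```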

This was an example of how to read a UTF-8 XML file in Java using the SAX parser.
