XPath with namespaces in Java

Contents

How to query XML using namespaces in Java with XPath?
Java XPath NamespaceContext – NameSpace Resolution Example
A practical example of using XPath to get data from XML with namespaces in Java

How to query XML using namespaces in Java with XPath?

All namespaces that you intend to select from in the source XML must be associated with a prefix in the host language. In Java/JAXP this is done by specifying the URI for each namespace prefix using an instance of javax.xml.namespace.NamespaceContext. Unfortunately, the JDK does not provide a general-purpose implementation of NamespaceContext.

Fortunately, it’s very easy to write your own:

import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;
import javax.xml.namespace.NamespaceContext;

public class SimpleNamespaceContext implements NamespaceContext {

    // Prefix-to-namespace-URI mappings used to resolve prefixes in XPath expressions.
    private final Map<String, String> PREF_MAP = new HashMap<String, String>();

    public SimpleNamespaceContext(final Map<String, String> prefMap) {
        PREF_MAP.putAll(prefMap);
    }

    public String getNamespaceURI(String prefix) {
        return PREF_MAP.get(prefix);
    }

    // The reverse lookups are not needed for XPath evaluation.
    public String getPrefix(String uri) {
        throw new UnsupportedOperationException();
    }

    public Iterator<String> getPrefixes(String uri) {
        throw new UnsupportedOperationException();
    }
}

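XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();

// Bind the prefix used in the expression to the namespace URI of the source document.
HashMap<String, String> prefMap = new HashMap<String, String>();
prefMap.put("main", "http://schemas.openxmlformats.org/spreadsheetml/2006/main");

SimpleNamespaceContext namespaces = new SimpleNamespaceContext(prefMap);
xpath.setNamespaceContext(namespaces);
XPathExpression expr = xpath.compile("/main:workbook/main:sheets/main:sheet[1]");
Object result = expr.evaluate(doc, XPathConstants.NODESET);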

Note that even though the first namespace does not specify a prefix in the source document (i.e. it is the default namespace) you must associate it with a prefix anyway. Your expression should then reference nodes in that namespace using the prefix you’ve chosen, like this:

/main:workbook/main:sheets/main:sheet[1]

The prefix names you choose to associate with each namespace are arbitrary; they do not need to match what appears in the source XML. This mapping is just a way to tell the XPath engine that a given prefix name in an expression correlates with a specific namespace in the source document.
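The point about arbitrary prefixes can be checked with a small self-contained sketch. The cut-down workbook document, the class name PrefixMappingDemo and the evaluate helper are assumptions made for illustration (Java 9+ for Map.of); only the namespace URI comes from the text above.

```java
import java.io.StringReader;
import java.util.Iterator;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class PrefixMappingDemo {

    // Hypothetical document: the namespace is declared as the default one (no prefix).
    static final String XML =
        "<workbook xmlns=\"http://schemas.openxmlformats.org/spreadsheetml/2006/main\">"
      + "<sheets><sheet name=\"First\"/><sheet name=\"Second\"/></sheets>"
      + "</workbook>";

    /** Evaluates the expression as a string, resolving prefixes through the given map. */
    static String evaluate(String expression, Map<String, String> prefixes) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // without this the parser drops namespace information
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    if (prefix == null) throw new IllegalArgumentException("prefix must not be null");
                    return prefixes.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
                }
                @Override
                public String getPrefix(String uri) { throw new UnsupportedOperationException(); }
                @Override
                public Iterator<String> getPrefixes(String uri) { throw new UnsupportedOperationException(); }
            });
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String ns = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
        // Two different prefixes bound to the same URI select the same node.
        System.out.println(evaluate("/main:workbook/main:sheets/main:sheet[1]/@name", Map.of("main", ns)));
        System.out.println(evaluate("/x:workbook/x:sheets/x:sheet[1]/@name", Map.of("x", ns)));
        // Unprefixed steps address the "no namespace" elements and select nothing here.
        System.out.println(evaluate("/workbook/sheets/sheet[1]/@name", Map.of()).isEmpty());
    }
}
```

Both prefixed queries resolve to the same expanded names, which is why the choice of prefix does not matter to the engine.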

In the second example XML file the elements are bound to a namespace. Your XPath is attempting to address elements that are bound to the default «no namespace» namespace, so they don’t match.

The preferred method is to register the namespace with a namespace-prefix. It makes your XPath much easier to develop, read, and maintain.

However, it is not mandatory that you register the namespace and use the namespace-prefix in your XPath.

You can formulate an XPath expression that uses a generic match for an element and a predicate filter that restricts the match for the desired local-name() and the namespace-uri() . For example:

/*[local-name()='workbook' and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main']
/*[local-name()='sheets' and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main']
/*[local-name()='sheet' and namespace-uri()='http://schemas.openxmlformats.org/spreadsheetml/2006/main'][1]

As you can see, it produces an extremely long and verbose XPath statement that is very difficult to read (and maintain).
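If this predicate style is needed anyway, the verbosity can be contained by generating each step. The sketch below is one possible way to do that; the step helper, the class name QualifiedStepDemo and the sample document are illustrative assumptions. No NamespaceContext is registered, because the expression contains no prefixes.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class QualifiedStepDemo {

    static final String NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Hypothetical, cut-down document using NS as its default namespace.
    static final String XML =
        "<workbook xmlns=\"" + NS + "\">"
      + "<sheets><sheet name=\"First\"/><sheet name=\"Second\"/></sheets>"
      + "</workbook>";

    /** Builds one child step that matches an element by local name and namespace URI. */
    static String step(String localName, String namespaceUri) {
        // Assumes neither argument contains an apostrophe.
        return "/*[local-name()='" + localName + "' and namespace-uri()='" + namespaceUri + "']";
    }

    /** Returns the name attribute of the first sheet, selected without any prefix mapping. */
    static String firstSheetName() {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            String expression = step("workbook", NS) + step("sheets", NS) + step("sheet", NS) + "[1]/@name";
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(firstSheetName());
    }
}
```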

You could also just match on the local-name() of the element and ignore the namespace. For example:

/*[local-name()='workbook']/*[local-name()='sheets']/*[local-name()='sheet'][1]

However, you run the risk of matching the wrong elements. If your XML has mixed vocabularies (which may not be an issue in this instance) that use the same local-name(), your XPath could match the wrong elements and select the wrong content.
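A minimal sketch of that risk, assuming a made-up document in which two vocabularies (the hypothetical URIs urn:example:shop and urn:example:shipping) both define an item element:

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class MixedVocabularyDemo {

    // Hypothetical document mixing two vocabularies that share the local name "item".
    static final String XML =
        "<order xmlns:shop=\"urn:example:shop\" xmlns:ship=\"urn:example:shipping\">"
      + "<shop:item>laptop</shop:item>"
      + "<ship:item>parcel</ship:item>"
      + "</order>";

    /** Counts the nodes selected by the given node-set expression. */
    static int count(String nodeSetExpression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            XPath xpath = XPathFactory.newInstance().newXPath();
            Double n = (Double) xpath.evaluate("count(" + nodeSetExpression + ")", doc, XPathConstants.NUMBER);
            return n.intValue();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Matching on the local name alone picks up both vocabularies.
        System.out.println(count("//*[local-name()='item']"));
        // Adding the namespace test narrows the match to the intended one.
        System.out.println(count("//*[local-name()='item' and namespace-uri()='urn:example:shop']"));
    }
}
```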

Your problem is the default namespace. Check out this article for how to deal with namespaces in your XPath: http://www.edankert.com/defaultnamespaces.html

One of the conclusions they draw is:

So, to be able to use XPath expressions on XML content defined in a (default) namespace, we need to specify a namespace prefix mapping

Note that this doesn’t mean that you have to change your source document in any way (though you’re free to put the namespace prefixes in there if you so desire). Sounds strange, right? What you will do is create a namespace prefix mapping in your java code and use said prefix in your XPath expression. Here, we’ll create a mapping from spreadsheet to your default namespace.

XPathFactory factory = XPathFactory.newInstance();
XPath xpath = factory.newXPath();

// there's no default implementation for NamespaceContext. seems kind of silly, no?
xpath.setNamespaceContext(new NamespaceContext() {
    public String getNamespaceURI(String prefix) {
        // The NamespaceContext contract asks for IllegalArgumentException on a null prefix.
        if (prefix == null) throw new IllegalArgumentException("Null prefix");
        else if ("spreadsheet".equals(prefix)) return "http://schemas.openxmlformats.org/spreadsheetml/2006/main";
        else if ("xml".equals(prefix)) return XMLConstants.XML_NS_URI;
        return XMLConstants.NULL_NS_URI;
    }

    // This method isn't necessary for XPath processing.
    public String getPrefix(String uri) {
        throw new UnsupportedOperationException();
    }

    // This method isn't necessary for XPath processing either.
    public Iterator<String> getPrefixes(String uri) {
        throw new UnsupportedOperationException();
    }
});

// note that all the elements in the expression are prefixed with our namespace mapping!
XPathExpression expr = xpath.compile("/spreadsheet:workbook/spreadsheet:sheets/spreadsheet:sheet[1]");

// assuming you've got your XML document in a variable named doc.
Node result = (Node) expr.evaluate(doc, XPathConstants.NODE);

And voila. Now you’ve got your element saved in the result variable.

Caveat: if you’re parsing your XML as a DOM with the standard JAXP classes, be sure to call setNamespaceAware(true) on your DocumentBuilderFactory . Otherwise, this code won’t work!
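The effect of the caveat can be seen by parsing the same text twice. This is a sketch under stated assumptions: the sample document, the class name NamespaceAwareDemo and the firstSheetName helper are hypothetical, and the empty result in the non-aware case reflects the behaviour of the XPath implementation bundled with the JDK.

```java
import java.io.StringReader;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceAwareDemo {

    static final String NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main";

    // Hypothetical, cut-down document using NS as its default namespace.
    static final String XML =
        "<workbook xmlns=\"" + NS + "\"><sheets><sheet name=\"First\"/></sheets></workbook>";

    /** Runs the same prefixed query against a DOM built with or without namespace awareness. */
    static String firstSheetName(boolean namespaceAware) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(namespaceAware); // the factory default is false
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "main".equals(prefix) ? NS : XMLConstants.NULL_NS_URI;
                }
                @Override
                public String getPrefix(String uri) { throw new UnsupportedOperationException(); }
                @Override
                public Iterator<String> getPrefixes(String uri) { throw new UnsupportedOperationException(); }
            });
            return xpath.evaluate("/main:workbook/main:sheets/main:sheet[1]/@name", doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("aware:     '" + firstSheetName(true) + "'");
        System.out.println("not aware: '" + firstSheetName(false) + "'");
    }
}
```

No exception is raised in the non-aware case; the query simply selects nothing, which makes the mistake easy to overlook.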


Java XPath NamespaceContext – NameSpace Resolution Example

In this Java example, we will learn how to resolve namespaces with NamespaceContext when evaluating XPath expressions against an XML file that contains namespace declarations and uses them.

We have created sample.xml file and put it on the classpath for demo purpose.

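sample.xml (markup not preserved): an ns2:bookStore root element containing ns2:book elements whose ns2:name values are "Data Structure" and "Java Core".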

2. Implement NamespaceContext to Create NameSpace Resolver

This namespace resolver can be used with any XML file that declares namespaces. For a given namespace prefix, passed as a parameter, it looks up the matching namespace declaration inside the XML document itself, so there is no need to create a separate namespace mapping.
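One detail worth checking before relying on this approach: Document.lookupNamespaceURI starts its search at the document element, so prefixes declared only on nested elements are not found. The sketch below illustrates this with a hypothetical document and made-up URIs (urn:example:books, urn:example:meta); LookupScopeDemo and lookup are illustrative names.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class LookupScopeDemo {

    // Hypothetical document: ns2 is declared on the root, m only on a nested element.
    static final String XML =
        "<ns2:bookStore xmlns:ns2=\"urn:example:books\">"
      + "<ns2:book><m:tag xmlns:m=\"urn:example:meta\">x</m:tag></ns2:book>"
      + "</ns2:bookStore>";

    /** Resolves a prefix the same way a document-backed resolver would. */
    static String lookup(String prefix) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));
            return doc.lookupNamespaceURI(prefix);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup("ns2")); // declared on the root element
        System.out.println(lookup("m"));   // declared deeper in the tree
    }
}
```

When a document declares prefixes below the root, an explicit prefix map is the safer choice.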

public class NamespaceResolver implements NamespaceContext {

    // Store the source document to search the namespaces
    private Document sourceDocument;

    public NamespaceResolver(Document document) {
        sourceDocument = document;
    }

    // The lookup for the namespace URIs is delegated to the stored document.
    public String getNamespaceURI(String prefix) {
        if (prefix.equals(XMLConstants.DEFAULT_NS_PREFIX)) {
            return sourceDocument.lookupNamespaceURI(null);
        } else {
            return sourceDocument.lookupNamespaceURI(prefix);
        }
    }

    public String getPrefix(String namespaceURI) {
        return sourceDocument.lookupPrefix(namespaceURI);
    }

    // Not used during XPath evaluation.
    public Iterator<String> getPrefixes(String namespaceURI) {
        return null;
    }
}

3. Using NamespaceResolver and Applying XPath

Now we are ready to apply the xpath expression over the XML file.

// Want to read all book names from XML
ArrayList<String> bookNames = new ArrayList<String>();

// Parse XML file
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
factory.setNamespaceAware(true);
DocumentBuilder builder = factory.newDocumentBuilder();
Document doc = builder.parse(new FileInputStream(new File("sample.xml")));

// Get XPath expression
XPathFactory xpathfactory = XPathFactory.newInstance();
XPath xpath = xpathfactory.newXPath();
xpath.setNamespaceContext(new NamespaceResolver(doc));
XPathExpression expr = xpath.compile("//ns2:bookStore/ns2:book/ns2:name/text()");

// Search XPath expression
Object result = expr.evaluate(doc, XPathConstants.NODESET);

// Iterate over results and fetch book names
NodeList nodes = (NodeList) result;
for (int i = 0; i < nodes.getLength(); i++) {
    bookNames.add(nodes.item(i).getNodeValue());
}

// Verify book names
System.out.println(bookNames);

4. Complete Source Code for XPath Namespace Resolution


This is the complete source code of the above example.

import java.io.File;
import java.io.FileInputStream;
import java.util.ArrayList;
import java.util.Iterator;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpression;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;

public class Main {

    public static void main(String[] args) throws Exception {
        // Want to read all book names from XML
        ArrayList<String> bookNames = new ArrayList<String>();

        // Parse XML file
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new FileInputStream(new File("sample.xml")));

        // Get XPath expression
        XPathFactory xpathfactory = XPathFactory.newInstance();
        XPath xpath = xpathfactory.newXPath();
        xpath.setNamespaceContext(new NamespaceResolver(doc));
        XPathExpression expr = xpath.compile("//ns2:bookStore/ns2:book/ns2:name/text()");

        // Search XPath expression
        Object result = expr.evaluate(doc, XPathConstants.NODESET);

        // Iterate over results and fetch book names
        NodeList nodes = (NodeList) result;
        for (int i = 0; i < nodes.getLength(); i++) {
            bookNames.add(nodes.item(i).getNodeValue());
        }

        // Verify book names
        System.out.println(bookNames);
    }
}

class NamespaceResolver implements NamespaceContext {

    // Store the source document to search the namespaces
    private Document sourceDocument;

    public NamespaceResolver(Document document) {
        sourceDocument = document;
    }

    // The lookup for the namespace URIs is delegated to the stored document.
    public String getNamespaceURI(String prefix) {
        if (prefix.equals(XMLConstants.DEFAULT_NS_PREFIX)) {
            return sourceDocument.lookupNamespaceURI(null);
        } else {
            return sourceDocument.lookupNamespaceURI(prefix);
        }
    }

    public String getPrefix(String namespaceURI) {
        return sourceDocument.lookupPrefix(namespaceURI);
    }

    // Not used during XPath evaluation.
    public Iterator<String> getPrefixes(String namespaceURI) {
        return null;
    }
}



A practical example of using XPath to get data from XML with namespaces in Java

I once translated the description of XPath syntax from w3schools. This post is a practical application of XPath in Java. Suppose there is a moderately complex XML document:

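(XML markup not preserved. The document is an itinerary with two legs. First leg: New York, Los Angeles, 2001-12-14, late afternoon, aisle. Second leg: Los Angeles, New York, 2001-12-20, mid-morning.)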

The document is complicated by the fact that it contains three namespaces at once:

xmlns:p="http://travelcompany.example.org/reservation/travel"
xmlns:x="http:namespace"
and the default one

Suppose we need to get the departing field from this document. For a simple document without namespaces, the expression would be simple:

//itinerary/departure/departing

The result of such a query would be "New York".

In Java, the code for this looks as follows:

Document doc = XmlUtil.fromXML(TEST_XML);
final String xpathStr = "//itinerary/departure/departing";
final XPathFactory xpathFact = XPathFactory.newInstance();
final XPath xpath = xpathFact.newXPath();
String result = xpath.evaluate(xpathStr, doc);

(XmlUtil is a helper class; the original post links to its source.)

Without namespaces, everything would be fine. When they are present, as in our example, the expression has to be written with namespaces:

//a:itinerary/a:departure/b:departing

In this expression "a" and "b" are prefixes. Note that they do not have to match the prefixes that appear in the document. What matters is that itinerary and departure share one prefix, while departing has a different one.

For an XPath expression with namespaces to be evaluated correctly, you need a class of type NamespaceContext that implements the getNamespaceURI() method, which returns the namespace corresponding to a given prefix. The key point: what is needed here is not a mapping from the real document prefix to the namespace, but a mapping for the abstract prefixes "a" and "b" used in the query. Below is an example of getting data from the document above with namespaces:

public class XPathTest {

    // Only the text content of the test document survives here; its element markup was not preserved.
    public static final String TEST_XML2 =
        "\n" +
        "\n" +
        " \n" +
        " New York\n" +
        " Los Angeles\n" +
        " 2001-12-14\n" +
        " late afternoon\n" +
        " aisle\n" +
        " \n" +
        " \n" +
        " Los Angeles\n" +
        " New York\n" +
        " 2001-12-20\n" +
        " mid-morning\n" +
        " \n" +
        " \n" +
        "";

    @Test
    public void test() throws Exception {
        Document doc = XmlUtil.fromXML(TEST_XML2);
        String xpathStr = "//a:itinerary/a:departure/b:departing";
        XPathFactory xpathFact = XPathFactory.newInstance();
        XPath xpath = xpathFact.newXPath();

        // Map the query prefixes "a" and "b" to the namespace URIs used in the document.
        xpath.setNamespaceContext(new NamespaceContext() {
            @Override
            public String getNamespaceURI(String prefix) {
                if ("a".equals(prefix)) {
                    return "http://travelcompany.example.org/reservation/travel";
                } else if ("b".equals(prefix)) {
                    return "http:namespace";
                } else return "?";
            }

            @Override
            public String getPrefix(String namespaceURI) {
                return null;
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                return null;
            }
        });

        String result = xpath.evaluate(xpathStr, doc);
        System.out.println("result:" + result);
    }
}

In the example above, this is the overridden function in question:

@Override
public String getNamespaceURI(String prefix) {
    if ("a".equals(prefix)) {
        return "http://travelcompany.example.org/reservation/travel";
    } else if ("b".equals(prefix)) {
        return "http:namespace";
    } else return "?";
}

For an unknown prefix I simply return a question mark. There is no such namespace in the test document.
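A runnable variant of the two-prefix query might look like the sketch below. Since the markup of the test document is not available, the XML here is a reduced, assumed reconstruction: itinerary and departure in the travel namespace, departing in "http:namespace", and arriving without a namespace. MapNamespaceContext and TwoNamespaceDemo are hypothetical names. Instead of "?", the sketch returns XMLConstants.NULL_NS_URI for unbound prefixes, which is the value the NamespaceContext Javadoc specifies for that case.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

/** Reusable prefix-to-URI context (hypothetical helper, not part of the original post). */
final class MapNamespaceContext implements NamespaceContext {
    private final Map<String, String> uriByPrefix;

    MapNamespaceContext(Map<String, String> uriByPrefix) {
        this.uriByPrefix = new HashMap<>(uriByPrefix);
    }

    @Override
    public String getNamespaceURI(String prefix) {
        if (prefix == null) throw new IllegalArgumentException("prefix must not be null");
        return uriByPrefix.getOrDefault(prefix, XMLConstants.NULL_NS_URI);
    }

    @Override
    public String getPrefix(String namespaceURI) {
        Iterator<String> prefixes = getPrefixes(namespaceURI);
        return prefixes.hasNext() ? prefixes.next() : null;
    }

    @Override
    public Iterator<String> getPrefixes(String namespaceURI) {
        if (namespaceURI == null) throw new IllegalArgumentException("namespaceURI must not be null");
        List<String> matches = new ArrayList<>();
        for (Map.Entry<String, String> entry : uriByPrefix.entrySet()) {
            if (entry.getValue().equals(namespaceURI)) matches.add(entry.getKey());
        }
        return matches.iterator();
    }
}

class TwoNamespaceDemo {

    // Assumed, reduced reconstruction of the test document.
    static final String XML =
        "<p:itinerary xmlns:p=\"http://travelcompany.example.org/reservation/travel\""
      + " xmlns:x=\"http:namespace\">"
      + "<p:departure><x:departing>New York</x:departing><arriving>Los Angeles</arriving></p:departure>"
      + "</p:itinerary>";

    /** Evaluates the expression with the query prefixes a and b bound to the two namespaces. */
    static String query(String expression) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));

            Map<String, String> prefixes = new HashMap<>();
            prefixes.put("a", "http://travelcompany.example.org/reservation/travel");
            prefixes.put("b", "http:namespace");

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new MapNamespaceContext(prefixes));
            return xpath.evaluate(expression, doc);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(query("//a:itinerary/a:departure/b:departing"));
        System.out.println(query("//a:itinerary/a:departure/arriving"));
    }
}
```

The document uses the prefixes p and x while the query uses a and b; only the URIs have to agree.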

The example above explains how to use XPath with namespaces. However, such code may seem too long. Hand-writing a new anonymous NamespaceContext implementation for every query can be tedious.

To simplify the code of XPath queries, I suggest using the approach already described in the post "A convenient object builder in Java". I will not go into details and will simply show a unit test:

Document doc = XmlUtil.fromXML(TEST_XML2);
String xpathRequest = "//a:itinerary/a:departure/arriving";
String result = new XPathBuilder(doc).evaluateString(xpathRequest).withNamespaces(
    "a", "http://travelcompany.example.org/reservation/travel",
    "b", "http:namespace"
);
System.out.println(result);

In this case I get the value of the "arriving" field. Output: "Los Angeles"

All of the data retrieval is done in what is essentially a single statement:

String result = new XPathBuilder(doc).evaluateString(xpathRequest).withNamespaces(
    "a", "http://travelcompany.example.org/reservation/travel",
    "b", "http:namespace"
);

doc: the input document
xpathRequest: the XPath query string
withNamespaces: a list of prefix and namespace pairs

import org.w3c.dom.Document;

import javax.xml.namespace.NamespaceContext;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class XPathBuilder {

    private final Document document;

    public XPathBuilder(Document document) {
        this.document = document;
    }

    public XPathStringBuilder evaluateString(String request) {
        return new XPathStringBuilder(document, request);
    }

    public static class XPathStringBuilder {

        private Document document;
        private String xpathStr;
        private NamespaceContext namespaceContext;

        private XPathStringBuilder(Document document, String xpath) {
            this.document = document;
            this.xpathStr = xpath;
            this.namespaceContext = null;
        }

        // Accepts alternating prefix and namespace URI arguments, then evaluates the query.
        public String withNamespaces(Object... objects) throws XPathExpressionException {
            Map<String, String> map = new HashMap<String, String>();
            if ((objects.length & 1) != 0) {
                throw new IllegalArgumentException(
                    "Supplied odd number of arguments to namespaces method: " + objects.length
                );
            }
            for (int i = 0; i < objects.length; i += 2) {
                map.put((String) objects[i], (String) objects[i + 1]);
            }
            namespaceContext = new XPathNameSpaceContext(map);
            return value();
        }

        public String value() throws XPathExpressionException {
            XPathFactory xpathFact = XPathFactory.newInstance();
            javax.xml.xpath.XPath xpath = xpathFact.newXPath();
            if (namespaceContext != null) {
                xpath.setNamespaceContext(namespaceContext);
            }
            return xpath.evaluate(xpathStr, document);
        }

        private static class XPathNameSpaceContext implements NamespaceContext {

            private Map<String, String> namespaces;

            XPathNameSpaceContext(final Map<String, String> namespaces) {
                this.namespaces = namespaces;
            }

            @Override
            public String getNamespaceURI(String prefix) {
                String namespace = namespaces.get(prefix);
                if (namespace == null) {
                    namespace = "?";
                }
                return namespace;
            }

            @Override
            public String getPrefix(String namespaceURI) {
                return null;
            }

            @Override
            public Iterator<String> getPrefixes(String namespaceURI) {
                return null;
            }
        }
    }
}

In the class above, evaluateString() returns an intermediate XPathStringBuilder object that determines the type of the value we want to compute with XPath. If needed, additional convenience functions can be implemented to compute results of other types. But that is another story... The XmlUtil class can be taken from the post "Getting the Body from a SOAP message". More XPath examples are linked from the original post.
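As a rough idea of what such additional result types could build on, the standard XPath.evaluate overload that takes a QName return type can produce numbers, booleans and node sets. The sketch below is not an extension of XPathBuilder itself; the document, the urn:example:store URI and the names TypedResultDemo, evaluate and bookNames are assumptions for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.namespace.QName;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class TypedResultDemo {

    // Hypothetical document with a single made-up namespace.
    static final String XML =
        "<s:store xmlns:s=\"urn:example:store\">"
      + "<s:book>Data Structure</s:book><s:book>Java Core</s:book>"
      + "</s:store>";

    /** Evaluates the expression and returns the result as the requested XPath type. */
    static Object evaluate(String expression, QName returnType) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true);
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(XML)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    return "s".equals(prefix) ? "urn:example:store" : XMLConstants.NULL_NS_URI;
                }
                @Override
                public String getPrefix(String uri) { throw new UnsupportedOperationException(); }
                @Override
                public Iterator<String> getPrefixes(String uri) { throw new UnsupportedOperationException(); }
            });
            return xpath.evaluate(expression, doc, returnType);
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    /** Collects the text of every book element from a NODESET result. */
    static List<String> bookNames() {
        NodeList nodes = (NodeList) evaluate("//s:book", XPathConstants.NODESET);
        List<String> names = new ArrayList<>();
        for (int i = 0; i < nodes.getLength(); i++) {
            names.add(nodes.item(i).getTextContent());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(evaluate("count(//s:book)", XPathConstants.NUMBER));   // a Double
        System.out.println(evaluate("boolean(//s:book)", XPathConstants.BOOLEAN)); // a Boolean
        System.out.println(bookNames());
    }
}
```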
