XPath is a query language that operates on XML documents and offers a reasonably succinct way to find XML nodes. Unfortunately, XPath string literals have an unsophisticated syntax, so there’s a little extra work to be done to handle strings safely. I’ve released an xpath-utils library for Java to do this robustly.
The simple XPath //div will find a div anywhere in the document. You can also use attributes in your XPath. If you have XML like this
<foo>
  <bar attr="baz">asdf</bar>
  <bar attr="quux"></bar>
</foo>
then you could find the second bar tag with //bar[@attr='quux']. XPath has lots of tutorials, so check those out if you’re curious.
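To try the attribute lookup against the sample XML, here is a minimal sketch that uses only the JDK's built-in javax.xml.xpath API. The class and method names are hypothetical, and //bar[@attr='quux'] is just one way of selecting the second bar element.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of xpath-utils.
final class XPathBasicsDemo {
    // The sample document from the text above.
    static final String XML =
        "<foo><bar attr=\"baz\">asdf</bar><bar attr=\"quux\"></bar></foo>";

    // Evaluates the expression against XML and returns the matching nodes.
    static NodeList select(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    // Reads the attr attribute of a matched element.
    static String attrOf(Node node) {
        return node.getAttributes().getNamedItem("attr").getNodeValue();
    }

    public static void main(String[] args) {
        // Both bar elements match the unqualified path.
        System.out.println(select("//bar").getLength());
        // Only the second bar carries attr="quux".
        System.out.println(attrOf(select("//bar[@attr='quux']").item(0)));
    }
}
```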
Strings in XPath
XPath is useful when writing Selenium tests. Even though many Selenium selectors are better done with CSS selectors, XPath can express things that CSS cannot, so it’s useful to know how to use it. One common operation in Selenium is using text() to match against text node contents. You could match the first bar node in the above XML with //bar[text()='asdf']. However, to match against arbitrary strings, it’s important to be able to handle them safely, just as proper XML escaping is important when generating an XML document.
Strings are very limited in XPath. A string literal is either a single-quoted string that contains no single quotes or a double-quoted string that contains no double quotes, and there is no escape syntax. To be more specific, the spec uses this grammar:
'"' [^"]* '"' | "'" [^']* "'"
This means that we can represent It's-Its are delicious with the string literal "It's-Its are delicious" because the original string does not contain a double quote ("). Similarly, we can represent "To be or not to be" as the literal '"To be or not to be"'.
Unfortunately, when a string contains both a single quote (') and a double quote ("), it gets a little messy. To represent "I'm hungry", we have to split the string on either ' or " and use concat() to stitch the pieces back together into an XPath expression in which no individual string literal needs both types of quotes. If we split on ', we get
concat('"I', "'", 'm hungry"')
and if we split on ", we get
concat('"', "I'm hungry", '"')
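The splitting rule can be sketched in a few lines of Java. This is an illustration of the technique under stated assumptions, not the xpath-utils implementation: the class and method names are hypothetical, it always splits on the single quote, and it prefers a single-quoted literal whenever one will do.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the quoting rule; not the xpath-utils source.
final class XPathQuotingSketch {
    /** Returns an XPath 1.0 expression that evaluates to exactly {@code value}. */
    static String toXPathString(String value) {
        if (!value.contains("'")) {
            return "'" + value + "'";       // no single quotes: a single-quoted literal is safe
        }
        if (!value.contains("\"")) {
            return "\"" + value + "\"";     // no double quotes: a double-quoted literal is safe
        }
        // Both quote types are present: split on ' and rebuild with concat().
        List<String> args = new ArrayList<>();
        String[] pieces = value.split("'", -1); // -1 keeps trailing empty pieces
        for (int i = 0; i < pieces.length; i++) {
            if (i > 0) {
                args.add("\"'\"");          // the single quote itself, double-quoted
            }
            if (!pieces[i].isEmpty()) {
                args.add("'" + pieces[i] + "'"); // pieces never contain a single quote
            }
        }
        return "concat(" + String.join(", ", args) + ")";
    }

    public static void main(String[] args) {
        // prints: "It's-Its are delicious"
        System.out.println(toXPathString("It's-Its are delicious"));
        // prints: concat('"I', "'", 'm hungry"')
        System.out.println(toXPathString("\"I'm hungry\""));
    }
}
```

Empty pieces are skipped so that a string beginning or ending with a single quote does not produce empty '' arguments. Because this branch only runs when both quote types are present, concat() always receives at least two arguments, which XPath requires.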
You can use xpath-utils to do this concat-ification for you. The XPathUtils class in that library has some convenience methods, the most important of which is getXPathString. It takes a string input and returns an XPath string literal or expression as needed.
// XPath string literal: "foo ' bar"
String simpleCase = XPathUtils.getXPathString("foo ' bar");

// XPath expression: concat('""foo"', "'", '"bar""')
String complexCase = XPathUtils.getXPathString("\"\"foo\"'\"bar\"\"");
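One way to convince yourself that a concat() expression really evaluates back to the original string is to run it through an XPath engine. The sketch below uses the JDK's javax.xml.xpath API with the concat() form written out by hand, so it does not depend on xpath-utils. The class name, method names, and sample document are made up for illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of xpath-utils.
final class ConcatRoundTripDemo {
    // Sample document whose first bar contains both quote types in its text.
    static final String XML = "<foo><bar>\"I'm hungry\"</bar><bar>asdf</bar></foo>";

    // The concat() form of "I'm hungry" (including its double quotes), written by hand.
    static final String QUOTED = "concat('\"I', \"'\", 'm hungry\"')";

    // Evaluates the expression against XML and returns its string value.
    static String evalString(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    // Evaluates the expression against XML and returns how many nodes matched.
    static int count(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(XML)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // prints: "I'm hungry"
        System.out.println(evalString(QUOTED));
        // prints: 1
        System.out.println(count("//bar[text()=" + QUOTED + "]"));
    }
}
```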
Another use of this technique beyond text() matching is matching CSS classes. It’s somewhat awkward to do so in XPath, but it shows how to use safe string handling correctly. If we have the following markup
<div id="content">
  <ul>
    <li class="thing"></li>
    <li class="thing selected"></li>
    <li class="thing"></li>
  </ul>
</div>
and we want to find the li with the selected class, we can’t just do //li[@class='selected'] because the value of the class attribute, thing selected, isn’t an exact string match for the XPath string literal 'selected'. (Of course, the li.selected CSS selector would work fine here!) Instead, we can use concat() and friends to handle the case where the target class isn’t the only class on the node:
//li[contains(concat(' ', normalize-space(@class), ' '), ' selected ')]
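The difference between the two expressions is easy to check with the JDK's javax.xml.xpath API, since the sample markup happens to be well-formed XML. This is a minimal sketch, and the class and method names are hypothetical.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of xpath-utils.
final class CssClassXPathDemo {
    // The sample markup from the text above.
    static final String MARKUP = "<div id=\"content\"><ul>"
        + "<li class=\"thing\"></li>"
        + "<li class=\"thing selected\"></li>"
        + "<li class=\"thing\"></li>"
        + "</ul></div>";

    // Evaluates the expression against MARKUP and returns how many nodes matched.
    static int count(String expression) {
        try {
            XPath xpath = XPathFactory.newInstance().newXPath();
            NodeList nodes = (NodeList) xpath.evaluate(
                expression, new InputSource(new StringReader(MARKUP)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Exact attribute comparison finds nothing: the value is "thing selected".
        System.out.println(count("//li[@class='selected']"));
        // Padding both sides with spaces lets contains() match a whole class token.
        System.out.println(count(
            "//li[contains(concat(' ', normalize-space(@class), ' '), ' selected ')]"));
    }
}
```

The spaces on both sides matter: without them, contains() would also match a longer class name that merely includes the target as a substring.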
It’d be nice if we could safely construct XPath even if we tried to use a CSS class that had quotes in it. A class like crazy"'class won’t match any HTML nodes, but that’s better than throwing an exception because our XPath expression didn’t parse! We can use another XPathUtils method, hasCssClass, to automatically generate the XPath boilerplate:
// contains(concat(' ', normalize-space(@class), ' '), concat(' crazy"', "'", 'class '))
String classXpath = XPathUtils.hasCssClass("crazy\"'class");
Of course, it also works fine on simple cases:
// contains(concat(' ', normalize-space(@class), ' '), ' selected ')
String classXpath = XPathUtils.hasCssClass("selected");
If you’re a Maven user, this is the dependency statement for the library.
<dependency>
  <groupId>com.palominolabs.xpath</groupId>
  <artifactId>xpath-utils</artifactId>
  <version>1.0.1</version>
</dependency>
Since this was just released, it may take a day or two for this to propagate to your Maven mirror.