XPath Basics
XPath is a query language that operates on XML documents and offers a reasonably succinct way to find XML nodes. Unfortunately, XPath string literals have an unsophisticated syntax, so there’s a little extra work to be done to handle strings safely. I’ve released an xpath-utils library for Java to do this robustly.
The simple XPath //div matches every div element anywhere in the document. You can also use attributes in your XPath. If you have XML like this
<foo>
  <bar attr="baz">asdf</bar>
  <bar attr="quux"></bar>
</foo>
then you could find the second bar tag with /foo/bar[@attr='quux'].
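To try these expressions out, here is a minimal sketch using the JDK's built-in javax.xml.xpath API. The class and method names (XPathBasicsDemo, evaluate) are made up for illustration and are not part of xpath-utils.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of any library.
final class XPathBasicsDemo {
    // The sample document from above.
    static final String XML =
            "<foo><bar attr=\"baz\">asdf</bar><bar attr=\"quux\"></bar></foo>";

    /** Evaluates the expression against XML and returns the string value of the first match. */
    static String evaluate(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(XML)));
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Text content of the bar whose attr is "baz".
        System.out.println(evaluate("/foo/bar[@attr='baz']"));
        // Attribute value of the bar selected by the expression from the text.
        System.out.println(evaluate("/foo/bar[@attr='quux']/@attr"));
    }
}
```

Asking for a string result returns the string value of the first matching node, which is enough for quick experiments like this one.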
There are plenty of XPath tutorials out there, so check those out if you’re curious.
Strings in XPath
XPath is useful when writing Selenium tests. Even though many Selenium selectors are better written as CSS selectors, XPath can express things that CSS cannot, so it’s worth knowing. One common operation in Selenium is using text() to match against text node contents. You could match the first bar node in the above XML with //bar[text()='asdf']. However, to match against arbitrary strings, you need to handle them safely, just as proper XML escaping matters when generating an XML document.
Strings are very limited in XPath. A string literal is either a single-quoted string that contains no single quotes, or a double-quoted string that contains no double quotes. There is no escape character. To be more specific, the spec uses this grammar:
'"' [^"]* '"' | "'" [^']* "'"
This means that we can represent It's-Its are delicious with the string literal "It's-Its are delicious" because the original string does not contain a ". Similarly, we can represent "To be or not to be" as the literal '"To be or not to be"'.
Unfortunately, when a string contains both ' and ", it gets a little messy. To represent "I'm hungry", we have to split the string on either ' or " and use concat() to stitch the pieces back together, so that no single string literal in the resulting XPath expression needs both types of quotes. If we split on ', we get
concat('"I', "'", 'm hungry"')
and if we split on " we get
concat('"', "I'm hungry", '"')
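The rules above can be sketched in a few lines of Java. This is an illustrative sketch only, not the xpath-utils implementation: the class name XPathLiteralSketch and the method toXPathString are hypothetical, and the choice to always split on ' (rather than ") is an assumption made for simplicity.

```java
// Hypothetical sketch of the quoting rules; not the library's actual code.
final class XPathLiteralSketch {
    /** Returns an XPath string literal, or a concat() expression, that evaluates to s. */
    static String toXPathString(String s) {
        if (!s.contains("'")) {
            return "'" + s + "'";       // no single quotes: single-quoted literal is safe
        }
        if (!s.contains("\"")) {
            return "\"" + s + "\"";     // no double quotes: double-quoted literal is safe
        }
        // Both quote types present: split on ' and rejoin the pieces with "'" literals.
        // The -1 limit keeps empty leading and trailing pieces, which become '' literals.
        String[] parts = s.split("'", -1);
        StringBuilder sb = new StringBuilder("concat(");
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(", \"'\", ");
            }
            sb.append('\'').append(parts[i]).append('\'');
        }
        return sb.append(')').toString();
    }

    public static void main(String[] args) {
        System.out.println(toXPathString("It's-Its are delicious"));
        System.out.println(toXPathString("\"I'm hungry\"")); // concat('"I', "'", 'm hungry"')
    }
}
```

Because a string with both quote types always splits into at least two pieces, concat() always receives the two or more arguments it requires. A string that starts or ends with ' produces an empty '' piece, which is harmless but could be trimmed in a more polished version.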
You can use xpath-utils to do this concat-ification for you. The XPathUtils class in that library has some convenience methods, the most important of which is getXPathString. It takes a string input and returns an XPath string literal or a concat() expression, as needed.
// XPath string literal: "foo ' bar"
String simpleCase = XPathUtils.getXPathString("foo ' bar");

// XPath expression: concat('""foo"', "'", '"bar""')
String complexCase = XPathUtils.getXPathString("\"\"foo\"'\"bar\"\"");
Another use of this technique beyond text() matching is matching CSS classes. It’s somewhat awkward to do so in XPath, but it shows how safe string handling works in practice. If we have the following markup
<div id="content">
  <ul>
    <li class="thing"></li>
    <li class="thing selected"></li>
    <li class="thing"></li>
  </ul>
</div>
and we want to find the li with the selected class, we can’t just do //li[@class='selected'] because the class attribute isn’t an exact string match for the XPath string literal 'selected'. (Of course, the li.selected CSS selector would work fine here!) Instead, we can use concat() and friends to handle the case where the target class isn’t the only class on the node:
//li[contains(concat(' ', normalize-space(@class), ' '), ' selected ')]
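To see why the padding with spaces matters, the following sketch runs a few candidate expressions against the markup above using the JDK's javax.xml.xpath API. The class and method names (CssClassXPathDemo, countMatches) are hypothetical helpers for this illustration.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of any library.
final class CssClassXPathDemo {
    // The list markup from above, on one line.
    static final String MARKUP = "<div id=\"content\"><ul>"
            + "<li class=\"thing\"></li>"
            + "<li class=\"thing selected\"></li>"
            + "<li class=\"thing\"></li>"
            + "</ul></div>";

    /** Returns how many nodes in MARKUP the expression selects. */
    static int countMatches(String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            NodeList nodes = (NodeList) xpath.evaluate(
                    expression, new InputSource(new StringReader(MARKUP)), XPathConstants.NODESET);
            return nodes.getLength();
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        String padded = "concat(' ', normalize-space(@class), ' ')";
        // Exact attribute match misses "thing selected".
        System.out.println(countMatches("//li[@class='selected']"));
        // Plain substring match is too loose: "select" is not a class on any li.
        System.out.println(countMatches("//li[contains(@class, 'select')]"));
        // Space-padded match only accepts whole class names.
        System.out.println(countMatches("//li[contains(" + padded + ", ' select ')]"));
        System.out.println(countMatches("//li[contains(" + padded + ", ' selected ')]"));
    }
}
```

Padding both the attribute value and the target with spaces turns a substring test into a whole-word test, which is what class matching needs.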
It’d be nice if we could safely construct XPath even if we tried to use a CSS class that had quotes in it. A class like crazy"'class won’t match any HTML nodes, but that’s better than throwing an exception because our XPath statement didn’t parse! We can use another XPathUtils method, hasCssClass, to automatically generate the XPath boilerplate:
// contains(concat(' ', normalize-space(@class), ' '), concat(' crazy"', "'", 'class '))
String classXpath = XPathUtils.hasCssClass("crazy\"'class");
Of course, it also works fine on simple cases:
// contains(concat(' ', normalize-space(@class), ' '), ' selected ')
String classXpath = XPathUtils.hasCssClass("selected");
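As a rough check of the "didn’t parse" failure mode, this sketch compiles two versions of the crazy"'class expression with the JDK's javax.xml.xpath API: one built by naively wrapping the class name in single quotes, and the concat() form shown in the comment above. XPathParseCheck and parses are hypothetical names, and the sketch assumes the XPath implementation reports syntax errors at compile time, as the JDK's default one does.

```java
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;

// Hypothetical demo class, not part of any library.
final class XPathParseCheck {
    static final String PREFIX = "contains(concat(' ', normalize-space(@class), ' '), ";

    // Naive: wrap the raw class name in single quotes and hope for the best.
    static final String NAIVE = PREFIX + "' crazy\"'class ')";

    // Safe: split on ' and stitch the pieces back together with concat().
    static final String SAFE = PREFIX + "concat(' crazy\"', \"'\", 'class '))";

    /** Returns true if the expression compiles as XPath. */
    static boolean parses(String expression) {
        try {
            XPathFactory.newInstance().newXPath().compile(expression);
            return true;
        } catch (XPathExpressionException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("naive parses: " + parses(NAIVE));
        System.out.println("safe parses:  " + parses(SAFE));
    }
}
```

In the naive version the embedded ' ends the literal early, leaving stray tokens and an unterminated quote behind it, so the expression is rejected before it ever reaches a document.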
If you’re a Maven user, this is the dependency statement for the library.
<dependency>
  <groupId>com.palominolabs.xpath</groupId>
  <artifactId>xpath-utils</artifactId>
  <version>1.0.1</version>
</dependency>
Since this was just released, it may take a day or two for this to propagate to your Maven mirror.