
Examples of XPath queries against HTML pages


Anyone who has done hands-on test automation with Selenium knows the pain of a test failing because of an incorrect element locator, often because it is not obvious how to get a correct XPath or CSS path. Firebug solves this problem, and it will also speed up your automation work. This article describes how to use Firebug to get the XPath and CSS paths of any element on a web page. An XPath expression is essentially a path through the HTML tags that identifies a particular element of a web page.

Writing queries against a web page

Here is a small lab in which I demonstrate how to build XPath queries for a web page. You can repeat my queries and, most importantly, try writing your own. I hope this makes the article equally interesting for beginners and for programmers who already know XPath from working with XML.

For the lab we need:
- an XHTML web page,
- the Mozilla Firefox browser with the Firebug and FirePath add-ons (any other browser with visual XPath support will also do),
- a little time.

As the web page for the experiment, I propose the main page of the World Wide Web Consortium website (''). This is the organization that develops the XQuery and XPath languages, the XHTML specification, and many other Internet standards.

Our task: extract information about the consortium's conferences from the XHTML code of the main page using XPath queries.
Let's start writing XPath queries.

First XPath query

Open the FirePath tab in Firebug, pick the element to analyze with the element selector and click it: FirePath generates an XPath query for the selected element.

If you select the header of the first event, the query looks like this:

After removing the redundant indexes, the query matches all elements of the "header" kind, that is, the headers of every event.

FirePath highlights the elements that match the query, so you can see in real time which nodes of the document it selects.
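The same kind of query can also be run outside the browser. Below is a minimal sketch using Python's standard xml.etree.ElementTree module, which implements only a subset of XPath. The markup is a simplified, hypothetical stand-in for the event list, and the id "events" is made up, since the page's real id is not reproduced here.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the page's event list (no XHTML
# namespace); the id "events" and all contents are made up.
PAGE = """<html><body>
  <div id="events">
    <ul>
      <li><div>
        <p><a href="/conf1">Conference One</a></p>
        <p>Venue: Boston</p>
        <p>Sponsor: ACME</p>
      </div></li>
      <li><div>
        <p><a href="/conf2">Conference Two</a></p>
        <p>Venue: Sophia Antipolis</p>
        <p>Sponsor: Example Corp</p>
      </div></li>
    </ul>
  </div>
</body></html>"""

root = ET.fromstring(PAGE)

# The "header" query with redundant indexes removed: one <a> per event,
# because only the first paragraph of each event contains a link.
headers = root.findall(".//*[@id='events']/ul/li/div/p/a")
titles = [a.text for a in headers]
print(titles)  # ['Conference One', 'Conference Two']
```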

Let's move on. Queries for the conference venues and their sponsors can be built either with the selector or by modifying the first query.

Query for the conference venues:
.//*[@]/ul/li/div/p

And this query returns the list of sponsors:
.//*[@]/ul/li/div/p

XPath syntax

Let's go back to the queries we created and see how they work.
Let's look at the first query in detail.

In this query I have highlighted three parts to demonstrate what XPath can do. (The division into parts is arbitrary.)

First part
.// - recursive descent through zero or more levels of the hierarchy, starting from the current context. In our case, the current context is the root of the document.
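To see what the recursive descent buys us, compare it with a plain child step. A small sketch with Python's xml.etree.ElementTree on hypothetical markup; note that ElementTree evaluates paths relative to an element (here the <html> root element) rather than to the document node.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the target <div> sits two levels below the root.
PAGE = """<html><body>
  <div id="events"><ul><li>Event</li></ul></div>
</body></html>"""

root = ET.fromstring(PAGE)  # the context node is the <html> root element

# "./div" looks only at direct children of the context node: <html> has none.
direct = root.findall("./div")

# ".//div" descends through any number of levels and finds the nested <div>.
anywhere = root.findall(".//div")

print(len(direct), len(anywhere))  # 0 1
```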

Second part
* - any element
[@] - a predicate that keeps only the element with the given id attribute. Element identifiers in XHTML must be unique, so the query "any element with a specific id" should return exactly the one node we are looking for.

In this query we can replace * with the exact node name, div:
div[@]

This way we descend the document tree straight to the desired node div[@], without caring which nodes make up the DOM tree or how many hierarchy levels lie above it.
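A quick check of the claim that * and div return the same node when the id is unique. Again a sketch with Python's xml.etree.ElementTree; the ids "header" and "events" and the markup are hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the ids "header" and "events" are made up.
PAGE = """<html><body>
  <div id="header"><p>Logo</p></div>
  <div><div id="events"><ul><li>Event</li></ul></div></div>
</body></html>"""

root = ET.fromstring(PAGE)

# "Any element with id='events'" versus "a div with id='events'".
by_any = root.findall(".//*[@id='events']")
by_div = root.findall(".//div[@id='events']")

# Ids are unique, so both queries return the same single node,
# no matter how deeply it is nested.
print(len(by_any), by_any[0] is by_div[0], by_any[0].tag)  # 1 True div
```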

Third part
/ul/li/div/p/a - the XPath path to a specific element. The path consists of location steps, each with a node test (ul, li, etc.). Steps are separated by "/" (a slash).
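Each location step can be evaluated on its own to watch how the selected node set changes. A sketch with Python's xml.etree.ElementTree on hypothetical markup, taking the events container itself as the context node.

```python
import xml.etree.ElementTree as ET

# Hypothetical event list: two <li> items, each with three paragraphs.
PAGE = """<div id="events"><ul>
  <li><div><p><a href="/c1">Conf 1</a></p><p>Boston</p><p>ACME</p></div></li>
  <li><div><p><a href="/c2">Conf 2</a></p><p>Nice</p><p>Example Corp</p></div></li>
</ul></div>"""

events = ET.fromstring(PAGE)  # context node: the <div id="events"> element

# Add one location step at a time and count the nodes each path selects.
steps = ["ul", "li", "div", "p", "a"]
counts = {}
for i in range(1, len(steps) + 1):
    path = "/".join(steps[:i])
    counts[path] = len(events.findall(path))

for path, n in counts.items():
    print(f"{path:<14} -> {n} node(s)")
```

The count fans out at p (three paragraphs per event) and narrows again at a, since only the first paragraph of each event holds a link.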

XPath collections

A node of interest cannot always be reached with a predicate or location steps alone. Very often there are several nodes of the same type at the same level of the hierarchy, and you need to select "only the first" or "only the second" of them. Collections exist for such cases.

XPath collections let you access an element by its index. Indexes follow the order in which the elements appear in the source document, and numbering starts at one.

Since the "venue" is always the second paragraph, following the "conference name", we get this query:
.//*[@]/ul/li/div/p[2]
where p[2] is the second p element for each node in the list /ul/li/div.

Similarly, we can get the list of sponsors with the query:
.//*[@]/ul/li/div/p
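Positional indexes are supported by Python's xml.etree.ElementTree, so collection queries can be tried directly. In this hypothetical markup the venue is the second paragraph and the sponsor is assumed to be the third; the id "events" is made up and the real page's layout may differ.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: name, venue, sponsor as paragraphs 1, 2 and 3.
PAGE = """<html><body><div id="events"><ul>
  <li><div><p><a href="/c1">Conf 1</a></p><p>Boston</p><p>ACME</p></div></li>
  <li><div><p><a href="/c2">Conf 2</a></p><p>Nice</p><p>Example Corp</p></div></li>
</ul></div></body></html>"""

root = ET.fromstring(PAGE)
base = ".//*[@id='events']/ul/li/div/"

# Indexes count from 1 and are applied to the <p> children of each <div>.
venues = [p.text for p in root.findall(base + "p[2]")]
sponsors = [p.text for p in root.findall(base + "p[3]")]
print(venues)    # ['Boston', 'Nice']
print(sponsors)  # ['ACME', 'Example Corp']
```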

Some XPath functions

XPath has many functions for working with elements within a collection. I will cover only a few of them.

last():
Returns the size of the current collection, so using it as an index selects the last item.
The query ul/li/div/p[last()] returns the last paragraph in each div of the ul list.
There is no first() function. To access the first item, use the index 1.
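A sketch of last() and index 1 with Python's xml.etree.ElementTree, which supports [last()] as a positional predicate. The markup is hypothetical and deliberately gives the divs different numbers of paragraphs.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the two <div>s hold different numbers of paragraphs.
PAGE = """<body><ul>
  <li><div><p>Conf 1</p><p>Boston</p><p>ACME</p></div></li>
  <li><div><p>Conf 2</p><p>Nice</p></div></li>
</ul></body>"""

body = ET.fromstring(PAGE)

# p[last()] picks the final <p> in each <div>, however many there are.
last_paras = [p.text for p in body.findall("ul/li/div/p[last()]")]
# There is no first(); the first item is simply p[1].
first_paras = [p.text for p in body.findall("ul/li/div/p[1]")]
print(last_paras)   # ['ACME', 'Nice']
print(first_paras)  # ['Conf 1', 'Conf 2']
```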

text():
Selects the text content (the text nodes) of an element.
.//a[text()='Archive'] - returns all links with the text "Archive".
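ElementTree does not implement text(). Its closest equivalent is the [.='...'] predicate (Python 3.7 and later), which compares the element's full text content including descendants, so it behaves like text() only for elements that contain plain text, as the links in this hypothetical markup do.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup with two "Archive" links and one other link.
PAGE = """<html><body>
  <div><a href="/news">News</a> <a href="/archive">Archive</a></div>
  <p>See the <a href="/talks/archive">Archive</a> of talks.</p>
</body></html>"""

root = ET.fromstring(PAGE)

# Rough ElementTree equivalent of .//a[text()='Archive'].
archive_links = root.findall(".//a[.='Archive']")
print([a.get("href") for a in archive_links])  # ['/archive', '/talks/archive']
```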

position() and mod:
position() - returns the position of the element in the set.
mod - the remainder of division.

By combining them we can get:
- odd elements: ul/li[position() mod 2 = 1]
- even elements: ul/li[position() mod 2 = 0]
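Python's ElementTree has no position() or mod, so this sketch reproduces the odd/even selection in plain Python over the 1-based positions. A full XPath 1.0 engine, such as the xpath() method of the third-party lxml library, accepts the expressions verbatim. The list is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical five-item list.
ul = ET.fromstring("<ul><li>1</li><li>2</li><li>3</li><li>4</li><li>5</li></ul>")
items = ul.findall("li")  # document order; XPath positions are 1..5

# Mirrors li[position() mod 2 = 1] and li[position() mod 2 = 0]
# by filtering on the same 1-based position.
odd = [li.text for pos, li in enumerate(items, start=1) if pos % 2 == 1]
even = [li.text for pos, li in enumerate(items, start=1) if pos % 2 == 0]
print(odd, even)  # ['1', '3', '5'] ['2', '4']
```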

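Comparison operations

- ">" - logical "greater than"
- "<" - logical "less than"
- ">=" - logical "greater than or equal to"
- "<=" - logical "less than or equal to"

ul/li[position() > 2] selects the list items from the 3rd one onward, and ul/li[position() <= 2] does the opposite, selecting the first two.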
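As with mod, ElementTree cannot evaluate position() comparisons, so this sketch shows their effect in plain Python on a hypothetical four-item list; with lxml's xpath() the XPath expressions themselves could be used instead.

```python
import xml.etree.ElementTree as ET

# Hypothetical four-item list.
ul = ET.fromstring("<ul><li>a</li><li>b</li><li>c</li><li>d</li></ul>")
items = ul.findall("li")

# Equivalents of li[position() > 2] and li[position() <= 2]: both use
# 1-based positions, so together they split the list after item 2.
from_third = [li.text for pos, li in enumerate(items, start=1) if pos > 2]
first_two = [li.text for pos, li in enumerate(items, start=1) if pos <= 2]
print(from_third, first_two)  # ['c', 'd'] ['a', 'b']
```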


This simple example showed what XPath can do when accessing the nodes of a web page.
XPath is the industry standard for addressing elements of XML and XHTML documents, and it is also used in XSLT transformations.
You can use it to parse any HTML page. If the source HTML contains significant markup errors, run it through Tidy first to clean them up.

When parsing web pages, try to give up regular expressions in favor of XPath.
This will make your code simpler and easier to understand: you will make fewer mistakes and spend less time debugging.