XML XPath Basics

Learn the concepts, syntax, and applications of XPath, and understand how to use XPath to locate and select nodes in an XML document so that XML data can be queried and extracted efficiently.

Introduction to XPath

XPath (XML Path Language) is a language for locating and selecting nodes in an XML document. It provides a concise, efficient way to query and extract XML data. XPath was originally designed for use with XSLT, but it has since become the standard query language for many XML processing tools and APIs.

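What XPath Is Used For

  • Locating specific nodes in an XML document
  • Selecting node sets that satisfy specific conditions
  • Extracting node content and attributes
  • Filtering nodes, and sorting them with help from the host language (for example xsl:sort in XSLT)
  • Template matching and data transformation in XSLT
  • Parsing and processing XML in programming languages such as JavaScript and Python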

XPath Versions

  • XPath 1.0: released in 1999; currently the most widely supported version
  • XPath 2.0: released in 2007; extends XPath 1.0 with support for sequences, regular expressions, and more
  • XPath 3.0: released in 2014; further extends XPath 2.0
  • XPath 3.1: released in 2017; adds support for JSON data

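XPath Features

  • Based on path expressions, with a concise and readable syntax
  • Supports several node types and ways of selecting them
  • Provides a rich function library for string, numeric, and boolean operations
  • Supports axes, which allow flexible navigation between nodes
  • Supports predicates, which filter nodes by condition
  • Expressions can be nested to build complex queries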

XPath Node Types

In XPath, an XML document is treated as a tree of nodes, and every node has a specific type. XPath defines seven node types:

Node type | Description | Example
Document node | The root of the whole XML document (called the root node in XPath 1.0) | /
Element node | An XML element | <book>, <title>
Attribute node | An attribute of an element | id="B001", name="title"
Text node | The character data inside an element | Text content
Comment node | An XML comment | <!-- This is a comment -->
Processing instruction node | An XML processing instruction (the XML declaration <?xml version="1.0"?> is not one) | <?xml-stylesheet type="text/xsl" href="style.xsl"?>
Namespace node | A namespace in scope for an element | xmlns="http://example.com"
XML node tree example
<?xml version="1.0" encoding="UTF-8"?>
<!-- This is a book list -->
<books>
    <book id="B001">
        <title>XML Basics Tutorial</title>
        <author>张三</author>
        <year>2025</year>
        <price>99.00</price>
    </book>
    <book id="B002">
        <title>XML Advanced Applications</title>
        <author>李四</author>
        <year>2025</year>
        <price>129.00</price>
    </book>
</books>
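
As a rough illustration of the node tree, the following Python sketch walks a copy of the sample data with the standard-library xml.etree.ElementTree module and lists each element node with its attributes and text. The helper name describe is hypothetical, and ElementTree is only one possible parser. It exposes attributes and text as properties of elements rather than as separate node objects, so this shows the element structure, not all seven XPath node types.

```python
import xml.etree.ElementTree as ET

# Copy of the sample data; the XML declaration and comment are left out.
BOOKS_XML = """
<books>
    <book id="B001">
        <title>XML Basics Tutorial</title>
        <author>张三</author>
        <year>2025</year>
        <price>99.00</price>
    </book>
    <book id="B002">
        <title>XML Advanced Applications</title>
        <author>李四</author>
        <year>2025</year>
        <price>129.00</price>
    </book>
</books>
"""


def describe(element, depth=0):
    """Return one indented line per element: tag, attributes, own text."""
    text = (element.text or "").strip()
    line = f"{'  ' * depth}{element.tag} attrs={element.attrib} text={text!r}"
    lines = [line]
    for child in element:  # iterating an Element yields its child elements
        lines.extend(describe(child, depth + 1))
    return lines


root = ET.fromstring(BOOKS_XML)  # root is the <books> element
tree_lines = describe(root)
print("\n".join(tree_lines))
```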

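XPath Expressions

An XPath expression selects a node or a node set from an XML document. An expression can be a simple path or a complex conditional expression.

Basic Syntax

The basic syntax of XPath expressions resembles file system paths:

Expression | Description
/ | Starts at the root node
// | Selects matching nodes anywhere in the document, regardless of their position
. | Selects the current node
.. | Selects the parent of the current node
@ | Selects attributes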

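Path Expression Examples

XPath path expression examples
<!-- Using the books.xml example above -->

<!-- Select the book elements that are children of the root books element -->
/books/book

<!-- Select all book elements anywhere in the document -->
//book

<!-- Select all title elements -->
//title

<!-- Select the id attribute of every book element -->
//book/@id

<!-- Select the book element whose id attribute is "B001" -->
//book[@id="B001"]

<!-- Select the title child of the book element whose id is "B001" -->
//book[@id="B001"]/title

<!-- Select the parent of the book elements (the books element) -->
//book/..

<!-- Select the book children of the current node -->
./book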
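
A few of these paths can be tried from Python as a minimal sketch, assuming the standard-library xml.etree.ElementTree module and an inline copy of the sample data. ElementTree implements only a subset of XPath: paths evaluated on an element must be relative (so //book is written .//book), and attributes cannot be selected directly, so //book/@id is approximated by reading the attribute from each matched element.

```python
import xml.etree.ElementTree as ET

# Inline copy of the sample data (assumed to match books.xml).
BOOKS_XML = """<books>
<book id="B001"><title>XML Basics Tutorial</title><author>张三</author><year>2025</year><price>99.00</price></book>
<book id="B002"><title>XML Advanced Applications</title><author>李四</author><year>2025</year><price>129.00</price></book>
</books>"""

root = ET.fromstring(BOOKS_XML)  # the context node is the <books> element

# ./book : book children of the context node
books = root.findall("./book")

# //title : every title element below the context node
titles = [t.text for t in root.findall(".//title")]

# //book/@id : read the attribute from each matched element
ids = [b.get("id") for b in root.findall(".//book")]

# //book[@id="B001"]/title
first_title = root.find(".//book[@id='B001']/title").text

print(len(books), titles, ids, first_title)
```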

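Predicates

Predicates filter nodes and are written inside square brackets []. A predicate can contain a variety of conditional expressions that select only the nodes meeting specific conditions.

XPath predicate examples
<!-- Select the first book element (the first book child of its parent) -->
//book[1]

<!-- Select the last book element -->
//book[last()]

<!-- Select the first two book elements -->
//book[position() < 3]

<!-- Select the book element whose id attribute is "B001" -->
//book[@id="B001"]

<!-- Select the book elements that have an author child -->
//book[author]

<!-- Select the book elements whose author is "张三" -->
//book[author="张三"]

<!-- Select the book elements whose price is greater than 100 -->
//book[price > 100]

<!-- Select the book elements whose id attribute starts with "B" -->
//book[starts-with(@id, "B")]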
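
The predicates above can be approximated in Python as follows. This is a sketch under the assumption that the standard-library xml.etree.ElementTree module is used: it understands positional, child-existence, child-text, and attribute-equality predicates, but not comparisons such as price > 100 or functions such as starts-with(), so those two are filtered in plain Python instead.

```python
import xml.etree.ElementTree as ET

# Inline copy of the sample data (assumed to match books.xml).
BOOKS_XML = """<books>
<book id="B001"><title>XML Basics Tutorial</title><author>张三</author><year>2025</year><price>99.00</price></book>
<book id="B002"><title>XML Advanced Applications</title><author>李四</author><year>2025</year><price>129.00</price></book>
</books>"""

root = ET.fromstring(BOOKS_XML)

first = root.find(".//book[1]")                # //book[1]
last = root.find(".//book[last()]")            # //book[last()]
with_author = root.findall(".//book[author]")  # //book[author]
by_zhang = root.findall(".//book[author='张三']")  # //book[author="张三"]

# //book[price > 100] : numeric comparison done in Python
expensive = [b for b in root.findall(".//book")
             if float(b.findtext("price")) > 100]

# //book[starts-with(@id, "B")] : string test done in Python
b_ids = [b for b in root.findall(".//book")
         if b.get("id", "").startswith("B")]

print(first.get("id"), last.get("id"), [b.get("id") for b in expensive])
```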

XPath Axes

An XPath axis defines the relationship between the selected nodes and the current node, and is used to navigate the XML document. Axes can be combined with node tests and predicates to build complex XPath expressions.

Common Axes

Axis | Description | Example
ancestor | All ancestors of the current node (parent, grandparent, and so on) | //title/ancestor::book
ancestor-or-self | The current node and all of its ancestors | //title/ancestor-or-self::*[name()="title"]
child | All children of the current node | //book/child::title
descendant | All descendants of the current node (children, grandchildren, and so on) | //books/descendant::title
descendant-or-self | The current node and all of its descendants | //books/descendant-or-self::*
following | All nodes after the current node in document order, excluding its descendants | //book[1]/following::book
following-sibling | All siblings after the current node | //book[1]/following-sibling::book
parent | The parent of the current node | //title/parent::book
preceding | All nodes before the current node in document order, excluding its ancestors | //book[last()]/preceding::book
preceding-sibling | All siblings before the current node | //book[2]/preceding-sibling::book
self | The current node itself | //book[1]/self::book

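Axis Examples

XPath axis examples
<!-- Select all ancestor elements of the book elements -->
//book/ancestor::*

<!-- Select the parent element of every title element -->
//title/parent::*

<!-- Select all sibling elements after the first book element -->
//book[1]/following-sibling::*

<!-- Select all sibling elements before the last book element -->
//book[last()]/preceding-sibling::*

<!-- Select all descendant elements of the books element -->
/books/descendant::*

<!-- Select all child elements of the book element whose id is "B001" -->
//book[@id="B001"]/child::*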
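
To make the axis relationships concrete, the sketch below emulates three axes in Python. It assumes the standard-library xml.etree.ElementTree module, which has no axis syntax of its own, so the helpers ancestors, following_siblings, and descendants are hypothetical stand-ins built on a child-to-parent map. They only illustrate what each axis means for element nodes and are not a replacement for a real XPath engine.

```python
import xml.etree.ElementTree as ET

# Inline copy of the sample data (assumed to match books.xml).
BOOKS_XML = """<books>
<book id="B001"><title>XML Basics Tutorial</title><author>张三</author><year>2025</year><price>99.00</price></book>
<book id="B002"><title>XML Advanced Applications</title><author>李四</author><year>2025</year><price>129.00</price></book>
</books>"""

root = ET.fromstring(BOOKS_XML)

# ElementTree elements do not know their parent, so build a lookup table.
parent_of = {child: parent for parent in root.iter() for child in parent}


def ancestors(node):
    """Emulate ancestor::* (nearest ancestor first)."""
    result = []
    while node in parent_of:
        node = parent_of[node]
        result.append(node)
    return result


def following_siblings(node):
    """Emulate following-sibling::*."""
    parent = parent_of.get(node)
    if parent is None:
        return []
    siblings = list(parent)
    return siblings[siblings.index(node) + 1:]


def descendants(node):
    """Emulate descendant::* (iter() yields the node itself first)."""
    return list(node.iter())[1:]


first_book = root.find("./book")
first_title = first_book.find("title")

ancestor_tags = [a.tag for a in ancestors(first_title)]
sibling_ids = [s.get("id") for s in following_siblings(first_book)]
print(ancestor_tags, sibling_ids, len(descendants(root)))
```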

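XPath Functions

XPath provides a rich function library for working with strings, numbers, boolean values, and nodes. These functions can be used inside XPath expressions to make queries more powerful.

Common Functions

1. Node functions

  • last(): returns the position of the last node in the context node set (that is, the context size)
  • position(): returns the position of the context node within the context node set
  • count(node-set): returns the number of nodes in the node set
  • id(object): returns the elements with the specified ID (the attribute must be declared as type ID, for example in a DTD)
  • local-name(node-set?): returns the local name of the node (without the namespace prefix)
  • name(node-set?): returns the qualified name of the node (including the namespace prefix)
  • namespace-uri(node-set?): returns the namespace URI of the node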

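2. String functions

  • string(object?): converts an object to a string
  • concat(string, string, ...): concatenates two or more strings
  • substring(string, start, length?): extracts a substring; positions start at 1
  • substring-before(string, string): returns the part of the first string that precedes the first occurrence of the second string
  • substring-after(string, string): returns the part of the first string that follows the first occurrence of the second string
  • normalize-space(string?): strips leading and trailing whitespace and collapses each run of whitespace into a single space
  • translate(string, string, string): replaces each character of the first string that occurs in the second string with the character at the same position in the third string
  • contains(string, string): checks whether the first string contains the second string
  • starts-with(string, string): checks whether the first string starts with the second string
  • ends-with(string, string): checks whether the first string ends with the second string (XPath 2.0+)
  • string-length(string?): returns the length of the string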
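
For readers more familiar with Python string handling, the sketch below mirrors a few of these functions. The helper names are hypothetical, and the code is an approximation of the XPath 1.0 behavior for ordinary inputs, not a conforming implementation (for example, xpath_substring only handles integer arguments).

```python
import re


def substring_before(s, sep):
    """substring-before(): text before the first occurrence of sep, else ''."""
    if not sep:
        return ""
    head, found, _ = s.partition(sep)
    return head if found else ""


def substring_after(s, sep):
    """substring-after(): text after the first occurrence of sep, else ''."""
    if not sep:
        return s
    _, found, tail = s.partition(sep)
    return tail if found else ""


def normalize_space(s):
    """normalize-space(): trim, then collapse whitespace runs to one space."""
    return " ".join(re.split(r"[ \t\r\n]+", s.strip(" \t\r\n")))


def xpath_translate(s, src, dst):
    """translate(): map src characters to dst; unmatched src chars are removed."""
    table = {}
    for i, ch in enumerate(src):
        if ord(ch) not in table:  # the first occurrence in src wins
            table[ord(ch)] = dst[i] if i < len(dst) else None
    return s.translate(table)


def xpath_substring(s, start, length=None):
    """substring() with 1-based integer positions."""
    begin = max(start, 1) - 1
    if length is None:
        return s[begin:]
    end = max(start + length, 1) - 1
    return s[begin:end]


print(substring_before("1999/04/01", "/"), xpath_substring("12345", 2, 3))
```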

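3. Numeric functions

  • number(object?): converts an object to a number
  • sum(node-set): returns the sum of the numeric values of all nodes in the node set
  • floor(number): returns the largest integer less than or equal to the given number
  • ceiling(number): returns the smallest integer greater than or equal to the given number
  • round(number): rounds the number to the nearest integer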
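
One detail worth knowing when porting expressions: in XPath 1.0, round() resolves ties toward positive infinity, whereas Python's built-in round() rounds ties to the nearest even integer. The sketch below shows the difference; xpath_round is a hypothetical helper written for this illustration.

```python
import math


def xpath_round(x):
    """Round like XPath 1.0 round(): ties go toward positive infinity."""
    if math.isnan(x) or math.isinf(x):
        return x
    lower = math.floor(x)
    result = float(lower + 1 if x - lower >= 0.5 else lower)
    # XPath returns negative zero for arguments in [-0.5, 0).
    return math.copysign(0.0, x) if result == 0 else result


print(xpath_round(2.5), round(2.5))    # XPath-style vs. Python built-in
print(xpath_round(-2.5), round(-2.5))
print(math.floor(99.5), math.ceil(99.01))  # floor() and ceiling() counterparts
```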

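4. Boolean functions

  • boolean(object): converts an object to a boolean value
  • not(boolean): returns the negation of a boolean value
  • true(): returns the boolean value true
  • false(): returns the boolean value false
  • lang(string): checks whether the language of the current node (its xml:lang) matches the specified language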

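Function Examples

XPath function examples
<!-- Select the last book element -->
//book[last()]

<!-- Select every book element except the first -->
//book[position() > 1]

<!-- Count the book elements -->
count(//book)

<!-- Select the book elements whose title contains "XML" -->
//book[contains(title, "XML")]

<!-- Select the book elements whose title starts with "XML" -->
//book[starts-with(title, "XML")]

<!-- Select the book elements whose author name is longer than 2 characters -->
//book[string-length(author) > 2]

<!-- Join the title and author of the first book into one string -->
concat(//book[1]/title, " by ", //book[1]/author)

<!-- Sum the prices of all book elements -->
sum(//book/price)

<!-- Select the book elements priced above the average -->
//book[price > sum(//book/price) div count(//book)]

<!-- Select the book elements whose id is "B001" or "B002" -->
//book[@id="B001" or @id="B002"]

<!-- Select the book elements whose id is not "B001" -->
//book[not(@id="B001")]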
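
The aggregate and string examples above can be reproduced in Python as a rough sketch. It assumes the standard-library xml.etree.ElementTree module, which does not evaluate XPath functions, so count(), sum(), contains(), and concat() are replaced by their plain Python counterparts over an inline copy of the sample data.

```python
import xml.etree.ElementTree as ET

# Inline copy of the sample data (assumed to match books.xml).
BOOKS_XML = """<books>
<book id="B001"><title>XML Basics Tutorial</title><author>张三</author><year>2025</year><price>99.00</price></book>
<book id="B002"><title>XML Advanced Applications</title><author>李四</author><year>2025</year><price>129.00</price></book>
</books>"""

root = ET.fromstring(BOOKS_XML)
books = root.findall(".//book")

book_count = len(books)                                   # count(//book)
total = sum(float(b.findtext("price")) for b in books)    # sum(//book/price)
average = total / book_count                              # sum(...) div count(...)

# //book[price > sum(//book/price) div count(//book)]
above_average = [b.get("id") for b in books
                 if float(b.findtext("price")) > average]

# //book[contains(title, "XML")]
with_xml = [b.get("id") for b in books if "XML" in b.findtext("title")]

# concat(//book[1]/title, " by ", //book[1]/author)
byline = books[0].findtext("title") + " by " + books[0].findtext("author")

print(book_count, total, average, above_average, with_xml, byline)
```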

Hands-On Case: Querying an XML Document with XPath

Case Description

Create an XML document, then use XPath expressions to query and extract its data.

Implementation Steps

  1. Create an XML file named library.xml
  2. Add book information to the XML file
  3. Write XPath expressions to query the book information
  4. Test the XPath expressions with the xmllint tool

Final Code

library.xml
<?xml version="1.0" encoding="UTF-8"?>
<library>
    <book id="B001" category="计算机">
        <title>XML Basics Tutorial</title>
        <author>张三</author>
        <year>2025</year>
        <price>99.00</price>
    </book>
    <book id="B002" category="计算机">
        <title>XML Advanced Applications</title>
        <author>李四</author>
        <year>2025</year>
        <price>129.00</price>
    </book>
    <book id="B003" category="文学">
        <title>小说集</title>
        <author>王五</author>
        <year>2024</year>
        <price>89.00</price>
    </book>
    <book id="B004" category="文学">
        <title>散文集</title>
        <author>赵六</author>
        <year>2023</year>
        <price>79.00</price>
    </book>
</library>

XPath Query Examples

Testing XPath expressions with xmllint
# Test XPath expressions with the xmllint tool

# Select all book elements
xmllint --xpath "//book" library.xml

# Select the title child of every book element
xmllint --xpath "//book/title" library.xml

# Select the id attribute of every book element
xmllint --xpath "//book/@id" library.xml

# Select the book element whose id attribute is "B001"
xmllint --xpath "//book[@id='B001']" library.xml

# Select the text content of the title child of the book whose id is "B001"
xmllint --xpath "//book[@id='B001']/title/text()" library.xml

# Select the book elements whose category attribute is "计算机"
xmllint --xpath "//book[@category='计算机']" library.xml

# Select the book elements whose price is greater than 100
xmllint --xpath "//book[price > 100]" library.xml

# Select the book elements whose author is "张三"
xmllint --xpath "//book[author='张三']" library.xml

# Count the book elements
xmllint --xpath "count(//book)" library.xml

# Sum the prices of all book elements
xmllint --xpath "sum(//book/price)" library.xml

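Interactive Exercises

Exercise 1: Writing XPath Expressions

For the library.xml document above, write XPath expressions for the following queries:
  1. Select all book elements whose category attribute is "文学"
  2. Select the book elements whose year child is 2025
  3. Select the book elements whose price child is less than 90
  4. Select the second book element
  5. Select the author child of every book element

The XPath expressions are as follows:

  1. //book[@category='文学']
  2. //book[year='2025']
  3. //book[price < 90]
  4. //book[2], or (//book)[2] (both select the same element here, because all book elements share one parent)
  5. //book/author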

Exercise 2: Analyzing XPath Results

For the library.xml document above, work out the result of each of the following XPath expressions:
  1. //book[last()]/title
  2. //book[position()=3]/author
  3. //book[contains(title, "集")]
  4. count(//book[@category='计算机'])
  5. sum(//book[price < 100]/price)

The results are as follows:

  1. Returns the title child of the last book element: <title>散文集</title>
  2. Returns the author child of the third book element: <author>王五</author>
  3. Returns the book elements whose title contains "集": the third and fourth book elements
  4. Returns the number of book elements whose category attribute is "计算机": 2
  5. Returns the sum of the prices of the book elements whose price is less than 100: 99.00 + 89.00 + 79.00 = 267.00
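
The Exercise 2 expressions can be checked mechanically with a short Python sketch. It assumes the standard-library xml.etree.ElementTree module and an inline copy of library.xml reduced to the fields the queries use. Because ElementTree does not evaluate XPath functions or numeric comparisons, those parts are computed in Python.

```python
import xml.etree.ElementTree as ET

# Inline copy of library.xml, reduced to the fields the exercise queries use.
LIBRARY_XML = """<library>
<book id="B001" category="计算机"><title>XML Basics Tutorial</title><author>张三</author><price>99.00</price></book>
<book id="B002" category="计算机"><title>XML Advanced Applications</title><author>李四</author><price>129.00</price></book>
<book id="B003" category="文学"><title>小说集</title><author>王五</author><price>89.00</price></book>
<book id="B004" category="文学"><title>散文集</title><author>赵六</author><price>79.00</price></book>
</library>"""

root = ET.fromstring(LIBRARY_XML)
books = root.findall(".//book")

# 1. //book[last()]/title
last_title = root.find(".//book[last()]/title").text

# 2. //book[position()=3]/author
third_author = root.find(".//book[3]/author").text

# 3. //book[contains(title, "集")]
collection_ids = [b.get("id") for b in books if "集" in b.findtext("title")]

# 4. count(//book[@category='计算机'])
computer_count = len(root.findall(".//book[@category='计算机']"))

# 5. sum(//book[price < 100]/price): add up every price below 100
cheap_total = sum(float(b.findtext("price")) for b in books
                  if float(b.findtext("price")) < 100)

print(last_title, third_author, collection_ids, computer_count, cheap_total)
```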