Python PyQuery Study Guide

Installation and Usage

pyquery: a jQuery-like library for Python

Introduction

pyquery lets you run jQuery-style queries on XML documents. Its API follows jQuery as closely as possible, and it uses lxml for fast XML and HTML manipulation.

Installation

pip install pyquery

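If PyPI is too slow, consider switching to a mirror hosted in China (a quick web search will turn up instructions).
Example: using the Tsinghua mirror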

sudo -H pip install pyquery -i https://pypi.tuna.tsinghua.edu.cn/simple/ --trusted-host pypi.tuna.tsinghua.edu.cn
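
One way to confirm that the install worked is to ask Python for the installed package version. This is a small sketch using the standard library's importlib.metadata (Python 3.8+); the variable name `installed` is just for illustration.

```python
from importlib.metadata import PackageNotFoundError, version

# Look up the installed distribution by its PyPI name.
try:
    installed = version("pyquery")
except PackageNotFoundError:
    installed = None

print(installed or "pyquery is not installed")
```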

QuickStart

# Define a sample HTML document
html = """
<html><head><title>The Dormouse's story</title></head>
<body>
<p class="title" name="dromouse"><b>The Dormouse's story</b></p>
<p class="story">Once upon a time there were three little sisters; and their names were
<a href="http://example.com/elsie" class="sister" id="link1"><!-- Elsie --></a>,
<a href="http://example.com/lacie" class="sister" id="link2">Lacie</a> and
<a href="http://example.com/tillie" class="sister" id="link3">Tillie</a>;
and they lived at the bottom of a well.</p>
<p class="story">...</p>
</body></html>
"""
>>> from pyquery import PyQuery as pq
>>> html_pq = pq(html)
>>> # Find elements by tag name, using p as an example
>>> p = html_pq("p")
>>> p
[<p.title>, <p.story>, <p.story>]
>>> # The result is still a PyQuery object, so queries can be chained
>>> # Find elements by CSS class
>>> p(".title")
[<p.title>]
>>> # Get the element's inner HTML
>>> p(".title").html()
"<b>The Dormouse's story</b>"
>>> # Get the element's text content
>>> p(".title").text()
"The Dormouse's story"
>>> # Find elements by id
>>> a = html_pq("a")
>>> a
[<a#link1.sister>, <a#link2.sister>, <a#link3.sister>]
>>> a("#link1").html()
'<!-- Elsie -->'
>>> # The conditions can be combined into one selector, with the same result
>>> html_pq("a#link1").html()
'<!-- Elsie -->'

Getting attributes

>>> from pyquery import PyQuery as pq
>>> html_pq = pq(html)
>>> # Find elements by attribute
>>> html_pq("p[name='dromouse']")
[<p.title>]
>>> # Read an attribute
>>> p_test = pq('<p id="hello" class="hello"></p>')('p')
>>> p_test.attr('id')
'hello'
>>> # Set an attribute
>>> p_test.attr("id", "change")
[<p#change.hello>]
>>> # Attributes can also be set in the following ways
>>> p_test.attr.id = "change"
>>> p_test.attr["id"] = "change_again"
>>> # To set the class attribute, use class_ rather than class, because class is a Python keyword
>>> p_test.attr.class_ = "change_class"
>>> p_test
[<p#change_again.change_class>]
>>> # Set several attributes at once
>>> p_test.attr(id='change_id', class_='change_class')
[<p#change_id.change_class>]

Manipulating CSS

>>> from pyquery import PyQuery as pq
>>> p = pq('<p id="hello" class="hello"></p>')('p')
>>> # Add a new CSS class
>>> p.addClass("newClass")
[<p#hello.hello.newClass>]
>>> # Toggle the given CSS class
>>> p.toggleClass("newClass")
[<p#hello.hello>]
>>> p.toggleClass("newClass")
[<p#hello.hello.newClass>]
>>> # Remove the given CSS class
>>> p.removeClass("newClass")
[<p#hello.hello>]
>>> # Set a CSS property
>>> p.css("font-size", "15px")
[<p#hello.hello>]
>>> p.css({"font-size": "15px"})  # equivalent
[<p#hello.hello>]
>>> # Read the element's inline style
>>> p.attr("style")
'font-size: 15px'
>>> # The same can be done in a more Pythonic way
>>> p.css.font_size = "15px"
>>> p.css['font-size'] = "15px"
>>> p.css(font_size="15px")
[<p#hello.hello>]
>>> p.css = {"font-size": "15px"}