Python Lab Project 9: Web Crawling and Automation

Artificial Intelligence
2025-07-21 19:12:19

Experiment 1: Scraping data from a web page.

Requirement: use the urllib library and the requests library, each in turn, to fetch the first 360 bytes of the http:// .sohu homepage.

```python
import urllib.request

import requests

url = 'http:// .sohu '

# Fetch the first 360 bytes of the Sohu homepage with urllib.
req = urllib.request.Request(url)
with urllib.request.urlopen(req) as res:
    data = res.read(360)  # read() with a size argument returns at most 360 bytes
print(data)

# Fetch the first 360 bytes of the Sohu homepage with requests.
res = requests.get(url)
data = res.content[:360]  # .content is the whole body as bytes; keep the first 360
print(data)
```
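A minimal offline sketch of the urllib approach, assuming no network access is available: it starts a throwaway local HTTP server and checks that `res.read(360)` returns exactly the first 360 bytes of the body. The 1000-byte page body, the handler class, and the helper name `first_bytes` are hypothetical and not part of the original lab. Note the difference from the requests version: `res.content` holds the entire body, so `res.content[:360]` slices only after the whole page has been downloaded.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page body: 1000 bytes, so reading only 360 of them is observable.
BODY = b"<html>" + b"x" * 987 + b"</html>"


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.send_header("Content-Type", "text/html; charset=utf-8")
        self.send_header("Content-Length", str(len(BODY)))
        self.end_headers()
        self.wfile.write(BODY)

    def log_message(self, *args):
        pass  # silence per-request logging


def first_bytes(url, n=360):
    """Return at most the first n bytes of the response body at url."""
    with urllib.request.urlopen(url, timeout=5) as res:
        return res.read(n)


# Port 0 lets the OS pick a free port; serve in a background thread.
server = HTTPServer(("127.0.0.1", 0), Handler)
thread = threading.Thread(target=server.serve_forever, daemon=True)
thread.start()
url = "http://127.0.0.1:{}/".format(server.server_address[1])

data = first_bytes(url)
print(len(data), data[:6])

server.shutdown()
server.server_close()
```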

Experiment 2: Testing the methods of the BeautifulSoup object.

Requirements:

1) Create a BeautifulSoup object.
2) Test the find_all() and find() methods for searching the document tree.

```python
from bs4 import BeautifulSoup
import requests

# Load the web page via an HTTP request
response = requests.get("http:// .sohu ")
# Create the BeautifulSoup object with Python's built-in HTML parser
soup = BeautifulSoup(response.text, "html.parser")
# find_all(): return every <a> tag in the document tree
print(soup.find_all("a"))
# find(): return only the first <a> tag (None if there is none)
print(soup.find("a"))
```
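To see how find_all() and find() differ without depending on a live page, here is a small sketch run against a hypothetical static HTML snippet (the markup, link texts, and class name `item` are made up for illustration). It also shows two keyword arguments that find_all() accepts: `class_` to filter by CSS class and `limit` to cap the number of results.

```python
from bs4 import BeautifulSoup

# A small hypothetical HTML document, so the example runs offline.
HTML = """
<html><body>
  <ul id="news">
    <li><a class="item" href="/n/1.htm">First notice</a></li>
    <li><a class="item" href="/n/2.htm">Second notice</a></li>
    <li><a href="/about.htm">About</a></li>
  </ul>
</body></html>
"""

soup = BeautifulSoup(HTML, "html.parser")

# find_all() returns every matching tag in a list-like ResultSet.
all_links = soup.find_all("a")
# class_ filters on the CSS class; limit stops after that many matches.
items = soup.find_all("a", class_="item")
first_two = soup.find_all("a", limit=2)

# find() returns only the first match, or None when nothing matches.
first_link = soup.find("a")
missing = soup.find("table")

print(len(all_links), len(items), len(first_two))
print(first_link.get_text(), first_link["href"])
print(missing)
```

In short, `find(name)` behaves like `find_all(name, limit=1)` except that it returns the tag itself (or None) instead of a list.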

Experiment 3: Scraping and analyzing web page data.

(1) Use the requests library to fetch the homepage at .hnnu.edu /main.htm.
(2) Write a program that retrieves the notices and announcements listed at .hnnu.edu /119/list.htm.

```python
import requests
from bs4 import BeautifulSoup

# (1) Fetch the homepage and list its links
url = ' .hnnu.edu /main.htm'
res = requests.get(url)
res.encoding = res.apparent_encoding  # avoid garbled Chinese text if the charset header is missing
soup = BeautifulSoup(res.text, 'html.parser')
print(soup.find_all('a'))
print(soup.find('a'))

# (2) Fetch pages 1 to 22 of the notice list
for i in range(1, 23):
    # Paged lists are assumed to be named list1.htm, list2.htm, ...
    url = ' .hnnu.edu /119/list{}.htm'.format(i)
    res = requests.get(url)
    res.encoding = res.apparent_encoding
    soup = BeautifulSoup(res.text, 'html.parser')
    print("-------------------------------------------------------")
    print(soup)
    # print(soup.find('a'))
```
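Requirement (2) asks for the notice information itself rather than the raw page. The following is a sketch of one way to extract it, under stated assumptions: the base URL `https://example.edu/119/`, the list-page naming scheme (list1.htm, list2.htm, and so on), and the `<ul class="news_list">` markup are hypothetical placeholders that would have to be checked against the real page, and the helper names `page_urls` and `parse_notices` are illustrative only. `urljoin` turns each relative link into an absolute URL.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup


def page_urls(base, pages):
    """Build list-page URLs, assuming pages are named list1.htm ... listN.htm."""
    return [urljoin(base, "list{}.htm".format(i)) for i in range(1, pages + 1)]


def parse_notices(html, page_url):
    """Return (title, absolute_url) pairs for the notice links on one list page."""
    soup = BeautifulSoup(html, "html.parser")
    # Hypothetical markup: notices are <a> tags inside <ul class="news_list">.
    container = soup.find("ul", class_="news_list")
    if container is None:
        return []
    notices = []
    for a in container.find_all("a", href=True):
        # Prefer the title attribute if present, else the visible link text.
        title = a.get("title") or a.get_text(strip=True)
        notices.append((title, urljoin(page_url, a["href"])))
    return notices


# Hypothetical sample page, so the sketch runs without network access.
SAMPLE = """
<ul class="news_list">
  <li><a href="/notice/1.htm" title="Notice A">Notice A</a></li>
  <li><a href="2.htm">Notice B</a></li>
  <li><span>not a link</span></li>
</ul>
"""

base = "https://example.edu/119/"  # hypothetical base URL
urls = page_urls(base, 3)
notices = parse_notices(SAMPLE, urls[0])
for title, link in notices:
    print(title, link)
```

In the lab loop, each fetched `res.text` would be passed to `parse_notices` together with the URL it came from, instead of printing the whole `soup`.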
