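Using bs4 to Scrape Images and Upload Them to UPYUN

We use bs4 (Beautiful Soup 4) to scrape the image resources from a web page and then store them in UPYUN cloud storage.

Disclaimer: all images used in this article were deleted once the experiment was finished.

Code

Let's look at the complete code first.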

#!/usr/bin/env python
# -*- coding: utf-8 -*-
import time

import requests
import upyun
from bs4 import BeautifulSoup

# Replace the three placeholders with your own UPYUN credentials.
up = upyun.UpYun('servername', 'username', 'password', timeout=30, endpoint=upyun.ED_AUTO)
# Callback address passed along with each asynchronous task.
notify_url = 'http://httpbin.org/post'


def requrl(url):
    """Request the page and return all of its <img> elements."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36",
        "Referer": "http://pic.hao123.com/static/pic/css/pic_xl_c.css?v=1482838677",
    }
    try:
        r = requests.get(url, headers=headers)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
    except requests.RequestException:
        # Return early: without a response there is nothing to parse.
        print("Request failed")
        return []
    soup = BeautifulSoup(r.text, "html.parser")
    return soup.find_all("img")


def fetch(images):
    """Submit one asynchronous fetch task to UPYUN for each image."""
    for img in images:
        url = img.get('src')
        if not url:
            continue  # skip <img> elements that have no src attribute
        filename = str(int(time.time())) + ".jpg!awen)"
        print(url)
        print(filename)
        fetch_tasks = [
            {
                'url': url,
                'random': False,
                'overwrite': True,
                'save_as': '/upyun/' + filename,
            }
        ]
        # Pause between submissions; this also keeps the
        # second-resolution timestamp file names distinct.
        time.sleep(5)
        print(up.put_tasks(fetch_tasks, notify_url, 'spiderman'))


images = requrl("http://pic.hao123.com/meinv?style=xl")
fetch(images)

Before starting, install the UPYUN library along with requests and Beautiful Soup (published on PyPI as beautifulsoup4):

pip3 install requests
pip3 install upyun
pip3 install beautifulsoup4
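
As a quick sanity check after installing, a sketch like the following can report which of the three modules still cannot be imported. Note that Beautiful Soup is imported as `bs4`. The helper name `missing_packages` is hypothetical and not part of any library.

```python
import importlib.util


def missing_packages(names):
    """Return the module names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Import names used by the script in this article.
    missing = missing_packages(["requests", "upyun", "bs4"])
    print("Missing:", ", ".join(missing) if missing else "none")
```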

The requrl function

def requrl(url):
    """Request the page and return all of its <img> elements."""
    headers = {
        "User-Agent": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_12_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.90 Safari/537.36",
        "Referer": "http://pic.hao123.com/static/pic/css/pic_xl_c.css?v=1482838677",
    }
    try:
        r = requests.get(url, headers=headers)
        r.raise_for_status()
        r.encoding = r.apparent_encoding
    except requests.RequestException:
        # Return early: without a response there is nothing to parse.
        print("Request failed")
        return []
    soup = BeautifulSoup(r.text, "html.parser")
    return soup.find_all("img")

This function takes a URL, requests it, and returns every img element found in the page.
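
The `src` values of the returned img elements are not always absolute URLs: pages often use relative or protocol-relative paths, and some elements have no `src` at all. The following is a minimal sketch of how those values could be normalized before they are handed to a remote fetch service. The helper name `collect_image_urls` and the `img.example.com` host are assumptions made for illustration; the function only relies on `.get("src")`, which both bs4 tags and plain dicts provide.

```python
from urllib.parse import urljoin, urlparse


def collect_image_urls(page_url, images):
    """Return absolute, de-duplicated http(s) URLs from img-like objects."""
    seen = set()
    urls = []
    for img in images:
        src = img.get("src")  # works for bs4 tags and plain dicts alike
        if not src:
            continue  # <img> without a usable src attribute
        absolute = urljoin(page_url, src.strip())
        if urlparse(absolute).scheme not in ("http", "https"):
            continue  # skip data: URIs and other non-HTTP sources
        if absolute not in seen:
            seen.add(absolute)
            urls.append(absolute)
    return urls


if __name__ == "__main__":
    page = "http://pic.hao123.com/meinv?style=xl"
    sample = [{"src": "//img.example.com/a.jpg"}, {"src": "/static/b.png"}, {}]
    for url in collect_image_urls(page, sample):
        print(url)
```

In the article's script, the list returned by `requrl` could be passed as `images`, with the page address as `page_url`.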

The fetch function

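def fetch(images):
    """Submit one asynchronous fetch task to UPYUN for each image."""
    for img in images:
        url = img.get('src')
        if not url:
            continue  # skip <img> elements that have no src attribute
        filename = str(int(time.time())) + ".jpg!awen)"
        print(url)
        print(filename)
        fetch_tasks = [
            {
                'url': url,
                'random': False,
                'overwrite': True,
                'save_as': '/upyun/' + filename,
            }
        ]
        # Pause between submissions; this also keeps the
        # second-resolution timestamp file names distinct.
        time.sleep(5)
        print(up.put_tasks(fetch_tasks, notify_url, 'spiderman'))


images = requrl("http://pic.hao123.com/meinv?style=xl")
fetch(images)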

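This function takes the return value of requrl and, for each image, calls UPYUN's asynchronous fetch API so that UPYUN pulls the file itself.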
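
The task dictionaries can also be built separately from the network call, which makes them easy to inspect before anything is submitted. The sketch below is one possible way to do that, not the article's method: it reuses the same four keys (`url`, `random`, `overwrite`, `save_as`) but derives each file name from a timestamp plus an index, so names stay unique without sleeping. The helper name `build_fetch_tasks`, the `stamp` parameter, and the fallback `.jpg` extension are assumptions for illustration.

```python
import posixpath
import time
from urllib.parse import urlparse


def build_fetch_tasks(urls, stamp, prefix="/upyun/"):
    """Build one fetch-task dict per URL with a unique save_as path."""
    tasks = []
    for index, url in enumerate(urls):
        # Reuse the extension from the URL path; fall back to .jpg (assumption).
        ext = posixpath.splitext(urlparse(url).path)[1] or ".jpg"
        tasks.append({
            "url": url,
            "random": False,
            "overwrite": True,
            "save_as": f"{prefix}{stamp}_{index}{ext}",
        })
    return tasks


if __name__ == "__main__":
    demo_urls = ["http://img.example.com/a.png?x=1", "http://img.example.com/b"]
    for task in build_fetch_tasks(demo_urls, int(time.time())):
        print(task["save_as"], "<-", task["url"])
```

Each resulting dict has the same shape as the one passed to `up.put_tasks(fetch_tasks, notify_url, 'spiderman')` above. Whether the service accepts many tasks in a single call, and how many, should be checked against the UPYUN documentation before batching.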

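Author: 阿文 (Awen)
Article link: https://www.awen.me/post/10537.html
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0. When reposting, please credit 阿文的博客 (Awen's blog).
This article was published on 2017-08-17, more than half a year ago (3087 days); please check whether its content is out of date.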