Cha 2 -编写你的第一个网络爬虫.ipynb (Chapter 2: Writing Your First Web Crawler) #2

Open
Github-Minghui opened this issue Jan 31, 2019 · 2 comments

@Github-Minghui

When I try to run the code below, the interpreter raises an error. I suspect it is because I am on an English-language system, which cannot encode Chinese characters. Any advice would be appreciated.
import requests
from bs4 import BeautifulSoup  # import BeautifulSoup from the bs4 library

link = "http://www.santostang.com/"
headers = {'User-Agent' : 'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.1.6) Gecko/20091201 Firefox/3.5.6'}
r = requests.get(link, headers=headers)

soup = BeautifulSoup(r.text, "html.parser")  # parse the returned HTML with BeautifulSoup
title = soup.find("h1", class_="post-title").a.text.strip()
print(title)

with open('title_test.txt', "a+") as f:
    f.write(title)
    f.close()

===================================================================
4.3 通过selenium 模拟浏览器抓取

UnicodeEncodeError Traceback (most recent call last)
in
11
12 with open('title_test.txt', "a+") as f:
---> 13 f.write(title)
14 f.close()

~\Anaconda3\lib\encodings\cp1252.py in encode(self, input, final)
17 class IncrementalEncoder(codecs.IncrementalEncoder):
18 def encode(self, input, final=False):
---> 19 return codecs.charmap_encode(input,self.errors,encoding_table)[0]
20
21 class IncrementalDecoder(codecs.IncrementalDecoder):

UnicodeEncodeError: 'charmap' codec can't encode characters in position 4-5: character maps to <undefined>
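
The position in the traceback lines up with the printed title: characters 4 and 5 are 通过, the first two Chinese characters in the string. The traceback shows open() falling back to cp1252, which has no mapping for them. The following is a minimal sketch that reproduces the failure without the network or a file; the helper name try_encode is hypothetical and the title is hard-coded from the output above.

```python
# Title exactly as printed by the script in this issue.
title = "4.3 通过selenium 模拟浏览器抓取"


def try_encode(text, encoding):
    """Return None if text encodes cleanly, else the UnicodeEncodeError raised."""
    try:
        text.encode(encoding)
        return None
    except UnicodeEncodeError as exc:
        return exc


# cp1252 is the codec named in the traceback (encodings\cp1252.py).
err = try_encode(title, "cp1252")
print(err)                     # the codec's error message
print(err.start, err.end - 1)  # first and last offending positions

# UTF-8 can represent every character in the title, so this returns None.
print(try_encode(title, "utf-8"))
```

If the same check returns None for the platform's default encoding on another machine, that would explain why the script runs there unchanged.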

@shimmer07

It runs without problems on my machine. Could you try scraping an English page and see whether that works?

@ffflora

ffflora commented Jun 26, 2019

Try this; it should fix the problem:

import codecs
with codecs.open('title_test.txt','a+','utf-8') as f:
    f.write(title)
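
In Python 3 the built-in open() also takes an encoding argument, so the same idea can be written without codecs. This is a minimal sketch, assuming Python 3: the title is hard-coded as a stand-in for the scraped value, and the file goes into a temporary directory only to keep the example self-contained.

```python
import os
import tempfile

# Stand-in for the value scraped in the original script.
title = "4.3 通过selenium 模拟浏览器抓取"

# Temporary location (an assumption for this sketch); the issue writes to
# 'title_test.txt' in the working directory.
path = os.path.join(tempfile.mkdtemp(), "title_test.txt")

# Naming the encoding explicitly bypasses the platform default
# (cp1252 in the traceback above).
with open(path, "a+", encoding="utf-8") as f:
    f.write(title)
# The with block closes the file, so no f.close() call is needed.

# Read the file back with the same encoding to confirm the round trip.
with open(path, encoding="utf-8") as f:
    saved = f.read()

print(saved == title)
```

Whichever variant is used, the file has to be read back with the same encoding it was written with.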
