Garbled text (mojibake) in Python

This post collects the garbled-text and broken Chinese output problems I have run into with Python. It will be updated over time.

HTTP responses

Take the weather forecast API as an example: http://www.weather.com.cn/data/sk/101010100.html

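Opened in a browser, the response is garbled by default. After setting the encoding in the browser (Menu - More - Text Encoding - Unicode), the Chinese text displays correctly.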

From this we can tell that the site's response displays Chinese correctly when it is decoded as Unicode (UTF-8).
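The effect seen in the browser can be reproduced offline. The sketch below is only an illustration and does not contact the site: it encodes a sample string as UTF-8 and then decodes the same bytes with two different codecs. The sample text and the choice of GBK as the "wrong" codec are assumptions made for the demo.

```python
# The same bytes, read with the matching codec and with a mismatched one.
text = '北京'
raw = text.encode('utf-8')          # bytes as a server might send them

decoded_ok = raw.decode('utf-8')    # decoded with the matching codec
# errors='replace' keeps the demo from raising if a byte pair is invalid in GBK
decoded_bad = raw.decode('gbk', errors='replace')

print(repr(raw))
print(decoded_ok)
print(decoded_bad)                  # garbled: the bytes were misread
```

The bytes never change; only the codec used to interpret them does, which is why switching the browser's text encoding is enough to fix the display.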

Python 3

The default encoding in Python 3 is UTF-8.

After setting response.encoding to 'unicode' (note that 'unicode' is not a codec name Python recognizes):

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import requests
import json

url = 'http://www.weather.com.cn/data/sk/101010100.html'
response = requests.get(url)
response.encoding = 'unicode'
print(response.json())
print(response.text)

Output:

{'weatherinfo': {'city': '北京', 'cityid': '101010100', 'temp': '27.9', 'WD': '南风', 'WS': '小于3级', 'SD': '28%', 'AP': '1002hPa', 'njd': '暂无实况', 'WSE': '<3', 'time': '17:55', 'sm': '2.1', 'isRadar': '1', 'Radar': 'JC_RADAR_AZ9010_JB'}}
{"weatherinfo":{"city":"北京","cityid":"101010100","temp":"27.9","WD":"南风","WS":"小于3级","SD":"28%","AP":"1002hPa","njd":"暂无实况","WSE":"<3","time":"17:55","sm":"2.1","isRadar":"1","Radar":"JC_RADAR_AZ9010_JB"}}

After setting response.encoding to 'utf-8':

#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import requests
import json

url = 'http://www.weather.com.cn/data/sk/101010100.html'
response = requests.get(url)
response.encoding = 'utf-8'
print(response.json())
print(response.text)

Output:

{'weatherinfo': {'city': '北京', 'cityid': '101010100', 'temp': '27.9', 'WD': '南风', 'WS': '小于3级', 'SD': '28%', 'AP': '1002hPa', 'njd': '暂无实况', 'WSE': '<3', 'time': '17:55', 'sm': '2.1', 'isRadar': '1', 'Radar': 'JC_RADAR_AZ9010_JB'}}
{"weatherinfo":{"city":"北京","cityid":"101010100","temp":"27.9","WD":"南风","WS":"小于3级","SD":"28%","AP":"1002hPa","njd":"暂无实况","WSE":"<3","time":"17:55","sm":"2.1","isRadar":"1","Radar":"JC_RADAR_AZ9010_JB"}}
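A likely reason why 'unicode' and 'utf-8' give the same result in Python 3: 'unicode' is not a registered codec name, and as far as I can tell from the requests source, response.text then retries the decode without an explicit encoding, which in Python 3 means UTF-8. The sketch below imitates that behaviour on plain bytes. decode_with_fallback and is_known_codec are hypothetical helpers written for this illustration, not part of requests, and the sample body is made up.

```python
import codecs


def decode_with_fallback(content, encoding):
    """Decode bytes, retrying with the default codec if the name is unknown.

    Hypothetical helper that imitates the fallback described above.
    """
    try:
        return str(content, encoding, errors='replace')
    except (LookupError, TypeError):
        # Unknown codec name (or None): let Python use its default, UTF-8
        return str(content, errors='replace')


def is_known_codec(name):
    """Return True if Python can find a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        return False


body = '{"city": "北京"}'.encode('utf-8')
print(is_known_codec('unicode'))
print(is_known_codec('utf-8'))
print(decode_with_fallback(body, 'unicode'))
print(decode_with_fallback(body, 'utf-8'))
```

If this reading is right, the 'unicode' setting only appears to work in Python 3 because the fallback happens to be UTF-8; setting 'utf-8' explicitly is the safer choice.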

Python 2

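In Python 2 the default encoding is ASCII. What I have found so far is that printing a string shows Chinese correctly, whereas printing a dict does not (the dict is shown through its repr, with \uXXXX escapes). So besides setting response.encoding, the data also has to be converted to a string before it is printed.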

After setting response.encoding to 'unicode':

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
import requests
import json

url = 'http://www.weather.com.cn/data/sk/101010100.html'
response = requests.get(url)
response.encoding = 'unicode'
print(response.json())
print(response.text)

Output:

{u'weatherinfo': {u'city': u'\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd', u'WD': u'\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd', u'WSE': u'<3', u'temp': u'27.9', u'njd': u'\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd', u'isRadar': u'1', u'cityid': u'101010100', u'AP': u'1002hPa', u'WS': u'\ufffd\ufffd\ufffd\ufffd\ufffd\ufffd3\ufffd\ufffd\ufffd', u'Radar': u'JC_RADAR_AZ9010_JB', u'sm': u'2.1', u'time': u'17:55', u'SD': u'28%'}}
{"weatherinfo":{"city":"������","cityid":"101010100","temp":"27.9","WD":"������","WS":"������3���","SD":"28%","AP":"1002hPa","njd":"������������","WSE":"<3","time":"17:55","sm":"2.1","isRadar":"1","Radar":"JC_RADAR_AZ9010_JB"}}

After changing the encoding to 'utf-8':

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
import requests
import json

url = 'http://www.weather.com.cn/data/sk/101010100.html'
response = requests.get(url)
response.encoding = 'utf-8'
print(response.json())
print(response.text)

Output:

{u'weatherinfo': {u'city': u'\u5317\u4eac', u'WD': u'\u5357\u98ce', u'WSE': u'<3', u'temp': u'27.9', u'njd': u'\u6682\u65e0\u5b9e\u51b5', u'isRadar': u'1', u'cityid': u'101010100', u'AP': u'1002hPa', u'WS': u'\u5c0f\u4e8e3\u7ea7', u'Radar': u'JC_RADAR_AZ9010_JB', u'sm': u'2.1', u'time': u'17:55', u'SD': u'28%'}}
{"weatherinfo":{"city":"北京","cityid":"101010100","temp":"27.9","WD":"南风","WS":"小于3级","SD":"28%","AP":"1002hPa","njd":"暂无实况","WSE":"<3","time":"17:55","sm":"2.1","isRadar":"1","Radar":"JC_RADAR_AZ9010_JB"}}

Processing the data with json.dumps (very useful in Python 2 when the encoding is unknown):

#!/usr/bin/env python2
# -*- coding: utf-8 -*-
import requests
import json

url = 'http://www.weather.com.cn/data/sk/101010100.html'
response = requests.get(url)
response.encoding = 'utf-8'
print(response.json())
print(response.text)
data = json.dumps(response.json(), ensure_ascii=False)
print(data)

Output:

{u'weatherinfo': {u'city': u'\u5317\u4eac', u'WD': u'\u5357\u98ce', u'WSE': u'<3', u'temp': u'27.9', u'njd': u'\u6682\u65e0\u5b9e\u51b5', u'isRadar': u'1', u'cityid': u'101010100', u'AP': u'1002hPa', u'WS': u'\u5c0f\u4e8e3\u7ea7', u'Radar': u'JC_RADAR_AZ9010_JB', u'sm': u'2.1', u'time': u'17:55', u'SD': u'28%'}}
{"weatherinfo":{"city":"北京","cityid":"101010100","temp":"27.9","WD":"南风","WS":"小于3级","SD":"28%","AP":"1002hPa","njd":"暂无实况","WSE":"<3","time":"17:55","sm":"2.1","isRadar":"1","Radar":"JC_RADAR_AZ9010_JB"}}
{"weatherinfo": {"city": "北京", "WD": "南风", "WSE": "<3", "temp": "27.9", "njd": "暂无实况", "isRadar": "1", "cityid": "101010100", "AP": "1002hPa", "WS": "小于3级", "Radar": "JC_RADAR_AZ9010_JB", "sm": "2.1", "time": "17:55", "SD": "28%"}}
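The readable third line comes from ensure_ascii=False. By default json.dumps escapes every non-ASCII character as \uXXXX; with ensure_ascii=False the characters are kept as they are. A minimal sketch in Python 3, using a trimmed-down sample of the data above (the two-key dict is an assumption for brevity):

```python
import json

data = {'city': '北京', 'WD': '南风'}

escaped = json.dumps(data)                       # default: ensure_ascii=True
readable = json.dumps(data, ensure_ascii=False)  # keep the Chinese characters

print(escaped)
print(readable)
```

Both strings decode back to the same dict with json.loads, so the choice only affects how the text looks, not the data it carries.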

Reading files

Python 3 uses Unicode by default, so reading and writing files rarely causes trouble. If something odd does happen, check the following in order:

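  • The encoding of the file being read
  • The encoding explicitly specified in the code
  • The current default encoding of the operating system/terminal (used when the code does not specify one)

import sys
print('default encoding: %s' % sys.getdefaultencoding())

myfile = 'a.txt'
with open(myfile) as f:
    data = f.read()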

print(data)

Output:

# file a.txt 
a.txt: UTF-8 Unicode text
# ./read_file.py
default encoding: utf-8
我简直太帅了。
I am so handsome.

For example, the same code fails on Windows with the following error:

UnicodeDecodeError: 'gbk' codec can't decode byte 0xab in position 33: illegal multibyte sequence
default encoding: utf-8
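The traceback mentions 'gbk' even though the script prints utf-8. As far as I know, this is because sys.getdefaultencoding() does not decide what open() uses: when no encoding is given, open() in text mode takes the locale's preferred encoding instead. A small check in the same style as the script above, which can be run on both systems to compare the two values:

```python
import locale
import sys

# Encoding for implicit str <-> bytes conversion; utf-8 on Python 3
default_encoding = sys.getdefaultencoding()
# Encoding open() falls back to in text mode when encoding= is omitted
preferred_encoding = locale.getpreferredencoding(False)

print('default encoding: %s' % default_encoding)
print('preferred encoding: %s' % preferred_encoding)
```

On a Chinese-locale Windows machine the second value would presumably be a GBK-family code page, which matches the codec named in the error.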

After adding the encoding='utf-8' argument to open(), the file is read correctly.

import sys
print('default encoding: %s' % sys.getdefaultencoding())

myfile = 'txt/name.txt'
with open(myfile, encoding='utf-8') as f:
    data = f.read()

print(data)

Output:

default encoding: utf-8
諸葛亮|關羽|劉備|曹操|孫權|關羽|張飛|呂布|周瑜|趙雲|龐統|司馬懿|黃忠|馬超
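When the encoding of a file is not known in advance, one option is to try a few candidates in turn. The sketch below is one possible approach under stated assumptions, not a definitive solution: read_text is a hypothetical helper, the candidate list ('utf-8', then 'gbk') is an assumption suited to Chinese text, and the demo files are created in a temporary directory.

```python
import os
import tempfile


def read_text(path, encodings=('utf-8', 'gbk')):
    """Try each candidate encoding in turn; return (text, encoding_used)."""
    for enc in encodings:
        try:
            with open(path, encoding=enc) as f:
                return f.read(), enc
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError('none of %r could decode %s' % (encodings, path))


# Demo: write the same text in two encodings, then read both files back
sample = '北京'
results = []
with tempfile.TemporaryDirectory() as tmp:
    for enc in ('utf-8', 'gbk'):
        path = os.path.join(tmp, 'sample_%s.txt' % enc)
        with open(path, 'w', encoding=enc) as f:
            f.write(sample)
        results.append(read_text(path))

print(results)
```

UTF-8 is tried first because it rejects most byte sequences that are not valid UTF-8, whereas GBK accepts many byte pairs and could silently return garbled text if it were tried first.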