$ ls ~yifei/notes/

Automatically Detecting Text Encoding with Chardet

The Python library chardet can guess the encoding of a file. There is also cchardet, a faster drop-in replacement.


pip install cchardet

In [1]: import cchardet as chardet

In [2]: chinese_bytes = "中文".encode("utf-8")

In [3]: chardet.detect(chinese_bytes)
Out[3]: {"confidence": 0.7524999976158142, "encoding": "UTF-8"}
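
In practice the detected encoding is usually used to decode the bytes right away. The following is a minimal sketch of that step, not part of either library: the helper names `decode_bytes` and `_default_detect`, the `detect` parameter, the `min_confidence` threshold of 0.5, and the UTF-8 fallback are all assumptions chosen for illustration. It relies only on `detect()` returning a dict with `"encoding"` and `"confidence"` keys, as in the session above.

```python
def _default_detect(raw):
    """Use cchardet if it is installed, otherwise fall back to chardet."""
    try:
        import cchardet as detector
    except ImportError:
        import chardet as detector
    return detector.detect(raw)


def decode_bytes(raw, detect=None, min_confidence=0.5, fallback="utf-8"):
    """Decode raw bytes using a guessed encoding.

    `detect` is a hypothetical hook: any callable that returns a dict with
    "encoding" and "confidence" keys, like chardet.detect does.
    """
    result = (detect or _default_detect)(raw)
    encoding = result.get("encoding")
    # Both values can be None (for example on empty input).
    confidence = result.get("confidence") or 0.0

    # Assumed policy: distrust low-confidence guesses and use the fallback.
    if not encoding or confidence < min_confidence:
        encoding = fallback

    try:
        return raw.decode(encoding, errors="replace")
    except LookupError:
        # The detector reported a name Python has no codec for.
        return raw.decode(fallback, errors="replace")
```

To decode a file, read it in binary mode and pass the bytes in, for example `decode_bytes(open(path, "rb").read())`. The guess is statistical, so short inputs such as the two-character example above tend to get lower confidence; the threshold is worth tuning for your own data.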

© 2016-2022 Yifei Kong. Powered by ynotes

All contents are under the CC-BY-NC-SA license, if not otherwise specified.

Opinions expressed here are solely my own and do not express the views or opinions of my employer.