Unicode
computer-text-internationalization standard
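Unicode and UTF are different terms: Unicode is the standard that assigns a numbered code point to every character, while the UTFs (UTF-8, UTF-16, UTF-32) are encodings that turn those code points into bytes.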
- Apparently UTF-8 was invented in 1992 in a diner around the corner from my old high school by Rob Pike and Ken Thompson, which isn't shocking since that's near Bell Labs in Murray Hill.
http://en.wikipedia.org/wiki/Unicode
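To make the Unicode vs. UTF distinction concrete, here is a minimal Python sketch. The sample string is just an illustration of mine; the point is that one sequence of code points becomes different byte sequences depending on which UTF is used.

```python
# One Unicode string: four code points, the last one is U+00E9 (e with acute).
text = u'caf\u00e9'
code_point_count = len(text)

# The same code points serialized with two different encodings.
utf8_bytes = text.encode('utf-8')       # U+00E9 takes two bytes in UTF-8
utf16_bytes = text.encode('utf-16-le')  # every code point here takes two bytes

print(code_point_count, len(utf8_bytes), len(utf16_bytes))

# Decoding with the matching encoding gives back the original code points.
round_trip = utf8_bytes.decode('utf-8')
```

The byte counts differ, but both byte strings decode back to the same text, which is why "Unicode" and "UTF-8" should not be used interchangeably.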
Python encoding/decoding
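- WikiGraph scraping uses the line below, which decomposes accented characters and then drops everything that is not ASCII (so it is lossy):
page_text = unicodedata.normalize('NFKD', page_text).encode('ascii', 'ignore')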
- https://docs.python.org/2/library/unicodedata.html#unicodedata.normalize
- but I think there's something nicer...
- Unicode Zen in Python 2.x - The Long Version
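As a rough sketch of what the normalize-then-encode line above does, the snippet below wraps it in a helper. The name to_ascii and the sample strings are hypothetical, chosen only for illustration. It shows where the approach works (accented Latin letters) and where it silently loses text (characters with no ASCII decomposition).

```python
import unicodedata

def to_ascii(text):
    """Hypothetical helper: same call chain as the scraping line above."""
    # NFKD splits e.g. U+00E9 into 'e' plus a combining accent (U+0301).
    decomposed = unicodedata.normalize('NFKD', text)
    # 'ignore' then drops every code point that has no ASCII representation,
    # including the combining accents left over from the decomposition.
    return decomposed.encode('ascii', 'ignore')

# Accents are stripped but the base letters survive.
accented = to_ascii(u'Caf\u00e9 M\u00fcnchen')

# U+00DF (sharp s) has no decomposition, so it is dropped entirely.
sharp_s = to_ascii(u'Stra\u00dfe')

# CJK characters have no ASCII form at all, so nothing is left.
cjk = to_ascii(u'\u6771\u4eac')

print(accented, sharp_s, cjk)
```

Note that the result is a byte string, not text. One possibly nicer alternative, and this is my assumption rather than something the linked article is known to say, is to keep the text as Unicode throughout and only encode to UTF-8 when writing output, which avoids the data loss.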