character sets? unicode?

Thread starter Michael
Start date Feb 3, 2005

I'm trying to import text from email I've received, run some regular
expressions on it, and save the text into a database, and I'm trying to
figure out how to handle character sets. My regular expressions have had
problems with email that uses non-ASCII character sets. Korean text, for
example, seems to be filled with sequences like '=3D=21'. This doesn't
look like Unicode (or am I wrong?), so does anyone know how I should
handle it?

Do I need to do anything special when passing text with non-ASCII
characters to re, MySQLdb, or any other libraries? Is it better to save
the text as-is in my db and record its character set alongside it, or
should I convert all text to one default encoding such as UTF-8? Any
advice? Thanks.
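
For illustration, here is a minimal Python 3 sketch of one way this could be approached. It assumes the '=3D=21' sequences are quoted-printable transfer encoding (where '=3D' stands for '=' and '=21' for '!') rather than a character set, and that the raw message is available as bytes. The helper names extract_text, find_words, and to_utf8 are hypothetical, and the \w+ pattern is only a placeholder for the real regular expressions.

```python
import re
from email import message_from_bytes
from email.policy import default


def extract_text(raw_bytes):
    """Return the plain-text body of a raw email as a Unicode str."""
    msg = message_from_bytes(raw_bytes, policy=default)
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return ""
    # get_content() undoes the transfer encoding (e.g. quoted-printable)
    # and decodes the bytes using the charset declared on the part.
    return part.get_content()


def find_words(text):
    # Placeholder pattern (an assumption, not the poster's regex).
    # On a str, \w matches Unicode word characters, so Korean text works.
    return re.findall(r"\w+", text)


def to_utf8(text):
    # Encode to one common format just before handing text to storage.
    return text.encode("utf-8")
```

The idea is to decode once at the boundary, run the regular expressions on the resulting str, and encode to UTF-8 only when writing out. The database connection and table would also need to be set up for UTF-8; how to do that depends on the driver and server version, so it is left out of the sketch.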


