<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: UTF-8 checklist</title>
	<atom:link href="http://soledadpenades.com/2007/12/11/utf-8-checklist/feed/" rel="self" type="application/rss+xml" />
	<link>http://soledadpenades.com/2007/12/11/utf-8-checklist/</link>
	<description>repeat 4[fd 100 rt 90]</description>
	<lastBuildDate>Mon, 30 Jan 2012 21:18:07 +0000</lastBuildDate>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
	<item>
		<title>By: winden</title>
		<link>http://soledadpenades.com/2007/12/11/utf-8-checklist/#comment-47020</link>
		<dc:creator>winden</dc:creator>
		<pubDate>Fri, 14 Dec 2007 00:42:12 +0000</pubDate>
		<guid isPermaLink="false">http://www.soledadpenades.com/2007/12/11/utf-8-checklist/#comment-47020</guid>
		<description>My app was a Japanese+English dictionary that stored its lines as UTF-8 in memory.

Searching a dictionary is nothing more complex than doing a lot of substring compares using strstr, and that&#039;s safe between two valid UTF-8 strings because of the binary encoding: lead bytes and continuation bytes never share values, so a match can only begin on a character boundary.

All other per-character work was done with wide chars, which is easy because every character has the same fixed width.

Bugfix for the 4th point above:

wchar_t *s = L&quot;whatever&quot;;</description>
		<content:encoded><![CDATA[<p>My app was a Japanese+English dictionary that stored its lines as UTF-8 in memory.</p>
<p>Searching a dictionary is nothing more complex than doing a lot of substring compares using strstr, and that&#8217;s safe between two valid UTF-8 strings because of the binary encoding: lead bytes and continuation bytes never share values, so a match can only begin on a character boundary.</p>
<p>All other per-character work was done with wide chars, which is easy because every character has the same fixed width.</p>
<p>Bugfix for the 4th point above:</p>
<p>wchar_t *s = L&quot;whatever&quot;;</p>
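<p>As an illustration of the strstr point, here is a minimal sketch in C. The helper names utf8_find and utf8_count are hypothetical, not taken from the dictionary app, and both assume their input is valid, NUL-terminated UTF-8. Plain strstr is enough because a continuation byte (10xxxxxx) can never be mistaken for a lead byte, so a valid needle can only match on a character boundary.</p>

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the byte offset of the first occurrence of
 * needle in haystack, or -1 if there is none. Both strings are assumed to
 * be valid, NUL-terminated UTF-8. */
long utf8_find(const char *haystack, const char *needle)
{
    assert(haystack != NULL && needle != NULL);
    const char *hit = strstr(haystack, needle);
    return hit ? (long)(hit - haystack) : -1L;
}

/* Hypothetical helper: counts the code points in the first nbytes bytes of
 * s by skipping continuation bytes, which always look like 10xxxxxx. */
size_t utf8_count(const char *s, size_t nbytes)
{
    assert(s != NULL);
    size_t count = 0;
    for (size_t i = 0; i < nbytes; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

<p>For example, searching the UTF-8 bytes of 日本語 for 本 gives byte offset 3, and utf8_count over those first 3 bytes turns that into character index 1.</p>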
]]></content:encoded>
	</item>
	<item>
		<title>By: sole</title>
		<link>http://soledadpenades.com/2007/12/11/utf-8-checklist/#comment-46989</link>
		<dc:creator>sole</dc:creator>
		<pubDate>Tue, 11 Dec 2007 23:16:09 +0000</pubDate>
		<guid isPermaLink="false">http://www.soledadpenades.com/2007/12/11/utf-8-checklist/#comment-46989</guid>
		<description>My understanding was that if you only use plain chars when dealing with UTF-8 characters, things can easily break, for example when uppercasing a string.
Although I haven&#039;t checked the wide character string functions (I don&#039;t use C for handling strings ;-)) ...</description>
		<content:encoded><![CDATA[<p>My understanding was that if you only use plain chars when dealing with UTF-8 characters, things can easily break, for example when uppercasing a string.<br />
Although I haven&#8217;t checked the wide character string functions (I don&#8217;t use C for handling strings ;-)) &#8230;</p>
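<p>A rough sketch of the uppercasing concern, in C. The function names upcase_bytes and upcase_wide are hypothetical. The byte-wise version only handles ASCII reliably: depending on the locale, it either leaves the bytes of a multibyte character untouched or, under a single-byte locale, may alter individual bytes of a sequence. The wide version works one character at a time, although which letters towupper actually converts still depends on the LC_CTYPE locale in effect.</p>

```c
#include <assert.h>
#include <ctype.h>
#include <wchar.h>
#include <wctype.h>

/* Hypothetical helper: uppercases byte by byte. Only reliable for ASCII,
 * because toupper sees each byte of a multibyte character in isolation. */
void upcase_bytes(char *s)
{
    assert(s != NULL);
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}

/* Hypothetical helper: uppercases a wide string one character at a time.
 * Coverage of non-ASCII letters depends on the current LC_CTYPE locale. */
void upcase_wide(wchar_t *s)
{
    assert(s != NULL);
    for (; *s != L'\0'; s++)
        *s = (wchar_t)towupper((wint_t)*s);
}
```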
]]></content:encoded>
	</item>
	<item>
		<title>By: winden</title>
		<link>http://soledadpenades.com/2007/12/11/utf-8-checklist/#comment-46979</link>
		<dc:creator>winden</dc:creator>
		<pubDate>Tue, 11 Dec 2007 10:06:36 +0000</pubDate>
		<guid isPermaLink="false">http://www.soledadpenades.com/2007/12/11/utf-8-checklist/#comment-46979</guid>
		<description>Important steps if you are coding in C using POSIX:

1. Add setlocale(LC_CTYPE,&quot;&quot;); at the start of your main function.

2. Use char where you are managing UTF-8, and internally pack/unpack to wchar_t, which on most POSIX systems is a flat 32-bit-wide character, when doing internal operations. CPU+cache is fast and memory is slow, so take advantage and keep your strings packed even while in memory.

3. A literal string with UTF-8 encoding:

char *s = &quot;whatever&quot;;

4. A literal string with UTF-32 encoding:

wchar_t *s = &quot;whatever&quot;;</description>
		<content:encoded><![CDATA[<p>Important steps if you are coding in C using POSIX:</p>
<p>1. Add setlocale(LC_CTYPE,&quot;&quot;); at the start of your main function.</p>
<p>2. Use char where you are managing UTF-8, and internally pack/unpack to wchar_t, which on most POSIX systems is a flat 32-bit-wide character, when doing internal operations. CPU+cache is fast and memory is slow, so take advantage and keep your strings packed even while in memory.</p>
<p>3. A literal string with UTF-8 encoding:</p>
<p>char *s = &quot;whatever&quot;;</p>
<p>4. A literal string with UTF-32 encoding:</p>
<p>wchar_t *s = &quot;whatever&quot;;</p>
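<p>The pack/unpack step in point 2 could be sketched roughly as follows, using the standard mbstowcs and wcstombs conversions. The function names unpack_wide and pack_multibyte are hypothetical. The sketch assumes the program has already called setlocale(LC_CTYPE, "") as in point 1 and that the user&#8217;s locale is a UTF-8 one; under any other locale the same calls convert that locale&#8217;s multibyte encoding instead. Passing NULL as the destination to get the required length is the POSIX behaviour of these functions.</p>

```c
#include <assert.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: unpacks a multibyte string (UTF-8 under a UTF-8
 * locale) into a newly allocated wide string. Returns NULL on an invalid
 * sequence or allocation failure. The caller frees the result. */
wchar_t *unpack_wide(const char *mb)
{
    assert(mb != NULL);
    size_t n = mbstowcs(NULL, mb, 0);      /* POSIX: length in wide chars */
    if (n == (size_t)-1)
        return NULL;                        /* not valid in this locale */
    wchar_t *wide = malloc((n + 1) * sizeof *wide);
    if (wide == NULL)
        return NULL;
    mbstowcs(wide, mb, n + 1);              /* n + 1 leaves room for L'\0' */
    return wide;
}

/* Hypothetical helper: packs a wide string back into a newly allocated
 * multibyte string. Returns NULL if a character cannot be represented or
 * allocation fails. The caller frees the result. */
char *pack_multibyte(const wchar_t *wide)
{
    assert(wide != NULL);
    size_t n = wcstombs(NULL, wide, 0);    /* POSIX: length in bytes */
    if (n == (size_t)-1)
        return NULL;
    char *mb = malloc(n + 1);
    if (mb == NULL)
        return NULL;
    wcstombs(mb, wide, n + 1);
    return mb;
}
```

<p>Strings would stay packed as char arrays most of the time and only be unpacked for per-character work, then packed again (and the temporary wide buffer freed) when done.</p>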
]]></content:encoded>
	</item>
	<item>
		<title>By: Nahuel</title>
		<link>http://soledadpenades.com/2007/12/11/utf-8-checklist/#comment-46977</link>
		<dc:creator>Nahuel</dc:creator>
		<pubDate>Tue, 11 Dec 2007 07:01:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.soledadpenades.com/2007/12/11/utf-8-checklist/#comment-46977</guid>
		<description>I&#039;ve worked with Django for about 6 months now, and I&#039;ve seen it get decent Unicode support.
First make sure you&#039;re using the SVN version, and also make sure you declare your source encoding the way http://www.python.org/dev/peps/pep-0263/ says.
Even though Django prints objects correctly with the __str__ method (something like Java&#039;s toString), don&#039;t forget to implement __unicode__ for every model class.
Also keep in mind that current Python strings get the system encoding, so I don&#039;t feel very confident when I write something like &quot;%s %s&quot; % (var1, var2). I tend to write u&#039;%s %s&#039; % (var1, var2).
If you are using an extra Python package like reportlab for PDF output, make sure you pass it unicode(instance) or u&#039;something %s&#039; % obj.
That&#039;s all I have to do to get proper Unicode handling. It&#039;s easy, though I miss Java a little in this particular subject.</description>
		<content:encoded><![CDATA[<p>I&#8217;ve worked with Django for about 6 months now, and I&#8217;ve seen it get decent Unicode support.<br />
First make sure you&#8217;re using the SVN version, and also make sure you declare your source encoding the way <a href="http://www.python.org/dev/peps/pep-0263/" rel="nofollow">http://www.python.org/dev/peps/pep-0263/</a> says.<br />
Even though Django prints objects correctly with the __str__ method (something like Java&#8217;s toString), don&#8217;t forget to implement __unicode__ for every model class.<br />
Also keep in mind that current Python strings get the system encoding, so I don&#8217;t feel very confident when I write something like &quot;%s %s&quot; % (var1, var2). I tend to write u'%s %s' % (var1, var2).<br />
If you are using an extra Python package like reportlab for PDF output, make sure you pass it unicode(instance) or u'something %s' % obj.<br />
That&#8217;s all I have to do to get proper Unicode handling. It&#8217;s easy, though I miss Java a little in this particular subject.</p>
]]></content:encoded>
	</item>
</channel>
</rss>

