Saturday, April 30, 2011

Finetuning: Encoding


There are a lot of characters that you normally don’t even think about when writing your book that may be very difficult to preserve in your ebook. Take for instance the lowly quote marks in the following phrase:

"To be or not to be"

In this example, you’re seeing typewriter quotes. It’s actually pretty hard in Word to write typewriter quotes anymore; Word automatically converts the example above into the sample below:

“To be or not to be”

Notice that you’re now seeing typographic quotes, true open and close quotes commonly called “sixes” and “nines.” You might think of typographic quotes as something only fuddy-duddies worry about, but they serve a real purpose in a book. They unconsciously clue you in when someone has started talking and when they’ve stopped, which is very handy when several characters are talking at once.

Unfortunately, however, if you copy the following paragraph from your Word document:

“I’ll go in ahead and confirm the safety is still on and it isn’t cocked.” She relayed this and added her own comment, “I’ll have to go with your guys and relay for Munroe.”
and paste it into the Design view of a blank Dreamweaver document, what you’ll see in Code View will look like this (example 1):
<p class="body">“I’ll go in ahead and confirm the safety is still on and it isn’t cocked.” She relayed this and added her own comment, “I’ll have to go with your guys and relay for Munroe.”</p>
when you would prefer to see something that looks like this (example 2):
<p class="body">&ldquo;I&rsquo;ll go in ahead and confirm the safety is still on and it isn&rsquo;t cocked.&rdquo; She relayed this and added her own comment, &ldquo;I&rsquo;ll have to go with your guys and relay for Munroe.&rdquo;</p>
or even better, this (example 3):
<p class="body">&#8220;I&#8217;ll go in ahead and confirm the safety is still on and it isn&#8217;t cocked.&#8221; She relayed this and added her own comment, &#8220;I&#8217;ll have to go with your guys and relay for Munroe.&#8221;</p>
So what’s wrong with example 1?
Well, look at the Code View and you’ll notice the quote marks are not encoded; they sit in the source as raw characters. You’ve probably visited a website where some characters, often the quote marks, display incorrectly as a string of weird characters. Of course, most web browsers are getting smarter about displaying unencoded characters correctly, but it’s not a chance you want to take.
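
To see where those weird characters come from, here is a minimal Python sketch. It assumes the file was saved as UTF-8 but is read back by software that guesses Windows-1252, which is one common way this goes wrong, not the only one. The sample text is borrowed from the paragraph above.

```python
# Typographic quotes written with escapes so the script itself is plain ASCII.
text = "\u201cI\u2019ll go in ahead"  # opens with a double "six", has a curly apostrophe

# Saving as UTF-8 turns each typographic quote into three bytes.
raw_bytes = text.encode("utf-8")

# A reader that wrongly assumes Windows-1252 shows one character per byte.
garbled = raw_bytes.decode("cp1252")

print(garbled)
```

Each typographic quote comes back as three unrelated characters, which is the kind of garbage you sometimes see on web pages.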

What’s right with example 2?
In example 2, the quote marks are properly encoded, but this will only happen if in the page properties of the Dreamweaver HTML document you pick the Western ISO Latin-1 encoding. Picking the right encoding tells the web browser (or ebook reader) how to interpret the characters in the document. The Latin 1 character set has no typographic quotes of its own, so they have to be written as named HTML entities; &ldquo;, for instance, stands for a double open quote (double sixes).
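
A short Python sketch can show why Latin 1 needs the entity and UTF-8 does not. It uses only the standard library; the variable names are just for illustration.

```python
import html

# Resolve the named entity to the character it stands for.
open_quote = html.unescape("&ldquo;")
print(hex(ord(open_quote)))  # 0x201c

# Latin 1 only covers code points 0 to 255, so the quote cannot be stored directly.
try:
    open_quote.encode("latin-1")
    fits_latin1 = True
except UnicodeEncodeError:
    fits_latin1 = False

# UTF-8 can store any Unicode character, here as three bytes.
utf8_bytes = open_quote.encode("utf-8")

print(fits_latin1, len(utf8_bytes))
```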

What’s wrong with example 2?
The Latin 1 encoding is perfect if you’re trying to create a Kindle MOBI file, but the Barnes & Noble Nook, Apple’s iBookstore, and Lulu.com prefer EPUB documents, and EPUB documents prefer the UTF-8 encoding.

So example 3 is the correct example?
For EPUB files, example 3 is best, but there’s a catch. If you copy Word content and paste it into a Latin 1 encoded Dreamweaver HTML file, the typographic quotes will be converted into the proper HTML entities — &ldquo; for open double quotes, &rdquo; for close double quotes, &rsquo; for apostrophes — but unfortunately the conversion trick doesn’t work for UTF-8 encoded documents.

What you’ll need to do is copy from Word and paste into a Latin 1 encoded HTML file, and then in the source code find any named HTML entities and replace them with their numeric equivalents. The table below should help. When you’re done, you can change the HTML document’s encoding from Latin 1 to UTF-8.


Character   Named entity   Numeric entity
—   &mdash;   &#8212;
“   &ldquo;   &#8220;
”   &rdquo;   &#8221;
‘   &lsquo;   &#8216;
’   &rsquo;   &#8217;
…   &hellip;   &#8230;
é   &eacute;   &#233;
á   &aacute;   &#225;
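
If you have many files, the find-and-replace step can be scripted. The following is a rough Python sketch, not a Dreamweaver feature: the function name named_to_numeric is made up for this example, and the lookup table comes from Python’s standard html.entities module rather than from the short table above. It leaves the entities that XML predefines (such as &amp;) alone, along with any name it does not recognize.

```python
import re
from html.entities import name2codepoint

# Entities that XML itself predefines; these can stay as they are.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named entities such as &ldquo; but not numeric ones such as &#8220;
ENTITY_PATTERN = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_numeric(html_text):
    """Replace named HTML entities with numeric character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave untouched
        return "&#{};".format(name2codepoint[name])
    return ENTITY_PATTERN.sub(replace, html_text)


sample = '<p class="body">&ldquo;I&rsquo;ll go in ahead&hellip;&rdquo; caf&eacute; &amp; bar</p>'
print(named_to_numeric(sample))
```

Run it on a copy of your file first and compare the result against the table before changing the document’s encoding to UTF-8.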
