E-Book Formats

Most technical writers understand online help formats and have worked with at least one over the years. Help file formats have evolved from man pages (manual pages in UNIX in the early 1970s) and HLP files through CHM files to the plethora of HTML-based formats that we have now. E-book formats are similar in many respects to the common online help formats, but with one crucial difference: they’re designed to work on the small screens of today’s e-readers and tablets.

Although modern e-readers and tablets can display several common file formats, e-books are usually distributed in one of three formats:

  • PDF, for almost all devices
  • EPUB, for most e-readers and tablets, except for Amazon’s Kindles
  • Kindle, for Amazon’s Kindles

To make life a little more complicated, new versions of both EPUB and Amazon’s Kindle format will be appearing in the near future.

PDF

Portable Document Format (PDF) was developed by Adobe to digitally reproduce the format of printed documents. It does that very well. Over the years, Adobe has enhanced the format to include accessibility features and rich media, like embedded video and 3D visualizations. It’s now an open ISO standard (ISO 32000-1:2008).

Because PDF reproduces the original document’s format, it’s a good choice for newspapers, magazines, and textbooks, where the design and layout are important. Because text does not reflow (except in the latest versions of Acrobat Reader), PDF typically doesn’t work well on small e-reader screens. Some e-readers handle PDFs better than others. Amazon’s newest Kindles have an article mode that can extract text and display it in a more readable format. PDFs may be easier to read on tablet devices, depending on the size of the screen and the format of the document; larger tablets like the iPad make great PDF readers.

It’s also possible to modify some PDFs so that they display in a format that’s more readable on smaller screens. For Kindles, k2pdfopt is a command-line utility that optimizes PDFs for display on the Kindle. It may be sufficient for documents that don’t have a complex format. PDFMasher is a tool from Hardcoded Software that converts PDFs into either MOBI or EPUB format. It’s still under development and has some rough edges, but worked quite well on a PDF of a short story.

 

EPUB

EPUB is the most widely used e-book format. Except for Amazon’s Kindles, almost all e-readers and tablets support it. It is an open standard developed by the International Digital Publishing Forum, an electronic publishing industry trade and standards association. The latest version, EPUB 3.0, was approved in October 2011, so most e-books being sold now follow the older EPUB 2.0.1 standard.

An EPUB book is a set of XML, XHTML, and CSS files compressed into a ZIP archive (with some specific conditions) with an .epub extension. The main files are:

  • toc.ncx: Hierarchical table of contents, XML format
  • content.opf: Metadata about the book (title, author, publisher, etc.), file manifest, and linear reading order, XML format
  • *.css: CSS files to control formatting
  • *.html: Content of the book, XHTML format
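
The "specific conditions" on the ZIP archive mostly concern how the container is laid out. As a rough illustration, the Python sketch below builds a skeletal archive in memory with the standard library. It assumes the container rules in the EPUB Open Container Format: a mimetype entry that comes first and is stored uncompressed, and a META-INF/container.xml file that points to the package file. The OEBPS folder name, the placeholder file contents, and the function name build_minimal_epub are choices made for this example only, and the output is a skeleton that is not expected to pass a validator.

```python
import io
import zipfile

# Points reading systems at the package file; the OEBPS path is a convention,
# not a requirement, and is assumed here for illustration.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf"
              media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_minimal_epub(files):
    """Return the bytes of a skeletal EPUB-style ZIP archive.

    files maps archive paths (for example "OEBPS/content.opf") to text.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The mimetype entry is written first and left uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        for path, text in files.items():
            archive.writestr(path, text, compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


if __name__ == "__main__":
    data = build_minimal_epub({
        "OEBPS/content.opf": "<package/>",          # placeholder content
        "OEBPS/toc.ncx": "<ncx/>",                  # placeholder content
        "OEBPS/style.css": "body { margin: 1em; }",
        "OEBPS/chapter1.html": "<html><body><p>Hello</p></body></html>",
    })
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            print(info.filename, info.compress_type)
```

Writing the mimetype entry first and uncompressed lets software identify the file from its opening bytes without unpacking the whole archive.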

The full structure of an EPUB file, naming conventions, and other restrictions are defined in the EPUB specifications. The Wikipedia article on the EPUB format has a good summary.
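
To show how these files fit together, here is a minimal Python sketch that opens an EPUB, locates the package file through META-INF/container.xml, and reads the title and the linear reading order from it. It assumes an EPUB 2-style package file that uses the standard OPF and Dublin Core namespaces; the function name summarize_epub is made up for this example, and there is no error handling for missing or malformed files.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

NS = {
    "c": "urn:oasis:names:tc:opendocument:xmlns:container",
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def summarize_epub(source):
    """Return (title, reading_order) for an EPUB path or file-like object."""
    with zipfile.ZipFile(source) as archive:
        # container.xml names the package (OPF) file inside the archive.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", NS)
        opf_path = rootfile.get("full-path")
        package = ET.fromstring(archive.read(opf_path))

    title = package.findtext("opf:metadata/dc:title", default="", namespaces=NS)

    # The manifest maps item ids to file names; the spine lists ids in order.
    manifest = {item.get("id"): item.get("href")
                for item in package.findall("opf:manifest/opf:item", NS)}
    base = posixpath.dirname(opf_path)
    reading_order = [posixpath.join(base, manifest[ref.get("idref")])
                     for ref in package.findall("opf:spine/opf:itemref", NS)]
    return title, reading_order
```

Calling summarize_epub("book.epub") on an unprotected file (the file name here is hypothetical) would return the title and a list of content files in reading order, which is a quick way to check that the manifest and spine agree before testing on a device.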

So what’s to like about EPUB?

  • It supports text reflow, so it works on any size screen from phones to tablets and PCs.
  • Fonts are resizable, which is great for nearsighted readers like me.
  • Linking is supported (although support for this varies across e-readers).

However, as with most technologies, there are some weak points:

  • Graphics aren’t resizable.
  • Complex tables are problematic.
  • The specification doesn’t support indexes, although you can imitate them with link pages.

Although EPUB is a well-defined standard, vendors implement it differently in their devices. These differences are similar to the problems that web developers face in supporting multiple web browsers – but worse. Some formatting issues have workarounds, but not all. So it’s quite possible that you will have to create more than one version of a book if you have to support multiple devices – for example, a troubleshooting guide that will be used by technicians who have BlackBerry devices and iPads. For a detailed discussion of these problems, get a copy of How to Format Your eBook for Kindle, NOOK, Smashwords, and Everything Else by Paul Salvette.

The new EPUB 3.0 adds some new features and capabilities to EPUB, including support for:

  • XHTML5, including MathML for equations
  • Scalable Vector Graphics (SVG)
  • CSS3
  • JavaScript
  • TrueType fonts

All of these should give authors and book designers much more control over the layout and quality of their e-books, but it may be a while before tool and hardware vendors support the new standard.

As a reader, don’t assume that because you’re buying an EPUB book, you’ll be able to read it on different brands of e-readers. Most online bookstores add some form of Digital Rights Management (DRM) software to their books (most commonly Adobe Digital Editions). If you buy an e-book from Barnes and Noble for a Nook, you won’t be able to read it on a Kobo, even though both vendors’ formats are nominally EPUB. As with most forms of DRM, it is possible to find some workarounds.

Until recently, EPUB was the only e-book format available for download from libraries, through the OverDrive platform. OverDrive now supports Amazon’s Kindle format, at least in the United States, but it’s likely that EPUB will be the most common format for library downloads for some time to come.

Given that EPUB is a lot like older online help formats, it should be no surprise that many help authoring tools now support EPUB as an output format. There are also many open source and shareware tools available to create EPUB books or to convert other formats into EPUB. The next article in this series will cover tools in more detail.

Kindle

Amazon’s Kindle format is a proprietary variant of the Mobipocket format, developed by a French firm that Amazon acquired in 2005. Kindle books sold by Amazon have an .azw extension and are essentially Mobipocket format (.mobi or .prc) with Amazon’s DRM wrapper. The Kindle can read unprotected Mobipocket files. Kindle files are based on HTML, although the format is poorly documented compared to EPUB. The Mobipocket format uses proprietary file compression and a binary-format header, so you can’t just unzip a .mobi file to edit the content. The Kindle format has several formatting limitations, of which these are only a few:

  • Tables are limited.
  • There is no control over justification.
  • Text cannot reflow around images larger than one line high.
  • Images do not resize when fonts are resized.
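
Because a Mobipocket file is a binary database rather than a ZIP archive, one practical way to tell the formats apart is to look at the first few bytes of a file. The Python sketch below is a rough illustration under some assumptions: that Mobipocket files use the Palm database layout with the marker BOOKMOBI at byte offset 60, that PDFs start with %PDF-, and that an EPUB’s first ZIP entry is an uncompressed mimetype file with no extra header fields. The function names are invented for this example, and other Kindle variants or nonstandard files may not match any of these checks.

```python
def sniff_ebook_container(header: bytes) -> str:
    """Guess the container type from the first 68 bytes of a file."""
    if header[:5] == b"%PDF-":
        return "pdf"
    if header[:4] == b"PK\x03\x04":
        # A ZIP local file header is 30 bytes; in a well-formed EPUB the
        # entry name and its stored content follow immediately after it.
        if header[30:58] == b"mimetypeapplication/epub+zip":
            return "epub"
        return "zip"
    if header[60:68] == b"BOOKMOBI":
        # Type and creator fields of the Palm database header.
        return "mobipocket"
    return "unknown"


def sniff_ebook_file(path: str) -> str:
    """Read just enough of the file at path to classify it."""
    with open(path, "rb") as handle:
        return sniff_ebook_container(handle.read(68))
```

A check like this only identifies the container; it says nothing about whether the file carries DRM or whether a particular device will display it well.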

The best information that I’ve found on formatting e-book files for the Kindle is contained in:

Amazon supplies a free command-line tool called KindleGen to create Kindle files from raw HTML files. However, it’s much easier to create a file for the Kindle by creating an EPUB file first and converting it, or by uploading a properly formatted Word file directly to Amazon.

In October, Amazon announced Kindle Format 8, a new format for Kindle e-books. Like EPUB 3.0, it is based on HTML5 and CSS3 and supports complex formatting with many new layout options for publishers and designers, including those who handle technical communications content. However, it doesn’t include support for audio, video, or scripting, which makes it less capable than the EPUB 3.0 standard.

File Save As is not enough

Most of you would probably agree that saving a document as PDF doesn’t make it an e-book. But neither does saving it as EPUB or Kindle format. In the case of technical publishing and e-books, Marshall McLuhan’s famous quote, “The medium is the message,” is an overstatement, but the medium you are publishing to does affect how your content will appear to the reader. If you don’t take that into account, your readers’ experience will be less than ideal.

If you expect to deliver your content to a wide audience, you will have to determine how it will display on four different platforms:

  • Smartphones, where screens are smaller than 4.5” and can display color.
  • E-readers, with 6” E Ink screens that can display only black and white.
  • Tablets, with screens ranging from 7” to 11” that can display color.
  • Laptops and PCs, with large, high-resolution, color displays.

You may also have to consider differences between different brands of devices on the same platform; for example, Nook or Kindle.

So if you thought that having to deal with producing print and HTML versions of your content was complicated enough, you may want to avoid the e-book publishing world, at least until standards become more robust and interoperable. But if you are a technical communicator who doesn’t have that choice, planning and scheduling the delivery of your content must include consideration for e-book formats and platforms, and production on those platforms.

To help provide some guidance on e-book production, I’ll take a look at some of the tools you can use to create e-books in the next article in this series.

 

Resources

Keith Soltys

After a series of sales and office jobs, Keith Soltys discovered technical writing in 1987 and never looked back. He currently works at the Toronto Stock Exchange.
