Wednesday, June 1, 2011

My first e-book buying mistake, followed by a nifty success

(Warning: the post below may be incredibly boring to anyone who isn't interested in file editing in general or e-book file editing in particular. I just wanted to get all this down in case I need to remember it later. Plus, I'm proud that I managed to figure all of this out on my own. Yay!)

I haven't owned my Nook for long, and I hadn't bought many e-books...until a couple of days ago, when I bought a whole slew of them to take advantage of some sales. With two of those books, I messed up.

I messed up differently with each one. With the first, I was given a choice of file types and chose one that I thought worked on a Nook: HTML. I later realized that, no, HTML files can't be read on a Nook. With the second, I didn't notice that the site I was buying it from didn't offer the book in EPUB format; if I had, I would have bought it elsewhere. What I ended up with was a choice of three file types, only one of which could be read on a Nook.

Calibre turned out to be my savior in both cases.

With the HTML file, I first tried converting it directly into an EPUB file. The results were ugly, and there were some formatting issues: if I remember right, some, but not all, of the quotation marks weren't displaying. The site I bought the book from only lets buyers download one file type, so I was stuck. (If I had been really, really stuck, I could have emailed them to request a file type I could use, which in this case would have meant PDF, my only immediately usable option, and they would hopefully have helped me out.) What I ended up doing was opening the HTML file, copying all the text, and pasting it into OpenOffice Writer. I saved it as a Rich Text (RTF) file and then converted that into an EPUB file.

That looked much better than the direct HTML-to-EPUB conversion, but there were still some formatting oddities I wanted to fix. So I went back into the Rich Text file, got rid of the extra spacing between paragraphs that had shown up in the EPUB, indented the paragraphs, and converted the Rich Text file into an EPUB file again. It now looks lovely, except for the copyright page, which is a giant blob of not-very-eye-friendly text. I could probably fix that, too, but I've already dealt with the problems I really cared about, so I think I'm done.

With the other book, I had three file types to choose from: PDF, .lit, and .prc. I can read a PDF file on a Nook, but the results usually aren't pretty at my preferred font size. I've read a novella in PDF format, but I didn't want to have to read a whole book in PDF if there was a better option. I tried converting the PDF file into an EPUB file using calibre. I can't remember exactly what was messed up, but I do remember that the results weren't good.

I vaguely remembered reading on one of the many book blogs I follow that calibre is good at converting .lit files into other formats, so I decided to try that one (the site I bought this particular book from let me download as many file formats as I wanted, not just one). When I converted the .lit file into an EPUB file, the results were lovely...except for one thing: every em dash had been turned into a single question mark. Double question marks I can handle, but single question marks are too easily confused with the question marks that are actually meant to be in the text.

I did some Googling, and what I read suggested that specifying the character encoding of the original file would solve the problem, so I tried that. It didn't. Weirdly enough, although the em dashes showed up as single question marks on my Nook, they displayed just fine on my computer in calibre.

The file conversion screen in calibre has a section called Search & Replace. I had the .lit file open for viewing in calibre, so I copied the first em dash I found, pasted it into the first "Search Regular Expression" field, and typed "--" (two dashes, no quotation marks) in the replacement text field.

When I converted the .lit file into an EPUB file this time, the results were perfect: the double dashes I'd used in place of the em dashes displayed as em dashes on my Nook.
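For anyone curious what that Search & Replace step boils down to, here's a rough Python sketch of the same substitution done outside calibre: a pattern that matches a literal em dash (Unicode U+2014), replaced with two hyphens. This is only an illustration under my own assumptions. It doesn't reproduce calibre's conversion pipeline, and the sample sentence and the `replace_em_dashes` name are made up for the example.

```python
import re

# Made-up sample text; in my case the real input was the book's .lit file,
# processed inside calibre rather than as a Python string.
text = "She paused\u2014then laughed. Was that a joke\u2014or not?"

# The search pattern: a single literal em dash (U+2014), the same character
# I copied from the book into calibre's "Search Regular Expression" field.
EM_DASH = re.compile("\u2014")

def replace_em_dashes(s: str) -> str:
    """Replace every em dash with two hyphens (the '--' replacement text)."""
    return EM_DASH.sub("--", s)

result = replace_em_dashes(text)
print(result)  # She paused--then laughed. Was that a joke--or not?
```

Because the pattern is just that one character, real question marks in the text are left alone, which was the whole point of the exercise.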

It looks like my record-editing skills are good for more than just globally editing large numbers of MARC records. I just hope I don't have to do this sort of thing often; I much prefer e-books that are perfect and usable right away.

