How I self-published a professional paperback and eBook using LaTeX and Pandoc
Introduction
After years working as a Software Engineer I wanted to make my dreams come true, so I quit my desk job to spend my life adventuring around the world. I gambled that I would be able to scrape together a meagre living while enjoying a life I had dreamed of. After driving 40,000 miles through 17 countries from Alaska to Argentina, and then 54,000 miles through 35 countries around Africa, I learned, saw and experienced more than I ever dreamed possible. Five years of global adventures means I have enough stories and lessons learned to fill multiple volumes, and I decided I wanted to do just that. I didn’t want the hassle of pitching publishers and all that goes along with that, so I went the self-publishing route, which became a whole new adventure.
In this post I’ll run through all the details from the whys to the hows to a final recap with my thoughts about how my decisions worked out.
Given I used to sling code, I’ll include a lot of code samples used to achieve my goals which you can use too.
This post is not intended to teach you the nitty-gritty details of LaTeX, but rather to give a high(ish) level overview of what can be achieved with the tech-stack I settled on.
CONTENTS:
1. Wait, I can publish my own book?
2. Goals
3. Tech Stack
4. A basic KDP Book in LaTeX
5. Generating an eBook with Pandoc
6. Goal Achieved!
7. The Power of LaTeX
– Power Move One – Conditionals
– Power Move Two – Define your own functions/methods
– Power Move Three – Function Overloading
– Power Move Four – Pandoc Verbatim Passthrough with lua script
8. More Powerful Stuff
9. Results – LaTeX Layout Supremacy
10. Other Considerations
11. Conclusion
1. Wait, I can publish my own book?
Yes indeed. Amazon’s service Kindle Direct Publishing makes the whole process of self-publishing much, much easier than most people realize. In essence you just upload a pdf of the interior of your book and a pdf containing the front & back cover and spine, and in a few days the book will be for sale, and you can order your own real-life bound book. With barely any extra work you can also submit an industry standard .epub file and your book will also be sold digitally for eReaders.
It’s hard to describe the feeling of holding a published book with your name across the top, though I distinctly remember saying
“Holy S**t, it’s a REAL book!”
There are a few HUGE benefits to going it alone and self-publishing:
- It’s all up to you. The content, formatting, layout, spell checking and literally everything else is under your direct control.
- You can publish anything you want. Maybe you’ve always wanted to write a book about the size of your cat’s furballs, the nifty light-up keyboard you designed and built or simply your journal entries. You can, and nobody will stop you.
- You don’t pay a single cent up front, because the whole thing is “print on demand”. When someone orders a book, Amazon print it, then send it to that customer. You don’t have to pre-purchase 500 or 50,000 copies and have them sitting on a shelf gathering dust while you stress about selling them to get your money back. If the book never sells a single copy, you pay nothing. (But you can order a few copies yourself to give your family.)
If and when the book does sell, Amazon handle everything and just deposit money into your bank account each month.
- You set your own prices (and therefore profit), within reason. Based on your choices on book size, color or black and white, number of pages and a few other things, Amazon calculate how much it costs to print and send to a customer. It’s your choice how far above that cost price you choose to sell the book for, thus defining your own profit per sale. You can play with Amazon’s calculator here to see exactly the cost for a given book size/type.
There’s also one really big downside to self-publishing:
- It’s all up to you. The content, formatting, layout, spell checking and literally everything else is under your direct control.
2. Goals
- If I was going to self-publish a book, I wanted it to look as professional as possible. While it’s possible to use MS Word to generate the pdf of the interior of the book, I personally think the results look homemade, and I can spot those books a mile away.
- I wanted to generate the pdf for the print book and the .epub file from the same source files. As a developer there’s nothing worse than copy-pasting back and forth, and there’s no way I would tolerate having to fix typos or make edits in multiple places. I wanted to edit the content of the book in exactly one place and generate the two different outputs from that.
- The whole process had to be automated – hopefully with one click or keystroke I would generate a finished pdf AND a fully compliant .epub file with no manual intervention.
3. Tech Stack
After a bunch of research I re-discovered the powerful and free high-quality typesetting language LaTeX, which I had used at University. The more I dug into it, the more I realized it’s a full-blown programming language, and it would certainly give me the pixel-perfect layout control I was looking for, and with very little effort on my end I could create a book that looked as professional as the real thing.
Reading around I came across Pandoc, a free and open source document converter, often called the swiss army knife of document converters. I think that does it a disservice: Pandoc is more like 26 and a half swiss army knives glued together, and it’s awesome. If you need to convert <document format A> to <document format B>, Pandoc has your back.
So I decided to write the book in LaTeX which would generate the print-ready pdf, and then run the raw LaTeX files through Pandoc to generate an industry-standard .epub file for the electronic copy.
That sounds easier than it is, but it’s not all that bad.
NOTE: I certainly could have used Markdown or one of the many other document formats that became popular since I graduated in 2004, though I seriously doubt they can provide the level of control I get with LaTeX and the professional-grade layout I achieved.
(I go into more detail on this later)
4. A basic KDP book in LaTeX
Using the extremely-helpful “Createspace” LaTeX package (found here), it’s straightforward enough to get a bare bones book up and running that is “Amazon compliant” in terms of margins and gutters and page sizes. I created a file called “The Road Chose Me – Volume 2.tex” which looks like:
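A minimal sketch of a master file along these lines, arranged to follow the line-by-line walkthrough below. It is not the exact file used for the book: the createspace option keys are inferred from the descriptions that follow and should be checked against the package's own documentation, the title, author and second chapter file name are placeholders, and createspace.sty is assumed to sit next to the master file.

```latex
\documentclass{book}

\usepackage[size=pocket,bleed=false,paper=cream,color=false]{createspace} % option keys are assumptions

\usepackage{lettrine}              % drop caps used in the chapters
\title{The Road Chose Me}          % placeholder title
\author{Dan}                       % placeholder author
\date{}                            % no date on the title page
\begin{document}
\mainmatter                        % start arabic page numbering

\include{chapters/introduction}    % chapters/introduction.tex

\include{chapters/second-chapter}  % hypothetical file name

\end{document}
```

Both chapter files have to exist under ./chapters/ for this to compile.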
Line 1 says we’re creating a book and all that implies
Line 3 gives the createspace package the details on the book.
(NOTE: Kindle Direct Publishing (KDP) was previously called Createspace, and now the two terms are more-or-less interchangeable)
- “pocket” means it’s 5.25″ x 8″ in size (There are many sizes to choose from, called ‘trim size’)
- “bleed” false means content does not extend to the very edges of the page (used more for photos/magazines)
- “cream” is the color of the paper I chose to print on (instead of white), which impacts thickness calculations
- “color” false is a little self-evident I think.
In just that one line I’ve given the createspace package all the info it needs to set the margins, gutters and a bunch of other stuff, and it will even provide warnings if I violate any of KDP’s rules. For example the warning you see on line 18 is because this tiny example book is only 18 pages in length, while KDP requires a minimum of 25 pages. I’d get similar warnings if I specified a custom gutter that was outside KDP’s mandatory minimums for a given number of pages. Nifty.
(I could specify custom margins and gutters and other values, if I felt the need)
Lines 5 to 10 are standard LaTeX definitions, nothing out of the ordinary.
Lines 12 and 14 include separate chapters of my book. Being a developer I wanted everything in separate source files, and a chapter-at-a-time felt like a really great way to break up the project. This also means when I’m in “writing” mode I don’t have to deal with any LaTeX code, I can just hammer out text and trust the definitions I’ve made in my master file will handle everything.
That file “./chapters/introduction.tex” looks like:
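As a rough illustration (not the actual chapter source), a chapter file in this style can be as plain as the sketch below. The standard \chapter and \section commands stand in for the custom headings mentioned in the text, and the lines=2 option passed to \lettrine is an assumed choice.

```latex
% chapters/introduction.tex
% Pulled in by \include from the master file, so it has no preamble of its own.
\chapter{Introduction}

% \lettrine[options]{initial}{rest of word}: the initial becomes the drop cap,
% the rest of the first word is set in small caps beside it.
\lettrine[lines=2]{A}{fter} years working as a software engineer I wanted to
make my dreams come true, so I quit my desk job.

\section{A placeholder subheading}

From here on it is just text: paragraphs are separated by blank lines and the
layout is left to the definitions in the master file.
```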
With use of the Lettrine package I get some really nice drop caps at the start of each chapter, and by defining my own chapter headings and subheadings, I get this very neat result (from the pdf of the print book):
5. Generating an eBook with Pandoc
In the same directory as my master LaTeX file I wrote a shell script to run Pandoc and a little extra.
The file ./makeEBook.sh looks like this:
Lines 3 and 4 are just removing the output from the last run so I start with a clean slate.
Line 6 … well, yeah, I better explain that.
Pandoc was never designed to parse something as complicated as createspace.sty (which is 917 lines of LaTeX code I can barely parse myself). If we leave it where Pandoc expects to find it, Pandoc will try to parse it and spew errors and problems. So the easy solution is just to rename it so Pandoc can’t find it, then later rename it back again (line 12). Pandoc simply ignores it and moves on without a problem.
Line 8 actually invokes Pandoc to generate the output file (-o), an .epub, from the .tex document, with -f naming the input format. I also point the --epub-cover-image argument at a pretty jpg, and I use --epub-metadata to ensure the right metadata goes into the final .epub. Lastly I tell Pandoc to generate a Table Of Contents (--toc).
Finally on line 14 I invoke Amazon’s “KindleGen” program which turns the industry-standard .epub file into a proprietary Amazon .mobi file. I found it helpful to run this command as it will spit out errors if the metadata is not set correctly or if there are any other minor glitches in the .epub file that might cause issues later when you upload it to KDP.
NOTE: While writing this I just discovered Amazon no longer offer/support “KindleGen” and instead use “Kindle Previewer“. I’ll have to look into that.
For those who crave completeness, metadata.xml is nothing revolutionary, and looks like:
6. Goal Achieved!
So there you have it. We’ve generated a print-ready pdf file for the paperback edition, and an industry standard and validated .epub file for the digital copy. All done.
Well, no. Not really.
Earlier I said I chose this tool chain because it’s extremely powerful and gives me a TON of flexibility on the layout to satisfy my OCD and make it really professional. Let’s get into some of that now.
NOTE: From now on I won’t show complete code samples, but describe and show how I beefed up what I’ve shown thus far to achieve my goals.
7. The Power of LaTeX
(NOTE: I’m not a LaTeX guru, and I certainly can’t wrangle it to my needs nearly as well as I could a Java BookFactoryFactoryFactory. Chances are I’m about to use some of the wrong terminology, but hopefully I can convey the major points in a way you can understand and follow.)
Also a lot of what I do from here on (and the use of Lettrine above) requires the use of LaTeX packages which can be thought of as extensions, or collections of bits n pieces of layout code/control that do cool things. They don’t always have the most verbose names, though with some googling and poking around it’s not too hard to figure out what you need to achieve your goals.
On MacOS I’m using TeX Live Utility to install and manage LaTeX packages, which makes the whole thing fairly painless. Whenever I use code from the LaTeX Stack Exchange I quickly get an error about a missing package – so then I use TeX Live to install it and I’m all good.
As for packages I’ve shown you how the “Lettrine” package does cool drop caps, the “caption” package is handy for captioning images to your liking and I’ll let you guess what the package “multicol” does.
There are mountains of packages, and there is virtually always more than one way to skin the cat to achieve the particular layout you’re looking for. Spend some time searching on tex.stackexchange.com to see many, many code samples and answers that will help you achieve what you’re looking for.
Power Move One – Conditionals
It turns out LaTeX is more-or-less a full-blown programming language, and therefore you can do powerful things like conditionals.
I realized there were times I wanted different results for the print book vs. the eBook, so in my main file “The Road Chose Me – Volume 2.tex” I added a conditional (toggle) called “ebook”. Toggles come from the package etoolbox.
So now “The Road Chose Me – Volume 2.tex” looks like
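The toggle part of such a file can be sketched as follows. This is a stripped-down illustration rather than the real master file; it uses the etoolbox commands \newtoggle, \togglefalse and \iftoggle, and the two sentences in the body are placeholders.

```latex
\documentclass{book}
\usepackage{etoolbox}   % provides \newtoggle, \toggletrue, \togglefalse, \iftoggle

\newtoggle{ebook}       % declare the toggle
\togglefalse{ebook}     % print is the default; change to \toggletrue{ebook} for the eBook build

\begin{document}
% \iftoggle{name}{code used when true}{code used when false}
\iftoggle{ebook}%
  {This sentence appears only in the eBook.}%
  {This sentence appears only in the print edition.}
\end{document}
```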
And anywhere I want in the LaTeX code I can check if that toggle is true or false and do different things based on the state.
In the master file I always leave it false, and toggle it true then false in the shell script ./makeEBook.sh like this
(No comments about my sed magic please. I was sitting on a beach in Mozambique when I wrote that, and I’m proud of it!)
So when regular LaTeX and pdflatex run they see the conditional as false, but when Pandoc runs it sees the conditional as true, therefore taking a different code path within the LaTeX code. Yes, Pandoc understands these toggles and parses them correctly.
How do I use this toggle for advanced stuff? Stick with me.
Power Move Two – Define your own functions/methods
Like all good programming languages, LaTeX lets you define your own functions (usually they’re called macros, but there are other names for them too, depending on the specifics. Never mind).
Let’s say for example I want to have a nice way to have some kind of pretty break in the text, to signify some kind of natural break. In the eBook (which is just HTML after all) I would just use a horizontal rule, the good ‘ol <hr> tag. But in LaTeX I want to do something different, though I don’t want to have the burden of thinking about which is which when I’m in the flow of writing a chapter.
So I’ll create a macro to do it for me, and I’ll use that eBook conditional to produce a different result in the print-ready pdf and the eBook.
I’ll call this macro “textbreak”, and I can call it in any of my chapters by putting \textbreak on a line of its own.
So it’s nice n easy and doesn’t snap me out of the “flow” of writing.
I must define that macro in my main file, “The Road Chose Me – Volume 2.tex” and it must be between \begin{document} and \mainmatter, and before I use it. I also have the macro defined after the toggle is defined. My new macro is defined using the ‘\newcommand’ like this:
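A minimal sketch of such a definition, with the macro defined after the toggle and between \begin{document} and \mainmatter as described. The two branch bodies are assumptions chosen for illustration: a short centred rule for print and centred asterisks as a stand-in for whatever the eBook should show.

```latex
\documentclass{book}
\usepackage{etoolbox}

\newtoggle{ebook}
\togglefalse{ebook}   % false: print pdf, true: eBook

\begin{document}

% One macro for chapters to call; the toggle decides what it expands to.
\newcommand{\textbreak}{%
  \iftoggle{ebook}{%
    % eBook branch (placeholder): a simple centred marker
    \begin{center}* * *\end{center}%
  }{%
    % print branch (placeholder): a short centred rule
    \begin{center}\rule{0.3\textwidth}{0.4pt}\end{center}%
  }%
}

\mainmatter
\chapter{Example}
The last sentence of one scene.

\textbreak

The first sentence of the next scene.
\end{document}
```

Chapters only ever contain \textbreak, so changing the look of every break in either edition means editing this one definition.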
So by writing my own macro and checking the conditional I now have slightly different output in the print-ready pdf and the .epub file.
Power Move Three – Function overloading
I don’t really know what it’s called in LaTeX land, but you can overload existing functions (macros) just like in OO programming.
Earlier I showed the use of the “Lettrine” package to get nifty drop caps on the first word of each chapter.
Again, I want the output to be slightly different in the two different output formats. So although I just call “Lettrine” with arguments in each chapter, it actually calls my overloaded version of that function.
To do this you use the ‘\renewcommand’ like so:
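A sketch of the pattern, with a deliberately trivial replacement body: when the toggle is true, \lettrine is redefined to take one optional and two mandatory arguments and simply prints the word without a drop cap. The body is a placeholder, not the definition used for the book.

```latex
\documentclass{book}
\usepackage{etoolbox}
\usepackage{lettrine}

\newtoggle{ebook}
\togglefalse{ebook}

\iftoggle{ebook}{%
  % eBook only: replace the package's \lettrine.
  % [3][] means three parameters, the first optional with an empty default:
  %   #1 = formatting options (ignored here)
  %   #2 = the initial letter
  %   #3 = the rest of the word
  \renewcommand{\lettrine}[3][]{#2#3}% placeholder body
}{}% print: keep the package definition

\begin{document}
\lettrine[lines=2]{T}{he} road goes on.
\end{document}
```

Because the chapters call \lettrine the same way in both builds, nothing in the chapter files has to change.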
My function overload specifies three parameters must be passed, and you’ll notice my overloaded version of the macro only exists when the ebook toggle is true. So for the print-ready pdf LaTeX just uses the standard definition, but for the .epub file Pandoc will use my overloaded version.
Now we’re getting some powerful tools I expect from a programming language!
Wondering what that overloaded version of the “Lettrine” macro looks like?
Read on.
(NOTE: Pandoc understands and parses all of this exactly as it should)
Power Move Four – Pandoc verbatim pass through with lua script
In another extremely powerful feature, it’s possible to define a block of “verbatim” code that will pass right through Pandoc as-is, using a Lua script. The final .epub file is just HTML, so that’s what I choose to pass through. My overloaded “lettrine” actually looks like this, using the CSS style to achieve an effect similar to “Lettrine” drop caps.
So my overloaded lettrine says to output parameters 2 and 3 wrapped in some basic HTML and a line before and after that is ‘%%%HTML’.
(If you’re super curious, parameters 2 and 3 are the actual letters of the word being made fancy, parameter 1 is just telling the original Lettrine package exactly how to format it – see the code sample up higher)
Now in my shell script ./makeEBook.sh, I tell Pandoc to use a lua filter while parsing, like so
and finally the file rawHTMLFilter.lua contains:
I shamelessly copied that from a Pandoc example, but it basically says that if the pattern ‘%%%HTML’ is matched, the content is passed through Pandoc as raw HTML, so we get the result we want in the final .epub file.
I also use the eBook toggle (conditional) to include color images in the eBook version, but only Black and White images in the print-ready pdf. That way I can control the contrast more directly to get the best print results.
(NOTE: Unfortunately Amazon’s KDP service CAN NOT print a book that is partially B&W and partially Color. It’s an all-or-nothing deal, and for my 370-page book, all color would be prohibitively expensive, so I have to stick with B&W for all the text AND the images)
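The image switch described above could be wrapped in a macro along these lines. This is a sketch under assumptions: the macro name \bookimage, the images/color and images/bw directories and the file name are all hypothetical, and the image files have to exist for the document to compile.

```latex
\documentclass{book}
\usepackage{etoolbox}
\usepackage{graphicx}   % \includegraphics
\usepackage{caption}    % caption styling

\newtoggle{ebook}
\togglefalse{ebook}

% Hypothetical helper: #1 = image file name, #2 = caption text.
% The same file name is looked up in a different directory per edition.
\newcommand{\bookimage}[2]{%
  \begin{figure}[htbp]
    \centering
    \iftoggle{ebook}%
      {\includegraphics[width=\linewidth]{images/color/#1}}% eBook: color version
      {\includegraphics[width=\linewidth]{images/bw/#1}}%    print: B&W version
    \caption{#2}%
  \end{figure}%
}

\begin{document}
\bookimage{camp-site.jpg}{A placeholder caption.}
\end{document}
```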
8. More powerful stuff
There are a lot more powerful things we can do with this tool chain, including unpacking the .epub file to hand massage it how we please.
An .epub file is just zipped html documents after all.
As I said earlier I’m a bit OCD when it comes to layout and exactly how things look, though when it comes to an .epub file we just have to accept certain limitations. We can’t control the font size or where page breaks will occur on the thirteen million eReaders that might be used to read our book, but I did go in and change a few things like removing blank lines after HTML ordered lists, stopping copyright symbols forcing a line break, and making sure footnotes were not inside <p> tags, creating a ton of blank lines where I didn’t want them.
With some extra code in the ./makeEBook.sh shell script I unpack the .epub file, massage some html, swap in a custom master CSS document and then bundle the whole thing back up into a .epub file.
The code to do this is more nasty sed and perl magic, so I’ll just give you an idea here…
9. Results – LaTeX Layout Supremacy
I mentioned earlier that I feel LaTeX provides much more control and power over the final layout than other document formats can. This is most noticeable in the print-ready pdf which becomes the printed book. Here are a few points to consider if you’re choosing a document format (or language... or whatever they’re called) and you want to print a book.
- Gutters are not the same on right and left-hand pages. Because of the way books are bound, a different amount of “blank space” is required on the “inside” of each page – the side of the page that is in the middle of the book where it’s bound.
Right out of the box LaTeX takes care of this and generates a pdf that has different gutters on left hand pages (where the binding is on the right) and right hand pages (where the binding is on the left).
- With a single line I defined that all chapters start on a right hand page. This means sometimes a blank page is inserted, and I really like the way this gives a little more space between chapters, and the consistency of always having chapters start on right hand pages.
- A Table Of Contents for the print book can be generated in a single line of code, and it automatically includes all Chapters. With more LaTeX code the exact look of the TOC can be customized to your heart’s content
- Headers and footers can be heavily customized, and they can be customized for different page types. The following choices are completely arbitrary, and they’re just what I wanted for my book. You can do whatever you want.
I start counting page numbers after the intro stuff and after the TOC, and then I add page numbers to the top of right hand pages, BUT NOT on blank pages (from chapters being on the right only) and NOT on chapter heading pages.
I also add the title of the book on the top of left hand pages, but again NOT on “special” pages as above.
- Footnotes are handled beautifully, and produce a small superscript auto-incrementing number in the text, and the actual body of the footnote is rendered on the bottom of the corresponding page after a small horizontal line. These results can be customized (numbers per chapter, all footnotes at the end of a chapter / end of the entire book, etc. etc.)
Pandoc parses the LaTeX footnotes with no additional effort, also creating clickable links in the eBook, and all external URLs work perfectly.
- Inclusion of images and captions is achieved with a couple of packages (graphicx, caption), and I’m very pleased with the results.
- “Professionally” published books always include a page at the start with all the publisher info and some copyright stuff (called “Front Matter” in the publishing world). I generated my own version purely in LaTeX which also translates into the eBook, and I feel it really adds to the professionalism of the whole thing, making it very hard to tell this book was not published by one of the big players.
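Several of the layout points in the list above map onto a few lines of standard LaTeX. The sketch below shows one way to get them using the book class, fancyhdr and emptypage; these packages and settings are an assumption for illustration, not necessarily the ones used for the book, and the header text and chapter titles are placeholders.

```latex
\documentclass[twoside,openright]{book} % different inner/outer margins; chapters open on right-hand pages
\usepackage{fancyhdr}    % custom headers and footers
\usepackage{emptypage}   % inserted blank pages carry no header or footer

\pagestyle{fancy}
\fancyhf{}                               % clear every header and footer field
\fancyhead[RO]{\thepage}                 % page number at the top of right-hand (odd) pages
\fancyhead[LE]{The Road Chose Me}        % book title at the top of left-hand (even) pages
\renewcommand{\headrulewidth}{0pt}       % no rule under the header

% \chapter sets its first page in the "plain" style; emptying that style
% keeps chapter heading pages free of headers and page numbers.
\fancypagestyle{plain}{\fancyhf{}\renewcommand{\headrulewidth}{0pt}}

\begin{document}
\frontmatter             % roman page numbers for the title and copyright pages
\tableofcontents         % one line, lists every \chapter

\mainmatter              % page numbering restarts at 1 here
\chapter{First Chapter}
Some text with a footnote.\footnote{Set at the foot of the page, under a short rule.}

\chapter{Second Chapter}
More text.
\end{document}
```

Run LaTeX twice so the table of contents picks up the chapter entries.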
10. Other Considerations
If you’re seriously considering going down this route, here are a few other things you’ll need to tackle which are outside the scope of this article.
- Getting an ISBN for your book. If you’re only publishing an ebook KDP can assign you one for free, but if you’re printing a paperback you’ll need to get your own. The process is different in each country. Here in Canada we get 10 for free from Library and Archives Canada, in the US they can be purchased from various outlets.
I personally think having a “real” ISBN makes your book more professional, and I recommend it.
- Once you have an ISBN you can either create your own barcode for the back of the book, or leave a blank space and Amazon will generate one for you. I created my own because I had some color copies printed from a different printer, but if I was just using Amazon to print the book I didn’t need to do that.
- The front cover, back cover and spine of your book will really define how professional it looks. If your skills of graphic design are anything like mine I highly recommend paying a graphic designer to create this for you. Amazon have all the specs on sizes and trim and bleed here. Pay close attention to the spine width, it is calculated from the number of pages, paper type and if it’s color or B&W.
- LaTeX editor – there are many to choose from, each with their own pros and cons. I’m using the Eclipse plugin “TeXlipse” which works well enough. I really like that each time I hit save the LaTeX tools are invoked, and a few seconds later I have the finished pdf file built.
I could have added my shell script “./makeEBook.sh” to that auto-build process, but I found I really didn’t need to.
I’m sure there are many other options.
- The .epub file generated by Pandoc is industry standard and can be uploaded to other digital book marketplaces for sale. I uploaded mine to Apple Books using Apple’s tool “iTunes Producer” and I also uploaded it to the Kobo marketplace, but had no sales success there.
11. Conclusion
Overall I’m extremely happy with my choice to use LaTeX and Pandoc to generate my book. LaTeX gives me the pixel-perfect and “pretty” design and layout to satisfy my perfectionism, and Pandoc effortlessly generates an .epub file in the standard format. The layout of the print books came out so well I think anyone not in the publishing world would have a hard time recognizing my books are not published by a big name publisher.
Best of all, when it came to writing my second book, I had all the heavy lifting in place, and just needed to swap out the actual content to produce a result that is identical. The two books look fantastic side by side, which I’m really happy about.
I’m 100% certain Volume 3 will be produced the same way….. now I just need to have another adventure (it’s in the works!)
The Road Chose Me Volume 1: Two years and 40,000 miles from Alaska to Argentina
The Road Chose Me Volume 2: Three years and 54,000 miles around Africa
Are both available now in paperback and eBook format from Amazon:
Thanks for reading, I’m happy to answer any questions!
-Dan