Markdown to PDF: Making It a Painless Process
Markdown is so easy to write, and PDFs are so easy to pass around. So going from Markdown to PDF should be easy, right? Well, it took some doing, but here’s something that should help folks out with the process.
March 6, 2020
A Little History
Back in December of 2018, I wrote Convert Markdown to PDF with Pandoc and LaTeX, and showed how I got the Markdown to PDF process working, which saved the technical writers at my company hours and hours of work. Being able to whip through the process in seconds instead of days made life a whole lot easier.
But there were a few issues. As I said previously, I wasn’t happy with the headings. Our finished PDFs had a lot of vertical white space between headings and code blocks. Another weird issue was that LaTeX didn’t really have equivalents to HTML’s heading tags. Instead of <h1> and <h2>, LaTeX has section and subsection (with some other names for smaller headings). But they really only go down as far as <h5>.
While we could live with some of these idiosyncrasies, I was having trouble with a particular project. There was a character that just wouldn’t render, regardless of which font I was using. As part of the ensuing ruckus, I began to look for other ways to convert Markdown to PDF.
Convert Markdown to PDF in a New Way
What I found was a similar process, and the moving parts were things I knew oodles more about than LaTeX. It dawned on me one day that I could use Pandoc to get from Markdown to HTML, then use a CSS template (not LaTeX) and something called WeasyPrint to get from there to a finished PDF. So I charged off in that direction, and a week or so later had some beautiful results.
So how does one go about converting Markdown to PDF? Well, it’s a fairly straightforward operation. It’s a two-step process now though, so the easiest way I’ve found is to make a Markdown file, then run a script I wrote on it. That script prompts for some input, then executes a couple of commands. I make changes as I need to, but this is how the script sits at the moment…
The Markdown to PDF Script
#!/bin/bash
# Print a blank line, just to have a bit of a buffer between the command prompt and the script output.
echo " "
# List all of the files in the current directory that have a .md extension, without printing the .md part.
# (Stripping the suffix this way also copes with filenames that contain extra dots.)
for f in *.md; do echo "${f%.md}"; done
# Prompt for the file we want to convert. Copy a name from the list above and paste it here.
read -r -p "Filename: " filename
# Now the script will account for whether we want a Landscape or Portrait layout.
PS3='Please enter your choice: '
options=("Landscape" "Portrait")
select opt in "${options[@]}"
do
    case $opt in
        "Landscape")
            while true; do
                # This is what happens if we picked Landscape (it uses the style-landscape.css stylesheet).
                # $filename is whatever was typed at the prompt above.
                pandoc -s --template="templates/default.html" -f markdown-smart --toc -c style-landscape.css "$filename.md" -o "$filename.html"
                python3 -m weasyprint "$filename.html" "$filename.pdf"
                echo " "
                echo " "
                read -r -p "Press [Enter] key to make another PDF, or [Ctrl + C] to kill the script"
            done
            ;;
        "Portrait")
            while true; do
                # This is what happens if we picked Portrait (it uses the style-portrait.css stylesheet).
                pandoc -s --template="templates/default.html" -f markdown-smart --toc -c style-portrait.css "$filename.md" -o "$filename.html"
                python3 -m weasyprint "$filename.html" "$filename.pdf"
                echo " "
                echo " "
                read -r -p "Press [Enter] key to make another PDF, or [Ctrl + C] to kill the script"
            done
            ;;
        *) echo "invalid option $REPLY";;
    esac
done
# Either way, if we make a change to the Markdown, we can just hit Enter again to recreate the PDF,
# using all of the same options we selected before (filename, layout, etc.)
# When we're done, we can just hit Ctrl-c to kill the script.
There are comments in this Bash script that explain how it works. Essentially though, it spits out a list of Markdown files in the directory where we’re sitting, prompts us to copy and paste a filename, asks for a layout type, then uses Pandoc to make the appropriate HTML and finally hands that HTML to WeasyPrint to create the PDF.
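For anyone who would rather skip the prompts, the same two steps could be wrapped in a small function. This is only a sketch and not part of the template: the names stylesheet_for_layout and md_to_pdf are made up for illustration, and it assumes the same templates/default.html and stylesheet names that the script uses.

```bash
# Hypothetical helper: map a layout name to the stylesheet the script uses.
stylesheet_for_layout() {
    case "$1" in
        Landscape) echo "style-landscape.css" ;;
        Portrait)  echo "style-portrait.css" ;;
        *)         echo "unknown layout: $1" >&2; return 1 ;;
    esac
}

# Hypothetical helper: convert NAME.md to NAME.pdf with the given layout.
md_to_pdf() {
    local name="${1%.md}"   # accept either "README" or "README.md"
    local css
    css="$(stylesheet_for_layout "$2")" || return 1
    # Step 1: Markdown to HTML with Pandoc. Step 2: HTML to PDF with WeasyPrint.
    pandoc -s --template="templates/default.html" -f markdown-smart --toc \
        -c "$css" "$name.md" -o "$name.html" &&
    python3 -m weasyprint "$name.html" "$name.pdf"
}
```

With those two functions sourced into the shell, something like md_to_pdf README Portrait would run both steps in one go, and the WeasyPrint step only runs if Pandoc succeeded.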
I won’t post the whole CSS here, but I’ll run through some of the things that might make life easier for anyone trying to give this a spin. There’s a README.pdf that explains it all too, with examples.
The Markdown to PDF CSS Template
Declaring Fonts
Right up near the top of the file, we declare fonts. I’ve got the ones that IBM open sourced a couple years ago, Plex. There are sans (for most everything) and monospace (for code and preformatted text) fonts I’ve declared. But you can plug in any fonts you want there. Just make sure to grab the regular, bold, and italic version of each, and put them in the same directory that I’ve stuck the Plex fonts in. Then refer to them the same way I did.
Styling Different Pages
Below the fonts declaration, there’s a @page :first section. This is the cover page. There is a background image declared here (the FossFolks logo in the example PDF). I’ve set an image size of 450×300 and gotten it working for me. But if you want a different size image, you’ll have to finagle the margins, playing with them until the image sits where you want it to. Moving forward, on other PDFs, you should probably stick to the same sized images so you don’t have to keep dorking with CSS.
The next type of page that’s declared is @page no-chapter. This is the Table of Contents page, and is set up pretty much the same as the regular pages. Change things here (to do something like get rid of the logo and page number in the lower right maybe) if you want, and it won’t affect the rest of the document.
Next up is the plain @page. Anything after the Table of Contents page(s) in the PDF is affected. This looks about like the TOC does, but we can tweak here to alter the rest of the document.
At last there’s @page :blank. Honestly, I don’t quite recall what exactly this affects. When I figure it out I’ll update the README.
The remainder of the stylesheet should look familiar to anyone who knows CSS.
Table of Contents Problem
There was one wee little issue with the table of contents. When a list (the H3 headings are the list items) went over a page break, the items on the first page’s part got bumped up a bit.
The fix is to edit boxes.py. You’ll have to hunt for it, but it’s sitting in whichever directory WeasyPrint got installed into. Try this to find it:
sudo find / -name boxes.py
It should be somewhere like: /usr/local/lib/python3.x/dist-packages/weasyprint/formatting_structure (on a Linux machine), or /usr/local/lib/python3.x/site-packages/weasyprint/formatting_structure/ (on a Mac).
We’re looking for something in the vicinity of lines 320-350 of that file (which may change in future versions) that reads:
if (start or end) and old_style == self.style:
It essentially means “if something is equal to something else”, and we need it to say “if something is NOT equal to something else” instead. We do it by replacing the first of those equals signs with an exclamation point, like this:
if (start or end) and old_style != self.style:
Rendering the TOC should work fine after this change.
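If you’d rather not dig around with find and a text editor, the lookup and the one-character edit could be scripted. This is a rough sketch, assuming WeasyPrint is importable by the same python3 the script uses and that the line reads exactly as quoted above. The function names are mine, not WeasyPrint’s. The sed call keeps a boxes.py.bak backup next to the file so the change is easy to undo.

```bash
# Hypothetical helper: ask Python where WeasyPrint's boxes.py lives.
# Assumes the module path weasyprint.formatting_structure.boxes, as in the paths above.
find_boxes_py() {
    python3 -c 'import weasyprint.formatting_structure.boxes as b; print(b.__file__)'
}

# Hypothetical helper: flip == to != on the quoted line in the file given as $1.
# -i.bak edits in place and leaves the original behind as "$1.bak".
patch_boxes_py() {
    sed -i.bak \
        's/if (start or end) and old_style == self\.style:/if (start or end) and old_style != self.style:/' \
        "$1"
}

# Usage (may need sudo if WeasyPrint was installed system-wide):
#   patch_boxes_py "$(find_boxes_py)"
```

If the line isn’t found (say, in a newer WeasyPrint), sed leaves the file’s contents unchanged, so it’s worth checking the result with grep afterwards.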
The HTML Template
There really isn’t much else we have to mess with. In the templates directory you’ll find a default.html where we are able to customize some different things that show up. There’s a title, subtitle, author, email, and date, and we can see where those are showing up in a finished PDF. If we wanted to edit them though, to have something different showing up on the cover page, this is where we’d do it.
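One way those fields might get filled in is from a Pandoc YAML metadata block at the top of the Markdown file. That’s an assumption on my part about how default.html is wired up: it only works if the template refers to them as Pandoc template variables, and email is not one of Pandoc’s standard ones. So check the template before relying on this. A small sketch, with a made-up function name, that starts a new document with such a block:

```bash
# Hypothetical helper: create NAME.md starting with a Pandoc YAML metadata block.
# Arguments: name, title, subtitle, author, email. The date is filled in as today.
new_doc() {
    cat > "$1.md" <<EOF
---
title: "$2"
subtitle: "$3"
author: "$4"
email: "$5"
date: "$(date +%Y-%m-%d)"
---

# Introduction
EOF
}
```

Calling new_doc notes "My Notes" "A subtitle" "Jane Doe" "jane@example.com" would write notes.md, ready to be run through the conversion script.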
Markdown to PDF Summary
It’s a fairly slick process, especially considering where I started. The Pandoc and WeasyPrint communities were wicked helpful. Now, instead of fighting to squeeze a good looking PDF out (and having to really learn LaTeX), I can cough up a pretty cool looking PDF with some CSS, which is something I’m way more familiar with. And I’m hoping by sharing it here that I can save someone else from banging their head on a desk trying to figure it out starting from zero.
You can grab the template from this GitHub repository. Put it through its paces, and let me know how you make out.