How to efficiently transfer HTML to PDF
Converting HTML to PDF.
When converting HTML with CSS to PDF, people often run into problems, whether with font rendering, CSS floats, element positioning, or memory usage on the server. The main question is usually: how do I need to write CSS for PDF, and are there hidden features that will make everything work?
Well, the answer is that there are no shortcuts. Some libraries will do most of the job for you if you keep the HTML simple, but once you need something more complicated you will probably see differences in rendering, conversion time, memory usage and so on.
People often use converters on the server side because more libraries are available there, and because it is easier to store the data directly on the server or to send additional headers that display the PDF in the browser. There aren’t many client-side converters. Since conversion is expensive, if you are using simple HTML it may be better to use a client-side script to free up server resources. In this article we will only examine server-side converters.
Let’s get started.
We will examine conversion time, memory consumption, and visual output for a couple of HTML renderers.
We are going to use 3 templates in this example. One is fairly simple, and the other two are more complex in terms of code structure, CSS, or the amount of data that needs to be converted. We rate the results on a scale from 1 to 5, where 1 is a poor result and 5 is an excellent one. All tests were run several times on the same machine (running Linux), and the average was taken for each group of data. We only evaluate free, open source PDF converters.
You can preview the HTML templates that we used in our tests:
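The "run several times and average" procedure can be sketched as a small harness. This is a minimal sketch, not the commands used for the article: the function names `measure_once` and `benchmark` are made up for illustration, it assumes Linux (where `ru_maxrss` is reported in kilobytes) and Python 3.9 or newer, and the reported peak memory may include a small baseline inherited from the launching process.

```python
import os
import statistics
import subprocess
import sys
import time


def measure_once(cmd):
    """Run cmd once; return (wall-clock seconds, peak RSS in MB) for that child."""
    start = time.perf_counter()
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    # os.wait4 reaps this specific child and returns its resource usage.
    _, status, usage = os.wait4(proc.pid, 0)
    elapsed = time.perf_counter() - start
    # Record the exit code so the Popen object knows the child is already reaped.
    proc.returncode = os.waitstatus_to_exitcode(status)
    if proc.returncode != 0:
        raise RuntimeError(f"command failed with exit code {proc.returncode}")
    # Assumption: Linux, where ru_maxrss is in kilobytes.
    return elapsed, usage.ru_maxrss / 1024


def benchmark(cmd, runs=5):
    """Repeat the measurement and average time and peak memory over all runs."""
    samples = [measure_once(cmd) for _ in range(runs)]
    return {
        "runs": runs,
        "mean_seconds": statistics.mean(t for t, _ in samples),
        "mean_peak_mb": statistics.mean(m for _, m in samples),
    }


if __name__ == "__main__":
    # Placeholder command; swap in the converter you want to measure, e.g.
    # ["wkhtmltopdf", "simple.html", "simple.pdf"] (hypothetical file names).
    print(benchmark([sys.executable, "-c", "pass"], runs=3))
```

The PHP-based converters could be measured the same way by passing a command that runs a PHP script, which keeps the comparison on equal footing.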
Simple HTML
Complex HTML
Long HTML
Installation
DOMPDF:
(OS: universal, runs in PHP) Very easy installation (under one minute). It uses Composer to download and install dependencies. A zip file is also available if you don’t want to use Composer. It is a mostly CSS 2.1 compliant HTML layout and rendering engine written in PHP. It’s a style-driven renderer: it will download and read external stylesheets, inline style tags, and the style attributes of individual HTML elements. It also supports most presentational HTML attributes.
wkhtmltopdf:
(OS: Windows, macOS, Linux, SmartOS, OpenBSD). Easy download and installation, under a minute. It uses the Qt WebKit rendering engine. According to the project: “These run entirely ‘headless’ and do not require a display or display service.” In this demonstration we are going to use the 64-bit version.
mPDF:
(OS: universal, runs in PHP). Easy download and installation, under a minute. It uses Composer to download and install dependencies. mPDF is a PHP class which generates PDF files from UTF-8 encoded HTML. It is based on FPDF and HTML2FPDF, with a number of enhancements.
TCPDF:
(OS: universal, runs in PHP). Easy download and installation, under a minute. No dependencies are required to convert simple HTML. NOTE: There is a new version of TCPDF under development, but I had difficulties running that development version.
Testing:
Getting into details
As the results show, if you are going to generate PDFs from simple HTML, for example simple invoices, you will probably prefer mPDF. It produced excellent results with 16 MB of memory. wkhtmltopdf did a decent job too, generating the PDF in half the time of mPDF but using twice as much memory. DOMPDF and TCPDF would require HTML optimization to get the desired results.
For complex HTML we definitely recommend wkhtmltopdf. Although it uses a large amount of memory, the results were excellent. Running it as a separate service may be a good choice, but that depends on how often you expect to generate complex PDFs. It also took some time to generate the PDF, around 18 seconds, but if that’s the cost of an excellent result, I’ll take it. mPDF struggled to render column sizes correctly and lacks some CSS details, but if you don’t need them you can live with mPDF too, because it used only 23 MB of memory compared to 108 MB for wkhtmltopdf. The other libraries are not worth mentioning here.
In the long example we used fairly simple HTML, enough to generate around 15 pages. People often need to generate large but simple PDFs, for example a database export. Once again wkhtmltopdf and mPDF delivered good results. wkhtmltopdf won on both speed and memory, taking only 654 ms and 21 MB of RAM to generate the PDF, while mPDF took about 7 seconds and more memory, around 80 MB. Both delivered excellent results. TCPDF looked as if it would run forever, but after around 2 minutes of execution it delivered good results using only 6 MB of RAM. That is simply too long for anyone to wait, and it looks almost comical next to roughly half a second for wkhtmltopdf. DOMPDF threw an out-of-memory exception, which, judging by Stack Overflow, is a common problem with DOMPDF.
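To try a comparable "large but simple" input yourself, a long table can be generated with a few lines of code. This is only a sketch with placeholder data: the function name `build_long_html`, the column names, and the row contents are invented for illustration and are not the template used in the test. How many rows are needed to reach about 15 pages depends on the converter and page settings, so the row count is left as a parameter.

```python
import html


def build_long_html(row_count):
    """Return a simple HTML document containing one table with row_count data rows."""
    columns = ("ID", "Name", "Office")  # placeholder column names
    head = "".join(f"<th>{html.escape(c)}</th>" for c in columns)
    body_rows = []
    for i in range(1, row_count + 1):
        # Placeholder cell values standing in for a database export.
        cells = (str(i), f"Person {i}", f"Office {i % 7}")
        tds = "".join(f"<td>{html.escape(c)}</td>" for c in cells)
        body_rows.append(f"<tr>{tds}</tr>")
    rows = "\n".join(body_rows)
    return (
        '<!DOCTYPE html>\n<html><head><meta charset="utf-8">'
        "<title>Long export</title></head><body>\n"
        f'<table border="1"><thead><tr>{head}</tr></thead>\n'
        f"<tbody>\n{rows}\n</tbody></table>\n"
        "</body></html>\n"
    )


if __name__ == "__main__":
    # Write a sample file; "long.html" is a hypothetical output name.
    with open("long.html", "w", encoding="utf-8") as fh:
        fh.write(build_long_html(600))
```

Keeping the markup this plain makes it easier to attribute differences in time and memory to the amount of data rather than to CSS features.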
It’s worth mentioning that wkhtmltopdf has some other features, such as running JavaScript, and it exposes the page number so that you can inject it into an HTML element of your choice. You can also provide headers and footers, just like in the TCPDF library. wkhtmltopdf gives you even more variables to play with.
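As an illustration of these options, the sketch below builds a wkhtmltopdf command line with a page-number footer, an optional HTML header, and an optional JavaScript delay. The helper name `build_wkhtmltopdf_command` and the file names are hypothetical. The flags `--footer-center`, `--header-html`, `--javascript-delay` and the `[page]` / `[topage]` placeholders come from the wkhtmltopdf command-line documentation, but header and footer support depends on the build you installed, so treat this as a starting point to verify against `wkhtmltopdf --help`.

```python
import shlex


def build_wkhtmltopdf_command(source, output, header_html=None, js_delay_ms=None):
    """Build an argument list for wkhtmltopdf with page numbers in the footer."""
    cmd = ["wkhtmltopdf", "--quiet"]
    # [page] and [topage] are replaced by wkhtmltopdf with the current
    # page number and the total page count.
    cmd += ["--footer-center", "Page [page] of [topage]"]
    if header_html is not None:
        # Path or URL of an HTML file rendered as the header of every page.
        cmd += ["--header-html", header_html]
    if js_delay_ms is not None:
        # Give scripts on the page this many milliseconds to finish.
        cmd += ["--javascript-delay", str(js_delay_ms)]
    cmd += [source, output]
    return cmd


if __name__ == "__main__":
    # Hypothetical file names; print the command instead of running it.
    command = build_wkhtmltopdf_command(
        "complex.html", "complex.pdf", header_html="header.html", js_delay_ms=500
    )
    print(shlex.join(command))
```

Passing the result to `subprocess.run(command, check=True)` would execute the conversion, assuming wkhtmltopdf is on the `PATH`.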
Conclusion
Although results vary in many ways, it’s safe to say that mPDF and wkhtmltopdf will get you where you need to be quickly and with good results. With DOMPDF it will take some time to, for example, add fonts to the PDF and to adjust the HTML until the PDF looks good. We used DOMPDF for some simple renderings, but we quickly switched to wkhtmltopdf for more complex cases. Of course there may be better PDF converters out there, but judging by a Google search, these 4–5 converters are the ones that come up most often in discussions.
Credits:
https://www.nextstepwebs.com/open-source/invoice (simple example)
https://datatables.net/examples/basic_init/scroll_y.html (Long table)
Stack Overflow, for the commands used to measure the time and memory usage of wkhtmltopdf
For more information, see: http://plavatvornica.com/how-to-efficiently-transfer-html-to-pdf/