How PDF.js Works
In this article, we walk you through how PDF.js works to render PDFs, what technologies it uses, and implications for projects using PDF.js.
23 Mar 2020
Brief: If you are looking to migrate from PDF.js, or if you’re just curious about other options, here are some of the best PDF.js alternatives to embed a PDF viewer in a website or an application.
In this article, you’ll learn about some of the key reasons why developers consider switching from PDF.js and how these factors may impact your decision-making, as well as which open-source libraries, open-source wrappers, and commercial SDKs you can try.
When evaluating an alternative to PDF.js, a good first step is to define what success looks like. This process typically starts by outlining the problems you are facing and what an expected solution will look like. From here you can define objectives and then work backwards to determine the steps needed to achieve your desired outcome.
But even when you go through this process, there can be some uncertainty about choosing the right alternative: what if there is a use case or factor you haven’t considered? Thankfully, a recent survey gives us a bit more insight into why businesses and developers move away from PDF.js.
The most common reason developers move away from PDF.js is the challenge faced when trying to add more features or customizations on top of PDF.js. In the previously mentioned survey, over 71% of respondents attempted to add custom features to PDF.js but eventually abandoned their efforts because of the complexity of the PDF spec and the resource time required (learn more about the challenges of customizing PDF.js). The top 5 most common features attempted were:
Adding custom features in a PDF.js viewer is no simple task. A good alternative to building and maintaining these features yourself would be to deploy PDF.js Express as your viewer. PDF.js Express wraps a customizable UI around the PDF.js rendering engine with 26 out-of-the-box annotations, form filling and e-signatures (see the demo). Keep in mind this solution provides the same performance and rendering accuracy as the vanilla PDF.js viewer.
A recent benchmark shows that PDF.js opened 98.6% of PDFs found in the wild. For simple PDFs like invoices or text-based documents, you can expect near-perfect reliability. If your workflows include MS Office documents and images, large architectural design or construction plans generated from CAD and BIM solutions, engineering PDFs generated from modelling software, or topographic maps with satellite raster backing, you can expect some reliability and performance issues (1-3% of documents failing to open).
Reliability can also be impacted by user behavior. If a user uploads or submits their own files into your workflow, you have likely experienced issues with corrupt PDFs (your viewer may throw an exception and close the document down). This scenario can be especially frustrating as your viewer is perceived as unreliable when in fact it is the user’s corrupt document that has caused the issue.
If reliability is a priority, then it’s important to look for an alternative viewer that has PDF tiling, linearization and parallelization to improve rendering performance. If your workflows encounter corrupt PDFs, look for a viewer that can repair document content on the fly.
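For user-uploaded files, one inexpensive guard is a structural sanity check before the bytes ever reach the viewer, so a broken upload can be reported as a file problem rather than a viewer crash. The sketch below is a minimal illustration, assuming the file is already in memory as a `Uint8Array`. `findPdfProblem` and its helpers are hypothetical names, not part of PDF.js, and the 1024-byte search window is an assumption based on how lenient readers commonly behave. It only looks for the `%PDF-` header and the `%%EOF` marker; it does not detect every kind of corruption and it does not repair anything.

```typescript
// Hypothetical helpers (not part of PDF.js): a cheap pre-check for uploads.

// Convert an ASCII string to bytes (used here only to build sample input).
function toBytes(text: string): Uint8Array {
  const out = new Uint8Array(text.length);
  for (let i = 0; i < text.length; i++) {
    out[i] = text.charCodeAt(i) & 0xff;
  }
  return out;
}

// Find an ASCII marker inside bytes[from, to). Returns -1 when absent.
function indexOfMarker(bytes: Uint8Array, marker: string, from: number, to: number): number {
  const start = Math.max(0, from);
  const lastStart = Math.min(bytes.length, to) - marker.length;
  for (let i = start; i <= lastStart; i++) {
    let match = true;
    for (let j = 0; j < marker.length; j++) {
      if (bytes[i + j] !== marker.charCodeAt(j)) {
        match = false;
        break;
      }
    }
    if (match) {
      return i;
    }
  }
  return -1;
}

// Returns null when the basic structure looks plausible,
// otherwise a short human-readable reason.
function findPdfProblem(bytes: Uint8Array): string | null {
  const SEARCH_WINDOW = 1024; // assumption: markers sit near the file edges
  if (indexOfMarker(bytes, "%PDF-", 0, SEARCH_WINDOW) === -1) {
    return "missing %PDF- header";
  }
  if (indexOfMarker(bytes, "%%EOF", bytes.length - SEARCH_WINDOW, bytes.length) === -1) {
    return "missing %%EOF marker (file may be truncated)";
  }
  return null;
}
```

A caller could run `findPdfProblem` on each upload and show the returned reason to the user instead of opening the document. Files that pass this check can still fail to open, so the viewer's own error handling remains necessary.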
10.5% of survey respondents cited PDF.js rendering issues as the main reason for looking for an alternative. In fact, 29% of open issues on GitHub for PDF.js are related to a PDF rendering feature. The cause for these rendering issues was recently investigated in a guide to evaluating PDF.js rendering, which outlines the key factors that result in poor rendering in the PDF.js viewer. Font conversion, image conversion, transparencies, patterns & gradients, OCG layers, color management & zoom are all listed as sources of rendering issues within PDF.js.
[Figure: side-by-side comparison of incorrect rendering and correct rendering]
In certain scenarios, rendering accuracy is especially important: for example, measuring blueprints on a job site, collaborating on advertising or branding documents, or printing to a particular specification. For these use cases, successfully finding an alternative viewer hinges on your ability to assess the underlying rendering engine. Start by asking if the rendering engine is built internally; some commercial solutions build features on top of an open-source engine, in which case rendering accuracy will stay largely the same. In other solutions, pages are converted to images on the server before display, resulting in fixed resolutions and mounting infrastructure costs.
A good practice when evaluating a rendering engine is to test a wide assortment of documents. Select a random assortment from your workflows (the more complex the documents, the better) and combine them with public test suite documents. You can find one such test suite released by the Ghent Workgroup, an international body of graphic designers supported by Adobe.
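The selection step can be sketched in a few lines. This is only an illustration of the practice described above, assuming documents are identified by file name. `buildTestCorpus` and the sample file names are hypothetical, and the optional `random` parameter exists only so that a sample can be reproduced.

```typescript
// Hypothetical helper: draw a random sample of your own workflow documents
// and combine it with every document from a public test suite.
function buildTestCorpus(
  workflowDocs: string[],
  suiteDocs: string[],
  sampleSize: number,
  random: () => number = Math.random
): string[] {
  // Fisher-Yates shuffle on a copy, so the caller's array is left untouched.
  const pool = workflowDocs.slice();
  for (let i = pool.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    const tmp = pool[i];
    pool[i] = pool[j];
    pool[j] = tmp;
  }
  const corpus = pool.slice(0, Math.max(0, sampleSize));

  // Always include the full public suite, skipping names already sampled.
  for (const doc of suiteDocs) {
    if (corpus.indexOf(doc) === -1) {
      corpus.push(doc);
    }
  }
  return corpus;
}

// Example with made-up file names.
const corpus = buildTestCorpus(
  ["invoice.pdf", "blueprint.pdf", "brochure.pdf", "map.pdf"],
  ["suite-transparency.pdf", "suite-fonts.pdf"],
  2
);
console.log(corpus.length); // 2 sampled documents + 2 suite documents
```

Each document in the resulting list would then be opened in every candidate viewer and compared against a reference rendering.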
As your application grows, you can expect user requirements to increase in parallel. One of the most common requirements for document viewers is the ability to accept different file types. The most popular include MS Office (Word, Excel, PPT), image files (JPG, PNG, TIFF, etc.) and, in some cases, CAD formats.
With PDF.js, one option is to use an external solution to convert other file types to PDF on the fly and then display the result in the viewer. Alternatively, you could go with an all-in-one viewer that handles both file conversion and viewing.
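The convert-then-view approach usually starts with a routing decision based on file type. The sketch below is a minimal illustration under stated assumptions: `planForFile`, the extension list, and the `/convert` endpoint with its `src` query parameter are all hypothetical, and a real conversion service will define its own API. Detecting type by extension is also a simplification; production code would normally inspect the file content or MIME type as well.

```typescript
// Hypothetical routing helper: decide whether a file can be shown directly
// in a PDF viewer, must be converted to PDF first, or is not supported.
interface ViewerPlan {
  action: "view" | "convert" | "reject";
  url: string | null; // URL to load in the viewer, or null when rejected
}

// Assumed list of formats the (hypothetical) conversion service accepts.
const CONVERTIBLE_EXTENSIONS = [
  "doc", "docx", "xls", "xlsx", "ppt", "pptx",
  "jpg", "jpeg", "png", "tif", "tiff",
];

// Lower-cased extension of the last path segment, ignoring query and fragment.
function extensionOf(fileUrl: string): string {
  const path = fileUrl.split(/[?#]/)[0];
  const name = path.slice(path.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  return dot === -1 ? "" : name.slice(dot + 1).toLowerCase();
}

function planForFile(fileUrl: string, convertEndpoint: string): ViewerPlan {
  const ext = extensionOf(fileUrl);
  if (ext === "pdf") {
    return { action: "view", url: fileUrl };
  }
  if (CONVERTIBLE_EXTENSIONS.indexOf(ext) !== -1) {
    // The conversion service is assumed to return a PDF for the given source.
    return { action: "convert", url: convertEndpoint + "?src=" + encodeURIComponent(fileUrl) };
  }
  return { action: "reject", url: null };
}
```

Keeping the decision in one function makes it easy to extend the supported list later, for example when CAD formats are added.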
Key considerations when supporting additional file types:
When choosing a PDF.js alternative, you have three options to consider: open-source, open-source wrappers, and commercial SDKs.
One solution developers might consider when evaluating alternatives to PDF.js is PDFium, Google’s native open-source PDF library and Chrome’s PDF rendering engine. One important note is that PDFium does not provide out-of-the-box client-side rendering in a web app, and enabling it requires significant developer resources.
When comparing PDF.js to PDFium, there has been some evidence that rendering performance and accuracy improve with PDFium. In 2017, Dropbox made the switch from PDF.js to a server-side PDFium solution. Their lead developer Jingsi Zhu explains the results:
“The render quality of PDFium surpasses PDF.js on many documents, especially those that use obscure PDF features.”
To gain these performance improvements, Dropbox was willing to allocate developer resources to building a custom PDFium viewer from scratch. This involved customizing text extraction and positioning in PDFium, which Jingsi Zhu noted was trickier than in PDF.js. For example, they encountered challenges accurately drawing text overlays, which required careful study of the PDF standard. To improve performance, they employed multiple optimization techniques, including batching requests for metadata and text, rendering more pages than currently visible, and deferring text overlay rendering.
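To make one of these techniques concrete, here is a minimal sketch of rendering more pages than are currently visible. It is not Dropbox's implementation; `planRenderTasks`, the `buffer` parameter, and the priority labels are hypothetical. Visible pages are queued first, then neighbouring pages are added outward, nearest first, so that scrolling in either direction is likely to hit an already rendered page.

```typescript
// Hypothetical render planner: visible pages first, then nearby pages.
interface RenderTask {
  page: number; // 1-based page number
  priority: "visible" | "prefetch";
}

function planRenderTasks(
  firstVisible: number,
  lastVisible: number,
  pageCount: number,
  buffer: number // how many extra pages to render on each side
): RenderTask[] {
  const tasks: RenderTask[] = [];

  // Pages the user can see right now always come first.
  for (let page = firstVisible; page <= lastVisible; page++) {
    tasks.push({ page: page, priority: "visible" });
  }

  // Then walk outward one step at a time, staying inside the document.
  for (let distance = 1; distance <= buffer; distance++) {
    const after = lastVisible + distance;
    const before = firstVisible - distance;
    if (after <= pageCount) {
      tasks.push({ page: after, priority: "prefetch" });
    }
    if (before >= 1) {
      tasks.push({ page: before, priority: "prefetch" });
    }
  }
  return tasks;
}

// Pages 3-4 visible in a 10-page document, with 2 extra pages on each side.
const order = planRenderTasks(3, 4, 10, 2).map(function (task) { return task.page; });
console.log(order.join(",")); // 3,4,5,2,6,1
```

The same queue could also drive deferred text overlays: for example, a viewer might draw overlays only for tasks marked `visible` and postpone the rest until the page scrolls into view.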
Another consideration is that integrating PDFium can be problematic for web applications. PDFium is a native library, and its C++ code must be compiled to WebAssembly or asm.js in order to render PDFs client-side, a massive undertaking. If you are currently using PDF.js within a mobile app, switching to a client-side PDFium solution would require integrating nearly 500,000 lines of native code into your codebase.
PDFium has also had its share of security issues, with 88 major vulnerabilities reported within the Common Vulnerabilities and Exposures (CVE) database - roughly 1.5 new vulnerabilities each month.
These vulnerabilities are especially important if you use a commercial SDK that leverages PDFium. In this scenario, patching your application directly is not possible. Those who embed a library that uses PDFium will first have to wait for their vendor to update the library, and then update their own code. Depending on how responsive the SDK vendor and the customer are in patching their solutions, this can leave an organization and its users exposed to possible attacks, potentially for months or even years at a time.
SDKs that wrap features around an open-source rendering engine:
Founded in 2011, PSPDFKit developed their solution around the PDFium open-source rendering engine. A fully remote company based in Austria, they provide a wide variety of features within a closed-source viewer that can be customized via API calls.
Launched in 2019, PDF.js Express wraps a customizable viewer around the PDF.js rendering engine. Key features include annotations, form filling and e-signatures (see full PDF.js vs PDF.js Express comparison). The viewer UI is open-source and allows unlimited customizability.
SDKs that provide custom-built rendering engines:
Founded in 1998, PDFTron was recently ranked the top choice for those considering a commercial PDF SDK by Stax. Located in Vancouver, BC, PDFTron SDK offers a wide variety of features and cross-platform compatibility. The company has recently moved into the AI document analysis and recognition space. They also offer a free document application Xodo that has been downloaded over 11 million times on the Google Play Store.
Founded in 2001, Foxit is a China-based company that offers a robust feature set across a variety of platforms. As the original developer of PDFium, Foxit worked with Google in 2014 to open-source its proprietary rendering engine. Along with the SDK, the company currently has several B2C products including Phantom PDF, Studio Photo, and Foxit Reader.
Datalogics is an enterprise software company formed in 1967 and the official reseller of the Adobe PDF library. In 2010, Adobe selected Datalogics to resell the Adobe Reader Mobile SDK. Datalogics also offers a PDF Java Toolkit and a customized eReader (DL Reader) for iOS, Android, and Windows.
Finding the right PDF.js alternative is largely dependent on your needs and objectives. If you are primarily dealing with simple documents like invoices or receipts and need to add features like annotations, then PDF.js Express could be a good first step. If you have workflows with complex documents, require reliable and fast rendering, or need to support additional document file formats, then a commercial SDK might be your best option. If you are considering a commercial SDK, we've put together a guide on how to choose a PDF.js alternative to help you with your evaluation.