How PDF.js Works
In this article, we walk you through how PDF.js works to render PDFs, what technologies it uses, and implications for projects using PDF.js.
15 May 2020
PDF.js is a JavaScript library that renders PDFs in the browser using the HTML5 <canvas> element. Web Workers handle a lot of the processing.
The usual place to encounter PDF.js is as Firefox’s default PDF reader. But as PDF.js works almost anywhere, it can be found in many websites and applications, where it is used to view mostly small and straightforward PDFs such as invoices and reports.
Slack, for example, uses PDF.js to view PDFs from directly within its application. PDF.js also features within Zenodo.org, an open science database, where PDF.js is used to view PDF reports, presentations, and datasets. Dropbox formerly used PDF.js to power its document previews, building annotations on top, before switching to an alternative.
“We intentionally kept our scope narrow: display and text selection support for small PDF files.”
PDF.js was originally launched in 2011 as an HTML5 technology experiment by the Mozilla Foundation, spearheaded by former Mozilla CTO Andreas Gal and Mozilla developer Chris Jones, and supported by Mozilla Labs.
Gal wrote that with literally billions of PDFs floating around on the web, Mozilla wanted a way to render these directly in the browser securely.
Formerly, users would default to their desktop readers -- or use security-challenged plugins, like Silverlight or Acrobat.
PDF.js was later integrated into Mozilla Firefox, first as an extension and then as the built-in PDF viewer for versions 19+.
However, since integrating PDF.js into Firefox, Mozilla seems to have lost interest and moved on.
PDF.js creators Andreas Gal and Chris Jones have since left Mozilla. Mozilla also briefly considered replacing PDF.js in Firefox circa 2016 via Project Mortar, which was suspended after it was considered too time-intensive to support and maintain.
“…PDFs are not a fundamental part of the web. Therefore, Mozilla would like to not have to spend a lot of effort supporting them. …despite a lot of effort [PDF.js] still has two significant shortcomings: it is a little on the slow side, and it doesn't support all PDF functionality, including form-filling. Fixing those shortcomings would be a lot of work. (PDF is a huge format.)”
An open-source community with about 300 known contributors and a handful of very active contributors now supports and maintains PDF.js. These volunteers have gradually extended PDF.js to include new viewer display features, more PDF rendering features, and so on.
As a result, PDF.js has gotten incrementally better than it was at launch. And it compares favorably today with other open-source libraries such as Google and Foxit’s PDFium in terms of security and initialization time due to a more compact PDF.js web package.
Additionally, unlike some other libraries, PDF.js supports document streaming, enabling files optimized for fast web view to display for the user almost instantly -- without having to wait on the entire file to download first, which with some files, can take minutes.
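To make the streaming idea concrete, here is a minimal sketch of how a viewer might split a file into byte ranges for HTTP Range requests, so the bytes needed for the first visible page can be fetched before the rest of the file. The helper names `chunkRanges` and `rangeHeader` and the 64 KiB chunk size are assumptions for illustration only; this is not PDF.js's internal code.

```typescript
// Hypothetical helper: split a file of `fileLength` bytes into fixed-size
// byte ranges suitable for HTTP Range requests ("bytes=begin-end", inclusive).
type ByteRange = { begin: number; end: number };

function chunkRanges(fileLength: number, chunkSize: number = 65536): ByteRange[] {
  if (fileLength < 0 || chunkSize <= 0) {
    throw new RangeError("fileLength must be >= 0 and chunkSize must be > 0");
  }
  const ranges: ByteRange[] = [];
  for (let begin = 0; begin < fileLength; begin += chunkSize) {
    // `end` is inclusive, matching the HTTP Range header convention.
    const end = Math.min(begin + chunkSize, fileLength) - 1;
    ranges.push({ begin, end });
  }
  return ranges;
}

// Format one range as the value of an HTTP Range request header.
function rangeHeader(range: ByteRange): string {
  return `bytes=${range.begin}-${range.end}`;
}
```

A loader built this way could request the ranges holding the first page before the others, which is why files optimized for fast web view can appear almost immediately.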
PDFs can then be loaded into the PDF.js viewer via a web server.
Developers and integrators also embed PDF.js to enable PDF rendering in a web application or server, used to pass images and text down to the client.
They may also build a custom UI on top of PDF.js and extend it with functionality not supported by the open-source project, such as annotations, form filling, and e-signatures.
Check out these guides to learn more about quick & easy ways to use PDF.js:
Some relatively simple features you can quickly add to PDF.js include:
The PDF.js project is available on GitHub, a popular software development platform owned by Microsoft.
The prebuilt version is available for a fast deployment -- or one can get the source code for deeper customization. (The PDF.js community asks not to use the prebuilt viewer UI “as is” without restyling it.)
Inside the download package, PDF.js is organized into three layers:
Core - In charge of parsing and interpreting PDF binary instructions for the browser.
Display - An API exposing functions to render PDFs into a viewport.
Viewer - A sample user interface that supports basic viewer features like search, rotate, zoom, a page thumbnail sidebar, different view modes, and so on.
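As a rough illustration of how the Display layer is typically called, the sketch below renders the first page of a document to a canvas. The calls `getDocument(...).promise`, `getPage`, `getViewport({ scale })`, and `render({ canvasContext, viewport }).promise` follow the documented pdfjs-dist API. The local `...Like` interfaces and the function name `renderFirstPage` are assumptions, declared here so the example stays self-contained and can be exercised with a stand-in library object.

```typescript
// Minimal structural types covering only the Display-layer calls used below.
// They are local stand-ins, not the library's own type definitions.
interface ViewportLike { width: number; height: number }
interface PageLike {
  getViewport(params: { scale: number }): ViewportLike;
  render(params: { canvasContext: unknown; viewport: ViewportLike }): { promise: Promise<void> };
}
interface DocumentLike { numPages: number; getPage(pageNumber: number): Promise<PageLike> }
interface PdfLibLike { getDocument(src: string): { promise: Promise<DocumentLike> } }
interface CanvasLike { width: number; height: number; getContext(kind: "2d"): unknown }

// Hypothetical helper: load a PDF, size the canvas to page 1, and draw it.
async function renderFirstPage(
  lib: PdfLibLike,
  url: string,
  canvas: CanvasLike,
  scale: number = 1.5
): Promise<ViewportLike> {
  const pdf = await lib.getDocument(url).promise; // parsing happens in the Core layer
  const page = await pdf.getPage(1);              // page numbers start at 1
  const viewport = page.getViewport({ scale });
  canvas.width = Math.floor(viewport.width);
  canvas.height = Math.floor(viewport.height);
  await page.render({ canvasContext: canvas.getContext("2d"), viewport }).promise;
  return viewport;
}
```

In a browser, `lib` would be the imported pdfjs-dist module and `canvas` a real canvas element. The Viewer layer wraps calls like these with its toolbar, sidebar, and search UI.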
The Mozilla team confronted a few challenges when writing PDF.js.
First was a PDF spec over 1,000 pages long -- second, a browser <canvas> element that did not support all of the PDF graphics model.
As noted by PDF.js creator Andreas Gal and others on the Mozilla dev team, canvas did not support certain PDF patterns, gradients, and transparencies. Mozilla had to develop its own workarounds and extensions.
Additionally, since the canvas part of HTML was not intended to support interactive text, the team would have to design a text overlay to enable text select, text search, and copy/paste, as well as accessibility features for applications such as text-to-speech.
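The text overlay idea can be sketched as a coordinate mapping: each run of text gets an invisible, selectable element positioned over the pixels the canvas drew. The version below is deliberately simplified and is not PDF.js's text layer code. The `TextItem` shape and the function name `textSpanStyle` are assumptions, and the sketch ignores rotation, skew, and font ascent, which a real implementation handles with a full transform matrix.

```typescript
// Hypothetical, simplified text item: baseline origin and font height in PDF points.
interface TextItem { str: string; x: number; y: number; fontHeight: number }
interface SpanStyle { left: number; top: number; fontSize: number }

// Map a text item from PDF space (origin bottom-left, y pointing up)
// to CSS pixels over the canvas (origin top-left, y pointing down).
function textSpanStyle(item: TextItem, pageHeightPt: number, scale: number): SpanStyle {
  const fontSize = item.fontHeight * scale;
  return {
    left: item.x * scale,
    // Flip the y axis, then move up from the baseline by one font height.
    top: (pageHeightPt - item.y) * scale - fontSize,
    fontSize,
  };
}
```

Because the overlay is positioned separately from the drawn glyphs, any mismatch between the two (font metrics, spacing, transforms) shows up as selection highlights that do not line up with the visible text.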
Implementing fast, high-quality printing via PDF.js would also be a challenge as canvas would pass PDF information to the printer as images rather than vectors.
To compensate for these canvas limitations, the PDF.js community considered developing a Scalable Vector Graphics (SVG) backend for PDF.js. The SVG backend today is not as developed, accurate, or fast as the canvas backend, particularly for complex documents. And thus most developers use PDF.js with canvas.
In more detail, challenges within PDF.js today include:
Rendering Accuracy - PDF.js does not support the full PDF specification and some of its support for rendering features is incomplete. Areas with multiple open support issues on GitHub or outstanding feature requests include:
Check out this article on PDF.js rendering issues for a deep dive including potential implications for a project.
OCGs - PDF.js does not support Optional Content Groups (OCGs), an open feature request on the PDF.js GitHub since July 2011. OCGs are commonly used to enable toggleable visual layers in documents such as maps, multilingual documents, and technical drawings. Without OCG support, PDFs may render with missing or inaccessible information, or they may render very slowly because layers normally switched off by default (such as satellite images in elevation maps) are rendered anyway.
Text Select/Search - Text select is the leading support issue on the PDF.js GitHub with 90+ open issues as of writing.
These issues largely stem from how the PDF.js rendering engine draws the text overlay. Text search is also unreliable, particularly when searching phrases with extra white spaces or a line break between words.
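One common mitigation for whitespace-sensitive search, shown here as a sketch rather than as PDF.js's own behavior, is to normalize both the extracted page text and the query before matching. The function names `normalizeForSearch` and `pageContains` are hypothetical.

```typescript
// Collapse every run of whitespace (spaces, tabs, line breaks) to one space
// and lowercase the result, so matching ignores layout and case.
function normalizeForSearch(text: string): string {
  return text.replace(/\s+/g, " ").trim().toLowerCase();
}

// Return true when `query` occurs in `pageText` after normalizing both.
// An empty or whitespace-only query never matches.
function pageContains(pageText: string, query: string): boolean {
  const needle = normalizeForSearch(query);
  return needle.length > 0 && normalizeForSearch(pageText).indexOf(needle) !== -1;
}
```

This only reports whether a phrase is present. Highlighting the match would additionally require mapping positions in the normalized string back to the original text items.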
Image Quality at High Zoom - Users have reported difficulty reading small text or measuring within large and complex documents with PDF.js due to blurriness, apparent at zoom magnification factors of 400%+.
These issues stem from PDF.js stretching an already rendered image rather than redrawing the vector content at the new zoom level, due to the absence of canvas ‘tiling’, an open feature request on GitHub since September 2015.
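The arithmetic behind this blurriness can be sketched as follows. Without tiling, the whole page is rasterized to one canvas, and a canvas has a practical pixel budget. Once the requested zoom exceeds that budget, the page can only be drawn at a lower scale and stretched. The function name `rasterScale` and the 16,777,216-pixel budget are assumptions for illustration. The sketch also treats one PDF point as one pixel at scale 1 and ignores device pixel ratio.

```typescript
// Hypothetical helper: given a page size in PDF points and a requested zoom,
// return the largest scale that fits a single canvas within `maxPixels`.
// Any zoom beyond the returned value has to be achieved by stretching.
function rasterScale(
  widthPt: number,
  heightPt: number,
  requestedScale: number,
  maxPixels: number = 16777216
): number {
  const pixels = widthPt * requestedScale * heightPt * requestedScale;
  if (pixels <= maxPixels) {
    return requestedScale;
  }
  // Largest s such that (widthPt * s) * (heightPt * s) <= maxPixels.
  return Math.sqrt(maxPixels / (widthPt * heightPt));
}
```

Under these assumptions, a US Letter page (612 x 792 points) still fits the budget at 400%, but a 36 x 24 inch drawing (2592 x 1728 points) requested at 400% would be rasterized at a little under 2x and then stretched. Tiling avoids this by rendering only the visible region at full resolution.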
Reliability - According to one analysis, a small fraction of documents (1-3%) will crash the PDF.js viewer or idle it indefinitely, either due to the PDF being corrupt or too complex.
Older Browsers - While PDF.js offers some cross-browser support, the community states within the FAQ that it cannot support all browsers due to many present and outstanding issues. They will support the latest versions of Chrome, Firefox, or Edge, for example, but not older versions of Internet Explorer (e.g., IE9/10), Safari, or mobile browsers.
Check out our article on what browsers are supported by PDF.js for a deep dive.
Performance - While it initializes very quickly, PDF.js's overall performance is not the best according to many sources, including Mozilla. In one survey, 21% of the 57 organizations that responded cited slow performance as their primary reason for seeking an alternative to PDF.js, which they had tried but found could not meet their UX requirements. PDF.js is also reported to struggle with graphics-heavy PDFs, according to another benchmark published on hacks.mozilla.org.
PDF.js also offers limited support for features that would smooth out the reading experience on mobile.
For example: the PDF.js viewer in presentation mode does not react to gestures such as press, swipe, pinch zoom, etc. -- functions deemed outside the scope of the project. Mobile pinch zoom in general has remained an open feature request on the PDF.js GitHub since January 2013.
PDF.js currently has a dark theme. But it does not support a night mode, another feature request that has been open for years, because the PDF.js viewer text and background colors cannot be changed.
A challenge for those developing on top of PDF.js is extending it with features such as annotations, form filling, and signatures. These features are not supported out-of-the-box with PDF.js, which is why PDF.js Express was created, to simplify the addition of these features to a PDF.js viewer.
According to the previous survey, 57 unique organizations reported the most common features they wanted to add to PDF.js were…
Of these respondents, 71.4% tried to add some of these PDF.js features themselves but ultimately found it too difficult or time-intensive to build, support, and/or maintain.
These difficulties stem from the fact that PDF.js was not intended as an easily extensible component in another commercial product:
“PDF.js was designed to be Firefox’s integrated PDF viewer, rather than a component of another product, so it provided limited support for our use case…”
For instance, PDF.js does not support an API for adding features like annotations to the UI. And the project has little documentation to walk developers through adding these features. Existing documentation is incomplete and stale (with broken links). PDF.js support is very fast, generally, with responses to support forum requests in a day or two. But for more complex issues (e.g., adding annotation, signature, form filling, etc.) responses can take longer.
Adding features to PDF.js such as annotations thus requires familiarizing oneself with the PDF.js code base, which, in turn, assumes an understanding of the PDF specification.
The PDF.js contributor community also considers PDF.js a PDF reader only. It therefore treats features that would embed annotations, manipulate pages, redact content, and so on, manually or programmatically, as out of scope: it does not support them and has no plans to develop them. When developing these features in-house, you are thus largely on your own and responsible for fixes and support.
Other features, such as interactive forms and form filling, have been on the community roadmap for years.
The practical implications of PDF.js being a PDF reader only for developers are as follows:
You can use PDF.js to enable rendering of annotations already within the document file -- or in an annotations overlay via a separate JSON or XFDF file. But it would not be possible to burn these annotations into the file with PDF.js.
Likewise, PDF.js will not let you fill PDF forms directly. One would have to create an overlay to capture form information (a method we use in PDF.js Express via XFDF). Burning form data into the PDF would require an external workaround or another library to edit that form data into the document itself.
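To illustrate the overlay approach to forms, the sketch below serializes field values captured from an overlay into a minimal XFDF document, which another tool could later merge into the PDF. It covers only the basic `fields`/`field`/`value` structure of XFDF. The function names `escapeXml` and `toXfdf` are hypothetical, and this is not a description of how PDF.js Express implements the feature.

```typescript
// Escape the five XML special characters so values are safe in both
// element content and attribute values.
function escapeXml(value: string): string {
  return value
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&apos;");
}

// Serialize captured field values (field name -> value) into a minimal
// XFDF document. Only flat text fields are handled in this sketch.
function toXfdf(fields: Record<string, string>): string {
  const body = Object.keys(fields)
    .map((name) =>
      `<field name="${escapeXml(name)}"><value>${escapeXml(fields[name])}</value></field>`
    )
    .join("");
  return (
    `<?xml version="1.0" encoding="UTF-8"?>` +
    `<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">` +
    `<fields>${body}</fields></xfdf>`
  );
}
```

Keeping form data in a separate file like this leaves the original PDF untouched, which is why writing the values into the document itself still needs a separate library or server-side step.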
Determining whether PDF.js is the right fit for your project necessitates assessment of a few considerations:
Document rendering requirements
For example, do your files require color management and accurate, high-fidelity printing? Will documents make use of spot colors or complex transparencies?
UX requirements in terms of performance and reliability
Do users expect to use many larger and more complex documents, such as annual reports, maps, designs, technical drawings, PowerPoints with videos and big images, and so on? What is their level of tolerance for documents that render slowly, or that crash or freeze the app?
Desired UI features
Are users satisfied with basic viewing capabilities? Or will they want annotations, form filling, and e-signatures to mark up and comment on their documents?
File format requirements
PDF.js will open and render only PDFs. But users may also wish to view their MS Office, image, and CAD files, and more.
Browser and device requirements
For example, are users required to work on an older version of Internet Explorer due to a corporate policy?
PDF.js is not yet mobile-friendly, but users may wish to tap in from their mobile devices to read and inspect more complex documents, like drawings, manuals, and so on.
Available team bandwidth
Additional team resources will have to be dedicated to learning, building, supporting, and maintaining any customizations built on top of PDF.js, which may also impact the completion date, cost, and time to market for new features.
We provide additional tips & pointers on how to perform your PDF.js assessment in our build vs. buy guide to PDF.js.
In summary, PDF.js will work well under the following conditions:
While PDF.js is adequate for many projects, it isn’t suited to everyone.
Those curious about comparing or actively seeking PDF.js alternatives can check out our guide to PDF.js alternatives.
A PDF.js alternative we offer is our commercial wrapper product PDF.js Express.
PDF.js Express simplifies implementation of additional PDF capabilities, like annotations, filling forms, and e-signatures via a modern React-based UI wrapped around the PDF.js rendering engine.
The solution is, therefore, ideal for those wishing to add basic PDF viewing, annotating, and e-signing capabilities to their software. Additionally, Express offers an easy upgrade path to a commercial-grade PDF SDK should your feature, platform, or file format requirements evolve.
For more information on PDF.js, consider the following tutorials and guides:
If you have any questions about implementing PDF.js Express in your project, please contact us and we will be happy to help!