How PDF.js Works
In this article, we walk you through how PDF.js works to render PDFs, what technologies it uses, and implications for projects using PDF.js.
27 Mar 2020
According to a market research report by Stax Inc., 70% of the 197 PDF SDK customers surveyed expressed interest in switching SDKs despite the high costs of doing so. This surprisingly high share of dissatisfied customers underscores the importance of spending time upfront evaluating how a solution will perform and scale with your application.
To help you with your decision, we’ve put together this blog post, which walks through the key areas to evaluate when selecting a PDF.js alternative:
According to a recent survey of 50+ organizations, the top reason organizations switch from PDF.js is the lack of out-of-the-box features in the open-source viewer. These features, such as annotations, form filling, and signatures, often prove more difficult and time-consuming to support in-house than anticipated.
Therefore, you will want to start your evaluation of a commercial alternative by comparing feature sets. When comparing PDF library feature sets, the most important considerations are as follows:
To organize your approach to answering these questions, let’s create a list of all the features that can be included in your viewer. We’ve created a simple feature matrix you can use as a template. But feel free to create your own matrix from scratch.
The next step after compiling your feature matrix is to classify each feature as an immediate need, a future need, or not relevant. Then create your shortlist by comparing your immediate and future requirements across the different PDF SDKs.
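To make the classification concrete, here is a minimal Python sketch of a feature matrix and shortlist. The SDK names, features, and needs are hypothetical placeholders, and the rule it applies (an SDK must cover every immediate need, then ranks by how many future needs it covers) is one reasonable way to build a shortlist, not the only one.

```python
from enum import Enum


class Need(Enum):
    IMMEDIATE = "immediate"
    FUTURE = "future"
    NOT_RELEVANT = "not relevant"


# Hypothetical requirements: feature -> how urgently you need it.
requirements = {
    "annotations": Need.IMMEDIATE,
    "form filling": Need.IMMEDIATE,
    "e-signatures": Need.FUTURE,
    "OCR": Need.FUTURE,
    "3D models": Need.NOT_RELEVANT,
}

# Hypothetical feature matrix: SDK name -> features it supports.
matrix = {
    "SDK A": {"annotations", "form filling", "e-signatures", "OCR"},
    "SDK B": {"annotations", "form filling"},
    "SDK C": {"annotations", "OCR", "3D models"},
}


def shortlist(matrix, requirements):
    """Keep SDKs that cover every immediate need, ranked by future needs covered."""
    immediate = {f for f, need in requirements.items() if need is Need.IMMEDIATE}
    future = {f for f, need in requirements.items() if need is Need.FUTURE}
    ranked = [
        (sdk, len(features & future))
        for sdk, features in matrix.items()
        if immediate <= features  # subset test: all immediate needs supported
    ]
    # Most future needs first; ties broken by name for stable output.
    ranked.sort(key=lambda item: (-item[1], item[0]))
    return ranked


print(shortlist(matrix, requirements))  # [('SDK A', 2), ('SDK B', 0)]
```

Here SDK C drops out because it lacks form filling, and SDK A ranks first because it also covers both future needs.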
Predicting demand for future feature requirements can be a challenging task during your evaluation. Nevertheless, it's an important step given the high percentage of organizations that express a desire to switch their commercial PDF SDK.
A simple example of how your feature requirements can shift is an uptick in user requests for a new feature, such as OCR. If your current SDK supports it, enabling it is as easy as dropping it into your application. Otherwise, you’ll have to find an OCR library to add to your product, which takes developer resources to integrate, test, and maintain. These extra steps add cost and, more importantly, can delay future product releases.
Lastly, it's good practice to get a sense of the new features being introduced by the solution you are evaluating. You can look at the cadence of its release announcements, study the changelogs, or ask the vendor directly about its product roadmap.
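If a vendor publishes dated release notes, a rough cadence check takes only a few lines. This sketch assumes you have copied the release dates into a list by hand; the dates below are made up for illustration.

```python
from datetime import date
from statistics import median

# Hypothetical release dates copied from a vendor's changelog.
releases = [
    date(2019, 1, 15),
    date(2019, 3, 1),
    date(2019, 4, 20),
    date(2019, 7, 2),
    date(2019, 8, 30),
]


def release_gaps_days(dates):
    """Days between consecutive releases, oldest first."""
    ordered = sorted(dates)
    return [(later - earlier).days for earlier, later in zip(ordered, ordered[1:])]


gaps = release_gaps_days(releases)
print(gaps, median(gaps))  # [45, 50, 73, 59] 54.5
```

A long median gap, or a recent gap far longer than the rest, may be worth raising when you ask the vendor about its roadmap.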
If you require a modern UI, annotations, form filling or e-signatures, PDF.js Express could be a good option. (See how PDF.js compares to PDF.js Express)
Another common reason developers investigate PDF.js alternatives is to support additional file types in their viewer. Indeed, as an embedded viewer grows more popular, users bring a wider set of document requirements. A simple example would be requests to view, mark up, and sign MS Office documents along with PDF documents.
A short-term solution is to convert the different file types server-side before displaying them in the client-side viewer.
But this approach often hurts viewer performance: users can experience longer wait times as files are uploaded, converted to PDF on the server, and downloaded again. Server costs will also increase as users upload a wider variety of documents, including large and complex ones. You may also need to dedicate further developer resources to integrating and managing multiple document libraries.
An alternative approach would be to consider an SDK that combines viewing and file conversion in a single solution.
If this approach suits your requirements, then you will want to evaluate the SDK for the following:
To evaluate file types supported, follow the same process we followed to evaluate SDK feature sets; i.e., define your immediate and future file-type requirements, and then use your list to create your shortlist of SDK solutions.
Next, consider where the SDK converts files: client-side or server-side? Client-side conversion means the file never lands on a server and is instead converted directly in the browser. This can save on network bandwidth and server infrastructure costs, especially when users convert a large volume of files. Conversely, SDKs that convert server-side will increase server and storage costs as your user base grows. One important caveat is that client-side conversion might not support every file format you require (e.g., CAD files). In cases like these, it’s worth asking about hybrid client/server solutions.
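A hybrid setup comes down to a routing decision per file type. The sketch below is a hypothetical illustration: the two format sets are placeholders for whatever the SDK under evaluation actually documents, and `conversion_route` and `route_mix` are names invented here. Running `route_mix` over a sample of real upload filenames gives a rough idea of how much conversion work would still land on your servers.

```python
from collections import Counter
from pathlib import PurePath

# Hypothetical capability lists: replace with what the SDK you evaluate documents.
CLIENT_SIDE_FORMATS = {".pdf", ".docx", ".xlsx", ".pptx", ".png", ".jpg"}
SERVER_SIDE_FORMATS = {".dwg", ".dxf"}  # e.g. CAD formats not handled in the browser


def conversion_route(filename: str) -> str:
    """Decide where a file would be converted in a hybrid client/server setup."""
    ext = PurePath(filename).suffix.lower()
    if ext in CLIENT_SIDE_FORMATS:
        return "client"  # converted in the browser; the file never reaches a server
    if ext in SERVER_SIDE_FORMATS:
        return "server"
    return "unsupported"


def route_mix(filenames):
    """Tally how a sample of uploads would be split between client and server."""
    return Counter(conversion_route(name) for name in filenames)


print(route_mix(["q3-report.docx", "site-plan.dwg", "invoice.pdf", "notes.xyz"]))
```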
Lastly, consider the accuracy of the SDK's file conversion. Each solution will take a different approach to converting files to PDF and you will likely find some variance in quality. To evaluate conversion accuracy, make sure you test a wide assortment of file types on each solution.
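Before testing conversion accuracy, it helps to confirm that your test corpus actually covers every file type you need. This is a sketch under assumptions: `REQUIRED_TYPES` and the five-samples-per-type threshold are arbitrary placeholders, the folder name is up to you, and file types are inferred from extensions only.

```python
from collections import Counter
from pathlib import Path, PurePath

# Hypothetical file-type requirements; use your own immediate and future list.
REQUIRED_TYPES = {".pdf", ".docx", ".xlsx", ".pptx", ".png"}


def corpus_files(root: str):
    """List every file under a test-corpus folder."""
    return [str(p) for p in Path(root).rglob("*") if p.is_file()]


def corpus_coverage(filenames, required=REQUIRED_TYPES, min_samples=5):
    """Count samples per extension and report required types with too few samples."""
    counts = Counter(PurePath(name).suffix.lower() for name in filenames)
    # Counter returns 0 for missing keys, so absent types show up with a count of 0.
    thin = {ext: counts[ext] for ext in sorted(required) if counts[ext] < min_samples}
    return counts, thin


# Example: counts, thin = corpus_coverage(corpus_files("test-corpus"))
```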
When it comes to scaling your app, cross-platform compatibility is a compelling offer. If cross-platform support is on your roadmap, consider a PDF library that lets you scale your viewer across several platforms.
During your evaluation you should consider:
Number of compatible platforms. Is the SDK available on all the major platforms you require? (e.g., Web, iOS, Android, Windows, macOS, and Linux?)
A simple and unified API. Is the API well-built, and is it consistent across platforms? (i.e., can you use similar logic to build and design your solution on several platforms?)
Here are some benefits of a consistent cross-platform API:
To learn whether an API is unified, start by reading through the API documentation. Check whether some platforms have features that are not available on others. A simple way to do this is to add a column to your feature evaluation matrix for each platform. Then flag a solution whenever a feature is documented only for specific platforms.
You may also wish to take note of platform-specific code samples. Are the samples offered in a language native to that platform? (e.g., Java for Android, Swift for iOS, etc.)
Lastly, as you review the code samples, check whether the class and method names and the overall flow are the same across platforms.
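The per-platform columns described above can be checked mechanically once the matrix is filled in. In this minimal sketch the platforms and feature entries are hypothetical; each entry records the platforms on which one SDK documents that feature, and `platform_gaps` lists where each feature is missing.

```python
# Hypothetical platform list and feature entries for one SDK under evaluation.
PLATFORMS = ("Web", "iOS", "Android", "Windows", "macOS", "Linux")

platform_matrix = {
    # feature -> platforms where the SDK documents that feature
    "annotations": set(PLATFORMS),
    "form filling": {"Web", "iOS", "Android"},
    "redaction": {"Web", "Windows"},
}


def platform_gaps(matrix, platforms=PLATFORMS):
    """Map each feature that is missing on some platform to those platforms."""
    gaps = {}
    for feature, supported in matrix.items():
        missing = [p for p in platforms if p not in supported]
        if missing:
            gaps[feature] = missing
    return gaps


print(platform_gaps(platform_matrix))
# {'form filling': ['Windows', 'macOS', 'Linux'], 'redaction': ['iOS', 'Android', 'macOS', 'Linux']}
```

An empty result would suggest the documented feature set is consistent across the platforms you listed.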
Commercial SDKs commonly render documents in their viewer in one of two ways:
Through an in-house rendering engine built from the ground up, or by leveraging open-source libraries such as PDF.js or PDFium.
A vendor with a rendering engine built entirely in-house has much more control over rendering accuracy and performance, and can usually resolve issues directly. If you are evaluating an SDK with an internally built rendering engine, get a sense of the vendor's responsiveness by discussing its support practices around accuracy and performance issues.
Alternatively, if a solution is built around an open-source rendering engine, the vendor may have difficulty responding to accuracy and performance issues. It can either resolve an issue itself by contributing to the open-source codebase or relay your issue to volunteers in the open-source community.
To get a sense of whether a vendor can support the rendering engine, you can review the contributions they’ve made to the open-source engine via its code collaboration platform.
For example, PDFium uses Gerrit for code review and commit management, where you can browse thousands of merged PDFium contributions.
To find contributions made by a specific vendor, you can refine your search to something like "status:merged vendor-name".
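If you prefer to run the search from a script, Gerrit exposes the same queries through its REST API (`GET /changes/?q=...`), and it prefixes JSON responses with `)]}'`, which must be stripped before parsing. The sketch below assumes PDFium's review host is `https://pdfium-review.googlesource.com` and that anonymous read access is allowed; the free-text vendor term mirrors the query above and may need refining with Gerrit's search operators.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumption: PDFium's public Gerrit host allows anonymous REST reads.
GERRIT_HOST = "https://pdfium-review.googlesource.com"
XSSI_PREFIX = ")]}'"  # Gerrit prepends this line to JSON responses


def changes_url(query: str, limit: int = 100) -> str:
    """Build a Gerrit REST search URL: /changes/?q=<query>&n=<limit>."""
    return f"{GERRIT_HOST}/changes/?{urlencode({'q': query, 'n': limit})}"


def parse_gerrit_json(body: str):
    """Strip Gerrit's anti-XSSI prefix, then decode the JSON list of changes."""
    if body.startswith(XSSI_PREFIX):
        body = body[len(XSSI_PREFIX):]
    return json.loads(body)


def merged_changes(vendor: str, limit: int = 100):
    """Fetch merged changes matching a vendor term (performs a network request)."""
    with urlopen(changes_url(f"status:merged {vendor}", limit)) as resp:
        return parse_gerrit_json(resp.read().decode("utf-8"))


# Example (network): for c in merged_changes("vendor-name"): print(c["_number"], c["subject"])
```

Results are capped by `n`; Gerrit marks the last returned change with `_more_changes` when more results are available.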
If you find the vendor does not have a strong contribution history, then that suggests you may need to rely on the open-source community to address specific issues with the rendering engine.
Another important issue is viewer security: the rendering engine, which often spans hundreds of thousands of lines of code, is one of the largest attack surfaces in a viewer.
In practice, closed-source code offers a significant security deterrent, as hackers commonly scour public repositories and security vulnerability databases looking for weaknesses in open-source code that offer a good ROI. Should you embed an open-source library with an unpatched vulnerability, your application will not be hard to find.
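If you do embed an open-source library, you can at least check the version you ship against the same public vulnerability data attackers use. This sketch queries the OSV database (`POST https://api.osv.dev/v1/query`); the package name and version in the example are placeholders, pagination of large result sets is ignored, and `query_osv` makes a live network call.

```python
import json
from urllib.request import Request, urlopen

OSV_QUERY_URL = "https://api.osv.dev/v1/query"


def osv_payload(name: str, version: str, ecosystem: str = "npm") -> bytes:
    """JSON body for an OSV query about one package version."""
    body = {"version": version, "package": {"name": name, "ecosystem": ecosystem}}
    return json.dumps(body).encode("utf-8")


def known_vuln_ids(response: dict) -> list:
    """OSV answers {} when nothing matches, else {"vulns": [{"id": ...}, ...]}."""
    return sorted(v["id"] for v in response.get("vulns", []))


def query_osv(name: str, version: str, ecosystem: str = "npm") -> list:
    """Ask OSV which advisories affect name@version (network call; no pagination)."""
    req = Request(
        OSV_QUERY_URL,
        data=osv_payload(name, version, ecosystem),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urlopen(req) as resp:
        return known_vuln_ids(json.load(resp))


# Example (network): print(query_osv("pdfjs-dist", "<version you ship>"))
```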
Whether the engine is built internally or via open-source, you will want to test its accuracy. In the next section, we show you how you can do just that.
The longer a user waits for a document to load, the more likely they are to end the session. Repeated studies show most users make a snap decision on whether to switch tasks at about 2-4 seconds, and most terminate their session at around the 10-second mark.
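As a rough aid when reviewing measured load times, the thresholds above can be turned into buckets. The bucket labels and exact cut-offs in this sketch are illustrative assumptions built on the 2-4 second and 10-second figures, not established standards.

```python
from collections import Counter


def abandonment_risk(load_seconds: float) -> str:
    """Bucket one measured load time using the 2-4 s and ~10 s figures cited above."""
    if load_seconds < 2:
        return "fast"
    if load_seconds <= 4:
        return "snap-decision window"
    if load_seconds < 10:
        return "at risk"
    return "likely abandoned"


def risk_summary(load_times):
    """Count how many measured loads fall into each bucket."""
    return Counter(abandonment_risk(t) for t in load_times)


print(risk_summary([0.8, 1.9, 3.5, 6.2, 11.0]))
```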
Needless to say, being able to deliver fast PDF rendering in the browser is one of the most important factors in creating a good user experience.
As you evaluate PDF.js alternatives, consider the types of technologies used within a viewer to improve document load times. Here are three technologies some SDKs utilize to improve their performance:
A final consideration is how the rendering engine handles corrupted PDFs. Some PDF SDKs repair corrupted documents on the fly, preventing the viewer from crashing when it encounters a corrupted file. Having an SDK with a good “repair” engine is especially important if users will upload arbitrary files, since users are likely to blame the viewer, not the corrupted document, when it crashes.
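To test repair behavior, you need corrupted inputs. One simple way, sketched below under assumptions, is to truncate known-good PDFs, which mimics interrupted downloads or uploads. The `looks_complete` check is only a heuristic (a PDF normally begins with a `%PDF-` header and ends with an `%%EOF` marker); passing it does not prove a file is valid, and some readers tolerate files that fail it. The file and folder names passed to `write_variants` are up to you.

```python
from pathlib import Path


def looks_complete(data: bytes) -> bool:
    """Heuristic only: PDFs normally start with %PDF- and end with an %%EOF marker."""
    return data.startswith(b"%PDF-") and b"%%EOF" in data[-1024:]


def truncated_variants(data: bytes, fractions=(0.25, 0.5, 0.9)) -> dict:
    """Cut a known-good PDF short to simulate interrupted uploads or downloads."""
    return {f: data[: int(len(data) * f)] for f in fractions}


def write_variants(source: str, out_dir: str) -> list:
    """Write each truncated copy of `source` into `out_dir` for viewer testing."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for fraction, broken in truncated_variants(Path(source).read_bytes()).items():
        target = out / f"{Path(source).stem}-truncated-{int(fraction * 100)}.pdf"
        target.write_bytes(broken)
        written.append(target)
    return written
```

Opening each variant in every SDK you evaluate shows whether the viewer repairs, degrades gracefully, or crashes.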
To test rendering performance and reliability, start with a large number of your users' preferred documents and test them across their preferred browsers and devices.
It is a good idea to include the following types of documents, as they have more demanding rendering requirements:
Interact heavily with documents when testing. Scroll to the middle, zoom in and out, and pan side to side repeatedly and in various places. If you are testing a server-based SDK, replicate the anticipated load and usage to gauge performance. What would happen if thousands of users uploaded and interacted with documents at once? Do things slow down? Will you need more servers?
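For the load-testing step, a small harness can open many documents concurrently and report latency percentiles. This is a sketch: `open_document` is a hypothetical callable you would supply (for example, one that requests a document from your server-based SDK), threads on a single machine only approximate real user traffic, and `statistics.quantiles` needs at least two samples.

```python
import time
from concurrent.futures import ThreadPoolExecutor
from statistics import quantiles


def timed(fn, arg) -> float:
    """Wall-clock seconds for a single call fn(arg)."""
    start = time.perf_counter()
    fn(arg)
    return time.perf_counter() - start


def load_test(open_document, doc_ids, concurrency=50) -> dict:
    """Call open_document(doc_id) for every id using `concurrency` worker threads."""
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        latencies = list(pool.map(lambda doc_id: timed(open_document, doc_id), doc_ids))
    # 99 cut points; "inclusive" keeps estimates within the observed range.
    cuts = quantiles(latencies, n=100, method="inclusive")
    return {"p50": cuts[49], "p95": cuts[94], "max": max(latencies), "count": len(latencies)}


# Dry run without an SDK: every "document" is a 10 ms sleep.
print(load_test(time.sleep, [0.01] * 40, concurrency=10))
```

Raising `concurrency` step by step and watching how p95 moves is a simple way to see where the service starts to slow down.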
PDFs are incredibly complex documents, generated in a variety of ways by thousands of different tools. Each PDF SDK will also render PDFs a bit differently -- and some more accurately than others.
Common rendering issues include:
If your users view simple PDFs, like invoices or receipts, testing rendering accuracy is more straightforward. But the task becomes more complex if users need to perform actions such as measurements on construction drawings or collaborate on marketing materials, where image quality at high zoom factors and color management are major considerations.
In the latter scenarios, you may wish to gather a large assortment of documents across all platforms and native applications to capture a wider array of rendering behaviors.
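When you have a trusted reference rendering of a page (for example, from the application that produced the document), a pixel comparison makes accuracy differences measurable. This sketch assumes each SDK can export the page as a raw RGBA buffer at the same size and zoom level; the `tolerance` value is an arbitrary allowance for anti-aliasing differences, and for large renders a vectorized library would be much faster than this pure-Python loop.

```python
def mismatch_ratio(a: bytes, b: bytes, tolerance: int = 8) -> float:
    """Fraction of RGBA pixels whose channels differ by more than `tolerance`."""
    if len(a) != len(b) or len(a) % 4:
        raise ValueError("expected two RGBA buffers of the same size")
    pixels = len(a) // 4
    if pixels == 0:
        return 0.0
    mismatched = 0
    for i in range(0, len(a), 4):
        # Indexing bytes yields ints 0-255, so channel differences are plain ints.
        if any(abs(a[i + c] - b[i + c]) > tolerance for c in range(4)):
            mismatched += 1
    return mismatched / pixels


white = bytes([255] * 16)                       # 2x2 page, all white
one_black = bytes([255] * 12 + [0, 0, 0, 255])  # last pixel black
print(mismatch_ratio(white, one_black))  # 0.25
```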
Here is a list of document types that have proven challenging for viewers:
Over the past 10 years, we’ve seen an unprecedented rise of disruptive technology across all sectors. When you examine the factors behind this explosive growth, you often find a common characteristic among these organizations: a hyper-focus on improving user experiences.
The same holds true in the document-viewer space, where an intuitive and easy-to-use UI is often a make-or-break consideration for end users.
If UX is an important requirement for your organization, then you will want to determine how much flexibility you have to customize different viewer elements.
Typically, viewers come with two different ways to customize the UI: via API calls and/or through an open-source UI.
With a viewer solely dependent on an API for UI customization, you face certain limitations. For example, customization will be limited to predefined parameters and you will not be able to perform an in-depth evaluation of what can be improved or adjusted. This will prove problematic if your UI team is especially strict or if you have to meet specific UI requirements, such as compliance with accessibility standards.
In contrast, open-source UIs provide the highest degree of freedom and control, as you have complete access to the source code and therefore complete visibility into what can be improved or adjusted. This control, in turn, provides more flexibility in meeting unique user requirements -- like creating a custom annotation for an approval workflow. You may also benefit from faster development times by incorporating UI elements from past developer contributions.
The final consideration when evaluating an alternative to PDF.js will be to assess how easy it will be to build a successful Proof-of-Concept (PoC). Some factors to consider include:
As you build your PoC, it's important to assess the vendor’s support team. A quick response to your inquiry is a strong indication of support team responsiveness. It’s also good to evaluate how thoroughly the support team handles your inquiry.
Here are a few additional considerations:
In summary, 70% of surveyed PDF SDK customers expressed interest in switching from their current SDK. Therefore, when evaluating a PDF.js alternative, it's important to consider both your short-term and long-term requirements to avoid a costly switch in the future.
To that end, follow the steps and procedures outlined above.
By following these steps, you can avoid common mistakes and build a solution that meets both your short-term and long-term document viewing requirements.