How to Choose a PDF.js Alternative

27 Mar 2020

author
Nick Johansson

According to a market research report by Stax Inc., 70% of the 197 PDF SDK customers surveyed expressed interest in switching SDKs despite the high costs. This surprisingly high share of dissatisfied customers underlines the importance of spending time upfront evaluating how a solution will perform and scale with your application.

To help you with your decision, we’ve put together this blog post, which goes through the key areas to evaluate when selecting your PDF.js alternative:

  • Features
  • File formats
  • Platforms
  • Open-source vs closed-source rendering
  • Rendering performance and reliability
  • Rendering accuracy
  • UI customization
  • Building your proof-of-concept

Out-of-the-box features

According to a recent survey of 50+ organizations, the top reason organizations switch from PDF.js is the lack of out-of-the-box features in the open-source viewer. Features such as annotations, form filling, and signatures often prove more difficult and time-intensive to build and support in-house than anticipated.

Therefore, you will want to start your evaluation of a commercial alternative by comparing feature sets. When comparing PDF library feature sets, the most important considerations are as follows:

  • Does the library meet your immediate feature requirements out of the box?
  • Does it have a feature set robust enough to cover your future requirements as your needs and your end users’ needs evolve?
  • Has the vendor demonstrated the ability to consistently release new and innovative features for its library?

To organize your approach to answering these questions, let’s create a list of all the features that can be included in your viewer. We’ve created a simple feature matrix you can use as a template, but feel free to create your own matrix from scratch.

The next step after compiling your feature matrix is to mark each feature as an immediate need, a future need, or not relevant. Then create your shortlist by comparing your immediate and future requirements against the feature sets of the different PDF SDKs.
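The shortlisting step can be sketched in a few lines of Python. This is only an illustration: the feature names, priorities, and the three SDKs ("SDK A" to "SDK C") are hypothetical placeholders, not real products or survey data. The sketch keeps only the SDKs that cover every immediate need and then ranks them by how many future needs they also cover.

```python
# Priority labels for each feature in the matrix.
IMMEDIATE, FUTURE, NOT_RELEVANT = "immediate", "future", "not relevant"

# Hypothetical requirements: feature name -> priority.
requirements = {
    "annotations": IMMEDIATE,
    "form filling": IMMEDIATE,
    "e-signatures": FUTURE,
    "OCR": FUTURE,
    "redaction": NOT_RELEVANT,
}

# Hypothetical SDKs and the features each one supports out of the box.
sdk_features = {
    "SDK A": {"annotations", "form filling", "e-signatures"},
    "SDK B": {"annotations", "OCR", "redaction"},
    "SDK C": {"annotations", "form filling", "e-signatures", "OCR"},
}


def coverage(supported, priority):
    """Return (covered, total) for the features with the given priority."""
    wanted = {name for name, p in requirements.items() if p == priority}
    return len(wanted & supported), len(wanted)


def shortlist():
    """Keep SDKs that cover every immediate need, ranked by future coverage."""
    rows = []
    for name, supported in sdk_features.items():
        now_hit, now_total = coverage(supported, IMMEDIATE)
        later_hit, _ = coverage(supported, FUTURE)
        if now_hit == now_total:
            rows.append((name, later_hit))
    # Highest future coverage first.
    return [name for name, _ in sorted(rows, key=lambda row: -row[1])]


print(shortlist())
```

In this made-up data set, "SDK B" drops out because it lacks form filling, even though it supports OCR. A spreadsheet works just as well; the point is to separate hard requirements from nice-to-haves before you compare vendors.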

Predicting demand for future feature requirements can be a challenging task during your evaluation. Nevertheless, it's an important step given the high percentage of organizations that express a desire to switch their commercial PDF SDK.

A simple example of how your feature requirements can shift is an uptick in user requests for a new feature, such as OCR. If your current SDK supports this feature, it’s as easy as dropping it into your application. Otherwise, you’ll have to find an OCR library to merge into your product, requiring developer resources to integrate, test, and maintain the solution. These extra steps add costs and, more importantly, can delay future product releases.

Lastly, it's also good practice to get a sense of new features being introduced by the solution you are evaluating. You can look at the cadence of its release announcements, study the changelogs, or ask them directly about their product roadmap.

If you require a modern UI, annotations, form filling or e-signatures, PDF.js Express could be a good option. (See how PDF.js compares to PDF.js Express)

Available file-types

Another common reason developers investigate PDF.js alternatives is to support additional file types in their viewer. Indeed, as an embedded viewer grows more popular, users bring a wider set of document requirements. A simple example would be requests to view, mark up, and sign MS Office documents along with PDF documents.

A short-term solution is to convert the different file types server-side before displaying them in the client-side viewer.

But this approach often impacts viewer performance; users can experience longer wait times as files are downloaded, converted to PDF, and uploaded server-side. Server costs will also increase as users upload a wider variety of documents, including large and complex documents. You may also need to dedicate further dev resources to integrate and manage multiple document libraries.

An alternative approach would be to consider an SDK that combines viewing and file conversion in a single solution.

If this approach suits your requirements, then you will want to evaluate the SDK for the following:

  • Number of file types supported
  • Type of conversions: server-side vs client-side
  • Conversion accuracy

To evaluate file types supported, follow the same process we followed to evaluate SDK feature sets; i.e., define your immediate and future file-type requirements, and then use your list to create your shortlist of SDK solutions.

Next, you will want to consider how the SDK converts files -- client-side or server-side? Client-side conversion means the file never lands on a server but is instead converted directly in the browser. This can help you save on network bandwidth and server infrastructure costs, especially when users convert a large volume of files. Conversely, SDKs that convert server-side will increase server and storage costs as your user base grows. One important consideration is that client-side conversion might not support every file format you require (e.g., CAD files). In cases like these, it’s good to inquire about hybrid client/server solutions.

Lastly, consider the accuracy of the SDK's file conversion. Each solution will take a different approach to converting files to PDF and you will likely find some variance in quality. To evaluate conversion accuracy, make sure you test a wide assortment of file types on each solution.
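To make the hybrid client/server idea concrete, here is a minimal sketch of how an application might decide where each file gets converted. The two extension sets are assumptions made up for illustration; the real split depends entirely on what a given SDK documents as supported on each side.

```python
from pathlib import PurePath

# Hypothetical capability sets; check the SDK's documentation for the real ones.
CLIENT_SIDE = {".pdf", ".docx", ".xlsx", ".pptx", ".png", ".jpg"}
SERVER_SIDE = {".dwg", ".dxf", ".doc"}


def conversion_route(filename):
    """Return where a file would be converted: client, server, or unsupported."""
    extension = PurePath(filename).suffix.lower()
    if extension in CLIENT_SIDE:
        return "client"
    if extension in SERVER_SIDE:
        return "server"
    return "unsupported"


for name in ("report.docx", "floorplan.DWG", "notes.xyz"):
    print(name, "->", conversion_route(name))
```

Running your list of required file types through a table like this quickly shows how much traffic would still reach your servers, which feeds directly into the cost comparison above.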

Platform compatibility

When it comes to scaling your app, cross-platform compatibility is a compelling offer. Consider a PDF library that can let you scale your viewer across several platforms if cross-platform support is on your roadmap.

During your evaluation you should consider:

Number of compatible platforms. Is the SDK available on all the major platforms you require? (e.g., Web, iOS, Android, Windows, macOS, and Linux?)

A simple and unified API. Is the API well-built, and is it consistent across platforms? (i.e., can you use similar logic to build and design your solution on each platform?)

Here are some benefits of a consistent cross-platform API:

  1. Consistent UX: Since the APIs are aligned cross-platform with similar classes and methods, users on different platforms will share access to the same features and capabilities.
  2. Faster development: Since the APIs are unified, learning curves are shortened. Devs can study and copy solutions developed on one platform and easily scale to new platforms.
  3. Streamlined quality assurance: When releasing product updates, devs spend less time testing a single unified solution than maintaining multiple solutions that might not be designed to work together.

To learn if an API is unified, start by reading through the API documentation. Check to see if some platforms have features not available on other platforms. A simple way to do this is to add a column to your feature evaluation matrix for each platform. Next, mark off solutions if they have platform-specific documentation for a feature.

You may also wish to take note of platform-specific code samples. Are the samples offered in a language native to that platform? (e.g., Java for Android, Swift for iOS, etc.)

Lastly, as you review the code samples, see if the class/method names and flow are the same.
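If you collect the class and method names from each platform's documentation, the comparison can be automated with simple set operations. The sketch below assumes you have already gathered those names by hand; the platform keys and method names are hypothetical and do not come from any particular SDK.

```python
def api_overlap(api_by_platform):
    """Return the names shared by all platforms and the names each one lacks."""
    name_sets = {platform: set(names) for platform, names in api_by_platform.items()}
    all_names = set().union(*name_sets.values())
    shared = set.intersection(*name_sets.values())
    missing = {
        platform: sorted(all_names - names) for platform, names in name_sets.items()
    }
    return sorted(shared), missing


# Hypothetical method names collected from each platform's API reference.
apis = {
    "web": ["loadDocument", "addAnnotation", "exportAnnotations"],
    "ios": ["loadDocument", "addAnnotation"],
    "android": ["loadDocument", "addAnnotation", "exportAnnotations"],
}

shared, missing = api_overlap(apis)
print("Shared by all platforms:", shared)
print("Missing per platform:", missing)
```

Naming conventions often differ slightly between languages, so you may need to normalize names (for example, by lowercasing them) before comparing. A long "missing" list for one platform is a cue to ask the vendor about its roadmap for that platform.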

Open-source vs closed-source rendering

Commercial SDKs commonly render documents in their viewer in one of two ways:

Through an in-house rendering engine built from the ground up, or by leveraging open-source libraries like PDF.js or PDFium.

A vendor with a rendering engine built entirely in-house has much more control over rendering accuracy and performance, and can usually support issues directly. If you are evaluating an SDK with an internally built rendering engine, it's recommended that you get a sense of their responsiveness by discussing their support practices around accuracy and performance issues.

Alternatively, if a solution has been built around an open-source rendering engine, the vendor may have difficulty responding to accuracy and performance issues. They can either support an issue directly by making contributions to the open-source code-base or by relaying your issue to volunteers in the open-source community.

To get a sense of whether a vendor can support the rendering engine, you can review the contributions they’ve made to the open-source engine via its code collaboration platform.

For example, PDFium uses Gerrit for commit management; you can review thousands of merged PDFium contributions on its Gerrit code review site.

To find contributions made by a specific vendor, you can refine your search to something like "status:merged vendor-name".

If you find the vendor does not have a strong contribution history, then that suggests you may need to rely on the open-source community to address specific issues with the rendering engine.
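If you plan to run this search for several vendors, you can generate the search links with a short script. The sketch below only builds URLs and does not contact any server. It assumes the PDFium Gerrit instance lives at pdfium-review.googlesource.com and that the web UI accepts queries under the /q/ path; confirm both in your browser before relying on them. "vendor-name" is the same placeholder used in the query above.

```python
from urllib.parse import quote

# Assumed host of the PDFium Gerrit instance; verify before use.
GERRIT_HOST = "https://pdfium-review.googlesource.com"


def gerrit_search_url(*terms):
    """Build a Gerrit web search URL from one or more query terms."""
    query = " ".join(terms)
    # Keep ':' readable so operators such as status:merged stay intact.
    return f"{GERRIT_HOST}/q/{quote(query, safe=':')}"


print(gerrit_search_url("status:merged", "vendor-name"))
```

Gerrit also documents more specific search operators, such as filtering by author, which can narrow results to a vendor's email domain. Check the Gerrit search documentation for the exact operator syntax your instance supports.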

Another important issue is viewer security, as the rendering engine offers one of the largest attack surfaces, entailing hundreds of thousands of lines of code.

In practice, closed-source code offers a significant security deterrent, as attackers commonly scour public repositories and security vulnerability databases looking for weaknesses in open-source code that offer a good ROI. Should you embed an open-source library with an unpatched vulnerability, you will not be hard to find.

Among open-source PDF rendering engines, PDF.js has a stronger security record than PDFium. PDF.js is written in memory-safe JavaScript, whereas PDFium is written in native C++, which is more prone to memory-safety bugs. As a result, only 4 PDF.js security vulnerabilities were assigned CVEs over the last 8 years, compared with 88 major vulnerabilities reported for PDFium.

Whether the engine is built internally or via open-source, you will want to test its accuracy. In the next section, we show you how you can do just that.

Rendering performance and reliability

The longer a user waits for a document to load, the more likely they are to end the session. Repeated studies show most users will make a snap decision on whether to switch tasks at about 2-4 seconds, with most users terminating their session at around the 10-second mark.

Needless to say, being able to deliver fast PDF rendering in the browser is one of the most important factors in creating a good user experience.

As you evaluate PDF.js alternatives, consider the types of technologies used within a viewer to improve document load times. Here are three technologies some SDKs utilize to improve their performance:

  • PDF linearization lets you optimize a PDF so it can be streamed into your application section by section (think YouTube streaming videos), allowing users to open really big files almost instantly.
  • PDF tiling lets the viewer break up a large page image into smaller pieces (tiles). These tiles are then loaded selectively into the viewer based on how the user interacts with the document (zooms, scrolls, and pans). By loading tiles selectively, the viewer can handle large pages such as architectural diagrams more efficiently and achieve higher image quality, maintaining the crispness of vector graphics and the legibility of text even at zoom factors of 400% and above.
  • PDF parallelization uses multi-threading to render many pages simultaneously, so a large document can be interacted with sooner. Pages are loaded ahead of the user, allowing them to jump ahead without interruption of their experience.
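To see why tiling helps, it is worth working out how few tiles a viewer actually needs at any moment. The following sketch is a simplified model, not any SDK's implementation: it assumes square 256-pixel tiles, treats one PDF point as one pixel at 100% zoom, and takes the viewport as a rectangle in the zoomed page's pixel coordinates.

```python
import math


def visible_tiles(page_size, zoom, viewport, tile_size=256):
    """Return the (col, row) tiles that intersect the viewport, plus the total.

    page_size: (width, height) of the page at 100% zoom.
    viewport:  (x, y, width, height) in pixels of the zoomed page.
    """
    page_w, page_h = page_size
    x, y, w, h = viewport
    # Grid dimensions for the whole page at this zoom level.
    cols = math.ceil(page_w * zoom / tile_size)
    rows = math.ceil(page_h * zoom / tile_size)
    # Clamp the tile range covered by the viewport to the grid.
    first_col = max(0, int(x // tile_size))
    first_row = max(0, int(y // tile_size))
    last_col = min(cols - 1, int((x + w - 1) // tile_size))
    last_row = min(rows - 1, int((y + h - 1) // tile_size))
    tiles = [
        (col, row)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]
    return tiles, cols * rows


# A US Letter page (612 x 792 points) at 400% zoom, seen through an 800 x 600 window.
tiles, total = visible_tiles((612, 792), 4, (512, 512, 800, 600))
print(f"{len(tiles)} of {total} tiles needed")
```

In this example the viewer only has to render 12 of the page's 130 tiles. The gap grows with page size and zoom level, which is why tiling matters most for large-format drawings.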

A final consideration is how the rendering engine handles corrupted PDFs. Some PDF SDKs repair corrupted documents on the fly, preventing the viewer from crashing when it encounters a corrupted file. Having an SDK with a good “repair” engine is especially important if users will upload arbitrary files -- as users are likely to blame the viewer when it crashes and not the corrupted document.

Performance testing

To test rendering performance and reliability, start with a large number of your users' preferred documents and test them across their preferred browsers and devices.

It is a good idea to include the following types of documents, as they have more demanding rendering requirements:

  • Large file sizes (1 GB+)
  • Complex graphics
  • Many pages (1000+)
  • Corrupt PDF documents

Interact heavily with documents when testing. Scroll to the middle, zoom in and out, and pan side to side repeatedly and in various places. If you are testing a server-based SDK, replicate anticipated load and usage to gauge performance. What would happen if thousands of users upload and interact with documents? Do things slow down? Will you need more servers?
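A small timing harness makes these tests repeatable. The sketch below is one possible approach under stated assumptions: `load_document` stands for whatever function opens a document in the viewer you are testing (for instance, a wrapper around a browser automation script), and `fake_load` is a stand-in so the example runs on its own. The default 2-second threshold reflects the lower end of the 2-4 second window mentioned earlier.

```python
import statistics
import time


def time_loads(load_document, paths, repeats=3):
    """Return the median load time in seconds for each document path."""
    results = {}
    for path in paths:
        samples = []
        for _ in range(repeats):
            start = time.perf_counter()
            load_document(path)
            samples.append(time.perf_counter() - start)
        results[path] = statistics.median(samples)
    return results


def flag_slow(results, threshold_seconds=2.0):
    """Return the documents whose median load time exceeds the threshold."""
    return sorted(path for path, seconds in results.items() if seconds > threshold_seconds)


def fake_load(path):
    """Stand-in loader for illustration: 'large' files take longer to open."""
    if "large" in path:
        time.sleep(0.05)


results = time_loads(fake_load, ["small.pdf", "large.pdf"], repeats=2)
print(flag_slow(results, threshold_seconds=0.02))
```

Using the median rather than a single run smooths out one-off spikes from caching or network hiccups. Run the same harness on each browser and device your users prefer so the numbers are comparable across SDKs.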

Testing rendering accuracy

PDFs are incredibly complex documents, generated in a variety of ways by thousands of different tools. Each PDF SDK will also render PDFs a bit differently -- and some more accurately than others.

Common rendering issues include:

  • Text with the wrong font, spacing or kerning
  • Unsupported non-JPEG image compression types (e.g., JPEG 2000 and JBIG2)
  • Soft mask and other PDF transparency issues
  • Issues with gradients and patterns

If your users view simple PDFs, like invoices or receipts, testing rendering accuracy is more straightforward. But the task becomes more complex if users need to perform actions such as measurements on construction drawings or collaborate on marketing materials, where image quality at high zoom factors and color management are major considerations.

In the latter scenarios, you may wish to gather a large assortment of documents across all platforms and native applications to capture a wider array of rendering behaviors.

Here is a list of document types that have proven challenging for viewers:

  • CAD-based PDFs with very large and complex designs
  • Reports, textbooks, and marketing material using shadings, gradients, soft masks, and patterns
  • Geospatial maps with OCG layers that are switched off by default
  • Pre-press documents requiring advanced color management
  • Documents with tables and multi-column text, to check that text read order and table arrangement stay intact during content extraction
  • Test suite released by the Ghent Workgroup
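Visual inspection does not scale to hundreds of test pages, so it can help to compare renderings numerically. The sketch below is a deliberately simple illustration: it assumes you have already rasterized the same page with a reference viewer and with the SDK under test at identical dimensions, and that each rendering is available as rows of (R, G, B) tuples. How you produce those pixel arrays (for example, with an imaging library) is outside the sketch.

```python
def diff_ratio(reference, candidate, tolerance=8):
    """Return the fraction of pixels that differ by more than the tolerance.

    Both images are lists of rows, each row a list of (R, G, B) tuples.
    """
    same_shape = len(reference) == len(candidate) and all(
        len(ref_row) == len(cand_row)
        for ref_row, cand_row in zip(reference, candidate)
    )
    if not same_shape:
        raise ValueError("renderings must have the same dimensions")
    total = differing = 0
    for ref_row, cand_row in zip(reference, candidate):
        for ref_px, cand_px in zip(ref_row, cand_row):
            total += 1
            # Compare the worst channel so small anti-aliasing noise is ignored.
            if max(abs(a - b) for a, b in zip(ref_px, cand_px)) > tolerance:
                differing += 1
    return differing / total if total else 0.0


# Tiny 2 x 2 example: one pixel differs slightly, one differs a lot.
reference = [[(255, 255, 255), (0, 0, 0)], [(10, 10, 10), (200, 200, 200)]]
candidate = [[(255, 255, 255), (0, 0, 0)], [(12, 11, 10), (120, 200, 200)]]
print(diff_ratio(reference, candidate))
```

Pages with a high ratio are the ones worth opening side by side. A per-pixel comparison will not explain what went wrong (a font substitution and a missing soft mask both just show up as differences), but it narrows down where to look.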

UI customization

Over the past 10 years, we’ve seen an unprecedented rise of disruptive technology across all sectors. As you start to decipher the factors involved with this explosive growth, you often find a common characteristic in these organizations -- being hyper-focused on improving user experiences.

The same holds true in the document-viewer space, where designing an intuitive and easy-to-use UI is often a make-or-break consideration for end users.

If UX is an important requirement for your organization, then you will want to determine how much flexibility you have to customize different viewer elements.

Typically, viewers come with two different ways to customize the UI: via API calls and/or through an open-source UI.

With a viewer solely dependent on an API for UI customization, you face certain limitations. For example, customization will be limited to predefined parameters and you will not be able to perform an in-depth evaluation of what can be improved or adjusted. This will prove problematic if your UI team is especially strict or if you have to meet specific UI requirements, such as compliance with accessibility standards.

In contrast, open-source UIs provide the highest degree of freedom and control, as you have complete access to the source code and therefore complete visibility into what can be improved or adjusted. This control, in turn, provides more flexibility in meeting unique user requirements -- like creating a custom annotation for an approval workflow. You may also benefit from faster development times by incorporating UI elements from past developer contributions.

Building your proof-of-concept

The final consideration when evaluating an alternative to PDF.js will be to assess how easy it will be to build a successful Proof-of-Concept (PoC). Some factors to consider include:

  • The time to get started and integrate the SDK into your application
  • Access to sample projects that you can plug into your environment to get set up faster
  • Clear, concise, and complete documentation for feature sets
  • Out-of-the-box code samples for many modern frameworks
  • Breadth of programming languages to work within
  • Ability to quickly customize UI
  • Access to free support during PoC stage

As you build your PoC, it's important to assess the vendor’s support team. A quick response to your inquiry is a strong indication of support team responsiveness. It’s also good to evaluate how thoroughly the support team handles your inquiry.

Here are a few additional considerations:

  • Are you working directly with developers who have hands-on experience with your platform or framework?
  • Are responses thorough and complete?
  • Does the support team seem persistent in helping you achieve your desired results?

Conclusion

In summary, 70% of surveyed PDF SDK customers expressed interest in switching from their current SDK. Therefore, when evaluating a PDF.js alternative, it's important to consider both your short-term and long-term requirements to avoid a costly switch in the future.

To that end, follow the steps outlined above:

  • Set clear objectives and key requirements before shortlisting PDF.js alternatives
  • Test each viewer's rendering performance, reliability, and accuracy
  • Define your UI requirements upfront
  • Continually gauge support team responsiveness as you build a PoC

By following these steps, you will be better placed to avoid a costly switch and to build a solution that meets both your short-term and long-term document viewing requirements.