Possible cause: The server your user was connected to became unhealthy, resulting in a redirection during initial load.
Solution: Monitor your server's health checks and the GetPerf API to avoid routing users to unhealthy (or soon-to-be unhealthy) servers.
Possible cause: Your server is overloaded and requires a capacity upgrade. This results in users being constantly disconnected.
Solution: Consider raising the number of CPUs available to the server. If memory is showing as at its limit, increase it as well. Improving the disk speed for the on-disk cache may also improve performance. You may also want to consider autoscaling your servers for load; we recommend using our GetPerfInfo API to determine when to scale.
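As a sketch, if the GetPerfInfo API follows the same URL pattern as the HealthCheck endpoint shown later in this guide, a scaling script might poll it like the following. The /blackbox/GetPerfInfo path and port 8090 are assumptions; verify them against your server's API reference.

```sh
# Poll the performance endpoint (path and port assumed) so an autoscaling
# script can inspect current server load before deciding to scale.
curl -s http://localhost:8090/blackbox/GetPerfInfo || echo "server unreachable"
```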
Cannot connect to the server
After deploying your server, you find your client is unable to connect to the server.
Possible cause: The webviewerServerURL provided to your WebViewer constructor is incorrect.
Solution: Verify that the URL provided is correct and can reach the server. You can test the URL via the health check API at localhost:8090/blackbox/HealthCheck.
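For example, a quick probe of the health check with curl might look like this (default port assumed; a healthy server returns HTTP 200, while 000 means the server could not be reached at all):

```sh
# Print only the HTTP status of the health check; curl reports 000 when
# no connection to the server could be made.
curl -s -o /dev/null -w "HTTP %{http_code}\n" \
  http://localhost:8090/blackbox/HealthCheck || true
```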
Possible cause: Port 8090 (or your custom port) is being used by another process.
Solution: Verify whether the port is being used elsewhere (e.g. by running lsof -i :8090).
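A minimal check on Linux or macOS, assuming the default port 8090, might be:

```sh
# List any process bound to port 8090; if lsof finds nothing, the port is free.
lsof -i :8090 || echo "port 8090 appears free"
```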
Possible cause: Websockets are not properly configured on your infrastructure or not supported.
Solution: Ensure all parts of your infrastructure can support websockets, including load balancer and proxy servers.
Possible cause: You did not bind the correct ports on the server.
Solution: Ensure the server's port (8090 by default, or your custom port) is bound and exposed, for example via your container's port mapping (e.g. docker run -p 8090:8090).
My non-PDF documents are failing when using WebViewer Server
Possible cause: The links provided to WebViewer Server lack a file extension.
Solution: Ensure your links have a file extension, or provide the proper extension via the extension (ext) argument of loadDocument.
My server has failed to fetch a document
Possible cause: The file is not accessible by WebViewer Server.
Solution: Grant WebViewer Server permission to access the file server or structure the infrastructure so that WebViewer Server can reach the file server.
Possible cause: You are passing localhost links to the container when the file server is not in the same network.
Solution: Use the real IP address of the servers when providing file links; Docker networks are isolated from your normal machine network.
Possible cause: The file server requires authentication.
Solution: Ensure WebViewer Server can authenticate with the file server when fetching documents, or host the files where the server can reach them without credentials.
My server is consuming large amounts of disk space
Possible cause: The server's disk is too small.
Solution: Provide a larger disk to the server. We require 50GB at minimum, but recommend enough capacity that the disk will not fill up in a matter of minutes during high load.
Possible cause: Your caching options should be reconfigured.
Solution: Lower the cache size limit and maximum cache age so the on-disk cache is cleaned up more aggressively.
Possible cause: Your Docker agent is storing old container images.
Solution: Adjust your settings so old Docker images are pruned, and prune existing Docker images. The WebViewer Server Docker image is nearly 2GB, so if enough versions are pulled, the data consumed can be quite large. These images can normally be pruned on local machines via docker system prune.
Core dumps consuming large amounts of data
Possible cause: The host system is configured for verbose core dumps.
Solution: Lower the verbosity of core dumps on the host. With verbose core dumps enabled, a crash causes WebViewer Server to generate a massive debugging file (often over 1 GB); these files are not managed or deleted, so they may consume a large amount of disk space. The following website details how to configure core dumps and prevent this behaviour: https://linux-audit.com/understand-and-configure-core-dumps-work-on-linux/
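As a minimal sketch, core dumps can be suppressed for a shell session (and any process started from it) by setting the core file size limit to zero; making this persistent for services is covered at the linked page.

```sh
# Set the maximum core dump size to 0 blocks for this shell session,
# preventing crashed processes started from it from writing core files.
ulimit -c 0
# Confirm the new limit (prints 0).
ulimit -c
```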
WebViewer is unable to render some supported formats
Possible cause: You are running in client only mode.
Solution: This may occur if webviewerServerURL was not correctly set in the WebViewer constructor, or if your WebViewer Server is not available. Ensure the URL provided there is accessible and valid, and that your server is passing health checks.
Possible cause: You are lacking certain license permissions.
Solution: Speak to sales about adding new permissions, such as Office/HTML2PDF and CAD, to your license key. If this is the cause, you will see license failure errors in your logs; they will indicate which license feature is missing.
Password protected documents are not working
Possible cause: You are using a version of WebViewer Server older than 1.5.8.
Solution: Use the newest version of WebViewer Server.
Cloud Infrastructure Issues
My network load balancer on AWS is not working
The network load balancer does not support web sockets; however, the classic and application balancers do.
My server is disconnecting when using an AWS classic load balancer
If using the classic load balancer, all communication must be done over SSL and TCP rather than HTTP and HTTPS, because this balancer does not natively support websockets. This requires a custom configuration of the load balancer; we recommend moving to an application load balancer instead.
Data storage and WebViewer Server
WebViewer Server is a read/write intensive application. A single document load results in around five files being written on average, and all of those files are then read by a client. This means the server is severely limited by slow disks; a fast disk is one of the key pieces of a well-performing server.
NVMe direct disk
The ideal setup is an NVMe disk dedicated to and directly attached to the server. A directly connected disk gives you the fastest possible read and write speeds without being subject to external interference.
Network shared storage
Network shared storage offers the advantage of allowing caches to be shared between servers, but this can lead to problems if the network storage does not offer enough speed for your load.
First, verify what read and write speeds your network storage provides. This can be done with real-world disk tests, such as using dd to measure write speed.
If these speeds cannot exceed 80 MB/s (~2500 IOPS) with a single server in use, you will likely encounter issues when the server is under load. Also consider that this maximum speed drops further when multiple servers share a single network disk. We do not recommend shared network storage for high-load environments, but it can work if your network share has enough performance.
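A rough write-speed test with dd might look like the following sketch. The target path is a placeholder you should point at the storage under test, and conv=fdatasync forces the data to disk before dd reports throughput, so the result reflects real write speed rather than the page cache:

```sh
# Write 256 MiB of zeroes to the storage under test, flush to disk,
# report throughput, then clean up the test file.
TESTFILE=/tmp/wvs-disk-test   # placeholder: use a path on the disk under test
dd if=/dev/zero of="$TESTFILE" bs=1M count=256 conv=fdatasync
rm -f "$TESTFILE"
```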
Attached disk instances via Azure/AWS/GCP
These instances are subject to much the same issues as network shared storage, but generally do not have to contend with external entities consuming read/write bandwidth. Test these instances if you are unsure about their performance: if your provider cannot guarantee a certain IOPS or baseline speed, the disk may be sharing a pipeline with other users and deliver lower speeds than expected. When using any attached instance, you should have confirmation that it will maintain a certain IOPS level. One such example is GP3 from Amazon, which provides a baseline of 3000 IOPS.
Many Azure attached disks lack a guaranteed minimum and should be avoided; dedicated disks are preferred here.
Understanding logs
The logs produced by WebViewer Server can be hard to understand. In this section, we'll go over common flows and how the logs work.
A typical document conversion
This section goes over how to read events which occur in the logs. All information here is based on version 2.2.4 of WebViewer Server.
You will see two types of log messages, the first are the Tomcat application logs. The second are logs directly from the internal worker.
Logs prefixed with bb-worker come directly from the internal worker; all other logs are from Tomcat. Logs from bb-worker may arrive in bursts rather than in time with other log information. Enabling TRN_DEBUG_MODE will increase the number of logs you see by raising the log level to Debug.
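Assuming TRN_DEBUG_MODE is set through an environment variable like other TRN_* options (verify this against your configuration reference), enabling it might look like:

```sh
# Export the debug flag before starting the server; in Docker the same
# setting would be passed with -e TRN_DEBUG_MODE=true.
export TRN_DEBUG_MODE=true
echo "TRN_DEBUG_MODE=$TRN_DEBUG_MODE"
```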
The client connects to the server via a websocket. After this, the client would begin sending requests over the websocket.
```sh
pdfd-tomcat_1 | 2024-08-16/19:28:26.356/UTC [http-nio2-0.0.0.0-8090-exec-2] INFO BlackBoxWSEndpoint - Creating WS connection at endpoint ...
```
The websocket is given the ID 7445d53b-97c4-4a1a-abb7-bd853d400bd3; there is one user currently connected to the server.
```sh
pdfd-tomcat_1 | 2024-08-16/19:28:26.357/UTC [http-nio2-0.0.0.0-8090-exec-2] INFO ConnectionService - Creating WS connection 7445d53b-97c4-4a1a-abb7-bd853d400bd3, 1 connections total
pdfd-tomcat_1 | 2024-08-16/19:28:26.361/UTC [http-nio2-0.0.0.0-8090-exec-2] INFO DocReference - Setting local path for http://192.168.1.251/testdocs/1.ppt to Fetched/CYIVlM4MyneW9y9L4p3hiZJ3Qc3rJR04ywYg2hIKdgU=.ppt...
pdfd-tomcat_1 | 2024-08-16/19:28:26.361/UTC [http-nio2-0.0.0.0-8090-exec-2] INFO DocReference - Fetching http://192.168.1.251/testdocs/1.ppt to location 1.ppt
```
We see the first occurrence of the job ID, 7445d53b-97c4-4a1a-abb7-bd853d400bd3. You may use this ID to cross-reference certain log statements.
```sh
pdfd-tomcat_1 | 2024-08-16/19:28:26.361/UTC [http-nio2-0.0.0.0-8090-exec-2] INFO blackbox - 7445d53b-97c4-4a1a-abb7-bd853d400bd3 accessing http://192.168.1.251/testdocs/1.ppt(yAaYE0TIVh7ED_zH2JT9iYGipXebkXJXllOPaTRYqhw=) for page info
```
WebViewer Server queues a collection of jobs for the document; note the unique filename key for this document, CYIVlM4MyneW9y9L4p3hiZJ3Qc3rJR04ywYg2hIKdgU. You can cross-reference this key with logs to determine which file a piece of work belongs to.
```sh
pdfd-tomcat_1 | 2024-08-16/19:28:26.362/UTC [http-nio2-0.0.0.0-8090-exec-2] INFO ServerJob - Starting pages: Image/Fetched/CYIVlM4MyneW9y9L4p3hiZJ3Qc3rJR04ywYg2hIKdgU=.ppt_dir/pageinfo.json
```
This task requires a conversion, so we add it to the special conversion queue.
```sh
pdfd-tomcat_1 | 2024-08-16/19:28:26.362/UTC [pool-4-thread-16] INFO DocManagement - Adding task /usr/local/apache-tomcat/static_data/Fetched/CYIVlM4MyneW9y9L4p3hiZJ3Qc3rJR04ywYg2hIKdgU=.ppt from http://192.168.1.251/testdocs/1.ppt to convert queue
```
Work on the actual document starts here; the file is fetched so processing can begin.
```sh
pdfd-tomcat_1 | 2024-08-16/19:28:26.363/UTC [pool-5-thread-8] INFO DocManagement - Kicking off potential fetch of http://192.168.1.251/testdocs/1.ppt
pdfd-tomcat_1 | 2024-08-16/19:28:26.372/UTC [pool-5-thread-8] INFO DocManagement - Copying http://192.168.1.251/testdocs/1.ppt to /usr/local/apache-tomcat/static_data/Fetched/download543363156732667107.tmp
pdfd-tomcat_1 | 2024-08-16/19:28:26.374/UTC [pool-5-thread-8] INFO DocManagement - Moving temp file /usr/local/apache-tomcat/static_data/Fetched/download543363156732667107.tmp to /usr/local/apache-tomcat/static_data/Fetched/CYIVlM4MyneW9y9L4p3hiZJ3Qc3rJR04ywYg2hIKdgU=.ppt[0]
```
Conversion starts here. When a file needs to be converted, the conversion must complete before any other tasks can start.
```sh
pdfd-tomcat_1 | 2024-08-16/19:28:26.375/UTC [pool-5-thread-8] INFO DocManagement - Kicking off convert method for /usr/local/apache-tomcat/static_data/Fetched/CYIVlM4MyneW9y9L4p3hiZJ3Qc3rJR04ywYg2hIKdgU=.ppt
pdfd-tomcat_1 | 2024-08-16/19:28:26.376/UTC [pool-5-thread-8] INFO Conversion - Starting PDFNet conversion from /usr/local/apache-tomcat/static_data/Fetched/CYIVlM4MyneW9y9L4p3hiZJ3Qc3rJR04ywYg2hIKdgU=.ppt to /usr/local/apache-tomcat/static_data/Converted/Fetched/CYIVlM4MyneW9y9L4p3hiZJ3Qc3rJR04ywYg2hIKdgU=.pdf
```
The convert operation is assigned a job ID here, 410aa502-a594-459e-b880-42bce42be547; this key is used by the internal worker. This is followed by the contents of the job: the op field in the statement shows the type of job, in this case convert. This allows you to link a specific file to an operation directly.
```sh
pdfd-tomcat_1 | 2024-08-16/19:28:26.397/UTC [pool-5-thread-8] INFO Conversion - Done conversion: /usr/local/apache-tomcat/static_data/Fetched/CYIVlM4MyneW9y9L4p3hiZJ3Qc3rJR04ywYg2hIKdgU=.ppt to /usr/local/apache-tomcat/static_data/Converted/Fetched/CYIVlM4MyneW9y9L4p3hiZJ3Qc3rJR04ywYg2hIKdgU=.pdf
```
At this point, the PDF is ready and other jobs start processing.
```sh
pdfd-tomcat_1 | 2024-08-16/19:28:26.397/UTC [pool-5-thread-8] INFO Util - Rendering image for page 0 of /usr/local/apache-tomcat/static_data/Converted/Fetched/CYIVlM4MyneW9y9L4p3hiZJ3Qc3rJR04ywYg2hIKdgU=.pdf
```
The first item for the document is returned; this is the pageinfo job generated from the converted PDF. This information comes back on the websocket, and WebViewer then uses it to fetch the result.
```sh
pdfd-tomcat_1 | 2024-08-16/19:28:26.400/UTC [pool-4-thread-1] INFO ServerJob - Task Complete:
pdfd-tomcat_1 | Sending: pages result to 1 subscribers
```
At this point, more similar jobs would be started and returned. This is the general flow when any user loads a document.
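To trace a single user or document through these logs, you can filter a captured log by the connection/job ID or by the filename key introduced above. A small sketch, using sample lines reproduced from this walkthrough (in production you would filter your real log output instead, e.g. from docker logs):

```sh
# Save two sample log lines, then filter by ID and by filename key.
cat > /tmp/wvs-sample.log <<'EOF'
pdfd-tomcat_1 | INFO ConnectionService - Creating WS connection 7445d53b-97c4-4a1a-abb7-bd853d400bd3, 1 connections total
pdfd-tomcat_1 | INFO DocManagement - Adding task Fetched/CYIVlM4MyneW9y9L4p3hiZJ3Qc3rJR04ywYg2hIKdgU=.ppt to convert queue
EOF
# All work for this document:
grep 'CYIVlM4MyneW9y9L4p3hiZJ3Qc3rJR04ywYg2hIKdgU=' /tmp/wvs-sample.log
# All activity for this connection/job ID:
grep '7445d53b-97c4-4a1a-abb7-bd853d400bd3' /tmp/wvs-sample.log
```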
Common messages and their meanings
A common issue is a crash on the server, which results in a HeartbeatException error. The error is raised when the internal heartbeat for a document job fails: the document being processed crashed the internal worker and caused it to restart. You will likely see several failures when this error occurs.
This error may constantly repeat, which would signify multiple crashes. This may be caused by one of two things:
- Constant submissions of multiple files which crash the internal worker
- The server is lacking in memory or available file handles
When the worker crashes due to a file as a result of this error, it will mark that file as 'dangerous' and will refuse to process it in future executions.
```sh
java.util.concurrent.ExecutionException: com.pdftron.server.HeartbeatException: heartbeat: failed for job - bd3a4b0c-1063-4c8e-8b0b-c14f326e141c
    at java.base/java.util.concurrent.CompletableFuture.reportGet(CompletableFuture.java:396) ~[?:?]
    at java.base/java.util.concurrent.CompletableFuture.get(CompletableFuture.java:2073) ~[?:?]
    at com.pdftron.server.BlackBoxPoller.waitForResult(BlackBoxPoller.java:35) ~[PDFTronSharedServer.jar:?]
    at com.pdftron.server.BlackBoxExecutor.sendJobAndWait(BlackBoxExecutor.java:127) ~[PDFTronSharedServer.jar:?]
    at com.pdftron.server.BlackBoxClient.doOperation(BlackBoxClient.java:70) ~[PDFTronSharedServer.jar:?]
    at com.pdftron.server.Operation.doDocOperation(Operation.java:69) ~[PDFTronSharedServer.jar:?]
    at com.pdftron.server.TypeOperation$DocInfo.doOperation(TypeOperation.java:175) ~[PDFTronSharedServer.jar:?]
    at com.pdftron.server.BlackBoxServerJobs$DocPageInfoJob.doWork(BlackBoxServerJobs.java:418) ~[PDFTronSharedServer.jar:?]
    at com.pdftron.server.ServerJob.run(ServerJob.java:239) [PDFTronSharedServer.jar:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) [?:?]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) [?:?]
    at java.base/java.lang.Thread.run(Thread.java:840) [?:?]
Caused by: com.pdftron.server.HeartbeatException: heartbeat: failed for job - bd3a4b0c-1063-4c8e-8b0b-c14f326e141c
    at com.pdftron.server.BlackBoxPoller.call(BlackBoxPoller.java:89) ~[PDFTronSharedServer.jar:?]
    at com.pdftron.server.BlackBoxPoller.call(BlackBoxPoller.java:16) ~[PDFTronSharedServer.jar:?]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[?:?]
```
These logs will regularly appear throughout the log; they signify the server's current status. While most of this is self-explanatory, the queue sizes line should be clarified: it shows how many active jobs are on each of our queues.
```sh
pdfd-tomcat_1 | 2024-08-16/17:34:22.582/UTC [pool-2-thread-1] INFO Monitor - STATUS - Free [32934MB] Total [64214MB] CPU [0.555170%]
pdfd-tomcat_1 | 2024-08-16/17:34:22.582/UTC [pool-2-thread-1] INFO Monitor - DISK - Free [965GB] Total [0GB]
pdfd-tomcat_1 | 2024-08-16/17:34:22.582/UTC [pool-2-thread-1] INFO Monitor - JAVA MEMORY - Used [62MB] Max [42949MB]
pdfd-tomcat_1 | 2024-08-16/17:34:22.582/UTC [pool-2-thread-1] INFO Monitor - QUEUE SIZES - Convert [0] Fetch [0] Main [0]
```
This log statement will appear if your server's job queues are becoming backed up; in this case, the server is in a state of failure due to errors blocking the queue. It should follow a health check failure, in this case a queue overload failure.
```sh
2024-07-29T10:18:58.910Z 2024-07-29/10:18:58.910/UTC [pool-2-thread-4] WARN Monitor - Health check failure detected: status 503
```
Logs from BlackBoxWSEndpoint and ConnectionService generally describe user websockets connecting and disconnecting; they represent connected users.
```sh
2024-07-28T23:48:18.912Z 2024-07-28/23:48:18.911/UTC [pool-2-thread-3] INFO BlackBoxWSEndpoint - Closing ws connection from endpoint: CloseReason: code [1001], reason [Connection 89830a51-1ff7-4d37-8888-81f384120cc0 closed for inactivity.]
2024-07-28T23:48:18.912Z 2024-07-28/23:48:18.911/UTC [pool-2-thread-3] INFO ConnectionService - Closing Connection, 18 outstanding
```
Sometimes these logs may contain exceptions; these are generally innocuous and represent a user abruptly disconnecting. They may also signify issues at the infrastructure level outside of WebViewer Server causing abrupt disconnections.
```sh
2024-07-29T02:28:32.557Z 2024-07-29/02:28:32.557/UTC [http-nio2-0.0.0.0-8090-exec-32] INFO BlackBoxWSEndpoint - Closing ws connection from endpoint: CloseReason: code [1006], reason [Closing WebSocket connection due to an error]
```
These logs come from the CacheManager, which controls the disk cache. They generally report the current settings and any cleanup performed on the cache.
```sh
pdfd-tomcat_1 | 2024-08-16/19:51:00.377/UTC [pool-6-thread-5] INFO CacheManager - Starting CacheManager with limit of 10000MB and max age of 30 minutes
pdfd-tomcat_1 | 2024-08-16/19:51:00.379/UTC [pool-6-thread-5] INFO CacheManager - Skipping cleanup, cache size 23MB below limit of 10000MB
pdfd-tomcat_1 | 2024-08-16/19:51:00.379/UTC [pool-6-thread-5] INFO CacheManager - Exited CacheManager
```
Throughout the logs there will be references to highQ, lowQ and convQ. These represent the queues. If they are higher than 0, files are waiting to be worked on. If they reach high numbers, exceeding 100, you should investigate whether your server can handle the load being sent to it. Ideally, these values stay at 0.
HighQ is the queue used for all processing of documents other than conversions. LowQ is the queue used for processing internal tasks. ConvQ is the queue used for processing file conversions.
Generally, we expect the server to take only a few seconds to be ready for load after startup. There is a short grace period during which initial setup occurs, so an instance should not be used until 10 seconds after initial startup. During startup, folders are created and options are set; we recommend checking the initial startup in the logs to see which settings and versions the server started with. The first 100 lines should cover all startup information.
Any shutdown interrupt sent to the server should be responded to immediately, with at most a 5-second delay.
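A deployment script can respect the startup grace period by polling the health check before routing traffic. A sketch under the defaults used in this guide (port 8090, a 10-second startup window; timing values are illustrative):

```sh
# Poll the health check once per second for up to 10 seconds; only route
# traffic to the instance once the server responds successfully.
READY=0
for i in 1 2 3 4 5 6 7 8 9 10; do
  if curl -sf --max-time 2 http://localhost:8090/blackbox/HealthCheck > /dev/null; then
    READY=1
    break
  fi
  sleep 1
done
echo "ready=$READY"
```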