shinyapps.io Scaling and Performance Tuning
R is single-threaded, which means that a single R process running a Shiny application cannot serve two different users at precisely the same time. This is not an issue in most cases because most computations take only tens or hundreds of milliseconds. As a result, a single R process can usually serve 5 to 30 requests/second. However, as your applications become more complex and each request takes longer to service, and as more users interact with the application simultaneously, you may find that the user experience does not meet your expectations.
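As a rough back-of-the-envelope check of the figures above, a process that handles requests strictly one at a time can complete at most one request per service time. The sketch below is written in Python purely for illustration. It assumes a constant service time and ignores queueing and network overhead; the function name is hypothetical and not part of shinyapps.io.

```python
def max_requests_per_second(service_time_ms: float) -> float:
    """Upper bound on throughput for a process that handles one request at a time.

    Assumption: every request takes the same time and there is no other overhead.
    """
    if service_time_ms <= 0:
        raise ValueError("service_time_ms must be positive")
    return 1000.0 / service_time_ms


for ms in (33, 100, 200):
    rate = max_requests_per_second(ms)
    print(f"{ms} ms per request -> about {rate:.0f} requests/second")
```

Under these assumptions, 200 ms per request gives about 5 requests/second and roughly 33 ms per request gives about 30, which is consistent with the range quoted above.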
Shinyapps.io lets you optimize the performance of your apps with several tuning parameters. To see your current settings go to the Settings page for any application. The default settings have been chosen to address the needs of most applications.
There are several ideas that are important when considering the various tuning options that are available.
The diagram below shows how these ideas relate to each other.
An application is a combination of files that you upload to shinyapps.io. These files must include a ui.R file and a server.R file (or a single app.R file that defines both), and can also include data files.
A running application will have at least one Application Instance. You can add additional instances if the application is hosted on a paid tier.
An Application Instance is a single server that responds to requests from end users. Shinyapps.io will start at least one Application Instance when a user first visits your application, and shinyapps.io will shut down this instance (or these instances) when the application is idle.
Each Application Instance will run one or more R Workers to fulfill user requests.
A worker is a special type of R process that an Application Instance runs to service requests to an application. Each Application Instance can run multiple workers. Each worker process is capable of servicing multiple end users depending on the configuration and performance requirements of the application. If there are no processes available to handle a new request, the Application Instance will start a new worker process.
A browser connection is a connection between a user’s web browser and a worker serving your application.
A user creates a browser connection when they first send a request to your application through their web browser, or when they refresh their browser after it has gone idle. Shinyapps.io assigns each new browser connection to a worker. The worker responds by creating a session for the browser connection to use.
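The containment relationship between these terms (an application has instances, an instance runs workers, a worker holds browser connections) can be sketched as a small data model. This is only an illustration in Python: the class and field names are hypothetical and do not correspond to any shinyapps.io API.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Worker:
    """One R process; it holds the browser connections assigned to it."""
    connections: List[str] = field(default_factory=list)


@dataclass
class ApplicationInstance:
    """One server; it runs one or more workers."""
    workers: List[Worker] = field(default_factory=list)

    def connection_count(self) -> int:
        return sum(len(w.connections) for w in self.workers)


@dataclass
class Application:
    """The uploaded app; while running it has at least one instance."""
    name: str
    instances: List[ApplicationInstance] = field(default_factory=list)

    def connection_count(self) -> int:
        return sum(i.connection_count() for i in self.instances)


# One instance running two workers that serve three browser connections in total.
app = Application(
    name="demo",
    instances=[
        ApplicationInstance(workers=[Worker(["alice", "bob"]), Worker(["carol"])]),
    ],
)
print(app.connection_count())
```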
The architecture described above uses two load factors to fine tune the performance of your applications.
Worker Load Factor - The threshold percentage after which a new browser connection will trigger the addition of a new worker.
Instance Load Factor - The threshold percentage after which a new connection will trigger the addition of a new Application Instance (up to the maximum instance limit, which is 1 on the free tier).
Each load factor is based on the idea of a threshold percentage, which is the percentage of available connections or processes that are allowed to open before shinyapps.io launches another worker or Application Instance. Both settings are configurable in the Advanced tab within the Settings page for a given application.
You can also use the Settings page to change:
Each of these changes will further fine tune the performance of your application.
The diagram below shows how shinyapps.io handles user requests throughout the life cycle of an application.
Assuming the following settings:
Instance Load Factor (default is 50%)
Worker Load Factor (default is 5%)
Max worker processes (default is 3)
Max # of concurrent connections supported per worker (default is 50)
Determining when another worker would be started:
Max # of concurrent connections per worker * Worker Load Factor
50 * 5% = 2.5 (meaning the 3rd browser connection would add another worker, up to the Max worker processes)
Determining when another Application Instance would be started:
Max # of connections per worker * Max worker processes * Instance Load Factor
50 * 3 * 50% = 75 (meaning the 76th connection would cause an additional instance to be started)
If you have
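The two threshold calculations above can be reproduced with a few lines of Python. This is a minimal sketch of the arithmetic only. It assumes, as the worked numbers suggest, that scaling is triggered by the first connection that exceeds the threshold. The function and variable names are illustrative and are not shinyapps.io settings or API calls.

```python
import math


def scale_threshold(capacity: int, load_factor_percent: float) -> float:
    """Threshold = capacity * load factor, as in the worked example."""
    return capacity * load_factor_percent / 100.0


def triggering_connection(threshold: float) -> int:
    """Number of the first connection that exceeds the threshold (assumed rule)."""
    return math.floor(threshold) + 1


# Default settings quoted in this section.
max_connections_per_worker = 50
max_worker_processes = 3
worker_load_factor = 5      # percent
instance_load_factor = 50   # percent

worker_threshold = scale_threshold(max_connections_per_worker, worker_load_factor)
instance_threshold = scale_threshold(
    max_connections_per_worker * max_worker_processes, instance_load_factor
)

print("worker threshold:", worker_threshold,
      "-> new worker at connection", triggering_connection(worker_threshold))
print("instance threshold:", instance_threshold,
      "-> new instance at connection", triggering_connection(instance_threshold))
```

Changing the four values at the top shows how a different load factor or connection limit moves the point at which a new worker or instance would be added.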
When should you worry about tuning your applications? You should consider tuning your applications if:
Your application has several requests that are slow and you have enough concurrent usage that people’s expectations for responsiveness aren’t being met. For example, if some key calculations take one second each and you would like the average response time for your application to stay below two seconds, you will not want more than two concurrent requests per worker.
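The reasoning behind the limit of two concurrent requests can be sketched with a simple queueing model in Python. It assumes that all requests arrive at the same moment and that the worker handles them one after another, each taking the same time. Real traffic is less regular, so treat this as an illustration rather than a prediction. The function names are hypothetical.

```python
from typing import List


def completion_times(concurrent_requests: int, service_time_s: float) -> List[float]:
    """Completion time of each request when they arrive together and run one by one."""
    return [service_time_s * position for position in range(1, concurrent_requests + 1)]


def average_response_time(concurrent_requests: int, service_time_s: float) -> float:
    """Mean completion time across all requests in the batch."""
    times = completion_times(concurrent_requests, service_time_s)
    return sum(times) / len(times)


for n in (1, 2, 3):
    print(f"{n} concurrent request(s): average {average_response_time(n, 1.0)} s")
```

With a one-second calculation, two concurrent requests average 1.5 seconds under this model, while three average 2.0 seconds, which no longer meets a target of less than two seconds.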
Performance is poor during sudden large spikes of traffic even though you have configured multiple Application Instances. However, users who connect after the spike see good performance.
Your application suddenly goes grey and you see in your logs that the application was “killed”.
Remedy: There are two possible solutions:
An application doesn’t fit in memory even on the largest Application Instance size.
Your application stops accepting additional users beyond 150 connections.
Remedy: A few things to try would be:
An application that has a significant initialization time (loading lots of data, or talking to third-party web services) sometimes doesn’t load.