Part 3 - Advanced features of sparklyr
August 23, 2017
RStudio recently announced sparklyr, a new open-source package that connects R to Spark through a full-fledged dplyr backend and supports the entirety of Spark’s MLlib. Because Spark can interact with distributed data with little latency, it is becoming an attractive tool for working with large datasets in an interactive environment. In addition to handling the storage of data, Spark incorporates a variety of other tools, including stream processing, graph computation, and a distributed machine learning framework. Some of these tools are available to R programmers via the sparklyr package.
In this four-part series, we’ll discuss how to leverage Spark’s capabilities in a modern R environment.
Javier is the author of “Mastering Spark with R”, pins, sparklyr, mlflow and torch. He holds a double degree in Math and Software Engineering and has decades of industry experience with a focus on data analysis. Javier is currently working on a project of his own and previously worked at RStudio, Microsoft Research, and SAP.