Datasets in Reproducible Research with 'pins'

Open source code is an essential piece in making science reproducible. Tools like 'rmarkdown' and GitHub facilitate running and sharing outcomes with colleagues and with the broad scientific community

Datasets in Reproducible Research with 'pins'

February 4, 2020

Open source code is an essential piece in making science reproducible. Tools like 'rmarkdown' and GitHub facilitate running and sharing outcomes with colleagues and with the broad scientific community at large. However, it is less clear what tools should be used to retrieve, store and share datasets; while it is possible to make datasets part of your workflows today, it is usually hard and we are often left with manually sharing or downloading links to datasets. Not only that, but it's also hard to share or discover datasets. In this talk we will introduce for the first time the 'pins' package. A package designed to: pin, discover and share resources. Meaning that, you can use 'pins' to simplify your data science workflows by easily fetching resources from GitHub, Kaggle, CRAN and RStudio Connect. We will present a 'pin' as a generic resource that can contain tabular datasets like CSVs, unstructured data like JSON files, image archives as ZIP files and so on. This talk will be highly interactive showing you how to get started by installing 'pins' from CRAN, retrieve and cache resources, share and discover useful and fun data resources to improve and enhance your day-to-day workflows.

A 5 minute presentation in our Lightning Talks series


About the speaker

Javier Luraschi

Javier is experienced in technologies ranging from desktop, web, mobile and backend; to augmented reality and deep learning applications. He previously worked in Microsoft Research and SAP and holds a double degree in Mathematics and Software Engineering. Javier is the creator of packages like sparklyr, r2d3, cloudml and author of "Mastering Spark with R".