Simplified Data Quality Monitoring of Dynamic Longitudinal Data: A Functional Programming Approach - Posit


Ensuring the quality of the data we deliver to customers or provide as inputs to models is one of the most under-appreciated and yet time-consuming responsibilities of a modern data scientist. This task is challenging enough when working with static data, but when we have access to dynamic, longitudinal, continuously updating data, that complexity can become an asset. We will demonstrate how to simplify data quality monitoring of dynamic data with a functional programming approach that enables early and actionable detection of data quality concerns.

Using purrr as well as tidyr and nested tibbles, we will illustrate the five key pillars of enjoyable, user-friendly data quality monitoring with relevant R code: Readability, Reproducibility, Efficiency, Robustness, and Compositionality.
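
As a rough illustration of the nested-tibble idea, the sketch below stores each delivery of a dataset in a list-column and maps quality metrics over it. The data, the column names (`snapshot`, `id`, `value`), and the two metrics are hypothetical and chosen only for this example; they are not taken from the talk.

```r
library(dplyr)
library(tidyr)
library(purrr)

# Hypothetical longitudinal data: two deliveries (snapshots) of the same table.
records <- tibble::tibble(
  snapshot = c("2024-01", "2024-01", "2024-01", "2024-02", "2024-02", "2024-02"),
  id       = c(1, 2, 3, 1, 2, 3),
  value    = c(10, NA, 30, 10, 20, NA)
)

# One row per snapshot; that snapshot's rows live in the list-column `data`.
by_snapshot <- records |>
  nest(data = c(id, value))

# Map over the list-column to get one metric value per snapshot.
quality <- by_snapshot |>
  mutate(
    n_rows    = map_int(data, nrow),
    n_missing = map_int(data, function(d) sum(is.na(d$value)))
  )
```

Each new delivery becomes one more row in `by_snapshot`, so the same `mutate()` call covers it without any change to the metric code.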

Readability: FP empowers us to abstract away from the mechanics and implementation of comparing two or more related datasets and move towards declaring the intent of features and metrics we want to compare.
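
One way to read "declaring intent" is to list the metrics by name and let a mapping function handle the iteration. The sketch below assumes two small, made-up data frames (`previous` and `current`) and three illustrative metrics; none of these names come from the talk.

```r
library(purrr)

# Hypothetical metrics, declared by name. Each takes a data frame and returns one number.
metrics <- list(
  n_rows      = function(d) nrow(d),
  pct_missing = function(d) mean(is.na(d$value)) * 100,
  mean_value  = function(d) mean(d$value, na.rm = TRUE)
)

previous <- data.frame(id = 1:4, value = c(10, 20, 30, 40))
current  <- data.frame(id = 1:4, value = c(10, NA, 30, 60))

# Apply every metric to one dataset; map_dbl() owns the looping mechanics.
summarise_quality <- function(d) map_dbl(metrics, function(f) f(d))

# Side-by-side comparison of the two deliveries.
comparison <- data.frame(
  metric   = names(metrics),
  previous = summarise_quality(previous),
  current  = summarise_quality(current)
)
comparison$change <- comparison$current - comparison$previous
```

Adding a metric means adding one named entry to `metrics`; the comparison code stays the same.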

Reproducibility: By avoiding side-effects and dependencies on external states and inputs, and using functional units which can be easily tested over a variety of inputs, FP reduces the burden to create reproducible code. Perhaps more importantly, FP supports not just reproducibility of results, but reproducibility of workflows that can be continually applied to dynamic datasets.
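
A minimal sketch of such a functional unit follows. `check_range()` is a hypothetical check written for this example: its result depends only on its arguments, so it can be exercised over a variety of inputs and reapplied unchanged to each new delivery.

```r
library(purrr)

# A pure check: no globals, no I/O, output determined entirely by the arguments.
check_range <- function(x, lower, upper) {
  out_of_range <- !is.na(x) & (x < lower | x > upper)
  list(n_checked = sum(!is.na(x)), n_failed = sum(out_of_range))
}

# Exercise the same unit over several kinds of input, including edge cases.
inputs <- list(
  typical = c(1, 5, 9),
  with_na = c(NA, 5, 50),
  empty   = numeric(0)
)
results <- map(inputs, check_range, lower = 0, upper = 10)

# Calling it again with the same input gives the same answer.
repeatable <- identical(
  check_range(inputs$with_na, 0, 10),
  check_range(inputs$with_na, 0, 10)
)
```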

Efficiency: FP enables more efficient code through lazy evaluation, caching, and simpler implementation over parallel backends.
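
To make the caching point concrete, here is a hand-rolled sketch in base R, assuming that profiling a snapshot is expensive and that snapshots are identified by a string key. `make_cached()` and `profile_snapshot()` are hypothetical helpers, and the call counter is a deliberate side effect added only to show how often the real computation runs.

```r
# Wrap a function so that results are stored per key and reused on repeat calls.
make_cached <- function(f) {
  cache <- new.env(parent = emptyenv())
  function(key) {
    if (!exists(key, envir = cache, inherits = FALSE)) {
      assign(key, f(key), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}

calls <- 0
profile_snapshot <- function(snapshot) {
  calls <<- calls + 1  # illustration only: counts real computations
  nchar(snapshot)      # stand-in for an expensive profiling step
}
cached_profile <- make_cached(profile_snapshot)

# "2024-01" appears twice, but is only profiled once.
snapshots <- c("2024-01", "2024-02", "2024-01")
profiles <- vapply(snapshots, cached_profile, integer(1))
```

Packages such as memoise (caching) and furrr (whose `future_map()` mirrors `purrr::map()` on parallel backends) package up these ideas; neither is used here, and the abstract does not say which tools the talk relies on.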

Robustness: FP allows greater testability of our code through modularization and elegant error-handling, with customized fail-safes for data that differs in expected ways over time.
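
The following sketch shows one possible form of such fail-safes, using purrr's `possibly()` and `safely()` adverbs. The snapshots and the `mean_value()` metric are invented for the example, with one snapshot whose schema has drifted.

```r
library(purrr)

# Hypothetical metric that fails when an expected column is absent.
mean_value <- function(d) {
  if (!"value" %in% names(d)) stop("column `value` is missing")
  mean(d$value, na.rm = TRUE)
}

snapshots <- list(
  jan = data.frame(value = c(1, 2, 3)),
  feb = data.frame(amount = c(4, 5, 6)),  # schema drifted: `value` was renamed
  mar = data.frame(value = c(7, NA, 9))
)

# possibly() substitutes a fallback value instead of stopping the whole run.
safe_mean <- possibly(mean_value, otherwise = NA_real_)
means <- map_dbl(snapshots, safe_mean)

# safely() keeps the error object, so failures can be reported per snapshot.
checked <- map(snapshots, safely(mean_value))
errors  <- compact(map(checked, "error"))
```

Here `means` holds a value for every snapshot, with `NA` marking the one that failed, while `errors` records which snapshot failed and why.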

Compositionality: FP encourages higher-level reasoning with functions, which in turn drives both readability (through higher-level, more abstract code) and robustness (through modifying function behavior when errors are encountered).
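
A small sketch of this idea, assuming a check can be split into single-purpose steps: `purrr::compose()` chains the steps into one function, and `possibly()` then changes how that function behaves on error without editing any step. All step names below are hypothetical.

```r
library(purrr)

# Small, single-purpose steps.
require_numeric <- function(x) {
  if (!is.numeric(x)) stop("expected a numeric vector")
  x
}
drop_missing   <- function(x) x[!is.na(x)]
share_negative <- function(x) mean(x < 0)
to_percent     <- function(p) p * 100

# .dir = "forward" applies the steps left to right.
pct_negative <- compose(
  require_numeric, drop_missing, share_negative, to_percent,
  .dir = "forward"
)

# Same pipeline, but errors now yield NA instead of stopping.
pct_negative_safe <- possibly(pct_negative, otherwise = NA_real_)

ok  <- pct_negative_safe(c(-1, 2, NA, 4, -5))
bad <- pct_negative_safe("not numeric")
```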
