Free and Open Source Software for Data Science

J.J. Allaire
rstudio::conf 2020

1/29/2020

Overview

  • Origins of RStudio

  • Why free and open source software?

  • Tools for scientific and technical computing

    • Where do they come from?
    • How are they financially supported?
    • Are they trustworthy?
  • Corporations and their discontents

  • RStudio’s role and responsibilities

Origins of RStudio

The Bill James Baseball Abstract (1983)

14-year-old me is shocked to learn that most of what I’ve learned from baseball experts and insiders is entirely wrong.

Macalester Political Science (1989)

20-year-old me is disturbed by the observation that if baseball commentators and insiders have it that wrong, then many others likely do too!

  • In public policy, we make decisions that affect the well-being of hundreds of millions of people.

  • In medicine, we decide what drugs to develop and treatments to administer.

  • In business, we decide what strategies to undertake, products to develop, and what people and teams are most effective.

Sadly, the way knowledge was produced and consumed in these domains looked a lot like pre-Bill James baseball. Could anything be done about this?

Software seemed like an important part of the answer…

UW Madison Political Science (1992)

PhD Program Dropout (1993)

Aspiring software engineer…


“What a computer is to me is it’s the most remarkable tool that we’ve ever come up with, and it’s the equivalent of a bicycle for our minds.”

Steve Jobs

Software Tool Builder (1994 - 2007)

  • Built tools for programming, research, and writing.

  • Worked almost entirely on proprietary software, which, as I discovered, has its lifetime bounded entirely by the fortunes of the company sponsoring its development.

  • Worked in startup companies, which as I discovered are built to be sold (where typically sold ≈ destroyed).

  • While I found working on software tools very rewarding, I found that working on proprietary software in startups had the seeds of its own demise built in from the beginning. This was very disappointing!

Unemployed Tool Builder (2008)

What to do next…

  • I wanted to build tools that were both durable (having impact for many, many years) and accessible (available to all who wanted to use them regardless of their economic means). To me, this meant working on open source software.

  • I knew I wasn’t interested in the frenetic world of software startups (especially given that they are built from the outset to be destroyed).

  • When I discovered R, it took me only about 24 hours to conclude that it was what I wanted to spend (at least!) the next 10 years working on…

RStudio (2008)

Open source software for my first passion, data analysis!

  • A meaningful way to help enhance our collective ability to understand and improve the world we live in.

  • It seemed very much like I had something to offer the community (experience building tools for a technical audience with a focus on accessibility and productivity).

  • It was so early in the development of R that 1 or 2 people could make meaningful contributions to the community (no startup company required).

  • Started working on the RStudio IDE, and was joined by Joe Cheng (whom I had worked with previously) a few months later.

  • Mission: Open source software for statistical computing

Why free and open source software?

What is Free? (Gratis versus Libre)

  • The English adjective free is commonly used in one of two meanings: “for free” (gratis) and “with little or no restriction” (libre).

  • Richard Stallman summarized the nature of libre in a slogan: “Think free as in free speech, not free beer.”

  • Both meanings of free are relevant here!

  • We may tend to focus too much on the fact that our software comes without cost. That’s a great benefit, but it’s also critical that it come unencumbered with restrictions (and a guarantee that it will remain so).

Four Essential Freedoms

https://www.gnu.org/philosophy/free-sw.en.html

  • The freedom to run the program as you wish, for any purpose (freedom 0).

  • The freedom to study how the program works, and change it so it does your computing as you wish (freedom 1). Access to the source code is a precondition for this.

  • The freedom to redistribute copies so you can help others (freedom 2).

  • The freedom to distribute copies of your modified versions to others (freedom 3). By doing this you can give the whole community a chance to benefit from your changes. Access to the source code is a precondition for this.

Reproducibility

The baseline requirement for reproducibility is that I can run your software!


  • Work based on proprietary software is inherently less reproducible.

  • At best, I need a (potentially expensive) license to reproduce your work.

  • At worst, your work can never be reproduced because the vendor has gone out of business or otherwise made older versions of its products inaccessible.

Nature: The case for open computer programs

The article (written in 2012) cites two systems known to enable packaging of code, data, and text. One of them is Sweave…

Sweave: Pioneered by the R community in 2002!

Resiliency

Where does software go when it dies?

  • Software products (and companies) come and go. We don’t really want our research tied to the fate of a specific product or vendor.

  • A variation on this theme: Software isn’t abandoned, but rather the vendor decides to dramatically raise prices once its customers are highly dependent on it.

  • Four Essential Freedoms ensure that this can’t happen with free software!

  • Example: Oracle acquired Sun, which had previously acquired MySQL. The community responded with the MariaDB fork.

Examples from R

  • Many vendors have provided offerings around R over the years (TIBCO, Oracle, IBM, Microsoft, Revolution Analytics, Google, H2O, RStudio, etc.)

  • Vendor commitment to R can, however, vary considerably (e.g. companies can be acquired or shift strategies).

  • Users who invest in open-source R code are protected both from this variation and from vendors aggressively raising prices once customers are dependent on software that can only be obtained from a single source.

Participation

Who decides what methods are “supported”?

  • A single software vendor is fundamentally incapable of keeping up with the breadth and depth of methodological innovation that occurs in science.

  • The low cost of creating and distributing open source packages fosters a robust “long tail” of tools for even very small communities of practitioners.

What happens when the development of statistical computing tools is open? CRAN as an exemplar…

Accessibility

  • Data literacy has become fundamentally important to science and the global economy.

  • All individuals and organizations should be enfranchised with the tools of data science.

  • This is an imperative not unlike public education. We have a better democracy and civilization if everyone has access to the fundamental tools of inquiry, regardless of their economic means.

  • Free and open source software inherently provides broad accessibility.

Tools for scientific and technical computing

Scientific and technical computing companies

Some traditional leaders:

  • SAS Institute (SAS)
  • MathWorks (MATLAB)
  • Wolfram Research (Mathematica)

Some shared characteristics:

  • Started in academia and grew very slowly
  • Private / closely held companies
  • Principal mission is to support research and science
  • Proprietary software

What’s problematic?

  • Proprietary software makes data science and scientific research less accessible and reproducible (I need to own your software in order to verify and build upon your results).

  • Proprietary software centralizes decisions about what methods and techniques will be used (can slow innovation and prevent wide adoption of new ideas).

  • The combination of proprietary lock-in + logic of self-perpetuation can also incentivize these companies to hold their customers hostage.

However, without an economic engine to fund development, adequate progress is often not made.

Open source tools for scientific and technical computing

  • SageMath

  • GNU Octave

  • R / Tidyverse / RStudio

  • Python / Pandas / Jupyter

Roots very similar to the proprietary software vendors. Started in academia and grew slowly.

However, the means of “protecting” the software is to make it open source (as opposed to starting a company and keeping it private for the long term).

What’s problematic?

  • Are these projects adequately funded to:

    • Sustain momentum
    • Invest in usability
    • Solve the hardest problems
    • Solve the “boring” problems
  • Even stipulating adequate resources, is project organization cohesive enough to deliver the software that users need?

  • Are organizations comfortable adopting the software without clear visibility into long-term project health?

How to fund open source development

  • Funded by grants (Jupyter is the best success story here so far)

  • Funded by companies with an interest in the software (Linux and the initial model for Ursa Labs)

  • Funded by venture capital (time horizon often too short)

Are any of these models adequate for scientific and technical computing?

Is open-source a viable way to build these tools?

Wolfram open source (cont.)

Net of the Wolfram case for proprietary software

  • Need to have strong technical leadership to solve hard problems.

  • Need to assemble a group that works together to achieve a set of shared goals.

  • Need a financial engine that can compensate talented people to work on the software full time for many years.

We agree with all of these things but think it’s possible to do this with open-source software.

RStudio Evolution

Virtuous Cycle

Open source and commercial software

Open source software adopted in complex environments

Commercial software enables investment into open source

Open Source vs. Commercial

What’s the operative principle?

  • Core productivity tools, packages, protocols, and file formats should be open source:

    • Ensure universal access to core tools (enfranchisement)
    • Ensure that fundamental long-term dependencies are on software that provides the 4 freedoms.
  • Tools that facilitate adoption of R in large/complex environments are commercial (necessarily so, as building and supporting enterprise software is very expensive)

  • Online services that make R more convenient to use (e.g. RStudio Cloud, shinyapps.io) are commercial (as these are also expensive to build and run).

RStudio, a new kind of scientific and technical computing company

  • Like proprietary software companies, we have a financial engine that allows a group to sustain coordinated effort for many years.

  • Like SAS, MathWorks, and Wolfram, we think it’s critical to remain independent to pursue our mission.

  • Unlike these companies, we are open source, which is much better for the community of practitioners.

  • Unlike these companies, we have minimal proprietary lock-in (by design, so as to ensure we can’t drift from our mission).

But can we be trusted?

No, not at face value

  • In today’s world, corporations are by default not trustworthy (they are pure profit maximizers).

  • The community needs to trust that our mission is first and foremost open-source software and that we won’t expeditiously sell out our mission to maximize profits.

  • Customers need to be able to trust that we seek a relationship of mutual benefit, and that we won’t simply become another vendor with abusive licensing practices.

  • How do we make our motivation as clear as possible, and build real long-term trust? Need to talk about the nature of corporations…

Corporations and their discontents

What is a corporation?

“A corporation is an organization, usually a group of people or a company, authorized to act as a single entity (legally a person) and recognized as such in law.”

—Wikipedia (https://en.wikipedia.org/wiki/Corporation)

Why do corporations exist?

Liability, continuity, and capital formation

  • Before corporations, businesses were essentially undertaken by individuals, or perhaps, by partnerships of individuals.

  • Individuals were thus liable for everything the enterprise did.

  • Further, when a partner left, new contracts had to be established.

  • As the industrial age progressed, the need to assemble large amounts of capital also became important.

Earliest corporations

  • Early English trading companies formed by royal act (e.g. East India Company)

  • Then, legislatures enabled the creation of corporations for enterprises that they believed needed capital to deliver needed improvements—canals, bridges, railways, banks, and utilities.

  • Eventually legislatures came to see the power of corporations to steer capital to productive use as an important public good, without regard to any particular industry.

Essential nature of a corporation

  • An institution created by governments in order to benefit the societies they governed.

  • Allowed investors to aggregate resources into an artificial person, without fear of personal liability.

  • This, in turn, allowed for massive, efficient investment vehicles that create the goods and services that benefit society.

  • A corporation is a legal entity granted special privileges (i.e. being a virtual “person”) because it is understood that this will create public benefit.

Primary purpose of corporations

Two competing theories

  • Stakeholder primacy — The corporation is created by the government and therefore has a social function. Thus, directors should consider not only shareholder returns but also all other constituencies that have a stake in the corporation, such as its employees, its debt-holders, the environment, and the community.
  • Shareholder primacy — The corporation has a single purpose: to maximize value for its shareholders (as “owners”), within the bounds of law. A business corporation is organized and carried on primarily for the profit of the stockholders.
  • In the Anglo-American legal tradition (US, UK, and similar), shareholder primacy has won.

Dodge v. Ford Motor Co. (1919)

Ford stopped paying dividends to shareholders in order to produce less expensive products and to increase employee wages.

“My ambition is to employ still more men, to spread the benefits of this industrial system to the greatest possible number, to help them build up their lives and their homes. To do this we are putting the greatest share of our profits back in the business.” — Henry Ford

“A business corporation is organized and carried on primarily for the profit of the stockholders. The powers of the directors are to be employed…to attain that end, and does not extend to a change in the end itself, to the reduction of profits, or to the nondistribution of profits among stockholders in order to devote them to other purposes.” — Michigan Supreme Court

Revlon, Inc. v. MacAndrews & Forbes Holdings, Inc. (1985)

The board of Revlon was faced with an acquisition proposal that appealed to the shareholders but that the board believed would result in a poor outcome for the corporate enterprise, including its bondholders. The board took measures to defeat the takeover bid (which involved selling the company to a different bidder).

The Delaware Supreme Court rejected the idea that the board had the duty, or even the option, to consider the interests of stakeholders other than shareholders in a sale process. The legal implication is that the sole objective for directors had to be immediate wealth maximization for shareholders, even if the high bid might destroy large amounts of bondholder value (or, by extension, worker or community value).

Many people find this theory lacking

  • Imposition of public health and environmental externalities.

    “I suspect most Union Carbide shareholders would have been happy to accept a somewhat lower dividend if this allowed Union Carbide to adopt safety measures that would have prevented the deadly explosion in Bhopal, India, that killed 2,000 and severely injured thousands more.” — Lynn Stout (Cornell Law School)

  • Accountability for systemic risks (e.g. recent financial crisis).

  • Should the special legal status granted to corporations by the state carry any reciprocal obligation to the public good?

  • Shouldn’t companies be able to consider the welfare of their own employees and community?

Constituency Statutes

In response to shareholder primacy, 33 states have adopted constituency statutes that permit directors to consider one or more of the following non-stockholder interests:

  1. Employees, customers, creditors, suppliers, and communities in which the corporation has facilities;
  2. National and state economies and other community and societal considerations;
  3. The long-term and short-term interests of the corporation and its stockholders;
  4. The desirability of remaining independent, and the resources, intent, conduct (past, stated, and potential) of a person seeking to acquire control of the corporation; and
  5. The corporation’s officers.

Business Roundtable (2019)


  • The pledge was signed by a group of 181 CEOs from the Business Roundtable, a public policy organization made up of a coalition of CEOs, including the chief executives of Amazon, Walmart, JPMorgan Chase and Apple.

  • The statement didn’t list any specific policy changes but instead laid out high-level goals, such as considering customers, the environment, employees, suppliers and the community at large.

  • This is the first time since 1997 the Business Roundtable has said that corporations shouldn’t exist solely to serve shareholders.

Basketball shoes and corporate governance

AND 1

Basketball shoe company founded in 1993

A socially responsible business before the concept was well known:

  • Great parental leave benefits, widely shared ownership of the company, on-site yoga classes, etc.

  • Gave 5% of its profits to local charities promoting high-quality urban education and youth leadership development.

  • Worked with its overseas factories to implement a best-in-class supplier code of conduct to ensure worker health and safety, fair wages, and professional development.

What happened to AND 1?

  • $4 million in revenue in 1995

  • Took on external investors in 1999

  • $250 million in revenue in 2001

  • Competition from Nike led to a dip in sales and ultimately to the sale of the company

  • The sale was done to maximize shareholder value, and led to the company’s preexisting commitments to its employees, overseas workers, and local community being stripped away within a few months.

This was, needless to say, extremely disappointing to the founders!

Then what…B Lab

  • Do we need a new form of corporate governance? How could we make this happen?

  • Jay Coen Gilbert and Bart Houlahan (from AND 1) and their friend Andrew Kassoy (former Wall Street private equity investor) get together to start B Lab, a non-profit dedicated to creating a new form of corporation.

Benefit Corporations

A new type of corporation that is a reaction to the shareholder primacy regime

  • Directors must account for stakeholders (shareholders, community, employees, etc.) in their decisions (legal requirement, not an option).

  • Name a public beneficial purpose as part of their charter.

Delaware Public Benefit Corporation

Legislation identifies the corporate purpose of a PBC as:

A corporation “intended to produce…public benefits and to operate in a responsible and sustainable manner. To that end, a public benefit corporation shall be managed in a manner that balances the stockholders’ pecuniary interests, the best interests of those materially affected by the corporation’s conduct, and the public benefit or benefits identified in its certificate of incorporation.”

Consequently, the directors of a PBC have obligations which:

“Require them to balance the stockholders’ pecuniary interests, the best interests of those materially affected by the corporation’s conduct, and the public benefit or public benefits identified in its certificate of incorporation.”

Benefit Corporation Examples

A movement whose time has come

RStudio, Inc. → RStudio, PBC

  • RStudio is now a certified Delaware Benefit Corporation.

    We’ve always run the company for the benefit of all stakeholders (shareholders, employees, community, customers).

    Now this is a fundamental part of our corporate DNA.

RStudio’s Public Benefit

Creation of free and open source software for data science, scientific research, and technical communication:

  1. To enhance the production and consumption of knowledge by everyone, regardless of economic means.

  2. To facilitate collaboration and reproducible research, both of which are critical for ensuring the integrity and efficacy of scientific work.

This is built into our charter, and our directors and officers now have a fiduciary duty to pursue these public benefits along with balancing the needs of all our stakeholders.

Public Benefit Report

  • As part of being a PBC, RStudio will release an annual report describing how it has served its public beneficial purpose.

  • The first report is available now at: https://www.rstudio.com/about/pbc-report

  • Some highlights:

    • 250 open source projects
    • 36 full-time engineers dedicated to open source software (54% of company engineering resources)
    • Hundreds of millions of downloads of open source products/packages.
  • Will continue to report on these same metrics annually.

B Corp Certification

  • RStudio has been formally certified as a B Corporation by B Lab.

  • Certified B Corporations amend their legal governing documents to require their board of directors to balance profit and purpose.

  • Certified B Corporations also must achieve a minimum verified score on the B Impact Assessment—a rigorous assessment of a company’s impact on its workers, customers, community, and environment.

  • B Impact Report is publicly available at https://bcorporation.net/directory/rstudio.

Reliable governance for the long term

  • Our plan is to remain independent (not sell the company).

  • Besides being a Public Benefit Corporation, we want to provide assurance that we will remain so, and that the company can’t lose its independence against its will.

  • RStudio does have outside investors (as minority shareholders). Prior to the PBC conversion, our financing documents provided special rights to my shares that enabled blocking undesirable outcomes (e.g. sale of the company).

  • However, if we want the company to be around in 100 years, I will need to live to 150 to continue exercising those rights!

  • Along with the Benefit Corporation transition, we’ve also made some changes to where and how those shares are held as well as how the rights are exercised. These changes ensure that no matter what happens to any one individual, the company will still be able to maintain its independence.

Who benefits from RStudio’s success?

  • We’ve tried to build a company where the community, our employees, our customers, and science itself are major beneficiaries of our success.

  • RStudio is profitable, and we plan to use these profits to purchase stock back from our shareholders over time (as opposed to selling the company or going public).

  • Once our shareholder commitments are met, we will dedicate a substantial portion of our profits to philanthropic causes that benefit open source software and open science, and will document these donations in our annual public benefit report.

Resources / Q & A