Microsoft and AWS Collaborate To Develop ‘PyWhy’: A New Github Home For ‘DoWhy’ (A Causal Machine Learning Library From Microsoft)

As computing systems become more actively involved in societally essential areas such as healthcare, education, and government, it is crucial to accurately forecast and comprehend these interventions’ causal repercussions. Traditional machine learning algorithms based on pattern recognition and correlational analyses are insufficient for decision-making without an A/B test.

To fill this gap, Microsoft researchers created a platform that executes the process of causal inference analysis from start to finish to assist data scientists in better understanding and applying causal inference. They developed the DoWhy in 2018. Since then, the library has been doing precisely that, cultivating a community committed to using causal inference principles in data science. “DoWhy” is a Python package that attempts to encourage causal thinking and analysis, many ways machine learning libraries have done for prediction. DoWhy provides a four-step interface for causal inference that focuses on clearly modeling and confirming causal assumptions as feasible. 

Traditional machine learning approaches aim to anticipate a result. Consider a public utility business that wants to minimize their customers’ water use using a marketing and incentives campaign. The success of a rewards program is difficult to assess since any drop in water consumption by participating consumers is masked by their decision to engage in the program. 

Suppose we see that a rewards program member uses less water than others. How do we know if the program motivates their lower water consumption or if consumers who were already expecting to cut their water usage choose to join it? Given information about the determinants of consumer behavior, causal approaches may separate confusing variables and determine the impact of this incentive program.

We only see one of the two possible outcomes for every consumer and cannot directly examine the program’s impact. Procedures created to validate traditional machine learning models, which compare predictions to observable ground truths, cannot be applied. Instead, new techniques are required to acquire confidence in the dependability of causal inference. Most importantly, we must collect domain knowledge, reason about our modeling choices, validate our key assumptions when possible, and examine the sensitivity of our outcomes to assumption violations when validation is not possible.


A utility introduces a reward program and now are the results obtained a direct consequence of the rewards program?

The novel modeling assumptions of causal techniques provide the most significant barrier to data scientists who are only beginning to investigate causal inference. DoWhy can assist them in comprehending and implementing the approach. The library focuses on the four phases of an end-to-end causal inference analysis, which are detailed in prior work, DoWhy: an End-to-End Library for Causal Inference, and a linked blog post:

Modeling: Causal reasoning begins with developing a coherent model of the causal assumptions. We must be specific about what we already know to achieve a legitimate answer to our cause-and-effect inquiry.

Identification: Next, we utilize the model to determine whether the causal question can be answered, and we submit the expression to be calculated. 

Estimating: Once we have identified the causal impact, we may select various statistical and machine learning-based estimation approaches to answer our causal inquiry. 

Refutation: Once we have our response, we must do all possible to put our underlying assumptions to the test. 

The DoWhy library distinguishes itself from previous causal inference toolkits by focusing on the four phases of the end-to-end causal inference process. DoWhy supplements existing libraries that focus on particular processes by providing users with the benefits of other libraries in a uniform, seamless API.

Making causation a cornerstone of data science practice would need an even more significant collaborative effort to build a consistent basis for our industry. DoWhy is transitioning to a separate open-source governance model in a new PyWhy GitHub group to provide access to this vital knowledge resource. Amazon Web Services (AWS) collaborates to provide new technologies based on structural causal models.

Determining the contributions of different causal mechanisms to the outcome of causal sequence

PyWhy aims to create an open-source environment for causal machine learning that advances the state-of-the-art while making it accessible to practitioners and researchers. 

It can be installed from GitHub using:

pip install dowhy

DoWhy requires the following packages installed:

  • numpy
  • scipy
  • scikit-learn
  • pandas
  • networkx (for analyzing causal graphs)
  • matplotlib (for general plotting)
  • sympy (for rendering symbolic expressions)
This Article is written as a summay by Marktechpost Staff based on the Research Articles from Microsoft and Amazon. All Credit For This Research Goes To The Researchers of This Project. Check out the docs / github

Please Don't Forget To Join Our ML Subreddit

Aneesh Tickoo is a consulting intern at MarktechPost. He is currently pursuing his undergraduate degree in Data Science and Artificial Intelligence from the Indian Institute of Technology(IIT), Bhilai. He spends most of his time working on projects aimed at harnessing the power of machine learning. His research interest is image processing and is passionate about building solutions around it. He loves to connect with people and collaborate on interesting projects.

[Announcing Gretel Navigator] Create, edit, and augment tabular data with the first compound AI system trusted by EY, Databricks, Google, and Microsoft