The “Robust” Data Scientist: Winning with Messy Data and Pingouin

This article explores the craft of using robust statistics in data science workflows, showing what to do when data does not meet the assumptions of standard tests.

Original article: KDnuggets

# Introduction

A harsh truth to begin with: textbook data science often falls apart in the real world. Concepts and techniques are taught on carefully curated, neatly bell-shaped variables, but as soon as we venture into real projects, we run into outliers, heavily skewed distributions, and unequal variances. A previous article on building an exploratory data analysis (EDA) pipeline with Pingouin showed how to use statistical tests to detect when data violates assumptions such as homoscedasticity and normality. But what if those tests fail? Throwing the data away isn't the solution; turning to robust methods is. This article explores the craft of using robust statistics in data science workflows.

Excerpt limited to ~120 words for fair-use compliance. The full article is at KDnuggets.
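
To make the excerpt's point about outliers concrete, here is a minimal sketch of why robust estimators matter. It is not taken from the article and does not use Pingouin: it relies only on Python's standard `statistics` module, and the sample values, the `mad` helper, and the `summarize` helper are hypothetical names and data invented for illustration.

```python
from statistics import mean, median, stdev


def mad(values):
    """Median absolute deviation: the median distance of points from the median."""
    center = median(values)
    return median([abs(v - center) for v in values])


def summarize(values):
    """Return classical and robust estimates of location and spread."""
    return {
        "mean": mean(values),      # classical location, sensitive to outliers
        "stdev": stdev(values),    # classical spread, sensitive to outliers
        "median": median(values),  # robust location
        "mad": mad(values),        # robust spread
    }


# Hypothetical measurements: a well-behaved sample, then the same sample
# with a single extreme value appended.
clean = [48, 50, 51, 49, 52, 50, 47, 53]
messy = clean + [500]

clean_stats = summarize(clean)
messy_stats = summarize(messy)

for name in ("mean", "stdev", "median", "mad"):
    print(f"{name:>6}: clean={clean_stats[name]:.2f}  messy={messy_stats[name]:.2f}")
```

In this toy sample, one extreme value doubles the mean and inflates the standard deviation many times over, while the median stays where it was and the median absolute deviation barely moves. That stability is the general idea behind "going robust" rather than discarding the data.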
