97 Things Every Data Engineer Should Know: Collective Wisdom from the Experts
data
With this in-depth book, data engineers will learn powerful, real-world best practices for managing data, both big and small. Contributors from companies including Google, Microsoft, IBM, Facebook, Databricks, and GitHub share their experiences and lessons learned on cleaning, prepping, wrangling, and storing data.
I contributed six of the 97 essays:
- Develop communities, not just code: On building communities alongside code bases and empowering, rather than patronizing, your data product’s customers
- Give data products a front-end with latent documentation: On low-effort practices for improving data documentation and usability
- There’s no such thing as data quality: On the value of data “fit for purpose”
- The many meanings of missingness: On causes and consequences of null field encoding
- Column names as contracts: On embedding metadata and performance “contracts” in column names
- Data validation needs more than summary statistics: On the importance of context-informed data validation