Cookie cutter data science
I’ve recently discovered https://github.com/drivendata/cookiecutter-data-science/ – an effort to standardise the pipeline of a data science project. The practices it proposes seem ideal to me – from a very sane directory structure to a discussion of Makefiles as project pipeline DAGs.
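To make the Makefile-as-DAG idea concrete, here is a minimal Python sketch of the logic `make` applies to a pipeline. Each target lists its inputs, inputs are brought up to date first, and a step reruns only when its output is missing or older than one of its inputs. This is an illustration, not code from the template. The `build` and `make_rules` helpers, the file names and the upper-casing recipe are hypothetical stand-ins for real steps such as cleaning data or training a model.

```python
import os
import tempfile
from typing import Callable, Dict, List, Optional, Set, Tuple

# A rule maps an output path to (input paths, recipe), like a Makefile's
# "target: prerequisites" line followed by its recipe.
Rule = Tuple[List[str], Callable[[], None]]


def is_stale(target: str, deps: List[str]) -> bool:
    """make's test: rebuild if the target is missing or older than any input."""
    if not os.path.exists(target):
        return True
    target_mtime = os.path.getmtime(target)
    return any(os.path.getmtime(dep) > target_mtime for dep in deps)


def build(target: str, rules: Dict[str, Rule], ran: List[str],
          visiting: Optional[Set[str]] = None) -> None:
    """Depth-first walk of the DAG: bring inputs up to date, then the target.

    Every recipe that actually runs is appended to `ran`, in execution order.
    """
    if visiting is None:
        visiting = set()
    if target not in rules:
        # A leaf such as raw data has no recipe, so it must already exist.
        if not os.path.exists(target):
            raise FileNotFoundError(target)
        return
    if target in visiting:
        raise ValueError(f"dependency cycle at {target!r}")
    visiting.add(target)
    deps, recipe = rules[target]
    for dep in deps:
        build(dep, rules, ran, visiting)
    visiting.remove(target)
    if is_stale(target, deps):
        recipe()
        ran.append(target)


def copy_upper(src: str, dst: str) -> Callable[[], None]:
    """Hypothetical recipe: read src and write an upper-cased copy to dst."""
    def recipe() -> None:
        with open(src) as f:
            text = f.read()
        with open(dst, "w") as f:
            f.write(text.upper())
    return recipe


def make_rules(root: str) -> Dict[str, Rule]:
    """Hypothetical pipeline: raw -> processed -> model -> report."""
    raw = os.path.join(root, "raw.csv")
    processed = os.path.join(root, "processed.csv")
    model = os.path.join(root, "model.txt")
    report = os.path.join(root, "report.txt")
    return {
        processed: ([raw], copy_upper(raw, processed)),
        model: ([processed], copy_upper(processed, model)),
        report: ([model, processed], copy_upper(model, report)),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        with open(os.path.join(root, "raw.csv"), "w") as f:
            f.write("x,y\n1,2\n")
        rules = make_rules(root)
        report = os.path.join(root, "report.txt")

        first: List[str] = []
        build(report, rules, first)
        print([os.path.basename(p) for p in first])
        # -> ['processed.csv', 'model.txt', 'report.txt']

        second: List[str] = []
        build(report, rules, second)
        print(second)  # -> [] because every output is newer than its inputs
```

In an actual project the same graph would be written as Makefile rules and `make` would do this walk itself. The payoff is the same either way: once the steps are declared as a DAG, changing the raw data reruns only the steps that depend on it.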
I’m not sure how I managed to survive using completely custom templates for each project. In fact, I would argue that one should go further and automate the whole report/presentation generation pipeline. There aren’t that many data science patterns anyway (say, regression, classification, causal inference, time-series analysis and arbitrary optimisation), so it shouldn’t be that hard.