Date: 2017-10-12
Version: 1.0.0
Source Repository:
DSF (Data Science Framework) is a Python-based framework for composing and executing data analysis logic.
Let’s get started by installing DSF and creating a project.
The purpose of DSF is to let data scientists focus on logic by supporting the points below:
In DSF, all logic is written as pipelines. A pipeline starts with data-loading tasks, runs through any number of calculation tasks, and ends with data-writing tasks. This structure makes the logic easier for readers to follow.
Calculation components define how calculations behave. All you need to do with DSF is choose calculation components and connect them into a pipeline.
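As a rough illustration of the pipeline idea, the sketch below chains a loading task, two calculation components, and a writing task in plain Python. It is not DSF's actual API, which this overview does not show; every name in it (`load_numbers`, `double`, `total`, `run_pipeline`) is hypothetical.

```python
from typing import Any, Callable, List

# Hypothetical names for illustration only; this is not DSF's API.

def load_numbers() -> List[int]:
    """Data-loading task: the pipeline starts here."""
    return [1, 2, 3, 4]

def double(values: List[int]) -> List[int]:
    """Calculation component: doubles every value."""
    return [v * 2 for v in values]

def total(values: List[int]) -> int:
    """Calculation component: sums the values."""
    return sum(values)

def run_pipeline(
    loader: Callable[[], Any],
    components: List[Callable[[Any], Any]],
    writer: Callable[[Any], None],
) -> None:
    """Load data, pass it through each component in order, then write it."""
    data = loader()
    for component in components:
        data = component(data)
    writer(data)

output: List[Any] = []  # stands in for a real data store
run_pipeline(load_numbers, [double, total], output.append)
```

Reading the call to `run_pipeline` from left to right gives the same load, calculate, write order that the text describes.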
You can define a new calculation component from a large pipeline by packaging it as a calculation component set. Calculation component sets are useful for accumulating and reusing logic.
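One way to picture a calculation component set is as several components bundled behind a single callable, so the bundle can be reused like any other component. The sketch below assumes components are plain functions; `component_set`, `strip_blanks`, and `to_upper` are hypothetical names, not part of DSF.

```python
from typing import Any, Callable, List

Component = Callable[[Any], Any]

def component_set(*components: Component) -> Component:
    """Bundle several components into one reusable component (illustrative)."""
    def run(data: Any) -> Any:
        for component in components:
            data = component(data)
        return data
    return run

def strip_blanks(rows: List[str]) -> List[str]:
    """Drop rows that are empty or contain only whitespace."""
    return [row for row in rows if row.strip()]

def to_upper(rows: List[str]) -> List[str]:
    """Upper-case every row."""
    return [row.upper() for row in rows]

# The set behaves like a single component and can be reused in other pipelines.
clean_text = component_set(strip_blanks, to_upper)
cleaned = clean_text(["a", " ", "b"])
```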
DSF has many types of data warehousing rules. One of them will probably fit where you want to store your data and the format of that data.
DSF has a batch scheduler and executor. This feature automatically schedules and executes your pipelines according to their dependency relationships. The schedule interval can be set in cron style.
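Dependency-based ordering of this kind can be sketched with a topological sort. The example below uses Python's standard `graphlib` module rather than DSF's scheduler, and the pipeline names and the cron expression are made up for illustration; the exact schedule syntax DSF accepts is not shown in this overview.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical pipelines: each key maps to the pipelines it depends on.
dependencies = {
    "load_sales": set(),
    "clean_sales": {"load_sales"},
    "daily_report": {"clean_sales"},
}

# Illustrative cron-style interval: minute 0, hour 2, every day (02:00 daily).
schedule = "0 2 * * *"

# A pipeline is only placed after everything it depends on.
execution_order = list(TopologicalSorter(dependencies).static_order())
```

Here `execution_order` lists `load_sales` first and `daily_report` last, because each pipeline must wait for the one it depends on.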
It is hard to manipulate very big data on a single computer, but DSF can partition the data and execute calculations on each partition. However, this is slow because the current version has no parallel processing feature.
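The general idea of partitioned, sequential processing can be sketched as follows. This is a minimal plain-Python illustration under the assumption that a calculation can be applied one partition at a time; `partition` and `partitioned_sum` are hypothetical helpers, not DSF functions.

```python
from typing import Iterable, Iterator, List

def partition(values: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield successive chunks of at most `size` items."""
    chunk: List[int] = []
    for value in values:
        chunk.append(value)
        if len(chunk) == size:
            yield chunk
            chunk = []
    if chunk:  # emit the final, possibly shorter, chunk
        yield chunk

def partitioned_sum(values: Iterable[int], size: int) -> int:
    """Sum the values one partition at a time, with no parallelism."""
    running_total = 0
    for chunk in partition(values, size):
        running_total += sum(chunk)  # only one chunk is held in memory
    return running_total

result = partitioned_sum(range(1, 11), 3)
```

Only one partition is in memory at any moment, which keeps memory use bounded, but the partitions are processed one after another, which is the trade-off the text mentions.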