Spine
Spine is an open-source library that helps you build production-grade ETL pipelines quickly and conveniently.
Basic terms
Context:
- Context is the entry point into Spine.
- Inherits from the abstract class BaseContext.
- Includes the execution engine, run_id, logger, and more.
- There are multiple implementations of SpineContext, such as SpineSparkContext and SpinePolarsContext.
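
To make the idea concrete, here is a minimal sketch of an abstract context that carries a run_id and a logger and leaves the execution engine to its subclasses. It does not use Spine itself: this BaseContext is a hypothetical stand-in, not the library's actual class, and InMemoryContext and engine_name are names invented for illustration.

```python
import logging
import uuid
from abc import ABC, abstractmethod


class BaseContext(ABC):
    """Hypothetical abstract context; not Spine's actual class definition."""

    def __init__(self, run_id=None):
        # A fresh run_id is generated when the caller does not supply one.
        self.run_id = run_id or uuid.uuid4().hex
        self.logger = logging.getLogger(f"spine-sketch.{self.run_id}")

    @property
    @abstractmethod
    def engine_name(self) -> str:
        """Name of the execution engine this context wraps."""


class InMemoryContext(BaseContext):
    """Illustrative concrete context backed by plain Python objects."""

    @property
    def engine_name(self) -> str:
        return "in-memory"


ctx = InMemoryContext(run_id="demo-run")
print(ctx.run_id, ctx.engine_name)
```

Because engine_name is abstract, BaseContext cannot be instantiated directly. Only engine-specific subclasses can, which mirrors the split between an abstract base and per-engine implementations described above.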
Step:
- A Step is a basic unit that processes data. Each step gets a Context and a Dataframe as input, and returns a modified Dataframe.
- You can find many common step implementations in spinelibs, such as DB connectors.
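
The step contract (context and data in, modified data out) can be sketched in plain Python. Everything here is an assumption made for illustration: the Step base class, its run method, and the DropMissing and AddRunId steps are hypothetical, and a list of dicts stands in for a real Dataframe.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass


@dataclass
class Context:
    """Minimal hypothetical context carrying only a run_id."""
    run_id: str


class Step(ABC):
    """Hypothetical step interface: (context, rows) -> rows."""

    @abstractmethod
    def run(self, context, rows):
        ...


class DropMissing(Step):
    """Drops rows whose value in the given column is None or absent."""

    def __init__(self, column):
        self.column = column

    def run(self, context, rows):
        # Return a new list rather than mutating the input.
        return [r for r in rows if r.get(self.column) is not None]


class AddRunId(Step):
    """Tags every row with the run_id taken from the context."""

    def run(self, context, rows):
        return [{**r, "run_id": context.run_id} for r in rows]


rows = [{"id": 1, "name": "a"}, {"id": 2, "name": None}]
ctx = Context(run_id="demo-run")
for step in (DropMissing("name"), AddRunId()):
    rows = step.run(ctx, rows)
print(rows)
```

Since every step shares one signature, steps can be chained in any order without knowing about each other.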
Workflow:
- Workflow represents the flow of the pipeline.
- Inherits from the abstract class Workflow.
- Contains multiple Steps.
- Currently, there is one implementation of Workflow: DagWorkflow.
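
A DAG-shaped workflow can be sketched as a set of named steps plus their dependencies, resolved into a valid execution order. This is only an illustration built on the standard-library graphlib module (Python 3.9+). The class DagWorkflowSketch and its add_step and ordered_steps methods are hypothetical and do not reflect the actual DagWorkflow interface.

```python
from graphlib import TopologicalSorter


class DagWorkflowSketch:
    """Hypothetical DAG of named steps; not Spine's DagWorkflow."""

    def __init__(self):
        self._steps = {}
        self._deps = {}

    def add_step(self, name, step, depends_on=()):
        # Record the step and the names of the steps it must run after.
        self._steps[name] = step
        self._deps[name] = set(depends_on)
        return self

    def ordered_steps(self):
        # static_order() yields each node only after all of its predecessors.
        order = TopologicalSorter(self._deps).static_order()
        return [(name, self._steps[name]) for name in order]


# Steps are registered out of order on purpose; dependencies fix the order.
workflow = (
    DagWorkflowSketch()
    .add_step("load", lambda ctx, rows: rows, depends_on=["clean"])
    .add_step("extract", lambda ctx, rows: rows + [{"id": 1}])
    .add_step("clean", lambda ctx, rows: rows, depends_on=["extract"])
)
names = [name for name, _ in workflow.ordered_steps()]
print(names)
```

Declaring dependencies instead of a fixed sequence lets the workflow derive the order itself, and graphlib raises a CycleError if the declared dependencies form a cycle.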
Executor:
- Gets a Workflow and a Context as input, and executes the Workflow.
- Currently, there is one implementation of the Executor: AsyncWorkflowExecutor.
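
As a rough illustration of an asynchronous executor, the sketch below takes an ordered list of step functions and a context, and awaits each step in turn. It assumes Python 3.9+ for asyncio.to_thread. AsyncExecutorSketch, its execute method, and the extract and tag steps are hypothetical names, and a plain dict stands in for the context; the real AsyncWorkflowExecutor may work quite differently.

```python
import asyncio


class AsyncExecutorSketch:
    """Hypothetical executor; not Spine's AsyncWorkflowExecutor."""

    async def execute(self, workflow, context):
        rows = []
        for step in workflow:
            # Run each blocking step in a worker thread so the event loop stays free.
            rows = await asyncio.to_thread(step, context, rows)
        return rows


def extract(context, rows):
    # Stand-in for reading from a source system.
    return rows + [{"id": 1}, {"id": 2}]


def tag(context, rows):
    # Attach the run_id from the context to every row.
    return [{**r, "run_id": context["run_id"]} for r in rows]


result = asyncio.run(
    AsyncExecutorSketch().execute([extract, tag], {"run_id": "demo-run"})
)
print(result)
```

Keeping execution separate from the workflow definition means the same workflow could be run by a different executor without changing its steps.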