Introduction
DCNiOS is an open-source command-line tool that makes it easy to create event-driven data processing flows. It reads a workflow defined in a YAML file and creates that workflow in an Apache NiFi cluster. To build these predefined workflows, DCNiOS uses Apache NiFi ProcessGroups transparently.
An Apache NiFi ProcessGroup is a group of Processors that compose a dataflow. DCNiOS uses predefined ProcessGroups that perform simple actions, such as interacting with third-party elements (e.g., consuming from Kafka) or changing the data content (e.g., encoding the data in base64), and combines them into a complete dataflow.
In the DCNiOS documentation, the ProcessGroups are split by purpose into three main groups: 'Sources', 'Destinations', and 'Alterations'.
- 'Sources' interact with third-party elements to receive input data.
- 'Destinations' interact with third-party elements to send output data.
- 'Alterations' do not interact with third-party elements; they change the format of the data in the flow.
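As a rough illustration of how the three categories above fit together, the following minimal Python sketch chains a source, an alteration (base64 encoding, the example mentioned earlier), and a destination. It is a conceptual analogy only: the function names are hypothetical, in-memory lists stand in for third-party elements such as Kafka, and nothing here reflects the DCNiOS or Apache NiFi APIs.

```python
import base64
from typing import Iterable, Iterator, List

# Hypothetical stand-ins for the three ProcessGroup categories.
# None of these names come from DCNiOS or Apache NiFi.


def source(events: Iterable[bytes]) -> Iterator[bytes]:
    """'Source': receives input data (here, from a plain iterable)."""
    for event in events:
        yield event


def encode_base64(events: Iterable[bytes]) -> Iterator[bytes]:
    """'Alteration': changes the data content without any external interaction."""
    for event in events:
        yield base64.b64encode(event)


def destination(events: Iterable[bytes], sink: List[bytes]) -> None:
    """'Destination': sends output data (here, by appending to a list)."""
    for event in events:
        sink.append(event)


delivered: List[bytes] = []
destination(encode_base64(source([b"hello", b"world"])), delivered)
print(delivered)  # [b'aGVsbG8=', b'd29ybGQ=']
```

In a real deployment, each stage would be a predefined ProcessGroup running inside the NiFi cluster rather than a Python function.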
Getting Started
Prerequisites
- An OSCAR cluster containing the user-defined OSCAR Services. You can see some examples on GitHub.
- A deployed Apache NiFi cluster.
- A Python distribution such as Anaconda, or Python version 3.7.6.
- An input source (e.g., dCache, Kafka, AWS S3, AWS SQS).
IM can deploy a Kubernetes cluster that includes an OSCAR cluster and Apache NiFi.
Installation
- Download the repository in a local folder.
git clone git@github.com:interTwin-eu/dcnios.git
cd dcnios
- Set up a virtual environment (optional but recommended):
You can use either Conda or Venv to manage Python dependencies.
Using Conda
Create a new Conda environment and activate it:
conda create --name dcnios python=3.7.6
conda activate dcnios
Using Venv
Create a virtual environment with Venv and activate it:
python -m venv dcniosenv
source dcniosenv/bin/activate
- Install Dependencies
Install all the required dependencies listed in requirements.txt:
pip install -r requirements.txt
Alternatively, you can install the minimum required dependencies for DCNiOS:
pip install pyyaml==6.0 requests==2.28.2 oscar_python==1.0.3
- Verify Installation
Once you've installed the dependencies, you can verify the installation with a short Python command. The following command checks that the packages can be imported:
python -c "import yaml, requests, oscar_python; print('All dependencies are correctly installed.')"
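If the one-line check fails, it only reports the first import that could not be resolved. The sketch below is an optional alternative, not part of DCNiOS, that lists every missing module at once. The helper names `find_missing` and `report` are hypothetical, and the module names are taken from the command above. It uses only the standard library, so it runs even when none of the dependencies are installed.

```python
import importlib.util
from typing import Iterable, List

# Import names taken from the verification command above.
REQUIRED_MODULES = ("yaml", "requests", "oscar_python")


def find_missing(modules: Iterable[str]) -> List[str]:
    """Return the top-level modules that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def report(modules: Iterable[str]) -> str:
    """Build a human-readable summary of the dependency check."""
    missing = find_missing(modules)
    if missing:
        return "Missing modules: " + ", ".join(missing)
    return "All dependencies are correctly installed."


if __name__ == "__main__":
    print(report(REQUIRED_MODULES))
```

Note that `importlib.util.find_spec` only locates a module without importing it, so a package that is present but broken would still pass this check; the original `python -c` command remains the stricter test.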