April 23, 2022 05:52 am GMT

A new DAG execution tool built with Go

Introduction

I work as a software developer for a Japanese company, developing and maintaining large ETL (Extract, Transform, Load) pipelines that have been in operation for many years.

I am developing a new DAG execution tool in Go. Please visit the repository and leave it a star!

I named it jobctl, after taskctl, from which the initial code was forked.

What is jobctl?

jobctl is a tool that generates and executes a DAG (directed acyclic graph) from simple YAML definitions. It also comes with a simple and convenient web UI.

How it looks

  • JOBs: Overview of all jobs in your environment.

[Screenshot: JOBs]

  • Detail: Current status of the job.

[Screenshot: Detail]

  • Timeline: Timeline of each step in the pipeline.

[Screenshot: Timeline]

  • History: Execution history of the pipeline.

[Screenshot: History]

Why do we need a new DAG management tool?

Currently, my environment has many problems. Hundreds of complex cron jobs are registered on huge servers and it is impossible to keep track of the dependencies between them. If one job fails, I don't know which job to re-run. I also have to SSH into the server to see the logs and manually run the shell scripts one by one.

So I needed a tool that can explicitly visualize and manage the dependencies of the pipeline.

How nice it would be to be able to visually see the job dependencies, execution status, and logs of each job in a web browser, and to be able to rerun or stop a series of jobs with just a mouse click!

I considered many potential tools such as Airflow, Rundeck, Luigi, DigDag, JobScheduler, etc.

Unfortunately, they were not suitable for my existing environment: they required installing a DBMS (Database Management System), had relatively steep learning curves, and added operational overhead. We only have a small group of engineers in our office and use a less common DBMS.

Finally, I decided to build my own tool, one that is easy to use and does not require a DBMS server, a daemon process, or any additional operational burden. That is jobctl.

Why Go?

I chose Go as the development language for jobctl because Go is my favorite language for many reasons.

It offers easy concurrent programming with goroutines, static typing, speed, high compatibility, simple syntax, and a convenient standard library.

I am taking advantage of these Go benefits to develop jobctl.

Architecture

jobctl's architecture is very simple. It uses local JSON files for data storage and a UNIX socket to communicate with running processes.
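To make the UNIX socket idea concrete, here is a minimal sketch of a running process that answers status queries over a socket. The one-line `status` text protocol, the function names `startServer` and `queryStatus`, and the socket file name are all assumptions made for illustration; this is not jobctl's actual protocol or code.

```go
package main

import (
	"bufio"
	"fmt"
	"net"
	"os"
	"path/filepath"
	"strings"
)

// startServer listens on a UNIX socket at sockPath and answers every
// "status" request with the value returned by getState.
// The text protocol is hypothetical, not jobctl's real one.
func startServer(sockPath string, getState func() string) (func(), error) {
	ln, err := net.Listen("unix", sockPath)
	if err != nil {
		return nil, err
	}
	go func() {
		for {
			conn, err := ln.Accept()
			if err != nil {
				return // listener was closed
			}
			go handle(conn, getState)
		}
	}()
	return func() { ln.Close() }, nil
}

// handle reads one request line and writes one reply line.
func handle(conn net.Conn, getState func() string) {
	defer conn.Close()
	req, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return
	}
	if strings.TrimSpace(req) == "status" {
		fmt.Fprintln(conn, getState())
		return
	}
	fmt.Fprintln(conn, "unknown request")
}

// queryStatus connects to the socket, sends "status", and returns the reply.
func queryStatus(sockPath string) (string, error) {
	conn, err := net.Dial("unix", sockPath)
	if err != nil {
		return "", err
	}
	defer conn.Close()
	if _, err := fmt.Fprintln(conn, "status"); err != nil {
		return "", err
	}
	reply, err := bufio.NewReader(conn).ReadString('\n')
	if err != nil {
		return "", err
	}
	return strings.TrimSpace(reply), nil
}

func main() {
	dir, err := os.MkdirTemp("", "sock-example")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.RemoveAll(dir)

	sockPath := filepath.Join(dir, "job.sock")
	stop, err := startServer(sockPath, func() string { return "running" })
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer stop()

	reply, err := queryStatus(sockPath)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(reply) // prints: running
}
```

Because the socket is just a file on the local machine, no network port or daemon is needed: a client such as a web UI or CLI only has to know the socket path of the running job.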

[Figure: jobctl architecture]

jobctl stores the execution status of jobs in local JSON files. Therefore, it does not require any DBMS such as Postgres or MySQL. It does not have a scheduler function and is intended to be used with cron. All that is required is a single jobctl binary file. This makes it very easy to start using.
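As an illustration of file-based status storage, the following sketch writes a run's status to a local JSON file and reads it back. The `Status` struct, its field names, and the file name are hypothetical; jobctl's real on-disk schema may look quite different.

```go
package main

import (
	"encoding/json"
	"fmt"
	"os"
	"path/filepath"
)

// Status is a hypothetical record of one job run.
type Status struct {
	Job   string            `json:"job"`
	State string            `json:"state"`
	Steps map[string]string `json:"steps"`
}

// saveStatus writes the status to a temporary file and renames it into
// place, so a reader never sees a half-written file.
func saveStatus(path string, st Status) error {
	data, err := json.MarshalIndent(st, "", "  ")
	if err != nil {
		return err
	}
	tmp := path + ".tmp"
	if err := os.WriteFile(tmp, data, 0o644); err != nil {
		return err
	}
	return os.Rename(tmp, path)
}

// loadStatus reads a status file written by saveStatus.
func loadStatus(path string) (Status, error) {
	var st Status
	data, err := os.ReadFile(path)
	if err != nil {
		return st, err
	}
	err = json.Unmarshal(data, &st)
	return st, err
}

func main() {
	dir, err := os.MkdirTemp("", "status-example")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	defer os.RemoveAll(dir)

	path := filepath.Join(dir, "sample_job.json")
	st := Status{
		Job:   "A sample job",
		State: "running",
		Steps: map[string]string{"1": "finished", "2": "running"},
	}
	if err := saveStatus(path, st); err != nil {
		fmt.Println("error:", err)
		return
	}

	loaded, err := loadStatus(path)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(loaded.State) // prints: running
}
```

Plain files like this can be inspected with any text editor and backed up with ordinary tools, which fits the goal of avoiding a DBMS.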

Job Definition using YAML

Job definitions can be expressed in simple YAML. The simplest example is as follows:

    name: A sample job
    steps:
      - name: "1"
        command: echo hello world
      - name: "2"
        command: sleep 10
        depends:
          - "1"
      - name: "3"
        command: echo done!
        depends:
          - "2"

[Figure: sample_job_1_dag]

When this job is executed, it will look something like this.

[Figure: sample_job_1]
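The `depends` fields in the definition above are what turn a flat list of steps into a DAG. As a rough sketch of how an execution order could be derived from them, the example below applies Kahn's topological sort. The `Step` type and `executionOrder` function are illustrative names, and this is not necessarily how jobctl schedules steps internally.

```go
package main

import (
	"fmt"
	"sort"
)

// Step mirrors one entry under "steps" in the YAML definition.
// The type and field names are illustrative only.
type Step struct {
	Name    string
	Depends []string
}

// executionOrder returns step names so that every step appears after
// all of the steps it depends on (Kahn's algorithm). It returns an
// error for unknown dependencies and for dependency cycles.
func executionOrder(steps []Step) ([]string, error) {
	indegree := make(map[string]int, len(steps))
	dependents := make(map[string][]string, len(steps))
	for _, s := range steps {
		indegree[s.Name] = 0
	}
	for _, s := range steps {
		for _, d := range s.Depends {
			if _, ok := indegree[d]; !ok {
				return nil, fmt.Errorf("step %q depends on unknown step %q", s.Name, d)
			}
			indegree[s.Name]++
			dependents[d] = append(dependents[d], s.Name)
		}
	}

	// Start with the steps that have no dependencies.
	var ready []string
	for name, n := range indegree {
		if n == 0 {
			ready = append(ready, name)
		}
	}
	sort.Strings(ready) // make the starting order deterministic

	var order []string
	for len(ready) > 0 {
		name := ready[0]
		ready = ready[1:]
		order = append(order, name)
		for _, next := range dependents[name] {
			indegree[next]--
			if indegree[next] == 0 {
				ready = append(ready, next)
			}
		}
	}
	if len(order) != len(steps) {
		return nil, fmt.Errorf("dependency cycle detected")
	}
	return order, nil
}

func main() {
	steps := []Step{
		{Name: "1"},
		{Name: "2", Depends: []string{"1"}},
		{Name: "3", Depends: []string{"2"}},
	}
	order, err := executionOrder(steps)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(order) // prints: [1 2 3]
}
```

The cycle check matters because a definition in which two steps depend on each other is not acyclic and can never finish.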

Parameterization, Environment variables, Preconditions

Environment variables, parameters, preconditions, and event handlers can be defined as needed.

    name: A sample job
    env:
      LOG_DIR: ${HOME}/logs
    logDir: ${LOG_DIR}
    params: foo bar
    steps:
      - name: "check precondition"
        command: echo start
        preconditions:
          - condition: "`echo $1`"
            expected: foo
      - name: "print foo"
        command: echo $1
        depends:
          - "check precondition"
      - name: "print bar"
        command: echo $2
        depends:
          - "print foo"
      - name: "failure and continue"
        command: "false"
        continueOn:
          failure: true
        depends:
          - "print bar"
      - name: "print done"
        command: echo done!
        depends:
          - "failure and continue"
    handlerOn:
      exit:
        command: echo finished!
      success:
        command: echo success!
      failure:
        command: echo failed!
      cancel:
        command: echo canceled!

[Figure: sample_job_2_dag]

When this job is executed, it will look something like this.

[Figure: sample_job_2]
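In the definition above, the step "failure and continue" always fails, yet "print done" still runs because of `continueOn.failure`. The sketch below shows one way such a rule could be expressed. It assumes that a step may start only when each dependency has either succeeded or failed with `continueOn.failure` set; the `Step`, `Outcome`, and `canRun` names are invented for this example and are not taken from jobctl's source.

```go
package main

import "fmt"

// Outcome is the result of a finished step.
type Outcome int

const (
	Succeeded Outcome = iota
	Failed
	Skipped
)

// Step holds only the fields needed for this sketch.
type Step struct {
	Name              string
	Depends           []string
	ContinueOnFailure bool // corresponds to continueOn.failure in the YAML
}

// canRun reports whether step may start, given the outcomes of the
// steps that have finished so far. Assumed rule: every dependency must
// have succeeded, or have failed while marked continueOn.failure.
func canRun(step Step, steps map[string]Step, done map[string]Outcome) bool {
	for _, dep := range step.Depends {
		outcome, finished := done[dep]
		if !finished {
			return false
		}
		switch outcome {
		case Succeeded:
			continue
		case Failed:
			if !steps[dep].ContinueOnFailure {
				return false
			}
		default:
			return false
		}
	}
	return true
}

func main() {
	steps := map[string]Step{
		"print bar": {Name: "print bar"},
		"failure and continue": {
			Name:              "failure and continue",
			Depends:           []string{"print bar"},
			ContinueOnFailure: true,
		},
		"print done": {
			Name:    "print done",
			Depends: []string{"failure and continue"},
		},
	}
	done := map[string]Outcome{
		"print bar":            Succeeded,
		"failure and continue": Failed,
	}
	fmt.Println(canRun(steps["print done"], steps, done)) // prints: true
}
```

Without the `ContinueOnFailure` flag on the failing step, the same call would return false and the downstream step would not start.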

About Development

In our environment, we are in the process of using jobctl to improve our existing jobs. I will continue to add and improve features as we use it. I would be very happy if you could participate in the development.

Please look at the repository and leave it a star. Thank you!

Please join in development!

Compared to other similar tools, jobctl is very simple, but it still needs a lot of work on documentation, new features, and improvements. I would therefore like to find contributors or collaborators to develop jobctl, and for this reason I have moved jobctl to an organization repository. Issues, PRs, and advice of any kind are very welcome!


Original Link: https://dev.to/yohamta/a-new-dag-execution-tool-built-with-go-2ml5
