Create a Dynamic Workflow in Apache Airflow

Problem

For a long time I searched for a way to properly create a workflow in which the tasks depend on a dynamic value: a list of tables read from a text file.

Context explained through a graphical example

A schematic overview of the DAG’s structure.

                |---> Task B.1 --|
                |---> Task B.2 --|
 Task A --------|---> Task B.3 --|--------> Task C
                |       ....     |
                |---> Task B.N --|

The problem is to import tables from an IBM DB2 database into HDFS/Hive using Sqoop (a powerful tool designed for efficiently transferring bulk data from a relational database to HDFS), automatically, through Airflow (an open-source tool for orchestrating complex computational workflows and data processing pipelines).

The inserted data is aggregated daily using Spark jobs, but I’ll only talk about the import part, where I schedule the Sqoop jobs to dynamically import data into HDFS.
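
Below is a minimal sketch of such a dynamic DAG (the table-list path, JDBC connection string, and target directories are hypothetical). It reads the table list when the DAG file is parsed and fans one Sqoop import task per table out between a start and an end task, mirroring the diagram above:

from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator
from airflow.operators.dummy_operator import DummyOperator

# Read the table list at DAG-parse time (hypothetical path).
with open('/path/to/tables.txt') as f:
    tables = [line.strip() for line in f if line.strip()]

dag = DAG(
    dag_id='dynamic_sqoop_import',
    start_date=datetime(2018, 1, 1),
    schedule_interval='@daily',
)

start = DummyOperator(task_id='start', dag=dag)  # Task A
end = DummyOperator(task_id='end', dag=dag)      # Task C

for table in tables:
    # One Sqoop import per table: the Task B.1 .. B.N fan-out.
    sqoop = BashOperator(
        task_id='sqoop_import_{}'.format(table),
        bash_command=(
            'sqoop import --connect jdbc:db2://db2-host:50000/MYDB '  # hypothetical DB2 URL
            '--table {t} --target-dir /data/raw/{t}'
        ).format(t=table),
        dag=dag,
    )
    start >> sqoop >> end

Since the file is read every time the scheduler parses the DAG, adding a table to the text file is enough to grow the fan-out on the next run.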

How to hide credentials in logstash configuration files?


Logstash 6.2 lets you protect credentials with the keystore.

Let’s see how to use logstash-keystore.

In the following example, we will hide the ‘changeme’ password used in the elasticsearch output of your Logstash pipeline config file.

To create a logstash.keystore file, open a terminal window and type the following commands:

./bin/logstash-keystore create
./bin/logstash-keystore add es_password

ℹ️ The default directory is the same directory as the logstash.yml settings file.

./bin/logstash-keystore list should then show es_password in its answer.

📌 The option --path.settings points logstash-keystore at Logstash’s settings directory, where the keystore lives (e.g. bin/logstash-keystore --path.settings /etc/logstash create). The keystore must be located in Logstash’s path.settings directory.

📌 When you run Logstash from an RPM or DEB package installation, environment variables are sourced from /etc/sysconfig/logstash. You might need to create /etc/sysconfig/logstash yourself; keep in mind that this file should be owned by root with 600 permissions.
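
For example, a minimal /etc/sysconfig/logstash could carry the keystore password (the value here is a placeholder; the LOGSTASH_KEYSTORE_PASS variable is described below):

# /etc/sysconfig/logstash (owned by root, chmod 600)
LOGSTASH_KEYSTORE_PASS="my_keystore_password"

$ chown root:root /etc/sysconfig/logstash && chmod 600 /etc/sysconfig/logstash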

# use es_password in the pipeline:
output {
	elasticsearch {
		hosts => …
		user => "elastic"
		password => "${es_password}"
	}
}

ℹ️ You can set the environment variable LOGSTASH_KEYSTORE_PASS to act as the keystore password.
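
For an interactive session, a quick way to do this is to export the variable before starting Logstash (the pipeline file name is just an example):

$ export LOGSTASH_KEYSTORE_PASS=my_keystore_password
$ ./bin/logstash -f pipeline.conf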

Documentation

➡️ Official guide – logstash-keystore

To get help with the CLI, simply use: $ ./bin/logstash-keystore help

Kibana – Set up log rotation


:information_source: In this document we will see how to properly set up log rotation for Kibana. The example is based on CentOS 7 but can easily be adapted for Ubuntu or any other Linux distribution.

Documentation

  • Kibana doesn’t handle log rotation, but it is built to work with an external process that rotates logs, such as logrotate.
  • The logrotate utility is designed to simplify the administration of log files on a system that generates many log files. Logrotate allows for the automatic rotation, compression, removal, and mailing of log files. Logrotate can be set to handle a log file daily, weekly, monthly, or when the log file reaches a certain size.
  • Logrotate can be installed with the logrotate package. It is installed by default and runs daily.
  • The primary configuration file for logrotate is /etc/logrotate.conf; additional configuration files are included from the /etc/logrotate.d directory.

Prerequisites

kibana

  • Have the following elements (pid.file AND logging.dest) set up in the kibana.yml configuration file:
    server.port: 5601
    server.host: "${KIBANA_SRV}"
    elasticsearch.url: "http://${ES_SRV}:9200"
    kibana.index: ".kibana"
    pid.file: /var/run/kibana/kibana.pid
    logging.dest: /var/log/kibana/kibana.log
    
  • :warning: Verify that Kibana is authorised to create the /var/log/kibana/kibana.log file:
    $ mkdir -p /var/log/kibana/ && chown -R kibana:kibana /var/log/kibana/
    $ mkdir -p /var/run/kibana/ && chown -R kibana:kibana /var/run/kibana/
    

logrotate

Verify that logrotate is properly installed:

$ logrotate --version
    logrotate 3.8.6

Verify that the logrotate configuration includes the logrotate.d directory:

$ cat /etc/logrotate.conf
  . . .
  include /etc/logrotate.d

Configuration

logrotate file

We will create a new logrotate configuration for Kibana:

$ cat << 'EOF' > /etc/logrotate.d/elk-kibana
/var/log/kibana/*.log {
  missingok
  daily
  size 10M
  create 0644 kibana kibana
  rotate 7
  notifempty
  sharedscripts
  compress
  postrotate
    /bin/kill -HUP $(cat /var/run/kibana/kibana.pid 2>/dev/null) 2>/dev/null
  endscript
}
EOF

Verify your file syntax with the following command

logrotate -vd /etc/logrotate.d/elk-kibana

If you didn’t get any errors, you can manually start the first rotation with:

logrotate -vf /etc/logrotate.d/elk-kibana
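
If the rotation succeeded, a compressed copy should now sit next to the live log (exact file names depend on your logrotate settings):

$ ls -l /var/log/kibana/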

Crontab file

If the line include /etc/logrotate.d is present in /etc/logrotate.conf, and /etc/cron.daily/logrotate references logrotate.conf, you don’t need any further setup.

grep "logrotate.conf" /etc/cron.daily/logrotate
    /usr/sbin/logrotate -s /var/lib/logrotate/logrotate.status /etc/logrotate.conf

 

Happy log rotation!
G.

Why CNCF landscape matters

I grant you, “Cloud Native” has something of the buzzword about it, but there is still a reality behind it all. A Cloud Native application leverages and takes advantage of Cloud features. And today, a Cloud Native application is likely to be split into microservices, those microservices run in containers, and those containers are orchestrated by Kubernetes.

But anyone who has looked at these technologies in recent years is well aware of how fast they evolve, which makes keeping a technology watch even more relevant, but also more complicated, much more complicated. Indeed: where do you find these new projects, how do you follow them, how do you evaluate their degree of maturity, and is it time to adopt them to solve our production problems?


Ansible – how to collect information about remote hosts with gathered facts


To do so, we will look at the Ansible setup module, which gathers facts about remote hosts.

:information_source: Official webpage: https://docs.ansible.com/ansible/setup_module.html

Display facts from all hosts and store them in /tmp/facts, indexed by hostname:

ansible all -m setup --tree /tmp/facts

➡ Now check the files to get a clear view of all the variables (facts) collected by Ansible for your hosts, like the well-known {{ inventory_hostname }}.
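
If you only want a subset of the facts, the setup module also accepts a filter parameter; the wildcard pattern below is just an example:

ansible all -m setup -a 'filter=ansible_distribution*'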


Usefulness of an MQ layer in an ELK stack


First things first, disambiguation; let’s talk about these strange words.

What does ELK mean?

An ELK stack is composed of Elasticsearch, Logstash, and Kibana; these three components are owned by the elastic.co company and are particularly useful for handling data (shipping, enriching, indexing, and searching your data).

What does MQ mean?

In computer science, the message queue paradigm is a sibling of the publisher/subscriber pattern.
You can picture an MQ component as a set of mailboxes: publishers and subscribers do not interact with a message at the same time, and many publishers can post messages for one subscriber, and vice versa.

➡ Redis, Kafka, or RabbitMQ can be used as a buffer in the ELK stack.
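
As a minimal sketch (the hostname and Redis key are hypothetical), a shipper pipeline pushes events into the Redis buffer and an indexer pipeline pulls them back out before indexing into Elasticsearch:

# shipper pipeline: push events into the Redis buffer
output {
	redis {
		host => "redis.example.com"
		data_type => "list"
		key => "logstash"
	}
}

# indexer pipeline: read events back from the buffer
input {
	redis {
		host => "redis.example.com"
		data_type => "list"
		key => "logstash"
	}
}

If Elasticsearch slows down or becomes unavailable, events simply accumulate in the Redis list instead of being lost.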


Package manager tools for Debian like

Introduction

The needs are simple: create a local repository and mirror the official ones, for Debian-like systems.

Red Hat-like systems have multiple options; Debian-like ones, well, not so much.

In this case we need to manage Ubuntu servers; Canonical has Landscape, but we want to keep it simple.

The possibilities

For Debian-like systems, there are two possibilities:

  • aptly, which seems to integrate with Ansible (see the sketch below)
  • pulp_deb, which is a community project to add to an installation of Katello/Pulp
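
As a rough sketch of the aptly workflow (the mirror name, distribution, and URL are illustrative, and publishing requires a GPG key):

# mirror an official repository and fetch its packages
aptly mirror create xenial-main http://archive.ubuntu.com/ubuntu/ xenial main
aptly mirror update xenial-main

# freeze the mirror state into a snapshot and publish it
aptly snapshot create xenial-main-snap from mirror xenial-main
aptly publish snapshot xenial-main-snap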

Introducing the Rancher Partner Network

on Oct 18, 2016

This morning, we’re excited to announce the launch of the Rancher Partner Network – a group of leading organizations focused on building top-notch cloud and container solutions for their customers. These are vendors with whom we collaborate, and whom we trust and endorse to help enterprises bring containers into their development workflows and production environments. The Rancher Partner Network includes consulting partners, systems integrators, resellers, and service providers from the Americas, Europe, Asia, and Australia.


FinTechMeets: Unlocking the Power of Big Data


How harnessing big data & machine learning will change the face of business

Yesterday, on September 22, 2016, lux future lab held its second FinTechMeets at BGL BNP Paribas in Kirchberg, focusing on what has become a hot topic: big data and how to harness it.

Roughly 350 IT and financial professionals attended the lunchtime gathering, which centered on a global overview of big data, machine learning and their practical applications.

Jonathan Basse, Founder of Data Essential, and Jed Grant, CEO & Founder of KYC3, shared their insights based on years of experience developing solutions that access this wealth of information.

