Airflow, a powerful scheduling tool for ETL: installing Airflow on CentOS 7

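Previously we installed Redis as the message queue. This time we install Airflow itself.

This guide installs Airflow into a Python virtual environment. The steps are the same without one; I use a virtual environment only because I do not want to disturb the system environment.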



Install the Python virtual environment tool

pip install virtualenv

Set the environment variables

sudo vi /etc/profile

Append the following to the end of the file

export PYTHON_HOME=/usr/local/python3

export PATH=$PATH:$PYTHON_HOME/bin

source /etc/profile

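Create a directory to hold the virtual environment

mkdir -p /softwares/pyenv_for_airflow

cd /softwares/pyenv_for_airflow/

Create the Python virtual environment (virtualenv 20 and later removed the --no-site-packages flag because isolation is already the default, so omit it there)

virtualenv --no-site-packages airflow_env

Grant execute permission

chmod +x -R *

Activate the virtual environment

cd airflow_env/bin

source ./activate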
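To confirm that the activation worked, a quick check from inside Python can help. This is only a minimal sketch; the function name in_virtualenv is my own and is not part of virtualenv or Airflow.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a virtual environment."""
    # Legacy virtualenv (before 20) sets sys.real_prefix; venv and newer
    # virtualenv make sys.base_prefix differ from sys.prefix instead.
    if hasattr(sys, "real_prefix"):
        return True
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


if __name__ == "__main__":
    print("virtualenv active:", in_virtualenv())
    print("interpreter prefix:", sys.prefix)
```

If the prefix printed is not under /softwares/pyenv_for_airflow/airflow_env, the later pip install commands are probably going into the system Python instead.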


Install the dependencies

yum -y install gcc zlib-devel bzip2-devel openssl-devel ncurses-devel sqlite-devel readline-devel tk-devel gdbm-devel db4-devel libpcap-devel xz-devel

yum -y install python-devel mysql-devel

yum -y install python3-devel

yum -y install cyrus-sasl cyrus-sasl-devel cyrus-sasl-lib

pip install paramiko

pip install pymysql

pip install sqlalchemy

vi /etc/profile

export AIRFLOW_HOME=/softwares/airflow

export SLUGIFY_USES_TEXT_UNIDECODE=yes

# Take effect immediately

source /etc/profile
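Airflow reads AIRFLOW_HOME to decide where airflow.cfg and its other files live, and falls back to ~/airflow when the variable is not set. The sketch below shows that lookup in plain Python so you can verify which directory your shell session will use. The helper name airflow_home is hypothetical, not an Airflow API.

```python
import os
from pathlib import Path


def airflow_home(environ=None) -> Path:
    """Return the directory Airflow would use for its config and metadata."""
    env = os.environ if environ is None else environ
    # Airflow falls back to ~/airflow when AIRFLOW_HOME is not set.
    raw = env.get("AIRFLOW_HOME", "~/airflow")
    return Path(raw).expanduser()


if __name__ == "__main__":
    home = airflow_home()
    print("AIRFLOW_HOME resolves to:", home)
    print("airflow.cfg expected at:", home / "airflow.cfg")
```

With the export above in effect, this should print /softwares/airflow. If it prints a path under your home directory, the profile has not been sourced in the current shell.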

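Install Airflow with all extras

pip install 'apache-airflow[all]'

# I chose the full install because when I tried installing only some of the extras, certain features had bugs.

Initialize the database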

cd /softwares/pyenv_for_airflow/airflow_env/lib/python3.7/site-packages/airflow/bin

./airflow initdb

Look at the files it generated

cd /softwares/airflow/
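A small script can do the same inspection and report what is missing. This is a sketch under an assumption: the EXPECTED names are what an Airflow 1.10 initdb run with default settings usually leaves behind, and your version may differ. The function name missing_entries is hypothetical.

```python
import tempfile
from pathlib import Path

# Assumption: entries a default Airflow 1.10 `initdb` run usually creates.
EXPECTED = ("airflow.cfg", "airflow.db", "unittests.cfg", "logs")


def missing_entries(home, expected=EXPECTED):
    """Return the expected names that do not exist under `home`."""
    root = Path(home)
    return [name for name in expected if not (root / name).exists()]


if __name__ == "__main__":
    # Demo against a throwaway directory that only contains airflow.cfg.
    with tempfile.TemporaryDirectory() as demo_home:
        (Path(demo_home) / "airflow.cfg").touch()
        print("missing:", missing_entries(demo_home))
```

Against the real installation you would call missing_entries("/softwares/airflow") instead of the temporary directory.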

Create the MySQL backend database

<code>-- Option 1: utf8 character set
create database airflow_db default charset utf8 collate utf8_general_ci;
create user 'airflow'@'%' identified by 'airflow_db';
create user 'airflow'@'localhost' identified by 'airflow_db';
grant all on airflow_db.* to 'airflow'@'%';
-- 'airflow'@'localhost' is a separate account in MySQL and needs its own grant
grant all on airflow_db.* to 'airflow'@'localhost';
flush privileges;

-- Option 2: utf8mb4 character set (run this instead of Option 1, not in addition to it)
create database airflow_db default charset utf8mb4 collate utf8mb4_unicode_ci;
create user 'airflow'@'%' identified by 'airflow_db';
create user 'airflow'@'localhost' identified by 'airflow_db';
grant all on airflow_db.* to 'airflow'@'%';
grant all on airflow_db.* to 'airflow'@'localhost';
flush privileges;/<code>

·Configure Airflow to use the LocalExecutor and the MySQL database

vi /softwares/airflow/airflow.cfg

executor = LocalExecutor

sql_alchemy_conn = mysql://root:[email protected]:3306/airflow_db
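The connection string follows the SQLAlchemy URL form driver://user:password@host:port/database. Special characters in the password (such as @ or /) have to be percent-encoded or the URL is parsed incorrectly. Below is a minimal sketch for building the value; the function name mysql_conn_url and the example credentials and host are made up for illustration.

```python
from urllib.parse import quote_plus


def mysql_conn_url(user, password, host, database, port=3306, driver=None):
    """Build a SQLAlchemy-style MySQL URL with the credentials percent-encoded."""
    # driver=None keeps the plain "mysql" scheme used in this article;
    # driver="pymysql" yields "mysql+pymysql", which selects the PyMySQL driver.
    scheme = "mysql" if driver is None else f"mysql+{driver}"
    return f"{scheme}://{quote_plus(user)}:{quote_plus(password)}@{host}:{port}/{database}"


if __name__ == "__main__":
    # Hypothetical values, not the ones from this installation.
    print(mysql_conn_url("airflow", "p@ss/word", "db.example.internal", "airflow_db"))
    # mysql://airflow:p%40ss%2Fword@db.example.internal:3306/airflow_db
```

If the encoded password contains a percent sign, it may be safer to pass the value through the AIRFLOW__CORE__SQL_ALCHEMY_CONN environment variable than to paste it into airflow.cfg, since I have not checked how every Airflow version treats % inside the config file.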

[webserver]

base_url = http://airflow.mn01:8085

web_server_port = 8085

Adjust the time zone

default_timezone = Asia/Shanghai
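After editing, it is easy to leave a setting in the wrong section or with a typo. The sketch below reads the values back with the standard configparser module. SAMPLE_CFG is an inline stand-in for the real file and read_settings is a hypothetical helper; Airflow itself uses its own config loader, so treat this only as a sanity check.

```python
import configparser

# Inline stand-in for /softwares/airflow/airflow.cfg, reduced to the edited keys.
SAMPLE_CFG = """
[core]
executor = LocalExecutor
default_timezone = Asia/Shanghai

[webserver]
web_server_port = 8085
"""


def read_settings(cfg_text):
    """Return the settings changed in this article as a plain dict."""
    # interpolation=None so a literal % in a value does not raise an error.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    return {
        "executor": parser.get("core", "executor"),
        "default_timezone": parser.get("core", "default_timezone"),
        "web_server_port": parser.getint("webserver", "web_server_port"),
    }


if __name__ == "__main__":
    for key, value in read_settings(SAMPLE_CFG).items():
        print(f"{key} = {value}")
```

To check the real file, read its text with open(...).read() and pass that to read_settings in place of SAMPLE_CFG.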

Three more files also need to be modified

#1. Change the time displayed in the top-right corner of the webserver page:

vi ${PYTHON_HOME}/lib/python3.7/site-packages/airflow/www/templates/admin/master.html

<code>var UTCseconds = (x.getTime() + x.getTimezoneOffset()*60*1000);
$("#clock").clock({
"dateFormat":"Y-m-d ",
"timeFormat":"H:i:s %UTC%",
"timestamp":UTCseconds
}).click(function(){
alert('{{ hostname }}');
});

Change it to:
var UTCseconds = x.getTime();
$("#clock").clock({
"dateFormat":"Y-m-d ",
"timeFormat":"H:i:s",
"timestamp":UTCseconds
}).click(function(){
alert(
/<code>

#2. Modify airflow/utils/timezone.py

<code># Add the following below the line utc = pendulum.timezone('UTC') (line 27)

from airflow import configuration as conf
try:
    tz = conf.get("core", "default_timezone")
    if tz == "system":
        utc = pendulum.local_timezone()
    else:
        utc = pendulum.timezone(tz)
except Exception:
    pass

# Modify the utcnow() function (at line 69)
# d = dt.datetime.utcnow()
d = dt.datetime.now()/<code>
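The swap from utcnow() to now() matters because now() returns the server's local wall-clock time, so the result depends on the system time zone of the machine. The sketch below shows the 8-hour shift explicitly. It assumes Asia/Shanghai can be modelled as a fixed UTC+8 offset (no daylight saving is currently observed there); to_shanghai is a hypothetical helper, not Airflow code.

```python
from datetime import datetime, timedelta, timezone

# Assumption: Asia/Shanghai modelled as a fixed UTC+8 offset.
SHANGHAI = timezone(timedelta(hours=8), name="Asia/Shanghai")


def to_shanghai(utc_naive):
    """Interpret a naive datetime as UTC and convert it to UTC+8."""
    return utc_naive.replace(tzinfo=timezone.utc).astimezone(SHANGHAI)


if __name__ == "__main__":
    stamp = datetime(2020, 1, 1, 0, 0, 0)
    print("UTC      :", stamp.isoformat())
    print("Shanghai :", to_shanghai(stamp).isoformat())  # 2020-01-01T08:00:00+08:00
```

If the server itself is not set to Asia/Shanghai, the patched now() call and the default_timezone setting will disagree, so it is worth checking the system time zone as well.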


#3. Modify airflow/utils/sqlalchemy.py

<code># Add the following below the line utc = pendulum.timezone('UTC') (line 37)

from airflow import configuration as conf
try:
    tz = conf.get("core", "default_timezone")
    if tz == "system":
        utc = pendulum.local_timezone()
    else:
        utc = pendulum.timezone(tz)
except Exception:
    pass /<code>
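Both patches use the same pattern: read the configured zone name, use the local zone when it is "system", and silently keep UTC if anything goes wrong. Here is a standalone sketch of that pattern for experimenting outside Airflow. It swaps pendulum for the standard-library zoneinfo module (Python 3.9 or later), and resolve_timezone is a name I made up.

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo


def resolve_timezone(name):
    """Mirror the patch logic: 'system' -> local zone, a named zone, else UTC."""
    try:
        if name == "system":
            # The local zone as reported by the operating system.
            return datetime.now().astimezone().tzinfo
        return ZoneInfo(name)
    except (KeyError, ValueError, OSError):
        # Unknown or malformed zone name: keep the UTC default, as the patch does.
        return timezone.utc


if __name__ == "__main__":
    for candidate in ("Asia/Shanghai", "system", "Not/AZone"):
        print(candidate, "->", resolve_timezone(candidate))
```

The silent fallback is convenient but hides typos: a misspelled default_timezone leaves everything on UTC without any error, which is worth remembering if the times in the UI still look wrong after patching.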

Re-initialize the database

./airflow initdb

Start the service

cd /softwares/pyenv_for_airflow/airflow_env/lib/python3.7/site-packages/airflow/bin

./airflow webserver -D

Possible errors

<code>Error 1:

Startup may fail with: FileNotFoundError: [Errno 2] No such file or directory: 'gunicorn', meaning gunicorn cannot be found.

When airflow webserver starts, it calls subprocess.Popen to create a child process. The webserver runs on gunicorn, with these startup arguments:

['gunicorn', '-w', '4', '-k', 'sync', '-t', '120', '-b', '0.0.0.0:8080', '-n', 'airflow-webserver', '-p', '/home/admin/airflow/airflow-webserver.pid', '-c', 'airflow.www.gunicorn_config', '--access-logfile', '-', '--error-logfile', '-', 'airflow.www.app:cached_app()']

The gunicorn launch fails because the command cannot be found in PATH.

Create a symbolic link for gunicorn

ln -fs /home/admin/python3.6/bin/gunicorn/bin/gunicorn /bin/gunicorn

Or add /usr/local/python3/bin to PATH: export PATH=$PATH:/usr/local/python3/bin

# Take effect immediately

source /etc/profile

Error 2:

The service may fail to start. Check the err log.

The usual error says that a pid already exists. In that case, delete the airflow-webserver-monitor.pid file in the airflow directory./<code>
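Before creating links, it helps to see what the webserver process would actually find. subprocess.Popen looks a bare command name up on PATH, and shutil.which performs the same kind of lookup. The following is a sketch; find_command is a hypothetical helper and the extra directory in the demo is the virtualenv bin folder built earlier in this article.

```python
import os
import shutil
import sys


def find_command(name, extra_dirs=()):
    """Return the full path of `name`, searching PATH plus any extra directories."""
    dirs = [d for d in os.environ.get("PATH", "").split(os.pathsep) if d]
    dirs.extend(extra_dirs)
    return shutil.which(name, path=os.pathsep.join(dirs))


if __name__ == "__main__":
    venv_bin = "/softwares/pyenv_for_airflow/airflow_env/bin"
    print("gunicorn on PATH only   :", find_command("gunicorn"))
    print("gunicorn with venv bin  :", find_command("gunicorn", [venv_bin]))
    print("running interpreter     :", sys.executable)
```

If the first line prints None and the second prints a path, the fix is simply to put that directory on PATH (or activate the virtualenv) before starting the webserver.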

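Start the other services

./airflow scheduler -D

# The worker is not needed in LocalExecutor mode
./airflow worker -D

# Start flower

./airflow flower -D

The default port is 5555. Enter "http://hostip:5555" in the browser address bar to open flower and monitor the Celery message queue.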
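Because -D sends the services to the background, it is not obvious whether they really came up. A plain TCP probe is a quick way to check. This is a minimal sketch with a hypothetical helper name, port_is_open; the ports are the ones used in this article (8085 for the webserver, 5555 for flower).

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an error code otherwise.
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    for label, port in (("webserver", 8085), ("flower", 5555)):
        state = "listening" if port_is_open("127.0.0.1", port) else "not reachable"
        print(f"{label} on port {port}: {state}")
```

A closed port only tells you that nothing is listening. The err log mentioned above is still the place to find out why.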

Set the services to start at boot

#1. Create the startup shell script

cd /softwares/

mkdir shellscripts

cd shellscripts/

touch startairflow.sh

vi startairflow.sh

<code>#!/bin/bash
# chkconfig: 2345 10 90
# description: Airflow boot-time startup script

# A leftover pid file makes startup fail, so before starting the services check for pid files and delete any that exist
airflow_path="/softwares/airflow/"
airflow_webserver_monitor_name="airflow-webserver-monitor.pid"
airflow_webserver_pid_name="airflow-webserver.pid"
airflow_scheduler_pid_name="airflow-scheduler.pid"
airflow_worker_pid_name="airflow-worker.pid"

if [ -d "$airflow_path" ]; then
    echo "$airflow_path exists"
    cd "$airflow_path"
    if [ -f "$airflow_webserver_monitor_name" ]; then
        echo "$airflow_webserver_monitor_name exists, deleting it"
        rm -f "$airflow_webserver_monitor_name"
    fi

    if [ -f "$airflow_webserver_pid_name" ]; then
        echo "$airflow_webserver_pid_name exists, deleting it"
        rm -f "$airflow_webserver_pid_name"
    fi

    if [ -f "$airflow_scheduler_pid_name" ]; then
        echo "$airflow_scheduler_pid_name exists, deleting it"
        rm -f "$airflow_scheduler_pid_name"
    fi

    if [ -f "$airflow_worker_pid_name" ]; then
        echo "$airflow_worker_pid_name exists, deleting it"
        rm -f "$airflow_worker_pid_name"
    fi
fi

# Enter the Python virtual environment directory
cd /softwares/pyenv_for_airflow/airflow_env/bin

# Activate the virtual environment
source ./activate

# Start the Airflow services
/softwares/pyenv_for_airflow/airflow_env/lib/python3.7/site-packages/airflow/bin/airflow webserver -D
/softwares/pyenv_for_airflow/airflow_env/lib/python3.7/site-packages/airflow/bin/airflow scheduler -D
# LocalExecutor mode does not need a worker
#/softwares/pyenv_for_airflow/airflow_env/lib/python3.7/site-packages/airflow/bin/airflow worker -D /<code>
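The script above deletes every pid file it finds, which is fine at boot but would also remove the file of a service that is still running if you ran it by hand. A more careful variant checks whether the recorded process is alive first. Here is a sketch of that idea in Python for POSIX systems; the function names are hypothetical, and the file names are the ones from the script.

```python
import os
import tempfile
from pathlib import Path


def pid_is_running(pid):
    """Return True if a process with this PID exists (POSIX only)."""
    try:
        os.kill(pid, 0)  # signal 0 checks existence without sending a signal
    except ProcessLookupError:
        return False
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def remove_stale_pid_files(directory, names):
    """Delete PID files whose process is gone; return the names that were removed."""
    removed = []
    for name in names:
        pid_file = Path(directory) / name
        if not pid_file.is_file():
            continue
        try:
            pid = int(pid_file.read_text().strip())
        except ValueError:
            pid = None  # unreadable content counts as stale
        if pid is None or pid <= 0 or not pid_is_running(pid):
            pid_file.unlink()
            removed.append(name)
    return removed


if __name__ == "__main__":
    # Demo in a throwaway directory: one garbage file, one file for this process.
    with tempfile.TemporaryDirectory() as demo:
        (Path(demo) / "airflow-webserver.pid").write_text("garbage")
        (Path(demo) / "airflow-scheduler.pid").write_text(str(os.getpid()))
        print(remove_stale_pid_files(demo, ["airflow-webserver.pid", "airflow-scheduler.pid"]))
```

For the real installation the call would be remove_stale_pid_files("/softwares/airflow", [...]) with the four pid file names listed in the shell script.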

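#2. Copy the bash script to init.d

sudo cp startairflow.sh /etc/init.d/startairflow

#3. Add it to the boot-time startup list

# Add execute permission

cd /etc/init.d/

sudo chmod +x startairflow

# Enable automatic startup

sudo chkconfig startairflow on

# Check that it was added to the startup list. If runlevels 2, 3, 4 and 5 show "on", the setup is OK.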

chkconfig --list
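The listing prints one line per service with a level:state pair for each runlevel. To check the result in a script rather than by eye, a line can be parsed as in the sketch below. SAMPLE is an illustrative line I wrote by hand, not captured output, and the parsing assumes the English on/off labels; runlevels_on is a hypothetical helper.

```python
def runlevels_on(line):
    """Parse one `chkconfig --list` line into (service, set of enabled runlevels)."""
    fields = line.split()
    service, enabled = fields[0], set()
    for field in fields[1:]:
        level, _, state = field.partition(":")
        if level.isdigit() and state == "on":
            enabled.add(int(level))
    return service, enabled


# Illustrative line in the usual chkconfig layout (not real output).
SAMPLE = "startairflow   0:off 1:off 2:on 3:on 4:on 5:on 6:off"

if __name__ == "__main__":
    name, levels = runlevels_on(SAMPLE)
    print(name, "enabled for runlevels", sorted(levels))
    print("setup OK:", {2, 3, 4, 5} <= levels)
```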


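· Add the airflow command to the PATH system variable so that you do not have to change into the airflow bin directory every time

sudo vi /etc/profile

# Append the following to the end of the file

export AIRFLOW_CLI_HOME=/usr/local/python3/lib/python3.7/site-packages/airflow/

export PATH=$PATH:$AIRFLOW_CLI_HOME/bin

# Take effect immediately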

source /etc/profile


