手動搭建MySQL高可用架構MHA,小白也能輕鬆搭建

推薦學習

前言

MySQL作為數據存儲工具,可以說是整個架構體系中最重要的一環都不為過。無論是怎樣的架構,怎樣的設計,都不能離開關係型數據庫。如果數據庫故障了,整個系統肯定是不可用的,所以MySQL的高可用非常重要。本篇主要從理論上講解常見的MySQL高可用架構MMM和MHA,以及從零開始,一步步搭建一個高可用的MHA架構。

高可用

MySQL主從複製常見的高可用架構有兩種,MMM和MHA。想要實現MySQL主從複製的高可用,需要實現以下幾點功能

對主從複製集群中的master節點進行監控當master節點宕機後把VIP(Virtual IP Address,即虛擬IP)遷移到新的master節點重新配置集群中的其他slave節點從新的master同步

MMM架構

MMM(Master-Master replication manager for MySQL)是一套支持雙主故障切換和雙主日常管理的腳本程序。主要用來監控

管理Master-Master(雙主)複製,雖然叫做雙主複製,但是同一個時刻只有一個master,另一個作為master的備份,以加速在主主切換時刻備選master的預熱,一方面實現了故障自動切換的功能,另一方面也可以實現多個slave的讀負載均衡

MMM的整體架構圖如下所示

結合MMM的架構圖,我們可以知道

整個架構需要兩個master節點,兩個master節點互為主備。同一時刻,只能有一個master對外提供服務可以有多個用於讀操作的slave節點給master分配一個VIP,只能在主備之間切換;給每個slave節點分配一個讀VIP,可以在任意slave節點上切換

當master宕機時,MMM管理工具會把所有的slave節點切換為主備的slave。並且把寫VIP遷移到主備服務器上,slave節點從新的master節點上同步數據,整個過程簡單粗暴,所以無法保證數據的一致性。

當slave節點宕機時,MMM管理工具會把讀VIP遷移到其他slave節點,slave節點可以有多個VIP。

MMM架構缺點

故障切換簡單粗暴,容易丟失事務(可以採用半同步複製改進)不支持GTID的複製方式(可以自行修改perl腳本)社區不活躍,很久沒更新新版本

MHA架構

MHA(Master High Availability)是一款開源的 MySQL 的高可用程序。MHA管理工具在監控到master節點故障時,會提升擁有master最新數據的slave節點成為新的master節點,並且會讓其他的slave節點從新的master節點上同步數據。MHA還提供了master節點的在線切換功能,即按需切換master/slave節點。

MHA的架構圖如下所示

從MHA架構圖可以知道,MHA只監控master的健康狀態,當master宕機時,MHA管理工具會從master所有的slave中選出一個最接近master的節點提升為新的master。

MHA故障遷移

MHA管理下的MySQL主從複製,master故障之後,會經過如下過程進行故障轉移

移除宕機的master的VIP,挑選具有最新數據的slave
嘗試從宕機的master保存二進制日誌(如果僅僅是MySQL實例宕機,則有可能成功)應用差異的中繼日誌(relay log)到其它slave,因為作為備選master的節點的中繼日誌,和其它slave節點的中繼日誌可能有差異,所以要把備選master節點的中繼日誌應用到其它slave節點應用從master保存的二進制日誌(如果第二步成功)把備選的master提升為新的master配置其他的slave從新的master同步,把寫VIP遷移到新的master上

MHA優勢

支持GTID的複製方式和基於日誌點的複製方式可以從多個slave節點中選取最適合的master會嘗試從舊的master中儘可能保存更多的未同步日誌

MHA不足

不一定能獲取到原master的未同步日誌需要自行開發寫VIP轉移腳本只監控master,而沒有對slave實現高可用

MHA適用場景

適用基於GTID的複製方式使用一主多從的複製架構希望更少的數據丟失場景

搭建MHA

整個MHA的搭建過程雖然不算複雜,但是涉及到的步驟較多,建議先整體閱讀一下,再動手實踐

配置節點間SSH免密

首先在master節點(192.168.1.101)上執行,一路回車即可

<code>ssh-keygen/<code>

執行結果如下

<code>Generating public/private rsa key pair. Enter file in which to save the key (/root/.ssh/id_rsa): Enter passphrase (empty for no passphrase): Enter same passphrase again: Your identification has been saved in /root/.ssh/id_rsa. Your public key has been saved in /root/.ssh/id_rsa.pub. The key fingerprint is: SHA256:HFpSaM7IVW+TjQVUM0m1JBNgrnhH85O3wuur58sev1E root@localhost.localdomain The key's randomart image is: +---[RSA 2048]----+ | oo.==X+o | | +. + =.* . | | . *. o X . . | | o o* = + . | | o S . + . E| | . . . o o | | + o | | ..= . | | .*Ooo. | +----[SHA256]-----+/<code>

把生成的/root/.ssh/id_rsa拷貝到三個節點上(包括自己)

<code>ssh-copy-id -i /root/.ssh/id_rsa root@192.168.1.101 ssh-copy-id -i /root/.ssh/id_rsa root@192.168.1.102 ssh-copy-id -i /root/.ssh/id_rsa root@192.168.1.103/<code>

執行完成後,在192.168.1.101使用ssh命令連接到102和103上是不需要密碼的

<code>ssh 192.168.1.102/<code>

上述操作需要在192.168.1.102和192.168.1.103上均執行一遍

安裝yum擴展包

下載wget

<code>https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm/<code>安裝rpm -ivh

<code>epel-release-latest-7.noarch.rpm 1/<code>修改vim

<code>/etc/yum.repos.d/epel.repo/<code>

只需要修改一項內容,就是epel節點下的gpgcheck

<code>[epel] ... ## 只需要修改epel節點下的gpgcheck屬性 gpgcheck=0 .../<code>

上述操作需要在所有節點上均執行一遍

安裝依賴

執行如下命令

<code>yum -y install perl-DBD-MySQL ncftp perl-DBI.x86/<code>

上述操作需要在所有節點上均執行一遍

安裝MHA管理工具

下載地址:https://download.csdn.net/download/Baisitao_/12505957

<code>## 安裝mha-node rpm -ivh mha4mysql-node-0.57-0.el7.noarch.rpm /<code>

上述操作需要在所有節點上均執行一遍

安裝管理節點

嚴格來說,監控工具應該安裝在一個單獨的節點,此處為了節約一個節點,就安裝在192.168.1.103上。

<code>yum -y install perl-Config-Tiny.noarch perl-Time-HiRes.x86_64 perl-Parallel-ForkManager perl-Log-Dispatch/<code>

安裝完成後就可以開始安裝mha-manager了

<code>rpm -ivh mha4mysql-manager-0.57-0.el7.noarch.rpm/<code>

創建mha目錄

在監控節點(192.168.1.103)上,創建mha的配置目錄

<code>## 配置目錄 mkdir -p /etc/mha/<code>

每個節點上創建mha工作目錄

<code>## 工作目錄,該目錄用於master宕機時,slave將master的bin log拷貝到此目錄 mkdir -p /root/mha/<code>

創建mha賬戶

在master節點(192.168.1.101)上,創建mha需要用到的賬戶,並授權

<code>## 創建用戶 create user dba_mha@'192.168.1.%' identified by 'your password'; ## 授權 grant all privileges on *.* to dba_mha@'192.168.1.%';/<code>

編輯配置文件

在監控節點(192.168.1.103)上新建並編輯配置文件

<code>vim /etc/mha/mysql-mha.conf/<code>

配置如下內容,根據自己的實際情況進行修改(password、ip、目錄等)

<code>[server default] user=dba_mha ## 注意改成自己的密碼 password=your password manager_workdir=/root/mha manager_log=/root/mha/manager.log remote_workdir=/root/mha ssh_user=root repl_password=your password ping_interval=1 master_binlog_dir=/home/mysql/sql_log ssh_port=22 master_ip_failover_script=/usr/bin/master_ip_failover secondary_check_script=/usr/bin/masterha_secondary_check -s 192.168.1.101 -s 192.168.1.102 -s 192.168.1.103 [server1] hostname=192.168.1.101 candidate_master=1 [server2] hostname=192.168.1.102 candidate_master=1 [server3] hostname=192.168.1.103 ## 該節點也是監控節點,所以關閉master候選 no_master=1/<code>

從配置文件可以看到,參數master_ip_failover_script配置了master故障時,需要執行寫VIP的故障轉移腳本/usr/bin/master_ip_failover。所以還需要配置這個腳本,創建並編輯這個腳本

<code>vim /usr/bin/master_ip_failover/<code>

配置如下內容,根據自己的實際情況進行修改

<code>#!/usr/bin/env perl use strict; use warnings FATAL => 'all'; use Getopt::Long; my ( $command, $orig_master_host, $orig_master_ip,$ssh_user, $orig_master_port, $new_master_host, $new_master_ip,$new_master_port, $orig_master_ssh_port,$new_master_ssh_port,$new_master_user,$new_master_password ); my $vip = '192.168.1.88/24'; my $key = '1'; my $ssh_start_vip = "/sbin/ifconfig ens33:$key $vip"; my $ssh_stop_vip = "/sbin/ifconfig ens33:$key down"; my $ssh_Bcast_arp= "/sbin/arping -I ens33 -c 3 -A 192.168.1.88"; GetOptions( 'command=s' => \$command, 'ssh_user=s' => \$ssh_user, 'orig_master_host=s' => \$orig_master_host, 'orig_master_ip=s' => \$orig_master_ip, 'orig_master_port=i' => \$orig_master_port, 'orig_master_ssh_port=i' => \$orig_master_ssh_port, 'new_master_host=s' => \$new_master_host, 'new_master_ip=s' => \$new_master_ip, 'new_master_port=i' => \$new_master_port, 'new_master_ssh_port' => \$new_master_ssh_port, 'new_master_user' => \$new_master_user, 'new_master_password' => \$new_master_password ); exit &main(); sub main { $ssh_user = defined $ssh_user ? $ssh_user : 'root'; print "\n\nIN SCRIPT TEST====$ssh_user|$ssh_stop_vip==$ssh_user|$ssh_start_vip===\n\n"; if ( $command eq "stop" || $command eq "stopssh" ) { my $exit_code = 1; eval { print "Disabling the VIP on old master: $orig_master_host \n"; &stop_vip(); $exit_code = 0; }; if ($@) { warn "Got Error: $@\n"; exit $exit_code; } exit $exit_code; } elsif ( $command eq "start" ) { my $exit_code = 10; eval { print "Enabling the VIP - $vip on the new master - $new_master_host \n"; &start_vip(); &start_arp(); $exit_code = 0; }; if ($@) { warn $@; exit $exit_code; } exit $exit_code; } elsif ( $command eq "status" ) { print "Checking the Status of the>

配置文件中,值得注意的地方(第14行開始)如下圖所示

vip表示讀的虛擬IP,而不是master節點的IP。ens33是網絡接口的名稱,可以通過ifconfig查看

這個腳本實現了master故障時,寫VIP的自動轉移。

腳本編輯完成後,賦予可執行的權限

<code>chmod +x /usr/bin/master_ip_failover/<code>

檢查配置

由於配置內容比較多,不能保證全部都正確,所以可以先校驗一下相關配置,在監控節點(192.168.1.103)上執行

檢查SSH配置

<code>masterha_check_ssh --conf=/etc/mha/mysql-mha.conf/<code>執行結果

<code>Tue Jun 9 22:11:11 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Tue Jun 9 22:11:11 2020 - [info] Reading application default configuration from /etc/mha/mysql-mha.conf.. Tue Jun 9 22:11:11 2020 - [info] Reading server configuration from /etc/mha/mysql-mha.conf.. Tue Jun 9 22:11:11 2020 - [info] Starting SSH connection tests.. Tue Jun 9 22:11:16 2020 - [debug] Tue Jun 9 22:11:12 2020 - [debug] Connecting via SSH from root@192.168.1.103(192.168.1.103:22) to root@192.168.1.101(192.168.1.101:22).. Tue Jun 9 22:11:14 2020 - [debug] ok. Tue Jun 9 22:11:14 2020 - [debug] Connecting via SSH from root@192.168.1.103(192.168.1.103:22) to root@192.168.1.102(192.168.1.102:22).. Tue Jun 9 22:11:15 2020 - [debug] ok. Tue Jun 9 22:11:19 2020 - [debug] Tue Jun 9 22:11:11 2020 - [debug] Connecting via SSH from root@192.168.1.101(192.168.1.101:22) to root@192.168.1.102(192.168.1.102:22).. Tue Jun 9 22:11:17 2020 - [debug] ok. Tue Jun 9 22:11:17 2020 - [debug] Connecting via SSH from root@192.168.1.101(192.168.1.101:22) to root@192.168.1.103(192.168.1.103:22).. Tue Jun 9 22:11:18 2020 - [debug] ok. Tue Jun 9 22:11:25 2020 - [debug] Tue Jun 9 22:11:12 2020 - [debug] Connecting via SSH from root@192.168.1.102(192.168.1.102:22) to root@192.168.1.101(192.168.1.101:22).. Tue Jun 9 22:11:13 2020 - [debug] ok. Tue Jun 9 22:11:13 2020 - [debug] Connecting via SSH from root@192.168.1.102(192.168.1.102:22) to root@192.168.1.103(192.168.1.103:22).. Tue Jun 9 22:11:24 2020 - [debug] ok. Tue Jun 9 22:11:25 2020 - [info] All SSH connection tests passed successfully./<code>

通過日誌可以看到SSH的配置正確

檢查主從複製的配置

<code>masterha_check_repl --conf=/etc/mha/mysql-mha.conf/<code>

執行結果

<code>Tue Jun 9 22:22:43 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Tue Jun 9 22:22:43 2020 - [info] Reading application default configuration from /etc/mha/mysql-mha.conf.. Tue Jun 9 22:22:43 2020 - [info] Reading server configuration from /etc/mha/mysql-mha.conf.. Tue Jun 9 22:22:43 2020 - [info] MHA::MasterMonitor version 0.57. Tue Jun 9 22:22:45 2020 - [info] GTID failover mode = 1 Tue Jun 9 22:22:45 2020 - [info] Dead Servers: Tue Jun 9 22:22:45 2020 - [info] Alive Servers: Tue Jun 9 22:22:45 2020 - [info] 192.168.1.101(192.168.1.101:3306) Tue Jun 9 22:22:45 2020 - [info] 192.168.1.102(192.168.1.102:3306) Tue Jun 9 22:22:45 2020 - [info] 192.168.1.103(192.168.1.103:3306) Tue Jun 9 22:22:45 2020 - [info] Alive Slaves: Tue Jun 9 22:22:45 2020 - [info] 192.168.1.102(192.168.1.102:3306) Version=5.7.30-log (oldest major version between slaves) log-bin:enabled Tue Jun 9 22:22:45 2020 - [info] GTID ON Tue Jun 9 22:22:45 2020 - [info] Replicating from 192.168.1.101(192.168.1.101:3306) Tue Jun 9 22:22:45 2020 - [info] Primary candidate for the new Master (candidate_master is set) Tue Jun 9 22:22:45 2020 - [info] 192.168.1.103(192.168.1.103:3306) Version=5.7.30-log (oldest major version between slaves) log-bin:enabled Tue Jun 9 22:22:45 2020 - [info] GTID ON Tue Jun 9 22:22:45 2020 - [info] Replicating from 192.168.1.101(192.168.1.101:3306) Tue Jun 9 22:22:45 2020 - [info] Not candidate for the new Master (no_master is set) Tue Jun 9 22:22:45 2020 - [info] Current Alive Master: 192.168.1.101(192.168.1.101:3306) Tue Jun 9 22:22:45 2020 - [info] Checking slave configurations.. Tue Jun 9 22:22:45 2020 - [info] read_only=1 is not set on slave 192.168.1.102(192.168.1.102:3306). Tue Jun 9 22:22:45 2020 - [info] read_only=1 is not set on slave 192.168.1.103(192.168.1.103:3306). Tue Jun 9 22:22:45 2020 - [info] Checking replication filtering settings.. Tue Jun 9 22:22:45 2020 - [info] binlog_do_db= , binlog_ignore_db= Tue Jun 9 22:22:45 2020 - [info] Replication filtering check ok. Tue Jun 9 22:22:45 2020 - [info] GTID (with auto-pos) is supported. Skipping all SSH and Node package checking. Tue Jun 9 22:22:45 2020 - [info] Checking SSH publickey authentication settings on the current master.. Tue Jun 9 22:22:50 2020 - [warning] HealthCheck: Got timeout on checking SSH connection to 192.168.1.101! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 342. Tue Jun 9 22:22:50 2020 - [info] 192.168.1.101(192.168.1.101:3306) (current master) +--192.168.1.102(192.168.1.102:3306) +--192.168.1.103(192.168.1.103:3306) Tue Jun 9 22:22:50 2020 - [info] Checking replication health on 192.168.1.102.. Tue Jun 9 22:22:50 2020 - [info] ok. Tue Jun 9 22:22:50 2020 - [info] Checking replication health on 192.168.1.103.. Tue Jun 9 22:22:50 2020 - [info] ok. Tue Jun 9 22:22:50 2020 - [info] Checking master_ip_failover_script status: Tue Jun 9 22:22:50 2020 - [info] /usr/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=192.168.1.101 --orig_master_ip=192.168.1.101 --orig_master_port=3306 IN SCRIPT TEST====root|/sbin/ifconfig ens33:1 down==root|/sbin/ifconfig ens33:1 192.168.1.88/24=== Checking the Status of the>

根據輸出可以看到,主從複製配置也正確。

更多的檢查方式可以通過ll /usr/bin/ |grep master命令查看

master首次配置VIP

由於MHA工具只會在故障時遷移VIP,所以第一次啟動MHA的時候,需要手動給master節點(192.168.1.101)配置一個寫VIP,配置方式如下,在master節點(192.168.1.101)上執行如下命令(參數需要根據實際情況修改)

<code>/sbin/ifconfig ens33:1 192.168.1.88/24/<code>

ens33是網絡接口的名稱,192.168.1.88是寫VIP,這些配置在master_ip_failover腳本中已經指定過。

配置寫VIP之前,使用ifconfig輸出如下

<code>ens33: flags=4163 mtu 1500 inet 192.168.1.101 netmask 255.255.255.0 broadcast 192.168.1.255 inet6 fe80::bce6:1d30:472c:d811 prefixlen 64 scopeid 0x20 inet6 2409:8a4c:a13:3f30:9d96:8b33:ca89:c62c prefixlen 64 scopeid 0x0 ether 00:0c:29:28:70:7c txqueuelen 1000 (Ethernet) RX packets 979338 bytes 460658144 (439.3 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 693198 bytes 278374776 (265.4 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 lo: flags=73 mtu 65536 inet 127.0.0.1 netmask 255.0.0.0 inet6 ::1 prefixlen 128 scopeid 0x10 loop txqueuelen 1000 (Local Loopback) RX packets 208973 bytes 18422224 (17.5 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 208973 bytes 18422224 (17.5 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0/<code>

配置寫VIP之後,ifconfig名稱輸出如下

<code>ens33: flags=4163 mtu 1500 inet 192.168.1.101 netmask 255.255.255.0 broadcast 192.168.1.255 inet6 fe80::bce6:1d30:472c:d811 prefixlen 64 scopeid 0x20 inet6 2409:8a4c:a13:3f30:9d96:8b33:ca89:c62c prefixlen 64 scopeid 0x0 ether 00:0c:29:28:70:7c txqueuelen 1000 (Ethernet) RX packets 1040146 bytes 477864466 (455.7 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 733075 bytes 299370855 (285.5 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0 ens33:1: flags=4163 mtu 1500 inet 192.168.1.88 netmask 255.255.255.0 broadcast 192.168.1.255 ether 00:0c:29:28:70:7c txqueuelen 1000 (Ethernet) lo: flags=73 mtu 65536 inet 127.0.0.1 netmask 255.0.0.0 inet6 ::1 prefixlen 128 scopeid 0x10 loop txqueuelen 1000 (Local Loopback) RX packets 222701 bytes 19630288 (18.7 MiB) RX errors 0 dropped 0 overruns 0 frame 0 TX packets 222701 bytes 19630288 (18.7 MiB) TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0/<code>

啟動MHA

在監控節點(192.168.1.103)上執行如下命令(默認前臺運行)

<code>masterha_manager --conf=/etc/mha/mysql-mha.conf/<code>

執行之後輸出日誌如下

<code>Tue Jun 9 22:38:05 2020 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Tue Jun 9 22:38:05 2020 - [info] Reading application default configuration from /etc/mha/mysql-mha.conf.. Tue Jun 9 22:38:05 2020 - [info] Reading server configuration from /etc/mha/mysql-mha.conf../<code>

可以看到MHA已經成功啟動。

除此之外,/root/mha目錄下還有兩個相關的文件manager.log和mysql-mha.master_status.health,分別用來記錄MHA日誌和master節點的健康狀態。

至此,MHA架構已經搭建完成。

因為master節點的VIP是192.168.1.88,所以寫操作只需要連接這個VIP即可。如果連接不上,請開啟MySQL允許遠程訪問

故障切換日誌

MHA高可用搭建後,理論上是高可用的,即master宕機後,馬上會提升一個slave為新的master。但是理論歸理論,我們還是要實踐下。

以下日誌是master宕機(只停止了MySQL服務)後,MHA監控工具打印的日誌。

<code>Thu Jun 11 20:59:53 2020 - [warning] Got error on MySQL select ping: 2013 (Lost connection to MySQL server during query) Thu Jun 11 20:59:53 2020 - [info] Executing secondary network check>

從日誌可以看到192.168.1.102被提升為了新的master。

腦裂問題

如果原來的master恢復了,會不會搶回master呢,還是會出現多個master?

如果原來的master恢復後,還是master,那就是一個主從複製集群中出現了兩個master,這樣就出現了腦裂

且看MySQL如何解決這個問題。

重新啟動192.168.1.101的MySQL服務。因為新的master已經變成192.168.1.102了,所以在新的master上執行show slave hosts查看master上連接了幾個slave

<code>show slave hosts; +-----------+------+------+-----------+--------------------------------------+ | Server_id | Host | Port | Master_id | Slave_UUID | +-----------+------+------+-----------+--------------------------------------+ | 103 | | 3306 | 102 | d6532e2a-a592-11ea-99c3-000c297f5b55 | +-----------+------+------+-----------+--------------------------------------+ 1 row in set (0.00 sec)/<code>

可以看到slave只剩下192.168.1.103,也就說原來的master恢復後,並沒有搶回master,也沒有成為slave

如果想讓原來的master加入集群,需要重新配置

<code>change master to master_host='192.168.1.102', master_user='repl', master_password='your password', master_auto_position=1; start slave;/<code>

配置、啟動之後,再次查看master的slave節點

<code> show slave hosts; +-----------+------+------+-----------+--------------------------------------+ | Server_id | Host | Port | Master_id | Slave_UUID | +-----------+------+------+-----------+--------------------------------------+ | 101 | | 3306 | 102 | 81502f9e-a592-11ea-b912-000c2928707c | | 103 | | 3306 | 102 | d6532e2a-a592-11ea-99c3-000c297f5b55 | +-----------+------+------+-----------+--------------------------------------+ 2 rows in set (0.00 sec)/<code>

可以看到,原來的master就在故障恢復之後成功的加入了集群。

總結

MySQL的高可用非常重要,手動搭建一個MHA的高可用架構,可以讓我們更好的理解MHA的工作原理,也讓我們在面對MySQL故障時不至於束手無策。

作者:Sicimike

原文鏈接:https://blog.csdn.net/Baisitao_/article/details/106558800