MHA的主要作用:

        在mysql的主从复制中,当master崩溃了,利用mha实现backup顶替崩溃的master自动切换为master继续工作,从而实现高可用。

下面介绍本次实验的环境:

        MHA分为manager管理节点和node节点,一般来讲最少是三台服务器,两台node节点,一台manager节点,但本次环境限制,只能使用两台,所以把manager也装在一台node节点上。    

    两台服务器,两个网口:

    IP:

         10.2.16.253     10.0.0.1  node1

         10.2.16.254     10.0.0.2  node2  

  准备工作:

1、首先根据环境修改两台服务器的/etc/hosts文件

 

在hosts文件最后加入两行:

10.0.0.2node2

10.0.0.1node1

 

2、关闭selinux\iptables\ip6tables

 

3、配置网卡:

eth0  为10.2.16.0网段   (用于外网通信)

eth1  为10.0.0.0 网段   (用于内网监测)

4、本例中的系统版本为:centos 6.4 x64 ,MHA版本为 0.54

 

安装MHA-manager:

     mha依赖Perl模块,安装依赖非常的麻烦,首先需要确保拥有 centos的yum源和epel的yum源。(本例中使用的epel源为6.8版本)

注意:经验证有些包epel没有centos源有,有些相反,所以需要互补,通过两个源再结合cpan来安装依赖。

yum -y install perl* ncftp cpan  (centos源安装)

 

perl-Log-Dispatch 是下载的rpm包安装,其他的缺少包找依赖下载吧。

shell> wget http://downloads.naulinux.ru/pub/NauLinux/6x/i386/sites/School/RPMS/perl-Log-Dispatch-2.27-1.el6.noarch.rpm

shell> wget http://dl.Fedoraproject.org/pub/epel/6/i386/perl-Parallel-ForkManager-0.7.9-1.el6.noarch.rpm

shell> wget http://dl.fedoraproject.org/pub/epel/6/i386/perl-Mail-Sender-0.8.16-3.el6.noarch.rpm

shell> wget http://dl.fedoraproject.org/pub/epel/6/i386/perl-Mail-Sendmail-0.79-12.el6.noarch.rpm

shell> wget http://mirror.centos.org/centos/6/os/x86_64/Packages/perl-Time-HiRes-1.9721-136.el6.x86_64.rpm

 

装依赖时候,我也废了很大劲才装上,所以,缺啥找啥装啥吧.....

 

大概是这些包:perl-Config-Tiny perl-Params-Validate perl-Log-Dispatch perl-Parallel-ForkManager perl-DBD-MySQL perl-MIME-Lite*  perl-Mail-Send*

 

依赖安装完毕后,下面讲解本次实验所需的大致步骤:

1、首先需要配置两台主机的SSH密钥免密码验证登录

2、建立 mysql-master\mysql-slave之间的主从复制

3、配置 mha-manager文件

4、用mha-chech-ssh和mha-check-repl验证登录和mysql复制是否成功

5、启动mha-manager(确认以上都无问题后)

一、配置SSH免密钥登录:

     

[root@node1 ~]# ssh-keygen -t rsa   连按三个回车(确认密钥文件位置、设置密钥密码、确认密钥密码)

[root@node1 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.0.0.2     拷贝密钥密码到对方的/root/.ssh/authorized_keys文件

 

[root@node2 ~]# ssh-keygen -t rsa

[root@node2 ~]# ssh-copy-id -i /root/.ssh/id_rsa.pub root@10.0.0.1

 

注意:master的authorized_keys文件也需要加入本机的id.pub内容,不然后期check-ssh会不通过(当时这个问题困扰了两天才找到原因)

[root@node1 ~]#cat .ssh/id_rsa.pub >> .ssh/authorized_keys

 

二、安装mysql配置主从复制:

为了方便,就用yum装了哈

[root@node1 ~]# yum -y install mysql* 

[root@node2 ~]# yum -y install mysql*

 

node1(master)的配置文件

[root@node1 ~]# cat /etc/my.cnf 

[mysqld]

server-id=1 #全局唯一

log-bin=mysql-bin   #生成二进制复制文件的前缀

datadir=/var/lib/mysql

socket=/var/lib/mysql/mysql.sock

user=mysql

# Disabling symbolic-links is recommended to prevent assorted security risks

symbolic-links=0

[mysqld_safe]

log-error=/var/log/mysqld.log

pid-file=/var/run/mysqld/mysqld.pid

node2(slave)的配置文件 

[root@node2 ~]# cat /etc/my.cnf 

[mysqld]

server-id=2

log-bin=mysql-bin

#binlog_format=mixed

datadir=/var/lib/mysql

socket=/var/lib/mysql/mysql.sock

user=mysql

# Disabling symbolic-links is recommended to prevent assorted security risks

symbolic-links=0

[mysqld_safe]

log-error=/var/log/mysqld.log

pid-file=/var/run/mysqld/mysqld.pid 

配置mysql主节点:

添加用户以及复制权限:

mysql>grant replication slave on *.* to 'repl'@'10.0.0.%' identified by '123456';

mysql>grant all on *.* to 'root'@'10.0.0.2' identified by '123456';

mysql>grant all on *.* to 'root'@'node1' identified by '123456';

mysql>flush privileges;

 

配置完之后查看: 

mysql> select user,host,password from mysql.user;  

+------+-----------+-------------------------------------------+

| user | host      | password                                  |

+------+-----------+-------------------------------------------+

| root | localhost | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |

| root | node1     | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |

| root | 127.0.0.1 |                                           |

|      | localhost |                                           |

|      | node1     |                                           |

| repl | 10.0.0.%  | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |

| root | 10.0.0.2  | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |

+------+-----------+-------------------------------------------+

7 rows in set (0.02 sec)

mysql> show master status;

+------------------+----------+--------------+------------------+

| File             | Position | Binlog_Do_DB | Binlog_Ignore_DB |

+------------------+----------+--------------+------------------+

| mysql-bin.000001 |      106 |              |                  |

+------------------+----------+--------------+------------------+

1 row in set (0.01 sec)

 

配置mysql的slave节点:

 

mysql> change master to

master_host='10.0.0.1',

master_port=3306,

master_user='repl',

master_password='123456',

master_log_file='mysql-bin.000001',

master_log_pos=106;

slave机子上也要授权,因为这个是备用master;

 

mysql>grant replication slave on *.* to 'repl'@'10.0.0.%' identified by '123456';

mysql>grant all on *.* to 'root'@'10.0.0.1' identified by '123456';

mysql>grant all on *.* to 'root'@'node2' identified by '123456';

mysql>flush privileges;

mysql> select user,host,password from mysql.user;

+------+-----------+-------------------------------------------+

| user | host      | password                                  |

+------+-----------+-------------------------------------------+

| root | localhost | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |

| root | node2     | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |

| root | 127.0.0.1 |                                           |

|      | localhost |                                           |

|      | node2     |                                           |

| repl | 10.0.0.%  | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |

| root | 10.0.0.1  | *6BB4837EB74329105EE4568DDA7DC67ED2CA2AD9 |

+------+-----------+-------------------------------------------+

7 rows in set (0.00 sec)

启动slave

mysql> slave start;

Query OK, 0 rows affected (0.02 sec)

 

 查看slave 的Slave_IO_Running 和Slave_SQL_Running,都为yes 时表示配置成功

 

mysql> show slave status\G

*************************** 1. row ***************************

               Slave_IO_State: Waiting for master to send event

                  Master_Host: 10.0.0.1

                  Master_User: repl

                  Master_Port: 3306

                Connect_Retry: 60

              Master_Log_File: mysql-bin.000001

          Read_Master_Log_Pos: 361

               Relay_Log_File: mysqld-relay-bin.000001

                Relay_Log_Pos: 326

        Relay_Master_Log_File: mysql-bin.000001

             Slave_IO_Running: Yes

            Slave_SQL_Running: Yes

              Replicate_Do_DB: 

          Replicate_Ignore_DB: 

           Replicate_Do_Table: 

       Replicate_Ignore_Table: 

      Replicate_Wild_Do_Table: 

  Replicate_Wild_Ignore_Table: 

                   Last_Errno: 1007

                   Last_Error:

                 Skip_Counter: 0

          Exec_Master_Log_Pos: 181

              Relay_Log_Space: 807

              Until_Condition: None

               Until_Log_File: 

                Until_Log_Pos: 0

           Master_SSL_Allowed: No

           Master_SSL_CA_File: 

           Master_SSL_CA_Path: 

              Master_SSL_Cert: 

            Master_SSL_Cipher: 

               Master_SSL_Key: 

        Seconds_Behind_Master: NULL

Master_SSL_Verify_Server_Cert: No

                Last_IO_Errno: 0

                Last_IO_Error: 

               Last_SQL_Errno: 1007

               Last_SQL_Error: 

 

三、安装配置MHA-manager:

      [root@node1 mha4mysql-manager-0.54]# perl Makefile.PL #perl-cpan非常麻烦,应该会遇到各种错误,按报错找问题再解决吧...

 [root@node1 ~] make && make install 

 

 [root@node1 ~]# mkdir /etc/masterha   配置manager的主文件目录

 [root@node1 ~]# mkdir -p /master/app1 配置manager的log文件目录

 

 [root@node1 mha4mysql-manager-0.54]# cp samples/conf/* /etc/masterha/    拷贝配置文件模版

 

 编辑配置文件app1.conf,另一个masterha_default.cnf 无需配置

 [root@node1 ~]# cat /etc/masterha/app1.cnf 

[server default]

manager_workdir=/masterha/app1

manager_log=/masterha/app1/manager.log

user=root      #配置manager-mysql的root管理用户和密码

password=123456

ssh_user=root#之前配置的用来无密码登录的用户

repl_user=repl#用来数据同步复制的用户

repl_password=123456

ping_interval=1#ping验证检查mysql状态,每隔一秒检查一次

shutdown_script=""

#master_ip_failover_script="/data/master_ip_failover"

master_ip_online_change_script=""

report_script=""

[server1]    #配置node节点

hostname=10.0.0.1

master_binlog_dir="/var/lib/mysql/"     #mysql主从复制的二进制文件位置

candidate_master=1#master机宕掉后,优先启用这台作为新master

#no_master=1 #设置使主机不能成为新master

[server2]    

hostname=10.0.0.2

master_binlog_dir="/var/lib/mysql/"     

candidate_master=1

 

保存退出,配置完成。

 

在两台机器上安装node的rpm包

[root@node1 ~]# rpm -ivh mha4mysql-node-0.54-0.el6.noarch.rpm 

[root@node2 ~]# rpm -ivh mha4mysql-node-0.54-0.el6.noarch.rpm  

 

 

进行ssh验证

 

[root@node1 ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf 

Fri Jun 27 15:28:50 2014 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.

Fri Jun 27 15:28:50 2014 - [info] Reading application default configurations from /etc/masterha/app1.cnf..

Fri Jun 27 15:28:50 2014 - [info] Reading server configurations from /etc/masterha/app1.cnf..

Fri Jun 27 15:28:50 2014 - [info] Starting SSH connection tests..

Fri Jun 27 15:28:52 2014 - [debug] 

Fri Jun 27 15:28:51 2014 - [debug]  Connecting via SSH from root@10.0.0.1(10.0.0.1:22) to root@10.0.0.2(10.0.0.2:22)..

Fri Jun 27 15:28:52 2014 - [debug]   ok.

Fri Jun 27 15:28:52 2014 - [debug] 

Fri Jun 27 15:28:51 2014 - [debug]  Connecting via SSH from root@10.0.0.2(10.0.0.2:22) to root@10.0.0.1(10.0.0.1:22)..

Fri Jun 27 15:28:52 2014 - [debug]   ok.

Fri Jun 27 15:28:52 2014 - [info] All SSH connection tests passed successfully.

 

进行mysql主从复制验证

 

[root@node1 ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf 

重要信息显示当前master和slave: 

10.2.16.253 (current master)

 +--10.2.16.254

最后输出以下,则表示配置成功!

MySQL Replication Health is OK.

 

启动MHA-manager管理进程:

 

nohup masterha_manager  --conf=/etc/masterha/app1.cnf >/tmp/mha_manager.log 2>&1

 

查看管理节点的进程和当前的Master

masterha_check_status --conf=/etc/masterha/app1.cnf

 

*****************此时MHA的配置已经全部完成!***********************

 

模拟当mha-master挂掉之后,master切换为备用的254之后,如何切回来:

 

首先需要删除rm -f /masterha/app1/app1.failover.complete  #此为每次启动MHA管理进程时生成的临时文件

 

 *******************让旧的master先变为slave,同步宕机时丢失的数据***********

一、在活这的master中切换 

在旧的master 253上执行:

 

mysql> reset master;

Query OK, 0 rows affected (0.04 sec)

 

#查看现在备用的master上的pos和log_file值之后,再写:

 

mysql> change master to master_host='10.0.0.1', master_port=3306, master_user='repl', master_password='123456', master_log_file='mysql-bin.000002', master_log_pos=106;

Query OK, 0 rows affected (0.05 sec)

 

    mysql> start slave;      #暂时先把旧master变为从

    

    shell> masterha_master_switch --master_state=alive --conf=/etc/masterha/app1.cnf

二、在有宕机的master中切换:

[root@racdb scripts]# masterha_master_switch --master_state=dead --conf=/etc/app1.cnf --dead_master_host=10.0.0.2 --new_master_host=10.0.0.1

如果,MHA manager检测到没有dead的server,将报错,并结束failover:

Thu May 17 17:53:33 2012 - [info] Dead Servers:
Thu May 17 17:53:33 2012 - [error][/usr/lib/perl5/site_perl/5.8.8/MHA/MasterFailover.pm, ln181] None of server is dead. Stop failover.

 

###########################master成功切换回###########################

 

  如何结合KeepAlived实现宕机时IP漂移达到mysql高可用,请查看本博另一篇文章”keepalived结合MHA实现mysql高可用”