How To Monitor a LEMP Stack with Monit on Ubuntu 14.04

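Introduction

Monit is a free, open-source service monitoring application that can perform a variety of event-based actions. It can send email notifications, restart a service or application, or take other responsive actions.

This tutorial builds on a basic LEMP stack (Linux, Nginx, MySQL, PHP). Monit will be integrated to monitor every service in the stack and to alert the root user to any adverse conditions.

An optional external Monit server can also be used to monitor web applications or other services remotely.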

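Prerequisites

Before we get started, you will first need to set up an Ubuntu 14.04 Droplet
You will need a standard user account with sudo privileges
This tutorial adds Monit to an existing LEMP stack. For a tutorial on creating an initial LEMP stack, see [How To Install Linux, nginx, MySQL, PHP (LEMP) stack on Ubuntu 14.04](https://andsky.com/tech/tutorials/how-to-install-linux-nginx-mysql-php-lemp-stack-on-ubuntu-14-04)
Optional: if you want to monitor a remote website, DNS server, or mail server, that server should already be set up with a publicly accessible domain or IP address (more in Step 6)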

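Step 1: Configure Email Delivery for Monit Notifications

System monitoring usually involves email alerts, so working mail delivery must be in place before Monit can send notifications. A typical alert email looks similar to this: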

From: [email protected]
To: [email protected]

Resource limit matched Service example.com

        Date:        Mon, 22 Dec 2014 03:04:06
        Action:      alert
        Host:        example.com
        Description: cpu user usage of 79.8% matches resource limit [cpu user usage>70.0%]

Your faithful employee,
Monit

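This tutorial sets up Monit to send you an email every time an alert is triggered.

**Note:** Monit's notifications may land in your spam folder by default. Reverse DNS (known as a PTR record) must be configured correctly to give mail the best chance of being delivered. Your Droplet's hostname must match its fully qualified domain name (FQDN), so, for example, both should be hostname.example.com. To edit the PTR record for a DigitalOcean Droplet, go to the DigitalOcean control panel, navigate to Settings, and select the Rename tab. Enter the new hostname and click Rename.

This guide assumes that you do not have a pre-existing mail transfer agent (MTA), so we will install Postfix.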

To begin installing Postfix as your MTA, first update the system's package lists.

sudo apt-get update

Then install the Postfix and GNU Mailutils packages from the Ubuntu repositories.

sudo apt-get install postfix mailutils

Near the end of the installation, you will be prompted to choose a server configuration type. Select Internet Site.

When prompted for the System mail name, use your Droplet's fully qualified domain name (FQDN).

Next, open the /etc/aliases file for editing. This guide uses Nano, but you can use whichever text editor you prefer.

sudo nano /etc/aliases

Here we will add a personal email address where we will receive Monit's notification emails. These notifications will originate from our LEMP server's root user.

postmaster: root
root: [email protected]

If needed, you can also add multiple destinations:

root: username, [email protected], [email protected]

Save your changes and exit Nano. Then run the following to rebuild the aliases database:

sudo newaliases
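
To confirm where root's mail will be forwarded, the alias entry can be read back from the file. The sketch below is only an illustration: `alias_targets` is a hypothetical helper name, and the demo runs against a temporary sample file with a made-up address (admin@example.org) rather than the real /etc/aliases.

```shell
# Hypothetical helper: print the destinations configured for one alias.
# Usage: alias_targets NAME [FILE]   (FILE defaults to /etc/aliases)
alias_targets() {
    name=$1
    file=${2:-/etc/aliases}
    awk -v n="$name" '
        /^[ \t]*#/ { next }                  # skip comment lines
        {
            i = index($0, ":")
            if (i == 0) next                 # not an alias line
            key = substr($0, 1, i - 1)
            gsub(/[ \t]/, "", key)
            if (key == n) {
                val = substr($0, i + 1)
                sub(/^[ \t]+/, "", val)      # trim leading blanks
                print val
            }
        }
    ' "$file"
}

# Demo on a throwaway sample file (the addresses are placeholders).
sample=$(mktemp)
printf 'postmaster: root\nroot: username, admin@example.org\n' > "$sample"
alias_targets root "$sample"
rm -f "$sample"
```

On the server itself, `alias_targets root` with no file argument would read /etc/aliases.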

You can send a test message from your Droplet to check mail delivery. If the test message does not show up in your inbox at first, check your spam folder.

echo test | mail -s "test message from my VPS" root

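Step 2: Install and Configure Monit

Monit is also available in the Ubuntu package repositories. For a brief reference guide to Monit, see this tutorial.

Monit can be installed on your LEMP server with:

sudo apt-get install monit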

On Ubuntu 14.04, the Monit configuration files are located in /etc/monit/, and the main Monit configuration file is /etc/monit/monitrc.

To open monitrc for editing in Nano:

sudo nano /etc/monit/monitrc

Uncomment the following lines and change them to match what is shown below:

set mailserver localhost    # Use localhost for email alert delivery.

set mail-format {
      from: monit@$HOST
   subject: monit alert --  $EVENT $SERVICE
   message: $EVENT Service $SERVICE
                 Date:        $DATE
                 Action:      $ACTION
                 Host:        $HOST
                 Description: $DESCRIPTION

            Your faithful employee,
            Monit
}

set alert root@localhost not on { instance, action }    # Set email address to receive alerts. This guide uses root mail.

Still in the monitrc file, uncomment the following lines and change example.com to match your server's domain or IP address.

check system example.com
    if loadavg (1min) > 4 then alert
    if loadavg (5min) > 2 then alert
    if memory usage > 75% then alert
    if swap usage > 25% then alert
    if cpu usage (user) > 70% then alert
    if cpu usage (system) > 30% then alert
    if cpu usage (wait) > 20% then alert
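
To get a feel for how close the server already is to these limits, the load averages can be spot-checked by hand. This is a minimal sketch, assuming a Linux /proc/loadavg; `load_exceeds` is a hypothetical helper, and the values 4 and 2 simply mirror the two loadavg lines in the check system entry.

```shell
# Hypothetical helper: succeed (exit 0) if VALUE is greater than LIMIT.
# awk does the comparison because shell arithmetic is integer-only.
load_exceeds() {
    awk -v v="$1" -v limit="$2" 'BEGIN { exit !(v > limit) }'
}

# /proc/loadavg starts with the 1-, 5- and 15-minute load averages.
if [ -r /proc/loadavg ]; then
    read -r one five _rest < /proc/loadavg
    load_exceeds "$one" 4  && echo "1min load $one is above 4"
    load_exceeds "$five" 2 && echo "5min load $five is above 2"
    echo "current load: $one (1min) $five (5min)"
fi
```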

We will also add this entry at the end of the file:

check filesystem rootfs with path /    # Alert if low on disk space.
    if space usage > 90% then alert
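
The 90% limit can be compared with what df reports for the root filesystem. This is a rough sketch, assuming POSIX-format `df -P` output; `df_used_percent` is a hypothetical helper name, and Monit's own figure may differ slightly from df's rounded percentage.

```shell
# Hypothetical helper: read `df -P` output on stdin and print the
# "Capacity" column of the first filesystem line, without the % sign.
df_used_percent() {
    awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

used=$(df -P / | df_used_percent)
if [ "$used" -gt 90 ]; then
    echo "root filesystem is ${used}% full (above the 90% limit)"
else
    echo "root filesystem is ${used}% full"
fi
```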

Save your changes and exit Nano.

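Step 3: Configure Service Monitoring for the LEMP Services in Monit

On Ubuntu 14.04, Monit configuration can be specified directly in the /etc/monit/monitrc file or through individual files in /etc/monit/conf.d/.

First, we will give Monit the means to manage the services. To keep this tutorial simple, we will place all process monitoring in a single file at /etc/monit/conf.d/lemp-services.

We can create the working file with Nano:

sudo nano /etc/monit/conf.d/lemp-services

Add the following service entries for our LEMP stack: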

check process nginx with pidfile /var/run/nginx.pid
    group www-data
    start program = "/etc/init.d/nginx start"
    stop program = "/etc/init.d/nginx stop"

check process mysql with pidfile /var/run/mysqld/mysqld.pid
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"

check process php5-fpm with pidfile /var/run/php5-fpm.pid
    start program = "/etc/init.d/php5-fpm start"
    stop program = "/etc/init.d/php5-fpm stop"
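
Each entry depends on its pidfile path being correct, so it can help to verify the paths before relying on them. The following is a sketch of the idea, not Monit's own logic; `pidfile_alive` is a hypothetical helper, and the three paths are the ones used in the entries above.

```shell
# Hypothetical helper: succeed if FILE holds the PID of a live process.
# Run as root: kill -0 also fails when the caller is not allowed to
# signal the process, which would be reported here as "missing or stale".
pidfile_alive() {
    [ -r "$1" ] || return 1
    pid=$(cat "$1")
    # kill -0 sends no signal; it only tests whether the PID can be signalled.
    [ -n "$pid" ] && kill -0 "$pid" 2>/dev/null
}

for f in /var/run/nginx.pid /var/run/mysqld/mysqld.pid /var/run/php5-fpm.pid; do
    if pidfile_alive "$f"; then
        echo "$f: running"
    else
        echo "$f: missing or stale"
    fi
done
```

If a path is reported as missing while the service is running, the pidfile location in that service's own configuration is the first thing to compare.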

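Then save your changes.

Step 4: Add Actions to Restart Unhealthy LEMP Services

Now that Monit is able to manage the selected services, actions can be added to restart them as needed. For example, Monit can monitor TCP connections; if the server is no longer serving HTTP connections, Monit can restart PHP-FPM or Nginx to resolve the issue automatically.

To build on our existing configuration, we will now edit /etc/monit/conf.d/lemp-services further. The additions are the two if lines at the end of each entry. They tell Monit to restart Nginx and PHP-FPM if the HTTP connection is no longer available, to restart MySQL if its Unix socket stops responding, and to stop retrying a service after 5 restarts within 5 cycles.

**Note:** Be sure to use your Droplet's domain name or IP address wherever you see example.com in the first and third entries.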

check process nginx with pidfile /var/run/nginx.pid
    group www-data
    start program = "/etc/init.d/nginx start"
    stop program = "/etc/init.d/nginx stop"
    if failed host example.com port 80 protocol http then restart
    if 5 restarts within 5 cycles then timeout

check process mysql with pidfile /var/run/mysqld/mysqld.pid
    start program = "/etc/init.d/mysql start"
    stop program = "/etc/init.d/mysql stop"
    if failed unixsocket /var/run/mysqld/mysqld.sock then restart
    if 5 restarts within 5 cycles then timeout

check process php5-fpm with pidfile /var/run/php5-fpm.pid
    start program = "/etc/init.d/php5-fpm start"
    stop program = "/etc/init.d/php5-fpm stop"
    if failed host example.com port 80 protocol http then restart
    if 5 restarts within 5 cycles then timeout

Save your changes and close Nano. Then restart Monit to apply the configuration changes you have made so far.

sudo service monit restart

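Step 5 (Optional): Monitor Logs for Errors and Keywords

Monit can also watch logs for specific keywords and then perform an action or send an alert. This is helpful when a web application is having trouble or when a team needs to be notified of a specific trace or event in the logs.

Below is an example of an Nginx log entry containing a timeout error that Monit can watch for and alert on: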

2014/12/22 11:03:54 [error] 21913#0: *202571 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 2600:3c01::f03c:91ff:fe6e:5a91, server: example.com, request: "GET /wp-admin/admin-ajax.php?action=wordfence_doScan&isFork=1&cronKey=40cb51ccsdfsf322fs35 HTTP/1.0", upstream: "fastcgi://unix:/var/run/example.com.sock", host: "example.com"

Building on our existing configuration, reopen the LEMP services configuration file in Nano.

sudo nano /etc/monit/conf.d/lemp-services

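Add the following entry. It sends a notification whenever a timeout occurs while Nginx is communicating with PHP-FPM. The pattern is left unanchored because every Nginx error log line begins with a timestamp, so a pattern starting with ^ would never match the text "timed out".

check file nginx-error with path /var/log/nginx/error.log
    if match "timed out" then alert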
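
Before relying on a match pattern, it can be tried against a real log line. The sketch below uses `grep -E` as a stand-in, on the assumption that POSIX extended regular expressions behave like Monit's line matching for simple patterns such as these; `try_pattern` is a hypothetical helper, and the sample line is a shortened copy of the log entry shown earlier.

```shell
# Sample line shortened from the Nginx error log entry shown earlier.
line='2014/12/22 11:03:54 [error] 21913#0: *202571 upstream timed out (110: Connection timed out) while reading response header from upstream'

# Hypothetical helper: report whether PATTERN (a POSIX extended regex)
# matches the sample line.
try_pattern() {
    if printf '%s\n' "$line" | grep -Eq -- "$1"; then
        echo "match:    $1"
    else
        echo "no match: $1"
    fi
}

try_pattern 'timed out'
try_pattern '^timed out'
try_pattern 'upstream timed out \(110'
```

The first and third patterns report a match. The second, anchored with ^, does not, because the line begins with a timestamp rather than with the words "timed out".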

Save your changes and close Nano. Then restart Monit for the changes to take effect:

sudo service monit restart

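Step 6 (Optional): Use Monit to Monitor Remote Websites and Other Services

Besides monitoring locally, Monit can watch a variety of external services and connections. In this example, we will use the local Monit instance we have already set up and add some new monitoring configuration for external services.

Ideally, the external checks run from a different location than the service itself: if a web application is hosted in New York, a small external Monit server in San Francisco would be ideal.

Below are examples of external Monit checks that can be implemented on a second host running Monit. They would be placed in the /etc/monit/conf.d/lemp-external file on the external server to check our LEMP stack at remote-example.com remotely.

Create this configuration file with Nano:

sudo nano /etc/monit/conf.d/lemp-external

Monitor ICMP responses and HTTP and HTTPS connections: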

# ICMP check
check host remote-example.com with address remote-example.com
    if failed icmp type echo
        for 5 times within 5 cycles
        then alert

# HTTP check
    if failed
          port 80 protocol http
       for 5 times within 5 cycles
       then alert

# HTTPS check
    if failed
          port 443 type tcpSSL protocol http
       for 5 times within 5 cycles
       then alert

DNS monitor:

check host ns1.example.com with address ns1.example.com
    if failed port 53 type udp protocol dns then alert

SMTP monitor:

check host smtp.example.com with address smtp.example.com
    if failed port 25 type tcp protocol smtp then alert

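Monitor a Web Application's Health Check URL

For web applications, Monit can also perform a specific request against a health check URL. The example below is for a website, remote-example.com, whose health check URL is https://remote-example.com/healthcheck.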

check host remote-example.com with address remote-example.com
    if failed
          port 443 type tcpSSL protocol http
       request "/healthcheck"
       for 5 times within 5 cycles
       then alert
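
The same URL can be requested by hand to see what the health check returns. This is a sketch under stated assumptions: curl is installed, `check_health` is a hypothetical helper, and anything other than HTTP 200 is treated as a failure, which is stricter than Monit's own rules for which responses count as failed.

```shell
# Hypothetical helper: request https://HOST/PATH and succeed only on HTTP 200.
check_health() {
    host=$1
    path=${2:-/healthcheck}
    # -s silent, -o discard the body, -w print only the status code,
    # --max-time gives up after 10 seconds.
    code=$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 "https://$host$path") || true
    if [ "$code" = "200" ]; then
        echo "OK $host$path"
    else
        echo "FAILED $host$path (HTTP ${code:-none})"
        return 1
    fi
}

# Example call (replace the host with your own):
#   check_health remote-example.com /healthcheck
```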

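Step 7: Manage Monit from the Command Line

Monit also provides a command-line utility. From there, simple commands can be used to check Monit's overall status and to carry out useful tasks such as temporarily starting or stopping monitoring.

To run Monit status checks from the command line, the Monit web service must be enabled. To do so, open /etc/monit/monitrc for editing in Nano.

sudo nano /etc/monit/monitrc

Uncomment the following lines to enable the web service locally: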

set httpd port 2812 and
        use address localhost
        allow localhost

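Save your changes and exit Nano. Then restart Monit:

sudo service monit restart

Monit's status can now be checked from the command line.

The following commands temporarily disable and re-enable monitoring:

sudo monit unmonitor all

sudo monit monitor all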

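Step 8: View Reports

Let's look at a report of all the checks we have set up.

sudo monit status

You will now see output for everything you have configured Monit to check, including the local LEMP services and any external checks: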

The Monit daemon 5.6 uptime: 0m

System 'example.com'
  status Running
  monitoring status Monitored
  load average                      [0.00] [0.01] [0.05]
  cpu 0.5%us 0.4%sy 0.0%wa
  memory usage 115132 kB [22.9%]
  swap usage 0 kB [0.0%]
  data collected Mon, 22 Dec 2014 16:50:42

Filesystem 'rootfs'
  status Accessible
  monitoring status Monitored
  permission 755
  uid 0
  gid 0
  filesystem flags 0x1000
  block size 4096 B
  blocks total 5127839 [20030.6 MB]
  blocks free for non superuser 4315564 [16857.7 MB] [84.2%]
  blocks free total 4581803 [17897.7 MB] [89.4%]
  inodes total 1310720
  inodes free 1184340 [90.4%]
  data collected Mon, 22 Dec 2014 16:50:42

Process 'nginx'
  status Running
  monitoring status Monitored
  pid 14373
  parent pid 1
  uptime 28m
  children 4
  memory kilobytes 1364
  memory kilobytes total 9228
  memory percent 0.2%
  memory percent total 1.8%
  cpu percent 0.0%
  cpu percent total 0.0%
  port response time 0.018s to example.com:80 [HTTP via TCP]
  data collected Mon, 22 Dec 2014 16:50:42

Process 'mysql'
  status Running
  monitoring status Monitored
  pid 12882
  parent pid 1
  uptime 32m
  children 0
  memory kilobytes 44464
  memory kilobytes total 44464
  memory percent 8.8%
  memory percent total 8.8%
  cpu percent 0.0%
  cpu percent total 0.0%
  unix socket response time 0.000s to /var/run/mysqld/mysqld.sock [DEFAULT]
  data collected Mon, 22 Dec 2014 16:50:42

Process 'php5-fpm'
  status Running
  monitoring status Monitored
  pid 17033
  parent pid 1
  uptime 0m
  children 2
  memory kilobytes 13836
  memory kilobytes total 22772
  memory percent 2.7%
  memory percent total 4.5%
  cpu percent 0.0%
  cpu percent total 0.0%
  port response time 0.018s to example.com:80 [HTTP via TCP]
  data collected Mon, 22 Dec 2014 16:50:42

File 'nginx-error'
  status Accessible
  monitoring status Monitored
  permission 644
  uid 0
  gid 0
  timestamp Mon, 22 Dec 2014 16:18:21
  size 0 B
  data collected Mon, 22 Dec 2014 16:50:42

Remote Host 'example.com'
  status Online with all services
  monitoring status Monitored
  icmp response time 0.021s [Echo Request]
  port response time 0.107s to example.com:443 [HTTP via TCPSSL]
  port response time 0.062s to example.com:80 [HTTP via TCP]
  data collected Mon, 22 Dec 2014 16:50:42
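
For a quicker overview, the report can be reduced to one line per check. The sketch below assumes the report layout shown above (an unindented header with the name in single quotes, followed by an indented status line); `summarize_status` is a hypothetical helper, and the demo feeds it a shortened sample instead of live output.

```shell
# Hypothetical helper: reduce `monit status` output (read from stdin) to
# one "name: status" line per check.
summarize_status() {
    awk -v q="'" '
        # Header lines are not indented and carry the name in single quotes.
        /^[A-Z]/ && index($0, q) { split($0, parts, q); name = parts[2]; next }
        # The first "status" line inside a section holds its state.
        $1 == "status" && name != "" {
            sub(/^[ \t]*status[ \t]+/, "")
            print name ": " $0
            name = ""
        }
    '
}

# Demo on a shortened sample of the report.
summarize_status <<'EOF'
The Monit daemon 5.6 uptime: 0m

Process 'nginx'
  status Running
  monitoring status Monitored

Process 'mysql'
  status Running
  monitoring status Monitored
EOF
```

On the server, the live report could be piped in with `sudo monit status | summarize_status`.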

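Use this data to review the health of your services and to see useful statistics.

Troubleshooting

If anything goes wrong, first check Monit's log at /var/log/monit.log. It will give you more information about the nature of the problem.

Example error log entries: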

[UTC Dec 22 13:59:54] error    : ICMP echo response for example.com 1/3 timed out -- no response within 5 seconds
[UTC Dec 22 14:10:16] error    : ICMP echo response for example.com 1/3 timed out -- no response within 5 seconds
[UTC Dec 22 15:24:19] error    : 'example.com' failed protocol test [HTTP] at INET[example.com:80] via TCP -- HTTP: Error receiving data -- Resource temporarily unavailable
[UTC Dec 22 15:57:15] error    : ICMP echo response for example.com 1/3 timed out -- no response within 5 seconds
[UTC Dec 22 17:00:57] error    : ICMP echo response for example.com 1/3 timed out -- no response within 5 seconds
[UTC Dec 22 17:49:00] error    : 'example.com' failed, cannot open a connection to INET[example.com:443/API] via TCPSSL
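
When the log grows long, a quick count can show whether the errors are mostly ICMP timeouts or something else. This is a small sketch that assumes the entry format shown above; `count_errors` is a hypothetical helper, and the demo reads three sample entries rather than the real log.

```shell
# Hypothetical helper: count error entries in a Monit log read from stdin.
# Entries look like: [UTC Dec 22 13:59:54] error    : message
count_errors() {
    awk '
        /\] error[ \t]*:/ {
            total++
            if ($0 ~ /ICMP echo response/) icmp++
        }
        END { printf "errors=%d icmp_timeouts=%d\n", total, icmp }
    '
}

# Demo on three entries copied from the example log.
count_errors <<'EOF'
[UTC Dec 22 13:59:54] error    : ICMP echo response for example.com 1/3 timed out -- no response within 5 seconds
[UTC Dec 22 14:10:16] error    : ICMP echo response for example.com 1/3 timed out -- no response within 5 seconds
[UTC Dec 22 15:24:19] error    : 'example.com' failed protocol test [HTTP] at INET[example.com:80] via TCP -- HTTP: Error receiving data -- Resource temporarily unavailable
EOF
```

For the three sample entries this prints `errors=3 icmp_timeouts=2`. On the server, the real log could be read with `count_errors < /var/log/monit.log`, which may need root privileges.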

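Conclusion

After completing this guide, you should have Monit configured to monitor a LEMP stack on Ubuntu 14.04. Monit is quite extensible and can easily be customized or expanded to monitor services on networks large and small.

Below are some additional links for Monit: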