
Smooth (zero-downtime) upgrade of nginx

nginx signals

TERM, INT fast shutdown
QUIT graceful shutdown
HUP changing configuration, keeping up with a changed time zone (only for FreeBSD and Linux), starting new worker processes with a new configuration, graceful shutdown of old worker processes
USR1 re-opening log files
USR2 upgrading an executable file
WINCH graceful shutdown of worker processes
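
These signals are sent to the master process, whose PID nginx writes to its pid file. A minimal sketch of a helper that reads the pid file and sends a signal; the name nginx_signal is made up for illustration, and the default path is an assumption taken from the prefix used later in this post.

```shell
#!/bin/sh
# Hypothetical helper: send a signal to the process whose PID is in a pid file.
# The default pid file path is an assumption based on the prefix used in this post.
nginx_signal() {
    sig=$1
    pidfile=${2:-/usr/local/webserver/nginx/logs/nginx.pid}
    if [ ! -r "$pidfile" ]; then
        echo "pid file not readable: $pidfile" >&2
        return 1
    fi
    pid=$(cat "$pidfile")
    kill -s "$sig" "$pid"
}
```

For example, nginx_signal HUP would reload the configuration and nginx_signal USR1 would re-open the log files.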

Build the new version

Check the configure arguments of the old nginx

# /usr/local/webserver/nginx/sbin/nginx -V
nginx version: nginx/1.12.2
built by gcc 4.8.5 20150623 (Red Hat 4.8.5-28) (GCC)
built with OpenSSL 1.0.2k-fips 26 Jan 2017
TLS SNI support enabled
configure arguments: --prefix=/usr/local/webserver/nginx --user=nobody --group=nobody --with-http_stub_status_module --with-http_ssl_module
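
To reuse exactly the same options for the new build, the "configure arguments" line can be pulled out of that output. A small sketch; configure_args is a hypothetical name, and it simply filters text read on stdin.

```shell
#!/bin/sh
# Hypothetical helper: print only the configure arguments from `nginx -V` output on stdin.
configure_args() {
    sed -n 's/^configure arguments: //p'
}

# Usage (nginx -V writes to stderr, hence the 2>&1):
#   /usr/local/webserver/nginx/sbin/nginx -V 2>&1 | configure_args
```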

Build the new version. Note: do not run make install (see Note 1).

# tar -xzf nginx-1.14.2.tar.gz
# cd nginx-1.14.2
# yum install -y gcc
# ./configure --prefix=/usr/local/webserver/nginx --user=nobody --group=nobody --with-http_stub_status_module --with-http_ssl_module
# make -j "$(grep -c '^processor' /proc/cpuinfo)"

Smooth upgrade (see Note 2)

[Figure: update.png]

Copy the binary

# cp -a /usr/local/webserver/nginx/sbin/nginx /usr/local/webserver/nginx/sbin/nginx.oldbin
# cp -af objs/nginx /usr/local/webserver/nginx/sbin/nginx
cp: overwrite ‘/usr/local/webserver/nginx/sbin/nginx’? y
# ps -ef | grep ngin[x]
root 31862 1 0 11:16 ? 00:00:00 nginx: master process /usr/local/webserver/nginx/sbin/nginx
nobody 31863 31862 0 11:16 ? 00:00:00 nginx: worker process
nobody 31864 31862 0 11:16 ? 00:00:00 nginx: worker process
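
The overwrite prompt above probably comes from an interactive cp alias. As an alternative sketch, the swap can be done with a copy to a temporary name followed by a rename, so the file the running master was started from is never written to. The function name install_binary and the ".new" suffix are assumptions for illustration; the ".oldbin" backup name follows the commands above.

```shell
#!/bin/sh
# Hypothetical helper: back up the current binary, then move the new one into place.
# The rename at the end replaces the directory entry in one step; the running
# master keeps using the old file it was started from.
install_binary() {
    new=$1
    dest=$2
    cp -p "$dest" "$dest.oldbin" || return 1
    cp -p "$new" "$dest.new" || return 1
    mv -f "$dest.new" "$dest"
}

# Usage, run from the build directory:
#   install_binary objs/nginx /usr/local/webserver/nginx/sbin/nginx
```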

Send the USR2 signal to the master process (see Note 3)

# kill -s USR2 31862
# ps -ef | grep ngin[x]
root 31862 1 0 11:16 ? 00:00:00 nginx: master process /usr/local/webserver/nginx/sbin/nginx
nobody 31863 31862 0 11:16 ? 00:00:00 nginx: worker process
nobody 31864 31862 0 11:16 ? 00:00:00 nginx: worker process
root 31961 31862 0 11:17 ? 00:00:00 nginx: master process /usr/local/webserver/nginx/sbin/nginx
nobody 31962 31961 0 11:17 ? 00:00:00 nginx: worker process
nobody 31963 31961 0 11:17 ? 00:00:00 nginx: worker process
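
According to the nginx control documentation, after USR2 the old master renames its pid file with an .oldbin suffix and then starts the new binary. A script can poll for that file instead of sleeping for a fixed time. A minimal sketch; wait_for_file and its default of 10 tries are illustrative choices, not part of nginx.

```shell
#!/bin/sh
# Hypothetical helper: wait until a file exists, checking once per second.
# Returns 0 as soon as the file is found, 1 if it never shows up.
wait_for_file() {
    file=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        if [ -f "$file" ]; then
            return 0
        fi
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

# Usage:
#   wait_for_file /usr/local/webserver/nginx/logs/nginx.pid.oldbin 5
```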

Send the WINCH signal to the old master process

# kill -s WINCH 31862
# ps -ef | grep ngin[x]
root 31862 1 0 11:16 ? 00:00:00 nginx: master process /usr/local/webserver/nginx/sbin/nginx
root 31961 31862 0 11:17 ? 00:00:00 nginx: master process /usr/local/webserver/nginx/sbin/nginx
nobody 31962 31961 0 11:17 ? 00:00:00 nginx: worker process
nobody 31963 31961 0 11:17 ? 00:00:00 nginx: worker process
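
To confirm that the old master's workers have drained, the worker lines can be counted by parent PID. A small sketch that parses ps -ef output on stdin; count_workers is a made-up name, and it assumes the parent PID is the third column, as in the listings above.

```shell
#!/bin/sh
# Hypothetical helper: count nginx worker lines in `ps -ef` output (stdin)
# whose parent PID (third column) equals the given master PID.
count_workers() {
    awk -v ppid="$1" '
        $3 == ppid && /nginx: worker process/ { n++ }
        END { print n + 0 }
    '
}

# Usage:
#   ps -ef | count_workers 31862
```

As long as QUIT has not been sent, the old master is still running. Per the nginx control documentation, the upgrade can still be rolled back at this point by sending HUP to the old master and then QUIT to the new one.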

Shut down the old master process

# kill -s QUIT 31862
# ps -ef | grep ngin[x]
root 31961 1 0 11:17 ? 00:00:00 nginx: master process /usr/local/webserver/nginx/sbin/nginx
nobody 31962 31961 0 11:17 ? 00:00:00 nginx: worker process
nobody 31963 31961 0 11:17 ? 00:00:00 nginx: worker process
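
Besides reading the process list, a script can check the pid files: once the old master has exited, nginx.pid should exist and nginx.pid.oldbin should be gone. This is a sketch under that assumption; upgrade_finished is a hypothetical name and the default path is taken from the prefix used in this post.

```shell
#!/bin/sh
# Hypothetical check: succeed when the pid file exists and no ".oldbin"
# pid file is left next to it.
upgrade_finished() {
    pidfile=${1:-/usr/local/webserver/nginx/logs/nginx.pid}
    [ -f "$pidfile" ] && [ ! -e "$pidfile.oldbin" ]
}

# Usage:
#   upgrade_finished && echo "old master is gone"
```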

Notes

Note 1

make install essentially just copies the built files into the target directories, as shown below:

# make install
make -f objs/Makefile install
make[1]: Entering directory `/usr/local/src/nginx-1.12.2'
test -d '/usr/local/webserver/nginx' || mkdir -p '/usr/local/webserver/nginx'
test -d '/usr/local/webserver/nginx' \
|| mkdir -p '/usr/local/webserver/nginx'
test ! -f '/usr/local/webserver/nginx/sbin' \
|| mv '/usr/local/webserver/nginx/sbin' \
'/usr/local/webserver/nginx/sbin.old'
cp objs/nginx '/usr/local/webserver/nginx/sbin'
...

Note 2

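Instead of upgrading by hand step by step, you can also run make upgrade once the new binary is in place. As the top-level Makefile shows, the upgrade target tests the configuration, sends USR2, waits one second, checks that nginx.pid.oldbin exists, and then sends QUIT to the old master.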

# cat Makefile

default: build

clean:
	rm -rf Makefile objs

build:
	$(MAKE) -f objs/Makefile

install:
	$(MAKE) -f objs/Makefile install

modules:
	$(MAKE) -f objs/Makefile modules

upgrade:
	/usr/local/webserver/nginx/sbin/nginx -t

	kill -USR2 `cat /usr/local/webserver/nginx/logs/nginx.pid`
	sleep 1
	test -f /usr/local/webserver/nginx/logs/nginx.pid.oldbin

	kill -QUIT `cat /usr/local/webserver/nginx/logs/nginx.pid.oldbin`

Note 3

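When upgrading from a much older version to the new one, I ran into a problem: the new process exited after about 3 seconds, with the following in the error log: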

2019/07/25 17:58:44 [emerg] 15540#0: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2019/07/25 17:58:44 [emerg] 15540#0: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2019/07/25 17:58:44 [emerg] 15540#0: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2019/07/25 17:58:44 [emerg] 15540#0: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2019/07/25 17:58:44 [emerg] 15540#0: bind() to 0.0.0.0:80 failed (98: Address already in use)
nginx: [emerg] bind() to 0.0.0.0:80 failed (98: Address already in use)
2019/07/25 17:58:44 [emerg] 15540#0: still could not bind()
nginx: [emerg] still could not bind()
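
The log shows five failed bind() attempts before nginx gives up. As far as I know, nginx retries bind() five times with a 500 ms pause in between, which would match the "about 3 seconds" observed. A small sketch for summarising such failures per address from an error log; bind_failures is a made-up name, and it assumes the timestamped line layout shown above (date, time, level, pid, message).

```shell
#!/bin/sh
# Hypothetical helper: count timestamped "[emerg] ... bind() to ADDR failed"
# lines per address in an nginx error log read on stdin.
bind_failures() {
    awk '
        $3 == "[emerg]" && $5 == "bind()" && $6 == "to" { count[$7]++ }
        END { for (addr in count) print addr, count[addr] }
    '
}

# Usage:
#   bind_failures < /usr/local/webserver/nginx/logs/error.log
```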

I looked up the steps a process goes through to start listening on a port:
[Figure: jianting.jpg]

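Tracing with strace showed that the new master process, after opening its log file descriptors, tries to create a new socket. My guess is that the old and new versions differ too much to share the same socket.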

[pid 16930] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 8
[pid 16930] setsockopt(8, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid 16930] ioctl(8, FIONBIO, [1]) = 0
[pid 16930] bind(8, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EADDRINUSE (Address already in use)
[pid 16930] write(4, "2019/07/25 18:14:43 [emerg] 1693"..., 94) = 94
[pid 16930] write(2, "nginx: [emerg] bind() to 0.0.0.0"..., 72) = 72
[pid 16930] close(8) = 0
[pid 16930] nanosleep({0, 500000000}, NULL) = 0
[pid 16930] socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 8
[pid 16930] setsockopt(8, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
[pid 16930] ioctl(8, FIONBIO, [1]) = 0
[pid 16930] bind(8, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("0.0.0.0")}, 16) = -1 EADDRINUSE (Address already in use)
[pid 16930] write(4, "2019/07/25 18:14:43 [emerg] 1693"..., 94) = 94
[pid 16930] write(2, "nginx: [emerg] bind() to 0.0.0.0"..., 72) = 72
[pid 16930] close(8)
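
One way to dig further, based on my understanding of the nginx source rather than on anything verified in this post: during a binary upgrade the old master passes its listening socket descriptors to the new binary in an environment variable named NGINX, as a semicolon-separated list such as "6;7;". If the new master does not see or accept that variable, it opens fresh sockets and bind() fails as above. The sketch below extracts the variable from a NUL-separated environment dump; inherited_fds is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the NGINX variable from a
# NUL-separated environment dump (the format of /proc/PID/environ) on stdin.
# Prints nothing when the variable is absent.
inherited_fds() {
    tr '\0' '\n' | sed -n 's/^NGINX=//p'
}

# Usage (as root, with the PID of the new master):
#   inherited_fds < /proc/PID/environ
```

An empty result for the new master would suggest that the descriptors were not handed over.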

Reference: http://nginx.org/en/docs/control.html