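The code the Linux kernel uses to detect low memory and then pick and kill a process lives in the kernel source file linux/mm/oom_kill.c. When the system runs out of memory, out_of_memory() is triggered, and it calls select_bad_process() to choose a "bad" process to kill. How does the kernel decide which process is "bad"? It certainly doesn't pick one at random. The choice is made by oom_badness(), and the idea behind it is simple and down-to-earth: the "worst" process is the one using the most memory. More precisely, the baseline score counts the process's resident memory, swap usage and page tables. That score is then shifted by the process's oom_score_adj value, which ranges from -1000 to 1000 and appears as the second-to-last column in the log below. A process with oom_score_adj = -1000 is never chosen.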
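To make the selection concrete, here is a minimal Python model of the scoring as it appears in 3.x kernels, such as the 3.18 kernel in the log below. It is a simplified illustration, not the kernel code. The Task class and its fields are hypothetical stand-ins for the kernel's per-process memory counters, and is_admin stands in for the kernel's CAP_SYS_ADMIN check. Thread groups, memory cgroups, cpusets and tie-breaking are ignored.

```python
from dataclasses import dataclass

OOM_SCORE_ADJ_MIN = -1000  # tasks with this value are never chosen


@dataclass
class Task:
    # Hypothetical stand-in for the kernel's task/mm bookkeeping.
    pid: int
    name: str
    rss_pages: int          # resident pages
    swap_pages: int         # pages swapped out (swap entries)
    pte_pages: int          # page-table pages (nr_ptes)
    oom_score_adj: int      # -1000 .. 1000
    is_admin: bool = False  # stand-in for the CAP_SYS_ADMIN check


def oom_badness(task: Task, totalpages: int) -> int:
    """Simplified model of oom_badness() as found in 3.x kernels."""
    if task.oom_score_adj == OOM_SCORE_ADJ_MIN:
        return 0  # exempt from the OOM killer
    # Baseline: memory attributable to the task, in pages.
    points = task.rss_pages + task.swap_pages + task.pte_pages
    if task.is_admin:
        points -= points * 3 // 100  # 3% discount for privileged tasks
    # Scale oom_score_adj so that 1000 roughly means "all of memory".
    points += task.oom_score_adj * (totalpages // 1000)
    return points if points > 0 else 1  # eligible tasks never score 0


def select_bad_process(tasks, totalpages):
    """Return (task, points) for the highest-scoring task, or (None, 0)."""
    chosen, chosen_points = None, 0
    for task in tasks:
        points = oom_badness(task, totalpages)
        if points > chosen_points:
            chosen, chosen_points = task, points
    return chosen, chosen_points


def reported_score(points: int, totalpages: int) -> int:
    """Normalized score, as printed in 'Kill process ... score N'."""
    return points * 1000 // totalpages


if __name__ == "__main__":
    totalpages = 100_000  # hypothetical machine size in pages
    tasks = [
        Task(1, "big", rss_pages=20_000, swap_pages=0, pte_pages=50, oom_score_adj=0),
        Task(2, "small", rss_pages=500, swap_pages=0, pte_pages=5, oom_score_adj=1000),
        Task(3, "protected", rss_pages=90_000, swap_pages=0, pte_pages=200, oom_score_adj=-1000),
    ]
    victim, points = select_bad_process(tasks, totalpages)
    print(victim.name, points, reported_score(points, totalpages))  # prints: small 100505 1005
```

In the example, "protected" uses the most memory but is exempt. "small" wins because of its oom_score_adj of 1000. In the 3.x sources, the score printed in the "Kill process ... score N" line (and in /proc/<pid>/oom_score) is the normalized value points * 1000 / totalpages. That is why a process with oom_score_adj = 1000 can score slightly above 1000, like the 1003 in the log below.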
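Out-of-memory problems are usually caused by applications requesting a large amount of memory at some moment, which exhausts the system's memory. This typically triggers the Out of Memory (OOM) killer in the Linux kernel. The OOM killer kills a process to free up memory for the system, so that the system does not crash outright. Run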
dmesg -T
and you will get output similar to the following:
[Tue Jan 9 12:04:19 2018] [54445] 1001 54445 72525 2440 124 0 -998 httpd
[Tue Jan 9 12:04:19 2018] [54446] 1001 54446 72531 2419 124 0 -998 httpd
[Tue Jan 9 12:04:19 2018] [54447] 1001 54447 72531 2419 124 0 -998 httpd
[Tue Jan 9 12:04:19 2018] [54448] 1001 54448 73061 2521 126 0 -998 httpd
[Tue Jan 9 12:04:19 2018] [54449] 1001 54449 72531 2419 124 0 -998 httpd
[Tue Jan 9 12:04:19 2018] [54450] 1001 54450 72531 2419 124 0 -998 httpd
[Tue Jan 9 12:04:19 2018] [54453] 1001 54453 72531 2419 124 0 -998 httpd
[Tue Jan 9 12:04:19 2018] [54454] 1001 54454 72531 2515 124 0 -998 httpd
[Tue Jan 9 12:04:19 2018] [54459] 1001 54459 72531 2401 124 0 -998 httpd
[Tue Jan 9 12:04:19 2018] [54460] 1001 54460 72531 2401 124 0 -998 httpd
[Tue Jan 9 12:04:19 2018] [54461] 1001 54461 72531 2404 124 0 -998 httpd
[Tue Jan 9 12:04:19 2018] [54462] 1001 54462 72531 2403 124 0 -998 httpd
[Tue Jan 9 12:04:19 2018] [54463] 1001 54463 72531 2400 124 0 -998 httpd
[Tue Jan 9 12:04:19 2018] [54464] 1001 54464 72531 2399 124 0 -998 httpd
[Tue Jan 9 12:04:19 2018] [54465] 1001 54465 72531 2401 124 0 -998 httpd
[Tue Jan 9 12:04:19 2018] [54481] 0 54481 9141 125 14 0 1000 combadvisor
[Tue Jan 9 12:04:19 2018] [54488] 0 54488 1029 20 8 0 1000 comb.sh
[Tue Jan 9 12:04:19 2018] [54490] 0 54490 4436 80 13 0 1000 bash
[Tue Jan 9 12:04:19 2018] [54491] 0 54491 1050 23 7 0 -1000 sh
[Tue Jan 9 12:04:19 2018] Out of memory: Kill process 53269 (combdeploy) score 1003 or sacrifice child
[Tue Jan 9 12:04:19 2018] Killed process 54488 (comb.sh) total-vm:4116kB, anon-rss:80kB, file-rss:0kB
[Tue Jan 9 12:04:19 2018] sh invoked oom-killer: gfp_mask=0x2000d0, order=2, oom_score_adj=-1000
[Tue Jan 9 12:04:19 2018] sh cpuset=system mems_allowed=0
[Tue Jan 9 12:04:19 2018] CPU: 1 PID: 54491 Comm: sh Not tainted 3.18.20-nce-amd64 #35
[Tue Jan 9 12:04:19 2018] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[Tue Jan 9 12:04:19 2018] 0000000000000000 0000000000000000 ffffffff8142f7ed ffff88002d8d0a50
[Tue Jan 9 12:04:19 2018] ffffffff8142d8be ffffffff81089448 0000000000000000 0000000000000001
[Tue Jan 9 12:04:19 2018] ffff88008ffdcb00 ffff88008ffdcb00 ffffffff8109e22a ffffffff818b2030
[Tue Jan 9 12:04:19 2018] Call Trace:
[Tue Jan 9 12:04:19 2018] [<ffffffff8142f7ed>] ? dump_stack+0x41/0x51
[Tue Jan 9 12:04:19 2018] [<ffffffff8142d8be>] ? dump_header+0x6f/0x1e2
[Tue Jan 9 12:04:19 2018] [<ffffffff81089448>] ? rcu_batches_completed+0x8/0x8
[Tue Jan 9 12:04:19 2018] [<ffffffff8109e22a>] ? smp_call_function_single+0x6d/0x82
[Tue Jan 9 12:04:19 2018] [<ffffffff8143361e>] ? _raw_spin_unlock_irqrestore+0xc/0xd
[Tue Jan 9 12:04:19 2018] [<ffffffff810e4241>] ? oom_kill_process+0x72/0x2f0
[Tue Jan 9 12:04:19 2018] [<ffffffff810e3ffd>] ? find_lock_task_mm+0x1e/0x6b
[Tue Jan 9 12:04:19 2018] [<ffffffff810e4a4c>] ? out_of_memory+0x42f/0x462
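As can be seen, the line
Out of memory: Kill process 53269 (combdeploy) score 1003 or sacrifice child
means that the OOM killer selected process 53269 (combdeploy), with a badness score of 1003, as its victim. The "or sacrifice child" part indicates that the kernel may kill one of that process's children instead. The next line, Killed process 54488 (comb.sh), shows that the process actually killed was 54488 (comb.sh), presumably a child of combdeploy.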
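To pull the chosen victim and the process actually killed out of a longer dmesg capture, a small parser can pair each "Kill process" line with the "Killed process" line that follows it. This sketch assumes the exact message format of the 3.18 kernel shown above. Newer kernels word these messages differently, so the regular expressions would need adjusting. The oom_events name and the dictionary layout are hypothetical.

```python
import re

# "Kill process <pid> (<name>) score <n>": the victim the OOM killer selected
KILL_RE = re.compile(r"Kill process (\d+) \(([^)]*)\) score (\d+)")
# "Killed process <pid> (<name>)": the process that was actually killed
KILLED_RE = re.compile(r"Killed process (\d+) \(([^)]*)\)")


def oom_events(text: str):
    """Pair each selected victim with the process that was actually killed."""
    events = []
    chosen = None
    for line in text.splitlines():
        m = KILL_RE.search(line)
        if m:
            chosen = (int(m.group(1)), m.group(2), int(m.group(3)))
            continue
        m = KILLED_RE.search(line)
        if m and chosen is not None:
            killed = (int(m.group(1)), m.group(2))
            events.append({
                "chosen": chosen,  # (pid, name, score)
                "killed": killed,  # (pid, name)
                # a different pid means a child was sacrificed instead
                "sacrificed_child": killed[0] != chosen[0],
            })
            chosen = None
    return events
```

Applied to the two lines above, this yields one event in which the chosen process is 53269 (combdeploy) and the killed one is 54488 (comb.sh), so sacrificed_child is True.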