Occasional kernel panic on FreeBSD 12.2 with jail and epair+bridge
Server
Published: 2021-02-10

A note on a kernel panic I hit on FreeBSD 12.2 in an environment using jail + epair + bridge (VIMAGE).

Environment

vtnet0 => bridge0 => epair0a => jail(epair0b)

My jail.conf contains various settings. Among them, exec.poststop removes the epair from the bridge and then destroys the epair.

exec.poststop ="/sbin/ifconfig bridge0 deletem epair1a";
exec.poststop+="/sbin/ifconfig epair1a destroy";

Stopping this jail triggered a kernel panic roughly once in every two or three attempts.

Fatal trap 12: page fault while in kernel mode
cpuid = 3; apic id = 03
fault virtual address   = 0x410
fault code              = supervisor read data, page not present
instruction pointer     = 0x20:0xffffffff80b9f237
stack pointer           = 0x28:0xfffffe003369c370
frame pointer           = 0x28:0xfffffe003369c3f0
code segment            = base 0x0, limit 0xfffff, type 0x1b
                        = DPL 0, pres 1, long 1, def32 0, gran 1
processor eflags        = interrupt enabled, resume, IOPL = 0
current process         = 4456 (ifconfig)
trap number             = 12
panic: page fault
cpuid = 3
time = 1612947580
KDB: stack backtrace:
#0 0xffffffff80c0aa85 at kdb_backtrace+0x65
#1 0xffffffff80bbed3b at vpanic+0x17b
#2 0xffffffff80bbebb3 at panic+0x43
#3 0xffffffff8108e911 at trap_fatal+0x391
#4 0xffffffff8108e96f at trap_pfault+0x4f
#5 0xffffffff8108dfb6 at trap+0x286
#6 0xffffffff81066938 at calltrap+0x8
#7 0xffffffff80bb9591 at _rm_rlock_hard+0x3c1
#8 0xffffffff80ce5ce6 at rtinit+0x2a6
#9 0xffffffff80d3873e at in_scrubprefix+0x29e
#10 0xffffffff80d5001d at rip_ctlinput+0x8d
#11 0xffffffff80c4922c at pfctlinput+0x5c
#12 0xffffffff80cbb4fa at if_down+0x12a
#13 0xffffffff80cb90d0 at if_detach_internal+0x150
#14 0xffffffff80cb8df0 at if_detach+0x50
#15 0xffffffff8297ebb1 at epair_clone_destroy+0x81
#16 0xffffffff80cc0c4d at if_clone_destroyif+0xdd
#17 0xffffffff80cc0b12 at if_clone_destroy+0x1a2
Uptime: 3m28s
55              __asm("movq %%gs:%P1,%0" : "=r" (td) : "n" (offsetof(struct pcpu,
(kgdb) #0  doadump () at src/sys/amd64/include/pcpu_aux.h:55
#1  0xffffffff80bbe955 in kern_reboot (howto=260)
    at /usr/src/sys/kern/kern_shutdown.c:451
#2  0xffffffff80bbed93 in vpanic (fmt=<value optimized out>,
    ap=<value optimized out>) at /usr/src/sys/kern/kern_shutdown.c:880
#3  0xffffffff80bbebb3 in panic (fmt=<value optimized out>)
    at /usr/src/sys/kern/kern_shutdown.c:807
#4  0xffffffff8108e911 in trap_fatal (frame=<value optimized out>,
    eva=<value optimized out>) at /usr/src/sys/amd64/amd64/trap.c:921
#5  0xffffffff8108e96f in trap_pfault (frame=0xfffffe003369c2b0,
    usermode=<value optimized out>, signo=<value optimized out>,
    ucode=<value optimized out>) at src/sys/amd64/include/pcpu_aux.h:55
#6  0xffffffff8108dfb6 in trap (frame=0xfffffe003369c2b0)
    at /usr/src/sys/amd64/amd64/trap.c:405
#7  0xffffffff81066938 in calltrap ()
    at /usr/src/sys/amd64/amd64/exception.S:289
#8  0xffffffff80b9f237 in __mtx_lock_sleep (c=0xfffff8008b6b6f38,
    v=<value optimized out>) at /usr/src/sys/kern/kern_mutex.c:579
#9  0xffffffff80bb9591 in _rm_rlock_hard (rm=0xfffff8008b6b6ee0,
    tracker=0xfffffe003369c440, trylock=0)
    at /usr/src/sys/kern/kern_rmlock.c:410
#10 0xffffffff80ce5ce6 in rtinit (ifa=<value optimized out>,
    cmd=<value optimized out>, flags=0) at /usr/src/sys/net/route.c:2030
#11 0xffffffff80d3873e in in_scrubprefix (target=0xfffff8008b7a4600, flags=0)
    at /usr/src/sys/netinet/in.c:897
#12 0xffffffff80d5001d in rip_ctlinput (cmd=<value optimized out>,
    sa=0xfffff8008b7a4698, vip=<value optimized out>)
    at /usr/src/sys/netinet/raw_ip.c:804
#13 0xffffffff80c4922c in pfctlinput (cmd=0, sa=0xfffff8008b7a4698)
    at /usr/src/sys/kern/uipc_domain.c:473
#14 0xffffffff80cbb4fa in if_down (ifp=0xfffff8004f4a7000)
    at /usr/src/sys/net/if.c:2360
#15 0xffffffff80cb90d0 in if_detach_internal (ifp=0xfffff8004f4a7000, vmove=0,
    ifcp=0x0) at /usr/src/sys/net/if.c:1173
#16 0xffffffff80cb8df0 in if_detach (ifp=0xfffff8004f4a7000)
    at /usr/src/sys/net/if.c:1112
#17 0xffffffff8297ebb1 in epair_clone_destroy (ifc=0xfffff80009d7ea80,
    ifp=0xfffff80003205000) at /usr/src/sys/net/if_epair.c:957
#18 0xffffffff80cc0c4d in if_clone_destroyif (ifc=0xfffff80009d7ea80,
    ifp=0xfffff80003205000) at /usr/src/sys/net/if_clone.c:337
#19 0xffffffff80cc0b12 in if_clone_destroy (
    name=0xfffffe003369ca10 "epair105a") at /usr/src/sys/net/if_clone.c:295
#20 0xffffffff80cbda72 in ifioctl (so=0xfffff800097ac368, cmd=2149607801,
    data=0xfffffe003369ca10 "epair105a", td=<value optimized out>)
    at /usr/src/sys/net/if.c:3155
#21 0xffffffff80c28837 in kern_ioctl (td=<value optimized out>,
    fd=<value optimized out>, com=2149607801, data=<value optimized out>)
    at src/sys/sys/file.h:337
#22 0xffffffff80c284da in sys_ioctl (td=0xfffff8006a2b3000,
    uap=0xfffff8006a2b33c0) at /usr/src/sys/kern/sys_generic.c:713
#23 0xffffffff8108f4c7 in amd64_syscall (td=0xfffff8006a2b3000, traced=0)
    at src/sys/amd64/amd64/../../kern/subr_syscall.c:144
#24 0xffffffff8106725e in fast_syscall_common ()
    at /usr/src/sys/amd64/amd64/exception.S:582
#25 0x000000080047199a in ?? ()
Previous frame inner to this frame (corrupt stack?)
Current language:  auto; currently minimal

The backtrace passes through epair_clone_destroy, so the panic appears to occur while the epair is being destroyed.

This bug report looks like a match:
https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=238326

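As a workaround, adding a one-second sleep between removing the epair from the bridge and destroying it, as shown below, stopped the panics.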

exec.poststop ="/sbin/ifconfig bridge0 deletem epair1a";
exec.poststop+="sleep 1";
exec.poststop+="/sbin/ifconfig epair1a destroy";
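
The same three steps (remove from the bridge, wait, destroy) could also be collected into a small helper script so that jail.conf only needs a single exec.poststop line. The following is a minimal sketch, not something from my actual setup: the function name teardown_epair, the IFCONFIG override variable, and the optional delay argument are all hypothetical and introduced only for illustration. It has the same limitation as the workaround above: the delay only makes the race less likely and is not a real fix.

```shell
#!/bin/sh
# Hypothetical helper (names are assumptions, not from the original setup):
# detach an epair from a bridge, wait briefly, then destroy the epair.

# Allow the ifconfig path to be overridden; defaults to the path used above.
: "${IFCONFIG:=/sbin/ifconfig}"

teardown_epair() {
    bridge=$1        # e.g. bridge0
    epair=$2         # e.g. epair1a
    delay=${3:-1}    # seconds to wait; 1 matches the workaround above

    # Step 1: remove the epair from the bridge; stop if this fails.
    "$IFCONFIG" "$bridge" deletem "$epair" || return 1

    # Step 2: wait before destroying the interface.
    sleep "$delay"

    # Step 3: destroy the epair.
    "$IFCONFIG" "$epair" destroy
}

# Run only when called with at least a bridge and an epair name.
if [ "$#" -ge 2 ]; then
    teardown_epair "$@"
fi
```

If this were saved under a path of your choosing, exec.poststop would call it with the bridge and epair names as arguments, for example bridge0 and epair1a. Keeping the delay as an argument makes it easy to lengthen the wait if one second turns out to be too short on a given machine.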