From: Corey Bryant
Subject: Re: [Qemu-devel] [PATCHv2 3/4] Support for "double whitelist" filters
Date: Mon, 05 Nov 2012 09:39:46 -0500
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:16.0) Gecko/20121009 Thunderbird/16.0

On 11/02/2012 06:14 PM, Paul Moore wrote:
> On Friday, November 02, 2012 06:00:29 PM Corey Bryant wrote:
>> On 11/02/2012 05:29 PM, Paul Moore wrote:
>>> On Tuesday, October 23, 2012 03:55:31 AM Eduardo Otubo wrote:
>>>> This patch includes a second whitelist right before the main loop.
>>>> It's a smaller and more restricted whitelist, excluding execve()
>>>> among many others.
>>>>
>>>> v2: * ctx changed to main_loop_ctx
>>>>     * seccomp_on now inside ifdef
>>>>     * open syscall added to the main_loop whitelist
>>>>
>>>> Signed-off-by: Eduardo Otubo <address@hidden>
>>>
>>> Unfortunately qemu.org seems to be down for me today so I can't grab
>>> the latest repo to review/verify this patch (some of my
>>> comments/assumptions below may be off) but I'm a little confused,
>>> hopefully you guys can help me out, read below ...
>>>
>>> The first call to seccomp_install_filter() will setup a whitelist for
>>> the syscalls that have been explicitly specified, all others will hit
>>> the default action TRAP/KILL. The second call to
>>> seccomp_install_filter() will add a second whitelist for another set
>>> of explicitly specified syscalls, all others will hit the default
>>> action TRAP/KILL.
>>
>> That's correct. The goal was to have a 2nd list that is a subset of
>> the 1st list, and also not include execve() in the 2nd list. At this
>> point though, since it's late in the release, we've expanded the 2nd
>> list to be the same as the 1st with the exception of execve() not
>> being in the 2nd list.
>>
>>> The problem occurs when the filters are executed in the kernel when a
>>> syscall is executed. On each syscall the first filter will be
>>> executed and the action will either be ALLOW or TRAP/KILL, next the
>>> second filter will be executed and the action will either be ALLOW or
>>> TRAP/KILL; since the kernel always takes the most restrictive (lowest
>>> integer action value) action when multiple filters are specified, I
>>> think your double whitelist value is going to have some inherent
>>> problems.
>>
>> That's something I hadn't thought of. But TRAP and KILL won't exist
>> together in our whitelists, and our 2nd whitelist is a subset of the
>> 1st. So do you think there would still be problems?
>
> It doesn't really matter if the default action is TRAP and/or KILL, the
> point is that if you use a second whitelist after an initial whitelist
> the effective seccomp filter is going to be only the syscalls you
> explicitly allowed in the second whitelist. When using multiple seccomp
> filters on a process, all filters are executed for each syscall and the
> most restrictive action of all the filters is the action that the
> kernel takes.
>
> Don't get me wrong, I like the idea of progressively restricting QEMU,
> but if you are going to load multiple seccomp filters into the kernel,
> you almost certainly only want the first whitelist filter to be the
> union of all the seccomp filters you intend to load with all subsequent
> filters being blacklists which progressively remove syscalls which are
> allowed by the initial whitelist.

That's what we're doing though. The first whitelist is a union of all subsequent filters. Of course there's only one subsequent filter at this point. But the idea is to start out with a large whitelist for initialization and then tighten it up before the main loop, when presumably fewer syscalls are needed.
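
To make the stacking behaviour discussed here concrete, below is a small Python model, not code from the patch. It assumes the kernel behaviour described in this thread: every installed filter runs on each syscall and the lowest action value wins. The action constants are the values from the Linux seccomp header; the syscall sets and the helper names `whitelist` and `effective_action` are hypothetical and only for illustration.

```python
# Action values as defined in the Linux seccomp ABI (linux/seccomp.h).
SECCOMP_RET_KILL = 0x00000000
SECCOMP_RET_TRAP = 0x00030000
SECCOMP_RET_ALLOW = 0x7FFF0000


def whitelist(allowed, default=SECCOMP_RET_KILL):
    """Return a filter that allows only the named syscalls."""
    allowed = frozenset(allowed)
    return lambda syscall: SECCOMP_RET_ALLOW if syscall in allowed else default


def effective_action(filters, syscall):
    """Run every installed filter and keep the most restrictive result."""
    return min(f(syscall) for f in filters)


# Hypothetical syscall sets, not QEMU's actual lists.
init_list = {"read", "write", "open", "mmap", "execve"}
main_loop_list = init_list - {"execve"}

# The first filter is installed at startup, the second before the main loop.
installed = [whitelist(init_list), whitelist(main_loop_list)]

for name in sorted(init_list):
    allowed = effective_action(installed, name) == SECCOMP_RET_ALLOW
    print(f"{name}: {'ALLOW' if allowed else 'KILL'}")
```

In this model, as long as the second list is a subset of the first, the effective policy after the second install is exactly the second list: execve is killed and everything else in the list is still allowed.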
My concern is getting the two whitelists correct. We keep uncovering new syscalls as we test.
>>> I might suggest an initial, fairly permissive whitelist followed by a
>>> follow-on blacklist if you want to disable certain syscalls.
>>
>> I have to admit I'm nervous about this at this point in QEMU 1.3. It's
>> getting late in the cycle and we'd hoped to get this in earlier. A
>> more permissive whitelist is probably going to be the only way we'll
>> successfully turn -sandbox on by default at this point in QEMU 1.3.
>
> That's fine, I just wanted to point out that I think the multiple
> whitelist approach is going to have some inherent problems.

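
For comparison, here is a rough Python sketch of how the two approaches differ when a syscall is missed during testing. It uses the same "lowest action wins" assumption as the rest of this thread. The syscall names are hypothetical stand-ins ("futex" plays the role of a syscall discovered late), and `whitelist`, `blacklist` and `allowed` are illustrative helpers, not libseccomp calls.

```python
ALLOW, KILL = 0x7FFF0000, 0x00000000  # SECCOMP_RET_ALLOW / SECCOMP_RET_KILL


def whitelist(names):
    """Filter that allows the named syscalls and kills everything else."""
    names = frozenset(names)
    return lambda syscall: ALLOW if syscall in names else KILL


def blacklist(names):
    """Filter that kills the named syscalls and allows everything else."""
    names = frozenset(names)
    return lambda syscall: KILL if syscall in names else ALLOW


def allowed(filters, syscall):
    # The most restrictive (lowest) action across all filters wins.
    return min(f(syscall) for f in filters) == ALLOW


# Hypothetical lists; "futex" was forgotten in the second whitelist.
initial = {"read", "write", "open", "futex", "execve"}

two_whitelists = [whitelist(initial), whitelist({"read", "write", "open"})]
white_then_black = [whitelist(initial), blacklist({"execve"})]

for name in ("futex", "execve"):
    print(name, allowed(two_whitelists, name), allowed(white_then_black, name))
```

With two whitelists, the forgotten futex is killed once the second filter is loaded, even though the first list allows it. With a whitelist followed by a blacklist, only execve is removed, so the second stage cannot break a syscall that nobody thought to list.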
Are you thinking there will be problems with the current two-whitelist approach, or are you thinking there would be problems in the future if we continued restricting the QEMU process with further whitelists? If you mean the latter, then I understand your point since QEMU is a single process that requires a certain subset of syscalls.
I'm thinking once the two whitelists are in place, we can move on to restricting syscall parameters in the existing whitelists where it makes sense, and then look into your original decomposition approach, where parts of qemu are run in separate threads/processes which would allow much tighter seccomp restriction.
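
As a sketch of what restricting syscall parameters could look like, here is a toy Python model where a whitelist entry carries a check on the syscall's arguments. The policy shown (open() only for reading) is a hypothetical example, not something proposed in the patch, and `make_filter` is an illustrative helper. The flag constants are the Linux values for the open(2) access-mode bits.

```python
ALLOW, KILL = 0x7FFF0000, 0x00000000  # SECCOMP_RET_ALLOW / SECCOMP_RET_KILL

# Linux values for the open(2) access-mode bits.
O_ACCMODE, O_RDONLY, O_RDWR = 0o3, 0o0, 0o2


def make_filter(rules):
    """rules maps a syscall name to a predicate over its argument tuple."""
    def run(syscall, args):
        check = rules.get(syscall)
        return ALLOW if check is not None and check(args) else KILL
    return run


# Hypothetical policy: any read(), but open() only with O_RDONLY.
# A real seccomp filter sees raw argument values only and cannot follow
# pointers, so the path string below is there purely for readability.
main_loop_filter = make_filter({
    "read": lambda args: True,
    "open": lambda args: (args[1] & O_ACCMODE) == O_RDONLY,
})

print(main_loop_filter("open", ("/dev/null", O_RDONLY)) == ALLOW)
print(main_loop_filter("open", ("/dev/null", O_RDWR)) == ALLOW)
```

The point of the sketch is that a syscall can stay in the whitelist while some of its uses are rejected, which tightens the filter without removing the syscall outright.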
What do you think?

-- 
Regards,
Corey Bryant