1 year ago

#281934

Wowfunhappy

Prior to macOS Sierra, why didn't XNU handle THREAD_RESTART in its kqueue_scan_continue function?

I'm trying to find the cause of a nasty kernel panic triggered by Chromium Legacy, a project to backport modern versions of Chromium to old versions of macOS (10.7 – 10.10). The kernel panic occurs when the kqueue_scan_continue function is called with the wait_result parameter set to THREAD_RESTART.

In XNU 2422 (OS X 10.9.5), kqueue_scan_continue looks like this:

static void
kqueue_scan_continue(void *data, wait_result_t wait_result)
{
    thread_t self = current_thread();
    uthread_t ut = (uthread_t)get_bsdthread_info(self);
    struct _kqueue_scan * cont_args = &ut->uu_kevent.ss_kqueue_scan;
    struct kqueue *kq = (struct kqueue *)data;
    int error;
    int count;

    /* convert the (previous) wait_result to a proper error */
    switch (wait_result) {
    case THREAD_AWAKENED:
        kqlock(kq);
        error = kqueue_process(kq, cont_args->call, cont_args, &count,
            current_proc());
        if (error == 0 && count == 0) {
            wait_queue_assert_wait((wait_queue_t)kq->kq_wqs,
                KQ_EVENT, THREAD_ABORTSAFE, cont_args->deadline);
            kq->kq_state |= KQ_SLEEP;
            kqunlock(kq);
            thread_block_parameter(kqueue_scan_continue, kq);
            /* NOTREACHED */
        }
        kqunlock(kq);
        break;
    case THREAD_TIMED_OUT:
        error = EWOULDBLOCK;
        break;
    case THREAD_INTERRUPTED:
        error = EINTR;
        break;
    default:
        panic("%s: - invalid wait_result (%d)", __func__,
            wait_result);
        error = 0;
    }

    /* call the continuation with the results */
    assert(cont_args->cont != NULL);
    (cont_args->cont)(kq, cont_args->data, error);
}

It's easy to see why this leads to a kernel panic. The switch statement expects wait_result to be one of THREAD_AWAKENED, THREAD_TIMED_OUT, or THREAD_INTERRUPTED. Any other value, such as THREAD_RESTART, falls through to the default case, which calls panic().

In macOS Sierra, Apple added an additional case to this switch statement to handle THREAD_RESTART:

    case THREAD_RESTART:
        error = EBADF;
        break;

When I add this code to older kernels and recompile XNU, they no longer panic while running Chromium Legacy.
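
To make the patched behavior concrete, here is a minimal user-space sketch of the wait_result-to-error mapping once the Sierra case is included. It is not kernel code: wait_result_to_errno is a hypothetical helper name I made up, the enum values are what I believe osfmk/kern/kern_types.h defines (worth double-checking), and the sketch returns -1 where the real function panics.

```c
#include <errno.h>

/* Assumed to mirror XNU's wait_result_t constants (kern_types.h);
 * these values are an assumption, verify them against the source. */
enum {
    THREAD_AWAKENED    = 0,
    THREAD_TIMED_OUT   = 1,
    THREAD_INTERRUPTED = 2,
    THREAD_RESTART     = 3
};

/* Hypothetical helper, not an XNU function: models only the error
 * selection done by the switch in kqueue_scan_continue. */
static int
wait_result_to_errno(int wait_result)
{
    switch (wait_result) {
    case THREAD_AWAKENED:
        /* The real code calls kqueue_process() here and may block again. */
        return 0;
    case THREAD_TIMED_OUT:
        return EWOULDBLOCK;
    case THREAD_INTERRUPTED:
        return EINTR;
    case THREAD_RESTART:
        /* The case added in macOS Sierra. */
        return EBADF;
    default:
        /* The kernel panics here; the sketch just signals "invalid". */
        return -1;
    }
}
```

With the extra case, THREAD_RESTART is reported to the continuation as EBADF instead of reaching the default branch.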

My question is, why did it take Apple until macOS Sierra to handle THREAD_RESTART in this function? THREAD_RESTART is a valid value for wait_result_t, and is returned by various internal kernel functions.

The most obvious explanation is "Apple made a mistake", and that may be all it is! However, it feels like too obvious a mistake to go unnoticed for years in highly sensitive kernel code.

Does this look like a simple human error, or is there a reason Apple may have thought that handling THREAD_RESTART was unnecessary? For example, is calling kqueue_scan_continue with THREAD_RESTART supposed to be impossible?


Just for reference, here's the Chromium Legacy GitHub issue where some smart people helped me figure out a lot of the information in this question.

macos

xnu

kqueue

0 Answers
