memcpy fails when trying to copy from mm_struct->start_code

I'm seeing some very odd behavior with the code segment described by task->mm, and I hope someone out there can help me out. I copy the code segment (start_code to end_code) into a buffer and then generate an HMAC from it. Occasionally I get an Oops saying that the kernel is unable to handle a paging request, with the call stack ending in memcpy(). It looks as though the data goes away in the middle of the copy, which causes a page fault and then the Oops. I've searched far and wide for any reference to this seemingly ephemeral nature of the memory described by mm_struct, but have found nothing. I don't believe I'm doing anything controversial in the code; here it is, with comments etc. removed for brevity.

struct mm_struct* __mm;
...

__mm = get_task_mm(__task);
if(likely(__mm))
{
    __buflen = (__mm->end_code - __mm->start_code);
    if(likely(__buflen > 0))
    {
        __buf = (unsigned char*)__get_buffer(__buflen);
        if(likely(__buf))
        {          
            preempt_disable();
            memcpy(__buf, (uint8_t*)__mm->start_code, __buflen);
            preempt_enable();

            mmput(__mm);

            if(unlikely(!__do_ntru_hmac(__buf, __buflen, __hmac)))
            {
                __retcode = 0;
            }

            __release_buffer(__buf, __buflen);
        }
        else
        {
            printk(KERN_ERR "[%s] Buffer allocation failure [%d]\n", __task->comm, __buflen);
            __retcode = 0;
        }

        ...

The memory allocation routines are simple and are meant to let me allocate large blocks of memory in one call. They look like this:

void* __get_buffer(unsigned long __buflen)
{
    if(likely(__buflen <= KMALLOC_MAX_SIZE))
    {
        return kmalloc(__buflen, GFP_KERNEL);
    }
    else
    {
        return (void*)__get_free_pages(GFP_KERNEL, get_order(__buflen));
    }

    return NULL;
}

void __release_buffer(void* __buffer, unsigned long __buflen)
{
    if(likely(__buflen <= KMALLOC_MAX_SIZE))
    {
        kfree(__buffer);
    }
    else
    {
        free_pages((unsigned long)__buffer, get_order(__buflen));
    }

    return;
}

The error seems to occur randomly, and I can't tie it to a task, its parent, or any other component of struct task_struct. I've tried mutexes and spinlocks to protect the memory during the memcpy(), and I've tried stopping the task altogether with set_task_state() and restarting it after the copy, but nothing stops the problem.

UPDATE: I'm still hammering away at this problem and thought I'd toss in some more data. Here's the Oops dump.

Mar 16 09:39:27 ubuntu kernel: [  324.229195] BUG: unable to handle kernel paging request at 0804b000
Mar 16 09:39:27 ubuntu kernel: [  324.229199] IP: [<c1312dfd>] memcpy+0x1d/0x40
Mar 16 09:39:27 ubuntu kernel: [  324.229221] *pdpt = 000000002cf4c001 *pde = 000000003b72c067
Mar 16 09:39:27 ubuntu kernel: [  324.229223] Oops: 0000 [#1] SMP
Mar 16 09:39:27 ubuntu kernel: [  324.229225] Modules linked in: aerolock(OF) vmhgfs(OF) vmw_balloon psmouse snd_ens1371 serio_raw gameport snd_ac97_codec ac97_bus snd_pcm snd_seq_midi btusb snd_rawmidi snd_seq_midi_event snd_seq snd_timer snd_seq_device vmwgfx snd ttm drm bnep rfcomm soundcore mac_hid bluetooth snd_page_alloc vmw_vmci i2c_piix4 parport_pc ppdev shpchp lp parport hid_generic usbhid hid pcnet32 mptspi ahci libahci mptscsih mptbase floppy mii vmw_pvscsi vmxnet3
Mar 16 09:39:27 ubuntu kernel: [  324.229256] CPU: 0 PID: 2880 Comm: aerolockd Tainted: GF          O 3.11.0-17-generic #31~precise1-Ubuntu
Mar 16 09:39:27 ubuntu kernel: [  324.229258] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 07/31/2013
Mar 16 09:39:27 ubuntu kernel: [  324.229259] task: f1816700 ti: ed774000 task.ti: ed774000
Mar 16 09:39:27 ubuntu kernel: [  324.229262] EIP: 0060:[<c1312dfd>] EFLAGS: 00010202 CPU: 0
Mar 16 09:39:27 ubuntu kernel: [  324.229264] EIP is at memcpy+0x1d/0x40
Mar 16 09:39:27 ubuntu kernel: [  324.229266] EAX: ecc80000 EBX: 00011cd0 ECX: 00003b34 EDX: 08048000
Mar 16 09:39:27 ubuntu kernel: [  324.229268] ESI: 0804b000 EDI: ecc83000 EBP: ed775e74 ESP: ed775e68
Mar 16 09:39:27 ubuntu kernel: [  324.229269]  DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
Mar 16 09:39:27 ubuntu kernel: [  324.229271] CR0: 80050033 CR2: 0804b000 CR3: 2d9d5000 CR4: 001407f0
Mar 16 09:39:27 ubuntu kernel: [  324.229345] Stack:
Mar 16 09:39:27 ubuntu kernel: [  324.229347]  00011cd0 f1816700 f33703b4 ed775eb0 f9ba3a1b 0000063b 00000000 00000000
Mar 16 09:39:27 ubuntu kernel: [  324.229353]  0000063c c1b80e4c f1816700 ed775ee0 08048000 ecc80000 00000000 f1816700
Mar 16 09:39:27 ubuntu kernel: [  324.229358]  f9baa952 f1816700 ed775f08 f9ba3b6e 00000000 00000000 00000000 c1b9d642
Mar 16 09:39:27 ubuntu kernel: [  324.229364] Call Trace:
Mar 16 09:39:27 ubuntu kernel: [  324.229370]  [<f9ba3a1b>] __generate_hmac+0x8b/0x190 [aerolock]
Mar 16 09:39:27 ubuntu kernel: [  324.229373]  [<f9ba3b6e>] __validate_hmac+0x4e/0x220 [aerolock]
Mar 16 09:39:27 ubuntu kernel: [  324.229377]  [<f9ba3da0>] ret_do_fork+0x60/0x70 [aerolock]
Mar 16 09:39:27 ubuntu kernel: [  324.229384]  [<c167f12a>] trampoline_handler+0x11a/0x1c0
Mar 16 09:39:27 ubuntu kernel: [  324.229390]  [<c10839a4>] ? wake_up_new_task+0xe4/0x150
Mar 16 09:39:27 ubuntu kernel: [  324.229394]  [<c1054bf5>] ? SyS_clone+0x25/0x30
Mar 16 09:39:27 ubuntu kernel: [  324.229397]  [<c1054bf5>] ? SyS_clone+0x25/0x30
Mar 16 09:39:27 ubuntu kernel: [  324.229400]  [<c167efee>] kretprobe_trampoline+0x16/0x38
Mar 16 09:39:27 ubuntu kernel: [  324.229404]  [<c167efd8>] ? kretprobe_trampoline_holder+0x8/0x8
Mar 16 09:39:27 ubuntu kernel: [  324.229406]  [<c167c937>] syscall_call+0x7/0xb
Mar 16 09:39:27 ubuntu kernel: [  324.229408] Code: c3 90 8d 74 26 00 e8 33 fe ff ff eb e8 90 55 89 e5 83 ec 0c 89 5d f4 89 75 f8 89 7d fc 3e 8d 74 26 00 89 cb 89 c7 c1 e9 02 89 d6 <f3> a5 89 d9 83 e1 03 74 02 f3 a4 8b 5d f4 8b 75 f8 8b 7d fc 89
Mar 16 09:39:27 ubuntu kernel: [  324.229439] EIP: [<c1312dfd>] memcpy+0x1d/0x40 SS:ESP 0068:ed775e68
Mar 16 09:39:27 ubuntu kernel: [  324.229444] CR2: 000000000804b000
Mar 16 09:39:27 ubuntu kernel: [  324.229447] ---[ end trace 3c014cb0223fa59a ]--- 

I've tried a lot of different tacks, but all have failed. copy_from_user(), for example, fails every time; sometimes it doesn't read the whole request, and sometimes it returns only part of it. Whenever it returns a partial copy, the copy stops on a page boundary, again making it seem like the memory is being taken away mid-copy.

Given that I'm hooking do_fork(), could the process just be transitioning from kernel space to user space while I'm trying to capture it? As I mentioned before, I've tried stopping the current task and restarting it after the copy, but it has no effect.

It's also worth noting that the same memcpy() code has run for six weeks straight under varying loads, without a single failure, on a single-processor ARM BeagleBone Black running Ubuntu 12.04 (3.8.13-bone28). The problem only seems to happen on my x86 box running Ubuntu 12.04, and then only when I put a heavy load on it, such as starting Chromium.

Sorry to be so long-winded here; I'm stumped.

Any ideas?

Thanks again in advance,

Pete

Answers


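You can't reliably copy userspace memory with a plain memcpy() from kernel code, as you've discovered.

Use copy_from_user() instead of memcpy(). And don't disable preemption; there's no point, and copy_from_user() may sleep while it faults pages in, which isn't allowed with preemption disabled.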

