Ip reassembly at intermediate node
I have the following requirment,
I have a Linux PC connected directly to an embedded board.
The Linux PC receives IP traffic from the Internet - it needs to forward this to the embedded board. However the embedded board does not have ability to reassemble IP fragments. Currently what we do is receive the reassembled packet in the linux pc and then sendto() to the emmbeded board. However given the high load of traffic this consumes too much CPU cycles in the Linux PC - since this invovles a copy from kernel space to user space and again the same packet is copied from user space to kernel space.
Is there a way for the kernel to reassemble the fragements and IP forward it to the embedded board without the packet having to come to user space? Note: I have the flexibility to make the destination IP of the IP packets as either the Linux PC or the embedded board.
Broadly speaking, no this is not built into the kernel, particularly if your reassembled packet exceeds the MTU size and therefore cannot be transmitted to your embedded board. If you wanted to do it, I'd suggest routing via a tun device and reassembling in user space, or (if you are just using tcp) using any old tcp proxy. If written efficiently it's hard to see why a linux PC would not be able to keep up with this if the embedded board can manage to process the output. If you insist on using the kernel, I think there is a tcp splice technique (see kernel-based (Linux) data relay between two TCP sockets) though whether that works at a segment level and thus does not reassemble, I don't know.
However, do you really need it? See: http://en.wikipedia.org/wiki/Path_MTU_Discovery
Here tcp sessions are sent with the DF bit set precisely so no fragmentation occurs. This means that most such tcp sessions won't actually need to support fragmentation.
Based on the title of the question, it appears you need to perform reassembly on the intermediate node (linux device). This doesn't mean you have to do it in the kernel space.
Take a look at DPDK. It is an opensource dataplane development kit. It may sound complicated, but all it does is use Poll Mode Drivers to get packets up to the user space with out the copying and interrupt overhead.
Please not, it uses poll mode drivers and will take up CPU cycles. You can use dpdk on a x86_64 hardware if you are ready to give up a couple of cores assuming you also want to fragment the packets in the reverse path.
Take a look at the sample application guide of DPDK for packet fragmentation and reassembly.