Hadoop: passing variables to mapper and reducer

I'm a complete beginner with Hadoop. I've built Word Count, and I'm fairly sure I understand the basics, but I'm having trouble extending that to an actual problem. My (modified) code is below:

// Score every ordered pair of items in the set (a Cartesian product
// of the set with itself), then hand each result to the renderer.
for (Item i : set) {
    for (Item j : set) {
        Score s = score(i, j);
        renderer.render(s);
    }
}

I'd like to use Hadoop to distribute this. I can write a Mapper and a Reducer, but I don't know how to pass the set to the Mapper and the renderer to the Reducer (or if that's even the idiomatic way to handle this). I also feel like I need to write my own Writable to handle passing the pair (i, j) between the Mapper and Reducer, but I don't know the best way to do this. Any help would be appreciated.
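As a starting point for the pair Writable, here is a minimal sketch. It assumes each Item can be identified by a string id; since the Item class isn't shown in the question, ItemPairWritable and its fields are hypothetical names:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.WritableComparable;

// Hypothetical composite key for the pair (i, j); items are referenced
// by a string id because the question doesn't show Item's fields.
public class ItemPairWritable implements WritableComparable<ItemPairWritable> {
    private final Text left = new Text();
    private final Text right = new Text();

    public ItemPairWritable() {}            // Hadoop requires a no-arg constructor

    public void set(String leftId, String rightId) {
        left.set(leftId);
        right.set(rightId);
    }

    @Override
    public void write(DataOutput out) throws IOException {     // serialize
        left.write(out);
        right.write(out);
    }

    @Override
    public void readFields(DataInput in) throws IOException {  // deserialize
        left.readFields(in);
        right.readFields(in);
    }

    @Override
    public int compareTo(ItemPairWritable o) {  // needed so keys can be sorted
        int cmp = left.compareTo(o.left);
        return cmp != 0 ? cmp : right.compareTo(o.right);
    }

    @Override
    public int hashCode() {                     // used by the default partitioner
        return 163 * left.hashCode() + right.hashCode();
    }

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof ItemPairWritable)) return false;
        ItemPairWritable p = (ItemPairWritable) obj;
        return left.equals(p.left) && right.equals(p.right);
    }
}

As for the renderer: Mapper and Reducer instances are created on the cluster nodes, so you can't hand them a live object from the driver. The usual pattern is to pass small settings through the job Configuration (conf.set("key", value) in the driver, then context.getConfiguration().get("key") in setup()) and construct the renderer inside the Reducer.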

Answers


What you are doing is essentially a Cartesian product of the set with itself. You would probably need to implement a custom input format.

Here is an example of a generic Cartesian Product job: https://github.com/adamjshook/mapreducepatterns/blob/master/MRDP/src/main/java/mrdp/ch5/CartesianProduct.java

You can see the same logic you wrote above applied to the input paths at lines 67-77: https://github.com/adamjshook/mapreducepatterns/blob/master/MRDP/src/main/java/mrdp/ch5/CartesianProduct.java#L67-L77
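For orientation, the driver setup in that example looks roughly like the sketch below. The class and method names (CartesianInputFormat, setLeftInputInfo, setRightInputInfo, CartesianMapper) come from the linked repo, which uses the old org.apache.hadoop.mapred API, so treat this as an approximation of those lines rather than a drop-in driver. The self-join case simply points both sides at the same input path:

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;

// Approximate shape of the job setup in the linked CartesianProduct
// example; inputPath and outputPath are placeholders.
JobConf conf = new JobConf("Cartesian Product");
conf.setJarByClass(CartesianProduct.class);

// The custom input format pairs every record of the "left" input with
// every record of the "right" input.
conf.setInputFormat(CartesianInputFormat.class);

// Self-join: both sides read the same dataset, which corresponds to
// your nested loop over set x set.
CartesianInputFormat.setLeftInputInfo(conf, TextInputFormat.class, inputPath);
CartesianInputFormat.setRightInputInfo(conf, TextInputFormat.class, inputPath);

conf.setMapperClass(CartesianMapper.class);  // your score(i, j) would go here
conf.setNumReduceTasks(0);                   // the example is map-only
FileOutputFormat.setOutputPath(conf, new Path(outputPath));

JobClient.runJob(conf);

With that structure, each mapper receives one (left record, right record) pair, so the scoring happens in the mapper and you may not need a reducer at all unless the rendering step has to aggregate results.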

