I am looking at writing an Accumulo iterator to return a random sample of a percentile of a table

I am looking at writing an Accumulo iterator that returns a random sample covering a given percentage of a table.

I would appreciate any suggestions.

Thanks,

Chris

Answers


You can extend org.apache.accumulo.core.iterators.Filter and randomly accept x% of the entries. The following iterator would return roughly 5 percent of the entries; each entry is accepted independently, so the exact fraction varies from scan to scan.

import java.util.Random;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.Filter;

public class RandomAcceptFilter extends Filter {
    private Random rand = new Random();

    @Override
    public boolean accept(Key k, Value v) {
        return rand.nextDouble() < .05;
    }
}

Extending Ben Tse's answer slightly to make the fraction of entries selected configurable:

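import java.io.IOException;
import java.util.Map;
import java.util.Random;

import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.iterators.Filter;
import org.apache.accumulo.core.iterators.IteratorEnvironment;
import org.apache.accumulo.core.iterators.SortedKeyValueIterator;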

public class RandomAcceptFilter extends Filter {
    private Random rand = new Random();
    private double percentToAllow;
    public static final String RATIO = "ratio";
    public static final String DEFAULT = "0.05";        

    @Override
    public void init(SortedKeyValueIterator<Key, Value> source, Map<String, String> options, IteratorEnvironment env) throws IOException {
        super.init(source, options, env);
        String option = options.containsKey(RATIO) ? options.get(RATIO) : DEFAULT;
        this.percentToAllow = Double.parseDouble(option);
    }

    @Override
    public boolean accept(Key k, Value v) {
        return rand.nextDouble() < this.percentToAllow;
    }
}

Then, when you attach the iterator to a scanner in your code, you would do:

IteratorSetting itr = new IteratorSetting(15, "myIterator", RandomAcceptFilter.class);
itr.addOption(RandomAcceptFilter.RATIO, "0.20");
myScanner.addScanIterator(itr);

Obviously you still need to add bounds checking (the ratio should lie between 0 and 1), handle values that do not parse as numbers, etc., but you get the idea.
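
The bounds checking mentioned above could look something like the following. This is only a minimal sketch: RatioOption and DEFAULT_RATIO are hypothetical names introduced here for illustration, not part of Accumulo, and the helper deliberately has no Accumulo dependencies so it can be tried on its own. It assumes the same "ratio" option key and 0.05 default as the filter above.

```java
import java.util.Map;

// Hypothetical helper (not part of Accumulo): parses and validates the "ratio" option.
final class RatioOption {
    static final String RATIO = "ratio";
    static final double DEFAULT_RATIO = 0.05;

    private RatioOption() {
    }

    // Returns the configured ratio, or the default when the option is absent.
    // Throws IllegalArgumentException for non-numeric or out-of-range values.
    static double parse(Map<String, String> options) {
        String raw = (options == null) ? null : options.get(RATIO);
        if (raw == null) {
            return DEFAULT_RATIO;
        }
        final double ratio;
        try {
            ratio = Double.parseDouble(raw.trim());
        } catch (NumberFormatException e) {
            throw new IllegalArgumentException("ratio is not a number: " + raw, e);
        }
        // The negated comparison also rejects NaN, which fails every range check.
        if (!(ratio >= 0.0 && ratio <= 1.0)) {
            throw new IllegalArgumentException("ratio must be between 0 and 1: " + raw);
        }
        return ratio;
    }
}
```

Inside init() you would then presumably replace the Double.parseDouble call with something like this.percentToAllow = RatioOption.parse(options); so that a bad option fails loudly when the iterator is set up rather than silently returning everything or nothing.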

