Using SQL/Python to build probability distribution

Suppose the following jobs table:

 - name
 - position
 - is_featured (boolean)

I would like to build a list of suggested jobs for a given user, where jobpost.position matches the user's position (for example, an accountant would receive jobs in accounting).

The basic query to accomplish this would be something like:

SELECT name FROM jobpost WHERE jobpost.position IN (list of user positions) LIMIT 10

I also want to make sure that jobs that are featured (is_featured=True) receive extra weight. Then I need to build a probility distribution list from which a random number of jobs would be selected. For this I was thinking of building a python list of tuples, with the job name and probability, and then using random.random(). For example, something like (in pseudocode):

x = [('job 1', 0.2), ('job 2', 0.2), ('job 3', 0.2),  ('job 4', 0.4)]
# pick three out of the list of jobs above
random.random.sample(x,  3)

I have three questions related to this:

  1. Does this seem like the right approach?

  2. How would I use the random module (or another one) to select n number of objects with each object having a certain given probability?

  3. In terms of giving a weighted average to a featured job over a non-featured job, would the following query be the correct approach? If not, what would be a better way?

    SELECT name, 1 * (CASE WHEN is_featured=True THEN % ELSE 1) as weighted_average FROM ...

This would give me tuple with the job name and the relative weight.


I'd recommend using SQLAlchemy and make use of Python's syntax, not SQL's. But to solve your problem, here's what I'd try:

import random

x = [('job 1', 0.2), ('job 2', 0.2), ('job 3', 0.2), ('job 4', 0.2), ('job 5', 0.4)]

def random_choice(jobs_list, number=1):
  jobs = jobs_list[:]
  choice = random.uniform(0, 1)
  total = 0

  while total < number:
    index, job_item = random.choice(list(enumerate(jobs)))
    job, weight = job_item

    if weight > random.uniform(0, 1):
      yield job
      total += 1

print list(random_choice(x, 3))

Just as a side note, in order for your "weights" to follow a discrete PDF, they need to add up exactly to 1.0.

Truthfully, I don't think this "weighting" is a proper solution to the problem. You don't have a proper PDF to create the discrete "weight" for a given job. Instead, why not just have separate categories for jobs?

Need Your Help

Different Debug Keystores in Eclipse on a per project base?

android eclipse debugging keystore

I do need to have different debug keystores for several Android projects. How can I do that with plain Eclipse (no additional tools)?

Zf2 multicheckbox at least one element is always required

zend-framework doctrine-orm zend-framework2 zend-form

I can't validate a zf2 form with multicheckbox because at least one checkbox is always required.