How to analyze Twitter messages? (improving my algorithm)

I had a nice idea to implement. I call it


The idea goes like this: imagine you are driving or traveling anywhere in the world and you see an obstacle or some damage, such as a broken light, trash covering the whole street, or any other problem you would like the responsible authority to fix.

All you have to do is tweet about it. You can add a photo and, of course, your location, using the built-in location services of the Twitter or Facebook applications.

Tweets like these:

@FixTheUnFixed there is a broken fire hydrant here
@FixTheUnFixed my cellular company charged me $18,572 for using my iPhone abroad.

I have thought a lot about how to process the messages. Most of the issues that come up will be municipal concerns, so I would like to get the location and re-tweet to the relevant municipality or send them an email.

One idea for finding this address is to search for it with Google (using the Google API).
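If "the Google API" here means reverse geocoding, a minimal sketch could look like the following. It assumes the Google Geocoding API's JSON endpoint with `latlng` and `key` parameters and its `address_components`/`types` response fields; `reverse_geocode_url` and `extract_municipality` are hypothetical helper names. The actual HTTP call (e.g. `urllib.request.urlopen` plus `json.load`) is left out so the parsing can be tried offline.

```python
from urllib.parse import urlencode

# Assumed endpoint: Google Geocoding API, JSON output.
GEOCODE_URL = "https://maps.googleapis.com/maps/api/geocode/json"

def reverse_geocode_url(lat, lon, api_key):
    """Build a reverse-geocoding request URL for a tweet's coordinates."""
    return GEOCODE_URL + "?" + urlencode({"latlng": f"{lat},{lon}", "key": api_key})

def extract_municipality(response):
    """Pick the city (locality) and county from a parsed Geocoding JSON response."""
    if response.get("status") != "OK":
        return None
    found = {}
    for result in response.get("results", []):
        for comp in result.get("address_components", []):
            # Map Google's component types to the fields we care about.
            for wanted, key in (("locality", "city"),
                                ("administrative_area_level_2", "county")):
                if wanted in comp.get("types", []) and key not in found:
                    found[key] = comp.get("long_name")
    return found or None
```

The city/county names it returns are only a starting point; you would still need a separate lookup from municipality to contact address.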

The pseudo-algorithm is:

1. Get the location the Twitter or Facebook status was sent from.
2. Look for keywords such as trash, cats, animals, etc.
3. Find the relevant authority's e-mail address, Twitter account, or Facebook account.
4. Send the message to the authority's account and re-tweet it to the public
     so people can follow whether anything changes.
  • For step 3 of the algorithm, is there a smart way to implement it?
  • I don't want to spam the authorities, and I don't want to publish spam from sneaky people either.
  • How can I improve the algorithm above?
  • How can I search for the contact details of the relevant authorities?
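Steps 1 and 2, plus a crude guard against spammy users, could be sketched like this. The keyword table, the per-user rate limit, and every name below are illustrative assumptions rather than a tested design; steps 3 and 4 (looking up and contacting the authority) are deliberately left out.

```python
import re
from collections import defaultdict

# Hypothetical keyword table for step 2; categories and words are illustrative only.
CATEGORY_KEYWORDS = {
    "sanitation": {"trash", "garbage", "litter"},
    "animals": {"cat", "cats", "animal", "animals", "dog"},
    "water": {"hydrant", "leak", "flood"},
    "lighting": {"light", "streetlight", "lamp"},
}

def categorize(text):
    """Step 2: return the categories whose keywords appear in the tweet text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted(cat for cat, kws in CATEGORY_KEYWORDS.items() if words & kws)

class RateLimiter:
    """Crude anti-spam guard: at most `limit` reports per user per `window` seconds."""
    def __init__(self, limit=3, window=3600):
        self.limit = limit
        self.window = window
        self.history = defaultdict(list)  # user -> timestamps of accepted reports

    def allow(self, user, now):
        recent = [t for t in self.history[user] if now - t < self.window]
        self.history[user] = recent
        if len(recent) >= self.limit:
            return False
        recent.append(now)
        return True

def process_tweet(user, text, lat, lon, now, limiter):
    """Steps 1-2 plus a spam check; returns a routable report or None."""
    if lat is None or lon is None:
        return None  # step 1 failed: no location attached
    if not limiter.allow(user, now):
        return None  # too many reports from this user recently
    categories = categorize(text)
    if not categories:
        return None  # nothing we know how to route
    return {"user": user, "location": (lat, lon), "categories": categories}
```

A keyword table is brittle (it misses synonyms and misspellings), which is one reason the answers below suggest learning the mapping from data or using an NLP classifier.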


My suggestion is to start by using Amazon Mechanical Turk: pay real people a tiny fee for each tweet they process. They would need to determine whether it was spam or not and, if it was legitimate, search for the correct municipality contact info. Meanwhile, collect detailed stats on each tweet that is processed, from which you could build a database. For instance, you would be able to see that all tweets containing "Garbage" and "Chicago" generate a reply with a certain phone number. Once you have enough data, you could use it to automate common, well-specified incoming tweets and gradually build from there, constantly refining your data and associations using the research done by the Turk workers.
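One way to read "collect stats, then automate the common cases" is a counter of which contact the human workers chose for each (city, keyword) pair, automating only when one contact clearly dominates. This is a minimal sketch under that assumption; `ContactLearner`, its thresholds, and the example addresses are hypothetical.

```python
from collections import Counter, defaultdict

class ContactLearner:
    """Learns which contact Turk workers chose for each (city, keyword) pair."""
    def __init__(self, min_count=5, min_share=0.8):
        self.min_count = min_count  # minimum human-processed tweets before automating
        self.min_share = min_share  # how dominant the top contact must be
        self.counts = defaultdict(Counter)  # (city, keyword) -> Counter of contacts

    def record(self, city, keywords, contact):
        """Store one human-processed tweet: its city, keywords and chosen contact."""
        for kw in keywords:
            self.counts[(city.lower(), kw.lower())][contact] += 1

    def suggest(self, city, keyword):
        """Return a contact only when the data is unambiguous enough to automate."""
        counter = self.counts.get((city.lower(), keyword.lower()))
        if not counter:
            return None
        contact, n = counter.most_common(1)[0]
        total = sum(counter.values())
        if total >= self.min_count and n / total >= self.min_share:
            return contact
        return None  # not enough evidence: send the tweet to a Turk worker instead
```

Anything `suggest` returns `None` for would keep going to the human workers, so the automated share grows as the data accumulates.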

I would also suggest implementing the service only for limited areas to start, say New York or London (or the largest city near wherever you are). That way, the information you need to start off with is much smaller.

As a first step towards your solution, I would suggest plugging the latitude/longitude into SimpleGeo (they have an iOS library).

Using something like "Find boundaries surrounding a location", you could retrieve information about the county, municipality, legislative district, etc., which might give you supporting metadata as well as a few outlets to dig for contact information.
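A boundary lookup of this kind generally comes down to a point-in-polygon test. If you have boundary polygons yourself (for example from a municipal open-data portal), a standard ray-casting sketch could look like this. The polygons are assumed to be lists of (lat, lon) vertices in a small enough area that treating lat/lon as flat x/y is acceptable; `find_boundaries` is a hypothetical helper, not SimpleGeo's API.

```python
def point_in_polygon(lat, lon, polygon):
    """Ray casting: count how many polygon edges a ray from the point crosses."""
    inside = False
    n = len(polygon)
    for i in range(n):
        y1, x1 = polygon[i]
        y2, x2 = polygon[(i + 1) % n]
        # Does this edge straddle the horizontal line through the point?
        if (y1 > lat) != (y2 > lat):
            # Longitude at which the edge crosses that line.
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside  # each crossing toggles inside/outside
    return inside

def find_boundaries(lat, lon, boundaries):
    """Return the names of all boundaries (municipality, district, ...) containing the point."""
    return [name for name, poly in boundaries.items() if point_in_polygon(lat, lon, poly)]
```

Because boundaries nest (a district inside a town), one point can match several of them, which is what gives you more than one outlet to dig for contacts.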

For instance, I'm sure you could turn the legislative district into the email address of a member of congress through some publicly available website/API. Perhaps send their office a bi-weekly or monthly batch email of all reported issues in their district and put pressure on the elected officials to enact the appropriate change?
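The batch-email idea amounts to grouping reports by district over a reporting period. A minimal sketch, assuming each report is a dict with hypothetical `district`, `date` and `text` keys (the district IDs in the usage line are made up too):

```python
from collections import defaultdict
from datetime import date

def build_digests(reports, start, end):
    """Group reports dated in [start, end) by district into one email body each."""
    by_district = defaultdict(list)
    for r in reports:
        if start <= r["date"] < end:
            by_district[r["district"]].append(r)
    digests = {}
    for district, items in by_district.items():
        lines = [f"Reported issues in {district}, {start.isoformat()} to {end.isoformat()}:"]
        for r in sorted(items, key=lambda r: r["date"]):  # oldest issue first
            lines.append(f"- {r['date'].isoformat()}: {r['text']}")
        digests[district] = "\n".join(lines)
    return digests

if __name__ == "__main__":
    sample = [{"district": "NY-10", "date": date(2012, 3, 2), "text": "trash on the street"}]
    print(build_digests(sample, date(2012, 3, 1), date(2012, 3, 15))["NY-10"])
```

Running it every two weeks (or monthly) with the matching `start`/`end` gives one email per district instead of one per tweet, which also helps with the "don't spam the authorities" concern.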

Another option could be to display your database of reported issues on a publicly available website and collect the appropriate contact information through crowdsourcing. Allow members of the website to add or update email addresses that can be used for currently reported issues and for issues you may receive for the same location in the future.

could likely be used for this? It's a service to automate an action based on your custom criteria.

Maybe you could hook up with them?
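For the crowdsourced contact list, one simple design is to let each member hold a single vote per (location, category) and pick the most-voted address, so one sneaky member cannot stuff the list. This is a hypothetical sketch; the class name, keys, and addresses are illustrative.

```python
from collections import Counter, defaultdict

class CrowdContacts:
    """Members submit contact addresses per (location, category); one vote each."""
    def __init__(self):
        self.votes = defaultdict(Counter)  # (location, category) -> email -> votes
        self.voted = defaultdict(dict)     # (location, category) -> member -> email

    def submit(self, member, location, category, email):
        key = (location, category)
        email = email.strip().lower()
        previous = self.voted[key].get(member)
        if previous is not None:
            self.votes[key][previous] -= 1  # a member's newer submission replaces the old one
        self.voted[key][member] = email
        self.votes[key][email] += 1

    def best(self, location, category):
        """Most-voted address for this location and issue type, or None."""
        counter = self.votes.get((location, category))
        if not counter:
            return None
        email, n = counter.most_common(1)[0]
        return email if n > 0 else None
```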

I think the right thing to do is to use an existing NLP library, such as the Stanford NLP tools, which include:

  • Stanford CoreNLP
  • Stanford Parser
  • Stanford Classifier

Alternatively, you can use OpenNLP or NLTK. If the NLP framework is in Java and you want to use Python or Ruby, as the OP wanted, check out JRuby and Jython.
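To get a feel for what a text classifier does with tweets before pulling in one of those libraries, here is a tiny multinomial naive Bayes written with the standard library only. It is a stand-in technique, not the Stanford Classifier's or NLTK's API; the labels and training sentences in the usage lines are invented.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    return re.findall(r"[a-z]+", text.lower())

class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""
    def __init__(self):
        self.doc_counts = Counter()             # label -> number of training tweets
        self.word_counts = defaultdict(Counter) # label -> word -> count
        self.vocab = set()

    def train(self, text, label):
        self.doc_counts[label] += 1
        for w in tokenize(text):
            self.word_counts[label][w] += 1
            self.vocab.add(w)

    def predict(self, text):
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.doc_counts.items():
            score = math.log(n_docs / total_docs)  # log prior
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            for w in tokenize(text):
                score += math.log((self.word_counts[label][w] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label

if __name__ == "__main__":
    nb = NaiveBayes()
    nb.train("trash all over the street", "sanitation")
    nb.train("broken street light", "lighting")
    print(nb.predict("the street light is broken"))
```

Adding a "spam" label to the training data lets the same model double as a first-pass spam filter, though real use would need far more labeled tweets (which the Mechanical Turk step could supply).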
