Force Python HTTP request to refresh

I am new to Python and haven't found anything on this, which suggests it is probably dead easy.

The page I am scraping is fairly simple, but it completely updates every 2 minutes. I have managed to scrape all the data, but the issue is that even though the program runs every 2 minutes (I have tried both taskeng.exe and looping in the script), the HTML it pulls from the website only seems to refresh every 12 minutes. For the sake of clarity, the website I am scraping shows a timestamp when it updates. My program pulls that stamp (along with other data) and writes it to a CSV file. But it pulls the same data for 12 minutes, and then suddenly the new data arrives. So the output looks like:

16:30, Data1, Data2, Data3
16:30, Data1, Data2, Data3
...
16:30, Data1, Data2, Data3
16:42, Data1, Data2, Data3
16:42, Data1, Data2, Data3

whereas it should be:

16:30, Data1, Data2, Data3
16:32, Data1, Data2, Data3
16:34, Data1, Data2, Data3
16:36, Data1, Data2, Data3
16:38, Data1, Data2, Data3
16:40, Data1, Data2, Data3
16:42, Data1, Data2, Data3

I think this has to do with a cache on my side. How can I force my HTTP requests to refresh completely, or stop Python from storing the response in a cache?

I am using BeautifulSoup and Mechanize. My code for the http request is below:

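from mechanize import Browser
# Assumes the bs4 package; older installs use: from BeautifulSoup import BeautifulSoup
from bs4 import BeautifulSoup

mech = Browser()

url = "http://myurl.com"

page = mech.open(url)

html = page.read()
soup = BeautifulSoup(html)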

If it helps to post all my code, I can do that. Thanks in advance for any advice.

Answers


You could use a simpler tool like requests.

import requests
response = requests.get(url)
html = response.text

But if you really want to stick with mechanize, you can skip the Browser() object (which is probably introducing cookies into your requests) and call mechanize.urlopen directly. Check the mechanize docs for more details.

import mechanize

response = mechanize.urlopen("http://foo.bar.com/")
html = response.read()  # or response.readlines()
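
If the stale copy turns out to come from a cache somewhere between the script and the site (an assumption; nothing in the question establishes where the caching happens), a common workaround is to send no-cache request headers and append a throwaway query parameter so that every URL is unique. The sketch below uses only the standard library and builds the request without sending it. The helper names cache_busting_url and build_fresh_request, and the parameter name "_", are made up for illustration; a server or proxy is also free to ignore both hints.

```python
import time
from urllib.parse import parse_qsl, urlencode, urlparse, urlunparse
from urllib.request import Request

# Request headers asking intermediate caches to revalidate instead of
# serving a stored copy. Whether they are honoured depends on the cache.
NO_CACHE_HEADERS = {
    "Cache-Control": "no-cache",
    "Pragma": "no-cache",
}


def cache_busting_url(url, token=None):
    """Return url with an extra query parameter that changes on each call.

    The parameter name "_" is an arbitrary choice for this sketch.
    """
    if token is None:
        # Millisecond timestamp, so consecutive polls get different URLs.
        token = str(int(time.time() * 1000))
    parts = urlparse(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append(("_", token))
    return urlunparse(parts._replace(query=urlencode(query)))


def build_fresh_request(url, token=None):
    """Build (but do not send) a GET request with both workarounds applied."""
    return Request(cache_busting_url(url, token), headers=NO_CACHE_HEADERS)
```

The resulting object can be passed to urllib.request.urlopen. With requests, the same idea would be requests.get(cache_busting_url(url), headers=NO_CACHE_HEADERS).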
