BeautifulSoup Python - HTML Table Data Issues

I need to extract a value from an HTML table that is served by a web server as a .txt file. The exact requirement is to read the most recent temperature (the last reading by time) into a variable.

I suspect the table's markup is not perfectly formed.

Here is an example of the HTML for the table:

<table border="1" rules="all">
<col />
<col />
   <col align="char" char="." />
   <col align="char" char="." />
   <col />
   <col />
   <col align="char" char="m" />
   <col align="char" char="m" />
   <col align="char" char="." />
   <col align="char" char="," />
   <tr>
     <th colspan="2" rowspan="2">Date &amp; time</th>
    <th rowspan="2">Temp</th>
    <th rowspan="2">Feels like</th>
    <th rowspan="2">Humidity</th>
    <th colspan="3">Wind</th>
    <th rowspan="2">Rain</th>
    <th rowspan="2">Pressure</th>
  </tr>
  <tr>
    <th>dir</th>
    <th>ave</th>
    <th>gust</th>
  </tr>
  <tr>
    <td>2014/01/08</td>
    <td>1056 GMT</td>
    <td>11.0 &deg;C</td>
    <td>9.8 &deg;C</td>
    <td>74%</td>
    <td>NNW</td>
    <td>1 mph</td>
    <td>6 mph</td>
    <td>0.3 mm</td>
    <td>1032.4 hPa, rising</td>
  </tr>
  <tr>
    <td></td>
    <td>1159 GMT</td>
    <td>10.8 &deg;C</td>
    <td>9.7 &deg;C</td>
    <td>74%</td>
    <td>SSE</td>
    <td>1 mph</td>
    <td>4 mph</td>
    <td>0.0 mm</td>
    <td>1032.0 hPa, rising slowly</td>
  </tr>
  <tr>
    <td></td>
    <td>1258 GMT</td>
    <td>11.0 &deg;C</td>
    <td>9.9 &deg;C</td>
    <td>73%</td>
    <td>SSE</td>
    <td>1 mph</td>
    <td>4 mph</td>
    <td>0.0 mm</td>
    <td>1031.5 hPa, falling slowly</td>
  </tr>
  <tr>
    <td></td>
    <td>1357 GMT</td>
    <td>10.8 &deg;C</td>
    <td>9.7 &deg;C</td>
    <td>75%</td>
    <td>SSW</td>
    <td>1 mph</td>
    <td>4 mph</td>
    <td>0.0 mm</td>
    <td>1030.7 hPa, falling</td>
  </tr>
  <tr>
    <td></td>
    <td>1456 GMT</td>
    <td>10.3 &deg;C</td>
    <td>9.3 &deg;C</td>
    <td>77%</td>
    <td>ENE</td>
    <td>1 mph</td>
    <td>4 mph</td>
    <td>0.0 mm</td>
    <td>1030.0 hPa, falling</td>
  </tr>
  <tr>
    <td></td>
    <td>1600 GMT</td>
    <td>9.7 &deg;C</td>
    <td>8.7 &deg;C</td>
    <td>81%</td>
    <td>WNW</td>
    <td>1 mph</td>
    <td>3 mph</td>
    <td>0.0 mm</td>
    <td>1028.7 hPa, falling</td>
  </tr>
  <tr>
    <td></td>
    <td>1658 GMT</td>
    <td>8.9 &deg;C</td>
    <td>7.9 &deg;C</td>
    <td>86%</td>
    <td>NNE</td>
    <td>1 mph</td>
    <td>4 mph</td>
    <td>0.0 mm</td>
    <td>1026.9 hPa, falling quickly</td>
  </tr>
</table>

I have the following Python code, which collects the td cells of each row:

#!/usr/bin/python
from BeautifulSoup import BeautifulSoup
import urllib2
data = "http://****************/weather_station/data/6hrs.txt"
req = urllib2.Request(data)
page = urllib2.urlopen(req)
soup = BeautifulSoup(page)

table = soup.find('table')
for row in table.findAll('tr'):
        col = row.findAll('td')
#       time = col[0].string
#       temp = col[1].string

print col

This is where I'm stuck. The line time = col[0].string raises "list index out of range", implying that the list is empty, but if I print col, it displays the data I wish to extract.

Any suggestions?

#

The answer below works great for that table. I also wish to get the same data from a table like this...

<table border="1" rules="rows" cellspacing="0" cellpadding="5">
  <col />
  <col />
  <col align="char" char="." />
  <col align="char" char="." />
  <col />
  <col />
  <col align="char" char="m" />
  <col align="char" char="m" />
  <col align="char" char="." />
  <col align="char" char="," />
  <tr>
    <th rowspan="2">Time</th>
    <th rowspan="2">Temp</th>
    <th rowspan="2">Feels like</th>
    <th rowspan="2">Humidity</th>
    <th colspan="3">Wind</th>
    <th rowspan="2">Rain</th>
    <th rowspan="2">Pressure</th>
  </tr>
  <tr>
    <th>dir</th>
    <th>ave</th>
    <th>gust</th>
  </tr>
<tr>
<td>12:45 <small>GMT:</small></td>
<td>8.8<small>C</small></td>
<td>7.1 <small>&deg;C</small></td>
<td>66<small>%</small></td>
<td>W </td>
<td>1 <small>mph</small></td>
<td>2 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1022 <small>hPa</small></td>
</tr>
<tr>
<td>12:40 <small>GMT:</small></td>
<td>8.9<small>C</small></td>
<td>6.9 <small>&deg;C</small></td>
<td>66<small>%</small></td>
<td>SE </td>
<td>2 <small>mph</small></td>
<td>4 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1022 <small>hPa</small></td>
</tr>
<tr>
<td>12:34 <small>GMT:</small></td>
<td>8.8<small>C</small></td>
<td>6.3 <small>&deg;C</small></td>
<td>66<small>%</small></td>
<td>NE </td>
<td>3 <small>mph</small></td>
<td>7 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1022 <small>hPa</small></td>
</tr>
<tr>
<td>12:29 <small>GMT:</small></td>
<td>9.0<small>C</small></td>
<td>6.4 <small>&deg;C</small></td>
<td>64<small>%</small></td>
<td>NW </td>
<td>3 <small>mph</small></td>
<td>6 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1022 <small>hPa</small></td>
</tr>
<tr>
<td>12:24 <small>GMT:</small></td>
<td>9.6<small>C</small></td>
<td>7.4 <small>&deg;C</small></td>
<td>63<small>%</small></td>
<td>S </td>
<td>2 <small>mph</small></td>
<td>5 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1022 <small>hPa</small></td>
</tr>
<tr>
<td>12:19 <small>GMT:</small></td>
<td>10.1<small>C</small></td>
<td>7.4 <small>&deg;C</small></td>
<td>61<small>%</small></td>
<td>SW </td>
<td>4 <small>mph</small></td>
<td>6 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1022 <small>hPa</small></td>
</tr>
<tr>
<td>12:14 <small>GMT:</small></td>
<td>10.8<small>C</small></td>
<td>8.9 <small>&deg;C</small></td>
<td>61<small>%</small></td>
<td>SE </td>
<td>2 <small>mph</small></td>
<td>2 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1022 <small>hPa</small></td>
</tr>
<tr>
<td>12:09 <small>GMT:</small></td>
<td>10.7<small>C</small></td>
<td>8.8 <small>&deg;C</small></td>
<td>61<small>%</small></td>
<td>N </td>
<td>2 <small>mph</small></td>
<td>3 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1022 <small>hPa</small></td>
</tr>
<tr>
<td>12:04 <small>GMT:</small></td>
<td>10.3<small>C</small></td>
<td>8.5 <small>&deg;C</small></td>
<td>64<small>%</small></td>
<td>NE </td>
<td>2 <small>mph</small></td>
<td>3 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1022 <small>hPa</small></td>
</tr>
<tr>
<td>11:58 <small>GMT:</small></td>
<td>9.3<small>C</small></td>
<td>7.6 <small>&deg;C</small></td>
<td>65<small>%</small></td>
<td>N </td>
<td>1 <small>mph</small></td>
<td>2 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1022 <small>hPa</small></td>
</tr>
<tr>
<td>11:53 <small>GMT:</small></td>
<td>9.3<small>C</small></td>
<td>7.8 <small>&deg;C</small></td>
<td>65<small>%</small></td>
<td>W </td>
<td>0 <small>mph</small></td>
<td>2 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1022 <small>hPa</small></td>
</tr>
<tr>
<td>11:48 <small>GMT:</small></td>
<td>8.8<small>C</small></td>
<td>7.1 <small>&deg;C</small></td>
<td>66<small>%</small></td>
<td>W </td>
<td>1 <small>mph</small></td>
<td>2 <small>mph</small></td>
<td>0.0 <small>mm</small></td>
<td>1021 <small>hPa</small></td>
</tr>
</table>

Using the same code as in the answer below:

table = soup.find('table')
for row in table.findAll('tr')[1:]:
        col = row.findAll('td')
        if len(col) >= 2:
                time = col[0].string
                temp = col[1].string
print time
print temp

Both time and temp come back as None.

If I print col, all of the values are there. Why does the len(col) >= 2 check not work for this data?
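
A likely explanation, offered as a sketch rather than a confirmed diagnosis: the length check passes, but each td in this table holds a text node plus a nested small tag, and .string returns None when a tag has more than one child. The example below assumes BeautifulSoup 4 (the bs4 package) on Python 3 instead of the BeautifulSoup 3 import used in the question, and ROW_HTML is a trimmed copy of one row from the table above.

```python
from bs4 import BeautifulSoup

# One data row copied from the second table in the question.
ROW_HTML = """
<table>
<tr>
<td>12:45 <small>GMT:</small></td>
<td>8.8<small>C</small></td>
</tr>
</table>
"""

soup = BeautifulSoup(ROW_HTML, "html.parser")
cells = soup.find("table").find_all("td")

# Each cell has two children (bare text and a <small> tag),
# so .string cannot pick a single string and gives None.
string_value = cells[1].string

# The first child of the cell is the bare text node holding the number.
temp_value = float(cells[1].contents[0])

# get_text() joins every nested string, so the unit is included.
temp_with_unit = cells[1].get_text(strip=True)
time_text = cells[0].get_text(" ", strip=True)

print(string_value, temp_value, temp_with_unit, time_text)
```

Note also that the rows in this second sample appear to run newest first, so the most recent reading would be the first data row, not the last one the loop visits.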

Answers


The code crashes because it tries to read td cells from this header row, which contains only th cells, so findAll('td') returns an empty list:

<tr>
 <th colspan="2" rowspan="2">Date &amp; time</th>
 <th rowspan="2">Temp</th>
 <th rowspan="2">Feels like</th>
 <th rowspan="2">Humidity</th>
 <th colspan="3">Wind</th>
 <th rowspan="2">Rain</th>
 <th rowspan="2">Pressure</th>
</tr>

Just add a length check before indexing, something like this:

col = row.findAll('td')
if len(col) >= 2:
    time = col[0].string
    temp = col[1].string
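
Putting the check into a complete loop, a minimal sketch for the first table might look like the following. It assumes BeautifulSoup 4 (bs4) on Python 3 instead of the BeautifulSoup 3 and urllib2 setup in the question, and it parses a trimmed inline copy of the table in place of the real URL. The names SAMPLE_HTML and last_reading are made up for illustration. In that table the first td is the date, so the time and temperature sit at indexes 1 and 2, which is why the sketch requires at least three cells.

```python
from bs4 import BeautifulSoup

# Trimmed copy of the first table: one header row and two data rows.
SAMPLE_HTML = """
<table border="1" rules="all">
  <tr>
    <th colspan="2" rowspan="2">Date &amp; time</th>
    <th rowspan="2">Temp</th>
  </tr>
  <tr>
    <td>2014/01/08</td>
    <td>1056 GMT</td>
    <td>11.0 &deg;C</td>
  </tr>
  <tr>
    <td></td>
    <td>1658 GMT</td>
    <td>8.9 &deg;C</td>
  </tr>
</table>
"""


def last_reading(html):
    """Return (time, temp) text from the last row that has data cells."""
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table")
    latest = None
    for row in table.find_all("tr"):
        cells = row.find_all("td")
        # Header rows contain only <th>, so cells is empty for them.
        if len(cells) >= 3:
            # Index 0 is the date, 1 the time, 2 the temperature.
            latest = (
                cells[1].get_text(strip=True),
                cells[2].get_text(strip=True),
            )
    return latest


time_text, temp_text = last_reading(SAMPLE_HTML)
print(time_text, temp_text)
```

Because the rows in this table run oldest to newest, the values left over after the loop belong to the most recent reading.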
