Scrapy (Python) throws ImportError when running with cron

I'm running a scrapy spider with cron, but it throws an ImportError exception:

Traceback (most recent call last):
  File "/Users/som/scrapy_testing/scrapy_testing/spiders/hm_spiders.py", line 2, in <module>
    import scrapy
  File "/Library/Python/2.7/site-packages/scrapy/__init__.py", line 48, in <module>
    from scrapy.spiders import Spider
  File "/Library/Python/2.7/site-packages/scrapy/spiders/__init__.py", line 10, in <module>
    from scrapy.http import Request
  File "/Library/Python/2.7/site-packages/scrapy/http/__init__.py", line 12, in <module>
    from scrapy.http.request.rpc import XmlRpcRequest
  File "/Library/Python/2.7/site-packages/scrapy/http/request/rpc.py", line 7, in <module>
    from six.moves import xmlrpc_client as xmlrpclib
ImportError: cannot import name xmlrpc_client

The strange thing is that when I run the same script (the one cron runs) manually from the terminal, it works fine.

The cron job is set up as

*   *   *   *   *   sh /Users/som/sh/hm_scraping.sh

and the script is

#!/bin/bash
python /Users/som/scrapy_testing/scrapy_testing/spiders/hm_spiders.py

I'm using the CrawlerProcess class as described here: http://doc.scrapy.org/en/latest/topics/practices.html

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})
process.crawl(HmSpider)
process.start()

================================================ EDIT

Based on MuhammadTahir's and lapinkoira's comments, I tested the following directly in the terminal:

/usr/bin/python /Users/som/scrapy_testing/scrapy_testing/spiders/hm_spiders.py

and

sudo -u som /usr/bin/python /Users/som/scrapy_testing/scrapy_testing/spiders/hm_spiders.py

The first one runs fine, but when I use sudo (I've also run it without specifying the user) it fails with the same error. Maybe cron uses sudo in the background.
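
One way to narrow this down is to print which interpreter and which copy of six are picked up in each case, then compare the output from the plain terminal run, the sudo run, and the cron run. The sketch below is only a diagnostic. The function name describe_environment is made up for illustration, and the guess that an older six is imported when variables such as PYTHONPATH are missing is an assumption, not something the traceback confirms.

```python
from __future__ import print_function

import os
import sys


def describe_environment():
    """Collect details that commonly differ between a login shell and cron/sudo."""
    info = {
        "executable": sys.executable,                # interpreter actually running
        "pythonpath": os.environ.get("PYTHONPATH"),  # often set only by a shell profile
        "path": os.environ.get("PATH"),
        "sys_path": list(sys.path),                  # import search order
    }
    try:
        import six
        info["six_version"] = six.__version__
        info["six_file"] = six.__file__              # which copy of six won the lookup
    except ImportError:
        info["six_version"] = None
        info["six_file"] = None
    return info


if __name__ == "__main__":
    for key, value in sorted(describe_environment().items()):
        print(key, "=", value)
```

If six_file or six_version differs between the runs, the ImportError most likely comes from a different copy of six being found first on sys.path.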

Any ideas??

Thanks!

Answers


I would try one of these two options:

1- Activate the env first:

source /path/of/your/venv/bin/activate && /path/of/your/venv/bin/python /Users/som/scrapy_testing/scrapy_testing/spiders/hm_spiders.py

2- or without activating the env (may not work):

/path/of/your/venv/bin/python /Users/som/scrapy_testing/scrapy_testing/spiders/hm_spiders.py
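
To check either command without waiting for cron, it can help to run it from the terminal with a stripped-down environment. The sketch below assumes cron supplies little more than PATH, SHELL and HOME, which is only an approximation; the exact values depend on the system. The helper name run_like_cron is hypothetical.

```python
import os
import subprocess
import sys


def run_like_cron(command, home=None):
    """Run `command` with a minimal environment and return (exit_code, output)."""
    # Assumed approximation of cron's defaults: no PYTHONPATH, no shell profile variables.
    env = {
        "PATH": "/usr/bin:/bin",
        "SHELL": "/bin/sh",
        "HOME": home or os.path.expanduser("~"),
    }
    proc = subprocess.Popen(
        command, env=env, stdout=subprocess.PIPE, stderr=subprocess.STDOUT
    )
    output, _ = proc.communicate()
    return proc.returncode, output.decode("utf-8", "replace")


# Example call, using the paths from the question:
# code, output = run_like_cron([
#     "/usr/bin/python",
#     "/Users/som/scrapy_testing/scrapy_testing/spiders/hm_spiders.py",
# ])
# print(code)
# print(output)
```

If the script fails here with the same ImportError but succeeds in a normal shell, the difference lies in the environment rather than in the spider itself.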
