Run scrapy program from within python script

Question

I'm trying to run Scrapy from a Python script. I think I've almost succeeded, but something still doesn't work. My code contains the line run_spider(quotes5), where quotes5 is the name of the spider that I used to run from cmd with scrapy crawl quotes5. The error is that quotes5 is undefined. Any help, please?

This is my code:

import scrapy
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
import json
import csv
import re
from crochet import setup
from importlib import import_module
from scrapy.utils.project import get_project_settings
setup()


def run_spider(spiderName):
    module_name="WS_Vardata.spiders.{}".format(spiderName)
    scrapy_var = import_module(module_name)   #do some dynamic import of selected spider   
    spiderObj= scrapy_var.QuotesSpider()           #get mySpider-object from spider module
    crawler = CrawlerRunner(get_project_settings())   #from Scrapy docs
    crawler.crawl(spiderObj)  

run_spider(quotes5)

Scrapy code (quotes_spider.py):

import scrapy
import json
import csv
import re

class QuotesSpider(scrapy.Spider):
    name = "quotes5"

    def start_requests(self):
        # input.csv: first column is a "y" flag, second column is the URL to crawl
        with open('input.csv', 'r', newline='') as csvf:
            urlreader = csv.reader(csvf, delimiter=',', quotechar='"')
            for url in urlreader:
                if url[0] == "y":
                    yield scrapy.Request(url[1])
        #with open('so_52069753_out.csv', 'w') as csvfile:
            #fieldnames = ['Category', 'Type', 'Model', 'SK']
            #writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
            #writer.writeheader()

    def parse(self, response):
        # Pull the "product" and "pathIndicator" JSON objects out of the
        # inline <script> that defines digitalData
        regex = re.compile(r'"product"\s*:\s*(.+?\})', re.DOTALL)
        regex1 = re.compile(r'"pathIndicator"\s*:\s*(.+?\})', re.DOTALL)
        script = response.xpath("//script[contains(., 'var digitalData')]/text()")
        source_json1 = script.re_first(regex)
        source_json2 = script.re_first(regex1)
        model_code = response.xpath('//script').re_first('modelCode.*?"(.*)"')

        if source_json1 and source_json2:
            # strip // line comments, which json.loads cannot parse
            source_json1 = re.sub(r'//[^\n]+', "", source_json1)
            source_json2 = re.sub(r'//[^\n]+', "", source_json2)
            product = json.loads(source_json1)
            path = json.loads(source_json2)
            product_category = product["pvi_type_name"]
            product_type = product["pvi_subtype_name"]
            product_model = path["depth_5"]
            product_name = product["model_name"]

            # append one row per page to output.csv
            with open('output.csv', 'a', newline='') as csvfile:
                fieldnames = ['Category', 'Type', 'Model', 'Name', 'SK']
                writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
                if product_category:
                    writer.writerow({'Category': product_category, 'Type': product_type, 'Model': product_model, 'Name': product_name, 'SK': model_code})
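The extraction step in parse() can be tried outside Scrapy. The sketch below is a minimal stdlib version of the same regex, comment-stripping and json.loads pipeline; extract_fields and SAMPLE_SCRIPT are hypothetical names, and the sample script text is invented because the real page's digitalData object is not shown in the question.

```python
import json
import re

# Same patterns as the spider's parse() method.
PRODUCT_RE = re.compile(r'"product"\s*:\s*(.+?\})', re.DOTALL)
PATH_RE = re.compile(r'"pathIndicator"\s*:\s*(.+?\})', re.DOTALL)
LINE_COMMENT_RE = re.compile(r'//[^\n]+')  # JavaScript // comments break json.loads


def extract_fields(script_text):
    """Hypothetical helper: return the CSV fields parse() builds, or None."""
    product_match = PRODUCT_RE.search(script_text)
    path_match = PATH_RE.search(script_text)
    if not (product_match and path_match):
        return None
    product = json.loads(LINE_COMMENT_RE.sub("", product_match.group(1)))
    path = json.loads(LINE_COMMENT_RE.sub("", path_match.group(1)))
    return {
        "Category": product["pvi_type_name"],
        "Type": product["pvi_subtype_name"],
        "Model": path["depth_5"],
        "Name": product["model_name"],
    }


# Hypothetical script content, invented for illustration only.
SAMPLE_SCRIPT = """
var digitalData = {
  "pathIndicator": {"depth_5": "example-model"},
  "product": {
    "pvi_type_name": "example-category",  // a JS comment json.loads would reject
    "pvi_subtype_name": "example-type",
    "model_name": "example-name"
  }
};
"""

fields = extract_fields(SAMPLE_SCRIPT)
```

Two limits of this approach are worth knowing: the non-greedy (.+?\}) stops at the first closing brace, so a nested object inside "product" would be cut short, and the // stripping would also remove the rest of a line containing a URL such as "https://...".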


ARR · Accepted Answer · 2018-09-25 21:41:31Z


As the error says, quotes5 is undefined: you need to define quotes5 before passing it to the function. Or pass the name as a string instead:

run_spider("quotes5")
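Why the quotes matter: a bare quotes5 is a variable lookup, while "quotes5" is a string value. The sketch below shows both cases with a hypothetical helper, spider_module_path, which builds the same dotted path as the question's run_spider. Note that the resulting path is WS_Vardata.spiders.quotes5, whereas the question says the file is quotes_spider.py; "quotes5" is the spider's name attribute, not its module name, so import_module would likely still fail on that path.

```python
def spider_module_path(spider_name):
    """Hypothetical helper: the dotted module path the question's run_spider() imports."""
    return "WS_Vardata.spiders.{}".format(spider_name)


# A bare quotes5 is looked up as a variable, and no such variable exists.
try:
    spider_module_path(quotes5)  # noqa: F821 (deliberately undefined)
except NameError as exc:
    error_message = str(exc)

# A quoted "quotes5" is just a string, so the call succeeds.
module_path = spider_module_path("quotes5")
```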

Edited:

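from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
from scrapy.utils.project import get_project_settings
import WS_Vardata.spiders.quotes_spider as quote_spider_module

def run_spider(spiderName):
    # get the spider class from within the module by its class name
    spiderClass = getattr(quote_spider_module, spiderName)
    # pass the class itself; CrawlerRunner creates the spider object
    runner = CrawlerRunner(get_project_settings())   # from the Scrapy docs
    d = runner.crawl(spiderClass)
    d.addBoth(lambda _: reactor.stop())   # stop the reactor when the crawl ends
    reactor.run()   # blocks until the crawl finishes (don't combine with crochet's setup())

configure_logging()
run_spider("QuotesSpider")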
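The getattr call above looks the spider up by its class name ("QuotesSpider"), while scrapy crawl quotes5 finds it by its name attribute. The sketch below is a simplified, stdlib-only illustration of name-based lookup; find_spider_class is a hypothetical helper, and the module and class are stand-ins so it runs without Scrapy installed (Scrapy's own spider loader also checks that the class subclasses scrapy.Spider).

```python
import inspect
import types


def find_spider_class(module, spider_name):
    """Hypothetical helper: return the class in `module` whose `name` equals spider_name."""
    for _, obj in inspect.getmembers(module, inspect.isclass):
        if getattr(obj, "name", None) == spider_name:
            return obj
    raise KeyError("no spider named {!r} in {}".format(spider_name, module.__name__))


# Stand-in for WS_Vardata.spiders.quotes_spider, so the sketch runs without Scrapy.
fake_module = types.ModuleType("quotes_spider")


class QuotesSpider:  # in the real module this subclasses scrapy.Spider
    name = "quotes5"


fake_module.QuotesSpider = QuotesSpider

by_class_name = getattr(fake_module, "QuotesSpider")        # what the answer does
by_spider_name = find_spider_class(fake_module, "quotes5")  # lookup by `name`
```

Scrapy's documentation for CrawlerRunner.crawl also lists a spider's name inside the project as an accepted argument, which would allow runner.crawl("quotes5") when project settings are loaded; check this against the Scrapy version in use.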

This script should be saved in, and run from, the same directory that contains WS_Vardata.

So in your case:

TEST
| the_code.py
| WS_Vardata
|   | __init__.py
|   | spiders
|   |   | __init__.py
|   |   | quotes_spider.py  <= contains the QuotesSpider class
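The layout matters because Python only finds the WS_Vardata package if the folder containing it is on sys.path, and the running script's own folder is added automatically. The sketch below shows the alternative of adding that folder explicitly; add_project_root is a hypothetical helper, and the demo builds a throwaway copy of the layout in a temporary directory so it runs anywhere.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def add_project_root(project_root):
    """Hypothetical helper: put the folder that *contains* WS_Vardata first on sys.path."""
    root = str(Path(project_root).resolve())
    if root not in sys.path:
        sys.path.insert(0, root)


# Self-contained demo: build a throwaway copy of the TEST layout.
demo_root = Path(tempfile.mkdtemp()) / "TEST"
spiders_dir = demo_root / "WS_Vardata" / "spiders"
spiders_dir.mkdir(parents=True)
(demo_root / "WS_Vardata" / "__init__.py").write_text("")
(spiders_dir / "__init__.py").write_text("")
(spiders_dir / "quotes_spider.py").write_text('SPIDER_NAME = "quotes5"\n')

add_project_root(demo_root)
importlib.invalidate_caches()  # the files were created after the interpreter started
quotes_module = importlib.import_module("WS_Vardata.spiders.quotes_spider")
```

In a real script you would pass the actual TEST folder instead. Keep in mind that get_project_settings() looks for scrapy.cfg starting from the current working directory, so running the script from inside the project folder may still be needed for the project settings to load.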

answered Sep 25, 2018 at 20:38 by ARR (edited Sep 25, 2018 at 21:41)



12 Comments

PyRar Over a year ago

run_spider("quotes5") works, thank you, sir! Now I have a new error: "ModuleNotFoundError: No module named 'WS_Vardata.spiders'". My Scrapy program is at "C:\Users\raresb\Desktop\TEST\WS_Vardata\spiders\quotes_spider", where "quotes_spider" is the spider file.

ARR Over a year ago

Do you have an "__init__.py" in both the WS_Vardata and spiders folders?

PyRar Over a year ago

Yes, sir. "__init__.py" is in both of them.

PyRar Over a year ago

You mean the code from my post above? It's in a completely different location, directly on the desktop.

ARR Over a year ago

Place your script in the same directory that contains WS_Vardata and run it from there.
