I'm trying to run Scrapy from a Python script. I think I've almost succeeded, but something doesn't work. My code has a line like this: run_spider(quotes5). quotes5 is the name of my spider, which I used to run from cmd like this: scrapy crawl quotes5. The error is that quotes5 is undefined. Any help, please?

This is my code:

import scrapy
from twisted.internet import reactor
from scrapy.crawler import CrawlerRunner
from scrapy.utils.log import configure_logging
import json
import csv
import re
from crochet import setup
from importlib import import_module
from scrapy.utils.project import get_project_settings
setup()


def run_spider(spiderName):
    module_name="WS_Vardata.spiders.{}".format(spiderName)
    scrapy_var = import_module(module_name)   #do some dynamic import of selected spider   
    spiderObj= scrapy_var.QuotesSpider()           #get mySpider-object from spider module
    crawler = CrawlerRunner(get_project_settings())   #from Scrapy docs
    crawler.crawl(spiderObj)  

run_spider(quotes5)

Scrapy code (quotes_spider.py):

import scrapy
import json
import csv
import re

class QuotesSpider(scrapy.Spider):
    name = "quotes5"

    def start_requests(self):
        with open('input.csv','r') as csvf:
            urlreader = csv.reader(csvf, delimiter=',',quotechar='"')
            for url in urlreader:
                if url[0]=="y":
                    yield scrapy.Request(url[1])
        #with open('so_52069753_out.csv', 'w') as csvfile:
            #fieldnames = ['Category', 'Type', 'Model', 'SK']
            #writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
            #writer.writeheader()

    def parse(self, response):

        regex = re.compile(r'"product"\s*:\s*(.+?\})', re.DOTALL)
        regex1 = re.compile(r'"pathIndicator"\s*:\s*(.+?\})', re.DOTALL)
        source_json1 = response.xpath("//script[contains(., 'var digitalData')]/text()").re_first(regex)
        source_json2 = response.xpath("//script[contains(., 'var digitalData')]/text()").re_first(regex1)
        model_code = response.xpath('//script').re_first('modelCode.*?"(.*)"')

        if source_json1 and source_json2:
            source_json1 = re.sub(r'//[^\n]+', "", source_json1)
            source_json2 = re.sub(r'//[^\n]+', "", source_json2)
            product = json.loads(source_json1)
            path = json.loads(source_json2)
            product_category = product["pvi_type_name"]
            product_type = product["pvi_subtype_name"]
            product_model = path["depth_5"]
            product_name = product["model_name"]


        if source_json1 and source_json2:
            source1 = source_json1[0]
            source2 = source_json2[0]
            with open('output.csv','a',newline='') as csvfile:
                fieldnames = ['Category','Type','Model','Name','SK']
                writer = csv.DictWriter(csvfile, fieldnames=fieldnames)
                if product_category:
                    writer.writerow({'Category': product_category, 'Type': product_type, 'Model': product_model, 'Name': product_name, 'SK': model_code})


1 Answer

As the error says, quotes5 is undefined: you need to define quotes5 before passing it to the function, or pass the spider name as a string, like this:

run_spider("quotes5")
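
To illustrate the difference between the two calls, here is a minimal sketch that needs no Scrapy at all. The run_spider below is a hypothetical stand-in that only echoes its argument; it is not the function from the question.

```python
def run_spider(spider_name):
    # Hypothetical stand-in: just reports the name it was given.
    return "would crawl: {}".format(spider_name)


# A quoted value is a string literal, so this call works.
result = run_spider("quotes5")

# An unquoted quotes5 is a variable lookup. No such variable exists here,
# so Python raises NameError before run_spider is even called.
try:
    run_spider(quotes5)  # noqa: F821 (deliberately undefined)
    error_message = None
except NameError as exc:
    error_message = str(exc)

print(result)
print(error_message)
```

The name given to scrapy crawl on the command line is just text, so inside Python it has to be passed as a string as well.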

Edited:

from scrapy.crawler import CrawlerRunner
from scrapy.utils.project import get_project_settings
import WS_Vardata.spiders.quotes_spider as quote_spider_module

def run_spider(spiderName):
    # get the spider class from within the module, by its class name
    spiderClass = getattr(quote_spider_module, spiderName)
    crawler = CrawlerRunner(get_project_settings())   # from Scrapy docs
    # crawl() expects the spider class (or a spider name), not an instance
    crawler.crawl(spiderClass)

run_spider("QuotesSpider")
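
The getattr lookup above can be combined with import_module so that both the module and the class are chosen by name. The following is only a sketch: load_class is a hypothetical helper, and it is demonstrated on the standard library's collections module so that it runs without Scrapy installed.

```python
from importlib import import_module


def load_class(module_path, class_name):
    """Import module_path and return the attribute called class_name."""
    module = import_module(module_path)
    try:
        return getattr(module, class_name)
    except AttributeError:
        # Report a missing class the same way as a missing module.
        raise ImportError(
            "module {!r} has no attribute {!r}".format(module_path, class_name)
        ) from None


# Demonstration with a standard-library module and class.
ordered_dict_cls = load_class("collections", "OrderedDict")
print(ordered_dict_cls.__name__)
```

In the project from the question, the equivalent call would presumably be load_class("WS_Vardata.spiders.quotes_spider", "QuotesSpider"), assuming the file is named quotes_spider.py as described.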

Run this script from the directory that contains WS_Vardata.

So in your case:

- TEST
| the_code.py
| WS_Vardata
   | spiders
     | quotes_spider <= containing QuotesSpider class 
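
If the script has to live somewhere other than TEST, one possible workaround is to put the folder that contains WS_Vardata on sys.path before importing from it. This is a sketch only: add_project_root is a hypothetical helper, and it assumes the package folder sits in the script's own directory or in one of its parent directories.

```python
import sys
from pathlib import Path


def add_project_root(script_path, package_name="WS_Vardata"):
    """Put the folder that holds package_name on sys.path, if it can be found.

    Searches the script's own directory and then each parent directory.
    Returns the folder that was found, or None if there is no match.
    """
    start = Path(script_path).resolve().parent
    for folder in [start, *start.parents]:
        if (folder / package_name).is_dir():
            if str(folder) not in sys.path:
                sys.path.insert(0, str(folder))
            return folder
    return None
```

Calling add_project_root(__file__) at the top of the script, before any import of WS_Vardata, would be the intended usage. A None result means the package folder is not above the script, and moving the script into TEST remains the simpler fix.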

12 Comments

run_spider("quotes5") works, thank you! Now I have a new error: "ModuleNotFoundError: No module named 'WS_Vardata.spiders'". My Scrapy program is located at "C:\Users\raresb\Desktop\TEST\WS_Vardata\spiders\quotes_spider"; "quotes_spider" is the Scrapy program.
Do you have "__init__.py" in both the WS_Vardata and spiders folders?
Yes, "__init__.py" is in both of them.
The code attached in the post above? It's in a completely different location, on the desktop.
Place your code in the same directory that WS_Vardata is in and run it from there.