
I have been struggling with a site I am scraping using Scrapy. This site returns a series of JavaScript variables (arrays) containing the product data. Example:

datos[0] = ["12345","3M YELLOW CAT5E CABLE","6.81","1","A","N","N","N","N","N",0,0,0,0,0,"0","0","0","0","0","P","001-0030","12","40K8957","28396","250","Due: 30-12-1899",0.0000,1,"",''];
datos[1] = ["12346","3M GREEN CAT5E CABLE","7.81","1","A","N","N","N","N","N",0,0,0,0,0,"0","0","0","0","0","P","001-0030","12","40K8957","28396","250","Due: 30-12-1899",0.0000,1,"",''];
...

And so on.

Fetching the arrays into a string with Scrapy was easy, since the site's response prints the variables. The problem is that I want to transform them into JSON so I can process the data and store it in a database table.

Normally I would use JavaScript's JSON.stringify function to convert it to JSON and post it to PHP.

However, when using Python's json.loads, and even StringIO, I am unable to load the array as JSON.

It is probably a format error, but I am unable to identify it, since I am not an expert in JSON or Python.

EDIT: I just realized that, since Scrapy cannot execute JavaScript, the main issue is probably that the data is just a string. I need to convert it into JSON format.
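One way to do that conversion, sketched under the assumption that every line has the form `datos[n] = [...];`, is to cut the array literal out with a regular expression and parse it with Python's `ast.literal_eval` (swapped in for `json.loads`). Python literal syntax accepts both `"..."` and `'...'` strings, so the trailing `''` parses as an empty string, and `json.dumps` then produces real JSON. The `js_line_to_list` helper and the `ASSIGNMENT` pattern are hypothetical names. This only works while the arrays hold strings and numbers: JavaScript literals such as `true`, `false` or `null` would make `literal_eval` raise `ValueError`.

```python
import ast
import json
import re

# Matches one JavaScript assignment such as: datos[0] = [ ... ];
# Assumption: each line holds exactly one assignment of an array literal.
ASSIGNMENT = re.compile(r"^\s*datos\[(\d+)\]\s*=\s*(\[.*\])\s*;?\s*$")


def js_line_to_list(line):
    """Convert one 'datos[n] = [...];' line into (n, Python list)."""
    match = ASSIGNMENT.match(line)
    if match is None:
        raise ValueError(f"not a datos assignment: {line!r}")
    index = int(match.group(1))
    # literal_eval accepts "double" and 'single' quoted strings alike,
    # so the trailing '' element becomes an empty Python string.
    values = ast.literal_eval(match.group(2))
    return index, values


line = """datos[0] = ["12345","3M YELLOW CAT5E CABLE","6.81","1","A","N","N","N","N","N",0,0,0,0,0,"0","0","0","0","0","P","001-0030","12","40K8957","28396","250","Due: 30-12-1899",0.0000,1,"",''];"""

index, values = js_line_to_list(line)
# json.dumps emits double-quoted strings, i.e. valid JSON.
print(index, json.dumps(values))
```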

Any help is more than welcome.

Thank you.

  • Could you check whether the answer below works for you? I'd appreciate it if you marked it as accepted if it does. Thanks! Commented Jun 29, 2022 at 9:35

2 Answers


If you want to take an array and create a JSON object, you could do something like this:

import json

values = ["12345","3M YELLOW CAT5E CABLE","6.81","1","A","N","N","N","N","N",0,0,0,0,0,"0","0","0","0","0","P","001-0030","12","40K8957","28396","250","Due: 30-12-1899",0.0000,1]
# Map each element's position to its value: {0: "12345", 1: "3M YELLOW CAT5E CABLE", ...}
d = dict(enumerate(values))
json_text = json.dumps(d)
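To illustrate what the serialization produces, here is a short sketch with a few of the fields. A plain Python list already serializes to a JSON array, so the dict is only needed if you want an object; in that case the integer indices come out as strings, because JSON object keys are always strings.

```python
import json

values = ["12345", "3M YELLOW CAT5E CABLE", "6.81", 0.0000, 1]

# A list serializes straight to a JSON array; no dict is required.
as_array = json.dumps(values)

# An index-keyed dict serializes to a JSON object; json.dumps converts
# the integer keys 0, 1, 2, ... into the strings "0", "1", "2", ...
as_object = json.dumps(dict(enumerate(values)))

print(as_array)   # ["12345", "3M YELLOW CAT5E CABLE", "6.81", 0.0, 1]
print(as_object)  # {"0": "12345", "1": "3M YELLOW CAT5E CABLE", "2": "6.81", "3": 0.0, "4": 1}
```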

There is a section in the Scrapy documentation describing various ways to parse JavaScript code. In your case, if you just need the data as an array, you can use a regular expression to extract it.
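A regex-based version might look like the sketch below. It assumes every record appears as `datos[<n>] = [...];` and that no product string contains the sequence `];`; the `DATOS_PATTERN` regex and the `extract_datos` helper are illustrative names, not part of Scrapy. Inside a spider you would presumably pass `response.text`. Because the arrays end with a single-quoted `''`, which `json.loads` rejects, each array literal is parsed with Python's `ast.literal_eval` instead; that would fail if the arrays ever contained JavaScript's `true`, `false` or `null`.

```python
import ast
import json
import re

# Assumption: each record looks like `datos[<n>] = [ ... ];` and no string
# inside the array contains "];" (the non-greedy match stops at the first one).
DATOS_PATTERN = re.compile(r"datos\[(\d+)\]\s*=\s*(\[.*?\]);", re.DOTALL)


def extract_datos(page_text):
    """Return {index: list_of_values} for every datos[...] assignment found."""
    records = {}
    for match in DATOS_PATTERN.finditer(page_text):
        # literal_eval handles both "..." and '...' string literals.
        records[int(match.group(1))] = ast.literal_eval(match.group(2))
    return records


# Stand-in for the downloaded page body (the two sample lines from the question).
page_text = """
datos[0] = ["12345","3M YELLOW CAT5E CABLE","6.81","1","A","N","N","N","N","N",0,0,0,0,0,"0","0","0","0","0","P","001-0030","12","40K8957","28396","250","Due: 30-12-1899",0.0000,1,"",''];
datos[1] = ["12346","3M GREEN CAT5E CABLE","7.81","1","A","N","N","N","N","N",0,0,0,0,0,"0","0","0","0","0","P","001-0030","12","40K8957","28396","250","Due: 30-12-1899",0.0000,1,"",''];
"""

records = extract_datos(page_text)
# Serialize everything as one JSON object (keys become "0", "1", ...).
print(json.dumps(records, indent=2))
```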

Since the question does not name the website you are scraping, I am assuming this is a more straightforward way to get the data, but you can use whichever approach suits you.
