I have scraped three lists from a website with Selenium, and they print to the console correctly: Team, Odds and Href. However, the lists are not written to the CSV file correctly. I want each list to go into its own column (columns 1, 2 and 3). Any help?

I tend to get lots of: <selenium.webdriver.remote.webelement.WebElement (session="211dc26889dedb4d1d5db5f355c9b225", element="0.936313100855265-9")>

My data looks like this: https://ibb.co/iW6rbk

What I want it to look like: https://ibb.co/fhna2Q

I believe this is caused by it writing the web elements instead of what I actually want. Any suggestions on how I can adjust my code so it actually writes what I want (the scraped values)?

Thanks

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC
    from selenium.webdriver.common.by import By
    import csv
    import requests
    import time
    from selenium import webdriver
    driver = webdriver.Chrome(executable_path=r'C:\Brother\chromedriver.exe')
    driver.set_window_size(1024, 600)
    driver.maximize_window()


    driver.get('https://www.bookmaker.com.au/sports/soccer/37854435-football-australia-australian-npl-2-new-south-wales/')

    SCROLL_PAUSE_TIME = 0.5

    # Get scroll height
    last_height = driver.execute_script("return document.body.scrollHeight")

    while True:
        # Scroll down to bottom
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

        # Wait to load page
        time.sleep(SCROLL_PAUSE_TIME)

        # Calculate new scroll height and compare with last scroll height
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break
        last_height = new_height

    time.sleep( 5 )

    #link
    elems = driver.find_elements_by_css_selector("h3 a[Href*='/sports/soccer']")
    for elem in elems:
        print(elem.get_attribute("href"))



    #TEAM
    langs1 = driver.find_elements_by_css_selector(".row:nth-child(1) td:nth-child(1)")
    for lang in langs1:
        print (lang.text)



    time.sleep( 10)

    #ODDS
    langs = driver.find_elements_by_css_selector(".row:nth-child(1) span")
    for lang in langs:
        print (lang.text)






    time.sleep( 10 )

    import csv

    with open ('I AM HERE12345.csv','w') as file:
       writer=csv.writer(file)
       for row in langs, langs1, elems:
          writer.writerow(row)

1 Answer

There are two issues in your code. The first is in this block:

#TEAM
langs1 = driver.find_elements_by_css_selector(".row:nth-child(1) td:nth-child(1)")
for lang in langs1:
    print (lang.text)

langs1 is a list of elements. You print the text of each one, but the list itself still holds the element objects, not their text, so the text is never stored and cannot be written to the CSV. I changed it as shown below. It is not the most optimized code, but it works:

langs1 = driver.find_elements_by_css_selector(".row:nth-child(1) td:nth-child(1)")
langs1_text = []

for lang in langs1:
    print(lang.text)
    langs1_text.append(lang.text)
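
The same pattern can be written more compactly with list comprehensions, and it applies to the href list as well. The sketch below uses a hypothetical FakeElement stand-in class (not part of Selenium) and made-up values so that it runs without a browser; with real elements only the .text attribute and the get_attribute("href") call are needed.

```python
class FakeElement:
    """Hypothetical stand-in for a Selenium WebElement, for illustration only."""

    def __init__(self, text, href=None):
        self.text = text
        self._href = href

    def get_attribute(self, name):
        # Returns None when the attribute is absent, as a real element would
        return self._href if name == "href" else None


# Made-up sample data standing in for the find_elements_by_css_selector results
langs1 = [FakeElement("CSKA Moscow"), FakeElement("Zenit")]
elems = [
    FakeElement("", "https://example.com/a"),
    FakeElement("", "https://example.com/b"),
]

# Store the values themselves, not the element objects
langs1_text = [el.text for el in langs1]
elem_href = [el.get_attribute("href") for el in elems]

print(langs1_text)
print(elem_href)
```

The resulting lists contain plain strings, which csv.writer can write directly.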

Next, your CSV loop is wrong:

for row in langs_text, langs1_text, elem_href:
    writer.writerow(row)

This loop writes each list as a single row, so you get three long rows instead of three columns. What you need is one value from each list at a time:

for row in zip(langs_text, langs1_text, elem_href):
    writer.writerow(row)
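
To make the difference concrete, here is a small sketch that writes the same three lists both ways to an in-memory buffer. The sample values and the to_csv helper are made up for illustration, and the lists are zipped in team, odds, href order to match the column order asked for in the question.

```python
import csv
import io

# Made-up sample values standing in for the scraped lists
langs1_text = ["Team A", "Team B"]
langs_text = ["1.80", "2.10"]
elem_href = ["https://example.com/a", "https://example.com/b"]


def to_csv(rows):
    """Write rows to an in-memory CSV and return the resulting text."""
    buffer = io.StringIO()
    csv.writer(buffer, lineterminator="\n").writerows(rows)
    return buffer.getvalue()


# Without zip: each list becomes one row
by_list = to_csv([langs1_text, langs_text, elem_href])

# With zip: each row takes one value from every list
by_record = to_csv(zip(langs1_text, langs_text, elem_href))

print(by_list)
print(by_record)
```

The first output has one line per list; the second has one line per team, with the team, odds and link in separate columns.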

Edit-1

Although you can make your code work this way, the approach is not right. When you want to capture data from multiple sections, you should loop through each section and gather the data from within that section.

I changed the code to do that:

import csv
import time

from selenium import webdriver

driver = webdriver.Chrome()
driver.set_window_size(1024, 600)
driver.maximize_window()

driver.get('https://www.bookmaker.com.au/sports/soccer/36116103-football-russia-russian-national-football-league/')

SCROLL_PAUSE_TIME = 0.5

# Get scroll height
last_height = driver.execute_script("return document.body.scrollHeight")

while True:
    # Scroll down to bottom
    driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")

    # Wait to load page
    time.sleep(SCROLL_PAUSE_TIME)

    # Calculate new scroll height and compare with last scroll height
    new_height = driver.execute_script("return document.body.scrollHeight")
    if new_height == last_height:
        break
    last_height = new_height

time.sleep(5)

# Treat each .fullbox element as one section of the page
sections = driver.find_elements_by_css_selector(".fullbox")

with open('I AM HERE12345.csv', 'w') as file:
    writer = csv.writer(file)
    for section in sections:
        # Look up each value inside the current section only
        link = section.find_element_by_css_selector("h3 a").get_attribute("href")
        team_name = section.find_element_by_css_selector("tr.row[data-teamname]").get_attribute("data-teamname")
        bet = section.find_element_by_css_selector("a.odds.quickbet").text

        # One CSV row per section: odds, team name, link
        writer.writerow((bet, team_name, link))

With this, the CSV is generated correctly.
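
One likely benefit of gathering data per section (an assumption here, since it is not spelled out above) is that values stay aligned even when a section is missing one of them. Zipping page-wide lists pairs items purely by position, as this sketch with made-up data shows:

```python
from itertools import zip_longest

# Made-up sample data: the odds for "Team B" are missing from the page,
# so the page-wide odds list is one entry short.
teams = ["Team A", "Team B", "Team C"]
odds = ["1.80", "1.95"]  # 1.95 actually belongs to Team C

# zip pairs by position and stops at the shortest list
zipped = list(zip(teams, odds))

# zip_longest keeps every team but still cannot tell which value is missing
padded = list(zip_longest(teams, odds, fillvalue=""))

print(zipped)
print(padded)
```

In both results "Team B" ends up paired with the odds that belong to "Team C", which is the kind of silent misalignment a per-section lookup avoids.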

Edit-2

The blank rows are a Windows-specific issue, which is why they did not show up on my Mac. You can get rid of them with either of the methods below:

with open('I AM HERE12345.csv', 'w', newline='') as file:

or

with open('I AM HERE12345.csv', 'w', newline='\n') as file:
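
As a quick check of the first option, the sketch below writes two made-up rows to a temporary file (the path and data are illustrative only) and reads the raw bytes back. The csv writer emits "\r\n" after each row by default; opening the file with newline='' stops Python from translating that "\n" again on Windows, which is what produces "\r\r\n" and the blank rows.

```python
import csv
import os
import tempfile

# Made-up sample rows for illustration
rows = [("1.80", "Team A"), ("2.10", "Team B")]

# Temporary location so the example does not touch real files
path = os.path.join(tempfile.mkdtemp(), "example.csv")

# newline='' leaves the writer's own line endings untouched
with open(path, "w", newline="") as file:
    csv.writer(file).writerows(rows)

# Read back the raw bytes to inspect the line endings
with open(path, "rb") as file:
    raw = file.read()

print(raw)  # b'1.80,Team A\r\n2.10,Team B\r\n'
```

Each row ends with a single "\r\n", so spreadsheet programs show the rows directly under each other.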

Comments

I can't seem to apply this to the href elem = driver.find_elements_by_css_selector(".row:nth-child(1) td:nth-child(1)") elem_href = [] for elem in elem: print(elem.href) elem_href.append(elem.href)
It would be elem_href.append(elem.get_attribute("href"))
I have the below which works but why does it display none 3 times out of curiosity? elem = driver.find_elements_by_css_selector(".row:nth-child(1) td:nth-child(1)") elem_href = [] for elem in elem: print(elem.get_attribute("href")) elem_href.append(elem.get_attribute("href")). Which gives: None None None SKA Energiya Khabarovsk CSKA Moscow Zenit Krasnodar
You might be picking up blank elements with no href. That is why you get None.
Is there a way to remove the spaces in the CSV so I can have data directly under each other? ibb.co/fnt9U5