
I am still learning Python and have been practicing web scraping. I recently wanted to try out Selenium. This is my project so far.

My goal is to turn my output (which I believe is a string) into a pandas dataframe. Unfortunately I do not know how to separate all of the data into their own columns.

Here is my code so far:

from selenium import webdriver
import time
import pandas as pd

class FindByIdName():
    
    def test(self):
        baseUrl = 'https://vaccinefinder.org/'
        driver = webdriver.Chrome(r'C:\webdrivers\chromedriver.exe')
        driver.get(baseUrl)        
        time.sleep(3)
        
        #find covid vaccine button    
        vaccineElement = driver.find_element_by_xpath("/html//div[@id='__next']/div/div[2]/div[1]//button[.='Find COVID-19 Vaccines']")        
        vaccineElement.click()        
        time.sleep(3)  
        
        #uncheck moderna vaccine      
        modernaElement = driver.find_element_by_xpath("/html//div[@id='split-screen-content']/form/div[@class='search-form__fields-row']//label[.='Moderna COVID Vaccine']")
        modernaElement.click()
        time.sleep(2)
        
        #uncheck pfizer vaccine
        pfizerElement = driver.find_element_by_xpath("/html//div[@id='split-screen-content']/form/div[@class='search-form__fields-row']//label[.='Pfizer-BioNTech COVID Vaccine']")
        pfizerElement.click()
        time.sleep(2)
        
        #enter zip code
        zipElement = driver.find_element_by_xpath(".//*[@id='zipCode']")
        zipElement.send_keys('92646')        
        time.sleep(3)        
        
        #submit zipCode button
        searchElement = driver.find_element_by_xpath("//div[@id='split-screen-content']/form//button[@type='submit']")
        searchElement.click()        
        time.sleep(3)
        
        #find and print vaccine data
        listElement = driver.find_element_by_xpath("//div[@id='split-screen-content']/main/div[3]")        
        print(listElement.text)
        print(type(listElement.text))          
        driver.close()           
        
ff = FindByIdName()
ff.test()

Here is an example of the final output that I am looking for:

#        Location                            Address                        Distance   Inventory
1.  Vons Pharmacy #3160        8891 Atlanta Ave Huntington Beach, CA 92646   .61 miles   Out Of Stock
2.  Walmart Inc #10-5601       21132 Beach Blvd Huntington Beach, CA 92648   1.05 miles  Out Of Stock
3.  CVS Pharmacy, Inc #09483   10011 Adams Ave  Huntington Beach, CA 92646   1.92 miles  Out Of Stock
...         ...                                 ...                              ...           ...

I have seen some examples online where they convert string data to a dataframe but the data looks to be in csv format: How to convert a string to a dataframe in Python

Any help would be greatly appreciated. I will learn a lot from your advice. Thanks! =)

1 Answer

If you save/return the result instead of just printing:

        ...
        result = listElement.text
        driver.close()
        
        return result

Then you can split() and reshape() into a dataframe:

import numpy as np

# one list element per line of scraped text
result = ff.test().split('\n')

# each pharmacy spans six consecutive lines
columns = ['number', 'pharmacy', 'address', 'distance', 'status', 'view']
data = np.array(result).reshape(len(result)//len(columns), len(columns))

df = pd.DataFrame(data=data, columns=columns).drop(columns=['number', 'view'])

Output:

     pharmacy                            address                                        distance    status
0    VONS PHARMACY #3160                 8891 Atlanta Ave • Huntington Beach, CA 92646  0.61 miles  Out Of Stock
1    Walmart Inc #10-5601                21132 Beach Blvd • Huntington Beach, CA 92648  1.05 miles  Out Of Stock
...  ...                                 ...                                            ...         ...
48   Costco Wholesale Corporation #1110  7562 Center Ave • Huntington Beach, CA 92647   5.96 miles  Out Of Stock
49   Aloha Pharmacy #0596962             15611 Brookhurst St • Westminster, CA 92683    5.99 miles  Out Of Stock
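
To try the split-and-reshape step without launching a browser, here is a minimal sketch that runs on a hard-coded string. The sample text, the "View details" label for the sixth line, and the helper name text_to_frame are assumptions made for illustration; the real page text may differ. The length check is an addition: reshape fails with a less obvious message if the number of lines is not a multiple of six, which could happen if the page layout changes.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mimicking listElement.text: six lines per pharmacy.
# The "View details" line is a guess at what the 'view' column holds.
sample_text = "\n".join([
    "1.", "VONS PHARMACY #3160",
    "8891 Atlanta Ave • Huntington Beach, CA 92646",
    "0.61 miles", "Out Of Stock", "View details",
    "2.", "Walmart Inc #10-5601",
    "21132 Beach Blvd • Huntington Beach, CA 92648",
    "1.05 miles", "Out Of Stock", "View details",
])

columns = ['number', 'pharmacy', 'address', 'distance', 'status', 'view']


def text_to_frame(text, columns):
    """Split text into lines and build one row per len(columns) lines."""
    lines = text.split('\n')
    if len(lines) % len(columns) != 0:
        raise ValueError(
            f"got {len(lines)} lines, not a multiple of {len(columns)}"
        )
    # -1 lets NumPy infer the number of rows from the column count
    data = np.array(lines).reshape(-1, len(columns))
    return pd.DataFrame(data, columns=columns)


df = text_to_frame(sample_text, columns).drop(columns=['number', 'view'])
print(df)
```

With real data you would pass the string returned by ff.test() in place of sample_text.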

3 Comments

Ty sir @tdy! Much appreciated for your help. Can you please explain what this line of code is doing? data = np.array(result).reshape(len(result)//len(columns), len(columns))
@AbleArcher You're welcome! That line converts result from a python list into a 2-D numpy array. In the original list, every 6 elements was a new pharmacy. The new numpy array reshapes the 1-D list into 2-D so that each row contains 1 pharmacy with 6 columns of info.
Awesome! So grateful for your response. =)
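
For anyone else wondering about the reshape line discussed in the comments above, here is a small sketch of the same idea on a toy list. The list contents and the variable names are made up for illustration. It shows that reshaping a flat list into rows of a fixed width gives the same grouping as slicing the list into consecutive chunks.

```python
import numpy as np

# Toy flat list: every 3 elements belong to one record
flat = ['a1', 'a2', 'a3', 'b1', 'b2', 'b3']
width = 3

# Same pattern as in the answer: rows = total length // row width
reshaped = np.array(flat).reshape(len(flat) // width, width)

# Equivalent plain-Python grouping using slices
chunked = [flat[i:i + width] for i in range(0, len(flat), width)]

print(reshaped.tolist())
print(chunked)
```

Both print statements show two rows of three items each, one row per record.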
