2

For my project I need to extract the CSS selector of an element that I find by parsing a page. I navigate to the page with Selenium, then parse it with Beautiful Soup in Python and check whether it contains any elements whose CSS selector I need. For example, I might look for input tags with the id "print":

soup.find_all('input', {'id': 'print'})

If I find such an element, I want to extract its CSS selector, something like "input#print". I don't search only by id; I also use combinations of classes and regular expressions. Is there any way to achieve this?

7
  • CSS selectors are used to find elements. If you can already find the elements you want with Beautiful Soup, what do you need the CSS selectors for? Commented Mar 8, 2018 at 14:49
  • @Ian I first find the selector and then use it with Puppeteer. For example, I know that my webpage has a print button, so I assume that either its id or its class name contains "print". I then use a regex to find all buttons whose id or class name contains "print". If I find one, I need its selector to access it with Puppeteer (headless Chrome). For example, my program should find the button even if its id is "randomtextprintrandom", because the id contains "print". The match can also be on a class name. Commented Mar 8, 2018 at 14:55
  • Are you just using this script to find these selectors once, to make it easier to write your Puppeteer script? Or will this be done every time you use Puppeteer? Are you actually using Puppeteer to interact with anything outside of the HTML document, e.g., the browser chrome? Commented Mar 8, 2018 at 15:10
  • @Ian I will use this script only to find the selectors, but I have to find them across multiple pages of the same form, so the script will have to interact with the pages as well. Commented Mar 8, 2018 at 15:23
  • If this script is going to do the same interaction in order to find all of the selectors, what is left for Puppeteer to do? Commented Mar 8, 2018 at 15:27

2 Answers

4

Try this.

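from scrapy.selector import Selector
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By

link = "https://example.com"
# normalize-space() returns the whitespace-trimmed text of the first matching node
xpath_desire = "normalize-space(//input[@id = 'print'])"

# Selenium 4 style: the driver path is passed through a Service object
path1 = "./chromedriver"
driver = webdriver.Chrome(service=Service(path1))
driver.get(link)

# Grab the rendered HTML inside <body> and hand it to Scrapy's Selector
temp_test = driver.find_element(By.CSS_SELECTOR, "body")
elem = temp_test.get_attribute('innerHTML')
driver.quit()

value = Selector(text=elem).xpath(xpath_desire).extract()[0]
print(value)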
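
The question also mentions matching an id or class name that merely contains a keyword such as "print". A minimal sketch of that check follows; `mentions_keyword` is a hypothetical helper (not part of Selenium, Scrapy, or Beautiful Soup), and it assumes the attributes arrive as a dict shaped like Beautiful Soup's `tag.attrs`.

```python
import re


# Hypothetical helper: not part of any library used in this thread.
def mentions_keyword(attrs, keyword="print"):
    """Return True if the id or any class name contains `keyword`."""
    pattern = re.compile(re.escape(keyword), re.IGNORECASE)
    classes = attrs.get("class") or []
    if isinstance(classes, str):  # some parsers keep class as one string
        classes = classes.split()
    candidates = [attrs.get("id") or ""] + list(classes)
    return any(pattern.search(value) for value in candidates)


# Attribute dicts shaped like Beautiful Soup's `tag.attrs` (sample data).
print(mentions_keyword({"id": "randomtextprintrandom"}))     # True
print(mentions_keyword({"class": ["btn", "btn-print"]}))     # True
print(mentions_keyword({"id": "submit", "class": ["btn"]}))  # False
```

With Beautiful Soup, a predicate like this could be passed to `find_all` as a function filter, for example `soup.find_all(lambda tag: tag.name == "button" and mentions_keyword(tag.attrs))`.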

1

OK, I am totally new to Python, so I am sure there is a better answer for this, but here are my two cents :)

import requests
from bs4 import BeautifulSoup

url = "https://stackoverflow.com/questions/49168556/extract-css-selector-for-an-element-with-selenium"
element = 'a'
idName = 'nav-questions'
page = requests.get(url)
soup = BeautifulSoup(page.content, 'html.parser')
tags = soup.find_all(element, id = idName)

if tags:
    for tag in tags:
        # tag.get('class') is a list of class names, or None if the tag has no class
        getClassNames = tag.get('class') or []
        classNames = ''.join('.' + x for x in getClassNames)
        print(element + '#' + idName + classNames)
else:
    print(':(')

This would print something like:

a#nav-questions.-link.js-gps-track
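
The same idea can be wrapped in a function that also copes with elements lacking an id or a class. This is only a sketch: `build_selector` is a hypothetical helper, it takes a tag name plus a dict shaped like Beautiful Soup's `tag.attrs`, and it assumes the id and class names are plain identifiers that need no CSS escaping.

```python
# Hypothetical helper: builds "tag#id.class1.class2" from a tag name and an
# attribute mapping shaped like Beautiful Soup's `tag.attrs`.
# Assumption: ids and class names contain no characters that need escaping.
def build_selector(name, attrs):
    selector = name
    element_id = attrs.get("id")
    if element_id:
        selector += "#" + element_id
    classes = attrs.get("class") or []
    if isinstance(classes, str):  # some parsers keep class as one string
        classes = classes.split()
    selector += "".join("." + cls for cls in classes)
    return selector


# Sample attribute dicts; with Beautiful Soup these would be tag.name and tag.attrs.
print(build_selector("input", {"id": "print"}))
# input#print
print(build_selector("a", {"id": "nav-questions", "class": ["-link", "js-gps-track"]}))
# a#nav-questions.-link.js-gps-track
print(build_selector("button", {"class": "btn btn-print"}))
# button.btn.btn-print
```

For a tag found with Beautiful Soup, the call would be `build_selector(tag.name, tag.attrs)`. Note that a selector built only from classes may match more than one element on the page.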
