Selenium gives you references to elements that live in the browser's memory. When `click()` loads a new page, the browser removes those elements from memory, and Selenium can no longer use the old references (it raises `StaleElementReferenceException`). This happens even if you go back to the same page, because the browser recreates the elements and they may end up in different places in memory.
A similar problem can occur when `click()` only changes something on the page (e.g. folds or unfolds a list), because that can also move elements in memory.
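To see the problem in isolation, here is a minimal sketch that checks whether a reference is still usable. The helper name `is_stale` is hypothetical (it is not part of the original code); it only relies on Selenium raising `StaleElementReferenceException` when a reference is used after its element is gone. The import fallback is there only so the snippet can be read and run without Selenium installed.

```python
try:
    from selenium.common.exceptions import StaleElementReferenceException
except ImportError:
    # Fallback so this sketch can be imported without Selenium installed.
    class StaleElementReferenceException(Exception):
        pass


def is_stale(element):
    """Return True if `element` no longer points at a live element (hypothetical helper)."""
    try:
        element.is_enabled()  # any call on a stale reference raises
    except StaleElementReferenceException:
        return True
    return False
```

Calling `is_stale(div)` on a reference that was obtained before `click()` loaded a new page would be expected to return `True`.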
There are two methods:
- harder: count all `div` elements at the start, and in every iteration of the loop get all `div` elements again and use the index to get the next element: `div = divs[index]`.
divs = driver.find_elements('xpath', '//div[@class="topmenu"]/div')
print('len(divs):', len(divs))

for index in range(len(divs)):
    # get fresh references in every iteration
    divs = driver.find_elements('xpath', '//div[@class="topmenu"]/div')
    div = divs[index]
    # ... code ...
    # go back to main page
    driver.get("https://www.elizabethnj.org/Directory.aspx")
- simpler: in your code `click()` only loads new pages, so you can first get all `href` values as strings and later use `.get(url)` to load the pages with contacts. This way it doesn't need to go back to the main page, so it may take less time.
BTW: there are two other problems.
- some `div` are empty (without text and without a link to a page), so you have to skip them (e.g. you can check whether the text is empty).
- some pages don't have a table with contacts, and then the link to the main page is in `div[2]` instead of `div[3]`. It is simpler to use `driver.back()` to go back to the previous page. Or you could simply run `driver.get(url)` again to load the main page.
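The second problem can also be handled without hard-coding `[2]` or `[3]`. The sketch below is only an illustration under the assumption that the link back is always in the last `div` of `CityDirectoryLeftMargin`; the function name `go_back` and its return values are hypothetical. It uses `find_elements` (plural), which returns an empty list instead of raising an error when nothing matches, and falls back to `driver.back()`.

```python
LEFT_MARGIN_DIVS = '//*[@id="CityDirectoryLeftMargin"]/div'


def go_back(driver):
    """Click the link in the last div if there is one, otherwise use browser history."""
    divs = driver.find_elements('xpath', LEFT_MARGIN_DIVS)
    if divs:
        # assumption: the link back is in the last div, whatever its position
        spans = divs[-1].find_elements('xpath', './span')
        if spans:
            spans[0].click()
            return 'link'
    # nothing to click, so behave like the browser's back button
    driver.back()
    return 'history'
```

In the loop it could be called as `go_back(driver)` in place of the hard-coded `div[3]` lookup.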
Working code for harder version:
import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.maximize_window()
driver.get("https://www.elizabethnj.org/Directory.aspx")
time.sleep(3)

divs = driver.find_elements('xpath', '//div[@class="topmenu"]/div')
print('len(divs):', len(divs))

#for index, div in enumerate(divs):
#    print(f'{index} >>>', div.text)

for index in range(len(divs)):
    # get fresh references in every iteration
    divs = driver.find_elements('xpath', '//div[@class="topmenu"]/div')
    div = divs[index]

    # skip empty divs
    text = div.text.strip()
    if not text:
        print(f'{index} >>> --- empty ---')
        continue

    print(f'{index} >>>', div.text)
    div.click()
    time.sleep(2)

    contacts = driver.find_elements('xpath', '//*[@id="cityDirectoryDepartmentDetails"]/tbody/tr')
    for contact in contacts:
        print('   contact:', contact.text)

    print('<<< back')
    # ... some pages don't have contacts and link is in `[2]` instead of `[3]`
    #back = driver.find_element('xpath', '//*[@id="CityDirectoryLeftMargin"]/div[3]/span')
    #back.click()
    # ... or ...
    driver.back()
    # ... or ...
    #driver.get("https://www.elizabethnj.org/Directory.aspx")
    time.sleep(2)

input('Press ENTER to close')
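The "find again, then use the index" pattern from the harder version can be wrapped in a small generator. This is only a sketch; the name `iter_fresh` is hypothetical and not part of Selenium. It takes the `driver` and an XPath, counts the elements once, and then finds them again before every step, so each yielded element is a fresh reference.

```python
def iter_fresh(driver, xpath):
    """Yield (index, element), finding the elements again before every step."""
    count = len(driver.find_elements('xpath', xpath))
    for index in range(count):
        # fresh references, because the previous ones may be stale by now
        elements = driver.find_elements('xpath', xpath)
        if index >= len(elements):
            break  # the page changed and now has fewer elements
        yield index, elements[index]
```

It could be used as `for index, div in iter_fresh(driver, '//div[@class="topmenu"]/div'):`. The loop body still has to return to the main page (and wait for it to load) before the next iteration, because the generator searches the page that is currently loaded.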
Working code for simpler version:
import time
from selenium import webdriver

driver = webdriver.Chrome()
driver.maximize_window()
driver.get("https://www.elizabethnj.org/Directory.aspx")
time.sleep(3)

divs = driver.find_elements('xpath', '//div[@class="topmenu"]/div')
print('len(divs):', len(divs))

# --- first get all HREF as strings ---

data = []

for index, div in enumerate(divs):
    # skip empty divs
    text = div.text.strip()
    if not text:
        print(f'{index} >>> --- empty ---')
        continue

    url = div.find_element('xpath', './/a').get_attribute('href')
    print(f'{index} >>>', text)
    data.append( (text, url) )

# --- next visit all pages (without going back to main page) ---

for index, (text, url) in enumerate(data):
    print(f'{index} >>>', text)
    driver.get(url)
    time.sleep(2)

    contacts = driver.find_elements('xpath', '//*[@id="cityDirectoryDepartmentDetails"]/tbody/tr')
    for contact in contacts:
        print('   contact:', contact.text)

    # it doesn't need to go back to main page

input('Press ENTER to close')
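The step that collects the links can be made a little more defensive. The sketch below is a hypothetical helper (`collect_links` is not from the original code) that also skips a `div` which has text but no `<a>` inside. It uses `find_elements` (plural), which returns an empty list when nothing matches, whereas `find_element` would raise `NoSuchElementException`.

```python
def collect_links(divs):
    """Return (text, url) pairs for items that have both text and a link."""
    data = []
    for div in divs:
        text = div.text.strip()
        # an empty list (not an exception) when the div has no <a>
        links = div.find_elements('xpath', './/a')
        if not text or not links:
            continue
        url = links[0].get_attribute('href')
        if url:  # skip <a> without href
            data.append((text, url))
    return data
```

With this helper the first loop could be reduced to `data = collect_links(divs)`, and the second loop stays the same.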
In short:
- You can't use `click()` in a loop, because `find_elements` gives references to objects in memory. When you `click()`, the browser removes these objects to load the new page, and later the references can't find the objects in memory (even if you load the same page again). First get all URLs as strings, and later use a `for`-loop which loads pages with `.get(url)` instead of `click()`.
- If you have to `click()` elements on the page, the browser may move objects to different places in memory, and the references may stop working. You may have to count all `div` (`len(divs)`) and later run a loop like `for index in range(len(divs)):` which always runs `divs = driver.find_elements(...)` to get new references, and then uses `divs[index]` to work with the next element.
- There are two other problems: (1) some `div` are empty and have to be skipped (when they have an empty `.text`), (2) some pages don't have a table with contacts, so the link to the previous page is in `div[2]` instead of `div[3]`. You could get all `div` and use `[-1]` in Python, but you can also use `driver.back()` (the back button in the browser) or simply run `get(url)` to load the main page again.