
I'm using Selenium in Python and trying to click the "See all Properties" button to get to the next web page, where all the properties will be listed and I can easily scrape the data.

Here's what the situation looks like:

This is the site: https://www.magicbricks.com/

The element of the button is shown:

<a href="javascript:void(0);" class="mb-home__section__title--anchor-see-all push-right" onclick="fireDynamicPropSeeAllGTM(event,'Owner properties see all','ownerProp');seeAppPropertiesOpenUrl('ownerProp');">
    See all Properties
</a>

Right now my script just selects the action (Buy, Rent, etc.) and the city the user wants to view properties in. After that I have to make Selenium click the "See all Properties" button so that I can get a proper list of properties, which will be easier to scrape.

Now, here is how I tried to solve the problem:

try:
    # Get the element
    wait = WebDriverWait(driver, 10)
    see_all_properties = wait.until(
        EC.visibility_of_element_located((By.XPATH, "/html/body/div[5]/div[1]/div[10]/div/section/div[1]/a"))  # XPATH from browser inspect menu
    )

    # Get the original opened tab
    original_tab = driver.current_window_handle

    # Click the button
    driver.execute_script("arguments[0].click();", see_all_properties)
    print("->Successfully clicked the button using JavaScript.")
    print("->Seeing all popular properties in your area...")

    # Wait for the new tab
    wait.until(EC.number_of_windows_to_be(2))

    # Loop through all the tabs and make the new tab primary
    for tab in driver.window_handles:
        if tab != original_tab:
            driver.switch_to.window(tab)
            break

except Exception as e:
    print(f"->Error: {e}")

And these are the errors I get:

--Enter any of the actions(case-sensitive) - ['Buy', 'Rent', 'PG', 'Plot', 'Commercial', 'Post Free Property Ad']: Buy
--Enter your state: Chandigarh
->Hovering over state selector element...
->Successfully clicked on Chandigarh.
->Error: Message: 
Stacktrace:
#0 0x59391a14cfba <unknown>
#1 0x593919bd16d0 <unknown>
#2 0x593919c232aa <unknown>
#3 0x593919c23541 <unknown>
#4 0x593919c716c4 <unknown>
#5 0x593919c48e5d <unknown>
#6 0x593919c6eb54 <unknown>
#7 0x593919c48c03 <unknown>
#8 0x593919c157a8 <unknown>
#9 0x593919c16421 <unknown>
#10 0x59391a111b28 <unknown>
#11 0x59391a11587f <unknown>
#12 0x59391a0f9c49 <unknown>
#13 0x59391a116405 <unknown>
#14 0x59391a0df4ff <unknown>
#15 0x59391a13a258 <unknown>
#16 0x59391a13a432 <unknown>
#17 0x59391a14bfa3 <unknown>
#18 0x7218daea27f1 <unknown>
#19 0x7218daf33b5c <unknown>

Note that I have already tried:

  • Checking my web driver version

  • ActionChain clicking method

  • Basic driver.find_element(...).click() method

  • Switching tabs so that selenium doesn't give error when there are two tabs

When I click that button manually in my browser, a new tab opens which contains the contents I want to scrape.

Any help would be extremely appreciated.

4 Comments
  • Better to create a minimal working code example so everyone can simply copy and test it, and use it for modifications (to test some solutions). Commented Sep 28 at 18:20

  • Do you run it without --headless to see what the browser is doing? Commented Sep 28 at 18:20

  • I created my own minimal working code, and see_all_properties.click() works for me without using JavaScript. Commented Sep 28 at 22:46

  • "arguments[0].click();" also works for me - but only on Chrome. Firefox has some problem. Commented Sep 28 at 23:12

2 Answers


The problem is that the page is created dynamically, and there's some intermediate state where that link exists but isn't completely set up yet. I ran this simple line several times and it kept failing:

wait.until(EC.element_to_be_clickable((By.XPATH, "//a[text()='See all Properties']"))).click()

I came up with a solution that works. There are actually three different links that contain the string "See all Properties". I wait for the count to reach 3, then click the right (first) one. Working code is below:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.select import Select
from selenium.webdriver.support.wait import WebDriverWait

url = 'https://www.magicbricks.com/'
driver = webdriver.Chrome()
driver.maximize_window()
driver.get(url)

wait = WebDriverWait(driver, 10)
wait.until(lambda d: len(d.find_elements(By.XPATH, "//a[text()='See all Properties']")) == 3)
driver.find_element(By.XPATH, "//a[text()='See all Properties']").click()

wait.until(EC.number_of_windows_to_be(2))

for tab in driver.window_handles:
    if tab != driver.current_window_handle:
        driver.switch_to.window(tab)
        break

for prop in wait.until(EC.visibility_of_all_elements_located((By.CSS_SELECTOR, "h2"))):  # 'prop' avoids shadowing the built-in 'property'
    print(prop.text)

NOTE: You can define wait once and reuse it throughout the script, which saves code and the small cost of re-instantiating it.


3 Comments

Why do you have to wait for all three to load? What I did was wait for that exact element to load, using EC.visibility_of_element_located((By.XPATH, "/html/body/div[5]/div[1]/div[10]/div/section/div[1]/a")). Why does my code fail, given that I also waited for the exact button to load using its XPath? Also, thank you for the advice at the end.
Because waiting for just that one wasn't working; the page wasn't fully loaded yet. This is just one way I found to accomplish this... there are likely others. I found that waiting for the three links to load was enough to get the full page to load and work, and it was simple to implement.
If this or any other answer was useful please upvote it. Once you find the answer to your question, please mark it as accepted so the question isn't left unanswered.

I'd probably not use Selenium here. The data is in the script tags as JSON (there is also a URL that returns the JSON directly, but it was giving an error and I didn't want to dig into debugging it).

But this essentially loops through the pages and stops once results start repeating. It will take a while to run, as it pulls 30 listings at a time:

import requests
from bs4 import BeautifulSoup
import re
import json
import pandas as pd


url = 'https://www.magicbricks.com/property-for-sale/residential-real-estate'
headers = {
    'user-agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/140.0.0.0 Safari/537.36'
    }


# Regex pattern to find and extract the JSON after the variable
pattern = r'window\.SERVER_PRELOADED_STATE_\s*=\s*(\{.*?\});'

data = []
seen_ids = set()

continueLoop = True
while continueLoop:
    for p in range(1, 99999):
        if not continueLoop:
            break

        print(f'Page: {p}')

        payload = {
            'proptype': 'Multistorey-Apartment,Builder-Floor-Apartment,Penthouse,Studio-Apartment,Residential-House,Villa',
            'bedrooms': '11701,11702',
            'cityName': 'Bangalore',
            'page': f'{p}',
            'groupstart': f'{(p - 1) * 30}',
            'offset': '0',
            'maxOffset': '552',
            'sortBy': 'premiumRecent',
            'postedSince': '-1',
            'isNRI': 'Y',
            'multiLang': 'en',
            'category': 'B',
            'parameter': 'rel',
            'hideviewed': 'N',
            'ListingsType': 'I',
            'filterCount': '3',
            'incSrc': 'Y',
            'fromSrc': 'homeSrc'
        }

        response = requests.get(url, params=payload, headers=headers)
        html = response.text
        soup = BeautifulSoup(html, 'html.parser')

        scripts = soup.find_all('script')

        for script in scripts:
            content = script.string or script.text
            match = re.search(pattern, content, re.DOTALL)
            if match:
                json_str = match.group(1)
                try:
                    jsonData = json.loads(json_str)['searchResult']
                except Exception as e:
                    print(f"Error parsing JSON: {e}")
                    continue

                # Detect duplicate entries
                new_ids = set()
                for item in jsonData:
                    unique_id = item.get('encId')
                    if unique_id in seen_ids:
                        continue  # Already seen
                    new_ids.add(unique_id)
                    data.append(item)

                if not new_ids:
                    print("No new listings — ending loop.")
                    continueLoop = False
                else:
                    seen_ids.update(new_ids)
                    print(f'{len(seen_ids)}')

                break
            
            
df = pd.DataFrame(data)

Sample Output:

print(df.head())
                      encId    possStatusD  ... brokerConnect smartDiaryApiTag
0  y1qzFa+chQ5zpSvf+uAgZw==  Ready to Move  ...           NaN              NaN
1  FYvyYAw/nkRzpSvf+uAgZw==  Ready to Move  ...           NaN              NaN
2  ozkrlLIHSYBzpSvf+uAgZw==  Ready to Move  ...           NaN              NaN
3  gwOp2B/7mbZzpSvf+uAgZw==  Ready to Move  ...           NaN              NaN
4  etIxbx5cNOtzpSvf+uAgZw==  Ready to Move  ...           NaN              NaN

[5 rows x 241 columns]
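The key step in this approach is the regex that pulls the preloaded JSON out of the page's script tag. Here is a minimal, self-contained sketch of just that extraction, using a made-up miniature of the `window.SERVER_PRELOADED_STATE_` assignment (the real object on the site is far larger):

```python
import re
import json

# Hypothetical sample of what a <script> tag on the page might contain;
# only the shape of the assignment matters here, not the field values.
content = """
    window.SERVER_PRELOADED_STATE_ = {"searchResult": [
        {"encId": "abc==", "possStatusD": "Ready to Move"},
        {"encId": "def==", "possStatusD": "Under Construction"}
    ]};
    someOtherCall();
"""

# Same pattern as in the answer: non-greedily capture the object literal
# assigned to window.SERVER_PRELOADED_STATE_, up to the closing "};".
pattern = r'window\.SERVER_PRELOADED_STATE_\s*=\s*(\{.*?\});'

match = re.search(pattern, content, re.DOTALL)
listings = json.loads(match.group(1))['searchResult']
print([item['encId'] for item in listings])
```

Note that the non-greedy `.*?` works here because the object literal ends with the first `};` in the script; if the JSON ever contained that sequence inside a string value, the match would be cut short and a proper parser would be needed instead.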

