-4 votes
1 answer
46 views

I am new to coding and am working on a scraping problem a friend sent me. I have set up this scraping code according to the assignment but am getting error code 403, and it's not telling me what's ...
asked by Reefman411
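For 403 errors like the one above, the most common cause is the server blocking the default HTTP client's User-Agent. A minimal sketch using only the standard library; the header values and any URL you pass are illustrative:

```python
import urllib.request

def browser_headers():
    # Headers that mimic an ordinary browser request; many sites return
    # 403 to the default Python-urllib/requests User-Agent string.
    return {
        "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64)",
        "Accept": "text/html,application/xhtml+xml",
    }

def fetch(url):
    # Wrap the URL in a Request object so the custom headers are sent.
    req = urllib.request.Request(url, headers=browser_headers())
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

If the 403 persists even with browser-like headers, the block is usually cookie- or JavaScript-based and needs a browser-automation tool instead.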
1 vote
1 answer
30 views

I have from selectolax.lexbor import LexborHTMLParser html = """ <a rel="mw:WikiLink" href="Reconstruction" title="Proto-Germanic"> <span ...
asked by Akira (2,876)
-1 votes
0 answers
34 views

I’m trying to scrape opening and closing hours (Monday to Sunday) for about 2000 clinic entries. I already have a CSV file with ~2,000 rows containing the clinic name and clinic country. My goal is: ...
asked by Sadeesha Jay
2 votes
1 answer
41 views

I'm building a web crawler and need to parse anchor tags to extract URLs. However, I'm running into issues identifying whether an href attribute contains a full URL path or a relative/internal path. ...
asked by user20617578
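For the full-vs-relative href question above, the standard library's urllib.parse already covers both the test and the resolution; a minimal sketch:

```python
from urllib.parse import urljoin, urlparse

def is_absolute(href):
    # A full URL carries both a scheme (http/https) and a host;
    # relative/internal paths have neither.
    parts = urlparse(href)
    return bool(parts.scheme and parts.netloc)

def resolve(base_url, href):
    # urljoin leaves absolute URLs untouched and resolves relative
    # hrefs against the page they were found on.
    return urljoin(base_url, href)

# resolve("https://example.com/docs/", "intro.html")
# -> "https://example.com/docs/intro.html"
```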
Advice
0 votes
4 replies
100 views

I am trying to find a public or unofficial Instagram API to build a downloader. There are many services that can already fetch videos, photos, Stories and Reels, so the needed endpoints clearly exist. ...
asked by Roman Cherepakha
0 votes
1 answer
86 views

I am scraping the Dead by Daylight Fandom wiki (specifically TOME pages, e.g., https://deadbydaylight.fandom.com/wiki/Tome_1_-_Awakening) to extract memory logs. The goal is to extract the Memory ...
asked by zeromiedo
3 votes
1 answer
59 views

I am trying to search for elements on a webpage and have used various methods, including text and XPath. It seems that the timeout option does not work the way I expected, and no exception is raised ...
asked by Shankboy
0 votes
2 answers
213 views

From the structure below I only want the value of the href attribute. But rec_block is returning the h5 element without its children, so basically <h5 class="series">Recommendations</h5>. <...
asked by Emby (1)
3 votes
1 answer
138 views

I'm making a tutorial on how to scrape with Scrapy. For that, I use Quarto/RStudio and the website https://quotes.toscrape.com/. For pedagogic purposes, I need to run a first crawl on the first page, ...
asked by Didier mac cormick
Advice
0 votes
5 replies
68 views

I want to know how I can get live Indian news feed data with minimal latency (30-40 seconds). I tried some RSS feeds, but they all deliver the data with some latency, so ...
asked by its m (47)
0 votes
0 answers
53 views

Camoufox browser window remains visible in WSL even when headless is set to "virtual". When headless is set to "virtual", the Camoufox browser window still appears on the screen in ...
asked by exlead (1)
1 vote
0 answers
88 views

I want to retrieve content from a web page. I tried the above method, but the error still occurs when the query string contains Chinese characters. Code: $json = Get-Content -Encoding utf8 -Path "./...
asked by Akira (33)
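The question above is PowerShell, but the underlying issue is generic: non-ASCII query values must be percent-encoded as UTF-8 before the request, or many servers reject the URL. A Python sketch of the encoding step (URL and parameter name are illustrative); in PowerShell itself, [uri]::EscapeDataString performs the same UTF-8 percent-encoding:

```python
from urllib.parse import urlencode, quote

def build_url(base, params):
    # urlencode percent-encodes each value as UTF-8, so Chinese (or any
    # non-ASCII) characters become %XX escapes the server can parse.
    return base + "?" + urlencode(params, quote_via=quote)

# build_url("https://example.com/search", {"q": "中文"})
# -> "https://example.com/search?q=%E4%B8%AD%E6%96%87"
```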
-4 votes
2 answers
76 views

I am trying to write code to give me BBFC film ratings. I am using selenium to do this but would be happy with any solution that works reliably. After a lot of work I finally came up with this code: #...
asked by Simd (21.5k)
0 votes
1 answer
225 views

This is my Python code, run on Ubuntu, to fetch and extract data from https://www.sofascore.com/. I created this test code before using it on an E2 device in my plugin # python3 -m venv venv # source venv/...
asked by RR-EB (55)
0 votes
1 answer
74 views

I'm working on integration tests for a web application that's running in a Docker container within our GitLab CI/CD pipeline. The application is a frontend that requires Kerberos/SPNEGO authentication ...
asked by ben green
0 votes
1 answer
79 views

I'm quite new to web scraping, and in particular to using Scrapy's spiders, pipelines... I'm getting a 202 status from some spider requests' responses, hence the page content is not available yet ...
asked by Manu310 (178)
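A 202 response generally means the server accepted the request but has not produced the page yet, so re-requesting after a delay is the usual remedy. In Scrapy this can be expressed with the built-in RetryMiddleware settings; a sketch for settings.py (the numeric values are illustrative):

```python
# settings.py -- treat 202 as retryable so the page is re-fetched later.
RETRY_ENABLED = True
RETRY_TIMES = 5                         # retry attempts per request
RETRY_HTTP_CODES = [202, 500, 502, 503] # add 202 to the default retry list
DOWNLOAD_DELAY = 2                      # seconds between requests
```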
-1 votes
1 answer
50 views

I have this Apps Script / Cheerio function that successfully scrapes the data I want from the url. The site only displays 25 entries at this url. I can find additional entries on subsequent pages (by ...
asked by zambonidude
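Once the pagination scheme of the target site is known, the follow-up page URLs can be generated up front and fetched in a loop. A sketch assuming a page query parameter; the parameter name is a guess and should be checked against the site's real next-page links:

```python
from urllib.parse import urlencode

def page_urls(base, pages, param="page"):
    # Build one URL per listing page: base?page=1, base?page=2, ...
    return [f"{base}?{urlencode({param: n})}" for n in range(1, pages + 1)]
```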
1 vote
0 answers
43 views

Problem I’m using Docusaurus with Typesense and the docsearch-typesense-scraper to index my documentation site. Everything runs fine — the sitemap is found, and the scraper produces records. However, ...
asked by Erwin (11)
0 votes
0 answers
189 views

I’m building a scraper to monitor the Meta (Facebook) Ads Library for new ads as soon as they start running. From inspecting network requests, I see that the Ads Library web app uses a GraphQL ...
asked by kiqueboat
1 vote
1 answer
45 views

(Website screenshot with search box visible.) This is the website: https://sa.ucla.edu/ro/public/soc. There is a dropdown menu for selecting the subject area where I need to type a subject, and I will receive ...
asked by Rohit Kasturi
0 votes
0 answers
128 views

I'm trying to download the Barchart data table from https://www.barchart.com/investing-ideas/ai-stocks using Excel VBA in a similar manner to the Python script in Automatic file downloading on Barchart....
asked by ateene (1)
-1 votes
1 answer
69 views

I’m using Python + Selenium + ChromeDriver to check a list of titles (from a CSV file) against an online library catalog. My script searches each title and tries to determine if a specific library has ...
asked by huda (1)
3 votes
2 answers
157 views

I am trying to read in a specific table from the US Customs and Border Protection's Dashboard on Southwest Land Border Encounters as a dataframe. The url is: https://www.cbp.gov/newsroom/stats/...
asked by Ari (2,023)
-2 votes
1 answer
118 views

I am webscraping WHO pages using the following code: pacman::p_load(rvest, httr, stringr, purrr) download_first_pdf_from_handle <- function(handle_id) { ...
asked by flâneur (341)
0 votes
0 answers
140 views

I’m working on a project in Laravel/Python where I want to fetch product information from Shein, but I’ve run into a major problem with ShareJump links. Here’s an example link I’m working with: http://...
asked by Mahmod Algeriany
1 vote
1 answer
127 views

I am a bit new to web scraping and trying to build a scraper to collect the title, text, and date from this archived page: from selenium import webdriver from selenium.webdriver.chrome.service import ...
asked by Kaitlin (83)
1 vote
2 answers
90 views

I'm using Selenium in Python and trying to click the "See all Properties" button to get to the next web page where all the properties will be listed and I can easily scrape the data. Here's ...
asked by Gurnoor Kalsi
0 votes
0 answers
320 views

My goal is to find out if a given user has liked any post of another profile. So the following question has to be answered: has user X liked any post on profile Y in the past 24 months? For ...
asked by a6i09per5f
-1 votes
2 answers
105 views

I'm trying to scrape the data off this site. The website shows charging stations; you can click each one to expand the accordion and see the data per charger. I am trying to use this ...
asked by NorthoftheWall
3 votes
1 answer
165 views

I'm working on a web scraping project in Python to collect data from a real estate website. I'm running into an issue with the addresses, as they are not always consistent. I've already handled simple ...
asked by Adamzam15
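For the inconsistent-address problem above, a common first step is normalizing both strings before comparison. A minimal sketch; the abbreviation table is a small illustrative sample, not a complete list:

```python
import re

# Small illustrative sample of street-type abbreviations; extend as needed.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road", "blvd": "boulevard"}

def normalize_address(address):
    # Lowercase, replace punctuation with spaces, collapse whitespace,
    # then expand known abbreviations token by token.
    cleaned = re.sub(r"[^\w\s]", " ", address.lower())
    tokens = [ABBREVIATIONS.get(t, t) for t in cleaned.split()]
    return " ".join(tokens)
```

Beyond this, fuzzy matching (e.g. edit distance on the normalized forms) handles the remaining spelling variation.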
1 vote
0 answers
69 views

I’m building a Make.com scenario like this: HTTP (fetch website HTML) → Text parser (extract elements) → Filter "only good links" → Array aggregator → further processing. Goal: I want ...
asked by Alex Lombardo
-1 votes
3 answers
278 views

I would like to scrape the 2nd table on the page seen below from the link https://fbref.com/en/comps/9/2023-2024/stats/2023-2024-Premier-League-Stats on Google Colab. But pd.read_html only gives me ...
asked by rian patel
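On fbref.com, tables after the first are widely reported to be shipped inside HTML comments, which is why pd.read_html on the raw page only sees one table (worth verifying on the live page). A standard-library sketch that pulls the commented-out table markup back out so it can be parsed normally:

```python
from html.parser import HTMLParser

class CommentTableExtractor(HTMLParser):
    """Collects table markup that the page hides inside HTML comments."""
    def __init__(self):
        super().__init__()
        self.hidden_tables = []

    def handle_comment(self, data):
        # Keep any comment body that contains a <table>.
        if "<table" in data:
            self.hidden_tables.append(data)

def extract_hidden_tables(html):
    parser = CommentTableExtractor()
    parser.feed(html)
    return parser.hidden_tables
```

Each returned string can then be handed to pd.read_html (wrapped in io.StringIO for newer pandas versions) to get a DataFrame.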
-1 votes
1 answer
123 views

I'm using Puppeteer and JS to write a web scraper. The site I'm scraping is pretty intense, so I need to use a local chrome instance and a residential proxy service to get it working. Here's my basic ...
asked by Alex (41)
2 votes
2 answers
191 views

Using the following code: library(rvest) read_html("https://gainblers.com/mx/quinielas/progol-revancha/", encoding = "UTF-8")|> html_elements(xpath= '//*[@id="...
asked by Alejandro Carrera
1 vote
2 answers
269 views

I am trying to extract bus prices between 2 cities in Ontario, Canada. I am using Selenium/Python to do this: The website is here and it has default cities and dates. Here is my Python code: from ...
asked by brooklin7
1 vote
2 answers
111 views

I'm a bit new to Selenium and am trying to build a webscraper that can select a dropdown menu and then select specific options from the menu. I've built the following code and it was working at one ...
asked by Kaitlin (83)
3 votes
1 answer
61 views

I'm rather new to using Beautiful Soup and I'm having some issues splitting some HTML correctly by looking only at HTML line breaks and ignoring other HTML elements such as changes in font color, etc. The ...
asked by James Brian
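For splitting on line breaks while ignoring inline tags like font changes, the standard library's HTMLParser is enough; a minimal sketch:

```python
from html.parser import HTMLParser

class BrSplitter(HTMLParser):
    """Splits rendered text at <br> tags; other tags (font, span, ...)
    do not start a new segment, only their text content is kept."""
    def __init__(self):
        super().__init__()
        self.segments = [""]

    def handle_starttag(self, tag, attrs):
        # Only <br> opens a new segment; all other tags are ignored.
        if tag == "br":
            self.segments.append("")

    def handle_data(self, data):
        self.segments[-1] += data

def split_on_br(html):
    parser = BrSplitter()
    parser.feed(html)
    return [s.strip() for s in parser.segments if s.strip()]
```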
1 vote
1 answer
237 views

I’ve been trying to scrape lottery results from a website that shows draws. The data is presented in a results table, but I keep running into strange issues where sometimes the numbers are captured ...
asked by Zuryab (11)
2 votes
1 answer
203 views

I'm trying to extract tables from this site: https://www.dnb.com/business-directory/company-information.beverage_manufacturing.br.html As you can see, the complete table has 14,387 rows and each page ...
asked by Alejandro Carrera
0 votes
0 answers
65 views

I'm trying to extract data from a website using Selenium. On random occasions, the page will do a client-side redirect with window.location. How can I disable this? I've tried redefining the property ...
asked by anon (697)
1 vote
1 answer
291 views

I set up a self-hosted Firecrawl instance and I want to crawl my internal intranet site (e.g. https://intranet.xxx.gov.tr/). I can access the site directly both from the host machine and from inside ...
asked by birdalugur
0 votes
1 answer
112 views

On this page I want to parse a few elements. I would like to get the text in circles and sometimes use an attribute value to click. That code doesn't return anything. With this code I want to get all attribute ...
asked by Rok Golob
2 votes
1 answer
118 views

This is my code as of now: from selenium import webdriver from webdriver_manager.chrome import ChromeDriverManager from selenium.webdriver.chrome.service import Service options = webdriver....
asked by Ahmad (139)
0 votes
1 answer
228 views

I am trying to use pytube (v15.0.0) to fetch the titles of YouTube videos. However, for every video I try, my script fails with the same error: HTTP Error 400: Bad Request. I have already updated ...
asked by Rohit Hake
0 votes
0 answers
226 views

I have a Node scraper which scrapes the HLS streaming URL using a Playwright browser, which gives the master playlist like: https://example.com/master.m3u8. That master playlist has a CORS ...
asked by Alsiro Mira
1 vote
2 answers
297 views

I'm trying to download a protected PDF from the New York State Courts NYSCEF website using Python. The URL looks like this: https://iapps.courts.state.ny.us/nyscef/ViewDocument?docIndex=...
asked by Daremitsu (655)
4 votes
2 answers
285 views

I’m trying to programmatically download the full “pubblicazione completa non certificata” PDFs of the Italian Gazzetta Ufficiale – Serie Generale for 1969 (for an academic article). The site has a ...
asked by Mark (1,801)
-2 votes
2 answers
153 views

I am using the following code. It successfully targets the correct url and node text. However, the data that is returned is incomplete as some of the fields (like previous close and open) are blank or ...
asked by Brad Horn (685)
0 votes
0 answers
72 views

How can I use ScrapingRobot’s API to scrape Google search results as structured JSON data (e.g., titles, URLs, snippets) instead of raw HTML? The main page of the website shows three types of "...
asked by AtiehCodes
0 votes
2 answers
176 views

I would like to scrape the problems from these Go (board game) books, and convert them into SGFs, if they aren't in that format already. For now, I would be satisfied with only taking the problems ...
asked by psygo (7,863)
