
Questions tagged [web-scraping]

Web scraping is the process of extracting specific information from websites that do not readily provide an API or other methods of automated data retrieval. Questions about "How To Get Started With Scraping" (e.g. with Excel VBA) should be *thoroughly researched* as numerous functional code samples are available. Web scraping methods include 3rd-party applications, development of custom software, or even manual data collection in a standardized way.

web-scraping
1 vote
1 answer
58 views

Scraping text in whitebox

I am trying to collect some Dutch historical election data. Below you see the code I have been using. I still need to figure out how to iterate the process for every 'Gemeente', but my main problem ...
Jasper • 115
-4 votes
0 answers
28 views

Extracting skill requirements from LinkedIn job postings [closed]

In the image we can see its tag name and tag attributes, but I am unable to extract it. I have tried the following and several other possible tags and attributes, but I am still getting ...
Haris Nazar
-3 votes
0 answers
31 views

Integrating web scraping and LLMs [closed]

I wanted to extract some information about a specific drug (let's say Rolvedon) from this site. I tried using BeautifulSoup and Scrapy, but they seem to be very format-dependent. I want the code to be ...
Mandvi Shukla
-1 votes
0 answers
20 views

Scrape live appearing elements

So I have a website I am scraping data from, and it has live-appearing elements that I need to keep fetching. I can see them on screen and get them as HTML via Inspect. However, I've been searching for hours ...
DeviEnd
0 votes
0 answers
6 views

Preparing text data for RAFT implementation

I want to use RAFT (Retrieval Augmented Fine Tuning) to build a smart chatbot. My data consists of scraped text from multiple websites. Should I transform it all to QAD format? If so, is there a way to ...
yasmina hachhouch
0 votes
0 answers
40 views

OECD Package Malfunctioning

Does anyone know if the OECD 0.2.5 library still works? As of July 2, the OECD migrated its databases to a new portal, and now all the data I previously downloaded is unavailable. Here is an example: ...
silent_hunter
1 vote
1 answer
34 views

Handling unavailable price element with another element

So, I've made this web scraping script for Cypress. It goes to a shop website and scrapes the listed products (product titles and prices). Namely, in the case of certain products, there is no price ...
No Tools No Craft
-1 votes
0 answers
36 views

Scraping a Cloudflare-protected website with Puppeteer

The website uses some bot protection that asks you to solve some challenges before redirecting to the actual page I need to scrape. The thing is, with Puppeteer it seems like I can pass these challenges (...
Gabriele Passoni
-1 votes
1 answer
33 views

Page loads in a browser but gives a 404 error in the Python requests library

I've seen similar questions, but none of the solutions work for my case. I found a link that allows me to download a CSV file with the data in a Tableau dashboard. When I open this link in a browser, it ...
just_learning
1 vote
1 answer
27 views

Getting unexpected/not_present elements/tags while scraping in Node.js with Cheerio

I am scraping and parsing the content of a web page (https://www.mydealz.de/new). The structure is as follows: <div class="threadGrid-title"> <strong><a href="">...
Muhammad Kazim
0 votes
2 answers
14 views

Need ScrapingHub/Splash Advice [closed]

I am a CSE student with zero knowledge of Docker. I am working on a project (an online app for product analysis) that requires web scraping (of Amazon reviews). I got to know about the ScrapingHub/...
Hue hue
0 votes
0 answers
21 views

Selenium XPATH for Google general search - how to improve results

I wrote a Python program to run a Google search for each company name in an Excel sheet. However, I get a lot more search results when manually searching for the company names on Google. I suspect it's ...
evenevaa
0 votes
1 answer
25 views

Web scraping with puppeteer - hidden element

I am scraping a website, and I can't seem to automate the pressing of a 'button' that has 'Create Order' on it, even though it is perfectly clickable manually. Here is the HTML element: <a id="...
wizzrdcode01
2 votes
1 answer
49 views

Defined rules are not called in Scrapy

I am currently working with the Python library Scrapy, and I am inheriting from CrawlSpider so that I can override/define custom Rules. I have defined rules that should block all URLs with auth/ ...
Vlajic Stevan
0 votes
0 answers
20 views

Error When Publishing to LinkedIn via API - Code: 422

I'm encountering an error when publishing to LinkedIn via their API. Here are the details: Error code: 422 Response: { "message": "ERROR :: /author :: \"urn:li:person:aba2390d-...
usrmetamask
