Questions tagged [web-scraping]
Web scraping is the process of extracting specific information from websites that do not readily provide an API or other methods of automated data retrieval. Questions about "How To Get Started With Scraping" (e.g. with Excel VBA) should be *thoroughly researched*, as numerous functional code samples are available. Web scraping methods include third-party applications, development of custom software, or even manual data collection in a standardized way.
web-scraping
51,246
questions
1
vote
1
answer
58
views
Scraping text in whitebox
I am trying to collect some Dutch historical election data. Below you see the code I have been using. I still need to figure out how to iterate the process for every 'Gemeente', but my main problem ...
-4
votes
0
answers
28
views
Extracting skill requirements from LinkedIn job postings [closed]
In the image we can see its tag name and tag attributes, but I am unable to extract it.
I have tried the code below and several other possible tags and attributes, but I am still getting ...
-3
votes
0
answers
31
views
Integrating web scraping and LLMs [closed]
I wanted to extract some information about a specific drug (let's say Rolvedon) from this site.
I tried using BeautifulSoup and Scrapy, but they seem to be very format-dependent. I want the code to be ...
-1
votes
0
answers
20
views
Scrape live appearing elements
I have a website I am scraping data from, and it has live-appearing elements that I need to keep collecting. I can see them on screen and get them as HTML via inspect. However, I've been searching for hours ...
0
votes
0
answers
6
views
Preparing text data for RAFT implementation
I want to use RAFT (Retrieval Augmented Fine Tuning) to build a smart chatbot. My data consists of scraped text from multiple websites. Should I transform it all to QAD format? If so, is there a way to ...
0
votes
0
answers
40
views
OECD Package Malfunctioning
Does anyone know if the OECD 0.2.5 library still works? As of July 2, the OECD migrated its databases to a new portal, and now all the data I previously downloaded is unavailable. Here is an example:
...
1
vote
1
answer
34
views
Handling unavailable price element with another element
I've written a web scraping script for Cypress. It goes to a shop website and scrapes the listed products (product titles and prices). However, in the case of certain products, there is no price ...
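One common way to handle a listing with no price is to look up the primary element and fall back to a second element when the first is absent. The sketch below illustrates that pattern in Python with the standard library rather than in Cypress. The class names `title`, `price`, and `availability`, the helper names, and the sample markup are all hypothetical and are not taken from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed sample markup; a real shop page will differ.
SAMPLE = """
<ul>
  <li class="product"><span class="title">Lamp</span><span class="price">19.99</span></li>
  <li class="product"><span class="title">Desk</span><span class="availability">Sold out</span></li>
</ul>
"""


def text_or_fallback(node, primary, fallback, default="n/a"):
    """Return the text of the primary span, else the fallback span, else default."""
    for css_class in (primary, fallback):
        found = node.find(f".//span[@class='{css_class}']")
        if found is not None and found.text:
            return found.text.strip()
    return default


def scrape_products(markup):
    """Collect (title, price-or-fallback) pairs from the sample markup."""
    root = ET.fromstring(markup)
    rows = []
    for item in root.findall(".//li[@class='product']"):
        title = text_or_fallback(item, "title", "title")
        price = text_or_fallback(item, "price", "availability")
        rows.append((title, price))
    return rows


products = scrape_products(SAMPLE)
```

The same idea should carry over to a browser-driven script: check whether the price element exists before reading it, and read the alternative element otherwise.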
-1
votes
0
answers
36
views
Scraping a Cloudflare-protected website with Puppeteer
The website uses some bot protection that asks you to solve some challenges before redirecting to the actual page I need to scrape. The thing is, with Puppeteer it seems like I can pass these challenges (...
-1
votes
1
answer
33
views
Page loads in a browser but gives 404 error in Python requests library
I've seen similar questions, but none of the solutions work for my case. I found a link that allows me to download a CSV file with the data in a Tableau dashboard. When I open this link in a browser, it ...
1
vote
1
answer
27
views
Getting unexpected/not_present elements/tags while scraping in Node.js with cheerio
I am scraping and parsing the content of a web page (https://www.mydealz.de/new). The structure is as follows.
<div class="threadGrid-title">
<strong><a href="">...
0
votes
2
answers
14
views
Need ScrapingHub/Splash Advice [closed]
I am a CSE student with zero knowledge about Docker. I am working on a project (an online app for product analysis) that requires web scraping (of Amazon reviews). I got to know about the ScrapingHub/...
0
votes
0
answers
21
views
Selenium XPATH for Google general search - how to improve results
I wrote a Python program to run a Google search for each company name in an Excel sheet. However, I get a lot more search results when manually searching for the company names on Google.
I suspect it's ...
0
votes
1
answer
25
views
Web scraping with Puppeteer - hidden element
I am scraping a website, and I can't seem to automate the pressing of a 'button' that has 'Create Order' on it, even though it is perfectly clickable manually. Here is the HTML element:
<a id="...
2
votes
1
answer
49
views
Defined Rules are not called in Scrapy
I am currently working with the Python library Scrapy and I am inheriting from CrawlSpider so that I can override/define custom Rules. I have defined rules that should block all URLs with auth/ ...
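For reference, Scrapy's `LinkExtractor` accepts a `deny` argument holding regular expressions, and links whose URLs match any of them are skipped. The filtering idea can be sketched in plain Python as below. This is not Scrapy's implementation: the helper `is_allowed` and the example URLs are hypothetical, and only the `auth/` pattern comes from the question. Whether this matches the asker's setup cannot be told from the excerpt.

```python
import re

# "auth/" is taken from the question; everything else here is illustrative.
DENY_PATTERNS = [re.compile(r"auth/")]


def is_allowed(url, deny=DENY_PATTERNS):
    """Return False when any deny pattern is found anywhere in the URL."""
    return not any(pattern.search(url) for pattern in deny)


# Hypothetical URLs used only to exercise the filter.
urls = [
    "https://example.com/products/1",
    "https://example.com/auth/login",
    "https://example.com/blog/auth/reset",
]
allowed = [u for u in urls if is_allowed(u)]
```

Note that the Scrapy documentation also warns against using `parse` as a callback in a `CrawlSpider`, since the class uses that method for its own rule handling.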
0
votes
0
answers
20
views
Error When Publishing to LinkedIn via API - Code: 422
I'm encountering an error when publishing to LinkedIn via their API. Here are the details:
Error code: 422
Response:
{
"message": "ERROR :: /author :: \"urn:li:person:aba2390d-...