Questions tagged [web-crawler]

A Web crawler (also known as a Web spider) is a computer program that browses the World Wide Web in a methodical, automated manner. Other terms for Web crawlers are ants, automatic indexers, bots, Web robots, or, especially in the FOAF community, Web scutters.

9,711 questions

0 votes

0 answers

12 views

AWS crawler creating Null values for partition columns

I have some country-level partitioned data in S3, and a crawler is crawling this root folder and creating a table. There are no null values for the country code. But when I looked in Athena, there ...

Ananth

asked 15 hours ago

0 votes

0 answers

18 views

Why Am I Getting a 490 Response Code on TorProxy on an Ubuntu Server? [closed]

I've set up a TorProxy on my Ubuntu server for web crawling due to specific network requirements. When my crawler begins to operate, it functions correctly for about 3 to 4 minutes. However, after ...

Aref

asked yesterday

-3 votes

0 answers

24 views

Download ICD-10 codes (International Classification of Diseases)

We can easily browse the ICD-10 codes: https://icd.who.int/browse10/2019/en Unfortunately, there is no way to download all of the codes as TXT (or XLS) file in order to parse with Python, or import ...

JoyfulPanda

asked Jul 6 at 15:33

-1 votes

0 answers

19 views

crawler - rotten tomatoes website - problem with pages

I'm trying to crawl the Rotten Tomatoes website, but I have a problem getting the HTML for page 5 and above of the movies, for example: https://www.rottentomatoes.com/browse/movies_at_home/?page=**8** ...

Nadav Goldin

asked Jul 3 at 18:13

-2 votes

0 answers

19 views

Mass-attack of Amazon bots [closed]

Gday folks. Recently we discovered a significant spike in outgoing data on our web-server. It turns out Amazon bots are downloading our web imagery, a lot. We set a disallow in our Robots.txt, over a ...

Sami.C

asked Jul 2 at 0:17

1 vote

1 answer

57 views

Scrapy Spider does not work with multiple urls

I wrote a Scrapy spider and used Selenium in it to scrape the products in devgrossonline.com. It does not work with multiple category urls, but it works when I provide only one url. Here is my spider: ...

serkan ertas

asked Jul 1 at 15:52

-1 votes

0 answers

22 views

The time obtained by the Python crawler is incorrect when getting comments

When I use Python to crawl stock comments from a website, the time parsed from the website is different from the time obtained by my crawler. For example, when I use F12 to inspect the website, I find ...

Ohhhhh

asked Jul 1 at 10:51

-4 votes

0 answers

30 views

Cannot fetch images from specific site [closed]

I'm using PHP (Laravel) code to fetch images from external URLs and then save them into my project folder. It works for all image URLs except some from a specific site, e.g. https://f00.esfr.pl/foto/...

Ngan Nguyen

asked Jul 1 at 7:46

0 votes

1 answer

31 views

TYPO3 indexed search fails to index PDF files

I'm hoping to get help with a problem I can't solve. The working environment is as follows: SYSTEM Debian 12 bookworm PHP 7.4 (tried 8.2 and 8.3 with failure on crawler) + FPM/FastCGI /usr/bin/...

Alessandro Tuveri

asked Jul 1 at 5:38

0 votes

0 answers

12 views

How to download PDFs using Norconex Web Crawler?

I have tried to download PDFs from certain URLs (e.g. https://example.com) using the Norconex Web Crawler (v3.0) and the configuration below but no luck. Can someone please help me with this? <?xml ...

teklot

asked Jun 27 at 15:45

0 votes

0 answers

37 views

Getting subsequent GET calls for some PUT, POST APIs in web site

I'm observing subsequent GET calls for some PUT and POST APIs. I already checked the code and there are no GET calls created for those endpoints. But I'm seeing these calls in my server logs. Say for example ...

coding life

asked Jun 26 at 10:54

-2 votes

0 answers

35 views

TikTok finding username with videoID

I am currently working on a project that deals with the data of the DSA transparency database. Specifically, I am looking at the TikTok data. Now I would like to go one step further and check if the ...

Moritz

asked Jun 24 at 13:20

0 votes

0 answers

10 views

Issues with Crawling Yahoo Auction During Peak Hours in a Cross-Border E-commerce System (Errors 404, 500)

I am seeking assistance with a critical issue we are facing in our cross-border e-commerce auction and proxy purchase platform. Our system relies heavily on web crawling technology to access Yahoo ...

Nguyễn Nam Hải

asked Jun 24 at 4:36

0 votes

0 answers

25 views

Facebook Crawler not picking updated OpenGraph meta tags via Sharing Debugger but does via crawler curl call

Setup: It's a React app with React Helmet. It's deployed with Docker on a VPS and is exposed with Nginx. Cloudflare is used for SSL and as a Prerender.io worker. Problem explanation: I make a change to ...

mszan

asked Jun 20 at 9:07

0 votes

0 answers

14 views

how to focus on instagram post comment textarea using vanilla JS?

I can select the textarea using the devtools console, but I cannot focus on it and start typing, and because of that the post button is disabled. BTW, I can do it using Python + Selenium.

Dok

asked Jun 18 at 10:22
