Extract an Image from a Web Page

Question

Every day, I need to manually extract the central image from two URLs. I decided to automate this process and, with the help of ChatGPT, I have the following code:

# %%
from datetime import datetime, timedelta
import requests
from bs4 import BeautifulSoup
import os

def extract_image_url(page_url, img_selector):
    response = requests.get(page_url)
    response.raise_for_status()  # Check that the download succeeded
    
    soup = BeautifulSoup(response.text, 'html.parser')
    
    # Find the image using the CSS selector
    img_tag = soup.select_one(img_selector)
    if img_tag and 'src' in img_tag.attrs:
        return img_tag['src']
    else:
        raise ValueError("Image not found or incorrect CSS selector")

def download_image(img_url, save_path):
    response = requests.get(img_url)
    response.raise_for_status()
    
    with open(save_path, 'wb') as f:
        f.write(response.content)
    
    print(f"Image saved to: {save_path}")

#%%
today = datetime.now().strftime('%Y%m%d')
# Compute the date 16 days ahead
dia_ahead = (datetime.now() + timedelta(days=16)).strftime('%Y%m%d')

# Base URL
url_base = "https://weather.us/model-charts/standard/"

# Run-specific URLs
url_today_00z = f"{url_base}{today}00"
url_today_06z = f"{url_base}{today}06"

# Full URLs
url_today_00z_complete = f"{url_today_00z}/brazil/accumulated-precipitation/{dia_ahead}-0000z.html"
url_today_06z_complete = f"{url_today_06z}/brazil/accumulated-precipitation/{dia_ahead}-0060z.html"

print("\nURL Today 00z Complete:")
print(url_today_00z_complete)
print("\nURL Today 06z Complete:")
print(url_today_06z_complete)


urls = [url_today_00z_complete, url_today_06z_complete]

img_selector = "#click-overlay" 

#%% Extract and download the image from each URL
for url in urls:
    try:
        img_url = extract_image_url(url, img_selector)
        if not img_url.startswith('http'):
            img_url = f'https://weather.us{img_url}'
        save_path = os.path.join(os.getcwd(), os.path.basename(img_url))
        download_image(img_url, save_path)
    except Exception as e:
        print(f"Error processing {url}: {e}")

To be clear, I want to extract images similar to this one

Could someone help me correct this code?

Dan P · Accepted Answer · 2024-07-09 15:27:47Z

The website you are querying builds the chart you want from several layers of images, including some blank tiles. This is throwing off your code, as it's finding and saving a blank tile, not the chart.

However, your approach would not have worked entirely anyway, because you need to grab multiple layers and combine them to create the image you want.

Here's some code that, given the URL you want, will download all the relevant layers and combine them. You can add the code you need to get specific dates; I have hardcoded a random one for now.

If you change the region or anything else, you may need to adjust which layers are captured by changing the index numbers in the list about halfway through the code.

You can also remove some layers if you want less information. The script saves each downloaded layer in a folder called "images"; modify the list to choose which layers to include.

import os
import requests
from bs4 import BeautifulSoup
from PIL import Image
from io import BytesIO

# URL of the website
url = "https://weather.us/model-charts/standard/2024070900/brazil/accumulated-precipitation/20240725-0000z.html"

# Send a GET request to the website
response = requests.get(url)

# Write the response content to a text file
with open('response_content.txt', 'wb') as file:
    file.write(response.content)


soup = BeautifulSoup(response.content, 'html.parser')

# Find the div with the id "main-image-content" as an alternative way
main_image_content = soup.find('div', id='main-image-content')

# Find all image URLs within any element below the main_image_content div
image_urls = []
for img_tag in main_image_content.find_all('img', recursive=True):
    img_url = img_tag.get('src')
    if img_url:
        image_urls.append(img_url)

# Create a folder called images if it doesn't exist
os.makedirs('images', exist_ok=True)

# Download only the specified images (2 to 6) into the images folder
image_files = []
for i in [2, 3, 4, 5, 6]:
    if i < len(image_urls):
        img_url = image_urls[i]
        img_response = requests.get(img_url)
        img_response.raise_for_status()  # Stop early if a layer fails to download
        img = Image.open(BytesIO(img_response.content))
        img_path = os.path.join('images', f'image_{i}.png')
        img.save(img_path)
        image_files.append(img_path)

from PIL import ImageOps

# Combine the specified images overlayed on top of each other to make one final image
base_image = Image.open(image_files[0]).convert("RGBA")

# Ensure all images are the same size as the base image by padding them
for img_path in image_files[1:]:
    overlay_image = Image.open(img_path).convert("RGBA")
    # Pad the overlay image to match the size of the base image
    overlay_image = ImageOps.pad(overlay_image, base_image.size, method=Image.NEAREST, centering=(0, 0))
    base_image = Image.alpha_composite(base_image, overlay_image)

# Save the final image as out.png
base_image.save('out.png')

# Remove the images folder and its contents
for img_path in image_files:
    os.remove(img_path)
os.rmdir('images')

This is the output this code generates:
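
On the earlier point about adding your own dates: a minimal sketch of building the page URL from a model run time and a forecast valid time. The function name build_chart_url and the constant BASE_URL are hypothetical, and the URL pattern is only inferred from the one hardcoded URL above, so treat it as an assumption rather than a documented scheme.

```python
from datetime import datetime, timedelta

# Assumption: the site keeps the pattern seen in the hardcoded URL above.
BASE_URL = "https://weather.us/model-charts/standard/"


def build_chart_url(run_time, valid_time,
                    region="brazil", product="accumulated-precipitation"):
    """Build a chart page URL from the model run time and the valid time."""
    run_part = run_time.strftime("%Y%m%d%H")        # e.g. 2024070900
    valid_part = valid_time.strftime("%Y%m%d-%H%M")  # e.g. 20240725-0000
    return f"{BASE_URL}{run_part}/{region}/{product}/{valid_part}z.html"


# Reproduce the hardcoded example: the 00z run on 2024-07-09, 16 days ahead.
run = datetime(2024, 7, 9, 0)
valid = run + timedelta(days=16)
print(build_chart_url(run, valid))
```

With these inputs the function prints the same URL that is hardcoded in the script above. For today's run, something like datetime.now().replace(hour=0, minute=0) could be passed as run_time instead.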

I had no idea about the layers compose the figure. I’d upvote your answer, but I can’t because my reputation is too low. You deserve the heavens, folk! — LeoLaboi, Commented Jul 9 at 20:35
@LeoLaboi If you accept my answer (usually under the upvote button), that does the same for me :) — Dan P, Commented Jul 10 at 21:18


BoppreH · Accepted Answer · 2024-07-09 15:22:01Z

The current code is downloading a spacer image (https://weather.us/images/overlay/space.png). That's because it's programmed to download the first image matching "#click-overlay", which is not the right element on that website.
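
One way to avoid picking up the spacer is to collect every image URL and skip that file by name. This is a rough sketch: the helper layer_urls and the sample markup are hypothetical, and only the spacer file name comes from the URL above. It also uses urljoin to make relative paths absolute instead of prepending the host by hand.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

SPACER_NAME = "space.png"  # file name of the spacer image mentioned above


def layer_urls(html, page_url):
    """Return absolute image URLs found in html, skipping the spacer image."""
    soup = BeautifulSoup(html, "html.parser")
    urls = []
    for img in soup.find_all("img"):
        src = img.get("src")
        if not src:
            continue
        absolute = urljoin(page_url, src)  # resolves relative paths
        if absolute.rsplit("/", 1)[-1] == SPACER_NAME:
            continue  # ignore the blank overlay
        urls.append(absolute)
    return urls


# Hypothetical markup, only meant to resemble the structure described here.
sample_html = """
<div id="main-image-content">
  <img id="click-overlay" src="/images/overlay/space.png">
  <img src="https://osm.weather.us/custom/en_overlays/example.png">
  <img src="/hypothetical/layer.png">
</div>
"""
print(layer_urls(sample_html, "https://weather.us/model-charts/standard/"))
```

The real page may name or nest its layers differently, so check the list this returns against what the browser's inspector shows.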

The actual forecast is a combination of several images. Here's a change to download the first part, the map:

    # Find the image using the CSS selector
    img_tag = soup.select_one('img[src^="https://osm.weather.us/custom/en_overlays/"]')
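
To see what that selector does without hitting the site, here is a small offline sketch. The helper find_map_src and the sample HTML are made up for illustration; the `src^=` form is the standard CSS "attribute starts with" selector, which BeautifulSoup's select_one supports.

```python
from bs4 import BeautifulSoup

# Prefix taken from the selector above.
MAP_PREFIX = "https://osm.weather.us/custom/en_overlays/"


def find_map_src(html):
    """Return the src of the first <img> whose src starts with MAP_PREFIX."""
    soup = BeautifulSoup(html, "html.parser")
    img_tag = soup.select_one(f'img[src^="{MAP_PREFIX}"]')
    return img_tag["src"] if img_tag else None


# Hypothetical markup: the spacer comes first, the map layer second.
sample_page = """
<img id="click-overlay" src="https://weather.us/images/overlay/space.png">
<img src="https://osm.weather.us/custom/en_overlays/example.png">
"""
print(find_map_src(sample_page))
```

Because the match is on the URL prefix rather than on element order or id, the spacer is skipped even though it appears first in the markup.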
