-1

I am trying to scrape earthquake data from USGS. My code runs up to the print(soup) line, but nothing after that line produces any output.

import requests
from bs4 import BeautifulSoup

url="https://earthquake.usgs.gov/earthquakes/map/?extent=-87.0066,-435.23438&extent=86.96966,239.76563&map=false"
page_response=requests.get(url)

if page_response.status_code==200:
    page_content=page_response.text
    soup=BeautifulSoup(page_content,"html.parser")
    print(soup)
else:
    print(f"website was not found, status code: {page_response.status_code}")

earthqs=soup.find_all("usgs-event-list",class_="ng-star-inserted")
for each_eq in earthqs:
    e_magnitude=earthqs.find("div",class_="ng-star-inserted")
    e_location=earthqs.find("h6",class_="header")
    e_time=earthqs.find("span",class_="time")
    e_diameter=earthqs.find("span",class_="ng-star-inserted")
    print(f"Eathquake magnitude:{e_magnitude.text.strip()}")
    print(f"Eathquake location:{e_location.text.strip()}")
    print(f"Eathquake time:{e_time.text.strip()}")
    print(f"Eathquake diameter:{e_diameter.text.strip()}")
    print()

I want all of the code to run, including the loop that prints each earthquake, rather than stopping after the print(soup) line.

1
  • Where exactly is the print(soup) line in your code? From what I can see, maybe the response's status code is never 200? The question is ambiguous. Commented Jul 18, 2024 at 16:51

2 Answers

0

I'm assuming you removed the print(soup) line from the question.

The problem is in the response obtained by fetching the URL. The response you are passing to BeautifulSoup does not contain the required data. If you look carefully at the output of print(soup), it doesn't have the tags you are looking for, so find_all returns an empty list and the loop body never runs. This is because requests does not execute JavaScript, so you receive the no-JavaScript version of the page.
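A quick way to confirm this, rather than reading the whole dump, is to check whether the tag you are searching for appears in the fetched HTML at all. The sketch below uses only the standard library (html.parser instead of BeautifulSoup, so it has no extra dependencies). SAMPLE_SHELL, TagCollector, and has_tag are hypothetical names, and the sample HTML (including the app-root tag) is hand-written for illustration, not the real USGS response.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records the name of every opening tag seen in the document."""

    def __init__(self):
        super().__init__()
        self.tags = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling this method.
        self.tags.add(tag)


def has_tag(html_text, tag_name):
    """Return True if html_text contains at least one <tag_name> element."""
    collector = TagCollector()
    collector.feed(html_text)
    collector.close()
    return tag_name.lower() in collector.tags


# Hypothetical stand-in for a JavaScript-only page as seen by requests:
# an empty application root plus a <noscript> fallback message.
SAMPLE_SHELL = """<html><body>
<app-root></app-root>
<noscript>This page requires JavaScript.</noscript>
<script src="main.js"></script>
</body></html>"""

print(has_tag(SAMPLE_SHELL, "usgs-event-list"))
print(has_tag(SAMPLE_SHELL, "noscript"))
```

With the real page you would pass page_response.text instead of SAMPLE_SHELL. If the check for usgs-event-list comes back False, the elements are being created in the browser and no selector will find them in the raw HTML.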

But if you look at the no-JavaScript message, it does provide an alternative.

You can use this URL instead to get an XML (Atom) feed containing the same data: https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.atom. This can also be parsed using BeautifulSoup.
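As a rough sketch of what parsing such a feed could look like, the example below reads entry titles and update times from Atom XML. It swaps in the standard library's xml.etree.ElementTree rather than BeautifulSoup, so it runs without extra packages. SAMPLE_FEED is a hand-written, hypothetical document in the general shape of an Atom feed, not real USGS output, and parse_atom_entries is a made-up helper name; only the standard Atom elements title and updated are assumed.

```python
import xml.etree.ElementTree as ET

# Standard Atom namespace; elements in an Atom feed live in this namespace.
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}

# Hypothetical sample data for illustration only.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Sample earthquake feed</title>
  <entry>
    <title>M 2.9 - sample location A</title>
    <updated>2024-07-19T05:40:00Z</updated>
  </entry>
  <entry>
    <title>M 2.5 - sample location B</title>
    <updated>2024-07-19T05:20:00Z</updated>
  </entry>
</feed>"""


def parse_atom_entries(xml_text):
    """Return a list of {"title", "updated"} dicts, one per <entry>."""
    root = ET.fromstring(xml_text)
    entries = []
    for entry in root.findall("atom:entry", ATOM_NS):
        entries.append({
            "title": entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            "updated": entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
        })
    return entries


for item in parse_atom_entries(SAMPLE_FEED):
    print(item["title"], "|", item["updated"])
```

For the live feed, you would fetch the URL with requests and pass response.content to parse_atom_entries. Inspect the real XML first, since its entries carry more elements than this sample shows.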

0

The content on the page is rendered dynamically by JavaScript in the browser, so it is not present in the HTML that requests downloads. Unfortunately, that means your approach is not going to work.

However, you can go directly to the API.

import requests
from datetime import datetime
import json

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:127.0) Gecko/20100101 Firefox/127.0',
    'Accept': 'application/json, text/plain, */*',
    'Accept-Language': 'en-US,en;q=0.5',
}

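response = requests.get(
    'https://earthquake.usgs.gov/earthquakes/feed/v1.0/summary/2.5_day.geojson',
    headers=headers,
    timeout=30,
)
# Fail early with a clear error if the request did not succeed.
response.raise_for_status()

quakes = []
for quake in response.json()["features"]:
    # There is a LOT of information in quake, but just extract a few elements for illustration.
    quakes.append({
        "magnitude": quake["properties"]["mag"],
        # GeoJSON coordinates are [longitude, latitude, depth in km].
        "location": quake["geometry"]["coordinates"],
        # "time" is in milliseconds since the epoch; fromtimestamp() converts it to local time.
        "time": datetime.fromtimestamp(quake["properties"]["time"] / 1000).strftime("%Y-%m-%d %H:%M:%S")
    })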

print(json.dumps(quakes, indent=2))

As noted in the code, the response returned by the API is very rich and I'm just extracting some of the data. You should take a look at the content of response.json() to see everything that's in the response.
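To illustrate pulling a few more fields out of one feature, here is a minimal sketch. It assumes each feature has a "properties" dict with "mag", "place", and "time" keys and a "geometry" dict whose coordinates are [longitude, latitude, depth]. Check the real response.json() to confirm which keys are present. SAMPLE_FEATURE is hand-written, hypothetical data, and summarize_feature is a made-up helper name.

```python
from datetime import datetime, timezone

# Hypothetical feature in the general shape of one item from "features".
SAMPLE_FEATURE = {
    "type": "Feature",
    "properties": {
        "mag": 2.9,
        "place": "sample place name",
        "time": 1721367212000,  # milliseconds since the Unix epoch
    },
    "geometry": {"type": "Point", "coordinates": [-147.5693, 61.3318, 20.1]},
}


def summarize_feature(feature):
    """Flatten one feature into a small dict with named coordinate fields."""
    props = feature["properties"]
    longitude, latitude, depth_km = feature["geometry"]["coordinates"]
    # Convert to an explicit UTC time so the result does not depend on the
    # time zone of the machine running the script.
    when = datetime.fromtimestamp(props["time"] / 1000, tz=timezone.utc)
    return {
        "magnitude": props.get("mag"),
        "place": props.get("place"),
        "longitude": longitude,
        "latitude": latitude,
        "depth_km": depth_km,
        "time_utc": when.strftime("%Y-%m-%d %H:%M:%S"),
    }


# Listing the keys is a quick way to see what else is available.
print(sorted(SAMPLE_FEATURE["properties"].keys()))
print(summarize_feature(SAMPLE_FEATURE))
```

Using .get() for the property lookups means a missing key yields None instead of raising a KeyError, which is convenient when exploring a response whose fields you have not fully checked.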

Sample output (first few records):

[
  {
    "magnitude": 2.9,
    "location": [
      -147.5693,
      61.3318,
      20.1
    ],
    "time": "2024-07-19 05:33:32"
  },
  {
    "magnitude": 2.5,
    "location": [
      -101.0491,
      32.1374,
      0.718
    ],
    "time": "2024-07-19 05:12:31"
  },
  {
    "magnitude": 3.03,
    "location": [
      -66.83,
      17.9955,
      12.69
    ],
    "time": "2024-07-19 04:37:32"
  },
  ...
]

As mentioned in the other answer, you can also grab the XML directly and parse that.
