Extracting Attributes

Sometimes you need more than text. Links have URLs, images have sources, and forms have actions. These live in HTML attributes.

link = soup.find("a")
url = link["href"]  # Get the href attribute

Access attributes like dictionary keys. Common ones you'll extract:

img = soup.find("img")
print(img["src"])      # Image URL
print(img.get("alt"))  # Alt text (safely)

Using .get() is safer - it returns None instead of raising a KeyError if the attribute doesn't exist.
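The difference is easiest to see on an element that lacks the attribute. Here is a minimal sketch; the markup is made up for illustration, and it uses Python's built-in "html.parser" so nothing beyond BeautifulSoup is needed.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: an image with no alt attribute.
html = '<img src="/static/cat.png">'
soup = BeautifulSoup(html, "html.parser")
img = soup.find("img")

src = img["src"]               # attribute exists, so bracket access works
alt = img.get("alt")           # attribute is missing, so .get() returns None
alt_text = img.get("alt", "")  # or pass your own default as the second argument

try:
    img["alt"]                 # bracket access on a missing attribute
    missing_raises = False
except KeyError:
    missing_raises = True      # this is the error .get() avoids

print(src, alt, repr(alt_text), missing_raises)
```

A reasonable rule of thumb: use brackets when a missing attribute should stop the script, and .get() when it is optional.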

For links, you often want both the text and the URL:

for link in soup.find_all("a"):
    print(link.text, "->", link.get("href"))
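Some anchors have no href at all (named anchors, for example), so the loop above can print None for those. One way to handle this, sketched here on invented markup, is to pass href=True to find_all so only anchors that carry the attribute are returned, and to collect the pairs in a list instead of printing them.

```python
from bs4 import BeautifulSoup

# Hypothetical page fragment: the second anchor has no href.
html = """
<a href="/about">About</a>
<a name="top">Top of page</a>
<a href="https://example.com/blog">  Blog  </a>
"""
soup = BeautifulSoup(html, "html.parser")

links = []
# href=True matches only <a> tags that actually have an href attribute
for link in soup.find_all("a", href=True):
    # get_text(strip=True) trims the whitespace around the link text
    links.append((link.get_text(strip=True), link["href"]))

for text, url in links:
    print(text, "->", url)
```

Because the filter guarantees the attribute exists, bracket access is safe inside this loop.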

Real scraping is often about collecting these attributes - URLs to follow, image sources to download, data attributes that contain information.
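Data attributes work the same way as any other attribute. The sketch below assumes a made-up product element with data-sku and data-price attributes; the names are illustrative, not from any real site.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with custom data-* attributes.
html = '<div class="product" data-sku="A-100" data-price="19.99">Widget</div>'
soup = BeautifulSoup(html, "html.parser")
product = soup.find("div", class_="product")

sku = product.get("data-sku")          # attribute values come back as strings
price = float(product["data-price"])   # convert yourself when you need a number

# .attrs is a plain dict of every attribute on the tag
data_attrs = {k: v for k, v in product.attrs.items() if k.startswith("data-")}

# class can hold several values, so BeautifulSoup returns it as a list
classes = product["class"]

print(sku, price, data_attrs, classes)
```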

Learn attribute extraction in my Web Scraping course.