
Extracting Text Patterns

Often you need to extract parts of strings: area codes from phone numbers, domains from emails, IDs from URLs.

Split and take a part:

df['email'].str.split('@').str[1]  # Get domain
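As a small runnable sketch of the split approach, the example below builds a hypothetical DataFrame (the sample addresses are invented for illustration) and stores the domain in a new column. It also shows what appears to happen with awkward rows: a missing value, or a string with no '@', ends up as a missing value rather than raising an error.

```python
import pandas as pd

# Hypothetical sample data; only the 'email' column name comes from the text.
df = pd.DataFrame({'email': ['ana@example.com', 'bob@test.org', 'no-at-sign', None]})

# Split each address on '@' and take the piece at position 1 (the domain).
# Rows that are missing, or that have no second piece, become missing values.
df['domain'] = df['email'].str.split('@').str[1]

domains = df['domain'].tolist()
print(df)
```

If an address could contain more than one '@', taking `.str[-1]` instead of `.str[1]` would pick the last piece; which one is right depends on your data.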

Extract with regex:

df['phone'].str.extract(r'(\d{3})-\d{3}-\d{4}')  # Get area code
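A minimal sketch of the same pattern on made-up phone numbers is shown below. By default `str.extract` returns a DataFrame even for a single group; passing `expand=False` returns a Series instead, which is convenient when assigning to one column. The sample values and the `area_code` column name are assumptions for illustration.

```python
import pandas as pd

# Hypothetical sample data; the last row deliberately does not match.
df = pd.DataFrame({'phone': ['555-123-4567', '212-555-0199', 'n/a']})

# One capture group: the three digits before the first dash.
# expand=False returns a Series rather than a one-column DataFrame.
df['area_code'] = df['phone'].str.extract(r'(\d{3})-\d{3}-\d{4}', expand=False)

area_codes = df['area_code'].tolist()
print(df)
```

Note that the extracted values are strings, not integers, and that the non-matching row holds a missing value.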

Extract multiple groups:

df['name'].str.extract(r'(\w+)\s+(\w+)')  # First and last name
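With several groups, the result columns are numbered 0, 1, and so on. One way to get readable column names is to use named groups in the pattern, as in this sketch. The group names `first` and `last` and the sample names are hypothetical choices, not part of the original example.

```python
import pandas as pd

# Hypothetical sample data.
df = pd.DataFrame({'name': ['Ada Lovelace', 'Alan Turing']})

# Named groups (?P<name>...) become the column names of the result.
parts = df['name'].str.extract(r'(?P<first>\w+)\s+(?P<last>\w+)')

# Attach the extracted columns to the original DataFrame.
df[['first', 'last']] = parts
print(df)
```

This simple pattern only captures the first two words, so middle names or hyphenated surnames would need a more careful pattern.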

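The extract() method is powerful: each capture group in the pattern becomes a column in the result, so any pattern you can describe with regex can be pulled out into new columns. Only the first match in each string is used, and rows that do not match are filled with NaN.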

For advanced text extraction, see The Ultimate Pandas Bootcamp.