What is a Pandas DataFrame?
If you're coming from Excel or SQL, you'll feel right at home with the DataFrame. It is the absolute heart of the pandas library.
In simple terms, a DataFrame is just a table of data. It has rows and columns, just like a spreadsheet. But unlike a spreadsheet, you work with it entirely in code! You can filter millions of rows, create new calculated columns, or summarize data with just a line or two of Python.
Here is how you can create a simple one from scratch:
import pandas as pd

data = {
    'name': ['Alice', 'Bob', 'Carol'],
    'age': [25, 30, 35],
    'city': ['New York', 'Paris', 'London']
}
df = pd.DataFrame(data)
If you were to print this df, it would look something like this:
    name  age      city
0  Alice   25  New York
1    Bob   30     Paris
2  Carol   35    London
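Notice the numbers 0, 1, 2 on the left? That's the index. It's like the row numbers in Excel: a label for each row that you can use to look that row up. By default, pandas numbers the rows starting from 0, so each row gets its own label.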
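To make the index a little more concrete, here is a small sketch that rebuilds the same example table and looks rows up by label. The variable names (default_labels, bob_row, by_name, carol_age) are just illustrative choices, not anything pandas requires.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'age': [25, 30, 35],
    'city': ['New York', 'Paris', 'London'],
})

# With no index given, pandas labels the rows 0, 1, 2, ...
default_labels = list(df.index)

# .loc selects a row by its index label (here, the label 1)
bob_row = df.loc[1]
print(bob_row['name'])  # Bob

# set_index returns a new DataFrame that uses a column as the index,
# so rows can be looked up by name instead of by number
by_name = df.set_index('name')
carol_age = by_name.loc['Carol', 'age']
print(carol_age)  # 35
```

Using a column like name as the index is optional. The default numbering is often all you need.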
When you pull a single column out of a DataFrame, you get a Series (another pandas type). So you can think of a DataFrame as a bunch of Series stuck together, all sharing a common index.
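You can check the relationship between columns and Series yourself. The sketch below uses the same example data. The names ages, is_series, same_index, and rebuilt are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'age': [25, 30, 35],
    'city': ['New York', 'Paris', 'London'],
})

# Selecting one column gives back a Series
ages = df['age']
is_series = isinstance(ages, pd.Series)
print(is_series)  # True

# The Series carries the same index as the DataFrame it came from
same_index = ages.index.equals(df.index)
print(same_index)  # True

# Going the other way: a dict of Series can be assembled into a DataFrame,
# with the rows lined up by their shared index labels
rebuilt = pd.DataFrame({'name': df['name'], 'age': df['age']})
print(rebuilt.shape)  # (3, 2)
```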
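Why is this so powerful? Because pandas operations are vectorized: they run over whole columns at once in optimized code instead of looping row by row in Python. Operations like finding the average age or filtering for everyone in "Paris" take a single line and stay fast even on large datasets, as long as the data fits in memory.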
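Here is roughly what those one-liners look like on the example table. This is a minimal sketch. The column name age_next_year is invented for the demonstration, and on three rows you obviously won't notice any speed difference.

```python
import pandas as pd

df = pd.DataFrame({
    'name': ['Alice', 'Bob', 'Carol'],
    'age': [25, 30, 35],
    'city': ['New York', 'Paris', 'London'],
})

# Summarize: the average of the whole 'age' column
average_age = df['age'].mean()
print(average_age)  # 30.0

# Filter: keep only the rows where city is 'Paris'
in_paris = df[df['city'] == 'Paris']
print(in_paris['name'].tolist())  # ['Bob']

# Calculated column: the arithmetic is applied to every row at once
df['age_next_year'] = df['age'] + 1
print(df['age_next_year'].tolist())  # [26, 31, 36]
```

The expression df['city'] == 'Paris' produces a Series of True/False values, one per row, and putting it inside df[...] keeps only the rows marked True.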
I cover DataFrame fundamentals thoroughly, including how to read them from real-world files, in my Pandas course.