Indexing and Selection of Data in Pandas
…
Efficiently selecting and manipulating data is a core aspect of data analysis. Pandas provides a powerful suite of indexing and selection tools to facilitate this. Let’s explore these capabilities in depth, focusing on the Series
and DataFrame
data structures.
Indexing and Selection in Series
A Series
in Pandas is a one-dimensional array with labels, which can be used for efficient indexing and selection.
Basic Indexing
You can access elements in a Series
using integer-based and label-based indexing.
import pandas as pd
# Creating a Series
s = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
# Integer-based indexing
print(s[0]) # Output: 10
# Label-based indexing
print(s['a']) # Output: 10
Slicing
Slicing allows you to select a range of elements.
# Integer-based slicing
print(s[1:3]) # Output: b 20, c 30
# Label-based slicing
print(s['b':'d']) # Output: b 20, c 30, d 40
Boolean Indexing
You can use boolean indexing to filter elements based on conditions.
print(s[s > 20]) # Output: c 30, d 40
Fancy Indexing
Fancy indexing involves passing a list of labels or integers to select multiple elements.
print(s[['a', 'c', 'd']]) # Output: a 10, c 30, d 40
Indexing and Selection in DataFrame
A DataFrame
in Pandas is a two-dimensional, labeled data structure, similar to a table in a relational database or a spreadsheet.
Selecting Columns
You can select columns in a DataFrame
using label-based indexing.
# Creating a DataFrame
df = pd.DataFrame({
'A': [1, 2, 3],
'B': [4, 5, 6],
'C': [7, 8, 9]
})
# Selecting a single column
print(df['A']) # Output: 0 1, 1 2, 2 3
# Selecting multiple columns
print(df[['A', 'C']])
# Output:
# A C
# 0 1 7
# 1 2 8
# 2 3 9
Selecting Rows
You can select rows using label-based indexing with loc
or integer-based indexing with iloc
.
# Label-based indexing
print(df.loc[1]) # Output: A 2, B 5, C 8
# Integer-based indexing
print(df.iloc[1]) # Output: A 2, B 5, C 8
Slicing
Slicing can be performed on both rows and columns.
# Slicing rows
print(df[1:3])
# Output:
# A B C
# 1 2 5 8
# 2 3 6 9
# Slicing rows and columns
print(df.loc[1:3, 'A':'B'])
# Output:
# A B
# 1 2 5
# 2 3 6
Boolean Indexing
Boolean indexing can be used to filter rows based on column values.
print(df[df['A'] > 1])
# Output:
# A B C
# 1 2 5 8
# 2 3 6 9
Fancy Indexing
Fancy indexing allows selection of specific rows and columns by passing lists of indices.
# Selecting specific rows
print(df.iloc[[0, 2]])
# Output:
# A B C
# 0 1 4 7
# 2 3 6 9
# Selecting specific columns
print(df[['A', 'C']])
# Output:
# A C
# 0 1 7
# 1 2 8
# 2 3 9
Advanced Indexing and Selection
MultiIndex
Pandas supports hierarchical indexing, which allows you to work with higher-dimensional data in a lower-dimensional data structure.
# Creating a MultiIndex DataFrame
index = pd.MultiIndex.from_tuples([('A', 1), ('A', 2), ('B', 1), ('B', 2)])
df = pd.DataFrame({
'Value': [10, 20, 30, 40]
}, index=index)
print(df)
# Output:
# Value
# A 1 10
# 2 20
# B 1 30
# 2 40
# Selecting data from MultiIndex
print(df.loc['A'])
# Output:
# Value
# 1 10
# 2 20
Setting and Resetting Index
You can set a column as an index and reset it back to a column.
# Setting an index
df = df.reset_index()
print(df)
# Output:
# level_0 level_1 Value
# 0 A 1 10
# 1 A 2 20
# 2 B 1 30
# 3 B 2 40
# Resetting the index
df = df.set_index(['level_0', 'level_1'])
print(df)
# Output:
# Value
# level_0 level_1
# A 1 10
# 2 20
# B 1 30
# 2 40
Conclusion
Pandas offers a comprehensive set of tools for indexing and selecting data, allowing for both simple and complex operations. Whether you’re working with one-dimensional Series
or multi-dimensional DataFrame
objects, Pandas provides intuitive and powerful methods to access and manipulate your data efficiently.
Learn How To Build AI Projects
Now, if you are interested in upskilling in 2024 with AI development, check out this 6 AI advanced projects with Golang where you will learn about building with AI and getting the best knowledge there is currently. Here’s the link.
Last updated 17 Aug 2024, 12:31 +0200 .