Count the number of rows by group in Pandas

Python

Pandas

Group

Count rows

Papa D.

Python

Pandas

DataFrame

The Pandas groupby() function is a versatile tool for manipulating DataFrames. It allows grouping DataFrame rows by the values in a particular column and applying operations to each of those groups. One simple operation is to count the number of rows in each group, allowing us to see how many rows fall into different categories.

Code Example

Use the groupby() function to group rows by column values, and use the count operation to count the number of rows in each group. In this example, we count the number of occurances of each value in the "Car" column.

import pandas as pd

# Read in some data
office_workers = pd.read_csv('office.csv')
office_workers
Name Sex Age Height Weight Car
0 Alex M 41 74 170 BMW
1 Bert M 42 68 166 Toyota
2 Carl M 32 70 155 Ford
3 Dave M 39 72 167 Honda
4 Elly F 30 66 124 Honda
5 Fran F 33 66 115 Jeep
6 Gwen F 26 64 121 Toyota
7 Hank M 30 71 158 BMW
8 Ivan M 53 72 175 Ford
9 Jake M 32 69 143 Ford
10 Kate F 47 69 139 Honda
11 Luke M 34 72 163 Subaru
12 Myra F 23 62 98 Subaru
13 Neil M 36 75 160 Subaru
14 Omar M 38 70 145 Toyota
15 Page F 31 67 135 Jeep
16 Quin M 29 71 176 Toyota
17 Ruth F 28 65 131 Volkswagen
# Count the occurances of each type of 'Car'
office_workers.groupby('Car').count()
Name Sex Age Height Weight
Car
BMW 2 2 2 2 2
Ford 3 3 3 3 3
Honda 3 3 3 3 3
Jeep 2 2 2 2 2
Subaru 3 3 3 3 3
Toyota 4 4 4 4 4
Volkswagen 1 1 1 1 1

Note that values across each row are identical. This is because the count operation is independent of column data—it merely counts the number of rows in each group. Alternatively, groupby operations like mean and median use column data to produce a new value. For example, the mean operation would compute the mean age, weight, and height of everyone who owns a BMW, a Ford, a Honda, etc.

office.csv is a CSV file that contains the following:

"Name","Sex","Age","Height","Weight","Car"
"Alex","M",41,74,170,"BMW"
"Bert","M",42,68,166,"Toyota"
"Carl","M",32,70,155,"Ford"
"Dave","M",39,72,167,"Honda"
"Elly","F",30,66,124,"Honda"
"Fran","F",33,66,115,"Jeep"
"Gwen","F",26,64,121,"Toyota"
"Hank","M",30,71,158,"BMW"
"Ivan","M",53,72,175,"Ford"
"Jake","M",32,69,143,"Ford"
"Kate","F",47,69,139,"Honda"
"Luke","M",34,72,163,"Subaru"
"Myra","F",23,62,98,"Subaru"
"Neil","M",36,75,160,"Subaru"
"Omar","M",38,70,145,"Toyota"
"Page","F",31,67,135,"Jeep"
"Quin","M",29,71,176,"Toyota"
"Ruth","F",28,65,131,"Volkswagen"

More Examples

Custom Logic

In this example, we use the groupby function to partition the set of office workers into those 35 or older, or those younger than 35.

office_workers \
    .groupby(
        lambda idx: office_workers.loc[idx, 'Age'] < 35) \
    .count()[['Name']] \
    .rename(
        index={ False: 'Younger than 35', True: "35 or older"}, columns={'Name': 'NumWorkers'})
NumWorkers
Younger than 35 7
35 or older 11

Multiple Columns

In this example, we use the groupby function with a list of column names to partition the rows based on multiple identifying traits, then count how many are in each group.

office_workers \
    .assign(Decade=lambda row: (row.Age // 10) * 10) \
    .groupby(['Decade', 'Car']) \
    .count()[['Name']] \
    .rename(columns={'Name': 'NumWorkers'})
NumWorkers
Decade Car
20 Subaru 1
Toyota 2
Volkswagen 1
30 BMW 1
Ford 2
Honda 2
Jeep 2
Subaru 2
Toyota 1
40 BMW 1
Honda 1
Toyota 1
50 Ford 1