## Count the number of rows by group in Pandas

Python

Pandas

Group

Count rows

Papa D.

Python

Pandas

DataFrame

The Pandas `groupby()` function is a versatile tool for manipulating DataFrames. It allows grouping DataFrame rows by the values in a particular column and applying operations to each of those groups. One simple operation is to count the number of rows in each group, allowing us to see how many rows fall into different categories.

### Code Example

Use the `groupby()` function to group rows by column values, and use the `count` operation to count the number of rows in each group. In this example, we count the number of occurances of each value in the "Car" column.

``````import pandas as pd

office_workers``````
Name Sex Age Height Weight Car
0 Alex M 41 74 170 BMW
1 Bert M 42 68 166 Toyota
2 Carl M 32 70 155 Ford
3 Dave M 39 72 167 Honda
4 Elly F 30 66 124 Honda
5 Fran F 33 66 115 Jeep
6 Gwen F 26 64 121 Toyota
7 Hank M 30 71 158 BMW
8 Ivan M 53 72 175 Ford
9 Jake M 32 69 143 Ford
10 Kate F 47 69 139 Honda
11 Luke M 34 72 163 Subaru
12 Myra F 23 62 98 Subaru
13 Neil M 36 75 160 Subaru
14 Omar M 38 70 145 Toyota
15 Page F 31 67 135 Jeep
16 Quin M 29 71 176 Toyota
17 Ruth F 28 65 131 Volkswagen
``````# Count the occurances of each type of 'Car'
office_workers.groupby('Car').count()``````
Name Sex Age Height Weight
Car
BMW 2 2 2 2 2
Ford 3 3 3 3 3
Honda 3 3 3 3 3
Jeep 2 2 2 2 2
Subaru 3 3 3 3 3
Toyota 4 4 4 4 4
Volkswagen 1 1 1 1 1

Note that values across each row are identical. This is because the `count` operation is independent of column data—it merely counts the number of rows in each group. Alternatively, `groupby` operations like `mean` and `median` use column data to produce a new value. For example, the `mean` operation would compute the mean age, weight, and height of everyone who owns a BMW, a Ford, a Honda, etc.

`office.csv` is a CSV file that contains the following:

``````"Name","Sex","Age","Height","Weight","Car"
"Alex","M",41,74,170,"BMW"
"Bert","M",42,68,166,"Toyota"
"Carl","M",32,70,155,"Ford"
"Dave","M",39,72,167,"Honda"
"Elly","F",30,66,124,"Honda"
"Fran","F",33,66,115,"Jeep"
"Gwen","F",26,64,121,"Toyota"
"Hank","M",30,71,158,"BMW"
"Ivan","M",53,72,175,"Ford"
"Jake","M",32,69,143,"Ford"
"Kate","F",47,69,139,"Honda"
"Luke","M",34,72,163,"Subaru"
"Myra","F",23,62,98,"Subaru"
"Neil","M",36,75,160,"Subaru"
"Omar","M",38,70,145,"Toyota"
"Page","F",31,67,135,"Jeep"
"Quin","M",29,71,176,"Toyota"
"Ruth","F",28,65,131,"Volkswagen"``````

### More Examples

#### Custom Logic

In this example, we use the `groupby` function to partition the set of office workers into those 35 or older, or those younger than 35.

``````office_workers \
.groupby(
lambda idx: office_workers.loc[idx, 'Age'] < 35) \
.count()[['Name']] \
.rename(
index={ False: 'Younger than 35', True: "35 or older"}, columns={'Name': 'NumWorkers'})``````
NumWorkers
Younger than 35 7
35 or older 11

#### Multiple Columns

In this example, we use the `groupby` function with a list of column names to partition the rows based on multiple identifying traits, then count how many are in each group.

``````office_workers \
.assign(Decade=lambda row: (row.Age // 10) * 10) \
.count()[['Name']] \
.rename(columns={'Name': 'NumWorkers'})``````
NumWorkers