Pandas groupby() vs. SQL GROUP BY

btd
2 min read · Nov 15, 2023

Photo by 蔡 世宏 on Unsplash

1. GROUP BY in SQL:

In SQL, the GROUP BY clause collects rows that share the same value in one or more columns into a single group. It is usually combined with aggregate functions such as COUNT(), SUM(), MAX(), MIN(), and AVG(), which then return one result per group.

Here is the basic syntax:

SELECT column1, aggregate_function(column2)
FROM table
GROUP BY column1;

For example, let’s say you have a table named orders with columns customer_id and total_amount, and you want to know the total amount spent by each customer:

SELECT customer_id, SUM(total_amount) AS total_spent
FROM orders
GROUP BY customer_id;

This query will group the data by customer_id and calculate the total amount spent by each customer.
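To try the query without a database server, here is a minimal sketch using Python's built-in sqlite3 module. The sample rows are made up for illustration, and the ORDER BY clause is an addition so the output order is predictable.

```python
import sqlite3

# In-memory database with a hypothetical 'orders' table (sample data is invented).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, total_amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 20.0), (2, 5.0), (1, 10.0), (3, 7.5)],
)

# Same query as above; ORDER BY is added only to make the row order deterministic.
rows = conn.execute(
    """
    SELECT customer_id, SUM(total_amount) AS total_spent
    FROM orders
    GROUP BY customer_id
    ORDER BY customer_id;
    """
).fetchall()
conn.close()

print(rows)  # one (customer_id, total_spent) tuple per customer
```

Customer 1 has two orders in the sample data, so their amounts are summed into a single row, while customers 2 and 3 keep their single order amounts.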

2. groupby in Python (using pandas):

In Python, groupby() is a method of pandas DataFrame and Series objects that splits the data into groups based on one or more keys. Like GROUP BY, it is usually followed by an aggregation such as sum(), count(), mean(), max(), or min(), which is applied to each group.

Here is the basic syntax:

import pandas as pd

# Assuming 'df' is your DataFrame.
# Replace sum() with the aggregation you need, e.g. count(), mean(), max(), min().
result = df.groupby('column1')['column2'].sum()
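As a rough pandas counterpart to the SQL orders example, the sketch below builds a small DataFrame with the same customer_id and total_amount columns and sums the amount per customer. The sample values and the variable names orders, total_spent, and result are illustrative assumptions, not part of the original article.

```python
import pandas as pd

# Hypothetical sample data mirroring the 'orders' table from the SQL example.
orders = pd.DataFrame(
    {
        "customer_id": [1, 2, 1, 3],
        "total_amount": [20.0, 5.0, 10.0, 7.5],
    }
)

# Group rows by customer and sum the amounts within each group.
# The result is a Series indexed by customer_id (group keys are sorted by default).
total_spent = orders.groupby("customer_id")["total_amount"].sum()

# Turn the index back into a column and name the aggregate,
# similar to 'SUM(total_amount) AS total_spent' in SQL.
result = total_spent.reset_index(name="total_spent")

print(result)
```

One difference worth noting: the grouping key ends up in the index of the aggregated Series, so reset_index() is used here to get a flat table shaped like the SQL result.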
