Data Science/SQL

[Udemy] 15 Days of SQL - Day 3: Grouping, Functions, GROUP BY, HAVING

kyra 2024. 1. 31. 17:15

1. Aggregate Functions

  • Aggregate values in multiple rows to one value
  • SUM, AVG, MIN, MAX, COUNT
  • Syntax
SELECT
SUM (amount),
ROUND(AVG (amount), 3) AS AVERAGE -- 3 decimal places
FROM payment

 

* Challenge

Write a query to see the minimum, maximum, average(rounded), sum of the replacement cost of the films

SELECT
MIN (replacement_cost),
MAX (replacement_cost),
ROUND(AVG(replacement_cost), 3) AS AVG,
SUM (replacement_cost)
FROM film

 

2. GROUP BY

  • Used to group aggregations by specific columns
  • Comes right after FROM or WHERE clause
  • Every coulmn should be in either GROUP BY or Aggregate functions
  • Syntax
SELECT
customer_id,
SUM(amount)
FROM payment
WHERE customer_id > 3
GROUP BY customer_id

 

* Challenge

Find out which of the two employees (staff_id) is responsible for more payments

1) Which of the two is responsible for a higher overall payment amount?

2) How do these amounts change if we don't consider amounts equal to 0?

SELECT
staff_id,
SUM (amount),
AVG (amount),
COUNT (*)
FROM payment
WHERE amount != 0
GROUP BY staff_id
ORDER BY SUM(amount) DESC

2. GROUP BY multiple columns

SELECT
staff_id,
customer_id,
SUM(amount),
COUNT(*)
FROM payment
GROUP BY staff_id, customer_id -- multiple columns to group values by
ORDER BY COUNT(*) DESC

 

 

* Tips

Extracting only date

SELECT
DATE(column namme)
FROM

 

* Challenge

Find out which employee had the highest sales amount in a single day

Which employee had the most sales in a single day (not counting payments with amount - 0)?

SELECT
staff_id,
DATE(payment_date),
SUM(amount),
COUNT(amount)
FROM payment
--WHERE amount != 0
GROUP BY staff_id, DATE(payment_date)
ORDER BY COUNT(amount) DESC

 

3. HAVING

  • Used to filter Groupings by aggregations
  • HAVING is only used with GROUP BY
  • Syntax
SELECT
staff_id,
SUM(amount),
COUNT(*)
FROM payment
GROUP BY staff_id, DATE(payment_date)
HAVING COUNT(*) = 28 OR COUNT(*) = 29 --multiple conditions
ORDER BY COUNT(*) DESC

 

 

* Day 3 Challenge

In 2020, April 28, 29 and 30 were days with very high revenue.

- Find out what is the average payment amount grouped by customer and day - consider only the days/customers with more than 1 payment (per customer and day). 

- Order by the average amount in a descending order.

SELECT
customer_id,
DATE(payment_date),
ROUND(AVG(amount), 2) AS avg_amount,
COUNT(*)
FROM payment
WHERE payment_date BETWEEN '2020-04-28' AND '2020-04-30 23:59'
--WHERE DATE(payment_date) IN ('2020-04-28','2020-04-29','2020-04-30')
GROUP BY DATE(payment_date), customer_id
HAVING COUNT(*) > 1
ORDER BY AVG(amount) DESC