Friday, May 15, 2015

Filtering/Grouping by multiple columns in sql subquery

I have been struggling for a while to have multiple columns in subquery and then filtering out the result.

Often with composite keys in the staging table, you want to check if there are any duplicates.

Taking an example of a fictitious "sales" table:

select * from sales where order_number in
(select order_number from sales
group by order_number, customer_id
having count(*)>1)

Above query might not give the desired results if I am looking for particular results with filters. The reason being I did not apply all the fields in the select query for the ones used in group by.

An extensible solution below takes care of that:

--Use CTE to store result for further filtering
with cte as (select s1.field1, s1.field2, s1.field3 from sales as s1
where exists
--Have a subquery for multiple column grouping. Compare it with the fields of the parent query, which would yield 1 corresponding record
(select s2.field1, s2.field2, sd.field3  from sales as s2
where s2.field1=sd1.field1 and s2.field2=s1.field2 and sd.field3=s1.field3
group by field1,field2,field3
) --Have your extra condition here for filtering
)
select * from cte where reached_target='Y'  -- Get all those qualified records needed

In this case, Common Table Expression (CTE) does come in very handy to filter out the final resultset.