SQL Server / Oracle: DISTINCT or GROUP BY

Firstly, sometimes sticking in a DISTINCT is quick and easy, but so easily missed when reading the SQL and also conversely the group by can be pretty long if there are a lot of columns selected.

Firstly a basic example:
SQL Server

SELECT
a
FROM
(SELECT 1 AS a
UNION ALL SELECT 1 AS a
UNION ALL SELECT 2 AS a) b
GROUP BY a

SELECT
DISTINCT a
FROM
(SELECT 1 AS a
UNION ALL SELECT 1 AS a
UNION ALL SELECT 2 AS a) b

Oracle

SELECT
a
FROM
(SELECT 1 AS a FROM DUAL
UNION ALL SELECT 1 AS a FROM DUAL
UNION ALL SELECT 2 AS a FROM DUAL) b
GROUP BY a;

SELECT
DISTINCT a
FROM
(SELECT 1 AS a FROM DUAL
UNION ALL SELECT 1 AS a FROM DUAL
UNION ALL SELECT 2 AS a FROM DUAL) b;

They both return the same result, simple, now imagine a lot more columns in the select and make your choice of DISTINCT or GROUP BY carefully.

Is COUNT(DISTINCT a) the same as DISTINCT COUNT(a)?
No, the former you select distinct then count them, the later you count then select the distinct counts, so results are 2 and 3.
Also you can’t swap DISTINCT for a GROUP BY in this case, because the GROUP BY would return you two counts, one for each grouped value, however using a nested query you could select the count from the grouped values, like so:

SQL Server

SELECT COUNT(*)
FROM
(
	SELECT
	a
	FROM
	(SELECT 1 AS a
	UNION ALL SELECT 1 AS a
	UNION ALL SELECT 2 AS a) b
	GROUP BY a
) c

Oracle

SELECT COUNT(*)
FROM
(
	SELECT
	a
	FROM
	(SELECT 1 AS a FROM DUAL
	UNION ALL SELECT 1 AS a FROM DUAL
	UNION ALL SELECT 2 AS a FROM DUAL) b
	GROUP BY a
) c;

I always try to remind people that DISTINCT works on what is selected, don’t just go sticking it wherever you feel like as I won’t see it, put it right after the SELECT as a means of helping me see it.

There is also the group by pitfall of grouping on something that you aren’t selecting. Let’s say you originally select two columns grouping on both so you get distinct values, great, then you decide, actually I am only interested in the first column so the second is removed from the select, unless it is also removed from the group by the results will not be the same as distinct! This sort of thing normally leads to the group by being dropped and replaced by distinct.

I hope this helps.

Advertisements

One response to “SQL Server / Oracle: DISTINCT or GROUP BY

  1. Pingback: LINQ Distinct / Group By | Tim's cycling blog

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s