How can I list all records for a joined table but filter by it at the same time?

Let's say I have the following structure:
Courses
ID | name |
---|---|
1 | Math |
2 | History |
Students
ID | name | CourseID |
---|---|---|
1 | Bob | 1 |
2 | Alice | 1 |
I want to write a query that lists all the students in a course, but only for courses in which a specific student is enrolled.
If I use the following query, the student "Bob" is missing from the comma-separated list of students.
SELECT
c.name,
GROUP_CONCAT(s.name separator ',')
FROM
Courses as c
LEFT JOIN Students s ON s.courseID = c.ID
WHERE
s.name = 'Alice'
GROUP BY
c.name
I know that I can join twice, using one join for the filtering and the other for the GROUP_CONCAT, but it feels inefficient because I'm joining twice. Is there a more efficient and/or straightforward way to do this?
Answer
Here are two different solutions:
The first uses two joins. I assume the joins can use an index on s.courseID, so they are pretty well optimized.
The optimizer would look up the single row for 'Alice' (again, if s.name is indexed). This row is aliased sa. From that row it looks up the single row in Courses (c), and from that it finds the set of Students enrolled in c (s). So it ends up examining 1 + 1 + n rows, where n is the number of students in that one course.
SELECT
c.name,
GROUP_CONCAT(s.name separator ',')
FROM
Courses as c
JOIN Students sa ON sa.courseID = c.ID AND sa.name = 'Alice'
JOIN Students s ON s.courseID = c.ID
GROUP BY
c.name;
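The row estimate above rests on two indexes that the question does not show. Here is a minimal sketch of how they might be created and checked, assuming MySQL, the tables from the question, and that Courses.ID is already the primary key. The index names idx_students_name and idx_students_course are hypothetical.

```sql
-- Hypothetical index names; the columns come from the question's Students table.
CREATE INDEX idx_students_name ON Students (name);
CREATE INDEX idx_students_course ON Students (CourseID);

-- EXPLAIN reports the join order and the estimated rows examined per table.
EXPLAIN
SELECT
c.name,
GROUP_CONCAT(s.name separator ',')
FROM
Courses as c
JOIN Students sa ON sa.courseID = c.ID AND sa.name = 'Alice'
JOIN Students s ON s.courseID = c.ID
GROUP BY
c.name;
```

On a table as small as the example, the optimizer may well prefer a full scan over either index, so the plan is only worth reading against a realistic amount of data.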
Another solution uses a single join and a HAVING condition, so that a group is included in the result only if 'Alice' is present in the group. This works, but in my opinion it's a bit harder to explain if you had to show it to a colleague.
SELECT
c.name,
GROUP_CONCAT(s.name separator ',')
FROM
Courses as c
JOIN Students s ON s.courseID = c.ID
GROUP BY
c.name
HAVING
MAX(CASE s.name WHEN 'Alice' THEN 1 END) = 1;
As for optimization, it's worse, because HAVING doesn't filter until after groups have been formed. So it would scan all courses and the students enrolled in those courses, whether 'Alice' is enrolled or not, then form the groups, then filter out the groups that don't include 'Alice'.
So this solution tends to examine a lot more rows, and performance is more or less proportional to the number of rows examined.
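If the CASE expression is the part that is hard to explain, the same HAVING filter can be spelled differently. This is a sketch assuming MySQL, where a comparison evaluates to 1 or 0 and can therefore be summed. It only rewords the condition; the query still scans and groups every course, so the cost described above is unchanged.

```sql
SELECT
c.name,
GROUP_CONCAT(s.name separator ',')
FROM
Courses as c
JOIN Students s ON s.courseID = c.ID
GROUP BY
c.name
HAVING
-- Counts the rows in each group named 'Alice'; keeps the group if there is at least one.
SUM(s.name = 'Alice') > 0;
```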