Using A Conditional Join To Flag Group Membership

Published: {ts '2012-12-11 00:00:00'}
Author: Steven Neiland
Site Url: http://www.neiland.net/article/using-a-conditional-join-to-flag-group-membership/

If I've said it once I've said it a hundred times, only get the data you need from the database. Yet time and again I come across applications that load more data than is needed to accomplish a task, or go about parsing it in the most inefficient way possible.

This particular example is one I come across a lot.

Example: Group Membership

Lets take a common design problem. You have a table of users, a table of groups and you need to create a many-to-many relationship between the two. The database design is fairly simple.

Users
PKuserId
 username
Groups
PKgroupId
 groupname
GroupMembership
PKuserId
PKgroupId

Now in you application you wish to create an interface listing all possible groups that a user could be a member of, and flag those groups which the user actually is a member of. Lets say this is a form with a list of groups and checkboxes.





The Wrong Way

The way I normally see this solved is by using two queries, one to get all possible groups and another to get all groups the given user is a member of. Then the application loops over all the possible groups, and for each possible group iteration it does a nested loop over all member groups to look for matching groupId's like this.

Aside: This is written in cfml, but I've seen the pattern in just about every language I've come across.

Get All Possible Groups

SELECT g1.groupId , g1.groupName FROM Groups AS g1

Get All Groups This User Is A Member Of

SELECT m1.groupId FROM GroupMembership AS m1 WHERE userId = 1

Generate The Form Fields

checked="checked">

So Whats Wrong With This?

Well on the face of it not too much. It generates the output we want and marks the correct checkbox's. It even passes my rule of only getting enough data to perform the action.

All in all its not that bad. In fact a few years ago I frequently used this pattern myself (before I knew better).

The problem with this solution has two parts. Firstly this solution requires two separate calls to the database to get related data in order to generate one list. Secondly but more importantly it requires nested loops.

Why Is This Bad

Continuing the users and group example. Say you have 50 possible groups, and a given user is a member of 10 of those groups. In order to generate the above output you could do as many as 500 comparisons between the two result sets. This is mitigated somewhat by having a break in the inner loop but even so as either group grows in size the total number of loop iterations multiplies.

The Right Way

The right way to solve this problem is actually very simple to implement and even requires less coding. What we need to do is generate a single result set with the {groupId,groupName} columns as before but with an additional third flag column showing if the given user is a member or not.

Step 1: Get All Possible Groups

To start lets get all the possible groups as before.

SELECT g1.groupId , g1.groupName FROM Groups AS g1

Step 2: Join The GroupMembership Table

Next we join on the group membership table. Note that we use a "Left Join" so that we are guarenteed to always get all possible groups even if they are not currently in the groupMembership table.

SELECT g1.groupId , g1.groupName FROM Groups AS g1 LEFT JOIN GroupMembership AS m1 ON m1.groupId = g1.groupId

Step 3: Limit By User

Now that we have all groupMemberships joined we filter them down to our target user, in this case the user with id of 1.

SELECT g1.groupId , g1.groupName FROM Groups AS g1 LEFT JOIN GroupMembership AS m1 ON m1.groupId = g1.groupId AND m1.userId = 1

You will note that the userId condition is on the join statement and not a where clause. The significance of this is that if we put it on a where clause we would only get groups that the user is a member of, when in fact we want all possible groups.

Step 4: Create A Flag

Finally we need to create a flag to mark those groups which this user is a member of. To do this we use the fact that the value of m1.userId will be NULL when the user is not a member of a particular group. So we simply test for this condition with a CASE statement like this and output the result as a simple true/false flag.

SELECT g1.groupId , g1.groupName , CASE WHEN m1.userId IS NOT NULL THEN 1 ELSE 0 END AS isMember FROM Groups AS g1 LEFT JOIN GroupMembership AS m1 ON m1.groupId = g1.groupId AND m1.userId = 1

The Result

Using the resultset generated from this single query, our application code to generate the form is significantly simplified and requires less overall processing.

checked="checked">

Granted this example is somewhat simplified but the theory can be applied in many similar situations to good effect. So please, lets cut down on the nested loops.