Published:
Warning: This blog entry was written two or more years ago. Therefore, it may contain broken links, out-dated or misleading content, or information that is just plain wrong. Please read on with caution.
If I've said it once I've said it a hundred times, only get the data you need from the database. Yet time and again I come across applications that load more data than is needed to accomplish a task, or go about parsing it in the most inefficient way possible.
This particular example is one I come across a lot.
Example: Group Membership
Lets take a common design problem. You have a table of users, a table of groups and you need to create a many-to-many relationship between the two. The database design is fairly simple.
Users | |
---|---|
PK | userId |
username |
Groups | |
---|---|
PK | groupId |
groupname |
GroupMembership | |
---|---|
PK | userId |
PK | groupId |
Now in you application you wish to create an interface listing all possible groups that a user could be a member of, and flag those groups which the user actually is a member of. Lets say this is a form with a list of groups and checkboxes.
The Wrong Way
The way I normally see this solved is by using two queries, one to get all possible groups and another to get all groups the given user is a member of. Then the application loops over all the possible groups, and for each possible group iteration it does a nested loop over all member groups to look for matching groupId's like this.
Aside: This is written in cfml, but I've seen the pattern in just about every language I've come across.
Get All Possible Groups
SELECT
g1.groupId
, g1.groupName
FROM
Groups AS g1
Get All Groups This User Is A Member Of
SELECT
m1.groupId
FROM
GroupMembership AS m1
WHERE
userId = 1
Generate The Form Fields
<!--- Loop over all possible groups --->
<cfloop query="allGroups">
<cfset isMember = false>
<!--- Loop over all groups the user is a member of --->
<cfloop query="usersGroups">
<!--- Check the box if user is a member of this group --->
<cfif allGroups.groupId EQ usersGroups.groupId>
<cfset isMember = true>
<cfbreak>
</cfif>
</cfloop>
<label>#allGroups.groupName#</label>
<input
type="checkbox"
name="memberGroupId"
value="#allGroups.groupId#"
<cfif isMember>checked="checked"</cfif>><br/>
</cfloop>
So Whats Wrong With This?
Well on the face of it not too much. It generates the output we want and marks the correct checkbox's. It even passes my rule of only getting enough data to perform the action.
All in all its not that bad. In fact a few years ago I frequently used this pattern myself (before I knew better).
The problem with this solution has two parts. Firstly this solution requires two separate calls to the database to get related data in order to generate one list. Secondly but more importantly it requires nested loops.
Why Is This Bad
Continuing the users and group example. Say you have 50 possible groups, and a given user is a member of 10 of those groups. In order to generate the above output you could do as many as 500 comparisons between the two result sets. This is mitigated somewhat by having a break in the inner loop but even so as either group grows in size the total number of loop iterations multiplies.
The Right Way
The right way to solve this problem is actually very simple to implement and even requires less coding. What we need to do is generate a single result set with the {groupId,groupName} columns as before but with an additional third flag column showing if the given user is a member or not.
Step 1: Get All Possible Groups
To start lets get all the possible groups as before.
SELECT
g1.groupId
, g1.groupName
FROM
Groups AS g1
Step 2: Join The GroupMembership Table
Next we join on the group membership table. Note that we use a "Left Join" so that we are guarenteed to always get all possible groups even if they are not currently in the groupMembership table.
SELECT
g1.groupId
, g1.groupName
FROM
Groups AS g1
LEFT JOIN
GroupMembership AS m1
ON m1.groupId = g1.groupId
Step 3: Limit By User
Now that we have all groupMemberships joined we filter them down to our target user, in this case the user with id of 1.
SELECT
g1.groupId
, g1.groupName
FROM
Groups AS g1
LEFT JOIN
GroupMembership AS m1
ON m1.groupId = g1.groupId
AND m1.userId = 1
You will note that the userId condition is on the join statement and not a where clause. The significance of this is that if we put it on a where clause we would only get groups that the user is a member of, when in fact we want all possible groups.
Step 4: Create A Flag
Finally we need to create a flag to mark those groups which this user is a member of. To do this we use the fact that the value of m1.userId will be NULL when the user is not a member of a particular group. So we simply test for this condition with a CASE statement like this and output the result as a simple true/false flag.
SELECT
g1.groupId
, g1.groupName
, CASE WHEN m1.userId IS NOT NULL
THEN 1
ELSE 0
END AS isMember
FROM
Groups AS g1
LEFT JOIN
GroupMembership AS m1
ON m1.groupId = g1.groupId
AND m1.userId = 1
The Result
Using the resultset generated from this single query, our application code to generate the form is significantly simplified and requires less overall processing.
<!--- Loop over all possible groups --->
<cfloop query="groupMembership">
<label>#groupMembership.groupName#</label>
<input
type="checkbox"
name="memberGroupId"
value="#groupMembership.groupId#"
<cfif groupMembership.isMember>checked="checked"</cfif>><br/>
</cfloop>
Granted this example is somewhat simplified but the theory can be applied in many similar situations to good effect. So please, lets cut down on the nested loops.
Reader Comments
Tuesday, December 11, 2012 at 6:34:37 AM Coordinated Universal Time
You can also use the isNull function rather than the switch
isnull(m1.userId,0) AS isMember
Nice post (+1)