Monday 21 December 2015

A Quest to Understand Select Query

Its quite a sometime now, haven't posted anything new, So thought of writing a post on SQL Query. Continuing on my quest to develop an app, I recently came across a very typical yet a very interesting requirement (may be it was interesting to me just because I din't know how to tackle it ).

Background:
Let's assume I have a table called student with few relevant columns in it. Hmmmmm... why just assume? let's create one :)

create table student 
Firstname varchar(255), 
Secondname varchar(255),
Descriptionaboutstudent varchar(255)
);

The above query not so surprisingly creates a table named student with description that looked somewhat like:








A blank table is not of much use I guess. So, let's just populate it with some dummy data...what say?









Looks like the stage is set now. We can now dive into the problem description and explore my way of solving it I guess.

Problem:
Now if the requirement says, you need to search the table "student" and get all the rows/records that has text/string "sandeep" in any of its column, you know what to do. don't you?

"Yeah, run the LIKE query man!!!". That might be your response if you are normal.
 If you are a performance junkie, then your response might sound like "Run a Full-text search man!!!". (Full-text search has its own inherent limitations, description on that may be in next post).

There are more than one way to accomplish the task and each way has its own pros and cons. Let's not debate on that, there are many other post that do a fair job of analyzing all of them. For this post, we will stick with basics. That means we gonna accomplish the task with LIKE query.

Now that being said, how will a typical LIKE query might look like huh??

select * 
from student
where Firstname LIKE '%sandeep%' || Secondname LIKE '%sandeep%' || Descriptionaboutstudent LIKE '%sandeep%';

Good. Query looks familiar and might work as well. Let's run.










Yeah!! Thank god. Query worked and we got the out put as well.
"Wait a second. WHAT THE HELL WAS YOUR PROBLEM THEN? WHAT WERE WE TRYING TO SOLVE??".

If I dump the query result as it is without processing it to my rest API and rest API sends it to client, client will show the data set as it is. Isn't it? 

Now what's wrong in that???

Don't you feel the though query fetched all the rows containing the text sandeep, it never understood the relevance of search?? 

If you are a user don't you wanna see the person whose First name is "sandeep" on top of the list than a person whose second name is "sandeep" or even worse seeing a person on top just because he has a friend named "sandeep"??? (Please refer the query results pasted above)

How results Should have looked:
1> Rows which have "sandeep" in their First name column should have been given a highest preference and should have appeared on top.

2>Rows which have "sandeep" in their Second name column should have been given a second highest preference and should have appeared in second position. 

3>Rows which have "sandeep" in their Descriptionaboutstudent should have been given a least preference and should have appeared at last position.










Severity of Issue:
The issue might not sound severe in this case. But think of situation when you are maintaining a database with huge data like the one in e-commerce web site and on your inventory data base if user searches 'FAN' don't you wanna show him a fan rather than showing a toy/computer that has fan as its part on top??

Issue here is, though our LIKE query understands that it has to search the query text in all columns, it does not understand the importance/priorities of these columns in the current search context.

Solutions:
In my opinion this can be easily solved with two approaches. I love the first one though.

Solution 1: I don't care!!! That's not my JOB
Simple, straight forward and 100% effective approach to this problem is to say "That's not my JOB bob." Works every time believe me.

LIKE command is not meant for this. LIKE is designed to search not to sort the order of the search results. Now sorting is either the job of rest API developer or the client application developers. Why should I burden my DB for that?.

Though very lame answer, cant argue much against it and have to agree that its probably one of the correct solutions available.

Solution 2: I am a DB nerd and I can do that!!
Now if you are the creature of my kind continue reading else its time to say good bye my friend.

There was nothing wrong in the LIKE query we executed before, though I personally recommend concatenating all the columns on which you want to apply LIKE and make an alias for the same and apply LIKE on that alias.

Makes code clean and easy to understand I believe. Might improve performance as well not sure of that though.

Typical LIKE query:
select * 
from student
where Firstname LIKE '%sandeep%' || Secondname LIKE '%sandeep%' || Descriptionaboutstudent LIKE '%sandeep%';

My LIKE query:
select * 
from student t
where concat(t.Firstname,t.Secondname,t.Descriptionaboutstudent) LIKE '%sandeep%';

Now lets get rid of ordering :)
Our LIKE Query is perfect just that it does not know the relevance/priorities of columns. So let's teach it the same.

select t.*, CASE WHEN 
                            t.Firstname LIKE '%sandeep%' then 5 WHEN 
                            t.Secondname LIKE '%sandeep%' then 2 WHEN 
                            t.Descriptionaboutstudent LIKE '%sandeep%'then 0 END as priority 
from student t 
where concat(t.Firstname,t.Secondname,t.Descriptionaboutstudent) LIKE '%sandeep%' 
order by (priority) DESC; 

Thats it!!!!. Let's run the query.















What I am doing here is quite straight forward. Am creating an alias called priority and assigning it value based on which column the text '%sandeep%' was found.
Egg: If 'sandeep' is found in Firstname column then assign the value 5 to priority.
        If 'sandeep' is found in Secondname column then assign value 2 to priority.
        If 'sandeep' is found in Descriptionaboutstudent column then assign the value 0 to priority.

Once you are done with assigning value to each row, all we need to do is to sort the search results based on this priority value obviously in DESCENDING order.

I wrote this query with my very limited knowledge in MySql and there are lots of chance that the query might not be the best way to achieve the goal. Remember? Am still a beginner. 

If you have suggestion to improve it, or you appreciate it or you think the entire approach itself is wrong, please feel free to contact me. You know how to contact me? Don't you??? :)

No comments:

Post a Comment