Thibauld - Imagination and Execution -

17Jul/091

PHP function to draw nice looking XY plot charts with google chart API

RT @thibauld PHP function to draw nice looking XY plot charts with google chart API

Creating nice charts used to be a hard task. Lately, I dug a little bit into Google chart API to see what they had to offer and how they could help me draw nice looking charts easily.

In theory, using Google chart is dead simple: just build an URL with the right parameters and Google will return you an image with your nice looking chart. However, in practice, it turned out a little bit more tricky. If you only want to draw a pie charts then it is pretty straight forward. But if you want to build an XY plot chart, then it gets harder!

Indeed, you should not use Google chart API like you'd use a spreadsheet. Keep in mind that Google chart API is about drawing an image: the X axis and the Y axis are treated completely independantly of your points coordinates.  If you do not pay enough attention, you'll end up with a chart with a completely wrong scale.

I tried to find some PHP libraries to help me draw my XY chart but found nothing satisfactory, so I finally decided to write my own little PHP function. Let me share it with you:

function getChart($chart) {
$y_max = $chart['y_max'];
$y_min = $chart['y_min'];
$y_step = $chart['y_step'];
$y_grid_step = $chart['y_grid_step'];
$x_max = $chart['x_max'];
$x_min = $chart['x_min'];
$chart_size = $chart['chart_size'];
$title = $chart['title'];
$x = $chart['x_axis'];
$x_nb = count($x);
$x_int = 100/$x_nb;
$first = 0;
if ($chart['x_labels_centered']) $first = $x_int/2;
for($i=0;$i<=$x_nb-1;$i++) { $positions[] = floor($first + ($i * $x_int)); }
foreach($positions as $n => $pos) { $x_values[$pos] = $x[$n]; }
$x_range = implode(',',array_keys($x_values));
$x_labels = implode('|',$x);
$grid_step = $y_grid_step*100/$y_max;
$url = "http://chart.apis.google.com/chart?";
$cht = "cht=lc";
if (!empty($chart['data_y'])) $cht="cht=lxy";
$chd = "chd=t:".implode(',',$chart['data']);
if (!empty($chart['data_y'])) $chd.="|".implode(',',$chart['data_y']);
$chg = "chg=$x_int,$grid_step,1,5";
$chxt = "chxt=x,y";
$tmp = array();
for ($i=0;$i<=$y_max;$i+=$y_step) { $tmp[] = $i; }
$y_labels = implode(',',$tmp);
$chxp = "chxp=0,".$x_range."|1,".$y_labels;
$y_labels = implode('|',$tmp);
$chxl = "chxl=0:|".$x_labels."|1:|".$y_labels;
$chxr = "chxr=1,$y_min,$y_max,$y_step";
$chtt="chtt=".str_replace(' ','+',$title);
$chs = "chs=".$chart_size;
$chds = "chds=$y_min,$y_max";
$chm="chco=0000FFFF&chm=B,76A4FB,0,0,0";#|s,0000FF,0,-1,10";
if (!empty($chart['data_y'])) $chds ="chds=$x_min,$x_max,$y_min,$y_max";
$url .= implode('&', array($cht,$chd,$chg,$chm,$chxt,$chxl,$chxp,$chxr,$chtt,$chs,$chds));
return $url;
}

Dont' get me wrong: this is really a quick and dirty function and not meant to be beautiful code. Here is an invokation example:

$data_x = array(0,4,5,9,10);
$data_y = array(20,5,7,9,10);
$months_x = array('Jan','Fev','Mar','Avr','May');
$chart_xy = array(
'title'=>"Chart Title", // CHART TITLE
'data'=>$data_x, // CHART DATA (X)
'data_y'=>$data_y, // CHART DATA (Y)
'x_axis'=>$months_x, // X AXIS LABELS LIST
'x_labels_centered'=>true, // SHOULD LABELS ON X AXIS BE CENTERED?
'y_min'=>0, // MIN VALUE OF Y AXIS
'y_max'=>25, // MAX VALUE OF Y AXIS
'y_step'=>5, // Y AXIS INTERVAL (NUMBERS ON THE AXIS)
'y_grid_step'=>5, // Y AXIS GRID INTERVAL (HORIZONTAL LINES ON THE GRID)
'x_min'=>0, // MIN VALUE OF X AXIS
'x_max'=>10, // MAX VALUE OF X AXIS
'chart_size'=>'300x300' // CHART DIMENSIONS
);
$url = getChart($chart_xy);

This code would result in the following graph:

And if I change :

$months_x = array('Jan','Fev','Mar','Avr','May');

by

$months_x = array('Jan','Fev','Mar');

I get the following graph:

What is important to note here is that it is up to you to make your X and Y axis consistent with your data, ortherwise, you'll end up with a totally meaningless chart.

Of course, this code is adapted to my needs so please feel free to copy this PHP function and adapt it so that it fits your particular needs!

Hope it will be useful for some of you...

25Jun/092

Interviewed about entrepreneurship on “Décideurs TV” [french web TV]

RT @thibauld Interviewed about entrepreneurship on “Décideurs TV” [french web TV]

Hi there! Just a quick post to let you know that I've been interviewed by Décideurs TV (a business focused web television) about entrepreneurship: Should you start your business alone or find a partner / cofounder? How to find the right partner for your business. What is a good partner? etc... The result can be seen here or directly below:


Start up par DECIDEURS TV

It was my first video interview ever and I must say that I'm pretty happy with the result :)

13Jun/092

Alain Storck Conference at the CNAM

RT @thibauld Alain Storck Conference at the CNAM

Last week, we invited Alain Storck, Chairman of the INSA de Lyon engineering school (the great school I graduated from!), to talk about the past and future of the school. The event took place in the great auditorium of "Conservatoire des Arts et Métiers" (CNAM) in Paris and finished with a "dinner cocktail" in the "Salons d'Honneur" of the CNAM. The conference was great, Alain Storck is a really bright and punchy guy and I'm glad he is the Chairman of our school as he will, for sure, keep the school ranked among the top engineering school in France ! I was only a bit disappointed by the relatively low number of people who attended the conference.

Here are a few photos that were taken during the conference :

Alain Storck
Alain Storck during his speech


Amphi Abbé Grégoire - CNAM
The audience in the Abbé Grégoire auditorium


Alain Storck
Alain Storck with Arnaud and Daniel at the cocktail



More info on the conference can be found here :
Conférence avec Alain Storck organisée par l'association des ingénieurs INSA d'île de France

2May/091

upgrade from Hardy to Jaunty: what a difference on my dell M2300!

RT @thibauld upgrade from Hardy to Jaunty: what a difference on my dell M2300!

Last week I upgraded (actually dist-upgraded) my laptop from Hardy Heron (ubuntu 8.04) to Jaunty Jackalope (ubuntu 9.04) and all I can say is "Wow" (TM).

I used to blame my laptop for its poor performance but this upgrade made me realize that the problem was coming from Ubuntu Hardy... hmm.. too bad for a LTS (Long Term Support) distribution... My laptop was heating so much that he used to freeze if not elevated 1 or 2 cm above the table! I'm not even talking about battery life...

I tried everything, included BIOS upgrade but nothing solved the problem. Also I had a lot of problem with flash on my firefox, it used to make firefox eat 100% cpu after a while. I was tired of this situation that I even got my old laptop (the best laptop ever) back to life and I was planning to use it as my day-to-day laptop again.

An upgrade to jaunty later (and 1.5Go new packages installed), I can finally feel the power of my Core 2 duo T8300 (2.4Ghz) with 4Go RAM! Even with the most impressive 3D options activated, it still flies and battery life has returned to an "acceptable" level (a little bit above 3h). I can now test some cool 3D games (did you try armagetron?) which were forbidden to me on my old system.

Once the upgrade completed, I was curious about which new applications got included in the ubuntu 9.04 repository (compared to the 8.10 repository) so we put a dedicated page on allmyapps . If you're interested, you can browse the list of the new applications included in ubuntu 9.04 here.

By the way, the dist-upgrade process always amazes me! It so impressive watching an entire system updating itself without a glitch (ok I had to use 2 dpkg -i --force-overwrite but still... ). So congratulations to the developers!

20Apr/091

Linux distributions explained

RT @thibauld Linux distributions explained

While cleaning my disk, I found a presentation about Linux distributions and the challenges they are facing which I did about 2 years ago. It is pretty unique since I gave this presentation only once during a 1 day training session I gave to Canonical OEM sales team. This presentation is particularly interesting as it graphically explains what is the work of Linux distributions and their challenges. Being an Open Source enthusiast and former Mandriva (Mandrakesoft) employee, this is a subject I know pretty well so I thought I'd share it with you. Hope you'll find it useful!

15Apr/090

Entrepreneurs Club Paris – EM Lyon, 8 avril 2009

RT @thibauld Entrepreneurs Club Paris – EM Lyon, 8 avril 2009

On April 8, 2009, my friend Arnaud and I organized an entrepreneurs meeting at the Bedford hôtel in Paris. This event is organized to help new companies ventured by EM Lyon alumni get some visibility (for our english friends, EM Lyon Management School is Lyon's business school). I though I would share with you the pitch (in french sorry!) of the 5 companies which were presented:

La Compagnie du Bain (by Denis Virot): La Compagnie du Bain est l'inventeur du vêtement de Bain isotherme seconde peau et propose des solutions de prêt-à-porter pour une entrée facile dans l'eau, confortable et conviviale.

La Compagnie du Bain

Yoolink Pro (by Sébastien Blanc): YoolinkPro est une nouvelle façon de gérer et de capitaliser sur l'information et les connaissances en entreprise. C'est une nouvelle génération d'Intranet qui permet de diffuser l'actualité au sein des équipes, de construire une base de connaissance pérenne et unique et de capitaliser sur l'expertise de ses collaborateurs.

Yoolink Pro

New Boutique Hostel (by Laurent Nissan): la première « auberge de jeunesse » haut de gamme à Paris. L’ambition de New Boutique Hostel est de créer la première chaîne de boutique hostels à Paris autour d’un concept neuf, sur un marché de l’hébergement économique pour jeunes en net déficit d’offres.

New Boutique Hostel

Insiteo (by Pierre Varrod): INSITEO develops an indoor GPS system. Avec INSITEO, le GPS pénètre les grands bâtiments: 3 milliards de voyageurs aériens stressés seront enfin guidés jusqu'à leur porte d'embarquement, les visiteurs du Louvre trouveront la Joconde...

Insiteo

Toofoot (by Gilles Wagnon): Toofoot est la première communauté du football amateur. Toofoot est la plate-forme qui se cache derrière Monfoot de francefootball.fr.

Toofoot

12Apr/091

Web application implementation: a bad example

RT @thibauld Web application implementation: a bad example

I recently went to a website which is a perfect example of what *not* to do if you want your web app to feel fast and responsive from a user perspective, especially when it is the first this user comes to your site! Indeed, the first time you go to a website, your web browser will have to download everything: html code, css stylesheets, javascript files, images... and it takes time so you'd better limit the number of files to download if you want the user first experience with your web app to be user friendly.

This is clearly not what moblin solution zone is doing! When you first go to moblin solution zone, you have to wait 25 seconds (!!) before anything displays on your web browser... and what is funny is that there is finally not much to see :)

If you fire up your brave little firebug and take a look at what is happening behind the scene, you quickly realize that your web browser has to make 40 (fourty) requests before displaying the full page!

file requested to display the page

Even though each request is relatively small, at the end you wait 25 seconds only to see this:

moblin solution zone

In conclusion, if you want your users to have a good experience when they first visit your website, you'd better minimize the number of requests needed by grouping js files, css files and using CSS sprites. CSS sprites are a technique which consists in grouping all images like icons into 1 unique image and then use css to only display the image needed. This is really important for the user first visit to your web app since when he comes back, his web browser will have cached a lot of the needed static files which will vastly improve the response time! Provided you set your apache default_expire setting correctly of course...

4Feb/090

FOSDEM this weekend!

RT @thibauld FOSDEM this weekend!

In case you don't know, FOSDEM 09 is taking place this weekend (7/8 february) in Brussels. It's a great place to meet open source developers and drink a bunch of good beers! I'll be there with my friend Arnaud starting from friday night... See you there!

I'm going to FOSDEM, the Free and Open Source Software Developers' European Meeting

26Jan/091

Introducing allmyapps!

RT @thibauld Introducing allmyapps!

If you follow this blog, you probably know that, last august, I accepted to help develop freelancebusinessclub's new website. Three months later, on November 3rd, the new website was launched! Since then, after having squashed the small little bugs left, I started working on another project, my project. It's something that I've been thinking about a looooong time ago, it took me several years (really!) to mature it and, having completely recovered my technical skills, I thought it was the right time to make it become a reality.

Feeling that the market has reached a point where my project finally makes sense, I'm proud to introduce you to my new baby : allmyapps, the first linux app store designed to help users find and install new applications on their Ubuntu Linux system.

To fully understand this project, you need to know that I'm a long time Linux user and evangelist. I started using Linux in 1998 (with a slackware) while still in high school and I've always been convinced that Linux would find its way to the desktop sooner or later. There are 2 factors in particular that make me think that this time has come:

  • Canonical and Mark Shuttleworth's vision is, by far, the most important point. In my opinion, the biggest reason of the Linux Desktop relative failure til now is lack of willingness from the major Linux players. Let's face it... who, today, really works to make the Linux Desktop a reality? Red Hat? They purely and simply abandonned this market. Novell? They only considers the enterprise desktop... The only company really working towards making the Linux Desktop a reality is Canonical, Mark Shuttleworth (its founder) being committed to put the Linux Desktop into everybody's hands. And it's not only words! In the past months, Canonical has hired usability experts, desktop experience engineers... and the company is really working, both technically and commercially, to make the Linux Desktop happen. Unlike everybody else, they're not waiting for the market to move but are working to make it move!
  • The netbook opportunity. Did you see the laptop market last year? People are fond of netbooks: 14 millions of netbooks were sold in 2008, with around 30% of them preinstalled with Linux! For the first time, normal people (unlike you and me) are using Linux and loving it. Do not trust the rumor that says that the return rate is higher with linux netbooks as it is simply not true. Linux price and versatility is a strong asset when it comes to run machines with low spec and sell them at a low price. Netbooks are a great market opportunity for the Linux desktop! For the first time, "normal users" are buying machines with a Linux pre-installed on it.

With the rise of the netbooks, more and more users today are confronted to a linux desktop. While using the system in itself is not a problem for most users, I've experienced that people tend to have a problem when it comes to finding new applications to install on their system. On the contraty to Windows systems, Linux systems historically have a very easy, quick and reliable installation mechanism, thanks to their centralized repositories and their associated package management systems. The only drawback of this mechanism is that it is different from the one users are used to on Windows... and it's a big drawback because users are getting lost... and lost users quickly turn into angry users! And the last thing you want is your users becoming angry! With allmyapps, we want to address this very specific issue.

I will probably not be blogging much about allmyapps on this blog. If you want to follow the development of allmyapps, I advise you to subscribe to allmyapps' blog. I will continue to write posts on my personal blog (this blog), I already have a few posts waiting about web application debugging / profiling and web application performance tuning... so stay tuned!

To conclude this post, I just wanted to say that we will be attending Recked in Amsterdam on monday 26th of january (today!) and then FOSDEM in Brussels on february 6th and 7th. If you'll be there too and want to meet, just leave me a comment!

4Jan/096

Web application step 5: implement a “relevant” search engine

RT @thibauld Web application step 5: implement a “relevant” search engine

For this very first post of 2009, let me first wish you all the best for the coming year! I worked *a lot* in 2008 and 2009 will be the year where all this underground work will finally make sense so I'm really looking forward to it... and I hope you do too! Let's make 2009 rock :)

Let's now go back to the subject of this post! Search is a very important feature for most web applications (read this book in case you need to be convinced) but, unfortunately, it is also one of the most overlooked feature. Why ? Probably because implementing a good search is a difficult thing: Placing a search form at the top right of your web app is not difficult, having it display some results when a user searches for something is not a problem either... but displaying search results which are actually relevant and useful for the end user is much harder!

Implementing a relevant search engine for your web app is a task that cannot be delegated to a third party web service or to an external library, you have to do it by yourself for a simple reason: a search can only be relevant if it leverages the information contained in your business objects with intelligence. This is why technologies like the new Google Custom Search will not help you to implement a relevant search engine. Google Custom Search is ok for your blog or any other editorial website... but what if you want, for example, your users to be able to look for the freelancers that have the longest experience with SAP FI CO ?

Implementing a good search begins with pen and paper:

  • On which business objects can the end user perform a search ?
  • And for each business objects:
    • How to rank results by relevancy for any given search ?
    • How should I weight each information ?
    • What information do I need to rank my results ?
    • Is this information already present in my database ? Where ?
    • Which filters would be relevant ?
    • What are the relevant way to sort the search results ?

Once you've thought about all this, you did the most important part... but not the most difficult. Now let's implement it!

The main problem you'll be facing when implementing a relevant search engine is performance. Being relevant means processing a lot of information, which is very time consuming, even for today's fast processors and hard drives. This is why you should let the database handle most of the searching work, using well constructed indexes and queries, and avoid as much as possible doing work outside the database.

Use full-text search

Obviously, using the LIKE sql operator (or any other regexp matching...) will not help much in implementing a good search. It is way too slow because LIKE queries cannot use table indexes and requires a full sequential scan of your table for each query. The only exception to this is when you know the first characters you're looking for (i.e. WHERE name LIKE 'query%'), in this case, the database will leverage the btree index on column name (if available) and you're likely to get acceptable performance with very small and targeted queries.

In all other cases, you will have to use fulltext search mecanisms. Fulltext search refers to special type of indexes which are specialized in string pattern matching. Fulltext search has been implemented completely differently in MySQL and PostgreSQL. I will now give you an overview on how to use fulltext search with both databases.

Full-text search with MySQL

As I've said in a previous post, MySQL's fulltext search mechanism focuses on ease-of-use. Indeed, with MySQL you just have to create a 'fulltext' index on the relevant columns to enable fulltext search. Example:

CREATE FULLTEXT INDEX description ON experiences (description);

It creates a fulltext index named description on the column description in the table experiences. To leverage this new index, you need to write your queries using the following syntax:

SELECT id, MATCH(description) AGAINST('') AS score
FROM experiences
WHERE MATCH(description) AGAINST('')
ORDER BY score DESC
OFFSET 0 LIMIT 10;

Using the fulltext index, MySQL will quickly determine the best matching descriptions for the input query. For each row, MySQL returns its relevancy score. You can influence the calculation of this score by using parameters in the AGAINST() operator.

As you can see, it is really easy to implement fulltext search in MySQL... maybe too easy as there are common search related issues that are not addressed. For example, a very classical requirement is perform a search on multiple columns of a table (i.e. title,summary and description) and with different weights assigned to each column.. Now how do I implement that in MySQL? Well... I'm sorry to announce that I did not figure out a beautiful way to address the issue. On Freelance Business Club, we solved the issue by constructing queries like the following one:

SELECT e.id, e.title, e.desc, SUM(s.score) AS score
FROM experiences AS e
INNER JOIN (
SELECT id, <title_weight>+MATCH(title) AGAINST (query' IN BOOLEAN MODE) AS score
FROM experiences
WHERE MATCH(title) AGAINST ('query' IN BOOLEAN MODE)
UNION ALL
SELECT id, <summary_weight>+MATCH(summary) AGAINST('query' IN BOOLEAN MODE) AS score
FROM experiences
WHERE MATCH(summary) AGAINST('query' IN BOOLEAN MODE)
UNION ALL
SELECT id, <description_weight>+MATCH(description) AGAINST('query' IN BOOLEAN MODE) AS score
FROM experiences
WHERE MATCH(description) AGAINST('query' IN BOOLEAN MODE)
) AS s ON a.id=s.id
ORDER BY score DESC
OFFSET 0 LIMIT 10;

Even if I'm not particularly proud of this query, it gives us the flexibility we were looking for (searching different columns with different weight for each) and offers at the same time rather good performance. I was really hoping MySQL would offer a standard way to perform this kind of search but I was unable to found an 'official' solution. I would be really interested if someone here had a more elegant / performant solution, so please let me know!

It is important to note that you cannot use fulltext indexes in MySQL on InnoDB tables, you can only use it with MyISAM tables. For more information, please check MySQL documentation for full-text search.

Full-text search with PostgreSQL

Fulltext search support has been integrated in PostgreSQL official distribution since v8.3. (before this version, it was distributed as a plugin). I personally think that PostgreSQL implementation of fulltext search is way better than the MySQL one. It may be a little harder to apprehend at first sight but it is really worth the effort!

Like MySQL, PostgreSQL provides a special index (called GIN) optimized for fulltext searching. But to leverage the real power of this index, you will need to store the text to be searched in a special column of type 'tsvector'. An example will make it easier to understand:

CREATE TABLE experiences_desc (
id_exp integer NOT NULL REFERENCES experiences(id) ON DELETE CASCADE,
locale regconfig NOT NULL DEFAULT 'english',
title text NOT NULL,
summary text NOT NULL DEFAULT '',
description text NOT NULL DEFAULT '',
search_field tsvector
);
INSERT INTO experiences (id_exp,title,summary,description,search_field) VALUES (
1,
'title',
'summary',
'description',
setweight(to_tsvector('english','title'),'A') ||
setweight(to_tsvector('english','summary'),'B') ||
setweight(to_tsvector('english','description'),'C')
);
CREATE INDEX experiences_idx ON experiences USING gin(search_field);

What is important to note here is the construction of the tsvector search_field ('||' is the string concatenation operator):

  • A tsvector tokenizes your string according to the given locale. This is why you need to pass it the locale in which your string is written. The function to_tsvector() uses a dictionary in the given locale to tokenize the input string. This way, it tokenizes intelligently the string (by detecting plurals, verbs...) and prevents common words from being tokenized. It means that the end user will not have to enter the exact words to have your search engine return relevant results.
  • You can assign different weights to the different parts of your tsvector. PostgreSQL defines 4 weights (1 >= A > B > C > D >= 0) that you can assign to a tsvector and that will help PostgreSQL rank your search results automatically and also filter your search by limiting your search to a certain weight category. Of course, you can change the default weight values assign to each letter.

Now to perform a search, you can use the following:

SELECT title, summary, description, ts_rank_cd(search_field, to_tsquery('english','search query')) AS score
FROM experiences_desc
WHERE to_tsquery('english','search query') @@ search_field
ORDER BY score DESC
OFFSET 0 LIMIT 10;

Important things to note are:

  • you need to use to_tsquery() to prepare your search query and the operator @@ to make it search using the specific GIN index.
  • ranking is done automatically by PostgreSQL thanks to the ts_rank_cd() operator. Several other ranking algorithms are available (using different functions / parameters )

This is really just a quick overview of the tools PostgreSQL provides to perform fulltext search. You can also ask PostgreSQL to highlight the matching patterns in the search results, gather documents statistics... it is really flexible and powerful! To find out more on these features, here's a link to PostgreSQL 8.3 Full Text Search documentation. Needless to say that I find PostgreSQL to be much more powerful than MySQL on the fulltext search aspect.

Use SQL jointures with caution

In order to keep performance acceptable, you should also try to keep the number of SQL jointures in the search query as low as possible. Also, don't forget to perform your jointures after you have found your results and not before. To illustrate this, consider the 2 following queries, the first one:

SELECT e.id, e.title, e.desc, SUM(s.score) AS score
FROM experiences AS e
INNER JOIN (
SELECT id, MATCH(title) AGAINST (query' IN BOOLEAN MODE) AS score
FROM experiences
WHERE MATCH(title) AGAINST ('query' IN BOOLEAN MODE)
) AS s ON a.id=s.id
INNER JOIN some_other_table AS sot ON (sot.id=s.id)
ORDER BY score DESC
OFFSET 0 LIMIT 10;

and the second one:

SELECT e.id, e.title, e.desc, SUM(s.score) AS score
FROM experiences AS e
INNER JOIN (
SELECT id, MATCH(title) AGAINST ('query' IN BOOLEAN MODE) AS score
FROM experiences
WHERE MATCH(title) AGAINST ('query' IN BOOLEAN MODE)
ORDER BY score DESC
OFFSET 0 LIMIT 10;
) AS s ON a.id=s.id
INNER JOIN some_other_table AS sot ON (sot.id=s.id)

The 2 queries give the exact same results but the second one will take only a tiny fraction of the time required by the second one to perform the search. This may seem obvious here as the query is small, but this is the kind of stupid errors that can slip easily in bigger queries (especially if queries are dynamically built).

Dedicated search tables

If you still cannot reach decent performance with all the above, you might have too many jointures in your query and should consider building a dedicated table to perform your searches. This method might also fit your needs in case you want full text search on an InnoDB table. The problem with these dedicated search tables is that they are never completely up-to-date. So it's up to you to know, given your business requirements, if you can use this method or not. We use this method in Freelance Business Club, this is why it may take up to 1 day for new members to appear in members search results.

Limit computation in your search queries

To achieve good performance, you should also limit as much as possible the use of user defined functions or calculus in your queries. Example: If you want to search the most experienced freelancers on SAP, then the total number of months a guy spent working on SAP is a relevant indicator. The problem is that you only have a start date and an end date at your disposal. In this case, instead of calculating the total number of month directly in your query, you'd better use a batch process and set up a cron job which would pre-calculate the total number of month for each experience every <put_a_frequence_here>. This way, it will not slow down every one of your search queries!

Filter and sort your search

It is very convenient for the end user to be able to filter and sort the search results. You should think about this possibility when building your search engine. Sorting is quite easy but beware of filtering. In the best case, you will just have to add another constraint in the WHERE clause or in an existing jointure but some filters might require you to add another jointure to your search which might slow it down.

Calculate the total number of results for your search

Finally, in order to paginate your search results well, you probably need the total number of results for a given search query. To get this value, I know no other method than performing a new SQL search query (without any ORDER BY / OFFSET / LIMIT and without any useless jointure) with a brutal COUNT(*) in the SELECT statement. If anybody has another / better method, please let me know!

To avoid this additional query (COUNT(*) can be a very time-consuming query!), some sites just don't give the total of number of results. See Facebook for example, they don't give the exact number of results and their pagination displays '1 2 3 next'. I guess this is one the reasons (but this is just pure speculation...).

Search implementation methodology

I wanted to finish this long post with the method I use to implement search. In my presentation layer, I build a query array which looks like this :
array(
'text'=>'<search_query>',
'filter'=>array(
'filter_type1'=>'<filter_value1',
'filter_type2'=>'<filter_value2>'
),
'order_by'=>'<order_by>',
'limit'=> <limit>,
'start'=> <start>
);

This query structure is the same no matter what type of search is performed, this is very flexible and facilitate code reuse enormously.
Then, in the business layer, I have a method <object_type>_search() which:

  • transform the array query values into values understandable by the data layer (when necessary).
  • calls <object_type>_dbSearch()
  • computes the search results to return a result array which looks like:
    array(
    'results'=> <search_results>,
    'nb_pages'=> <nb_pages>,
    'nb_results'=> <nb_results>
    );

Finally, in the data layer, I have a method  <object_type>_dbSearch() which calls a method <object_type>_buildSearchQuery() twice : the first time to build the search query and the second time to build the search count query (the one that counts the total number of results for the search query). It returns the raw search results.

This was definitely a long post but I hope nevertheless that it will be intelligible and useful for you!

See you next post!