Thibauld - Imagination and Execution -

15 Feb 2010

How to exclude multiple directories while creating an archive with tar

Another very quick post, as I've just spent way too much time working out how to use tar's --exclude option. All I wanted was for tar to omit a few subdirectories while archiving a directory. I was surprised to see how picky tar is about its --exclude option: if you don't use the exact syntax it won't work and, unfortunately, the exact syntax is not easy to figure out from the man page.

So here is the exact syntax to use if you want to exclude multiple directories with tar:

tar cvfz myproject.tgz --exclude='path/dir_to_exclude1' --exclude='path/dir_to_exclude2' myproject
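
If you want to check the syntax before running it on a real project, here is a minimal sketch that builds a throwaway directory tree and archives it with two excludes. The directory and file names (src, cache, logs...) are made up for the demo, and it assumes a tar that supports --exclude and gzip (GNU tar or bsdtar):

```shell
#!/bin/sh
# Demo in a throwaway directory; every name below is made up for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
mkdir -p myproject/src myproject/cache myproject/logs
echo 'code' > myproject/src/main.php
echo 'tmp'  > myproject/cache/page.html
echo 'log'  > myproject/logs/app.log

# One --exclude per directory, written relative to where tar is run,
# and placed before the directory to archive.
tar czf myproject.tgz --exclude='myproject/cache' --exclude='myproject/logs' myproject

# List the archive content to check what actually went in.
archive_listing=$(tar tzf myproject.tgz)
echo "$archive_listing"
```

The listing should show myproject/src/main.php but nothing under myproject/cache or myproject/logs.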

Hope it will save you some time!

5 Feb 2010

Configuring iptables to allow internet surfing while blocking all unsolicited incoming connections

I'm so used to connecting to the Internet through network masquerading (NAT) that I was really surprised today when I realised that my laptop was actually receiving a lot of unsolicited connection attempts from random external machines. Then I remembered that, by default, a Freebox gives you a public IP!

It would not have been an issue if I had not been doing web development on my laptop with a local (badly configured) web server, which turned out to be accessible worldwide... oops :)

A few iptables commands later, everything was secured:

iptables -P INPUT DROP
iptables -P FORWARD DROP
iptables -P OUTPUT ACCEPT
iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
iptables -A INPUT -s 127.0.0.1 -j ACCEPT

The lines above configure iptables (the firewall) to drop every incoming connection except local loopback traffic and packets that belong to a connection I established with a remote server myself. Indeed, the server you're connecting to needs to be able to send data back to your machine for you to receive it (blocking all incoming traffic would be the same as unplugging the network cable).

I thought it might be useful to some of you too...

31 Dec 2009

Google real time search results

I've just noticed beautiful live search results integrated into Google search results tonight. I had never seen them before so I thought I'd share this news-to-me with you :) Here's what I saw when I searched for "2010 resolutions":

[Screenshot: Google live search results]

And then, a few seconds later, without refreshing anything on my side:

[Screenshot: Google live search results, a few seconds later]

I love it :)
Now let's go party for 2010!
Happy new year everybody!

17 Jul 2009

PHP function to draw nice looking XY plot charts with google chart API

Creating nice charts used to be a hard task. Lately, I dug a little into the Google Chart API to see what it had to offer and how it could help me draw nice-looking charts easily.

In theory, using Google Chart is dead simple: just build a URL with the right parameters and Google returns an image of your nice-looking chart. In practice, it turned out to be a little trickier. If you only want to draw a pie chart, it is pretty straightforward. But if you want to build an XY plot chart, it gets harder!

Indeed, you should not use the Google Chart API like you'd use a spreadsheet. Keep in mind that the Google Chart API is about drawing an image: the X axis and the Y axis are treated completely independently of your points' coordinates. If you do not pay enough attention, you'll end up with a chart with a completely wrong scale.

I tried to find some PHP libraries to help me draw my XY chart but found nothing satisfactory, so I finally decided to write my own little PHP function. Let me share it with you:

function getChart($chart) {
    // Axis ranges, steps and presentation settings supplied by the caller.
    $y_max = $chart['y_max'];
    $y_min = $chart['y_min'];
    $y_step = $chart['y_step'];
    $y_grid_step = $chart['y_grid_step'];
    $x_max = $chart['x_max'];
    $x_min = $chart['x_min'];
    $chart_size = $chart['chart_size'];
    $title = $chart['title'];
    $x = $chart['x_axis'];

    // Spread the X labels evenly over the 0-100 scale used by chxp.
    $x_nb = count($x);
    $x_int = 100/$x_nb;
    $first = 0;
    if (!empty($chart['x_labels_centered'])) $first = $x_int/2;
    $positions = array();
    for ($i=0; $i<$x_nb; $i++) { $positions[] = (int) floor($first + ($i * $x_int)); }
    $x_values = array();
    foreach ($positions as $n => $pos) { $x_values[$pos] = $x[$n]; }
    $x_range = implode(',', array_keys($x_values));
    $x_labels = implode('|', $x);

    // Horizontal grid spacing, as a percentage of the Y range.
    $grid_step = $y_grid_step*100/($y_max - $y_min);

    $url = "http://chart.apis.google.com/chart?";
    // Line chart (lc) by default, XY line chart (lxy) when Y data is supplied.
    $has_y = !empty($chart['data_y']);
    $cht = $has_y ? "cht=lxy" : "cht=lc";
    $chd = "chd=t:".implode(',', $chart['data']);
    if ($has_y) $chd .= "|".implode(',', $chart['data_y']);
    $chg = "chg=$x_int,$grid_step,1,5";
    $chxt = "chxt=x,y";

    // Y axis labels from $y_min to $y_max, one every $y_step.
    $tmp = array();
    for ($i=$y_min; $i<=$y_max; $i+=$y_step) { $tmp[] = $i; }
    $chxp = "chxp=0,".$x_range."|1,".implode(',', $tmp);
    $chxl = "chxl=0:|".$x_labels."|1:|".implode('|', $tmp);
    $chxr = "chxr=1,$y_min,$y_max,$y_step";
    $chtt = "chtt=".str_replace(' ', '+', $title);
    $chs = "chs=".$chart_size;

    // Data scaling: Y range only for lc, X range then Y range for lxy.
    $chds = $has_y ? "chds=$x_min,$x_max,$y_min,$y_max" : "chds=$y_min,$y_max";
    // Line colour and fill; append "|s,0000FF,0,-1,10" to chm to also draw square markers.
    $chm = "chco=0000FFFF&chm=B,76A4FB,0,0,0";
    $url .= implode('&', array($cht,$chd,$chg,$chm,$chxt,$chxl,$chxp,$chxr,$chtt,$chs,$chds));
    return $url;
}

Don't get me wrong: this is really a quick and dirty function, not meant to be beautiful code. Here is an example invocation:

$data_x = array(0,4,5,9,10);
$data_y = array(20,5,7,9,10);
$months_x = array('Jan','Fev','Mar','Avr','May');
$chart_xy = array(
    'title'=>"Chart Title",       // CHART TITLE
    'data'=>$data_x,              // CHART DATA (X)
    'data_y'=>$data_y,            // CHART DATA (Y)
    'x_axis'=>$months_x,          // X AXIS LABELS LIST
    'x_labels_centered'=>true,    // SHOULD LABELS ON X AXIS BE CENTERED?
    'y_min'=>0,                   // MIN VALUE OF Y AXIS
    'y_max'=>25,                  // MAX VALUE OF Y AXIS
    'y_step'=>5,                  // Y AXIS INTERVAL (NUMBERS ON THE AXIS)
    'y_grid_step'=>5,             // Y AXIS GRID INTERVAL (HORIZONTAL LINES ON THE GRID)
    'x_min'=>0,                   // MIN VALUE OF X AXIS
    'x_max'=>10,                  // MAX VALUE OF X AXIS
    'chart_size'=>'300x300'       // CHART DIMENSIONS
);
$url = getChart($chart_xy);

This code would result in the following graph:

And if I replace:

$months_x = array('Jan','Fev','Mar','Avr','May');

with

$months_x = array('Jan','Fev','Mar');

I get the following graph:

What is important to note here is that it is up to you to keep your X and Y axes consistent with your data; otherwise, you'll end up with a totally meaningless chart.
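
To see how the label placement behaves, it can help to reproduce it outside PHP. The sketch below mirrors the floor($first + $i * $x_int) calculation from getChart() using awk; label_positions is a hypothetical helper name chosen for this example, not part of the Google Chart API:

```shell
#!/bin/sh
# Print the chxp positions (on the 0-100 scale) for N evenly spaced X labels.
# $1 = number of labels, $2 = 1 to centre each label in its interval, 0 otherwise.
label_positions() {
    awk -v n="$1" -v centered="$2" 'BEGIN {
        step  = 100 / n
        first = centered ? step / 2 : 0
        for (i = 0; i < n; i++)
            printf "%s%d", (i ? "," : ""), int(first + i * step)
        print ""
    }'
}

label_positions 5 1   # five month labels, centered
label_positions 3 1   # three month labels, centered
```

With 5 centered labels this prints 10,30,50,70,90 and with 3 it prints 16,50,83: the labels are always spread over the full width of the axis, whatever x_min and x_max are. The labels only mean something if you choose them to match the range of your data.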

Of course, this code is adapted to my needs so please feel free to copy this PHP function and adapt it so that it fits your particular needs!

Hope it will be useful for some of you...

12 Apr 2009

Web application implementation: a bad example

I recently visited a website that is a perfect example of what *not* to do if you want your web app to feel fast and responsive from the user's perspective, especially the first time a user comes to your site! Indeed, the first time you visit a website, your web browser has to download everything: HTML, CSS stylesheets, JavaScript files, images... and that takes time, so you'd better limit the number of files to download if you want the user's first experience with your web app to be a friendly one.

This is clearly not what Moblin Solution Zone is doing! When you first go to Moblin Solution Zone, you have to wait 25 seconds (!!) before anything is displayed in your web browser... and the funny thing is that, in the end, there is not much to see :)

If you fire up your brave little Firebug and take a look at what is happening behind the scenes, you quickly realize that your web browser has to make 40 (forty) requests before displaying the full page!

[Screenshot: files requested to display the page]

Even though each request is relatively small, at the end you wait 25 seconds only to see this:

[Screenshot: Moblin Solution Zone]

In conclusion, if you want your users to have a good experience when they first visit your website, you'd better minimize the number of requests by grouping JS files, grouping CSS files and using CSS sprites. CSS sprites are a technique that consists in combining all small images, such as icons, into a single image and then using CSS to display only the part needed. This really matters for a user's first visit to your web app: on later visits, the web browser will have cached many of the static files, which vastly improves the response time! Provided you set Apache's ExpiresDefault directive (mod_expires) correctly, of course...

24 Feb 2009

How to recursively rename directories using a regexp

Just a very quick post because I just figured out a command to recursively rename directories. As it is the kind of useful command you don't want to lose, and as it might be of interest to others, I thought I would share it here. So here is the command:
find . -type d -name '*-test' | while IFS= read -r A; do OLD=$(basename "$A"); NEW=$(echo "$OLD" | sed 's/-test$//'); mv "$A" "$(dirname "$A")/$NEW"; done

In this example, the command recursively finds all directories named <anything>-test and renames them <anything> (removing the trailing '-test'). I hope it will be useful for some of you...
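
One thing to watch for with a pipeline like this: when a matching directory contains another matching directory (say a-test/b-test), the parent gets renamed first and the path of the child is then stale. A possible variant, sketched below, asks find to list the deepest directories first with -depth and quotes every path so that names with spaces survive. rename_test_dirs is a hypothetical helper name, and the demo tree is created in a temporary directory:

```shell
#!/bin/sh
# Rename every directory ending in "-test" below $1, deepest first.
rename_test_dirs() {
    # -depth lists a directory's contents before the directory itself,
    # so a nested "b-test" is renamed before its parent "a-test" moves.
    find "$1" -depth -type d -name '*-test' | while IFS= read -r dir; do
        # ${dir%-test} strips the trailing "-test" from the path.
        mv -- "$dir" "${dir%-test}"
    done
}

# Throwaway demo tree (directory names are made up for illustration).
demo=$(mktemp -d)
mkdir -p "$demo/a-test/b-test" "$demo/my project-test"
rename_test_dirs "$demo"
find "$demo" -type d
```

Paths containing newlines would still break the while read loop; for those, find's -exec is the safer route.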

22 Feb 2009

Web application implementation step 6: make it fast!

To continue the "how to implement great web apps" series, here is step 6: make it fast! This is a vast subject and I won't try to be exhaustive; I'd rather give a list of checkpoints with clues on how to improve each one. Making your web app fast is key to user adoption, as there is nothing more frustrating than a slow web application.

Just to be clear, when I say "fast", I mean "fast from the user's perspective", because what really matters in the end is how responsive your app feels to the user. To illustrate this point, take a web page that requires 5 seconds to load: this can be acceptable if the page becomes usable after 1 second. What would be unacceptable is a page that takes the full 5 seconds before the end user can do anything with it. Now what should you do to make your app fast?

Slowness can come from:

  1. the backend: the server takes too much time to process the request
  2. the network: transporting the response from the server to the client (the user's web browser) takes too much time
  3. the rendering: actually displaying the response received from the server takes too much time

Today we'll tackle the backend part:

Profile your application. You need to measure how fast your app is and, if it is too slow, where the problems lie. If you're developing on a LAMP stack, there is an awesome tool for that called Xdebug. Once you have installed Xdebug, all you need to do is enable it in /etc/php5/apache2/conf.d/xdebug.ini. Here's my configuration:
zend_extension=/usr/lib/php5/20060613+lfs/xdebug.so
xdebug.profiler_output_dir = "/tmp/"
xdebug.profiler_enable = Off
xdebug.profiler_enable_trigger = 1

The output dir must be writable by Apache, and I encourage you to enable the profiler only via the trigger. Now if you want to profile a screen of your web application, just append the parameter XDEBUG_PROFILE=1 to the query string. Example:
http://dev.domain.com/a/screen/path?XDEBUG_PROFILE=1

Now, when you access the above URL, a file gets created in the output dir (here /tmp/). Just open this file with KCachegrind and you'll be able to visualize exactly what happened behind the scenes to process your request.

[Screenshot: KCachegrind]

It will be easy then to answer the following questions:

  • what is being called? how many times?
  • how much time does my request take?
  • is my caching system working as expected?
  • why is this request so slow?
  • are my SQL queries fast enough?
  • is the framework I use adding too much overhead?
  • etc....

KCachegrind is incredibly valuable for profiling your web application. In the next article, we'll stay in this 'make it fast' step and talk about how to make your web app network efficient.

Stay tuned!

4 Jan 2009

Web application step 5: implement a “relevant” search engine

For this very first post of 2009, let me first wish you all the best for the coming year! I worked *a lot* in 2008 and 2009 will be the year where all this underground work will finally make sense so I'm really looking forward to it... and I hope you do too! Let's make 2009 rock :)

Let's now go back to the subject of this post! Search is a very important feature for most web applications (read this book in case you need to be convinced) but, unfortunately, it is also one of the most overlooked features. Why? Probably because implementing a good search is difficult: placing a search form at the top right of your web app is not hard, and having it display some results when a user searches for something is not a problem either... but displaying search results that are actually relevant and useful for the end user is much harder!

Implementing a relevant search engine for your web app is a task that cannot be delegated to a third-party web service or to an external library. You have to do it yourself, for a simple reason: a search can only be relevant if it intelligently leverages the information contained in your business objects. This is why technologies like the new Google Custom Search will not help you implement a relevant search engine. Google Custom Search is OK for your blog or any other editorial website... but what if you want, for example, your users to be able to look for the freelancers who have the longest experience with SAP FI CO?

Implementing a good search begins with pen and paper:

  • On which business objects can the end user perform a search?
  • And for each business object:
    • How should results be ranked by relevancy for any given search?
    • How should I weight each piece of information?
    • What information do I need to rank my results?
    • Is this information already present in my database? Where?
    • Which filters would be relevant?
    • What are the relevant ways to sort the search results?

Once you've thought about all this, you have done the most important part... but not the most difficult. Now let's implement it!

The main problem you'll be facing when implementing a relevant search engine is performance. Being relevant means processing a lot of information, which is very time consuming, even for today's fast processors and hard drives. This is why you should let the database handle most of the searching work, using well constructed indexes and queries, and avoid as much as possible doing work outside the database.

Use full-text search

Obviously, using the SQL LIKE operator (or any other regexp matching...) will not help much in implementing a good search. It is way too slow, because a LIKE query with a leading wildcard (i.e. WHERE name LIKE '%query%') cannot use table indexes and requires a full sequential scan of your table for each query. The only exception is when you know the first characters you're looking for (i.e. WHERE name LIKE 'query%'): in this case, the database can leverage the btree index on the column name (if available) and you're likely to get acceptable performance with very small and targeted queries.

In all other cases, you will have to use fulltext search mechanisms. Fulltext search refers to a special type of index which is specialized in string pattern matching. Fulltext search has been implemented completely differently in MySQL and PostgreSQL. I will now give you an overview of how to use fulltext search with both databases.

Full-text search with MySQL

As I've said in a previous post, MySQL's fulltext search mechanism focuses on ease-of-use. Indeed, with MySQL you just have to create a 'fulltext' index on the relevant columns to enable fulltext search. Example:

CREATE FULLTEXT INDEX description ON experiences (description);

It creates a fulltext index named description on the column description in the table experiences. To leverage this new index, you need to write your queries using the following syntax:

SELECT id, MATCH(description) AGAINST('query') AS score
FROM experiences
WHERE MATCH(description) AGAINST('query')
ORDER BY score DESC
LIMIT 10 OFFSET 0;

Using the fulltext index, MySQL will quickly determine the best matching descriptions for the input query. For each row, MySQL returns its relevancy score. You can influence the calculation of this score by using parameters in the AGAINST() operator.

As you can see, it is really easy to implement fulltext search in MySQL... maybe too easy, as some common search-related issues are not addressed. For example, a very classic requirement is to perform a search on multiple columns of a table (i.e. title, summary and description) with a different weight assigned to each column. Now how do I implement that in MySQL? Well... I'm sorry to announce that I did not figure out a beautiful way to address the issue. On Freelance Business Club, we solved it by constructing queries like the following one:

SELECT e.id, e.title, e.description, SUM(s.score) AS score
FROM experiences AS e
INNER JOIN (
    SELECT id, <title_weight>+MATCH(title) AGAINST('query' IN BOOLEAN MODE) AS score
    FROM experiences
    WHERE MATCH(title) AGAINST('query' IN BOOLEAN MODE)
    UNION ALL
    SELECT id, <summary_weight>+MATCH(summary) AGAINST('query' IN BOOLEAN MODE) AS score
    FROM experiences
    WHERE MATCH(summary) AGAINST('query' IN BOOLEAN MODE)
    UNION ALL
    SELECT id, <description_weight>+MATCH(description) AGAINST('query' IN BOOLEAN MODE) AS score
    FROM experiences
    WHERE MATCH(description) AGAINST('query' IN BOOLEAN MODE)
) AS s ON e.id=s.id
GROUP BY e.id, e.title, e.description
ORDER BY score DESC
LIMIT 10 OFFSET 0;

Even if I'm not particularly proud of this query, it gives us the flexibility we were looking for (searching different columns with a different weight for each) while offering rather good performance. I was really hoping MySQL would offer a standard way to perform this kind of search, but I was unable to find an 'official' solution. I would be really interested if someone here had a more elegant or better-performing solution, so please let me know!

It is important to note that MySQL versions before 5.6 do not support fulltext indexes on InnoDB tables; there, you can only use them with MyISAM tables. For more information, please check the MySQL documentation for full-text search.

Full-text search with PostgreSQL

Fulltext search support has been integrated in the official PostgreSQL distribution since v8.3 (before this version, it was distributed as a plugin). I personally think that PostgreSQL's implementation of fulltext search is way better than MySQL's. It may be a little harder to grasp at first sight but it is really worth the effort!

Like MySQL, PostgreSQL provides a special index (called GIN) optimized for fulltext searching. But to leverage the real power of this index, you will need to store the text to be searched in a special column of type 'tsvector'. An example will make it easier to understand:

CREATE TABLE experiences_desc (
    id_exp integer NOT NULL REFERENCES experiences(id) ON DELETE CASCADE,
    locale regconfig NOT NULL DEFAULT 'english',
    title text NOT NULL,
    summary text NOT NULL DEFAULT '',
    description text NOT NULL DEFAULT '',
    search_field tsvector
);
INSERT INTO experiences_desc (id_exp,title,summary,description,search_field) VALUES (
    1,
    'title',
    'summary',
    'description',
    setweight(to_tsvector('english','title'),'A') ||
    setweight(to_tsvector('english','summary'),'B') ||
    setweight(to_tsvector('english','description'),'C')
);
CREATE INDEX experiences_idx ON experiences_desc USING gin(search_field);

What is important to note here is the construction of the tsvector search_field ('||' concatenates the weighted tsvectors):

  • to_tsvector() tokenizes your string according to the given locale, which is why you need to pass it the locale in which your string is written. It uses a dictionary for that locale to tokenize the input string intelligently (normalizing plurals, verb forms...) and to leave common words out of the index. It means that the end user does not have to enter the exact words for your search engine to return relevant results.
  • You can assign different weights to the different parts of your tsvector. PostgreSQL defines 4 weight labels (1 >= A > B > C > D >= 0) that you can assign to parts of a tsvector. They help PostgreSQL rank your search results automatically and also let you filter by restricting your search to a certain weight category. Of course, you can change the default weight value assigned to each letter.

Now to perform a search, you can use the following:

SELECT title, summary, description, ts_rank_cd(search_field, to_tsquery('english','search & query')) AS score
FROM experiences_desc
WHERE to_tsquery('english','search & query') @@ search_field
ORDER BY score DESC
OFFSET 0 LIMIT 10;

Important things to note are:

  • you need to use to_tsquery() to prepare your search query (it expects terms joined by operators such as &) and the operator @@ to perform the match, which is what lets PostgreSQL use the GIN index.
  • ranking is done automatically by PostgreSQL thanks to the ts_rank_cd() function. Several other ranking algorithms are available (using different functions / parameters).

This is really just a quick overview of the tools PostgreSQL provides to perform fulltext search. You can also ask PostgreSQL to highlight the matching patterns in the search results, gather document statistics... it is really flexible and powerful! To find out more on these features, here's a link to PostgreSQL 8.3 Full Text Search documentation. Needless to say, I find PostgreSQL to be much more powerful than MySQL on the fulltext search aspect.

Use SQL joins with caution

In order to keep performance acceptable, you should also try to keep the number of SQL joins in the search query as low as possible. Also, don't forget to perform your joins after you have found your results and not before. To illustrate this, consider the following 2 queries. The first one:

SELECT e.id, e.title, e.description, s.score
FROM experiences AS e
INNER JOIN (
    SELECT id, MATCH(title) AGAINST('query' IN BOOLEAN MODE) AS score
    FROM experiences
    WHERE MATCH(title) AGAINST('query' IN BOOLEAN MODE)
) AS s ON e.id=s.id
INNER JOIN some_other_table AS sot ON (sot.id=s.id)
ORDER BY score DESC
LIMIT 10 OFFSET 0;

and the second one:

SELECT e.id, e.title, e.description, s.score
FROM experiences AS e
INNER JOIN (
    SELECT id, MATCH(title) AGAINST('query' IN BOOLEAN MODE) AS score
    FROM experiences
    WHERE MATCH(title) AGAINST('query' IN BOOLEAN MODE)
    ORDER BY score DESC
    LIMIT 10 OFFSET 0
) AS s ON e.id=s.id
INNER JOIN some_other_table AS sot ON (sot.id=s.id)
ORDER BY s.score DESC;

The 2 queries give the exact same results, but the second one will take only a tiny fraction of the time required by the first one to perform the search, because the joins only touch the 10 rows that were kept. This may seem obvious here as the query is small, but this is the kind of stupid error that can easily slip into bigger queries (especially if queries are built dynamically).

Dedicated search tables

If you still cannot reach decent performance with all the above, you might have too many joins in your query and should consider building a dedicated table to perform your searches. This method might also fit your needs if you want fulltext search on data stored in an InnoDB table. The problem with these dedicated search tables is that they are never completely up to date, so it's up to you to decide, given your business requirements, whether you can use this method or not. We use this method in Freelance Business Club, which is why it may take up to 1 day for new members to appear in member search results.

Limit computation in your search queries

To achieve good performance, you should also limit as much as possible the use of user-defined functions or calculations in your queries. Example: if you want to search for the most experienced freelancers on SAP, then the total number of months someone spent working on SAP is a relevant indicator. The problem is that you only have a start date and an end date at your disposal. In this case, instead of calculating the total number of months directly in your query, you'd better use a batch process: set up a cron job that pre-calculates the total number of months for each experience every <put_a_frequency_here>. This way, it will not slow down every one of your search queries!

Filter and sort your search

It is very convenient for the end user to be able to filter and sort the search results. You should think about this possibility when building your search engine. Sorting is quite easy, but beware of filtering. In the best case, you will just have to add another constraint to the WHERE clause or to an existing join, but some filters might require you to add another join to your search, which might slow it down.

Calculate the total number of results for your search

Finally, in order to paginate your search results well, you probably need the total number of results for a given search query. To get this value, I know no other method than performing a second SQL search query (without any ORDER BY / OFFSET / LIMIT and without any unnecessary join) with a brutal COUNT(*) in the SELECT clause. If anybody has another / better method, please let me know!

To avoid this additional query (COUNT(*) can be very time-consuming!), some sites just don't give the total number of results. See Facebook for example: they don't give the exact number of results and their pagination displays '1 2 3 next'. I guess this is one of the reasons (but this is pure speculation...).

Search implementation methodology

I wanted to finish this long post with the method I use to implement search. In my presentation layer, I build a query array which looks like this:
array(
    'text'=>'<search_query>',
    'filter'=>array(
        'filter_type1'=>'<filter_value1>',
        'filter_type2'=>'<filter_value2>'
    ),
    'order_by'=>'<order_by>',
    'limit'=> <limit>,
    'start'=> <start>
);

This query structure is the same no matter what type of search is performed; this is very flexible and facilitates code reuse enormously.
Then, in the business layer, I have a method <object_type>_search() which:

  • transforms the array query values into values understandable by the data layer (when necessary).
  • calls <object_type>_dbSearch()
  • processes the search results to return a result array which looks like:
    array(
    'results'=> <search_results>,
    'nb_pages'=> <nb_pages>,
    'nb_results'=> <nb_results>
    );

Finally, in the data layer, I have a method <object_type>_dbSearch() which calls a method <object_type>_buildSearchQuery() twice: the first time to build the search query and the second time to build the search count query (the one that counts the total number of results for the search query). It returns the raw search results.

This was definitely a long post but I hope nevertheless that it will be intelligible and useful for you!

See you next post!

28 Dec 2008

Web application implementation step 4: Copying is not reusing

It is often said that "good coders code, great reuse" (see an example here). I definitely agree with this... unfortunately, too often, developers think that reusing code means copying code. How often have you seen developers searching Google for code snippets to implement a given functionality, convinced that it will save them time... Unless the code snippet you're looking for is yours (= one you developed and, by extension, master), this behaviour will not help you much... Worse, it will even slow you down, and here's why:

  1. you spend time on Google, blogs, groups, forums... trying to find the code snippet that fits your needs
  2. you find something that supposedly does what you want, but how can you know for sure that it's OK? That would require carefully examining the code... but it takes time... and you wanted a code snippet to save time, so you decide it must be OK.
  3. you include the snippet in your own code, but you have to spend some time adapting it because it can't be included "as is".
  4. finally you think it's OK and test your code. Not bad, but it's not exactly what you expected, so you're back to hacking the snippet. As it is not your code, you're reluctant to study it in depth (and you're a bit lazy), so you treat it as a black box: you change a value here, a value there... "that should do it". And you spend time fine-tuning it.
  5. you finally have the functionality you wanted. Too bad, the guy who wrote the snippet was not an expert and you're now facing crashes, security and/or performance issues with this code. You now have to spend time trying to fix code that is not yours.

Now did you really save time? To make things worse, you did not learn anything in the process... This method looks sexy at first sight but beware, it's a trap! When facing a new problem, you'll be better off understanding it and tackling it on your own, right now! You have several options:

  1. Find a library, a web service... which addresses the issue you're facing. What's the difference between this solution and using code snippets? Simple: libraries, web services, etc. are code that is meant to be reused! They are tested, documented and maintained pieces of code... not a piece of code coming out of nowhere, pasted on a blog between 2 photos from Flickr.
  2. Use a code snippet you wrote. It is an issue you already tackled in the past and you coded something to address it. In this case, there is no problem in reusing your own code. You wrote it, you know it, you know how it works, in which context it was developed and which exact problem it solves.
  3. Code a solution by yourself. You can look for inspiration in others' code snippets but you should code the solution yourself. This way, you are sure to code something that addresses your exact issue and you're improving your knowledge and skills. Plus, next time you face the problem, you'll be able to solve it in a few minutes.

It is often said that "good coders are lazy". This is true, but it's hard becoming a lazy coder and lazy coders are most of the time hard working people :)

21 Dec 2008

Web application implementation step 3: Framework vs Methodology

Now you must be wondering why I'm so skeptical about frameworks... Actually, I have nothing against frameworks in themselves (see my post on the subject): some are better than others but, generally speaking, they are all useful in some way. No, the problem lies more in what people tend to expect from frameworks. Indeed, a lot of people expect that, because they're using a good framework, they'll be able to code efficiently and deliver a great web app quickly. Unfortunately, this reasoning is wrong... your coding efficiency depends more on your methodology than on the framework you're using. Just as an illustration, coding version 1.0 of Freelance Business Club required only 3 months, starting from scratch and without using any framework.

The ultimate rule for coding efficiently is the following: first think, then code. I will now explain in more detail the methodology I personally use when building web applications. This methodology is summed up in the following diagram:

[Diagram: methodology]

Process listing

This is, by far, the most important step, the one that can save you an incredible amount of time! When you start a project, everybody has a gazillion ideas on the application features: "it must do this, that" "how about this ?" "It would be great if we could do that" "You know what would be great ? this!"... Don't get me wrong, having people brainstorming around the application potential features is great, even essential, and you should write down all these great ideas. But obviously, you won't be able to have all these features for version 1.0.

First, you should translate these features into processes. A process puts a feature into perspective. The question to ask is the following: "To which business process does this feature belong?". You'll be amazed at how many proposed features don't belong to any real business process and are, thus, irrelevant to your application. In particular, developers (me included) often think about adding "cool" features from a technical point of view... but each "cool" feature, even the smallest one, takes time to code, so it is very important to think about its real usefulness from the end user's point of view. Thinking in terms of business processes rather than in terms of features is a good method to separate relevant features from irrelevant ones.

At this point, don't forget the "backend" processes, that is, the ones required to run your web application. These backend processes are too often overlooked, which results in people spending a lot of energy administering their web app. And "no", phpMyAdmin cannot be your only backend application.

Once you've listed all your processes, the next step is to break them down into milestones. The most crucial processes go in the first milestone, the less important ones in the next milestone, and so on, with the least important ones going in the "later" pseudo-milestone. At the end of this step, you've got all your processes listed by milestone.

Now, before going any further, I encourage you to do some research to see, for each process, whether there is an existing application, API or web service out there that you can leverage to implement this process faster and/or better. This can be a huge boost and might prevent you from reinventing the wheel (and probably a poorer one).

Maquetting / Wireframing / Prototyping

Now that you know which processes to implement for the next milestone, you need to "wireframe" them (in French we say "maquetter"; I'm not sure of the English word for it). What is important in this step is to get everybody in sync on each screen that is to be developed. What's in a screen:

  • General layout of the page
  • Information displayed
  • Links available

To build wireframes, I personally tend to use Impress (the PowerPoint-like module of OpenOffice.org) but you can use anything you like; pen and paper are OK too... If you're a Windows user, a friend of mine uses Axure, a commercial tool dedicated to wireframing and prototyping. Once you've got all your screens ready, you may want to turn them into a real prototype so that your users can validate how you will implement the business processes. On Linux, I found that this can easily be done with kimagemapeditor: just import screenshots of your screens and define the links between them using HTML image maps. You'll end up with a static website that you can navigate easily.

Once this is done, you can have your graphic designer start working to turn these ugly wireframes into art!

URL listing

At this point, once you have all your screens and their relationships set up, it is time to decide the URL of each resource. It is very important to think about your URL strategy beforehand, for several reasons. First, as you're building a web application, you cannot ignore Google and need to implement a Google-friendly URL scheme. Then, thinking about your URLs helps you switch mentally from your user requirements to how you will code them.

For example, in the code architecture I implemented, this step is key, as I cannot code anything until I have translated my URL scheme into a .htaccess file (using mod_rewrite, the Apache URL rewriting module). This .htaccess file determines how I organize my presentation handlers.
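
To make the idea of "URL scheme first, handlers second" more concrete, here is a minimal sketch in Python. It is not my actual .htaccess: the URL patterns and handler names below are purely hypothetical, and the only point is that once the URL list is written down, the set of presentation handlers follows from it.

```python
import re

# Hypothetical URL scheme: each pattern maps to the name of a presentation handler.
# Neither the URLs nor the handler names come from a real application.
ROUTES = [
    (re.compile(r"^/projects/?$"), "list_projects"),
    (re.compile(r"^/projects/(?P<project_id>\d+)/?$"), "show_project"),
    (re.compile(r"^/members/(?P<username>[a-z0-9_-]+)/?$"), "show_member"),
]


def resolve(path):
    """Return (handler_name, params) for the first matching pattern, or None."""
    for pattern, handler in ROUTES:
        match = pattern.match(path)
        if match:
            return handler, match.groupdict()
    return None


print(resolve("/projects/42"))
print(resolve("/members/thibauld"))
print(resolve("/unknown"))
```

Writing the table before any handler exists forces you to decide which resources are public, which parameters appear in the URL, and how many handlers you will have to write.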

Database design

During the previous step, you probably began thinking about the 'objects' you are going to manipulate. It is now time to write them down by coming up with a complete and coherent database design. I cannot stress enough how important database design is. If you fail here, you'll have to code a lot more than needed. Not only will you have to compensate for the inefficiencies of your database design by coding more, but you will also have to deal with poor application performance sooner than expected.

Does "referential integrity" sound familiar to you? It should... Too few people realize how powerful a database like MySQL or PostgreSQL is. SQL being a declarative language, a lot of developers don't like it much, which is a pity... It can be really powerful and you can delegate _a lot_ of work to your database. Letting your database handle work saves you coding time, and will probably boost your application performance. Database design is a vast subject... If I have time, I'll probably write a post dedicated to it but meanwhile, I encourage you to read this great book: The Art of SQL.
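
As a small illustration of delegating work to the database, the sketch below declares a foreign key and lets the database reject orphan rows and clean up dependent rows by itself. It uses SQLite through Python's standard sqlite3 module only because it runs anywhere without a server; the tables (member, project) are made up for the example, and the same declarations exist in MySQL (with InnoDB) and PostgreSQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: every project must belong to an existing member.
conn.execute("CREATE TABLE member (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute(
    """
    CREATE TABLE project (
        id INTEGER PRIMARY KEY,
        title TEXT NOT NULL,
        owner_id INTEGER NOT NULL REFERENCES member(id) ON DELETE CASCADE
    )
    """
)

conn.execute("INSERT INTO member (id, name) VALUES (1, 'alice')")
conn.execute("INSERT INTO project (title, owner_id) VALUES ('site', 1)")


def try_insert_orphan():
    """Try to insert a project whose owner does not exist."""
    try:
        conn.execute("INSERT INTO project (title, owner_id) VALUES ('ghost', 99)")
        return True
    except sqlite3.IntegrityError:
        # The database refused the row: no checking code needed in the application.
        return False


orphan_accepted = try_insert_orphan()

# Deleting the member removes their projects too, thanks to ON DELETE CASCADE.
conn.execute("DELETE FROM member WHERE id = 1")
remaining = conn.execute("SELECT COUNT(*) FROM project").fetchone()[0]

print(orphan_accepted, remaining)
```

Both behaviours (refusing the orphan row, cascading the delete) would otherwise have to be coded and tested by hand in the application.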

Coding

Now you can begin coding :) Of course, I also have a methodology for coding. It is good practice to implement process by process (you can parallelize work depending on your team size). Since what matters most is what the end user sees, you should start each process with the presentation layer, then continue with the business layer, and finally end with the data layer. To ensure a good level of quality, once you have finished coding a layer, validate it using dummy values in place of the layer below it (the n-1 layer, which has not been implemented yet). If you follow this methodology, you'll be amazed at how fast you'll be able to develop your web application.
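
Here is a rough sketch of what "validate a layer using dummy values from the n-1 layer" can look like. Everything in it is hypothetical (function names, data, HTML): the presentation layer is written first and checked against a stub that stands in for the business layer that does not exist yet.

```python
from html import escape


def stub_list_projects(member_id):
    """Dummy business layer: fixed values standing in for code not yet written."""
    return [
        {"id": 1, "title": "Sample project"},
        {"id": 2, "title": "Another one"},
    ]


def render_project_list(member_id, list_projects=stub_list_projects):
    """Presentation layer: turn business-layer data into an HTML list."""
    items = "".join(
        '<li><a href="/projects/{}">{}</a></li>'.format(p["id"], escape(p["title"]))
        for p in list_projects(member_id)
    )
    return "<ul>" + items + "</ul>"


# The presentation layer can be validated before any business code exists.
page = render_project_list(member_id=7)
print(page)
```

Once the real business function is written, it is passed in place of the stub (or becomes the default), and the same process repeats one layer down, with dummy values replacing the data layer.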

That's all for today, see you next post!