Saturday, January 19, 2008

Merge in Source Control

In theory, this could be very difficult:

* What happens if Jane changed some of the same lines that Joe changed, but in different ways?
* What happens if Jane's changes are functionally incompatible with Joe's?
* What happens if Jane made a change to a C# function which Joe has deleted?
* What happens if Jane changed 80 percent of the lines in the file?
* What happens if Jane and Joe each changed 80 percent of the lines in the file, but each did so for entirely different reasons?
* What happens if Jane's intent was not clear and she cannot be reached to ask questions?

Friday, January 18, 2008

SQL Performance

* One: only "tune" sql after code is confirmed as working correctly.

* Two: ensure repeated sql statements are written absolutely identically to facilate efficient reuse: re-parsing can often be avoided for each subsequent use.

Writing best practices: all sql verbs in upper-case i.e. SELECT; separate all words with a single space; all sql verbs begin on a new line; sql verbs aligned right or left within the initial verb; set and maintain a table alias standard; use table aliases and when a query involves more than one table prefix all column names with their aliases. Whatever you do, be consistent.

* Three: code the query as simply as possible i.e. no unnecessary columns are selected, no unnecessary GROUP BY or ORDER BY.

* Four: it is the same or faster to SELECT by actual column name(s). The larger the table the more likely the savings.
SELECT customer_id, last_name, first_name, street, city FROM customer; Rather than:
SELECT * FROM customer;

* Five: do not perform operations on DB objects referenced in the WHERE clause:
SELECT client, date, amount FROM sales WHERE amount > 0;
Rather than:
SELECT client, date, amount FROM sales WHERE amount!= 0;

* Six: avoid a HAVING clause in SELECT statements - it only filters selected rows after all the rows have been returned. Use HAVING only when summary operations applied to columns will be restricted by the clause. A WHERE clause may be more efficient.
SELECT city FROM country WHERE city!= 'Vancouver' AND city!= 'Toronto'; GROUP BY city;
Rather than:
SELECT city FROM country GROUP BY city HAVING city!= 'Vancouver' AND city!= 'Toronto';

* Seven: when writing a sub-query (a SELECT statement within the WHERE or HAVING clause of another sql statement):
-- use a correlated (refers to at least one value from the outer query) sub-query when the return is relatively small and/or other criteria are efficient i.e. if the tables within the sub-query have efficient indexes.
-- use a noncorrelated (does not refer to the outer query) sub-query when dealing with large tables from which you expect a large return (many rows) and/or if the tables within the sub-query do not have efficient indexes.
-- ensure that multiple sub-queries are in the most efficient order.
-- remember that rewriting a sub-query as a join can sometimes increase efficiency.

* Eight: minimise the number of table lookups especially if there are sub-query SELECTs or multicolumn UPDATEs.

* Nine: when doing multiple table joins consider the benefits/costs for each of EXISTS, IN, and table joins. Depending on your data one or another may be faster.
Note: IN is usually the slowest.
Note: when most of the filter criteria are in the sub-query IN may be more efficient; when most of the filter criteria are in the parent-query EXISTS may be more efficient.

* Ten: where possible use EXISTS rather than DISTINCT.

* Eleven: where possible use a non-column expression (putting the column on one side of the operator and all the other values on the other). Non-column expressions are often processed earlier thereby speeding the query.
WHERE SALES < 1000/(1 + n);
Rather than:
WHERE SALES + (n * SALES) < 1000;

* Twelve: the most efficient method for storing large binary objects, i.e. multimedia objects, is to place them in the file system and place a pointer in the DB.

* Thirteen: Use inner-joint rather than left/right/cross joint

* Fourteen: In most of cases, GROUP+HAVING < WHERE

Thursday, January 17, 2008


If you build websites that require users to register it’s your responsibility to keep their passwords safe. And if you’re storing the passwords in plain text then you’re not doing your job properly. It may be that, like Reddit, you think that storing passwords in plain text leads to a better user experience. I happen to agree with you. But then, like Reddit, what happens if your database is stolen? It’s not just your site that is compromised. Since most users use the same password on multiple sites, all those sites have also been compromised.

No data is entirely secure, and if anyone else has access to your webserver (the company managing the server for you?) or your database (the company storing the backups?) then you don’t have total control over the security anyway. So there’s always a chance your database could be stolen. So, the simple rule is to hash your passwords.


A hash is a string derived from the original password via a one-way algorithm. In other words, it’s easy to create the hash from the original, but harder (when used for security, ideally impossible) to create the original from the hash. You store the hash in the database, and when the user signs-in you hash the password they sign-in with and compare it to the hash in the database. Something like this

if( $user->passwordhash == sha1( $_POST['password'] ) )

That way, you never store the user’s password.

There are a number of hashing algorithms in PHP, of which md5 and sha1 are the most commonly used. Unfortunately, neither is as secure as they were once thought to be. It would be better to use a more secure hash, and if you have the Hash engine in your PHP installation (included by default since PHP 5.1.2) then you have access to many more algorithms. So a better example would be

if( $user->passwordhash == hash( 'whirlpool', $_POST['password'] ) )

Rainbow tables

But there’s another problem. Once your database is stolen, the thief has plenty of time to crack the passwords using a simple Rainbow Table attack. This involves creating a large selection of hashes based on likely passwords (e.g. every word in the dictionary) and then comparing the hashes with the hashes in your database. Within an hour or so, half the passwords in your database will probably have been cracked.

To prevent this you should salt each password by adding a random string to it (called a salt or nonce). The time consuming part of a rainbow table attack is building the dictionary of hashes. Adding a random salt to the password means the thief has to build a whole new dictionary of hashes for each salt, making a rainbow table attack too time consuming to be viable. Each password should have a different salt, and the salt doesn’t even need to be secret.

The Code bit

So, for secure passwords you need code that looks something like this

// get a new salt - 8 hexadecimal characters long
// current PHP installations should not exceed 8 characters
// on dechex( mt_rand() )
// but we future proof it anyway with substr()
function getPasswordSalt()
return substr( str_pad( dechex( mt_rand() ), 8, '0',
STR_PAD_LEFT ), -8 );

// calculate the hash from a salt and a password
function getPasswordHash( $salt, $password )
return $salt . ( hash( 'whirlpool', $salt . $password ) );

// compare a password to a hash
function comparePassword( $password, $hash )
$salt = substr( $hash, 0, 8 );
return $hash == getPasswordHash( $salt, $password );

// get a new hash for a password
$hash = getPasswordHash( getPasswordSalt(), $password );

You don’t have to attach the salt to the hash, you can instead store them separately within the database, but I like keeping them together in a single string. Equally, the salt needn’t be in hexadecimal, but I like the symmetry with the hexadecimal hash.

Finally, as Thomas Ptacek points out, you don’t want the fastest hash algorithm in the world for this - a fast algorithm is more useful to an attacker than it is to you.

Good Encryption:

Wednesday, January 16, 2008

Proxy Detector

* Proxy Detector v0.1
* copyrights by: Daantje Eeltink (
* first build: Mon Sep 18 21:43:48 CEST 2006
* last build: Tue Sep 19 10:37:12 CEST 2006
* Description:
* This class can detect if a visitor uses a proxy server by scanning the
* headers returned by the user client. When the user uses a proxy server,
* most of the proxy servers alter the header. The header is returned to
* PHP in the array $_SERVER.
* License:
* GPL v2 licence. (
* Support:
* If you like this class and find it usefull, please donate one or two
* coins to my PayPal account
* Todo:
* Add open proxy black list scan.

class proxy_detector {

* Set defaults...
function proxy_detector(){
$this->config = array();
$this->lastLog = "";

//set default headers
$this->scan_headers = array(

* VOID setHeader( STRING $trigger )
* Set new header trigger...
function setHeader($trigger){
$this->scan_headers[] = $trigger;

* ARRAY $triggers = getHeaders( VOID )
* Get all triggers in one array
function getHeaders(){
return $this->scan_headers;

* VOID setConfig( STRING $key, STRING $value)
* Set config line...
function setConfig($key,$value){
$this->config[$key] = $value;

* MIXED $config = getConfig( [STRING $key] )
* Get all config in one array, or only one config value as a string.
function getConfig($key=''){
return $this->config[$key];
return $this->config;

* STRING $log = getLog( VOID )
* Get last logged information. Only works AFTER calling detect()!
function getLog(){
return $this->lastLog;

* BOOL $proxy = detect( VOID )
* Start detection and return true if a proxy server is detected...
function detect(){
$log = "";

//scan all headers
foreach($this->scan_headers as $i){
//proxy detected? lets log...
$log.= "trigger $i: ".$_SERVER[$i]."\n";

//let's do something...
$log = $this->lastLog = date("Y-m-d H:i:s")
."\nDetected proxy server: "
.gethostbyaddr($_SERVER['REMOTE_ADDR'])." ({$_SERVER['REMOTE_ADDR']})\n".$log;

//mail message
Proxy detected at {$_SERVER['REQUEST_URI']}",$log);

//write to file
$f = $this->getConfig('LOG_FILE');
$fp = fopen($f,'a');
die("<strong>Fatal Error:</strong> Couldn't write to file:
Please check if the path exists and is writable for the webserver or php...");

return true;

//nope, no proxy was logged...
return false;


Browsing the code you will notice that it uses a log file to store the data so we will have to create one called "
proxy_detector.log". Don't forget to give it the proper permission on the server (CHMOD it to make it writable).

Ok so we already have 2 files. Let's go ahead and create a new one called ""
.This one will initiate our class for our future use and do what we want it to do so I suggest you to edit it to suit your needs. Copy paste this code and save it:

* Proxy Detector v0.1
* Implementation example.
* Mon Sep 18 23:29:47 CEST 2006
* by:
* Documentation:
* I use this file as an include at the top of some php files
* to block proxy users from the scripts that included this file.
* This file is only an example on how to implement the detector class.
* But it could be usefull as is...
* Check the remarks in the class for more documentation.

//include detector class, assuming it's in the same directory as this file...

//init class
$proxy = new proxy_detector();

//set optional extra triggers, no need to...
//I think I've got all of them covered in the class...
// $proxy->setTrigger('HTTP_SOME_HEADER_1');
// $proxy->setTrigger('HTTP_SOME_HEADER_2');

//set optional config
// $proxy->setConfig('MAIL_ALERT_TO','');
// $proxy->setConfig('LOG_FILE','/home/daantje/public_html/proxy/proxy_detector.log');

//start detect

//returned true, lets die...
echo "<h1>Proxy detected</h1>";
echo "
Please disable your proxy server in your browser preferences or internet settings,
and try again.<br><br>";

//parse logged info
echo nl2br($proxy->getLog());

//some credits...
echo "<hr><strong>proxy detector v0.1</strong>
- &copy;2006 < a href=\"\"

//and do nothing anymore! (but not in my example)

//else, proceed as normal, put your code here...