Archive for December, 1969
Having fun with geotagged photos
A couple of months ago I wrote an article about geotagging, a new feature of Picasa 2.5. Thanks to this new feature you are able to store the location where a photograph was taken by finding the location in Google Earth. Today I am going to explain how to extract the geotagging information stored by Picasa and do some fun things with it in PHP.
What are coordinates?
When you geotag a photograph you are basically adding a very specific combination of a latitude and a longitude coordinate to the image file. Every point on earth can be represented by a combination of these two coordinates.

The way these coordinates work takes a bit of getting used to, because the earth is not a simple flat surface but essentially a sphere. Some basic knowledge of geometry is required to fully understand the reasoning behind the coordinate system.
Latitude
The location of a point on the north – south axis. The latitude is measured in degrees from the equator. The north pole is 90 degrees, the south pole is -90 degrees. Every point on earth is somewhere in between.

The angle between the equator and point α is the latitude of Rome (in this case 41.890278°)
Longitude
The location of a point on the east – west axis. The longitude is measured in degrees from a north – south line called the prime meridian. The prime meridian we use nowadays is not a naturally occurring point of reference; it was originally used by the British Navy and runs through Greenwich, England. Almost every seafaring nation used to have its own prime meridian, but given the need for a universal coordinate system the Greenwich meridian was adopted as the universal prime meridian. The longitude ranges from -180 to 180 degrees. Every point on earth is somewhere in between.

The angle between the prime meridian and point β is the longitude of Rome (in this case 12.492222°)
Different notations
The notation that we are going to use today is a single fractional decimal number for each coordinate. This notation is used by most modern computer applications and web services, such as Yahoo Maps, Live Maps, Google Maps and Google Earth.
The traditional notation of the latitude and longitude is in degrees, minutes and seconds. Every degree is divided into 60 minutes and every minute is divided into 60 seconds. With this notation the degrees and minutes are integers; the seconds can be written as a fractional decimal number. Also, instead of using positive and negative numbers this notation always uses a positive number followed by one of the letters N, S, E or W as a suffix to indicate the direction. A negative latitude uses the letter S as a suffix, a positive latitude uses the letter N. A negative longitude uses the letter W as a suffix, a positive longitude uses the letter E.
The following coordinates use different notations, but both represent the same location:
41° 53' 25" N = 41.890278° and 12° 29' 32" E = 12.492222°
Extracting the coordinates from the image file
Picasa stores the coordinates inside a special area inside the image file called Exif, designed to store meta information about the images. It is usually used to store the camera model, the settings of the camera and the date when the picture was taken. Exif is a common standard and also allows us to include geotagging information. Extracting this information with PHP is quite easy because PHP provides us with an easy to use module for reading Exif data.
$exif = exif_read_data($filename, 'EXIF');
var_dump($exif['GPSLatitude']);
-> array(3) { [0]=> string(4) "41/1" [1]=> string(4) "53/1" [2]=> string(4) "25/1" }
var_dump($exif['GPSLatitudeRef']);
-> string(1) "N"
var_dump($exif['GPSLongitude']);
-> array(3) { [0]=> string(4) "12/1" [1]=> string(4) "29/1" [2]=> string(4) "32/1" }
var_dump($exif['GPSLongitudeRef']);
-> string(1) "E"
After obtaining the raw coordinates from the Exif data we need to convert them to something that we can use. I use two helper functions: one to convert the degrees, minutes and seconds to fractional decimal numbers (Exif stores numbers as actual fractions) and one to combine these three into a single fractional decimal degree. This is the format that we can use for all kinds of fun stuff. All of this is wrapped into a single function called getCoordinates(). The result of this function is an array containing the latitude and longitude of the specified file.
if ($c = getCoordinates($filename)) {
$latitude = $c[0];
$longitude = $c[1];
// use the data in fun ways...
}
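The article does not show getCoordinates() itself, so the following is only a minimal sketch of how it could look, assuming the Exif layout shown in the var_dump output above. The helper names exifFractionToFloat() and dmsToDecimal() are hypothetical and not taken from the original source.

```php
<?php
// Hypothetical helper: turn an Exif rational such as "41/1" into a float.
function exifFractionToFloat(string $fraction): float
{
    $parts = explode('/', $fraction);
    if (count($parts) === 1) {
        return (float) $parts[0];
    }
    // Guard against a zero denominator in malformed data.
    if ((float) $parts[1] == 0.0) {
        return 0.0;
    }
    return (float) $parts[0] / (float) $parts[1];
}

// Hypothetical helper: combine degrees, minutes and seconds into a single
// decimal degree. $ref is the hemisphere letter; S and W become negative.
function dmsToDecimal(array $dms, string $ref): float
{
    $degrees = exifFractionToFloat($dms[0]);
    $minutes = exifFractionToFloat($dms[1]);
    $seconds = exifFractionToFloat($dms[2]);
    $decimal = $degrees + $minutes / 60 + $seconds / 3600;
    return in_array(strtoupper($ref), ['S', 'W'], true) ? -$decimal : $decimal;
}

// Returns [latitude, longitude], or false when the file has no usable GPS data.
function getCoordinates(string $filename)
{
    $exif = @exif_read_data($filename);
    if ($exif === false
        || !isset($exif['GPSLatitude'], $exif['GPSLatitudeRef'],
                  $exif['GPSLongitude'], $exif['GPSLongitudeRef'])) {
        return false;
    }
    return [
        dmsToDecimal($exif['GPSLatitude'], $exif['GPSLatitudeRef']),
        dmsToDecimal($exif['GPSLongitude'], $exif['GPSLongitudeRef']),
    ];
}
```

With the values for Rome, 41 + 53/60 + 25/3600 gives roughly 41.890278 and 12 + 29/60 + 32/3600 gives roughly 12.492222, which matches the decimal notation used earlier.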
Creating links to a mapping website
The easiest way to use the geotagged information on your website is by creating a link to one of the mapping services such as Google Maps, Yahoo Maps or Live Maps from Microsoft. The only thing you have to do is create a simple link with a couple of parameters that control the location, the level of detail and the type of map. Below you will find examples of the most common mapping services.
Google Maps
http://www.google.com/maps?z=zoom&ll=latitude,longitude&t=mode
- zoom: an integer from 0 to 19; higher is closer, I usually use 16 for photographs
- mode: k = satellite, m = map, h = hybrid
http://www.google.com/maps?z=16&ll=41.890278,12.492222&t=k
Yahoo Maps
http://maps.yahoo.com/broadband/#mvt=mode&lon=longitude&lat=latitude&mag=zoom
- zoom: an integer from 16 to 1; smaller is closer, I usually use 2 for photographs
- mode: s = satellite, m = map, h = hybrid
http://maps.yahoo.com/broadband/#mvt=s&lon=12.492222&lat=41.890278&mag=2
Live Maps
http://maps.live.com/default.aspx?cp=latitude~longitude&style=mode&lvl=zoom
- zoom: an integer from 1 to 17; higher is closer, I usually use 16 for photographs
- mode: a = aerial, r = road, h = hybrid
example or download the source
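As a rough sketch, the three URL patterns above could be filled in with small PHP functions like these. The function names are hypothetical; the URL formats are simply the ones listed above.

```php
<?php
// Hypothetical helpers that fill in the URL patterns listed above.
// %F always uses a dot as the decimal separator, regardless of locale.
function googleMapsLink(float $lat, float $lon, int $zoom = 16, string $mode = 'k'): string
{
    return sprintf('http://www.google.com/maps?z=%d&ll=%F,%F&t=%s', $zoom, $lat, $lon, $mode);
}

function yahooMapsLink(float $lat, float $lon, int $zoom = 2, string $mode = 's'): string
{
    return sprintf('http://maps.yahoo.com/broadband/#mvt=%s&lon=%F&lat=%F&mag=%d', $mode, $lon, $lat, $zoom);
}

function liveMapsLink(float $lat, float $lon, int $zoom = 16, string $mode = 'a'): string
{
    return sprintf('http://maps.live.com/default.aspx?cp=%F~%F&style=%s&lvl=%d', $lat, $lon, $mode, $zoom);
}
```

When you print one of these links inside an HTML attribute, run it through htmlspecialchars() first so the ampersands are escaped properly.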
Embedding a map of the location
Alternatively you could also use the Google Maps API (example, source) or the Yahoo Maps API (example, source) to embed a map of the location directly in your own webpage. Instead of linking to Google or Yahoo, your visitors are able to view the location of a photograph without even leaving your own website. All you need to do is sign up for use of one of their APIs and follow their examples and use our coordinates. Both APIs are very extensive and we are only scratching the surface of what is possible, but that is far beyond this article.
On my own personal photo album I use the Google Maps API and the prototype window class to show the map in a floating window on top of the webpage. Just click on the Toon locatie (translation: Show location) link at the bottom of the page to see the result.
Creating KML files for Google Earth
Thanks to the KML specification it is possible to use Google Earth as your personal photo album. KML files are simple XML files that contain a number of placemarks (geographical locations), and for each placemark you can show a small thumbnail of the photograph that was taken in that location. If you click on that thumbnail you can show some extra information about that photo, including a larger version and perhaps a link back to the website where the original photo can be downloaded. This is the same technology that is used by Google Earth's new Geographic Web feature, which pulls its information from Wikipedia and the photo website Panoramio.com.
I've created a similar feature for my personal photo album. Click on Toon in Google Earth (translation: Show in Google Earth) and if you configured your browser to open KML files in Google Earth all my vacation photographs will show up in Google Earth (otherwise you must manually save the KML file and open it in Google Earth).
View the example or download the source
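To give an idea of what such a file looks like, here is a minimal sketch that builds a KML document with one placemark per photo. The function name buildKml() and the array keys name, thumbnail, latitude and longitude are assumptions made for this illustration; they are not taken from the downloadable source.

```php
<?php
// Hypothetical sketch: build a KML document with one placemark per photo.
// Each photo is an array with the (assumed) keys name, thumbnail,
// latitude and longitude.
function buildKml(array $photos): string
{
    $kml  = '<?xml version="1.0" encoding="UTF-8"?>' . "\n";
    $kml .= '<kml xmlns="http://www.opengis.net/kml/2.2">' . "\n";
    $kml .= "<Document>\n";
    foreach ($photos as $photo) {
        $name = htmlspecialchars($photo['name'], ENT_XML1 | ENT_QUOTES, 'UTF-8');
        $src  = htmlspecialchars($photo['thumbnail'], ENT_QUOTES, 'UTF-8');
        $kml .= "<Placemark>\n";
        $kml .= "<name>{$name}</name>\n";
        // The description may contain HTML, so it is wrapped in CDATA.
        $kml .= "<description><![CDATA[<img src=\"{$src}\" />]]></description>\n";
        // Note the order: KML expects longitude first, then latitude.
        $kml .= sprintf(
            "<Point><coordinates>%F,%F,0</coordinates></Point>\n",
            $photo['longitude'],
            $photo['latitude']
        );
        $kml .= "</Placemark>\n";
    }
    $kml .= "</Document>\n</kml>\n";
    return $kml;
}
```

When serving the result from a script, sending the Content-Type header application/vnd.google-earth.kml+xml helps the browser hand the file over to Google Earth.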
And more...
How about embedding the location of the photograph in the web page that contains the photograph? You can easily embed geographical information in web pages using the GeoURL ICBM meta tag or by using microformats. You can do the same thing with RSS and Atom feeds using the GeoRSS or ICBM RSS module.
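A small sketch of what these formats look like when generated from our coordinates. The function names are hypothetical; the markup follows the ICBM meta tag convention, the geo microformat and the simple GeoRSS point element.

```php
<?php
// Hypothetical helpers that print a coordinate pair in three formats.

// GeoURL ICBM meta tag, to be placed in the <head> of the page.
function icbmMetaTag(float $lat, float $lon): string
{
    return sprintf('<meta name="ICBM" content="%F, %F" />', $lat, $lon);
}

// The geo microformat, to be placed somewhere in the body of the page.
function geoMicroformat(float $lat, float $lon): string
{
    return sprintf(
        '<span class="geo"><span class="latitude">%F</span>, <span class="longitude">%F</span></span>',
        $lat,
        $lon
    );
}

// GeoRSS point for a feed entry: latitude first, then longitude, separated
// by a space. The feed must declare the georss namespace prefix itself.
function geoRssPoint(float $lat, float $lon): string
{
    return sprintf('<georss:point>%F %F</georss:point>', $lat, $lon);
}
```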
5 things you didn’t know about me
Joost de Valk tagged me. That means I will have to reveal some things that you probably didn’t know about me.
- I asked my girlfriend to marry me last summer. We were on vacation in Italy and one morning we rented a small rowing boat. In the middle of Lake Bracciano I popped the question. She said ‘Yes’ by the way and we are to be married in April next year.
- My bookcase is completely full of books about the history and religion of Israel. There is nothing better than going offline for a couple of hours, sitting down in a comfy chair and just reading something completely different.
- I did not finish my studies. I started studying computer science and dropped out after two years. After taking a little break I decided to study graphic design at Minerva, an arts academy. I didn’t finish that study either. At the time nobody understood my move from computer science to graphic design, but given my current day job it seems like the perfect combination.
- I never thought I would like ballroom dancing so much. My girlfriend and I have been dancing for about 2 years and I am amazed what we have learned so far. My favourite dances are the English waltz and the Jive.
- Today has been a good day. I’ve been tagged, accepted into the 9rules network and featured on Ajaxian.
Wouter Demuynck, Appie Verschoor, Tristan. Tag, you're it!
Make your pages load faster by combining and compressing javascript and css files
As some of you may know, I am currently working on a content management system. Although I am not able to share all of the code (it is proprietary after all), I already made one debugging tool public. That tool can be used to test some common techniques that decrease the bandwidth generated by feed consumers. Today I am going to make a second tool public, including source code. It is a method to decrease the loading time of a page by combining all the different css or javascript files and compressing them.
About six months ago I noticed that the pages generated by the content management system were in themselves very clean and small, but that these pages still took a long time to load for new visitors. Even on a fast internet connection it took more than 8 seconds to load a basically empty page. The server generated the page in about 350ms, so that wasn't the problem. The problem turned out to be a combination of two things: each page used more than 12 different css files because each plugin supplied its own css definitions, and each page used the rather large prototype and scriptaculous javascript libraries, which also consist of a couple of different files. Now that an article about the same problem has been featured on the Yahoo! User Interface blog, I decided to make my solution public, so others can benefit from it.
The solution turned out to be simple: combine all the different files into a single large file and compress that file using gzip. Unfortunately, if you do this manually you are going to run into maintenance problems. That single compressed file is no longer editable. So after editing one of the original source files you will have to recombine it with the other files and re-compress it.
Instead of going for the easy - but hard to maintain - solution I decided to automate the process and thanks to a small PHP script and some clever URL rewriting I now have an easy to maintain method to speed up the loading of pages that use many or large css and javascript files.
The idea is that you have one directory for css files and one directory for javascript files on your server. You rewrite the URLs that point to these directories so that a small script intercepts the requests for those files. The script loads the file from disk and compresses it using gzip. It then sends that compressed file back to the browser. Given that javascript and css files compress really well, this will greatly decrease the size of the data that is going to be transferred and thus decrease the time needed to download these files. Because this works completely transparently you do not need to change anything in your existing code.
But there is more. Compressing the files will decrease the size of the data that needs to be transferred, but it does not solve the problem that the browser can only download a limited number of files at the same time. If you have many different files that need to be loaded the browser will not optimally use the bandwidth it has access to. It will request some files from the server and wait until those files are retrieved before the rest of the files are requested. The solution to this problem is to combine all those different files into one large file. And this is exactly what the script tries to do. You can concatenate different files by simply adding the names of the other files to the URL of the first file.
Take for example the following URLs:
http://www.creatype.nl/javascript/prototype.js
http://www.creatype.nl/javascript/builder.js
http://www.creatype.nl/javascript/effects.js
http://www.creatype.nl/javascript/dragdrop.js
http://www.creatype.nl/javascript/slider.js
You can combine all these files to a single file by simply changing the URL to:
http://www.creatype.nl/javascript/prototype.js,builder.js,effects.js,dragdrop.js,slider.js
The script will intercept the attempt to retrieve something from the javascript directory and will notice that you want to fetch multiple files at once. It will then concatenate the requested files, compress the result and send it as one file to the browser. Also notice that I include the files that come with scriptaculous manually and that I do not use the scriptaculous.js file like you normally would. The reason for this is that scriptaculous.js loads each javascript file individually. If I used the scriptaculous.js file I would get the benefit of compression, but the different files would not be combined into a single file.
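The following is not the actual combine.php, only a minimal sketch of the concatenation step under some assumptions of my own: the function names parseFileList() and combineFiles() are hypothetical, and file names containing a path separator or ".." are rejected so that a request cannot escape the directory.

```php
<?php
// Hypothetical helper: split "prototype.js,builder.js" into a list of file
// names. Returns false when a name is empty, tries to leave the directory
// or does not end in the expected extension.
function parseFileList(string $files, string $extension)
{
    $names = explode(',', $files);
    foreach ($names as $name) {
        if ($name === ''
            || strpos($name, '..') !== false
            || strpos($name, '/') !== false
            || strpos($name, '\\') !== false
            || substr($name, -strlen($extension)) !== $extension) {
            return false;
        }
    }
    return $names;
}

// Hypothetical helper: concatenate the requested files in the given order.
// A newline is added after each file so that a file which does not end in
// a newline cannot run into the first line of the next one.
function combineFiles(string $directory, array $names): string
{
    $contents = '';
    foreach ($names as $name) {
        $contents .= file_get_contents($directory . '/' . $name) . "\n";
    }
    return $contents;
}
```

The combined string can then be compressed with gzencode() and sent with a Content-Encoding: gzip header, provided the Accept-Encoding header of the browser says that it understands gzip.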
Unfortunately I noticed a nasty side effect of the combination of these two methods. If you combine many files the resulting files can become quite large. Compressing those files takes some time and on a busy server that time will become large enough to negate a significant portion of the improvements you made earlier. But this problem can also be solved by simply adding a cache that stores an already combined and compressed version of the files. The cached version is automatically created the first time that particular combination of files is requested and is reused every time after that, as long as the files are not changed. The result is that once the cache is created there is almost no overhead and the compressed file is delivered almost instantly.
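A cache like this could be sketched as follows. This is an illustration under my own assumptions, not the code from combine.php: the cache key is derived from the list of file names plus the most recent modification time, so a changed source file automatically leads to a new cache entry. The names cacheKey() and cachedOrBuild() are hypothetical.

```php
<?php
// Hypothetical helper: a key that changes when the list of files changes
// or when any of the files is modified.
function cacheKey(string $directory, array $names): string
{
    $lastModified = 0;
    foreach ($names as $name) {
        $lastModified = max($lastModified, filemtime($directory . '/' . $name));
    }
    return md5(implode(',', $names)) . '-' . $lastModified;
}

// Hypothetical helper: return the cached data for $key, or call $build to
// create it and store the result in the cache directory for next time.
function cachedOrBuild(string $cacheDir, string $key, callable $build): string
{
    $path = $cacheDir . '/' . $key . '.gz';
    if (is_file($path)) {
        return file_get_contents($path);
    }
    $data = $build();
    file_put_contents($path, $data, LOCK_EX);
    return $data;
}
```

In this sketch $build would be a function that concatenates and compresses the requested files. Old cache entries are never removed here, so a real script would also need some form of cleanup.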
I've done some informal testing on my own website and I did get some impressive results. Before this script was added to my website you needed to download 8 javascript files, in total 168 Kb, for the prototype and scriptaculous libraries. On average this took about 1905 ms. After installing this script you now need to download only a single file of 37 Kb, which only takes around 400 ms. Your results may vary of course, but given that it shaved 1.5 seconds off a total loading time of 3.5 seconds, this script almost cut the time needed to load a page on my weblog in half.
Configuring this script is easy. First you need to download and configure the combine.php script. By default this script looks in the javascript and css directories in the root of your website, but if you are currently using different directories you can change these values at the top of the combine.php script. Upload the combine.php script to the root of your website. Secondly you need to create a cache directory that is writable by the web server. Again, by default this script will look for the cache directory in the root of the website, but you can change this in the combine.php script. Finally you need to create or modify your .htaccess file. If you do not have a .htaccess file you can create it in the root of your website and add the following lines. If you already have a preexisting .htaccess file you can simply add the following lines to the file:
RewriteEngine On
RewriteBase /
RewriteRule ^css/(.*\.css) /combine.php?type=css&files=$1
RewriteRule ^javascript/(.*\.js) /combine.php?type=javascript&files=$1
Note: if your preexisting .htaccess file already uses URL rewriting you do not need to add the first two lines. You can simply add the last two lines to the bottom of the .htaccess file.
Why modifying the GPL is bad…
Alexander Muse explains on his weblog why Big in Japan uses a slightly modified version of the GPL. In fact, he doesn't even consider it a change or modification, but only a clarification of something that he considers unclear.
When we released the source code for the Big in Japan tools we used this license, but decided to clarify our interpretation of the term "distribution". We did not modify or change the license in any way, instead before including the full text of the license we defined "distribution".
I'm sure that Alexander has the best of intentions, but even adding your own clarification of a term can have very big consequences and can completely change the way the license is interpreted. The result is that you are actually modifying the original meaning of the license. Anticipating this response from the open-source community, Alexander has already written an explanation of why we are wrong: the GPL allows you to attach additional notices.
Some people disagree, suggesting that by defining the term we have "defacto" changed the license. We disagree with this position because the license itself (i.e. at the end of the license in a section titled "How to Apply These Terms to Your New Programs" recommends that you attach various notices to the program (i.e. prior to the text of the license) similar to intent notice we placed prior to the preamble.
While this is true, these recommended notices are quite a bit different from what Alexander is trying to do. These notices are intended for the licensor to grant the licensee additional rights that the GPL alone does not grant. An example of such a notice is a linking exception for binary applications. Alexander tries to do the opposite: he tries to limit the rights granted to licensees and that is simply not allowed. You can give more rights than the GPL, but you cannot use the GPL and take away some of the rights that the GPL intends to grant.
So what did Alexander Muse add to the GPL and what specifically is wrong with it? To find out we must first look at the actual text:
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation version 2 of the License. For the purposes of this license our intent is that anyone who modifies the source code must provide those modifications to Big in Japan via the provided source code repository. For purposes of this license, it is our intent that hosting the source code (regardless of whether or not you distribute the binary code) will be considered distribution. This applies to anyone regardless of whether or not they actually publish or distribute the modified code. For example, if you build a hosted application using the source code and modify it in any way, you MUST contribute the changed code back to the project.
Sounds pretty harmless, right? It is not like they snuck in a clause that says "Where this license says 'free' we mean free in a monetary sense. You do not have to pay us money to use this software, but you will have to give us your firstborn". They only want to ensure that changes are contributed back to the project. So the intentions may indeed be good; unfortunately the actual result isn't good at all.
Distribution
As I explained above, by changing the definition of a single term you can completely change the way the license is interpreted. And this is exactly what happens when you change the meaning of the term distribution.
Distribution is not defined in the GPL itself, but we are talking about distribution with regard to copyright legislation and there is more than enough case law to establish what distribution means in that sense. It comes down to this: if you are using or making one or more copies of a copyrighted work, you are not distributing it – if you are giving those copies to other persons or legal entities you are distributing. Use or copying is not distributing – giving to others is.
Alexander wants to change this:
For purposes of this license, it is our intent that hosting the source code (regardless of whether or not you distribute the binary code) will be considered distribution. This applies to anyone regardless of whether or not they actually publish or distribute the modified code.
Given that hosting the source code is required for using it – without hosting it on a server you cannot possibly use it in any useful way – it seems that Alexander wants to make 'using the software' the same as distributing it.
This is quite a big departure from established copyright law and also quite a big departure from the intentions of the author of the GPL. The GPL FAQ states:
Does the GPL require that source code of modified versions be posted to the public?
The GPL does not require you to release your modified version. You are free to make modifications and use them privately, without ever releasing them. This applies to organizations (including companies), too; an organization can make a modified version and use it internally without ever releasing it outside the organization.
Is making and using multiple copies within one organization or company "distribution"?
No, in that case the organization is just making the copies for itself. As a consequence, a company or other organization can develop a modified version and install that version through its own facilities, without giving the staff permission to release that modified version to outsiders.
So, it may look like a minor point, but it severely impacts the way licensees can use the licensed application. With the regular GPL you can change the software internally and use these changes without ever having to release them to the public. The modified GPL requires me to always make my changes public. The modified GPL is taking away rights that the GPL already granted me.
In itself this is not a problem, but you can’t call the license GPL and attach some explanation of how you want the GPL to work. The GPL allows authors to keep internal changes secret – if you want to prevent that you need to move away from the GPL and create your own license.
Contributing back to the project
Another big change is that Alexander wants to require that the changes we already talked about are always contributed back to the project – using the project's SVN repository even.
For the purposes of this license our intent is that anyone who modifies the source code must provide those modifications to Big in Japan via the provided source code repository.
First of all, this small change makes the GPL quite a different beast. It is a common misconception, but the GPL does not require you to contribute changes back to the original author. You do not even have to notify the original author that you modified or distributed his source code. This is the basis of the GPL. It is a unilateral license that grants anybody who agrees to the terms certain additional rights. There is no obligation to the original author and licensor.
Secondly there are some practical problems. For example, there is no guarantee that Big in Japan will still exist in 10 years, or even next month. If Big in Japan no longer exists, how can we contribute those changes back? And if we are not able to do so, we will no longer legally be able to use the modified code.
Another problem – what if Big in Japan won't accept my changes in their source repository? Their license says I am required to provide those modifications via the source code repository – so if they don't accept my changes I cannot fulfil all my obligations and cannot use my own modifications.
And what if I use 10 lines of the code in question in my own GPLed application? The result is that my application is also covered by the additions to the GPL – so everybody who changes my application has to contribute those changes back to Big in Japan.
The results
All in all these changes make it quite a different license – a license that is not compatible with the original GPL anymore. Calling this new license GPL is not only misleading but also not allowed according to the GPL itself.
I'm not saying that the changes are necessarily a bad idea. But they do make the code essentially useless for other real GPL projects because using this code would mean that the new combined code would fall under the modified GPL. If that is what you want, then fine, but stop calling it GPL. It's not.
Furthermore it is impossible to include real GPL code in the Big in Japan projects without the express permission of the original authors. So, the changes to the GPL will achieve the effects they imagined, but at the same time they will set the project completely apart from the rest of the open-source community. That is a side effect which I imagine they did not expect and which I am sure they can't be happy with.
Update:
It seems that Big in Japan already modified the additional requirements. These changes have not yet propagated back to the SVN repository, but if what Alexander Muse wrote is true they completely dropped the requirement that any modifications must be contributed back to Big in Japan. They now simply state that "you MUST release the modified source code". This makes my second point completely moot. The first point – the redefinition of 'distribution' and my conclusion still stand though. The modified modification is still utterly incompatible with the GPL.
Reducing the bandwidth used by feeds
The last couple of days I have been working on implementing proper Atom support for a proprietary content management system. Atom support itself wasn't a big problem, but I did run into a problem implementing some of the more common techniques to save some bandwidth. The result is a tool to check if your own weblog uses one or more of these techniques. For now it is called the "Bandwidth-saving Header Validator for Feeds"; I am still looking for a shorter name.
The idea is simple – only send the complete Atom feed to new consumers and when there is a change in the feed. Otherwise simply tell the consumer that the feed has not been changed. Given that consumers try to retrieve the feed more often than the feed is updated, this saves considerable bandwidth. There are two main methods to achieve this effect. Before we get into how these methods work I must point out that both the server that sends the feed and the consumer must support these techniques – if the server and the consumer do not support the same technique the complete feed will be sent every time – just to be sure.
Last-Modified / If-Modified-Since
The first method uses the Last-Modified and the If-Modified-Since HTTP headers. Each time the feed is fetched, the server also sends the date the feed was last modified. That date is stored by the consumer, and the next time it requests the feed it includes an instruction to return the feed only if it has been modified since that date. If it hasn't been modified since, the server only returns a 304 Not Modified status code, which lets the consumer know that nothing has changed.
GET /index.atom
Host: rakaz.nl

HTTP/1.0 200 OK
Content-Type: application/atom+xml
Content-Length: 43023
Last-Modified: Tue, 05 Dec 2006 13:04:54 GMT

<feed> … </feed>

GET /index.atom
Host: rakaz.nl
If-Modified-Since: Tue, 05 Dec 2006 13:04:54 GMT

HTTP/1.0 304 Not Modified
Content-Type: application/atom+xml
Content-Length: 0
Last-Modified: Tue, 05 Dec 2006 13:04:54 GMT
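On the server side this exchange could be handled roughly as follows. This is a minimal sketch, not the code from the content management system; the function names notModifiedSince() and sendFeedWithLastModified() are hypothetical.

```php
<?php
// Hypothetical helper: returns true when the date sent by the consumer is
// not older than the last modification of the feed, so a 304 is enough.
function notModifiedSince(?string $ifModifiedSince, int $lastModified): bool
{
    if ($ifModifiedSince === null) {
        return false;
    }
    $since = strtotime($ifModifiedSince);
    return $since !== false && $lastModified <= $since;
}

// Hypothetical helper: send either the complete feed or only a 304.
// $lastModified is a Unix timestamp.
function sendFeedWithLastModified(string $feed, int $lastModified): void
{
    header('Last-Modified: ' . gmdate('D, d M Y H:i:s', $lastModified) . ' GMT');
    if (notModifiedSince($_SERVER['HTTP_IF_MODIFIED_SINCE'] ?? null, $lastModified)) {
        header('HTTP/1.0 304 Not Modified');
        return;
    }
    header('Content-Type: application/atom+xml');
    echo $feed;
}
```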
ETag / If-None-Match
The second method is very similar to the first, but instead of looking at the modification date it looks at a token that is updated every time the feed changes. Often this is an MD5 digest of the feed itself or some other form of hash. The server sends this token using the ETag header and the consumer stores this token for future use. If the consumer wants to know if the feed has been updated it sends an If-None-Match header with the stored token. The server returns the complete feed if the token sent by the consumer is different from its own token. If both tokens are the same it once again returns a 304 Not Modified status code, indicating that the feed has not been modified.
GET /index.atom
Host: rakaz.nl

HTTP/1.0 200 OK
Content-Type: application/atom+xml
Content-Length: 43023
ETag: "2938ef27a739cd30e30fe02339402aabf"

<feed> … </feed>

GET /index.atom
Host: rakaz.nl
If-None-Match: "2938ef27a739cd30e30fe02339402aabf"

HTTP/1.0 304 Not Modified
Content-Type: application/atom+xml
Content-Length: 0
ETag: "2938ef27a739cd30e30fe02339402aabf"
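The token comparison could be sketched like this, assuming the token is simply an MD5 digest of the feed. The function names feedEtag() and etagMatches() are hypothetical.

```php
<?php
// Hypothetical helper: build a quoted token from the contents of the feed.
function feedEtag(string $feed): string
{
    return '"' . md5($feed) . '"';
}

// Hypothetical helper: returns true when one of the tokens in the
// If-None-Match header equals the current token, so a 304 is enough.
function etagMatches(?string $ifNoneMatch, string $etag): bool
{
    if ($ifNoneMatch === null) {
        return false;
    }
    foreach (explode(',', $ifNoneMatch) as $candidate) {
        $candidate = trim($candidate);
        // A consumer may mark a token as weak with a W/ prefix.
        if (strpos($candidate, 'W/') === 0) {
            $candidate = substr($candidate, 2);
        }
        if ($candidate === '*' || $candidate === $etag) {
            return true;
        }
    }
    return false;
}
```

If etagMatches() returns true for the value of $_SERVER['HTTP_IF_NONE_MATCH'], the script can send the 304 Not Modified status together with the ETag header and skip the body.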
A-IM: feed / IM: feed
These two methods were pretty simple to implement and worked flawlessly. When I wanted to implement a third method my problems started. First of all I'll try to explain how this method works. The third method is based on the second and is sometimes called Delta encoding or RFC3229+feed. When the feed has changed the server will not send the complete feed, but instead it will only send the new or changed items.
Just like the previous method the server will send a token using the ETag header and, just like the previous method, the consumer will store this token for future use. The difference is that when the consumer wants to update the feed it will send an additional A-IM header together with the If-None-Match header. This will let the server know that the consumer supports Delta encoding.
The server must now determine, solely based on the token, which items were already sent to the consumer and whether any of those items were changed in the meantime or any new items were created. The way the server determines this depends on the implementation, but in general each token represents a certain state. All the server has to do is keep a log of all changes and store the token together with information about what changed between states.
If nothing is changed it will simply send a 304 Not Modified status code – just like before. If something was changed it will send a 226 IM Used status code – letting the consumer know that something was changed – and return a feed with only the changed items.
GET /index.atom
Host: rakaz.nl

HTTP/1.0 200 OK
Content-Type: application/atom+xml
Content-Length: 43023
ETag: "19b32871240d41ad4234-49-50-51"

<feed> … [3 items] … </feed>

GET /index.atom
Host: rakaz.nl
A-IM: feed
If-None-Match: "19b32871240d41ad4234-49-50-51"

HTTP/1.0 226 IM Used
IM: feed
ETag: "29341230cd2823aa2bcd-49-50-51-52"
Content-Type: application/atom+xml
Content-Length: 438

<feed> … [1 item] … </feed>
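How the server derives the changes from the token depends on the implementation. The sketch below is a simplified illustration and not the implementation used in the content management system: it assumes, loosely following the tokens in the example above, that the token is a hash followed by the ids of the items in the feed. It only detects new items; whenever it cannot tell what changed it falls back to sending the complete feed. All function names are hypothetical.

```php
<?php
// Hypothetical helper: build a token from a hash of the items followed by
// their ids. $items maps item ids to item contents.
function makeToken(array $items): string
{
    ksort($items);
    $hash = substr(md5(serialize($items)), 0, 20);
    return '"' . $hash . '-' . implode('-', array_keys($items)) . '"';
}

// Hypothetical helper: extract the item ids from a token made by makeToken().
function itemIdsFromToken(string $token): array
{
    $parts = explode('-', trim($token, '"'));
    array_shift($parts); // drop the hash
    return array_map('intval', $parts);
}

// Decide which status code and which items to send back.
function deltaResponse(array $items, ?string $ifNoneMatch, ?string $aim): array
{
    $etag = makeToken($items);
    if ($ifNoneMatch === $etag) {
        return ['status' => 304, 'etag' => $etag, 'items' => []];
    }
    $acceptsFeed = $aim !== null
        && in_array('feed', array_map('trim', explode(',', $aim)), true);
    if ($ifNoneMatch !== null && $acceptsFeed) {
        $known = itemIdsFromToken($ifNoneMatch);
        $new = array_diff_key($items, array_flip($known));
        if (count($new) > 0) {
            return ['status' => 226, 'etag' => $etag, 'items' => $new];
        }
    }
    // Unknown consumer, no delta support or an edited item: send everything.
    return ['status' => 200, 'etag' => $etag, 'items' => $items];
}
```

A 226 response must also carry the IM: feed header, as shown in the example exchange above.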
After implementing this third method I noticed that instead of the 226 IM Used status code I got a 500 Internal Server Error status code. I returned the 226 IM Used status code in my script, but somehow it got changed to 500 along the way. It took me a while to find the source of the problem – Apache 1.3. I still use Apache 1.3 on my development machine and even on some of my servers. Apparently Apache uses a static list of status codes and when it encounters a status code it does not recognize it will replace it with 500 Internal Server Error. Great!
Given that the 226 IM Used status code is required by the RFC3229 specification it is impossible to support Delta encoding on Apache 1.3. Luckily this problem has been fixed in Apache 2, but not everybody has control over which version of Apache they use. Some shared hosting solutions still use Apache 1.3 because they consider Apache 2 to be less stable than 1.3. So I have to check the server version and will not use Delta encoding on Apache versions older than 2.
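Such a version check could look like the sketch below. It is an assumption on my part that the version is taken from the server software string, for example $_SERVER['SERVER_SOFTWARE']; the function name supportsDeltaEncoding() is hypothetical. Servers that are not Apache, or that hide their version number, are given the benefit of the doubt in this sketch.

```php
<?php
// Hypothetical helper: decide whether it is safe to send 226 IM Used,
// based on a server software string such as "Apache/1.3.37 (Unix)".
function supportsDeltaEncoding(string $serverSoftware): bool
{
    if (preg_match('#^Apache/(\d+)\.#', $serverSoftware, $matches)) {
        // Apache 1.x replaces unknown status codes with 500.
        return (int) $matches[1] >= 2;
    }
    // Assumption: anything else passes the status code through unchanged.
    return true;
}
```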
Testing the headers of your own feeds
The content management system I am working on is closed-source, so I am not able to share any of the work, but I am able to make the tool I created for testing public. With this tool you can check if your own feed supports any of the features mentioned above. It currently supports fully automated testing of the first two methods and it will also allow you to test Delta encoding – but you need to run the test first – manually add a new item to your weblog – and then continue with the test by clicking on a link specified on the first step. Enjoy!