Saturday, December 17, 2005

Working Offline

Currently I'm spending many hours working offline, and this wasn't very easy when I started doing it. I was used to having internet access 24x7. Since I'm in Thailand (see http://tomionthai.blogspot.com/) now and I don't have internet access at the place I'm staying, I had to find a new way of working.

My first approach was saving websites with Firefox (http://www.mozilla.org/) for offline reading and indexing all the pages with Google Desktop Search (http://desktop.google.com/). After a while I switched from Firefox to HTTrack (http://www.httrack.com/) for downloading the pages, since it allows downloading a complete website with all the links and referenced pages. This approach was ok, but it had its disadvantages:

1) I had to come up with my own directory structure to save all the websites.
2) Saving pages with Firefox wasn't very comfortable, since it tries to derive the file name from the URL. This is very unpleasant on a site that serves dynamic content via POST: every URL looks exactly the same, and you have to rename each file you would like to save locally.
3) Some websites can't be downloaded with HTTrack (or at least I didn't find the right configuration). I spent hours trying to find the right configuration for downloading the Maven site (http://maven.apache.org/), but I wasn't very successful (the sketch after this list shows the kind of configuration I mean).
4) It took quite some time to start and configure HTTrack every time I wanted to have a certain page for offline reading.
5) Often I downloaded much more than I wanted with HTTrack, which resulted in very long download times and wasted disk space.
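
For what it's worth, here is a minimal sketch of how such a download could be scripted, assuming httrack is installed and on the PATH. The site, output directory, depth limit and filter patterns are just examples to illustrate the idea, not the configuration that finally worked for me:

    import subprocess

    # Sketch: mirror one site with a depth limit and URL filters,
    # so HTTrack doesn't follow links off into the rest of the web.
    site = "http://maven.apache.org/"
    target = "C:/offline/maven"  # example output directory

    subprocess.run([
        "httrack", site,
        "-O", target,            # write the mirror into this directory
        "-r3",                   # limit the mirror depth to 3 levels
        "+maven.apache.org/*",   # only follow links that stay on this host
        "-*.zip", "-*.tar.gz",   # skip large archives to save disk space
    ], check=True)

Scripting it like this would at least have helped with points 4 and 5; point 3 remained a matter of trial and error for me.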

But the biggest drawback was that this approach saved a lot of garbage on my laptop, which I didn't want. Most pages show headers, footers, menus and a lot of advertisements. With Firefox or HTTrack all of this was downloaded as well, which resulted in bad search results in Google Desktop Search: the menus reference other pages with the actual content, and these menu links contained my search keywords, so of course Google Desktop Search listed them too.

Finally I found a much nicer and even easier approach. My dream team is called PDFCreator (http://pdfcreator.sourceforge.net/, http://www.pdfcreator.de.vu/) and Google Desktop Search. Every time I see a page, an article or some content I would like to read offline, I print it with PDFCreator and save the pdf file on my disk. Many pages offer a printer-friendly version of the content, which results in a nice print and, in this case, a nice pdf file. I don't have to create complex directory structures anymore; I just created a simple structure with meta descriptions, and I save all the pages/pdfs in these directories. Google Desktop Search has no problem indexing the pdf files, and it's also nicer to read them in Acrobat Reader (http://www.adobe.com/) than in Firefox. The final advantage for me is that I can easily back up all the files, and I use them as my knowledgebase.
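
To give an idea of what I mean by a simple structure with meta descriptions, the layout is roughly like this (the topic names here are just examples, not my actual directories):

    pdfs/
        java/
        maven/
        thailand/

Each pdf simply goes into the topic directory it belongs to; the directory names carry the metadata, and Google Desktop Search takes care of finding things again.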

If you have to work offline a lot, but you still would like to read current articles from the internet, you might want to try this approach. If you have a better approach, I'm very interested in hearing about it... thanks in advance :)
