Ben Lisbakken, Gears API team
November 2007
I am a support engineer on Gears, and I'm in love with it. Every day I sit down and think about which projects, sites, mashups, or fun little tools could benefit from Gears. The truth is, a lot of them could. Sites with a lot of static information -- Wikipedia, any API documentation, web-based email -- would be great to be able to use when no internet connection is available. But what if you're a user who always has an internet connection? Then adding Gears to a site doesn't do much, right? Wrong. Imagine your favorite website is stored on your computer and syncs whenever its content changes. Whenever you look at the site, your browser grabs everything straight from your hard drive. Did you just search for your best friend on Facebook? Don't wait 5 seconds the next time that search runs; have the results immediately! Meanwhile, you save the webmasters' precious bandwidth and server power!
But alas, not every site is Gears-enabled, so the internet is not yet faster and available anywhere. For that, you'd need every site to implement Gears -- GearsMonkey to the rescue!
By using Gears with the Firefox Greasemonkey plugin, you can inject Gears code into any website that you want. Don't wait for your favorite website to enable offline support -- do it yourself.
Follow along as I show you step-by-step how to take Wikipedia offline.
You need the following tools to take websites offline:
Here is an outline of what we need to do to take Wikipedia offline. It is a step-by-step guide of what needs to happen as soon as a Wikipedia page is loaded.
First, I created two scripts, wikipedia_offline.user.js and wikipedia_offline_img_grabber.user.js, to grab stuff from en.wikipedia.org (HTML and CSS) and upload.wikimedia.org (images, sound clips, videos), respectively. I also needed to use an iFrame whose source was in the origin of upload.wikimedia.org in order to be able to download the media files from that origin. By having two files, it was much easier to separate the logic in the main window from that in the iFrame. After I was satisfied with the scripts, I placed them in the same file.
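For orientation, the metadata block of the combined script might look roughly like the sketch below. Only the two origins come from the description above; the @name, the @namespace, and the exact @include patterns are assumptions made for illustration.

```javascript
// Assumed values: the name and namespace below are placeholders, and the
// @include patterns are a guess based on the two origins the script targets.
// ==UserScript==
// @name        Wikipedia Offline
// @namespace   http://example.com/gearsmonkey
// @include     http://en.wikipedia.org/*
// @include     http://upload.wikimedia.org/*
// ==/UserScript==
```

Listing both origins in one script means the same file runs in the main window and in the iFrame, so the script has to check which origin it is on before doing any work.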
After setting up the files with the correct @include origins, the next step is to inject the code to initialize Gears. Normally, we would just use code like this:
// WRONG GEARS INIT CODE
var server = null;
var store = null;
function init() {
  if (!window.google || !google.gears) {
    return;
  }
  try {
    server = google.gears.factory.create('beta.localserver', '1.0');
  } catch (ex) {}
  store = server.openStore(STORE_NAME);
}
However, this won't do in our case for two reasons. First, being in the (safe) Greasemonkey sandbox prevents us from accessing window.google. Second, Gears doesn't allow us to create an instance until the user has "allowed Gears to run on this site". Normally, this dialog would come up as soon as we tried to do a google.gears.factory.create, but in our case the dialog will only appear if the GearsFactory().create call is made from inside an event handler. This behavior is specific to running Gears in Greasemonkey.
So how do we get around these issues?
For the first, instead of using window.google we will be using unsafeWindow.google. unsafeWindow is actually window.wrappedJSObject which is the raw window object inside the XPCNativeWrapper provided by the Greasemonkey sandbox. To learn more about it (and the security risks of using it!) look here: http://wiki.greasespot.net/UnsafeWindow
Now our code looks like this:
// Correct Gears init code
var server = null;
var store = null;
function initGears() {
  if (!unsafeWindow.google) unsafeWindow.google = {};
  if (!unsafeWindow.google.gears) {
    unsafeWindow.google.gears = {factory: new GearsFactory()};
  }
  try {
    server = unsafeWindow.google.gears.factory.create('beta.localserver', '1.0');
    store = server.createStore("wikipedia_offline");
  } catch (e) {}
}
For the second issue, all we have to do is call initGears() when the script loads. If server == null, we know that Gears hasn't initialized correctly and we need to make an event trigger that does a GearsFactory().create. The best event for this is the load event on the window. That way, as soon as the user navigates to a Wikipedia article they will be asked to allow Gears to run on en.wikipedia.org as well as upload.wikimedia.org.
Here we check if we need to trigger the load event:
initGears();
if (!server) {
  triggerAllowWikipediaDialog();
}
In the triggerAllowWikipediaDialog() function, we have:
function triggerAllowWikipediaDialog() {
  window.addEventListener("load", function() {
    new GearsFactory().create('beta.localserver', '1.0');
    location.href = location.href;
    return false;
  }, true);
}
What we just did was work around the fact that the allow dialog won't pop up when creating a Gears factory unless the create call happens inside an event handler. After the allow dialog is accepted, the page will reload and we can create our Gears factory and stores.
We now have a Gears factory with a Gears ResourceStore on en.wikipedia.org! From here we need to insert a link that will trigger the capture of the HTML + CSS. The function insertCacheLink inserts a [Cache Page] link.
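The body of insertCacheLink is not shown in this article, so the following is only a minimal sketch of what such a function could look like. The parameters (doc, container, onCache) are assumptions introduced here so the example is self-contained; the actual script may find its container and handler differently.

```javascript
// Hypothetical sketch of insertCacheLink (not the article's actual code).
// doc:       the page's document object
// container: the element that should hold the link
// onCache:   function to call when the link is clicked
function insertCacheLink(doc, container, onCache) {
  var link = doc.createElement('a');
  link.href = '#';
  link.appendChild(doc.createTextNode('[Cache Page]'));
  link.addEventListener('click', function(event) {
    event.preventDefault();  // stay on the current article
    onCache();
  }, false);
  container.appendChild(link);
  return link;
}
```

Attaching the handler with addEventListener rather than an onclick attribute keeps the handler inside the Greasemonkey sandbox, which is where the Gears objects created by the script live.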
Next, we need to create a database so that we can store the URLs that have been captured. Here is the new initGears function that includes the creation of the database.
function initGears() {
  if (!unsafeWindow.google) unsafeWindow.google = {};
  if (!unsafeWindow.google.gears) {
    unsafeWindow.google.gears = {factory: new GearsFactory()};
  }
  try {
    server = unsafeWindow.google.gears.factory.create('beta.localserver', '1.0');
    store = server.createStore("wikipedia_offline");
    db = unsafeWindow.google.gears.factory.create('beta.database', '1.0');
    if (db) {
      db.open('wikpedia_offline');
      db.execute('create table if not exists WikiOffline' +
                 ' (Article varchar(255), URL varchar(255), ' +
                 'TransactionID int)');
    }
  } catch (e) {}
}
Now, we have to create the functionality for the [Cache Page] link. This means we have to grab all URLs to be stored, with getCSSLinks, then store the links in the new database, with saveInCacheHistory, and lastly call capture to actually store the resources in Gears.
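The article does not show getCSSLinks itself, so here is a rough sketch of one way it could work. It assumes the function returns the stylesheet URLs followed by the article's own URL as the last element; the doc and pageUrl parameters are hypothetical and exist only to make the example self-contained.

```javascript
// Hypothetical sketch of getCSSLinks (not the article's actual code).
// Collects the href of every <link rel="stylesheet"> in the document,
// then appends the page's own URL as the last element (an assumption).
function getCSSLinks(doc, pageUrl) {
  var urls = [];
  var links = doc.getElementsByTagName('link');
  for (var i = 0; i < links.length; i++) {
    var rel = (links[i].rel || '').toLowerCase();
    if (rel.indexOf('stylesheet') !== -1 && links[i].href) {
      urls.push(links[i].href);
    }
  }
  urls.push(pageUrl);
  return urls;
}
```

This sketch only looks at link elements, so stylesheets pulled in through @import rules inside other stylesheets would be missed.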
For saveInCacheHistory, we want to be sure to use unique IDs for the article entries so that we can delete all the resources from the article when needed. The most important piece of code for this function is just inserting an entry into the database.
db.execute('insert into WikiOffline values (?, ?, ?)',
           [docTitle, cssURLs[cssURLs.length - 1], maxTID]);
We don't insert all of the CSS links because we won't remove the CSS when we remove this article -- all articles use the same CSS, so there is no point in continually adding and removing it.
Now we call capture on the array of CSS + HTML URLs. The capture function is pretty simple -- if the ResourceStore exists, we call its capture method and give it a callback function that is invoked as each URL completes.
function capture(url) {
  if (!store) {
    setError('Please create a store for the captured resources');
    return;
  }
  store.capture(url, function(url, success, captureId) {
    console.log("Captured: " + url);
  });
}
Now all of the CSS and HTML have been stored!
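How the unique ID is chosen is not shown here. One simple approach, sketched below as an assumption rather than the script's actual logic, is to take one more than the largest TransactionID already in the table. The helper name nextTransactionId is hypothetical; the calls on the result set (isValidRow, field, close) are those of the Gears Database ResultSet.

```javascript
// Hypothetical helper: returns one more than the largest TransactionID
// stored in WikiOffline, or 1 when the table is empty.
function nextTransactionId(db) {
  var rs = db.execute('select max(TransactionID) from WikiOffline');
  var maxTID = 0;
  try {
    // max() over an empty table yields a single row holding null.
    if (rs.isValidRow() && rs.field(0) !== null) {
      maxTID = Number(rs.field(0));
    }
  } finally {
    rs.close();  // always release the result set
  }
  return maxTID + 1;
}
```

Because every resource saved for one article shares the same ID, removing the article later only needs a lookup on that one value.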
Here's where things get a little ugly. You will notice that each image (or for that matter, any media) link is from another origin. All HTML and CSS are from en.wikipedia.org and all media are from upload.wikimedia.org. Due to security risks, Gears prevents cross-origin captures. Thus, we can't do a capture of any media files stored on upload.wikimedia.org from code being run on en.wikipedia.org.
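To make the split concrete, here is a small sketch that groups a list of absolute URLs by origin, so that each group can be handed to code running on the matching origin. The function names originOf and groupByOrigin are made up for this example, and the origin check is a simplified string match on scheme and host (it does not normalize default ports).

```javascript
// Returns the scheme + host (+ port) part of an absolute http(s) URL,
// or null if the string does not look like one.
function originOf(url) {
  var match = /^(https?:\/\/[^\/?#]+)/i.exec(url);
  return match ? match[1].toLowerCase() : null;
}

// Groups URLs by origin. Relative or non-http URLs are skipped.
function groupByOrigin(urls) {
  var groups = {};
  for (var i = 0; i < urls.length; i++) {
    var origin = originOf(urls[i]);
    if (origin === null) continue;
    if (!groups[origin]) groups[origin] = [];
    groups[origin].push(urls[i]);
  }
  return groups;
}
```

For a typical article this would yield one group for http://en.wikipedia.org (captured directly) and one for http://upload.wikimedia.org (which has to be captured from that origin).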
There are two ways around this, both of them involving iFrames:
1. Insert an iFrame into the page with the src being in the upload.wikimedia.org origin and the resource URLs to grab after the hash in the src. Then run a Greasemonkey script on upload.wikimedia.org that will initialize Gears and capture the URLs located after the hash in the iFrame's href.
2. Do the same as method 1, except use the cross-origin WorkerPool. This is a new feature in Gears versions 0.2+. It allows you to initialize a worker thread from a JavaScript file located on another origin.
In method 1, when the user first runs this script, we will have to trigger an event to initialize Gears to run on upload.wikimedia.org. This will cause the "Allow Gears to run on this site" security dialog to appear. This is inelegant because upon first use, the user will have to click "always allow" in two security dialogs (one for each domain we grab resources from).
But, using method 2, we can sneak our way around the second security dialog! In Gears, the WorkerPool is allowed to use cross-origin workers (read: threads). When these workers are initialized, they are given a block of code which will be what they run. This block of code can be a literal block of code, or a URL to a .js file. If that .js file is on another origin, it is allowed to run as long as the .js file has worker.allowCrossOrigin in it.
So why use a cross-origin worker? By default, as soon as a cross-origin worker is used, Gears automatically adds that site to the list of sites allowed to run Gears. Now we can initialize Gears silently without having the annoying "Allow upload.wikimedia.org to run Gears" popup!
However, we can't use this option with Wikipedia because we would need to get Wikipedia to host the worker.js file! So we are stuck with method 1.
First step is to get that iFrame in there! Thinking ahead, there are three times we will want to insert an iFrame:
function insertIFrame(imgLinks) {
  var offlineControlsDiv = document.getElementById('offlineControlsDiv');
  var iFrame = document.createElement('iframe');
  var iFrameSRC = "http://upload.wikimedia.org/";
  iFrame.name = 'grabPictures';
  iFrame.width = '500px';
  iFrame.height = '40px';
  iFrame.frameBorder = 0;
  iFrame.style.overflow = "hidden";
  iFrame.style.align = "left";
  iFrame.style.display = "block";
  iFrame.style.fontSize = "10px";
  if (imgLinks != "") {
    iFrameSRC += "#" + imgLinks;
  }
  iFrame.src = iFrameSRC;
  offlineControlsDiv.appendChild(iFrame);
}
Now we will work on logic for whenever our Greasemonkey script runs on the upload.wikimedia.org domain. In our script we should be doing a quick check to see what domain we're on -- upload.wikimedia.org or en.wikipedia.org -- because that will determine what logic we should use.
function isMediaPage() {
  return location.href.indexOf('upload.wikimedia.org') != -1;
}
Here is the main code we run when we are in the iFrame:
if (isMediaPage()) {
  hideHTML();
  if (errorMsgs.indexOf('Uncheck "Work Offline" in the File menu') == -1 &&
      errorMsgs.indexOf('Server not found') == -1) {
    initUploadGears();
    if (!server) {
      triggerAllowWikimediaDialog();
    } else {
      if (location.hash.length > 5 && location.hash != "#undefined") {
        var parameters = location.hash.substring(1, location.hash.length);
        var removeLoc = parameters.indexOf("||remove||");
        if (removeLoc == -1) {
          addToStore(parameters);
        } else {
          parameters = parameters.substring(0, removeLoc);
          removeFromStore(parameters);
        }
      }
    }
  }
}
There are a few interesting things to note here. First, we check if the page says anything about 'Uncheck "Work Offline" in the File menu' or 'Server not found'. If Gears is initialized on an error page, it will crash the browser. This is specific to Greasemonkey + Gears -- if you just had Gears code on a page (i.e. not injected by Greasemonkey) and there was an error accessing the page, the Gears code would never run!
The next thing is that we have initUploadGears. This is a different function than the en.wikipedia.org init. They are different because we don't need the database in the iFrame.
triggerAllowWikimediaDialog is essentially the same thing as triggerAllowWikipediaDialog -- it just triggers the "Allow this site to run Gears" dialog for upload.wikimedia.org.
If the server has been initialized, though, then we are going to look at the hash in the current URL. If we have something in the hash, then that means we are either adding or removing resources from the ResourceStore. When the iFrame was created, ||remove|| was inserted into the hash if we were removing the resources listed in the hash; otherwise we assume that we are caching them.
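The hash convention can be sketched as a pair of helpers: one that builds the iFrame src on the en.wikipedia.org side and one that decodes location.hash on the upload.wikimedia.org side. The ||remove|| marker comes from the text above, but the comma delimiter, the per-URL encodeURIComponent, and the function names are assumptions, since the article does not say how the URLs are joined.

```javascript
var REMOVE_MARKER = '||remove||';  // marker described in the article

// Builds the iFrame src. Each URL is escaped so that the (assumed) comma
// delimiter and the marker can never appear inside an individual entry.
function buildIFrameSrc(base, urls, remove) {
  if (urls.length === 0) return base;
  var encoded = [];
  for (var i = 0; i < urls.length; i++) {
    encoded.push(encodeURIComponent(urls[i]));
  }
  return base + '#' + encoded.join(',') + (remove ? REMOVE_MARKER : '');
}

// Turns a hash string back into an action flag and a list of URLs.
function parseHash(hash) {
  var parameters = hash.charAt(0) === '#' ? hash.substring(1) : hash;
  var removeLoc = parameters.indexOf(REMOVE_MARKER);
  var remove = removeLoc !== -1;
  if (remove) parameters = parameters.substring(0, removeLoc);
  var urls = [];
  var parts = parameters.split(',');
  for (var j = 0; j < parts.length; j++) {
    if (parts[j] !== '') urls.push(decodeURIComponent(parts[j]));
  }
  return { remove: remove, urls: urls };
}
```

Passing data through the hash works here because the parent page is allowed to set the iFrame's src even though it cannot script the other origin directly.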
The logic for adding and removing from stores is very simple: either call store.capture on the URL array, or loop through it, calling store.remove on the URLs.
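As a rough illustration of those two operations, the sketch below takes the store and an array of URLs as explicit parameters. That signature is an assumption made to keep the example self-contained; the script's own addToStore and removeFromStore receive the hash string instead. The capture and remove calls are the ResourceStore methods named above.

```javascript
// Hypothetical sketch: capture every URL, then report which ones failed.
// The capture callback fires once per URL when an array is passed.
function addToStore(store, urls, onDone) {
  var remaining = urls.length;
  var failed = [];
  if (remaining === 0) {
    onDone(failed);
    return;
  }
  store.capture(urls, function(url, success, captureId) {
    if (!success) failed.push(url);
    remaining--;
    if (remaining === 0) onDone(failed);
  });
}

// Hypothetical sketch: remove each URL from the store one at a time.
function removeFromStore(store, urls) {
  for (var i = 0; i < urls.length; i++) {
    store.remove(urls[i]);
  }
}
```

Collecting the failures in one place makes it easy to show the user which media files did not make it into the cache.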
After adding some UI touchups and a few modifications to the functions, we have our final product!
Hopefully you have learned what it takes to inject Gears code into any webpage by using Greasemonkey. Using the full functionality of the Gears library, there's a lot that can be done with existing web applications. The whole process is fairly simple, unless there are resources on a page that come from different domains. That's when you have to start using the more complex iFrame approach to sidestep the anti-cross-origin policies in the Gears ResourceStore.
Good luck creating your own GearsMonkeys!