Writings

Async Graph Traversal in JavaScript

Part 1 Regular Traversal

Why not on the server?

In my last post I briefly discussed what a graph is and one example of how we can traverse it. However, as is frequently the case for web based applications sometimes the entire graph is not preloaded in memory. In these scenarios the first inclination is to run the graph search on the server, where it will naturally run much faster. However, what if you are app is running offline? This is frequently true for mobile hybrid apps, but could also be equally true for apps where the data is stored in a browser DB like indexedDB. In either of these scenarios the graph traversal must take place locally.

Why async?

But you might still try to ask- why async? Why not just preload the entire graph into memory and then traverse it normally like discussed in the previous post? The truth is if that option is available than that is a better solution. Async graph traversal is only a good idea as a last resort. For example, if you are dealing with a very large dataset- loading all the data when you will only need a small fraction of it is probably not the wisest choice.

Lets take the example that your app is running offline and all of your data is stored inside a large indexedDB. While indexedDB theoretically has two API's a synchronous and a asynchronous for the time being (and the foreseeable future) only the aysnc version is supported by browsers. If you are ever worked with asynchronous API's before you know they are blessing and a curse. They are great because they don't lock the user into waiting for the operation to finish. But they are terrible for your code since they frequently force you into call-back/promise soup whereas synchronous code just lends better to cleaner flows.

Async Graph Traversal Example

In this example we loading the data from a fictitious indexedDB libary. Each time a call is made to load the data the flow of the algorithm is broken, and any subsequent code will be called right away instead of with the result of the DB retrieval. As a result, we have no choice but to use a recursive function. The recursive function receives the results of the traversal up to the point it was called to avoid going down the same node path more than once, and to keep track of the results. We know we have finished traversing the graph when the queue is empty and there is nothing left to add to the queue. At that point instead of calling the recursive call again- we can call the callback and exit the function- freeing up the stack of closures that have amassed which each subsequent call.

function traverseGraph(member, queue, visitedIds, callback){
  //have we NOT gone down this route already
  if(visitedIds[member.id]){
   //mark it as visited
   visitedIds[member.id) = member.name;
     if(queue.length > 0){
       //take the first item out of our queue
       var memberId = queue.shift();
       memberDB.get(memberId).then(function(member){
         var nextIds = member.connections;
         for(var i = 0,l = nextIds.length; i < l; i++){
           var nextId = nextIds[i];
           //have we NOT gone down this route already
           if(visitedIds[nextId]){
             //add it to the queue
             queue.push(nextId);
           }
         }
       }).catch(function (error) {
         //if the promise fails
         console.error(error.message);
       }).finally(function(){
         //we've finished traversing, the queue is empty
         if(queue.length == 0){
          //has our last id been marked as visited yet?
          if(!visitedIds[memberId]){
            //FIX ME mark it as visited (name is null)
            visitedIds[memberId) = "";
          }
          //we're done here, tell callback
          callback(visitedIds);
         }else{
           //call recursively until the queue is empty
           traverseGraph(memberId, connections, visitedIds, callback);
         }
       });
     }//END if connections not empty
    } //END if visited before
}//END function