Wednesday, November 12, 2014

When you need to be synchronous with NodeJS

NodeJS is awesome. Its asynchronous nature is a great fit for many applications.
But sometimes you want to do something synchronously, without callbacks.

An example for such a need is a little script I was working on lately. The script is supposed to download files from some web services and save them to the disk.
This kind of task doesn't fit a naive async approach. Many requests were sent in a short time, each with a callback waiting for its response. Every pending request holds an open socket, and my machine would throw an error saying that too many were open.

There are 3 possible solutions (that I see currently):
1. Restrict the number of open requests to N. When a request is done, its callback would notify the restricting resource and a new request that is waiting would take its place.
2. Just make it synchronous. No callbacks.
3. Use promises. A very elegant solution, but requires libraries to work.
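
Option 1 could be sketched roughly as follows. This is a minimal illustration, not the code from my script: the name createLimiter and the convention that every task receives a done callback are assumptions made for the example.

```javascript
// Runs at most `limit` tasks at a time. Each task is a function that
// receives a `done` callback and must call it exactly once when finished.
// (Hypothetical helper, written for illustration only.)
function createLimiter(limit) {
    var running = 0;
    var waiting = [];

    function startNext() {
        if (running >= limit || waiting.length === 0) {
            return;
        }
        running++;
        var task = waiting.shift();
        task(function done() {
            // A slot was freed: let the next waiting task take its place.
            running--;
            startNext();
        });
    }

    return {
        push: function (task) {
            waiting.push(task);
            startNext();
        },
        running: function () { return running; },
        waiting: function () { return waiting.length; }
    };
}
```

In a download script, each task would send one request and call done from the response callback, so no more than limit sockets are open at any moment.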

I took the 2nd option and in my case, I had to find an http library for node that allows synchronous methods. So I used urllib-sync. This went well and it solved my problems.

In a different case, I didn't have the privilege of a nice library that will support synchronous methods. I needed to use Git from Node, and the libraries I checked out only allowed async methods.

So I needed to work around this by having each callback trigger the next request.

This is the basic idea:

function nextAsyncCall() {
    asyncMethod(function (err, result) {
        // Some logic maybe..
        nextAsyncCall();
    });
}

nextAsyncCall();

This basic skeleton runs the calls strictly one after another. Each call is still asynchronous under the hood, but the next action will be performed only after the previous one is complete.
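
The skeleton above never stops calling itself. A variant that walks a list of items and finishes when the list is exhausted could look like the sketch below. The names runInSeries and finished are made up for the example, and asyncMethod is assumed to take an item plus a Node-style (err, result) callback.

```javascript
// Calls asyncMethod(item, callback) for each item, one at a time.
// `finished(err, results)` is called once, after the last item or on
// the first error. (Hypothetical helper, for illustration only.)
function runInSeries(items, asyncMethod, finished) {
    var results = [];

    function next(index) {
        if (index === items.length) {
            return finished(null, results);
        }
        asyncMethod(items[index], function (err, result) {
            if (err) {
                return finished(err, results);
            }
            results.push(result);
            next(index + 1); // start the next call only now
        });
    }

    next(0);
}
```

If asyncMethod calls back synchronously, a very long list makes the call stack grow with every item, so this sketch is best suited to methods that are truly asynchronous.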

So if you are sure you need synchronous behavior - first look for a library that can help you (if you need a sync version of the X library it's usually X-sync or something similar). If there is no such library you can see which of the two solutions above fits you (pool or simply sync).

Thursday, October 30, 2014

An offline StackOverflow clone

My current organization operates in private networks (no connectivity with the internet AT ALL). Beyond the regular arguments of "I don't have Facebook!" / "I can't read the news every 5 minutes!", there are some other, more serious problems: developing software is really hard with no access to the internet. Just think about it: when you write code, how many times a day do you search for info or problem solutions on the web? My guess is A LOT.

Developers in my organization are struggling with this issue, and I've seen the pain in their eyes when they are forced to look for an unoccupied internet computer. They have one internet computer per team, at best, and even these lose connectivity from time to time.

This issue is a major productivity killer, which no one had seriously addressed before. So, a few months ago I woke up in the morning and thought to myself: "Why shouldn't I bring the internet to them?". I figured that the most cost-effective thing to do was to bring some kind of a clone of StackOverflow into the network. It is a single source of data that every developer uses a lot.

Luckily, as it turns out, StackOverflow publishes its data as XMLs, every 3 months! It was a real pain getting the data in (it's 14GB compressed), but I finally got it into the network.
Now, I could get the data, but I was still missing a GUI to display the data. I can't just download StackOverflow's site.

So I built a nice little GUI using Play Framework 2, AngularJS and Twitter Bootstrap. That took about a week (maybe some day I will publish it, although it's not hard to build yourself).

I still had to find a solution for the data storage. The final architecture of the app was a web interface talking directly to an Elasticsearch node holding all the data. Getting the data in was not too fun - I wrote some Python scripts that took the XMLs, transformed them to JSONs (since Elasticsearch is a JSON document store), and sent them (using cURL) to the Elasticsearch node. The uploading process took a while, because of the large data volumes.
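
A rough sketch of that transformation, in JavaScript rather than Python, could look like this. It assumes the dump stores each record as a `<row .../>` element with the fields as attributes, and it targets the newline-delimited format of Elasticsearch's bulk API, which is one possible way to upload and not necessarily how the original scripts did it. The function names and the index and type names are made up for the example.

```javascript
// Decode the five predefined XML entities found in attribute values.
// Numeric character references (e.g. "&#xA;") are left untouched here.
function decodeXml(text) {
    return text
        .replace(/&lt;/g, '<')
        .replace(/&gt;/g, '>')
        .replace(/&quot;/g, '"')
        .replace(/&apos;/g, "'")
        .replace(/&amp;/g, '&'); // last, so "&amp;lt;" becomes "&lt;"
}

// Turn one line such as <row Id="4" Title="..." /> into a plain object.
function rowToDocument(line) {
    var doc = {};
    var attribute = /([A-Za-z]+)="([^"]*)"/g;
    var match;
    while ((match = attribute.exec(line)) !== null) {
        doc[match[1]] = decodeXml(match[2]);
    }
    return doc;
}

// Build a bulk request body: an action line followed by the document,
// one JSON object per line. indexName and typeName are hypothetical.
function toBulkBody(lines, indexName, typeName) {
    var body = '';
    lines.forEach(function (line) {
        var doc = rowToDocument(line);
        if (!doc.Id) {
            return; // skip lines that are not <row .../> elements
        }
        var action = { index: { _index: indexName, _type: typeName, _id: doc.Id } };
        body += JSON.stringify(action) + '\n';
        body += JSON.stringify(doc) + '\n';
    });
    return body;
}
```

The `_type` field matches the Elasticsearch 1.x releases used at the time; later versions dropped mapping types. Sending many documents per request is usually much faster than one request per document.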

The application has now been running for several months and my organization has slightly happier developers (~600). :)

The project (named XXXOverflow - XXX being the name of the organization) apparently inspired some other developers, who suggested all kinds of interesting ideas for the application. In the future we plan to expand the sources the application searches, and make it a highly customized little Google for the devs in my organization.

Other organizations, which are in the same position (disconnected from the internet), have asked me to give them the code of the app and help them implement it in their own networks.

Some technical notes
Elasticsearch is an open source search engine solution. I used it to store the data for the app. ES is really great and it made my life so much easier. Its default search algorithm searches through 70GB of textual data in a split second, which is pretty amazing to me. However, when I tried to customize the ranking algorithm using their Query DSL, the search became much slower (I used ES 1.0.0).
Also, I had (and still have) issues with failing shards. It's probably the amount of data, but occasional searches just bring down shards. And not too seldom. I hope these issues will be addressed in future releases.

Tuesday, August 19, 2014

Cesium in action

Cesium is a WebGL virtual globe and map engine.
You can use it to build time-aware GIS applications. What do I mean by "time-aware"?
Suppose you want to let the user playback scenarios on the map. For instance, meteor hits over the years or satellite/airplane tracking.
Most map engines don't "understand" time. The way to implement this kind of functionality using ArcGIS for JS for example, is using different layers for each discrete moment. This is very heavy, not easy to implement and shows only discrete moments.
Cesium, on the other hand, supports continuous playback (and even comes with a clock and timeline built-in!). Cesium offers a data format, called CZML, which is basically a regular JSON array. In this array, you can specify elements. These elements could be points, polylines, polygons, text labels, images or even 3D models. For each element you can specify properties that will determine the element's life span, position, size, color, texture and a bunch of other things.
For example, you could say that a blue point with a black border will appear for 5 seconds at 2014-05-05 18:00:00 at longitude 34, latitude 35, height 0. This is represented as the following CZML:
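
A sketch of such a document, written as a plain JavaScript array that can be serialized with JSON.stringify. The id bluePoint, the pixel size, the outline width and the UTC time zone are assumptions made for the example, and the leading document packet is what current Cesium releases expect as the first element.

```javascript
var czml = [
    // Header packet expected by current Cesium releases.
    { id: 'document', version: '1.0' },
    {
        id: 'bluePoint', // hypothetical identifier
        // Life span: 5 seconds starting at 18:00:00 (assumed to be UTC).
        availability: '2014-05-05T18:00:00Z/2014-05-05T18:00:05Z',
        // Longitude, latitude (degrees) and height (meters).
        position: { cartographicDegrees: [34, 35, 0] },
        point: {
            color: { rgba: [0, 0, 255, 255] },      // blue fill
            outlineColor: { rgba: [0, 0, 0, 255] }, // black border
            outlineWidth: 2,                        // assumed width
            pixelSize: 10                           // assumed size
        }
    }
];

var czmlText = JSON.stringify(czml);
```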


Seems simple, right? It is. CZML provides a way to paint a scenario for Cesium to play, and you can do amazing things with it. Just look at the Cesium Samples Page.
The rest of this post is some issues and best practices with Cesium.
I used Cesium for the past few months to build a small analysis tool for some clients. They can load Excel files that specify times and coordinates of events. The application will allow playback of those events and some advanced processing of those events.
Our application needed to create CZMLs on the fly, so we wrapped CZML creation in a nice small JS API. Writing raw CZML JSON by hand all over the code is a bad idea: it can cause code duplication and performance issues.
Beware of interpolation! Cesium supports interpolation of some properties, which means that if the color of the point is yellow at 5PM and green at 6PM, when playing back you will see the color change gradually from yellow to green, instead of changing instantly at 6PM. That is awesome, sometimes. For us it was mostly annoying, because we usually didn't need it, so we had to use an annoying workaround: keeping the old color until 5 seconds before the change. That way the interpolation would happen only during those 5 seconds, which is short enough to be unnoticeable.
Interpolation can also cause acute performance issues. We ran into such a problem one evening, while loading the application with real data. The data was large enough to make playback hideously slow, and then the engine would collapse: Cesium would throw a weird error and just stop working. We spent 5 hours fixing this problem... It turned out to be interpolation again. Luckily, we found out that the show property isn't interpolatable, so we used it to show a point at specific times instead of changing the alpha of the color to make it disappear.
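
The 5-second trick can be sketched as a small helper that builds a time-tagged CZML color. The function name steppedColor and the shape of its changes argument are made up for the example. It assumes the changes are sorted by time and are more than holdSeconds apart.

```javascript
// changes: [{ at: secondsFromEpoch, rgba: [r, g, b, a] }, ...], sorted by `at`.
// Returns a CZML color value whose samples are laid out as
// [time, r, g, b, a, time, r, g, b, a, ...] relative to `epoch`.
function steppedColor(epoch, changes, holdSeconds) {
    var samples = [];
    changes.forEach(function (change, i) {
        if (i > 0) {
            // Repeat the previous color shortly before the change, so the
            // interpolation is squeezed into the last `holdSeconds` seconds.
            samples.push(change.at - holdSeconds);
            Array.prototype.push.apply(samples, changes[i - 1].rgba);
        }
        samples.push(change.at);
        Array.prototype.push.apply(samples, change.rgba);
    });
    return { epoch: epoch, rgba: samples };
}

// Yellow at 5PM, green at 6PM, with the fade limited to the last 5 seconds.
var color = steppedColor('2014-05-05T17:00:00Z', [
    { at: 0, rgba: [255, 255, 0, 255] },
    { at: 3600, rgba: [0, 255, 0, 255] }
], 5);
```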
Also, we noticed that using the show property sometimes causes performance issues, if you put more than one interval in the show array. The workaround is to tell it when not to appear (negative infinity to start time 1, end time 1 to start time 2, end time 2 to positive infinity). Weird but works :)
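
That inversion can be sketched as a helper that turns the visible intervals into the hidden ones. The name hiddenIntervals is made up, and the minTime and maxTime strings used below are stand-ins for negative and positive infinity, chosen for the example.

```javascript
// visible: [{ start: isoString, stop: isoString }, ...], sorted and non-overlapping.
// Returns CZML boolean intervals that are false everywhere outside `visible`.
function hiddenIntervals(visible, minTime, maxTime) {
    var result = [];
    var cursor = minTime;
    visible.forEach(function (interval) {
        result.push({ interval: cursor + '/' + interval.start, boolean: false });
        cursor = interval.stop;
    });
    result.push({ interval: cursor + '/' + maxTime, boolean: false });
    return result;
}

var show = hiddenIntervals([
    { start: '2014-05-05T18:00:00Z', stop: '2014-05-05T18:00:05Z' },
    { start: '2014-05-05T19:00:00Z', stop: '2014-05-05T19:00:05Z' }
], '0000-01-01T00:00:00Z', '9999-12-31T23:59:59Z');
```

The result is meant to be used as the value of the point's show property. This relies on the element being shown whenever no interval says otherwise, which is the behavior we observed.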
Loading a Cesium data source can be a heavy operation, because it reads and processes your CZML. If you have several data sources that get updated, use multiple data sources for a viewer (yes, Cesium supports that: viewer.dataSources) and load only the relevant ones. We used to reload all data sources, even when just one of them had been updated. While looking for performance bottlenecks in our application, we noticed this inefficiency. So we made this process modular, loading only dirty data sources.
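
The dirty-tracking idea can be sketched independently of Cesium. The names createSourceRegistry and loadSource are made up for the example; in a real application loadSource would hand the CZML to the matching data source in viewer.dataSources.

```javascript
// Remembers the latest CZML per named source and reloads only the
// sources that changed since the last flush.
// loadSource(name, czml) is supplied by the caller (hypothetical callback).
function createSourceRegistry(loadSource) {
    var sources = {};
    return {
        update: function (name, czml) {
            sources[name] = { czml: czml, dirty: true };
        },
        flush: function () {
            var reloaded = [];
            Object.keys(sources).forEach(function (name) {
                var source = sources[name];
                if (source.dirty) {
                    loadSource(name, source.czml);
                    source.dirty = false;
                    reloaded.push(name);
                }
            });
            return reloaded; // names of the sources that were actually reloaded
        }
    };
}
```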

Hope this helps someone!

Sunday, June 29, 2014

FoodBetter - a simplistic recipe management application using MeteorJS

Recently, I decided to look into MeteorJS - an open source, full-stack web framework that combines JavaScript, MongoDB and NodeJS into a simple way of creating reactive web applications.
Like all things in life, to really get to know something, it's not enough to just read and talk about it - you have to get dirty and actually use it for something. So I built a simple application for something that I was missing in my personal life - a recipe management app.
The app is here, and the code is here.
The app was built for learning purposes and probably has bugs. It is, as I said, very simple, and I would love to hear your ideas and feature requests here. You can fork it or even send me pull requests :)

What is awesome about MeteorJS?

  • Meteor supports 3-way binding. This means that when a client changes some data in their browser, another client that looks at the same data on a different computer sees the change immediately. You get that for free, with no extra infrastructure code required, so you can focus on your business logic. This is pretty amazing in my eyes. This feature makes Meteor an ideal choice if you need to implement a real-time game or something like Google Docs collaborative editing.
  • You can deploy to Meteor's test servers with a single command. That's right - you can just create a new application (meteor create myApp) and immediately publish it to the world (meteor deploy myapp.meteor.com). This sends your application to Meteor's test servers and makes it available for everyone to use, free of charge. Of course, if you are creating something real you should spend a few dollars and host it somewhere. Because it all runs on NodeJS, you can package the app with meteor bundle and publish it through heroku or nodejitsu. Anyway, the meteor deploy option is great if you want to get your app up and running in no time.
  • JavaScript everywhere. You write JS on the server side too, making it very simple to transfer and manipulate data.
Being a framework, Meteor takes away some of the control that you usually have. For instance, you don't use the script tag anymore, since Meteor will just load all files that reside in certain directories. Maybe that's OK, but you still have to learn about Meteor's loading process. To use Bower, for instance, you have to install a special Meteor package.

Meteor's documentation and community seem really great, and I will continue to follow MeteorJS and develop FoodBetter with it.


Monday, June 9, 2014

index in #each expression (Meteor & Spacebars)

Suppose you want to write an #each expression in a Meteor.js template. And suppose that you want to print the index of an item. Something like:

{{#each steps}}
    {{index}} : {{stepDescription}}
{{/each}}

This is a legitimate need, and it is even available in Handlebars, using {{@index}}, as you can see in this Stack Overflow thread or this issue.

Meteor does not support this option yet (as of 0.8). They even mentioned this issue in their wiki:
Syntax extensions. Handlebars syntax is extremely minimal, and we foresee adding some additional well-chosen extensions over time. (We will also implement the top features of current Handlebars that are missing from Meteor, like #each that supports objects and lets you access the current index or key.)
This looks promising. But for people who need a solution right now, you can just create a helper that adds an _index property to each item in the array:
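
A minimal sketch of such a helper. The function name withIndex and the template name recipe in the comment are made up for the example, and the items are assumed to be plain objects, as in the template above.

```javascript
// Returns a copy of the array in which every item also carries its
// zero-based position as `_index`. The original items are not modified.
function withIndex(items) {
    return items.map(function (item, index) {
        var copy = {};
        Object.keys(item).forEach(function (key) {
            copy[key] = item[key];
        });
        copy._index = index;
        return copy;
    });
}

// Possible wiring in a Meteor template (hypothetical template name):
// Template.recipe.helpers({
//     steps: function () {
//         return withIndex(this.steps);
//     }
// });
```

The template can then print {{_index}} instead of {{index}}. For a 1-based counter, store index + 1 instead.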

Saturday, March 15, 2014

Edit-in-place input field implementation with AngularJS

While working on a little side project with Rails and AngularJS, I needed an edit-in-place (or click-to-edit) functionality for text input fields.

Edit-in-place means that the text input field will be toggled between edit-mode and preview-mode, and when in edit-mode - you can persist the change to the server and continue working without page refresh.

This approach is becoming common in reactive web-apps and SPAs, while the "save" button for the whole form is gradually disappearing.

My implementation provides a directive, an HTML template, a controller (in the directive) and a service. With this configuration your controllers remain untouched, since the directive's controller is injected with a generic service and uses it directly. The service uses Restangular to talk to my Rails back-end.

So here's the Gist:

Yes, this code still needs some refactoring.