Skip navigation

It’s out.

It transformed.

And it’s more than meets the eye.

(Download links at the end of post ↓)

For one increment in minor version, jOrder went through significant changes. In fact, more work hours, checkins, and lines of code went into getting from 1.1.0.14 to 1.2 than all that leading up to 1.1.0.14.

I set these four goals for version 1.2:

  • complete refactoring / restructuring
  • near-100% unit test coverage
  • implementing consistent data modification
  • maintaining performance figures

Restructuring

Two words: spaghetti code. Thousand-something lines in one JS file is just not bearable. Before 1.2, jOrder lacked any structure beyond the table and index objects. There still lay a good amount of code that could be lumped together, for the sake of extensibility, into smaller, self-containing units.

jOrder 1.2 Internal Object Structure

The technical challenge here was keeping 1.2 backward compatible. It seems that the right mixture of inheritance and method delegation cracked the problem.

For instance, index was broken down into four objects: indexsignaturelookup, and order. The latter two inherit from signature, while both delegate methods to index, the object that keeps all index operations together, and from the outside, looks about the same as before.

The class diagram illustrates the new structure. It’s not a thorough class diagram, for it shows only the public properties and methods. It does indicate however the connections between components. A more detailed post on jOrder’s inner workings will follow sometime soon.

With this change, jOrder achieved the following:

  • what previously was about 1000 lines in one file, is now 1500 (heavily commented) lines in 12 files neither of them being bigger than 400 lines
  • what belongs together (and only that) is actually in the same file now

Test coverage

Over 130 unit tests (and growing) make sure the library works exactly as it should, and that backward compatibility issues did not and will not arise. Indeed, thanks to these tests, right after refactoring, jOrder worked in live applications with minimal adjustments.

The tests are built upon QUnit.

Check them out in the jOrder GitHub repo.

Consistent modification

Modifying data was already working in pre-1.2 versions, but you had to rebuild the indexes manually after each insert, update or delete. For ordered indexes, this meant another Array.sort() on the whole table for each affected column.

Since browsers implement different sorting algorithms for Array.sort(), it is quite unpredictable how many steps between O(1) and O(n*log(n)) re-ordering will take depending on the current table order.

As updating wasn’t much of an issue in my own jOrder-based applications until now, this method – inefficient as it was – sufficed. But as you draw closer to the limits of jOrder in reducing traffic by getting rid of unnecessary ajax calls – you start to look elsewhere, ie. reducing the size of ajax calls that remain. That’s only possible when you keep client and server data in sync. Which, when dealing with tens of thousands of rows, makes frequently calling Array.sort() not to be an option.

Instead, each time there’s an insertion or update, we look up the position of the new entry within each index, and splice it in. As benchmarks show, building an index this way this is not much slower (about 10-20% on average) than running Array.sort() once, and you have a consistent index after each insert.

You can still opt for leaving indexes untouched after changes, but then you’ll have to take care of rebuilding the indexes manually.

Performance figures

Severe changes to a project’s codebase always risks performance issues. As the primary goal of jOrder is to be fast, I needed to make sure the benchmarks still put it in the same ballpark as before. So I took all JavaScript data manipulation projects I could find, and compiled a rather comprehensive performance comparison. I even wrote a benchmark framework (jOrder Benchmarks or jOB) for that purpose.

Luckily, according to the benchmarks, I didn’t mess up jOrder’s existing advantage in speed. The figure below is taken from the linked benchmark page and shows how jOrder performs against other libraries in initializing, searching and sorting a table of 1000 rows.

Big table benchmarks

One of the next couple of posts will expand on benchmarking and general browser performance issues that have an impact on jOrder’s speed. Until then, feel free to check out the benchmark source (benchmarks.js).

Get it

Download (or hotlink to) these files:

Here’s the first post in the “tips & tricks” section. It’s going to introduce a workaround for one of jOrder’s partially implemented features: N-column sorting.

When sorting by a column that’s composed of no more than a few distinct values, it’s reasonable to expect one of the remaining columns to take a secondary order, so that

  1. the ordered table won’t look ugly, and
  2. sorting will behave predictably even when the algorithm it’s using is not stable.

jOrder is pretty straightforward when it comes to sorting by one field. You add the ordered index with the name and type of the field to sort by just like below, and you’re all set.

var table = jOrder(JSON)
  .index('onefield', ['field'], {ordered: true, grouped: true});

Things can get pretty messy though, when you try to sort by more fields (primary, secondary, etc.) at the same time. The obvious approach would be to create an ordered index with more than one field.

table.index('twofields', ['field1', 'field2'], {ordered: true, grouped: true});

While jOrder allows this, it’s merely a forward-compatibility feature. Doing so won’t work as expected, and therefore, composite indexes cannot be used for sorting. If you do give it a shot, you’ll be surprised to find “Johnathan”, “Smith” preceding “John”, “Doe”, first name being the primary order, last name the secondary.

Under the hood

In order to understand why this happens, we have to take a closer look at the cogwheels of jOrder. When building a composite index, each row is given a signature, composed of its relevant values and concatenated with an underscore into a string. This signature will be the key in the index, pin-pointing the corresponding row in the original JSON table. So when Array.sort() compares the two rows, really, it compares the strings “Johnathan_Smith” and “John_Doe”.

Gaining control

Here comes the workaround. To make multiple column sorting work, for now, you’ll need composite fields instead of composite indexes: values that comply with the specific sorting you need. For instance, if you want first name – last name sorting, you’ll have to concatenate the first and last names with trailing spaces, so field values always have the same length. Array.sort() will then compare “Johnathan_Smith_____” and “John______Doe_______”, giving you the correct order.
The same approach is taken with numbers, but instead of padding and concatenation, you’ll use shifting and addition.

for (var idx = 0; idx < JSON.length; idx++)
    JSON[idx].composite = JSON[idx].year * 100 + JSON[idx].month;
var table = jOrder(JSON)
  .index('composite', ['composite'], {ordered: true, grouped: true, type: jOrder.number});

In the JSON table, fields ‘year’ and ‘month’ are both numeric.

Stay tuned

If this is so simple, why isn’t it alreadt built into jOrder, you might ask. The thing is, there are a bunch of factors to consider when adding anything to the core feature set. Speed, memory, possible loss of information; and this particular feature touches on most of them. There are at least three cases (string, numeric, mixed), each dealing with values of unknown length. That’s too many parameters.

Rest assured though, jOrder will certainly provide better support for multiple-column sorting in the near future, but in a way that these risks remain apparent to the developer.

Among the many things jOrder differs from conventional data table design, you’ll find that the concept of primary keys is completely missing.

No joins

On the surface, this could be explained by the lack of joins in jOrder. Without joins, the sometimes religious debate between natural and surrogate primary keys becomes pointless.

Besides, being just a remote copy of the data that’s on the server, one cannot guarantee that whatever identifies a row remains consistent with its origin and all the rest of its copies.

Before – after

There’s a deeper explanation though, rooted in jOrder’s update mechanism. jOrder.table.update() takes two rows: one before, one after the change. The engine looks up the ‘before’ row using the first unique index it finds, unless specified in the third, ‘options’ parameter. Then it changes the row inside the table to the one passed in ‘after’, and finally, updates the indexes.

In other words, the process is utterly indifferent towards which unique index is used. As long as it unambiguously identifies the row, the update process is happy, and that is the reason primary keys are irrelevant.

This doesn’t mean of course that you can’t have a separate field with unique and permanent values resembling a primary key and do all identifying operations using the index based on that field. It’s just not necessary.

The SQL connection

How to approach this mechanism from an SQL point of view? After all, that’s what most developers are familiar with.

Think of the ‘before’ row as the WHERE clause, but having all (indexed) fields specified, and the ‘after’ row as the SET part, but again, with all (indexed) fields specified. To make things easier, and resonate with the UPDATE command, you can use the result of a jOrder.table.where() as the ‘before’ row, so you needn’t know the contents of all its fields beforehand. Here’s how.

UPDATE table SET fieldToChange = 'new value' WHERE name = 'foo'

is equivalent to:

var table = jOrder([/*...*/])
	.index('name', ['name']);

//...

var before = table.where([{'name':'foo'}], {limit: 1, renumber: true});
var after = jOrder.copyTable(before);
after[0].fieldToChange = "new value";
table.update(before[0], after[0]);

The modifier “limit: 1″ is required for where() to return no more than one rows, while “renumber: true” makes sure the row in question is at zero index in the returned array “before”.

Too much effort?

From the above example it’s apparent that the jOrder version is more complex. Why jump through all these hoops to update one field in a single row?

  • For one, jOrder doesn’t expect a lot of updates. Inserts and deletions can be batched together, but updates should be rare. Its purpose is to help display the data the user wants to see as fast as possible: by means of search, sorting and paging.
  • Another principle of jOrder is not to conceal costly operations. If something has to be done that is a risk e.g. inside a loop, then let the developer do it and be conscious about it.

Of course, there’s plenty of room for improvement, and, keeping these principles in mind, there will be, too.

Having a huge table indexed at your fingertips is handy once the indexes are in place. But chances are, if you put 8-10 of them with ordering on a table of 10 to 20 thousand rows, it’ll take a while to get there. This contradicts the historical purpose of JavaScript, and, not surprisingly, some browsers won’t like the idea.

Blaming it on the code

IE pops up a terminate-script dialog after 5 million operations, while Firefox, Chrome and others do the same when they consider the current script to have run for long enough. The point is, in situations like the one described above, you’re facing a rather ugly outcome, i.e. the browser letting the user know you’ve done something wrong.

Now, solutions throughout the web given to this problem generally fall into two categories. The first one suggests that your code shouldn’t run longer than 100 ms, insinuating that anything not meeting up to this requirement is bad code. This I obviously can’t take sides with. Those in the second one advise to break down the script into smaller parts. While that’s good advice, how do you break down something that requires no user interaction at all?

Fortunately, the window object provides the means of breaking execution time, in exchange for re-formulating the accommodating code as event-based.

Going evented

This solution I call asynchronous indexing, or evented indexing, where the indexing process consists of consecutive calls to the jOrder.index() method, chained together by setTimeout’s. The result is a quasi-recursive function that is controlled from the nearest closure.

function addIndexes(table, indexes, progress, ready) {
	var i = 0, next;
	(next = function () {
		progress(indexes.length, i);
		table.index(indexes[i].name, indexes[i].fields, indexes[i].options);
		if (++i == indexes.length)
			return ready(table);
		setTimeout(next, 0);
	})();
}

Arguments:

  • table: jOrder object
  • indexes: array of objects having parameters to jOrder.index() as properties (name, fields, and options; see the jOrder reference for further details)
  • progress: callback for progress indication
  • ready: callback for when indexing has finished

This solution, apart from relieving us from the uncomfortable pop-ups, also leaves room for injecting a progress event into the loop, so not only do we reach our original goal, but we can even go beyond by giving the user feedback on where the process is standing.

If that’s not enough

Of course the question arises: what happens when applying a single index exceeds the execution time allotted by the browser? There are a few things you can do to lower the chance of that happening.

  • Make sure the index is not over-done. Any ordered index you won’t sort by, or grouped index on a unique field usually adds to CPU and memory demands.
  • Split up the process by adding rows in batches. Create the jOrder object using the first thousand rows and insert the following batches one by one. Note that in order to do this, at least one of the indexes must be unique (i.e. non-grouped). Check out the following example.

var table = jOrder(first1000rows)
	.index('first', ['First'])
	.index('second', ['Second']);

for (;;){
	table.insert(next1000rows);
}

In the pre-web era, databases were used mostly for storage. Unless you developed for a bank, you’ve pretty much gotten away with a basic knowledge of SQL. Then the shift from desktop applications to the browser changed the game by a great deal. Browsers – originally created to navigate between documents – were extremely inefficient at running scripts, so shifting computation towards the back end became sort of a necessity. Ordinary websites began to use server scripting, dynamic SQL and stored procedures, in the end of which we’ve gotten used to the fact that data processing equals back end. Even though in many cases operations such as filtering, search, and grouping belong strictly in the presentation layer, we still burden the server with all that.

This must change now.

The reasons:

  • Powerful browsers: With the death of IE6 and the gradual decay of IE7 cross-browser issues are basically out of the way. If a division remains it’s one between platforms (i.e. desktop vs. mobile) and not between browsers. The point is, today’s hardware (even smartphones) and browsers are powerful enough to cache and process tens of thousands of records.
  • Distributed systems: An entire niche of databases – offering amazing scalability and performance – sprung up in the past few years. Unable to do joins and multiple inequality filters by design, these systems are limited to data storage, just as databases of old used to be.

One may bring up that you can’t cache data in the browser between page loads. While that’s true, let’s do some math. Ten thousand records, in an average scenario would be around 1 Megabyte. Now, why is it better to download 1 Megabyte of data with each page load than having the database do the filtering and load only a fraction of it?

  • Bandwidth is cheaper than CPU.
  • Ajax: There are no page loads between data requests.
  • Client responds to search, filtering etc. faster.
  • Web apps already discourage the use of navigation buttons.

That said, jOrder clearly fulfills two purposes. The first is a technical one: boosting table operations in web apps; the second however, is to raise consciousness about the untapped opportunities browser scripting provides today.

When I first heard about jQuery, the “query” part in the name led me to think it had something to do with data querying. And, it’s in JavaScript, too, so it must be some sort of lightweight client-side data management library. It was a disappointment to learn that it’s not, but I’ve gotten to like and use jQuery since, nevertheless.

I still felt however that there was a gap left open for data manipulation within the browser.

Why:

  • Traffic. Search is mostly done on the client side nowadays, but sorting, paging and aggregation are still most likely to go back to the server and pull a new copy. The same goes for data modifications, when we don’t expect anyone but the current user to change the data he sees.
  • Performance. There are literally scores of JavaScript libraries, frameworks, jQuery plugins dealing with generating, styling and manipulating tables on the DOM level. Most of these support sorting for instance, by making a call to Array.sort() every single time you click on a sorting icon. Search often traverses the entire available table row by row to find what the user has entered.

There’s an app for that?

The obvious solution for boosting performance and decreasing traffic lies right before our eyes in the form of these miraculous things we call “database”. Yes, everything we need in order to resolve the above problems have been implemented as early as the 1970′s (or perhaps even earlier). So, why not bring them over into the realm of browser scripting?

Figuratively speaking, half the work is already done, thanks to JavaScript objects that function as hash tables. We only need to build tables and indexes on top of it preserving data integrity against modifications.

Then, we won’t need to worry about traffic:

  • we can cache a reasonable amount of data (tens of thousands of rows are obviously out of question – or are they?), and run search, sorting, aggregation, paging etc. quickly without making any Ajax calls
  • provided that we are working on data that only we can modify – client and server data can be kept consistent with only diffs being exchanged in between.

We won’t need to worry about performance either, as all of the features I just mentioned do nothing but pull already prepared data from the indexes instead of calculating them on-the-fly. ZAP!

Similarly to database tables however, the downside is introduced when we are modifying heavily indexed data. Every time a record is added, removed or changed, indexes need to be updated, which takes the more time the more indexes there are and the more complex they are. The impact may be reduced if indexes that don’t need to be ordered or grouped by will be generated and used differently than those that do.

First steps

So, led by the urge to put all this into practice, and of course the very projects that called for it – I got down to coding. I named the project jOrder (for obvious reasons), created my first ever public repository on GitHub, and started this blog. I’ll post case studies, improvements, changes on here, and once others join the project, there will be guest posts as well.

GitHub says there were 290 page views in the first two days. I’m not sure what that means, because there are no details like unique visitors, etc. and I can’t track downloads either. But I guess it’s not a bad start. We’ll see how it works out.

Wish me luck!

Follow

Get every new post delivered to your Inbox.