Sunday, June 26, 2011

JSON Compression algorithms

About

JSON (JavaScript Object Notation) is a lightweight data-interchange format. It is easy for humans to read and write, and easy for machines to parse and generate. It can be used as a data interchange format, just like XML, and it has several advantages over the latter: JSON is really simple, it has a self-documenting format, and it is much shorter because there is no data configuration overhead. That is why JSON is considered a fat-free alternative to XML.

However, the purpose of this post is not to discuss the pros and cons of JSON over XML. Though it is one of the most used data interchange formats, there is still room for improvement. For instance, JSON uses quotes excessively and key names are very often repeated. This problem can be solved by JSON compression algorithms, and there is more than one available. Here you'll find an analysis of two JSON compression algorithms and a conclusion on whether JSON compression is useful and when it should be used.

Compressing JSON with CJSON algorithm

CJSON compresses JSON using automatic type extraction. It tackles the most pressing problem: the need to repeat key names over and over. Using this compression algorithm, the following JSON:

[
  { // This is a point
    "x": 100, 
    "y": 100
  }, { // This is a rectangle
    "x": 100, 
    "y": 100,
    "width": 200,
    "height": 150
  },
  {} // an empty object
]
Can be compressed as:

{
  "templates": [ 
    [0, "x", "y"], [1, "width", "height"] 
  ],
  "values": [ 
    { "values": [ 1,  100, 100 ] }, 
    { "values": [2, 100, 100, 200, 150 ] }, 
    {} 
  ]
}
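To make the layout above easier to follow, here is a minimal JavaScript sketch that expands it back into the original objects. It assumes that the first element of each template is the 1-based index of a parent template (0 meaning none) and that the first element of each values array is the index of the template to apply. The function names templateKeys and expandFlat are invented for this illustration; this is not the CJSON library's code and it ignores nested objects and arrays.

```javascript
// Sketch only: expands the flat templates/values layout shown above.
// Not the CJSON library's code; nested objects and arrays are not handled.

// Collect the keys of a template, walking up its parent chain.
// Template indexes are 1-based; a parent index of 0 means "no parent".
function templateKeys(templates, index) {
  if (index === 0) {
    return [];
  }
  const [parentIndex, ...ownKeys] = templates[index - 1];
  return templateKeys(templates, parentIndex).concat(ownKeys);
}

// Rebuild every object from its template index and its list of values.
function expandFlat(packed) {
  return packed.values.map((entry) => {
    if (!entry.values) {
      return {}; // an empty object carries no template reference
    }
    const [templateIndex, ...data] = entry.values;
    const result = {};
    templateKeys(packed.templates, templateIndex).forEach((key, i) => {
      result[key] = data[i];
    });
    return result;
  });
}

const packed = {
  templates: [[0, "x", "y"], [1, "width", "height"]],
  values: [
    { values: [1, 100, 100] },
    { values: [2, 100, 100, 200, 150] },
    {}
  ]
};

console.log(JSON.stringify(expandFlat(packed)));
```

Under these assumptions the second template reuses the keys of the first one, which is how the rectangle avoids repeating "x" and "y".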
A more detailed description of the compression algorithm, along with the source code, can be found here:

Compressing JSON with HPack algorithm

JSON.hpack is a lossless, cross-language, performance-focused data set compressor. It can reduce the number of characters used to represent a generic homogeneous collection by up to 70%. The algorithm provides several levels of compression (from 0 to 4). Level 0 performs the most basic compression: it removes the keys (property names) from the structure and creates a header at index 0 that lists each property name. Higher levels reduce the size of the JSON even more by assuming that there are duplicated entries.

For the following JSON:

[{
  "name" : "Andrea",
  "age" : 31,
  "gender" : "Male",
  "skilled" : true
}, {
  "name" : "Eva",
  "age" : 27,
  "gender" : "Female",
  "skilled" : true
}, {
  "name" : "Daniele",
  "age" : 26,
  "gender" : "Male",
  "skilled" : false
}]
the hpack algorithm produces a compressed version which looks like this:

[["name","age","gender","skilled"],["Andrea",31,"Male",true],["Eva",27,"Female",true],["Daniele",26,"Male",false]]
More details about the hpack algorithm can be found on the project home page.
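The basic idea (one header row followed by one row of values per object) can be sketched in a few lines of JavaScript. This is only an illustration of the simplest level described above, assuming every object has the same keys as the first one; the function names packRows and unpackRows are invented here and are not part of the JSON.hpack API.

```javascript
// Sketch of the "header row + value rows" idea only; not JSON.hpack's own code.
// Assumes a homogeneous collection: every object has the keys of the first one.
function packRows(collection) {
  if (collection.length === 0) {
    return [];
  }
  const header = Object.keys(collection[0]);
  const rows = collection.map((item) => header.map((key) => item[key]));
  return [header, ...rows];
}

// Inverse operation: rebuild the objects from the header and the rows.
function unpackRows(packedRows) {
  if (packedRows.length === 0) {
    return [];
  }
  const [header, ...rows] = packedRows;
  return rows.map((row) => {
    const item = {};
    header.forEach((key, i) => {
      item[key] = row[i];
    });
    return item;
  });
}

const people = [
  { name: "Andrea", age: 31, gender: "Male", skilled: true },
  { name: "Eva", age: 27, gender: "Female", skilled: true },
  { name: "Daniele", age: 26, gender: "Male", skilled: false }
];

const packedPeople = packRows(people);
console.log(JSON.stringify(packedPeople));
console.log(JSON.stringify(unpackRows(packedPeople)));
```

The receiver has to run the inverse step before it can use the data, which is the unpacking overhead to keep in mind when deciding whether to compress.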

Analysis

The purpose of this analysis is to compare the JSON compression algorithms described above. For this purpose we will use 5 files with JSON content of different sizes, varying from 50K to 1MB. Each JSON file will be served to a browser using a servlet container (Tomcat) with the following transformations:

  • Unmodified JSON - no change on the server side
  • Minimized JSON - whitespace and new lines removed (most basic js optimization)
  • Compressed JSON using CJSON algorithm
  • Compressed JSON using HPack algorithm
  • Gzipped JSON - no change on the server side
  • Gzipped and minimized JSON
  • Gzipped and compressed using CJSON algorithm
  • Gzipped and compressed using HPack algorithm

Results

This table contains the results of the benchmark. Each row of the table corresponds to one of the transformations mentioned earlier. The table has 5 columns, one for each JSON file we process. All sizes are in bytes.
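| Transformation                    | json1 | json2  | json3  | json4  | json5   |
|-----------------------------------|-------|--------|--------|--------|---------|
| Original JSON size (bytes)        | 52966 | 104370 | 233012 | 493589 | 1014099 |
| Minimized                         | 33322 | 80657  | 180319 | 382396 | 776135  |
| Compress CJSON                    | 24899 | 48605  | 108983 | 231760 | 471230  |
| Compress HPack                    | 5727  | 10781  | 23162  | 49099  | 99575   |
| Gzipped                           | 2929  | 5374   | 11224  | 23167  | 43550   |
| Gzipped and Minimized             | 2775  | 5035   | 10411  | 21319  | 42083   |
| Gzipped and compressed with CJSON | 2568  | 4605   | 9397   | 19055  | 37597   |
| Gzipped and compressed with HPack | 1982  | 3493   | 6981   | 13998  | 27358   |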

Relative size of transformations (%)

The relative size chart is useful for seeing whether the size of the json to compress affects the efficiency of compression or minimization. You can notice the following:
  • minimization is more efficient for smaller files (the output is ~63% of the original size)
  • for large and very large json files, minimization has constant efficiency (~77%)
  • the compression algorithms have the same efficiency for any size of json file
  • the CJSON algorithm is less efficient (output is ~47% of the original size) than the hpack algorithm (~10%)
  • the CJSON algorithm is slower than the hpack algorithm
  • Gzipped content has almost the same size whether or not the JSON was minimized or compressed first (~3-6% of the original size)
  • Combining compression with gzip, or minimization with gzip, doesn't improve efficiency significantly (only about 1-2% of the original size)
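The percentages can be recomputed directly from the results table. The short JavaScript sketch below does that; the sizes are copied from the table, while the helper name relativeSizes and the rounding to one decimal place are choices made for this illustration.

```javascript
// Sizes in bytes, copied from the results table (json1 .. json5).
const originalSizes = [52966, 104370, 233012, 493589, 1014099];

const transformedSizes = {
  "Minimized": [33322, 80657, 180319, 382396, 776135],
  "Compress CJSON": [24899, 48605, 108983, 231760, 471230],
  "Compress HPack": [5727, 10781, 23162, 49099, 99575],
  "Gzipped": [2929, 5374, 11224, 23167, 43550],
  "Gzipped and Minimized": [2775, 5035, 10411, 21319, 42083],
  "Gzipped and compressed with CJSON": [2568, 4605, 9397, 19055, 37597],
  "Gzipped and compressed with HPack": [1982, 3493, 6981, 13998, 27358]
};

// Size of each transformed file as a percentage of its original,
// rounded to one decimal place.
function relativeSizes(sizes, originals) {
  return sizes.map((size, i) => Math.round((size / originals[i]) * 1000) / 10);
}

for (const [name, sizes] of Object.entries(transformedSizes)) {
  const percentages = relativeSizes(sizes, originalSizes)
    .map((value) => value + "%")
    .join("  ");
  console.log(name + ": " + percentages);
}
```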

Conclusion

Both JSON compression algorithms are supported by wro4j since version 1.3.8 through the following processors: CJsonProcessor and JsonHPackProcessor. Both of them provide pack and unpack methods. The underlying implementation uses the Rhino engine to run the javascript code on the server side.

JSON compression algorithms considerably reduce json file size. There are several compression algorithms; we have covered two of them: CJSON and HPack. HPack seems to be much more efficient than CJSON and also significantly faster. When two entities exchange JSON and the source compresses it before it reaches the target, the client (target) has to apply the inverse operation (unpacking), otherwise the JSON cannot be used. This introduces a small overhead which must be taken into account when deciding whether JSON compression should be used.

When gzipping of content is allowed, it is more efficient than any other compression algorithm. In conclusion, it is not worth compressing JSON on the server if the client accepts gzipped content. Compression on the server side does make sense when the client doesn't know how to work with gzipped content and it is important to keep the traffic volume as low as possible (due to cost and time).

Another use case for a JSON compression algorithm is sending large JSON content from the client to the server (which is sent ungzipped). In this case, it is important to unpack the JSON content on the server before consuming it.

Thursday, March 10, 2011

Build Time Javascript Code Analysis with JsHint

Javascript can be tough to maintain. The bigger your project is, the harder it is to ensure that everything is ok. Luckily, there are tools to help you with that.

Paul Irish and Anton Kovalyov recently launched JSHint, an online JavaScript checking tool. The tool is very similar to JSLint, but is designed to be more customizable and community-oriented. JSHint can help you detect errors and potential problems in JavaScript code and enforce your team's coding conventions. It is very flexible, so you can easily adjust it to your particular coding guidelines and the environment you expect your code to execute in.

The purpose of this post is to show you how to use jsHint as a build-time solution using wro4j maven plugin.

Build Time Javascript Code Analysis

Setting up an automated build process for your projects is very common today and is considered best practice. In the java world, Maven is a very popular build tool and has proven its maturity over the years. Using maven to build your client-side project has the following advantages:

  • Find potential problems in your code. Bugs might be identified and breaking code conventions can be detected early.
  • Run the build automatically on a regular basis
  • Easily handle refactorings and other small changes
  • Ability to do continuous integration
  • Increase confidence in code quality for each build

Introduce wro4j maven plugin

Since version 1.3.5, wro4j maven plugin provides a new goal called jshint which can help you to start auditing and enforcing JS code through a mechanism like JSHint.
In order to use it you have to follow several simple steps.

Project layout

By default, wro4j maven plugin relies on a typical maven project layout.
This structure can be different, depending on your project. The implicit wro4j maven plugin settings will assume you are using this structure. By default it will search for a file called wro.xml at the following location src/main/webapp/WEB-INF/wro.xml. The location of this file is configurable. You'll find more about all available configuration below. The purpose of wro.xml file is to describe the way static resources are grouped and the order in which these should be processed. By default the plugin will process each resource one by one.

Configure pom.xml

Add wro4j maven plugin to the list of plugins in your web project:

  
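  <plugin>
    <groupId>ro.isdc.wro4j</groupId>
    <artifactId>wro4j-maven-plugin</artifactId>
    <version>${wro4j.version}</version>
    <executions>
      <execution>
        <phase>compile</phase>
        <goals>
          <goal>jshint</goal>
        </goals>
      </execution>
    </executions>
    <configuration>
      <options>devel,evil,noarg</options>
    </configuration>
  </plugin>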
  

${wro4j.version} is a placeholder for the wro4j version (the latest is 1.3.5). The goal which instructs wro4j to run the jshint tool is called jshint. Besides this goal, the wro4j maven plugin also provides the run goal, which performs the compression of the resources (both js & css), but this is not in the scope of this post.
This plugin allows you to configure the options used by jshint. Specifying these options is optional. For the complete list of available options, visit the JsHint project homepage. As you can see in the above example, the options configuration contains comma separated values.
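To give an idea of what the three options in the example (devel, evil, noarg) refer to, here is a small JavaScript file written for this illustration. The comments paraphrase the JSHint documentation's description of each option; the function and variable names are invented, and the exact warnings you get depend on the JSHint version bundled with the plugin.

```javascript
// Illustrative file for the options devel,evil,noarg (names invented for this sketch).

function logTotal(total) {
  // `devel` predefines development globals such as console and alert.
  console.log("total:", total);
  return total;
}

// `evil` tells JSHint to tolerate eval, which it otherwise warns about.
const total = eval("40 + 2");

// `noarg` prohibits arguments.callee and arguments.caller,
// so recursion goes through the function's own name instead.
function factorial(n) {
  return n <= 1 ? 1 : n * factorial(n - 1);
}

logTotal(total);
console.log(factorial(5));
```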

Detailed plugin configurations

  • options - comma separated values used to instruct jsHint how to analyze the js code.
  • failNever - boolean flag. When true - the project build will succeed even if there are errors reported by jsHint. By default this value is false.
  • ignoreMissingResources - if false, the build will fail if at least one resource is missing or cannot be accessed, otherwise only a warning will be displayed.
  • targetGroups - (optional) a comma separated list of groups you want to build. If you do not specify this parameter, a file for each defined group will be generated.
  • wroManagerFactory - Optional attribute, used to specify a custom implementation of MavenContextAwareManagerFactory interface. When this tag is not specified, a default implementation is used. This attribute is useful if you want to configure other processors than default one.
  • wroFile - the location of wro.xml file which defines how resources are grouped. By default its value is this: ${basedir}/src/main/webapp/WEB-INF/wro.xml . If you have a different project structure or a different location of wro.xml, then you should change the value of this parameter.
  • contextFolder - defines the location of web application context, useful for locating resources relative to servletContext. By default its value is: ${basedir}/src/main/webapp/
These parameters give you enough control to customize the wro4j maven plugin to work with any project structure. When the plugin encounters any problems while running, you'll see a detailed list of errors indicating the file containing the problem, the line number where the problem is and a detailed message describing it. Here is an example:
[error] 1 errors found while processing resource: classpath:ro/isdc/wro/maven/plugin/js/undef.js Errors are: [ro.isdc.wro.extensions.processor.algorithm.jshint.JsError@19d75ee[
  line=4
  character=4
  reason='jQuery' is not defined.
  evidence=})(jQuery);
]]
This information should be enough to help you fix any problems you may have.

Conclusion

Using the wro4j maven plugin with the jshint goal can help you enforce coding standards on your front-end resources in a mavenized project.

Sunday, February 13, 2011

Simple client-side build system with wro4j command line tool

Intro

Wro4j is a free and Open Source Java project which will help you improve your web application page loading time. It can help you keep your static resources (js & css) well organized, merge & minify them at run-time (using a simple filter) or build-time (using a maven plugin or a command line tool) and has a dozen other features you may find useful when dealing with web resources.

The main goal of the wro4j framework is to help speed up web applications by implementing a couple of recommended performance rules.

Starting with the wro4j-1.3.4 release, it is no longer limited to the java development environment. If, for instance, you are developing a js framework with many small modules (ex: jquery, yui, mootools, etc) or if you are working on the client side of a large web project, then wro4j can help you easily organize all your static resources (css/js) and build them (merge and minimize) using a simple command line tool. The only prerequisite is to install jdk-1.6 on your machine.

This post will focus on description of the wro4j command line tool.

Target audience: this post may be interesting not only for java developers, but also for web developers (using any kind of technology) and css/javascript framework creators.

Getting Started

Let's make it practical. We'll show how wro4j can help building the jquery tools project.

Currently, it uses an ant build script (build.xml) to describe how the js resources are merged and minimized. It supports google closure compiler only; switching to another compressor is not supported. Using an ant script can be a good solution for many similar projects, but it can also be quite complex, and a verbose and complex script is very hard to understand and maintain. Isn't there a simpler solution?

Installation Steps

With the new wro4j command line tool, you can achieve the same results with minimum effort. All you have to do is follow these steps:

  1. Add wro4j-runner-1.3.4-jar-with-dependencies.jar. In our case, we add it to the lib folder (where the other jar files reside). The default relative context path depends on the location of the jar (this can be changed with an argument; we'll get back to this later).
  2. Create a wro.xml file and add it to the same folder where the jar is located. This file describes how you want your resources to be merged and where the resources are located. For more details about wro.xml, visit this page. Here is an example:
    
      
    <groups xmlns="http://www.isdc.ro/wro">
      <group name="all">
        <js>/jquery-1.4.2.js</js>
        <js>/../src/**.js</js>
      </group>
    </groups>
      
    
    
  3. Run the following in your console:
    java -jar wro4j-runner-1.3.4-jar-with-dependencies.jar
    
    As a result, a new folder (called 'wro') will be created. It will have one file: all.js containing a merged content of all js files from ui folder (as described in wro.xml).

Minimization (compression) configuration

Of course, it would be nice to have the all.js file compressed. In order to do that, make a small change to the console script:
java -jar wro4j-runner-1.3.4-jar-with-dependencies.jar -m
Adding the -m argument tells the wro4j runner to minimize the scripts. By default, it uses the JSMin processor for js compression. You can easily switch to another compressor; here is an example:
java -jar wro4j-runner-1.3.4-jar-with-dependencies.jar -m -c uglifyJs
This tells the wro4j runner to use the UglifyJs compressor instead. Similarly, you can use any other supported compressor.

You don't have to worry about invoking the wro4j-runner with wrong arguments: it will inform you about the cause of the problem and in some cases suggest possible solutions. Also, when everything is ok, you will see the processing details and the total duration of the operation in the command line.

Supported Compressors

Currently wro4j-runner supports the following js compressors:
  • jsMin - For JsMin compressor. This one is used by default if you don't specify any.
  • uglifyJs - For UglifyJs compressor
  • beautifyJs - Exactly the opposite of the uglifyJs, it does what it says - makes compressed code readable.
  • googleClosureSimple - For Google Closure Compiler with simple optimization
  • googleClosureAdvanced - For Google Closure Compiler with advanced optimization
  • yuiJsMin - For YUI compressor with no munge
  • yuiJsMinAdvanced - For YUI compressor with munge
  • packerJs - Uses Dean Edwards Packer compressor (version 3.1)
  • dojoShrinksafe - Uses Dojo Shrinksafe compressor.

You can easily switch between compressors, depending on your preferences. Maybe for some projects one compressor suits better than another. The nice part is that wro4j can support any existing javascript compressor. For more details visit the wro4j project home page.

Wro4j runner arguments

Here you'll find all the arguments supported by wro4j runner tool.
  • -m or --minimize - Turns on the minimization by applying the default or specified compressor
  • -c or --compressor - Name of the compressor to use. The complete list of supported compressors was described earlier.
  • -i or --ignoreMissingResources - This is useful when you want the runner to do its job even when you specify an invalid resource in your wro.xml. By default missing resources are not ignored.
  • --targetGroups ${GROUPS} - Comma separated list of groups (defined in wro.xml) to process. If you don't specify this argument, all existing groups will be processed and a file will be created for each of them.
  • --destinationFolder ${PATH} - The folder where the target groups will be generated. By default it will create a folder called wro
  • --wroFile ${PATH} - location of the wro.xml file. By default runner will search it in the current folder.
  • --contextFolder ${PATH} - folder used as a root of the context relative resources (or where to search when you have a resource starting with / character). By default this is the current folder.

You can find the updated version of jquery tools project using the new wro4j runner for merging and compressing resources at the following location: https://github.com/alexo/jquerytools.

Summary

Starting with the wro4j-1.3.4 release, you can easily maintain, merge and minimize client-side resources (css & js) with the wro4j-runner command line tool. By following 3 simple steps, you can compress your static resources into a single compressed file, using one of the 8 supported javascript compressors. This tool is easily customizable and flexible, and it can help you simplify the way you build your client-side project.

Resources

  1. Wro4j project home
  2. Wro4j github homepage
  3. Jquery tools project page - used as an example for this post
  4. Jquery tools using wro4j runner example

Sunday, January 30, 2011

Using Google Closure Compiler with wro4j maven plugin

This post is about how you can integrate Google Closure Compiler into your project with wro4j maven plugin. Google Closure Compiler is a JavaScript optimizing compiler. It parses your JavaScript, analyzes it, removes dead code and rewrites and minimizes what's left. It also checks syntax, variable references, and types, and warns about common JavaScript pitfalls. It is used in many of Google's JavaScript apps, including Gmail, Google Web Search, Google Maps, and Google Docs.

Since 9 December 2010, Google Closure Compiler has been available in the Maven Central repository. Nevertheless, there are people who complain that there is no official maven plugin to help them easily integrate it into their projects. Below is one such comment (you can find it here):

Let me get this straight: it took over a year for someone at Google to figure out how to get the library into Maven, and still nobody realized at that point there needed to be a plugin and some documentation on how to use the plugin? And now no one at Google can figure out how to write a Maven plugin, so this issue has been "fixed" by saying that there is some third-party plugin---that according to the documentation requires you to manually install the Google Closure Compiler? I'm sure there are many like me who just want to plug the Google Closure Compiler into their Maven project; have found this page; realized the state of understanding of Maven here (as stated in the comments); and moved on to find an alternative.
Wro4j has provided google closure support since version 1.3.0. Since the wiki documentation may not be enough, I'll describe how to do it in this post. The wro4j maven plugin is a very handy tool when you need a build time solution for merging and minimizing the resources used in a project. There is also a runtime solution, which is as easy as adding a simple filter in web.xml, but here we will focus only on how to use it as a build time solution.

Configuration


Project Structure

A typical 'mavenized' web project has the following structure:
This structure can be different, depending on your project. The implicit wro4j maven plugin settings will assume you are using this structure. What is important for wro4j maven plugin is the following: the location of wro.xml file (the group of resources descriptor) and the location of context folder (used to compute context relative path of resources). Both of these values are configurable.

Create wro.xml

First step is to create wro.xml file which defines the way you want your resources to be grouped (how to merge them). You can find more details about wro.xml file format here. This is a sample wro.xml:


  
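<groups xmlns="http://www.isdc.ro/wro">
  <group name="all">
    <css>/static/css/lib/global-whitespace-reset.css</css>
    <css>/static/css/lib/tools.css</css>
    <css>/static/css/lib/base.css</css>
    <css>/static/css/lib/layout.css</css>

    <js>/static/js/modernizr/modernizr-1.1.min.js</js>
    <js>/static/js/lib/core.js</js>
    <js>/static/js/lib/site.js</js>
  </group>
</groups>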
  

Add wro4j maven plugin dependency to pom.xml

Add the following plugin dependency to your web project:

  
  <plugin>
    <groupId>ro.isdc.wro4j</groupId>
    <artifactId>wro4j-maven-plugin</artifactId>
    <version>${wro4j.version}</version>
  </plugin>
  

${wro4j.version} is a placeholder for wro4j version (latest is 1.3.3). This is the minimum necessary to get started. At this point, you can already start using it by running the following in command line:
mvn wro4j:run
As a result, for each group defined in wro.xml you'll find a minimized version of the resources at the following default location: /src/main/webapp/wro/. At this point, you are not yet using google closure compiler for compressing javascript resources. Instead, a default minimizer is used. In order to replace it with google closure compiler, you have to apply some more advanced configuration to the wro4j maven plugin.

Advanced plugin configurations

Below is an example of how you can configure wro4j maven plugin with all possible parameters:

  
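  <plugin>
    <groupId>ro.isdc.wro4j</groupId>
    <artifactId>wro4j-maven-plugin</artifactId>
    <version>${wro4j.version}</version>
    <executions>
      <execution>
        <phase>compile</phase>
        <goals>
          <goal>run</goal>
        </goals>
      </execution>
    </executions>
    <configuration>
      <targetGroups>all</targetGroups>
      <minimize>true</minimize>
      <destinationFolder>${basedir}/src/main/webapp/wro/</destinationFolder>
      <cssDestinationFolder>d:/static/css/</cssDestinationFolder>
      <jsDestinationFolder>d:/static/js/</jsDestinationFolder>
      <contextFolder>${basedir}/src/main/webapp/</contextFolder>
      <wroFile>${basedir}/src/main/webapp/WEB-INF/wro.xml</wroFile>
      <wroManagerFactory>ro.isdc.wro.extensions.manager.standalone.GoogleStandaloneManagerFactory</wroManagerFactory>
      <ignoreMissingResources>false</ignoreMissingResources>
    </configuration>
  </plugin>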
    
  

Notice the following:
ro.isdc.wro.extensions.manager.standalone.GoogleStandaloneManagerFactory
This instructs wro4j maven plugin to use google closure compiler. And here is a complete list with explanation of all supported parameters (all of them are optional, if you don't want to specify them, the default value will be used instead).
  • minimize - a flag used to turn minimization on or off. This parameter is optional and by default its value is true.
  • targetGroups - (optional) a comma separated list of groups you want to build. If you do not specify this parameter, a file for each defined group will be generated.
  • wroManagerFactory - Optional attribute, used to specify a custom implementation of MavenContextAwareManagerFactory interface. When this tag is not specified, a default implementation is used. This attribute is useful if you want to configure other processors than default one.
  • wroFile - the location of wro.xml file which defines how resources are grouped. By default its value is this: ${basedir}/src/main/webapp/WEB-INF/wro.xml . If you have a different project structure or a different location of wro.xml, then you should change the value of this parameter.
  • contextFolder - defines the location of web application context, useful for locating resources relative to servletContext. By default its value is: ${basedir}/src/main/webapp/
  • destinationFolder - folder where the result merged resources will be created.
  • cssDestinationFolder - folder where the css merged resources will be created
  • jsDestinationFolder - folder where the js merged resources will be created
  • ignoreMissingResources - if false, the build will fail if at least one resource is missing or cannot be accessed, otherwise only a warning will be displayed.
These parameters give you enough control to customize the wro4j maven plugin to work with any project structure.

Conclusions

Though there is no official maven plugin for google closure compiler, you can use it with the wro4j maven plugin. All you have to do is create a wro.xml file (describing how resources are grouped), add the wro4j maven plugin to pom.xml and enjoy the outcome.

Sunday, September 5, 2010

Syntax Highlighting in Your Blog

This article describes how the syntaxHighlighter javascript widget can help you add code snippets to your blog and how wro4j can help you with it.

There are plenty of sites which help you start blogging. But when your blog posts contain some code, suddenly you are not sure anymore which one suits you better. Some of them accept the markdown markup language, which allows you to insert code easily without bothering about formatting. Unfortunately, markdown has its own disadvantages. For instance, it's hard to define code boundaries, and if you want to add xml code, you have to escape it manually. That can be annoying and time consuming.

While searching for an alternative, I found a javascript widget which solves my problem. It is called SyntaxHighlighter, it was developed by Alex Gorbatchev and it fits my needs perfectly. It is very easy to install and it doesn't have the problems I had with markdown.

Below is an example of what the markdown syntax looks like:
And this is an example of syntaxHighlighter usage:
For more examples, visit documentation about syntax highlighter installation.
Syntax Highlighter is a great javascript widget, but in order to support formatting for different languages, you have to include a lot of js & css resources. For each language, syntaxHighlighter comes with a dedicated javascript file named shBrush<language>.js (ex: shBrushJava.js). That means that if you want to support formatting for 10 different programming languages, you have to include more than 10 resources (the 10 brushes plus the core files). This can hurt your blog's page loading performance, and this is why I decided to use wro4j to fix this problem.

The wro4j maven plugin seemed to be a natural choice for this use case. I downloaded the syntaxHighlighter distribution and added it to the wro4j-examples project, which has the maven infrastructure ready. You can check out the wro4j code from github and play with it yourself.

The first step was to create the configuration file which describes how to group the resources into the bundles:


The next step is to configure pom.xml of the project and instruct it to use wro4j maven plugin

Replace the ${wro4j.version} placeholder with actual wro4j version.
This is the simplest way to make it work. The wro4j maven plugin is very easy to configure, so that instead of default js & css compressor we could use others, like YUI or Google closure compressors. This can be done this way:

If you want to use google closure compressor, use the following value for wroManagerFactory configuration: ro.isdc.wro.extensions.manager.standalone.GoogleStandaloneManagerFactory
In order to run it, we have two options:
1) As a part of the build process of the project. (run mvn install command)
2) Explicitly run wro4j maven plugin: mvn wro4j:run

As a result, you should see the groups bundled under the target/wro folder (the destination folder is also configurable)

To make the resources publicly available, they were copied to the google code repository, under the following paths:
1) http://wro4j.googlecode.com/svn/wiki/static/syntaxHighlighter/3.0.83/syntaxHighlighter.js
2) http://wro4j.googlecode.com/svn/wiki/static/syntaxHighlighter/3.0.83/syntaxHighlighter.css

One small note regarding committing these resources to the google code repository: by default, files committed with an svn client have the text/plain mime type. This could cause a problem if you expect a different mime type. If you need a different mime type, you have to specify it with your preferred svn client by adding the svn:mime-type property with the expected value (ex: text/css). That way, when you include the resource in your page, it will be served with the correct content type in the response header.

The installation of the syntaxHighlighter javascript widget is trivial. Just add the resource to the head of the blog template:

    <script src='http://wro4j.googlecode.com/svn/wiki/static/syntaxHighlighter/3.0.83/syntaxHighlighter.js' type='text/javascript'></script>
    <link href='http://wro4j.googlecode.com/svn/wiki/static/syntaxHighlighter/3.0.83/syntaxHighlighter.css' rel='stylesheet' type='text/css' />