Tip of the week for September 23, 2005 includes a Gzip compression module for Lasso 8.1. The module allows Lasso to automatically compress all pages using Gzip for compatible browsers. The full source code for the module is included and is a good example of how to create at-begin and at-end processes.
Note - This tip of the week requires the recently released Lasso Professional 8.1. The included code will run on earlier versions of Lasso 8, but with significantly reduced performance.
Introduction
Gzip compression allows the HTML of a Web page to be compressed before it is delivered to the browser. The browser automatically decompresses the page before it is shown, so the process is completely transparent to the user. Using Gzip compression can improve page load times for dialup users and can reduce the overall bandwidth required by a Web site.
All modern browsers and even many Web crawlers support Gzip compression. HTML files can often be compressed by up to 75% (so only 25% of the bandwidth/time is used to download the page). Compression and decompression use very little processor power, so the time spent compressing is often more than offset by the time saved in transmission.
Automatic compression will only occur if the client's browser sends an Accept-Encoding header that includes "gzip". Compression will only occur on files with a Content-Type of text/* (e.g. text/html, text/xml). Compression will not occur on any image or other media types. Compression will not occur for Netscape 4.x browsers, since some versions claimed to understand Gzip even though they could not handle it correctly. Compression will also occur for pages fetched by Web crawlers if they send the Accept-Encoding header.
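The eligibility rules above can be summarized as a single predicate. The following is a minimal Python sketch, not the module's LassoScript. The function name `should_gzip` and its parameters are hypothetical, and it assumes the header values have already been extracted from the request and response.

```python
from typing import Optional


def should_gzip(accept_encoding: Optional[str], content_type: Optional[str],
                user_agent: str) -> bool:
    """Apply the eligibility rules described above to one request/response pair."""
    # The client must list gzip in its Accept-Encoding header.
    if not accept_encoding or "gzip" not in accept_encoding.lower():
        return False
    # Only text/* responses are compressed; images and other media are skipped.
    if content_type and not content_type.strip().lower().startswith("text/"):
        return False
    # Netscape 4.x reports "Mozilla/4" without "MSIE". MSIE also reports
    # "Mozilla/4" but handles gzip correctly, so it is not excluded.
    if "Mozilla/4" in user_agent and "MSIE" not in user_agent:
        return False
    return True
```

As in the module, a response with no Content-Type header is not excluded by the text/* rule.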
Important - Gzip compression is not compatible with the -UseLink option of the [Session_Start] tag. The compression actually occurs before the session manager has a chance to decorate the links. Gzip compression should only be used with -UseCookie sessions or on sites that don't rely on Lasso's built-in session implementation.
Download and Installation
Download the archive from the following URL and decompress it to obtain the file "Gzip Site Compression.lasso".
<http://support.omnipilot.com/article_files/Gzip%20Site%20Compression.zip>
Place the file "Gzip Site Compression.lasso" into a site-specific "LassoStartup" folder and restart the site to enable Gzip compression for the entire site. To enable Gzip compression for all sites hosted by Lasso, place the file into the "LassoStartup" folder in the Lasso Professional 8 application folder instead.
Alternatively, the file can be included in individual pages in order to enable Gzip compression for those pages only. For more flexibility, the tags can also be copied out of the file and called on individual pages.
Note - The file includes a variable "gzipcompressionlog" which is initially set to True. This enables detailed logging of every page that is served with Gzip compression. This variable can be set to False in order to cut down on the size of the log files.
Implementation
This section walks through the implementation of the Gzip compression module. The function of much of the code is described in general terms. You can follow along through the actual code by downloading the archive from the URL above.
The file implements four tags (and one bonus tag) and then automatically calls the appropriate tag to either install site-wide compression or to compress the current page.
For site-wide compression the [Site_UseGzipCompression] tag is called. The tag installs itself as a site-wide at-begin process which will be called once for each page load on the site just prior to executing the Lasso code of the page itself. When called as an at-begin process the tag simply calls the [Page_UseGzipCompression] tag.
When the [Page_UseGzipCompression] tag is called it installs itself as an at-end process which will be called just after the Lasso code of the page itself completes processing. When called as an at-end process the tag checks that the current client is compatible with Gzip, calls the [Compress_Gzip] tag to compress the page contents, and adjusts the headers to report that the page has been compressed.
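The at-begin/at-end chain can be pictured with a small Python model. Everything here is hypothetical: `Page`, `SITE_AT_BEGIN`, `serve`, and the two registration functions are invented for illustration. The model shows only the call order described above. It is not how Lasso implements [Define_AtBegin] or [Define_AtEnd].

```python
from typing import Callable, Dict, List


class Page:
    """Stand-in for one page request: body, outgoing headers, per-page at-end hooks."""

    def __init__(self, body: str = "") -> None:
        self.body = body
        self.headers: Dict[str, str] = {"Content-Type": "text/html"}
        self.at_end: List[Callable[["Page"], None]] = []


# Site-wide at-begin hooks, registered once at startup (the role of LassoStartup).
SITE_AT_BEGIN: List[Callable[[Page], None]] = []


def page_use_hook(page: Page) -> None:
    """Plays the role of [Page_UseGzipCompression]: register an at-end hook for this page."""
    def at_end(p: Page) -> None:
        # Runs after the page code, so it sees the finished body.
        p.headers["X-Body-Length"] = str(len(p.body))
    page.at_end.append(at_end)


def site_use_hook() -> None:
    """Plays the role of [Site_UseGzipCompression]: its at-begin hook calls page_use_hook."""
    SITE_AT_BEGIN.append(page_use_hook)


def serve(page: Page, page_code: Callable[[Page], None]) -> Page:
    for hook in SITE_AT_BEGIN:
        hook(page)          # at-begin: before the page's own code
    page_code(page)         # the page's own code
    for hook in page.at_end:
        hook(page)          # at-end: after the page code, before delivery
    return page
```

Calling `site_use_hook()` once and then `serve(Page(), ...)` for each request reproduces the two-stage pattern. The site-wide hook only arranges a per-page hook, and that per-page hook does the work once the body is complete.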
The [Compress_Gzip] tag is a general purpose tag that can be used to compress any file. The tag requires a -Data parameter that specifies the data to be compressed. The tag also takes an optional -Name parameter which can be used if an actual file is being compressed. The tag returns a byte stream which contains the compressed file.
For example, [File_Write: 'myfile.gz', (Compress_Gzip: -Data=(File_Read: 'myfile.txt'), -Name='myfile.txt')] could be used to compress the file myfile.txt into myfile.gz. The resulting file could then be decompressed by any Gzip utility.
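For comparison, Python's standard `gzip` module can produce the same kind of file, including the stored file name. The helper name `compress_gzip` is hypothetical and only mirrors the roles of the tag's -Data and -Name parameters. This is not the module's code.

```python
import gzip
import io


def compress_gzip(data: bytes, name: str = "") -> bytes:
    """Gzip-compress data in memory, storing an optional original file name."""
    buf = io.BytesIO()
    # GzipFile records `filename` in the header's FNAME field when it is non-empty.
    with gzip.GzipFile(filename=name, mode="wb", fileobj=buf) as gz:
        gz.write(data)
    return buf.getvalue()
```

Reading myfile.txt in binary mode, passing its contents and the name "myfile.txt" to this helper, and writing the result to myfile.gz parallels the [File_Write] example.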
The [Compress_Gzip] tag uses the tag [Encode_CRC32] to create a 32-bit Cyclic Redundancy Check checksum. This tag is built into Lasso Professional 8.1, but an implementation in LassoScript is included for earlier versions of Lasso. The tag [Encode_Adler32] is included as a bonus.
[Compress_Gzip]
This tag performs Gzip compression on some data and returns the result. Gzip compression is similar to the Zlib compression which Lasso's [Compress] tag returns, but has a different wrapper around the compressed data.
The wrapper starts with a header that includes magic bytes identifying the data as Gzip-compressed, flags indicating which optional fields are present, the date and time the data was compressed, and the optional name of the file. This is followed by data compressed using the "deflate" algorithm. The wrapper concludes with a 32-bit CRC checksum and the original size of the data before compression. The decompressor uses these final values to check that the data was not altered in transit.
The wrapper is assembled into a byte stream using the same techniques as described in the prior tip of the week "Network Tags and Internet Protocols" which can be read at <http://www.omnipilot.com/Tip%20of%20the%20Week.1768.8420.lasso>.
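The wrapper layout described above (specified in RFC 1952) can be sketched in Python with the `struct` module. This illustrates the format only. It is not the module's LassoScript, and `gzip_wrap` is a hypothetical name. Unlike the Lasso tag, it asks zlib for raw deflate output directly instead of stripping a Zlib wrapper.

```python
import struct
import time
import zlib


def gzip_wrap(data: bytes, name: str = "") -> bytes:
    """Build a gzip member by hand: header, raw deflate data, CRC-32 and size trailer."""
    flags = 0x08 if name else 0x00      # FNAME bit: a zero-terminated file name follows
    header = struct.pack(
        "<BBBBIBB",
        0x1F, 0x8B,                     # magic bytes identifying gzip data
        8,                              # compression method 8 = deflate
        flags,
        int(time.time()),               # MTIME: seconds since the Unix epoch
        0,                              # XFL: no extra flags
        255,                            # OS: 255 = unknown
    )
    if name:
        header += name.encode("latin-1") + b"\x00"
    # A negative wbits value makes zlib emit raw deflate data with no zlib wrapper.
    compressor = zlib.compressobj(9, zlib.DEFLATED, -zlib.MAX_WBITS)
    body = compressor.compress(data) + compressor.flush()
    # Trailer: CRC-32 of the uncompressed data, then its length modulo 2**32.
    trailer = struct.pack("<II", zlib.crc32(data) & 0xFFFFFFFF, len(data) & 0xFFFFFFFF)
    return header + body + trailer
```

All multi-byte gzip fields are little-endian, which is why the format strings start with `<`.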
Lasso's built-in [Compress] tag performs Zlib compression which uses the same "deflate" algorithm as Gzip compression, but has a different wrapper. In order to get the compressed data for the Gzip compression we simply strip the Zlib wrapper off the output of the [Compress] tag.
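The stripping step works because a Zlib stream (RFC 1950) is a 2-byte header, the deflate data, and a 4-byte Adler-32 checksum. The following Python sketch uses the hypothetical helper `zlib_to_raw_deflate`. It assumes no preset dictionary is used. The FDICT flag would add four more header bytes, and `zlib.compress` never sets it.

```python
import zlib


def zlib_to_raw_deflate(zlib_data: bytes) -> bytes:
    """Strip the 2-byte zlib header and 4-byte Adler-32 trailer, leaving raw deflate data."""
    # The low four bits of the first byte (CMF) hold the method; 8 means deflate.
    if len(zlib_data) < 6 or (zlib_data[0] & 0x0F) != 8:
        raise ValueError("not zlib data using the deflate method")
    return zlib_data[2:-4]
```

The remaining bytes are exactly what goes between the Gzip header and the CRC-32/size trailer.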
The 32-bit CRC checksum for the data is calculated using the [Encode_CRC32] tag, which is implemented natively in Lasso 8.1. An implementation of this tag is also provided so the [Compress_Gzip] tag will work in earlier versions of Lasso. However, performance in earlier versions of Lasso is not sufficient for using Gzip compression automatically on every page.
[Page_UseGzipCompression]
This tag installs itself as an at-end process so that it will be executed after the Lasso page has finished processing, but just before it is served to the client. Then, when the tag is called as the at-end process it performs the actual Gzip compression of the data of the page and adjusts the HTTP headers.
When the tag is first called, the following code is executed. The first statement extracts the Accept-Encoding header from [Client_Headers] with a regular expression. If the header includes the "gzip" option, then the client understands Gzip compression, and the [Define_AtEnd] tag is used to register the [Page_UseGzipCompression] tag as an at-end process.
Local: 'accept_encoding' = (String_FindRegExp: Client_Headers,
    -Find='Accept.Encoding:[^\r\n]+', -IgnoreCase);
If: (#accept_encoding->Size) > 0 && (#accept_encoding->First >> 'gzip');
    Define_AtEnd: \Page_UseGzipCompression=(Array: -atend);
/If;
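The same header check can be sketched in Python with the `re` module, reusing the regular expression from the Lasso code. `client_accepts_gzip` is a hypothetical name. Lowercasing the matched line before looking for "gzip" is an assumption of this sketch.

```python
import re

# Same pattern as the Lasso code: "Accept", any one character, "Encoding:",
# then the rest of the header line (stopping at CR or LF).
ACCEPT_ENCODING = re.compile(r"Accept.Encoding:[^\r\n]+", re.IGNORECASE)


def client_accepts_gzip(raw_headers: str) -> bool:
    """Return True if the raw request headers advertise gzip support."""
    match = ACCEPT_ENCODING.search(raw_headers)
    # Lowercasing is an assumption here; it lets "GZIP" or "Gzip" match too.
    return match is not None and "gzip" in match.group(0).lower()
```

The "." in the pattern is deliberately loose. It matches the hyphen in "Accept-Encoding" but would also accept any other single character in that position.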
The -AtEnd parameter lets the [Page_UseGzipCompression] tag know that it is being called as an at-end process rather than as a normal tag. If this parameter is seen then the following code is executed.
First, the Content-Type and Content-Encoding headers are collected from the outgoing HTTP header, and the client's browser type is retrieved with [Client_Type].
Local: 'content_type' = (String_FindRegExp: $__http_header__,
    -Find='Content.Type:[^\r\n]+', -IgnoreCase);
Local: 'content_encoding' = (String_FindRegExp: $__http_header__,
    -Find='Content.Encoding:[^\r\n]+', -IgnoreCase);
Local: 'client_type' = Client_Type;
Next, a series of conditions are checked. Only if all of the conditions are false will the Gzip compression actually be performed. Gzip compression will not be performed if an error occurred on the page, if the Content-Type is not a text type, if the Content-Encoding says the page has already been Gzip compressed, or if the Client-Type identifies the browser as Netscape 4.
Finally, the Gzip compression is actually performed. The start time is recorded so the log entry can report how long compression took. The size of the uncompressed page data is stored, and the page data is compressed. If the compressed data is smaller than the uncompressed data, then the HTML reply is replaced with the compressed data and a Content-Encoding header is appended to the outgoing HTTP header. If the $gzipcompressionlog variable is True, a log entry describing the compression is also created.
Local: 'start' = _date_msec;
If: (Error_Code != 0);
    // Don't compress error page
Else: (#content_type->Size > 0) && (#content_type->First !>> 'text/');
    // Only compress text/* documents
Else: (#content_encoding->Size > 0) && (#content_encoding->First >> 'gzip');
    // Don't compress if the page has already been compressed
Else: (#client_type >> 'Mozilla/4') && (#client_type !>> 'MSIE');
    // Don't gzip for Netscape 4.x
    // MSIE identifies as Netscape 4.x, but does support gzip
Else;
    Local: 'size' = $__html_reply__->size;
    Local: 'data' = (Compress_Gzip: $__html_reply__);
    If: #data->size < #size;
        $__html_reply__ = @#data;
        $__http_header__->(RemoveTrailing: '\r\n');
        $__http_header__ += '\r\nContent-Encoding: gzip\r\n';
        if: ($gzipcompressionlog == True);
            Log_Detail: 'GZIP compressed ' + #size + ' bytes to ' +
                #data->size + ' bytes saving ' +
                (percent: 1 - (decimal: #data->size) / (decimal: #size)) +
                ' in ' + (_date_msec - #start) + ' msec.';
        /if;
    /If;
/If;
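The final branch (compress, keep the result only if it is smaller, append the header, report the saving) can be sketched in Python. `gzip_reply` is a hypothetical name, and the function works on a plain header string and body bytes rather than Lasso's $__http_header__ and $__html_reply__. Like the Lasso code, it trims trailing CRLFs and appends the new header followed by a single CRLF.

```python
import gzip
from typing import Tuple


def gzip_reply(header: str, body: bytes) -> Tuple[str, bytes, str]:
    """Gzip a reply body, keeping the result only if it is actually smaller.

    Returns the (possibly updated) header block, the body to send, and a log line.
    """
    size = len(body)
    data = gzip.compress(body)
    if len(data) >= size:
        # Compression would not save anything, so send the page unchanged.
        return header, body, "not compressed"
    header = header.rstrip("\r\n") + "\r\nContent-Encoding: gzip\r\n"
    saving = 1 - len(data) / size
    log = f"GZIP compressed {size} bytes to {len(data)} bytes saving {saving:.0%}"
    return header, data, log
```

The size comparison matters for very small pages. The gzip header and trailer add roughly 18 bytes, so a tiny page can grow when it is compressed.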
[Site_UseGzipCompression]
This tag installs itself as an at-begin process so that it will be executed before each Lasso page on the site is processed. When the tag is called as an at-begin process it simply calls the [Page_UseGzipCompression] tag so that it can perform the actual Gzip compression as described above.
When the tag is first called the following code is executed. The first line ensures that this tag is only called within LassoStartup. The [Define_AtBegin] tag is then used to register the [Site_UseGzipCompression] tag as an at-begin process. Finally, a log entry is created so the administrator knows that this site is being automatically compressed using Gzip.
Fail_If: (Response_FilePath !== ''), -1, 'This tag must be called at startup';
Define_AtBegin: \Site_UseGzipCompression=(Array: -atbegin);
Log_Warning: tag_name + ' - Site is now using automatic gzip compression.';
The -AtBegin parameter lets the [Site_UseGzipCompression] tag know that it is being called as an at-begin process rather than as a normal tag. If this parameter is seen then the [Page_UseGzipCompression] tag is simply called.
Page_UseGzipCompression;
[Encode_CRC32] and [Encode_Adler32]
The [Encode_CRC32] tag implements a 32-bit CRC checksum on the data it is passed. This tag is implemented natively in Lasso 8.1. If the Gzip compression file is loaded in an earlier version of Lasso, then this LassoScript implementation of the tag is used instead. The tag uses a table of precalculated values and a series of bit shifts and bitwise AND operations to calculate the checksum.
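The table-driven approach is sketched below in Python for the standard reflected CRC-32 used by Gzip (polynomial 0xEDB88320). This is not a transcription of the module's LassoScript, and the names `make_crc32_table` and `crc32` are hypothetical.

```python
def make_crc32_table() -> list:
    """Precompute the 256-entry lookup table for the reflected polynomial 0xEDB88320."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            # Shift right one bit; XOR in the polynomial if the dropped bit was 1.
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


CRC32_TABLE = make_crc32_table()


def crc32(data: bytes) -> int:
    """Table-driven CRC-32: one table lookup, shift and XOR per input byte."""
    crc = 0xFFFFFFFF                    # standard initial value
    for byte in data:
        crc = CRC32_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF             # standard final inversion
```

The commonly quoted check value for this CRC variant is `crc32(b"123456789") == 0xCBF43926`.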
The [Encode_Adler32] tag implements a similar checksum which is used in the wrapper for the Zlib compression method. It could be useful if you want to implement a Gzip decompressor, although writing such a decompressor is left as an exercise for the reader.
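Adler-32 is simple enough to show in a few lines. This Python sketch follows the definition in RFC 1950 and is not the module's LassoScript. The name `adler32` is hypothetical.

```python
def adler32(data: bytes) -> int:
    """Adler-32 checksum, as stored big-endian at the end of a Zlib stream."""
    mod = 65521                 # largest prime below 2**16
    a, b = 1, 0
    for byte in data:
        a = (a + byte) % mod    # running sum of the bytes (plus 1)
        b = (b + a) % mod       # running sum of the a values
    return (b << 16) | a
```

For example, `adler32(b"Wikipedia")` is `0x11E60398`. The low half is 920 (1 plus the byte values) and the high half is 4582 (the sum of the intermediate `a` values).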
More Information
More information about the tags used in this tip is available in the Lasso 8 Language Guide or in the online Lasso Reference <http://reference.omnipilot.com>.