Summary:
Tip of the week for March 12, 2004 discusses how to parse and display an RSS feed on your Web site.
Description:
RSS is a simple format in which Web sites can post information in a way that desktop applications and other Web application servers can understand. RSS has been getting more press recently, so it is worth seeing how easy it is to implement in Lasso.
Note: Many people say RSS stands for Really Simple Syndication, and it's probably easiest to think of it this way. However, RSS originally stood for RDF Site Summary, and RDF in turn stands for Resource Description Framework.
This article covers how Lasso can be used to read an RSS feed and display the items it contains on a Web site. A prior article covered how to publish an RSS feed using Lasso. Taken together, the techniques in these two articles can be used to transmit data from one Lasso server to another.
- Format of an RSS Feed discusses the basic XML format for RSS.
- Parsing an XML Feed using the XML Tags discusses using the [XML] type to parse an RSS feed and format the results. The code in this section can be downloaded in a file RSS_Example.lasso linked at the end of the article.
- Parsing an XML Feed using the XML Stream Tags discusses using the [XMLStream] type to parse an RSS feed and format the results. [XMLStream] is a much more efficient method of parsing large amounts of XML data. The code in this section can be downloaded in a file RSS_Example.lasso linked at the end of the article.
Additional details about RSS can be found at this IBM Web site: An introduction to RSS news feeds <http://www-106.ibm.com/developerworks/library/w-rss.html> or a Google search will turn up countless resources. Some additional resources are listed at the end of this article.
Note: The code in this tip is for Lasso Professional 7. The basic techniques of using the [XML] type will work in Lasso Professional 6, but some changes to the code may be necessary. [XMLStream] is only supported in Lasso Professional 7.
Format of an RSS Feed
An RSS feed is a simple XML format with the following basic structure. The header must include an XML declaration and a DOCTYPE. The RSS feed is wrapped in <rss> ... </rss> and <channel> ... </channel> tags.
The first block of information provides the title, link, and a description of the Web site that is hosting the feed. Most RSS readers will display this information so users can visit the Web site of the feed. The second block of information provides an optional image for the site.
Each <item> ... </item> is one headline that will be displayed to the user. You can include as many items as you need for your site. Usually, the items will change daily or more often as the contents of your site are updated.
<?xml version="1.0"?>
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">
<rss version="0.91">
<channel>
<title>Web Site Title</title>
<link>http://www.example.com/</link>
<description>A short description of the Web site.</description>
<language>en-us</language>
<image>
<title>Image Title</title>
<url>http://www.example.com/example.gif</url>
<link>http://www.example.com/</link>
<width>144</width>
<height>36</height>
</image>
<item>
<title>Item Title</title>
<link>http://www.example.com/... link to item ...</link>
<description>A short description of the item.</description>
</item>
</channel>
</rss>
In order to display the RSS feed in a format suitable for our Web site we need to parse the incoming XML data.
Parsing an XML Feed using the XML Tags
This section shows how to parse an incoming RSS feed using the [XML] type. This method is suitable for parsing relatively short RSS feeds (10-20 items). For longer feeds the next technique using [XMLStream] should be used. The code in this section can be downloaded in a file RSS_Example.lasso linked at the end of the article.
The code is included as a LassoScript at the top of the page and breaks down into the following steps:
- The XML data of the feed is loaded using [Include_URL] and parsed using the [XML] tag. For this example an RSS feed from the online magazine Slate is used.
- Variables to store the details of the channel, optional image, and the individual items are created.
- The [XML->ExtractOne] tag is used to pull the first channel out of the RSS feed. We iterate through the children of the channel tag using [XML->Children]. The children of the channel tag are one of three types:
- If the child is an item, then we iterate through its children. Each child tag (skipping whitespace text nodes) is stored in a map, keyed by the name of the tag, with the contents of the tag as its value. Each item map is then appended to the 'items' array.
- If the child is an image then we parse and iterate through its children. Each child tag is stored in the 'image' variable including the name of the tag and the contents of the tag.
- If the child is any other tag then the name of the tag and the contents of the tag are stored in the 'channel' variable.
<?LassoScript
var: 'url' = 'http://slate.msn.com/rss/';
var: 'feed' = (include_url: $url);
var: 'xml' = (xml: $feed);
var: 'channel' = (map);
var: 'image' = (map);
var: 'items' = (array);
iterate: $xml->(extractone: 'channel')->children, (var: 'temp_child');
if: ($temp_child->name == 'item');
var: 'item' = (map);
iterate: $temp_child->children, (var: 'temp_item');
if: ($temp_item->name != 'text');
$item->(insert: $temp_item->name = $temp_item->contents);
/if;
/iterate;
$items->(insert: $item);
else: ($temp_child->name == 'image');
iterate: $temp_child->children, (var: 'temp_item');
if: ($temp_item->name != 'text');
$image->(insert: $temp_item->name = $temp_item->contents);
/if;
/iterate;
else: ($temp_child->name != 'text');
$channel->(insert: $temp_child->name = $temp_child->contents);
/if;
/iterate;
?>
At the end of this code three variables are established. The 'channel' variable includes details of the channel itself, the 'image' variable includes details of an optional image for the channel, and the 'items' array includes a series of maps representing each of the items in the channel.
This LassoScript is written in a general format and can be used to parse many incoming RSS feeds. The formatting code that follows needs to be tuned to the particular RSS feed that is being formatted (and customized from the generic HTML we present here so it fits in with the target Web site).
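If you want to check the tree-walking logic outside Lasso, here is a rough Python sketch of the same steps using the standard library's xml.etree.ElementTree. It is an illustration, not Lasso code. The names parse_rss and SAMPLE are hypothetical, and the sketch assumes a well-formed RSS 0.91-style feed that has already been fetched as a string. The fetch that [Include_URL] performs is not shown.

```python
import xml.etree.ElementTree as ET

def parse_rss(feed_text):
    """Walk the first <channel> of an RSS 0.91-style feed.

    Returns (channel, image, items), mirroring the 'channel', 'image'
    and 'items' variables built by the LassoScript.
    """
    root = ET.fromstring(feed_text)
    channel_el = root.find('channel')  # first <channel> directly under <rss>
    channel, image, items = {}, {}, []
    if channel_el is None:
        return channel, image, items
    for child in channel_el:
        # ElementTree stores whitespace in .text/.tail rather than as
        # separate nodes, so there is no 'text' child to skip here.
        if child.tag == 'item':
            items.append({sub.tag: (sub.text or '').strip() for sub in child})
        elif child.tag == 'image':
            image.update({sub.tag: (sub.text or '').strip() for sub in child})
        else:
            channel[child.tag] = (child.text or '').strip()
    return channel, image, items

# Hypothetical sample feed for illustration.
SAMPLE = """<?xml version="1.0"?>
<rss version="0.91"><channel>
<title>Web Site Title</title>
<link>http://www.example.com/</link>
<image><url>http://www.example.com/example.gif</url></image>
<item><title>Item Title</title><link>http://www.example.com/item</link></item>
</channel></rss>"""

channel, image, items = parse_rss(SAMPLE)
print(channel['title'], len(items))  # prints: Web Site Title 1
```

As in the LassoScript, anything that is neither an item nor an image is treated as channel-level detail. Any unexpected child of <channel> therefore simply becomes another key in the channel map.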
The channel is now formatted using a mix of HTML and LDML square bracket code. Information about the channel is output using [output: $channel->(find: 'TAGNAME')]. Each of these tags is similar to a [Field] tag, but the 'TAGNAME' names the element of the $channel variable that should be output.
We then iterate through the 'items' array and output information about each item using [output: $item->(find: 'TAGNAME')].
<h3>[output: $channel->(find: 'title')]</h3>
<p>[output: $channel->(find: 'description')]
<br /><a href="[output: $channel->(find: 'link')]">
[output: $channel->(find: 'link')]</a>
[protect]<br />Refreshed: [date_format: $channel->(find: 'pubdate'), -format='%D'][/protect]
<br />[output: $channel->(find: 'copyright')]</p>
[iterate: $items, (var: 'item')]
<hr />
<p><b>[output: $item->(find: 'title')]</b>
([output: $item->(find: 'category')])
<br />[output: $item->(find: 'description')]
<br /><a href="[output: $item->(find: 'link')]">
[output: $item->(find: 'link')]</a>
[protect]<br />Posted: [date_format: $item->(find: 'pubdate'), -format='%D'][/protect]</p>
[/iterate]
<hr />
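For comparison, the template logic can be sketched in Python. This is an illustration under stated assumptions, not Lasso:

- render_feed is a hypothetical helper.
- The output is trimmed to title, description, and link.
- Each value is escaped with html.escape so feed text cannot inject markup into the page.
- Missing elements fall back to empty strings through dict.get. This plays roughly the part that [protect] plays around the optional date fields.

```python
from html import escape

def render_feed(channel, items):
    """Build simple HTML for a parsed feed (title, description, link only)."""
    def field(d, name):
        # Missing keys become '' instead of raising an error.
        return escape(d.get(name, ''))

    parts = [
        f"<h3>{field(channel, 'title')}</h3>",
        f"<p>{field(channel, 'description')}",
        f"<br /><a href=\"{field(channel, 'link')}\">{field(channel, 'link')}</a></p>",
    ]
    for item in items:
        parts.append('<hr />')
        parts.append(f"<p><b>{field(item, 'title')}</b>")
        parts.append(f"<br />{field(item, 'description')}")
        parts.append(f"<br /><a href=\"{field(item, 'link')}\">{field(item, 'link')}</a></p>")
    parts.append('<hr />')
    return '\n'.join(parts)

html_out = render_feed(
    {'title': 'A & B', 'link': 'http://www.example.com/'},
    [{'title': 'First', 'link': 'http://www.example.com/1'}],
)
print(html_out)
```

Whether to escape feed text or pass its HTML through is a per-feed decision. As with the Lasso template, expect to adjust this for each feed you display.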
Parsing an XML Feed using the XML Stream Tags
This section shows how to parse an incoming RSS feed using the [XMLStream] type. This method is much faster than the [XML] method described above, but can be a little more difficult to understand. The code in this section can be downloaded in a file RSS_Example.lasso linked at the end of the article.
The code is included as a LassoScript at the top of the page and breaks down into the following steps:
- The XML data of the feed is loaded using [Include_URL] and parsed using the [XMLStream] tag. For this example an RSS feed from the LDML Reference is used.
- Variables to store the details of the channel, optional image, and the individual items are created.
- The 'state' and 'more' variables are used to store the current state of the XMLStream. The 'state' variable records what the current parent tag is. The 'more' variable records whether there are more tags to parse. A while loop is used to continue looping until there are no more tags.
- Inside the while loop the type of the current XML tag is checked. Depending on whether it is a start tag, end tag, or text element a different action is taken:
- Start tags are inserted into the 'state' variable to record where we are in the XML stream. If the start tag is for an item then we create a new 'item' variable.
- End tags are pulled from the 'state' variable. If the end tag is for an item then the 'item' variable is inserted into the 'items' array we are building.
- Text elements are inserted into the 'channel', 'image', or current 'item' variable depending on their parent (via the state variable).
- At the end of the loop the XML stream is advanced using the [XMLStream->Next] tag, whose result is stored in the 'more' variable.
<?LassoScript
var: 'url' = 'http://reference.omnipilot.com/rss/';
var: 'feed' = (include_url: $url);
var: 'xml' = (xmlstream: $feed);
var: 'channel' = (map);
var: 'image' = (map);
var: 'items' = (array);
var: 'more' = true;
var: 'state' = (array);
while: $more;
if: ($xml->nodetype == 'startelement');
$state->(insert: $xml->name, 1);
if: ($xml->name == 'item');
var: 'item' = (map);
/if;
else: ($xml->nodetype == 'endelement');
$state->(remove: 1);
if: ($xml->name == 'item') && (var_defined: 'item');
$items->(insert: $item);
var: 'item' = (map);
/if;
else: ($xml->nodetype == 'text');
if: ($state->first != 'item') && ($state >> 'item');
$item->(insert: $state->first = $xml->value);
else: ($state->first != 'image') && ($state >> 'image');
$image->(insert: $state->first = $xml->value);
else: ($state->first != 'channel') && ($state >> 'channel');
$channel->(insert: $state->first = $xml->value);
/if;
/if;
var: 'more' = $xml->next;
/while;
?>
As in the previous section, at the end of this code the 'channel' variable holds details of the channel, the 'image' variable holds details of its optional image, and the 'items' array holds a map for each item in the channel.
This LassoScript is general enough to parse many incoming RSS feeds, but the formatting code that follows needs to be tuned to the particular feed being formatted (and customized from the generic HTML presented here so it fits in with the target Web site).
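For comparison, the stack-based approach can be sketched in Python with the standard library's xml.etree.ElementTree.iterparse. That function reads a document as a stream of 'start' and 'end' events. This is an illustration rather than a port. iterparse has no separate text events, so the sketch reads each element's text on its 'end' event instead of on a text node. The names stream_rss and FEED are hypothetical.

```python
import io
import xml.etree.ElementTree as ET

def stream_rss(feed_bytes):
    """Parse an RSS 0.91-style feed from start/end events.

    Returns (channel, image, items) like the LassoScript above.
    """
    channel, image, items = {}, {}, []
    state = []   # stack of open element names; the innermost is state[-1]
    item = None
    for event, elem in ET.iterparse(io.BytesIO(feed_bytes), events=('start', 'end')):
        if event == 'start':
            state.append(elem.tag)
            if elem.tag == 'item':
                item = {}
            continue
        # 'end' event: the element's own text is complete at this point.
        tag = state.pop()            # 'state' now lists only the ancestors
        text = (elem.text or '').strip()
        if tag == 'item':
            items.append(item)
            item = None
        elif 'item' in state:
            item[tag] = text
        elif 'image' in state:
            image[tag] = text
        elif 'channel' in state and tag != 'image':
            channel[tag] = text
        elem.clear()  # release the element's contents once they have been read
    return channel, image, items

# Hypothetical sample feed for illustration.
FEED = b"""<?xml version="1.0"?>
<rss version="0.91"><channel><title>Reference</title>
<link>http://www.example.com/</link>
<image><url>http://www.example.com/i.gif</url></image>
<item><title>A</title><link>http://www.example.com/a</link></item>
<item><title>B</title></item>
</channel></rss>"""

channel, image, items = stream_rss(FEED)
```

Only the current ancestry is kept on the stack, and elements are cleared as they close. Like the [XMLStream] version, this avoids holding the whole document's contents in memory at once.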
The channel is then formatted with the same mix of HTML and LDML square bracket code. Information about the channel is output using [output: $channel->(find: 'TAGNAME')], where 'TAGNAME' names the element of the $channel variable that should be output.
We then iterate through the 'items' array and output each item using [output: $item->(find: 'TAGNAME')], or [decode_html: $item->(find: 'TAGNAME')] for the title and description of this feed.
<h3>[output: $channel->(find: 'title')]
<img src="[output: $image->(find: 'url')]" alt="[output: $image->(find: 'title')]" style="float: right;" /></h3>
<p>[output: $channel->(find: 'description')]
<br /><a href="[output: $channel->(find: 'link')]">
[output: $channel->(find: 'link')]</a></p>
[iterate: $items, (var: 'item')]
<hr />
<p><b>[decode_html: $item->(find: 'title')]</b>
<br />[decode_html: $item->(find: 'description')]
<br /><a href="[output: $item->(find: 'link')]">
[output: $item->(find: 'link')]</a></p>
[/iterate]
<hr />
Additional Resources
If the code in this article doesn't seem simple enough, there are also some third-party tools that make reading (and publishing) RSS feeds easier.
Lassolution.com hosts a tool by Olivier Miossec called RSS_RDF, described on the page "Lasso syndication Tools rss_rdf tag": <http://www.lassolution.com/news/166702437600/>
File Reference:
<http://support.omnipilot.com/article_files/RSS_Example.zip>