RSS

RSS is an XML standard originally designed for providing summaries of new content from a website "channel".

It is currently most heavily used for WebLog aggregation - an RssAggregator can be either a single-user package which aggregates the blogs on that user's subscription list (Publish And Subscribe), or a shared server which provides any of a number of features.

All UserLand tools generate RSS channels automatically.

  • Blogger does not. Blogger Pro added it in May'02.
  • Live Journal has turned it back on. 500k blogs, 200k+ updated in last month. Using RSS 0.91.
  • MovableType also has it, and it's turned on by default.

There are also a couple of apps that will "scrape HTML" to generate RSS.

Also see Meatball Wiki:RichSiteSummary and Abbe Normal:WikisWithRss

Mark Pilgrim's "What is RSS" article is a great intro, including some Python code and a comparison of the different versions. Once I get around to an RSS output for this wiki, I'll probably use v2.0 (for the per-item date info) (hmmm, I wonder whether PeerKat supports it?).

Dave Winer's spec

Comparison of required/optional elements across versions

Validators:

There are scalability issues associated with heavy use of RssAggregator software - see Publish And Subscribe.

What more "interesting" kinds of processing could be done on RSS repositories to make the BlogWeb more "emergent"?

Enterprise Collaboration Ware may generate RSS feeds (AppLog) so that updates can be browsed easily by various parties, integrated with other sources. At that point I would see a greater reason for an aggregation system. On the other hand, it may make more sense to read within the Collaboration Ware, where the security model is already defined.


There's an ongoing controversy over who "controls" the RSS spec. At the moment, Dave Winer de facto shepherds 0.91 (see esp. "Timeline" section) and 0.92, but there's also a 1.0 which conforms to RDF. Dave fought this direction as an increase in complexity (and a lack of backward compatibility). But it was adopted, at least by some people, and they kept the name, which is horribly confusing.

Some other perspectives:

the RDF fork led to Echo Standards, later renamed to Atom Standards

Aaron Swartz has a compatibility chart

Is anyone aware of any reasonably accurate population stats on # of sites (a) by version of RSS and (b) by tool used to generate it? (Actually, a snapshot of such stats from the time that 1.0 hit the radar would be even better...)

  • Syndic8 has a stats page which shows RSS1.0 being used in 23% of the feeds (in Feb'03); v0.91 has 50%, v2 has 12%. (Though their percentages don't seem to align with their raw counts. Maybe it's related to some sites supporting multiple versions?)

  • Syndic8 also shows Headline Portal Engine as being the lead toolkit, followed closely by Radio Userland.

Conversely, stats from 1 or more high-traffic sites with RSS support (SlashDot?) on breakdown of hits to the RSS file by RSS-reader User Agent would be very cool. (You'd probably have to count # of IPs instead of # of hits, since some readers might default to reading more frequently.)
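A rough sketch of that count, in Python: group hits to the feed by User-Agent and count distinct IPs rather than raw hits. The log format (Apache-style "combined"), the feed path '/rss.xml', and the function name are all assumptions for illustration, not anything a particular site is known to use.

```python
import re
from collections import defaultdict

# Assumed Apache-style "combined" log line; a real server's format may differ.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?:GET|HEAD) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def readers_by_agent(log_lines, feed_path="/rss.xml"):
    """Return {user_agent: number of distinct IPs that fetched feed_path}."""
    ips = defaultdict(set)
    for line in log_lines:
        m = LOG_LINE.match(line)
        # Skip lines that don't parse or that aren't requests for the feed.
        if m and m.group("path") == feed_path:
            ips[m.group("agent")].add(m.group("ip"))
    return {agent: len(addresses) for agent, addresses in ips.items()}
```

Counting IPs still undercounts readers behind a shared proxy and overcounts people on dynamic addresses, so treat the result as a rough ranking rather than a census.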


I think I'm going to try to quickly hack together an RSS feed for this WikiWeblog, even though I'm already playing with a newer version of the underlying code.

Feb28'2003

  • just going to include wikiname/URL for each item, no content (to avoid rendering issues - that's why this is a 'hack')

  • look at mix of feeds out there, decide on v0.91. Here's Dave Winer's spec and sample

  • look at mix of URLs for people's RSS, decide it doesn't matter, I just need to make sure I have an image and a 'link' tag to point to it.

  • not clear on how much content to include, will probably just pick 10-20 latest items, then seek feedback.

  • note that Dave's spec says an 'image' is required for a channel, but the couple of samples I've looked at don't have it, so I'm not going to worry about it.

  • for items, just need URL and 'title', 'description' is optional.

  • argh, can't get into that folder on my zopesite, something is broken!

  • for now, put at http://fluxent.com/fluxent/webseitz/career/rss.xml - it validates! (with 1 manually-included item)
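
The notes above boil down to: a v0.91 channel header plus the latest 10-20 items, each with just a 'title' (the wikiname) and a 'link'. Here is a minimal Python sketch of that shape, not the DTML actually used on this site. The function name, the (wikiname, URL) tuple input, and the 'en-us' language value are assumptions for illustration; 'image' is left out, as in the notes.

```python
import xml.etree.ElementTree as ET

def build_rss091(channel_title, channel_link, channel_description, pages, limit=15):
    """Return an RSS 0.91 document (as a string) listing the newest pages.

    `pages` is a list of (wiki_name, url) tuples, newest first.
    """
    rss = ET.Element("rss", version="0.91")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    ET.SubElement(channel, "link").text = channel_link
    ET.SubElement(channel, "description").text = channel_description
    # Assumed value; check the 0.91 spec for which channel elements are required.
    ET.SubElement(channel, "language").text = "en-us"
    for wiki_name, url in pages[:limit]:
        item = ET.SubElement(channel, "item")
        # Title and link only; no description, to dodge rendering issues.
        ET.SubElement(item, "title").text = wiki_name
        ET.SubElement(item, "link").text = url
    return ET.tostring(rss, encoding="unicode")
```

Building the tree with ElementTree rather than string concatenation means a wikiname or URL containing '&' or '<' gets escaped automatically, which is one less thing for the validator to complain about.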

Plan

  • make a DTML method to deliver 'rss.xml', just put in header info and maybe one hard-coded 'item', then run against validator

  • add DTML code to auto-generate 3 entries. Validate.

  • auto-generate 10-20 entries, validate.

  • Update 'link' URL. Seek feedback. Fix as needed.


try converting to rss v2

want to start including some content, include per-item date and author (esp for IRC bot, aggregator across TeamWiki spaces, etc.)

Nov18, 2003

  • it only takes tiny tweaks to 0.91 to make valid 2.0

  • argh, author has to be an email address rather than a Wiki Name

    • maybe the wiki community should do some hack, like take the author Wiki Name and add a bogus universal domain (@wiki.name)

    • or maybe the wiki crowd just decides to do invalid RSS - it's a well-contained breach, won't mess up the rest of any validation error messages...

  • full content:

    • hack: just take the straight SmartAscii

    • can't just include full content of a blogbit because if someone appends a comment it has crummy HTML inside it.

      • so maybe I should give up and render the full content to HTML and then escape it. Actually Dave Winer's feed doesn't have the description escaped. Argh, not going to bother for now...
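
Two of the hacks floated above, sketched in Python under stated assumptions: tacking the bogus '@wiki.name' domain onto a Wiki Name so 'author' at least looks like an email address, and escaping already-rendered HTML before it goes into 'description'. Both function names are hypothetical, and the rendering step itself (SmartAscii to HTML) is assumed to have happened elsewhere.

```python
from xml.sax.saxutils import escape

def author_for_feed(wiki_name, domain="wiki.name"):
    """Turn a Wiki Name into an email-shaped string for the 'author' element.

    The domain is the bogus universal one suggested above, not a real mailbox.
    """
    return f"{wiki_name}@{domain}"

def description_for_feed(rendered_html):
    """Escape rendered HTML (&, <, >) so it can sit inside 'description' as text."""
    return escape(rendered_html)
```

If the feed is built with an XML library that escapes text nodes on its own, the second function should be skipped, otherwise the content ends up double-escaped.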

Nov19

Nov25

  • item pubDate: argh, have to figure out right format for 'lastEditTime'

    • RFC 822 ; example 'Sun, 19 May 2002 15:21:36 GMT'

      • Python docs on the time module

      • nah, just stole from Zwiki 'dtml-var "lastEditTime" fmt="%a, %d %b %Y %H:%M:%S +0000"' - now it validates.

  • author - having validated, now as last hack go ahead and put my last_editor value in, even though it's not strictly valid since it's not an email address. We'll see if anyone's system chokes on it.

  • auto-discovery - hmm, what's right way to handle multiple link tags?
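
For comparison with the DTML 'fmt' trick, here is roughly how the same pubDate string could be produced in plain Python. This is a sketch that assumes the edit time is available as Unix epoch seconds (on the Zope side 'lastEditTime' is a different kind of object), and the function names are made up for illustration.

```python
import time
from email.utils import formatdate

def pub_date(epoch_seconds):
    """Format a Unix timestamp as an RFC 822 style date ending in 'GMT'."""
    return formatdate(epoch_seconds, usegmt=True)

def pub_date_strftime(epoch_seconds):
    """Same idea, using the format string from the Zwiki snippet above.

    Note that %a and %b follow the current locale, so this only yields
    English day and month names when the locale is C/English.
    """
    return time.strftime("%a, %d %b %Y %H:%M:%S +0000", time.gmtime(epoch_seconds))
```

The first form matches the 'GMT' style of the example date above, the second matches the '+0000' style that validated here.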

