While the overall discussion is populated by far more knowledgeable persons than myself, I’m going to poke my nose into it anyway.
The discussion revolves around issues with current syndication formats, namely RSS and RDF (see also: content distribution services).
Since I can’t say that I fully grokked all the nonsense about what brought everyone to this juncture, my first response was “WTF?!” This was swiftly followed by “There’s no better time to find out.” I have since spent the last few days wading through books, specs, and expert opinions on the subject. While data input does not equate to data absorption, and certainly doesn’t elevate me to expert status, I did stay at a Holiday Inn last night.
My thoughts are as follows:
Anil makes a nice point but…
…it’s ultimately unworkable. While it certainly is possible[1], it adds a whole new layer of complexity for people who, as Mark Pilgrim put it so tactfully, “…can’t even spell XHTML.”
The average blogger’s efforts, and this is not meant to be demeaning, are concentrated elsewhere. (X)HTML authoring is not his/her primary concern. In most cases, the bulk of the authoring has been handed over to popular blogging tools, which also generate these syndication formats without asking the blogger to lift a finger. To ask the average blogger instead to accept and shoulder this responsibility (and it would require this), as well as to confine him/herself to a given format, is in my opinion an unrealistic expectation. You think your feeds are broken now? Wait until you start scraping all those non-validating XHTML pages.
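To make the “broken feeds” point concrete, here is a minimal sketch using only Python’s standard library. It assumes a scraper that hands page markup straight to a strict XML parser (`xml.etree.ElementTree`); the function name `is_well_formed` and the sample markup are hypothetical. Note that it only tests well-formedness, a lower bar than validation, and typical hand-written entry markup still fails it.

```python
import xml.etree.ElementTree as ET

def is_well_formed(markup: str) -> bool:
    """Return True if `markup` parses as XML. Checks well-formedness only, not validity."""
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True

# Typical hand-authored entry: bare ampersand, unclosed <p>, HTML-style <br>.
sloppy = "<div><p>Fish & chips<br>are great</div>"
# The same content written as well-formed XHTML.
tidy = "<div><p>Fish &amp; chips<br/>are great</p></div>"

print(is_well_formed(sloppy))  # False
print(is_well_formed(tidy))    # True
```

A scraper built on a strict parser simply gets nothing back from the first page, which is exactly the failure mode multiplied across every hand-edited weblog.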
The tools themselves should be taking on more responsibility.
CMSs/blogging tools are evolving daily, so why aren’t people looking to them to generate syndication formats properly? Why should the average blogger have to worry about whether his/her blogging tool of choice is generating a proper XML/RSS/RDF document? A better question than some of the ones being asked is this: if we can scrape an XHTML page and transform it into RSS/RDF, why aren’t the tools scraping the content before output and turning any (X)HTML in an entry into valid XML/RSS/RDF? At the moment, the only requirement placed on them is structural validity. It is up to the user to make sure the output is well-formed enough to be useful, with the (X)HTML either stripped out or properly marked as CDATA[2].
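One way a tool might handle this before output is sketched below in Python, standard library only. The function names (`as_cdata`, `strip_tags`, `description`) are hypothetical and not taken from any real blogging tool. It covers the two options mentioned above, wrapping the entry in CDATA or stripping its elements, plus a third that RSS 2.0 also permits: entity-escaping the HTML. Whichever mode is chosen, the resulting `<description>` parses even when the entry’s own markup is not well-formed.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser
from xml.sax.saxutils import escape

def as_escaped_text(entry_html: str) -> str:
    """Entity-escape &, < and > so the markup travels as character data."""
    return escape(entry_html)

def as_cdata(entry_html: str) -> str:
    """Wrap the markup in a CDATA section.

    "]]>" may not appear inside a CDATA section, so any occurrence is split
    across two adjacent sections.
    """
    return "<![CDATA[" + entry_html.replace("]]>", "]]]]><![CDATA[>") + "]]>"

class _TextCollector(HTMLParser):
    """Lenient HTML parser that keeps character data and discards every tag."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)

def strip_tags(entry_html: str) -> str:
    """Drop the (X)HTML elements, keep the text, then escape it for XML."""
    collector = _TextCollector()
    collector.feed(entry_html)
    collector.close()
    return escape("".join(collector.parts))

def description(entry_html: str, mode: str = "cdata") -> str:
    """Build a well-formed <description> element from arbitrary entry markup."""
    encoders = {"escape": as_escaped_text, "cdata": as_cdata, "strip": strip_tags}
    return "<description>" + encoders[mode](entry_html) + "</description>"

if __name__ == "__main__":
    entry = "<p>Fish & chips <br>are great"  # not well-formed XML on its own
    for mode in ("escape", "cdata", "strip"):
        parsed = ET.fromstring(description(entry, mode))  # parses in every mode
        print(mode, repr(parsed.text))
```

The point of the design is that the blogger never has to know which mode is in use; the tool makes the choice once and every entry comes out well-formed.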
Why not create a syndication interface where a valid document is generated on the fly from criteria the user supplies? A couple of quick questions about what data is to be output, and in what format, and the user never has to see an RSS/RDF template. In fact, taking the whole idea to its logical conclusion, why not build a CMS/blogging tool based entirely on XML, one that requires the user to input only data, no (X)HTML? It would require a complete rethink of the typical blogging interface, but ultimately it opens the door to transforming the data into whatever format is desired.
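As a rough illustration of what could sit behind such an interface, the Python sketch below builds an RSS 2.0 document from stored entries and two user choices: whether to include the full entry, and how many items to publish. The `Entry` fields and the `build_rss` signature are assumptions made for illustration. The user never touches a template, and because ElementTree escapes `&`, `<` and `>` in text on output, markup inside an entry cannot break the document’s structure.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from datetime import datetime, timezone
from email.utils import format_datetime

@dataclass
class Entry:
    """Hypothetical shape of a stored weblog entry."""
    title: str
    link: str
    summary: str
    body_html: str
    published: datetime  # timezone-aware

def build_rss(site_title: str, site_link: str, site_description: str,
              entries: list[Entry], *, include_body: bool = False, limit: int = 15) -> str:
    """Generate an RSS 2.0 document from stored entries plus two user choices."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    # title, link and description are the required channel elements in RSS 2.0.
    ET.SubElement(channel, "title").text = site_title
    ET.SubElement(channel, "link").text = site_link
    ET.SubElement(channel, "description").text = site_description

    newest_first = sorted(entries, key=lambda e: e.published, reverse=True)
    for entry in newest_first[:limit]:
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = entry.title
        ET.SubElement(item, "link").text = entry.link
        # User choice: full entry or summary. ElementTree escapes &, < and >
        # in text on output, so the entry's markup travels as escaped text.
        ET.SubElement(item, "description").text = (
            entry.body_html if include_body else entry.summary)
        # RSS 2.0 dates use the RFC 822 format.
        ET.SubElement(item, "pubDate").text = format_datetime(entry.published)
    return ET.tostring(rss, encoding="unicode")

if __name__ == "__main__":
    posts = [Entry("Hello", "http://example.com/archives/1", "First post",
                   "<p>Fish & chips</p>", datetime(2003, 5, 5, 10, tzinfo=timezone.utc))]
    print(build_rss("Example", "http://example.com/", "A weblog", posts, include_body=True))
```

Swapping the output format would mean swapping only the serializing function; the stored entries and the user’s answers stay the same, which is the appeal of keeping the data itself in a neutral structure.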
Why is everyone stuck on using RSS/RDF?
RSS seems to be doing what it was never designed to do. It’s been adapted, adopted/co-opted, hacked at, and extended, all the while having to remain an ultra-liberal framework. It was ideally suited to its original purpose: syndication in the form of headlines and brief descriptions of the latest topics from a website. Weblogs have pretty much blown this concept out of the water, and it’s been a scramble to bring RSS up to the task. RSS now has the unenviable task of finding ways to completely repurpose content in all its varying degrees. My question is, why all the focus on RSS?
“Blogging” is so all-encompassing in scope that it’s hard to define what exactly it is, let alone create a framework that breaks it down into bite-sized chunks of data. For the most part, the real meat and potatoes of the blog, the entry, is stuffed either inside the description element (which seems to me so semantically incorrect it’s funny) or inside content:encoded, where it’s treated as the bastard child of the document, since no one really seems to know what else to do with all this mixed content. I keep wondering why no one seems to be tackling the job of transforming it when and where necessary. After all, we’re after metadata here, aren’t we? I would have thought something along the lines of TEI would be ideal as the basis for encoding with an eye towards syndication, since blogs, if they are anything, are narratives. It certainly seems more appropriate to me to think about properly encoding the data first and syndicating it second. The current arguments seem to be rallying around RSS because it’s this great syndication format that everyone is already using, but am I wrong when I say it’s just a format? I could syndicate my ass if it were an application of XML that an aggregator understood. All it takes is agreeing on a common format (convention) for communication. [ed. note: Speaking of aggregators, the HPANA confounded us when we went browsing for aggregators. We’re happy they took the time to point out that IE5.5 has flaws, but we feel that time might have been better spent perfecting the site’s CSS.]
Metadata, Auto-Discovery and Aggregators
There’s some interesting stuff going on at the WMDI, and I’ve been thinking that marking up a weblog with metadata like this opens up an interesting possibility: aggregators could auto-discover feeds and subscribe to them while searching. I don’t see the point in encoding more data than an application needs to discover where it is and where to go to get the data. Search engines could be directed to content in the same manner: instead of scraping entire sites, they could be pointed to XML feeds via metadata, where they could soak up raw data without extraneous formatting. Similarly, the content of Flash sites could be indexed and accessed as an XML feed by applications that don’t support Flash.
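A widely used convention for this kind of discovery is a `<link rel="alternate">` element in the page head whose `type` names the feed format. The sketch below does not use the WMDI’s vocabulary; it assumes that `<link>` convention with the MIME types `application/rss+xml` and `application/rdf+xml`, and the names `FeedLinkFinder` and `discover_feeds` are hypothetical. An aggregator or search engine only has to read the head of the page to learn where the raw data lives.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# MIME types this sketch treats as feeds (RSS 0.9x/2.0 and RSS 1.0/RDF).
FEED_TYPES = {"application/rss+xml", "application/rdf+xml"}

class FeedLinkFinder(HTMLParser):
    """Collects feed URLs advertised by <link rel="alternate" type="..."> elements."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.feeds: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)  # valueless attributes map to None
        rels = (attributes.get("rel") or "").lower().split()
        mime = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rels and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the page's own URL.
            self.feeds.append(urljoin(self.base_url, href))

def discover_feeds(page_html: str, base_url: str) -> list[str]:
    finder = FeedLinkFinder(base_url)
    finder.feed(page_html)
    finder.close()
    return finder.feeds

if __name__ == "__main__":
    page = ('<html><head><title>Example</title>'
            '<link rel="stylesheet" href="/style.css">'
            '<link rel="alternate" type="application/rss+xml" title="RSS" href="/index.xml" />'
            '</head><body></body></html>')
    print(discover_feeds(page, "http://example.com/blog/"))  # ['http://example.com/index.xml']
```

Using a lenient HTML parser here is deliberate: the page itself may not be well-formed, but the one element the application cares about can still be found.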
Worlds within worlds, I tell ya. Then again, I’m sure the people who dreamt this stuff up see it much more clearly than I do.
Related Reading:
XHTML For Syndication
- XHTML Syndication Module – Joe Gregorio
- Syndication is not publication – Mark Pilgrim
- XHTML vs. the World – Tantek Çelik
- More on XHTML syndication – Tantek Çelik
- XSLT RDF Extraction Form – Sean B. Palmer
- RSS: XHTML Profile – Aaron Swartz
RSS
- RSS 0.91 – Dave Winer
- RSS 0.92 – Dave Winer
- RSS 2.0 – Dave Winer
- RDF Site Summary (RSS) 1.0 Spec
- RSS DevCenter
- Raising the Bar on RSS Feed Quality – Timothy Appnel
- RDF Site Summary – Ian Davis
RDF
- Resource Description Framework (RDF)
- RDF and Metadata – Tim Bray
- RDF in HTML: Approaches – Sean B. Palmer
- Resource Description Framework (RDF) Resource Guide – Dave Beckett
Metadata
Footnotes
1. Site Summaries in XHTML/HyperRDF – an idea first introduced by Dan Connolly on the RDF Interest Group mailing list, Tue, 21 Mar 2000
2. The Next Logical Step for RSS – Timothy Appnel
