a question of semantics

While the overall discussion is populated by far more knowledgeable persons than myself, I’m going to poke my nose into it anyway.

The discussion revolves around issues with current syndication formats (see also: content distribution services), namely RSS and RDF.

Since I can’t say that I fully grokked all the nonsense about what brought everyone to this juncture, my first response was, “WTF?!”. This was swiftly followed by, “There’s no better time to find out.” I have since spent the last few days wading through books, specs, and expert opinions on the subject. While data input does not equate to data absorption, and certainly doesn’t elevate me to expert status, I did stay at a Holiday Inn last night.

My thoughts are as follows:

Anil makes a nice point but…

…it’s ultimately unworkable. While it certainly is possible … Mark Pilgrim put it so tactfully, “…can’t even spell XHTML.” The average blogger’s efforts, and this is not meant to be demeaning, are concentrated elsewhere. (X)HTML authoring is not his/her primary concern. In most cases, the bulk of the authoring has been transferred to popular blogging tools, which also currently author and generate these syndication formats without asking the blogger to lift a finger. To instead ask the average blogger to accept and shoulder this responsibility (and it would require this), as well as confine him/herself to a given format, is, in my opinion, an unrealistic expectation. You think your feeds are broken now? Wait until you start scraping all those non-validating XHTML pages.
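To make that last jab concrete, here is a minimal sketch of what a scraper runs into. It only checks well-formedness (a weaker condition than validating against the XHTML DTD), and the two page fragments and the `is_well_formed` helper are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(markup: str) -> bool:
    """Return True if the markup parses as XML, False otherwise."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


# Hypothetical page fragments, invented for this example.
tidy_page = "<html><body><p>Hello<br/>world</p></body></html>"
sloppy_page = "<html><body><p>Hello<br>world</body></html>"

print(is_well_formed(tidy_page))    # the tidy page parses
print(is_well_formed(sloppy_page))  # the unclosed <br> and <p> do not
```

A strict XML parser gives up at the first unclosed tag, so an XHTML-as-feed scheme is only as reliable as the sloppiest page it has to read.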

The tools themselves should be taking on more responsibility.

CMSs/Blogging tools are evolving daily, so why aren’t people looking to them to properly generate their syndication formats? Why should the average blogger have to worry about whether or not his/her blogging tool of choice is generating a proper XML/RSS/RDF document? A better question than some of the ones being asked is this: if we can scrape an XHTML page and transform it into RSS/RDF, why aren’t the tools scraping the content prior to output and turning any (X)HTML in an entry into valid XML/RSS/RDF? The only requirement put upon them at the moment is to meet structural validity. It is up to the user to make sure his/her output is well-formed enough to be useful and either stripped of (X)HTML elements or properly marked as CDATA.

TEI would be ideal as the basis for encoding with an eye towards syndication, since blogs, if they are anything, are a narrative. It certainly seems more appropriate to me to think about properly encoding the data first, and then syndicating it second. The current arguments seem to be rallying around RSS because it’s this great syndication format that everyone is already using, but am I wrong when I say it’s just a format? I could syndicate my ass if it was an application of XML that was understood by an aggregator. All it takes is agreeing on a common format (convention) for communication. [ed. note: Speaking of aggregators, the HPANA confounded us when we went browsing for aggregators. I’m happy they took the time to point out that IE5.5 has flaws, but we feel that time might have been better spent perfecting the site’s CSS.]
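As for the tool doing the cleanup before output: a rough sketch of the idea, assuming the tool builds each feed item with an XML library instead of pasting strings together. I’m using entity escaping here as a stand-in for a CDATA section, since both keep stray (X)HTML from breaking the feed. The `build_item` helper, the sample entry, and the example.com link are all hypothetical.

```python
import xml.etree.ElementTree as ET


def build_item(title: str, link: str, entry_html: str) -> str:
    """Build one RSS <item>, letting the serializer escape the entry markup."""
    item = ET.Element("item")
    ET.SubElement(item, "title").text = title
    ET.SubElement(item, "link").text = link
    # The entry's markup is stored as plain text, so <, > and & are
    # escaped on output and cannot make the feed ill-formed.
    ET.SubElement(item, "description").text = entry_html
    return ET.tostring(item, encoding="unicode")


# A made-up entry containing exactly the kind of tag soup a blogger might type.
entry = "<p>Tag soup & an unclosed <br> tag"
xml_item = build_item("a question of semantics", "http://example.com/entry", entry)
print(xml_item)

# The item parses cleanly, and an aggregator gets the original markup back.
round_trip = ET.fromstring(xml_item).find("description").text
print(round_trip == entry)
```

The blogger never sees any of this, which is the point: the entry can be as sloppy as it likes and the feed still parses.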

Metadata, Auto-Discovery and Aggregators

There’s some interesting stuff going on at the WMDI, and I’ve been thinking that marking up your weblog with metadata like this could let aggregators auto-discover feeds and subscribe while searching. I don’t see the point in encoding more data than an application would need to discover where it is and where it needs to go to get data. Search engines could be directed to content in the same manner. Instead of scraping entire sites, they could be pointed to XML feeds via metadata where they could soak in raw data without extraneous formatting. Similarly, the content of Flash sites could be indexed and accessed as an XML feed by non-Flash-enabled applications.
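Here is roughly what auto-discovery could look like from the aggregator’s side. This is a sketch under one big assumption: that the metadata is a `<link rel="alternate">` element in the page head, which is one common convention and not necessarily what the WMDI has in mind. The `FeedLinkFinder` class, the sample page, and its URLs are invented for the example.

```python
from html.parser import HTMLParser

# Assumed MIME types for feeds; adjust to whatever convention gets agreed on.
FEED_TYPES = {"application/rss+xml", "application/rdf+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect href values of <link rel="alternate"> elements that point at feeds."""

    def __init__(self):
        super().__init__()
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower().split()
        mime = (attr.get("type") or "").lower()
        if "alternate" in rel and mime in FEED_TYPES and attr.get("href"):
            self.feeds.append(attr["href"])


def discover_feeds(html: str) -> list:
    """Return the feed URLs advertised in a page's metadata."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds


# A made-up weblog front page.
page = """<html><head>
<title>Example weblog</title>
<link rel="stylesheet" type="text/css" href="/style.css">
<link rel="alternate" type="application/rss+xml" title="RSS" href="http://example.com/index.xml">
</head><body><p>Hello</p></body></html>"""

print(discover_feeds(page))
```

Note that the finder needs only one small element from the page and ignores the rest, which is about as much data as an application should have to read to find out where the real data lives.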

Worlds within worlds I tell ya’. Then again, I’m sure the people that dreamt this stuff up see it much more clearly than I do.

———-

Related Reading:

XHTML For Syndication

RSS

RDF

Metadata

Footnotes

  1. Site Summaries in XHTML/HyperRDF – an idea first introduced by Dan Connolly on the RDF Interest group mailing list, Tue, 21 Mar 2000
  2. The Next Logical Step for RSS – Timothy Appnel
