So I have Prrarf publishing entries for Blosxom as .txt files, but because the content model is more complex than title\nbody, I'm using an item.html that Prrarf uses to Render Files after it Reads RSS. Blosxom's story.html is, in its entirety:
$body
Because I wanted to have some conditional interpolation in the item.html, and I'm a programmer, I ended up going with an $if_foo syntax, which is ugly but works. So I have code like:
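import re

def ifReplace(m, d):
    # Keep the $if_ branch when the item has the named key.
    if m.group(1) in d:
        return m.group(2)
    # Otherwise fall back to the optional $else_ branch, if there is one.
    if m.group(3) is not None:
        return m.group(3)
    return ''

# (?s) lets .*? span lines; the $ signs are escaped so they match literally.
itemTemplate = re.sub(r"(?s)\$if_(\w+)(.*?)(?:\$else_\1(.*?))?\$endif_\1", lambda m: ifReplace(m, item), itemTemplate)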
So this works:
$channel_titlelink $time
$titlelink
$if_comments
(C)
$endif_comments
$body
(The first line is blank because that's where Blosxom will look for the story title.) But what if, instead, I went line by line? What if I did away with the $if and $endif lines and simply watched for an exception while interpolating each line? If there's no i['comments'] for item i, then I'd get an exception from referencing $comments and would drop that whole line from the output. (I guarantee that $titlelink, $time, and $channel_titlelink (which actually translates to r.channel['titlelink']) exist, for various reasons.)
That's a little difficult, though. I can't have an i['good'] variable then; I'd need an i['bad'] variable so I could write:
and have the class attribute disappear for good items. Note also that I can't really leave that div out altogether, because I can't optionalize the closing </div> by putting it on the same line. So altogether it doesn't seem like a wonderful solution, the on-the-same-line thing. Interesting thought experiment though.
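For what it's worth, the line-by-line idea could be sketched roughly like this. This is only a sketch under assumptions: it uses the standard library's string.Template for the $name interpolation rather than whatever Prrarf does internally, and render_lines, template_text, and the sample item are made-up names for illustration.

```python
from string import Template

def render_lines(template_text, item):
    """Interpolate each line on its own; drop any line that names a missing key."""
    kept = []
    for line in template_text.split("\n"):
        try:
            # substitute() raises KeyError when the line references a key
            # that the item does not have.
            kept.append(Template(line).substitute(item))
        except KeyError:
            # Missing variable: leave the whole line out of the output.
            continue
    return "\n".join(kept)

# Hypothetical template: blank first line for Blosxom's title, and the
# comments marker on a single line with no $if/$endif around it.
template_text = "\n$channel_titlelink $time\n$titlelink\n(C) $comments\n$body"

# Hypothetical item with no 'comments' key, so the "(C)" line disappears.
item = {
    "channel_titlelink": "Some Channel",
    "time": "16:48:46",
    "titlelink": "Some Title",
    "body": "Story text.",
}

print(render_lines(template_text, item))
```

The same limitation shows up here as in the prose: anything optional has to fit on one line, so an opening tag and its closing tag can't be dropped together unless they share a line.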
Here's my new to-do list. For once I've polished off all the items in the previous list (well, except the Unicode thing).
- Fix present-but-null titles. These break Radio, but I can treat them like missing titles (though it'd be nice if rssparser handled that instead, hmm). I already have code in but I haven't used it yet.
- Check for other date formats to parse. The vast majority of items in the last scan are marked 16:48:46 because that's the previous-scan time it used for dateless items. Thusly most of the items scanned are suddenly not only in random order by channel (like Radio scanned them) but pseudorandom order period; if I understand correctly, they're ordered by perl's whim when it sorts the files by mtime. In fact, I just now realized that shouldn't even be a constant order, because perl's sort needs a special option to sort consistently. Ouch. I railed about dates already; the most obnoxious format I've seen is Bruce Eckel's, with dates like 5-24-03.
- Unicode quote bug. It shows up in Tim Bray's ongoing and in Aaron Swartz's weblog.
- Channel name overrides? I'd like to see "David Chess: Log" instead of "Log" and "Chris Pirillo" instead of "Search."
- First-resort XML-based parser
- Cleanse items' HTML
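
For the date-format item, one plausible shape is a list of candidate formats tried in order, falling back to the previous-scan time only when none of them match. This is a rough sketch, not what Prrarf actually does: parse_item_date and CANDIDATE_FORMATS are hypothetical names, the list of formats is an assumption, and reading 5-24-03 as month-day-year is a guess based on that one example.

```python
from datetime import datetime

# Hypothetical list of formats to try, in order. Only the last one
# corresponds to an example mentioned in the to-do list above.
CANDIDATE_FORMATS = [
    "%Y-%m-%dT%H:%M:%S",  # ISO 8601 style, e.g. 2003-05-24T16:48:46
    "%Y-%m-%d",           # date only, e.g. 2003-05-24
    "%m-%d-%y",           # assumed month-day-year, e.g. 5-24-03
]

def parse_item_date(text, fallback=None):
    """Return a datetime for the first format that matches, else the fallback."""
    cleaned = text.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            # This format did not match; try the next one.
            continue
    # Nothing matched, so let the caller supply something like the
    # previous-scan time.
    return fallback

print(parse_item_date("5-24-03"))
```

Passing the previous-scan time as fallback keeps the current behavior for truly dateless items while letting the odd formats sort properly.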