Planet Perl feeds not validating

It was brought to our attention that the Planet Perl RSS feeds don't validate.

Neither feed encodes my name from the configuration file correctly, and the RSS 2.0 feed apparently uses an invalid date format.

This is where you, dear reader, come in. I know you are already bouncing in your chair with excitement at the thought of fixing the Python code.

The PlanetPlanet site is empty, so I'm not quite sure where to send patches, but when they are made (nudge nudge, dear reader) we can try sending them to Thom and then he can send them to the Planet Debian maintainer and hopefully they'll make it to the other users...

10 Comments

it's a little surprising to me how little of the weblog-related software out there handles character encoding correctly. even blogger.com has problems with utf-8 characters in blog names (or did last i checked).

the blo.gs database is polluted with over-encoded and incorrectly encoded data because of the sloppy handling of character encodings by blogger.com and www.weblogs.com, and how that has propagated out to the various weblog tools.

the entry for your blog is one of the offenders.

(one other thing i blame is the poor encoding handling of php's xml parser before php 5.)

The date error could be handled by changing your config.ini entry for DATE_FORMAT. It defaults to "%B %d, %Y %I:%M %p" and if you change it to "%a, %d %b %Y %H:%M %Z" (which generates a date like "Wed, 02 Oct 2002 13:00 GMT") that may solve the problem.

I just realized that changing DATE_FORMAT may not give the results you need, as the template parsing code doesn't seem to honor it (I'm going to set up my own test planet to double-check). But I noticed that there seem to be three date-related template variables for each feed item: date, date_iso and date_822, so you could try specifying date_822 at line 18 of the rss20.xml.tmpl file. It currently reads "<pubDate><TMPL_VAR date> +0000</pubDate>" and maybe it should read "<pubDate><TMPL_VAR date_822></pubDate>".
Hope this helps.
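
For reference, here is a small sketch of how the two DATE_FORMAT strings quoted above render through Python's strftime. It is not Planet's own code: the GMT timezone object, the variable names and the sample instant (taken from the example date in the earlier comment) are assumptions for illustration only.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical zero-offset timezone named "GMT" so that %Z prints "GMT".
GMT = timezone(timedelta(0), "GMT")

# Sample instant taken from the example date quoted in the comment.
sample = datetime(2002, 10, 2, 13, 0, tzinfo=GMT)

DEFAULT_FORMAT = "%B %d, %Y %I:%M %p"      # the default mentioned above
PROPOSED_FORMAT = "%a, %d %b %Y %H:%M %Z"  # the suggested replacement

default_text = sample.strftime(DEFAULT_FORMAT)
proposed_text = sample.strftime(PROPOSED_FORMAT)

print(default_text)   # long month name, 12-hour clock, no timezone
print(proposed_text)  # weekday, day, short month, 24-hour clock, zone
```

The default string has no weekday and no timezone, which is why pairing it with a literal " +0000" in the template does not produce an RFC 822 style date.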

I suspect if you change your name in config.ini to 'Bj&oslash;rn' and regenerate the files it'll work fine.

Jim, yes, I had noticed that too. I'm just a user of my software. ;-) I have considered that I should figure out what's wrong though. I'm not entirely convinced that it is MT's encoding that is wrong.

Kenny, I tried that; it just makes the validator say "XML Parsing error: :17:14: undefined entity".

Mike, date_822 did the trick! I can't believe I missed that. Thanks. :-)


- ask

&oslash; is not one of the few predefined entities in XML 1.0 (&amp;, &lt;, &gt;, &quot; and &apos;), so you would need to find the numeric character reference (the &#NNN; value) for that character.

Glad I could help with the date formatting routine - I was reading the planet code earlier and your blog post just triggered a memory :)

bear aka mike

try &#x00f8;

it should look like: Bjørn
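
A minimal sketch of the point made in the last few comments, using only Python's standard library: a named HTML entity such as &oslash; is undefined in plain XML, while a numeric character reference parses fine. The helper name parse_text and the <name> element are made up for this illustration and are not part of Planet.

```python
import xml.etree.ElementTree as ET

name = "Bj\u00f8rn"  # "Bjørn", written with an escape to keep this file ASCII

# Decimal numeric character reference, produced by a built-in error handler.
decimal_ref = name.encode("ascii", "xmlcharrefreplace").decode("ascii")

# Hexadecimal form, matching the &#x00f8; suggested above.
hex_ref = "".join(
    ch if ord(ch) < 128 else "&#x{:04x};".format(ord(ch)) for ch in name
)


def parse_text(fragment):
    """Return the element text, or None if the XML parser rejects it."""
    try:
        return ET.fromstring(fragment).text
    except ET.ParseError:
        return None


# &oslash; is an HTML entity, not an XML one, so this is rejected.
named_result = parse_text("<name>Bj&oslash;rn</name>")

# Numeric references need no DTD and decode back to the original character.
hex_result = parse_text("<name>" + hex_ref + "</name>")

print(decimal_ref, hex_ref, named_result, hex_result)
```

Either numeric form is acceptable to an XML parser; the decimal and hexadecimal references name the same code point.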

There are indeed three date formats you can put in the templates: 'date' is whatever 'date_format' is in the config file, 'date_iso' is ISO date format and 'date_822' is RFC822 date format.

It looks like your code is a very old version of Planet; there have been about 30 patches since then, one of which fixes the various encoding issues.
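
As a rough illustration of what the ISO and RFC 822 styles look like, here is a sketch using Python's standard library. It is an assumption about the general shape of those formats, not a copy of how Planet fills in date_iso or date_822; the variable names simply mirror the template variables, and the sample instant is the one used in the earlier example.

```python
from datetime import datetime, timezone
from email.utils import format_datetime

# Sample instant (assumed to be UTC for the purpose of this sketch).
published = datetime(2002, 10, 2, 13, 0, tzinfo=timezone.utc)

# ISO 8601 style timestamp.
date_iso = published.isoformat()

# RFC 822 / RFC 2822 style timestamp, the kind RSS 2.0 expects in <pubDate>.
date_822 = format_datetime(published, usegmt=True)

print(date_iso)
print(date_822)
```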

You can currently grab the latest source through tla from:

scott@netsplit.com--projects http://www.netsplit.com/arch

planet--devel--0.0


Jeff's just caught up again I think, but he's lagged for quite a while.

There's a new code drop at:
http://www.planetplanet.org/planet-devel.tar.bz2 that might help; it certainly has the date_822 problem solved and has had UTF-8 correctness work done.
-Thom

Hi, Ask! I found that your name contains a non-ASCII character, which often causes trouble. See the "planet" module @ apache.org (config.ini). I replaced the one non-ASCII character in your name with & o s l a s h ;. Perfect. Hope this helps. Cheers!

About this Entry

This page contains a single entry by Ask Bjørn Hansen published on February 8, 2004 12:16 AM.