[FE-discuss] formencode.api.Invalid class lack __unicode__ method

Marcin Stępnicki

2008-10-29 11:12:48 UTC

I've noticed that error messages which are unicode objects give the
following error when used in htmlfill:

UnicodeEncodeError: 'ascii' codec can't encode character u'\u0119' in
position 5: ordinal not in range(128)

In rewritingparser.html_quote there is a fragment which checks whether the
given object has a __unicode__ attribute:

    (...)
    else:
        if hasattr(v, '__unicode__'):
            v = unicode(v)

but objects of the 'formencode.api.Invalid' class don't have it, so the
following branch:

    (...)
    else:
        v = str(v)

fails with the UnicodeEncodeError shown above.
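
The failure can be reproduced outside htmlfill. The sketch below is only an illustration, not FormEncode code: the message text and the helper name `encode_like_py2_str` are made up, and the helper mimics the implicit ASCII encoding that Python 2 applies when `str()` is handed a unicode object.

```python
# Hypothetical Polish message; u"\u0119" sits at index 5, as in the traceback.
msg = u"Prosz\u0119 poda\u0107 warto\u015b\u0107"


def encode_like_py2_str(text):
    # Assumption: this stands in for Python 2's str(unicode_obj), which
    # encodes with the default codec (ASCII unless reconfigured).
    return text.encode("ascii")


failure = None
try:
    encode_like_py2_str(msg)
except UnicodeEncodeError as exc:
    failure = exc

# Shows the codec name and the offending position.
print(failure)
```

The exception reports the `ascii` codec and position 5, which matches the error quoted at the top of this message.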

Adding:

    def __unicode__(self):
        val = self.msg
        return val

to formencode.api.Invalid fixes the problem for me.

Regards,
Marcin

Ian Bicking

2008-10-29 16:32:53 UTC

Added in r3638 (I think if you use e.unpack_errors() it shouldn't have
this problem, though)

r***@o2.pl

2008-12-18 18:00:05 UTC

Sorry for the late response.

This is a rediscovery of a problem that I mentioned in my post from
2008-10-07. Unfortunately, the proposed solution is, in my opinion, not
sufficient. As I highlighted in the earlier post, there is no guarantee
that ``self.msg`` is of ``unicode`` type, so just returning it from
``__unicode__()`` is not enough.

A minimal implementation may look like::

    def __unicode__(self):
        val = self.msg
        if isinstance(val, unicode):
            return val
        else:
            return unicode(val)

This can still be a source of trouble: if ``self.msg`` is a byte string,
we have no information about its encoding. If it is plain ASCII, the
code above will work, but in general, and for 'utf-8' in particular,
``unicode(val)`` falls back to the default ASCII codec and a
``UnicodeDecodeError`` will be raised instead.
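
A rough illustration of the encoding ambiguity, assuming a hypothetical message and using explicit `decode()` calls in place of Python 2's implicit conversion: the same bytes either fail to decode or decode to the wrong text, depending on which codec is guessed.

```python
# Hypothetical message encoded as UTF-8 bytes, as a caller might pass it.
original = u"Prosz\u0119"
utf8_bytes = original.encode("utf-8")


def try_decode(raw, encoding):
    """Return the decoded text, or None if the bytes do not fit the codec."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


as_ascii = try_decode(utf8_bytes, "ascii")     # default codec: cannot decode
as_latin1 = try_decode(utf8_bytes, "latin-1")  # decodes, but to the wrong text
as_utf8 = try_decode(utf8_bytes, "utf-8")      # the codec the bytes were made with

print(as_ascii is None, as_latin1 == original, as_utf8 == original)
```

Only the caller knows which codec produced the bytes, which is why a guess made inside the exception class can be wrong.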

It can be solved in the way I described in my earlier post, or by requiring
that the ``msg`` argument of ``formencode.api.Invalid`` be of ``unicode``
type and checking/converting it in ``Invalid.__init__()``. In the latter
case, if an encoding problem occurs, it will be easy for the user to find
and fix just by converting the ``msg`` argument to unicode with the proper
encoding.
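
A minimal sketch of the second option, converting in the constructor. `InvalidSketch` and its `encoding` parameter are hypothetical stand-ins for illustration; this is not the real `formencode.api.Invalid` signature.

```python
text_type = type(u"")  # unicode on Python 2, str on Python 3


class InvalidSketch(Exception):
    """Hypothetical stand-in: normalises msg to text at construction time."""

    def __init__(self, msg, encoding="utf-8"):
        if isinstance(msg, bytes):
            # A wrong encoding fails here, at the call site that built the
            # message, rather than later during rendering.
            msg = msg.decode(encoding)
        elif not isinstance(msg, text_type):
            msg = text_type(msg)
        Exception.__init__(self, msg)
        self.msg = msg

    def __unicode__(self):
        # Only consulted on Python 2; msg is already text by this point.
        return self.msg


err = InvalidSketch(u"Prosz\u0119".encode("utf-8"))
print(err.msg == u"Prosz\u0119")
```

With this design, a decoding problem surfaces where the exception is raised, so the caller can fix it by passing text or naming the right encoding.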

Regards, Krzysiek

Ian Bicking

2008-12-18 21:12:22 UTC

OK, committed in r3738. I used a default encoding of utf8 for str being
turned into unicode... I don't think it should come up, but I think
it'll be a little less likely to cause regressions. Perhaps. And I
like utf8 enough to let that bias my software ;)
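
The idea of a UTF-8 default for byte strings can be sketched as follows. This is not the code committed in r3738; the helper name `message_as_text` and its signature are assumptions made for illustration only.

```python
text_type = type(u"")  # unicode on Python 2, str on Python 3


def message_as_text(msg, encoding="utf-8"):
    """Hypothetical helper: return msg as text, decoding bytes as UTF-8."""
    if isinstance(msg, text_type):
        return msg
    if isinstance(msg, bytes):
        # Assumed default: byte strings are treated as UTF-8.
        return msg.decode(encoding)
    # Anything else (numbers, other objects) goes through the text constructor.
    return text_type(msg)


print(message_as_text(u"Prosz\u0119".encode("utf-8")) == u"Prosz\u0119")
```

Bytes in another encoding would still raise `UnicodeDecodeError` under this default, so passing text remains the safest choice for callers.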

--
Ian Bicking : ***@colorstudy.com : http://blog.ianbicking.org