Tuesday, February 15, 2011

JAXB, XJC and unmappable chars

Say we have an XSD that contains documentation with special characters:

<xs:annotation>
<xs:documentation>Tady je použita čeština se vším všudy</xs:documentation>
</xs:annotation>

Once you set up a Maven build with the "jaxb2-maven-plugin" and its "xjc" goal, and let JAXB generate Java classes from your schema, you may end up with the following error:

unmappable character for encoding UTF-8


Well, the schema is already in UTF-8 and the Maven build is set up to use UTF-8, so why does it say that some characters cannot be mapped to UTF-8?
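
One plausible reading of the message (my assumption, not something the error itself states) is that the generated files were written in a different encoding than the one used to read them back. The sketch below illustrates that kind of mismatch with plain JDK classes. ISO-8859-1 stands in for "some non-UTF-8 platform encoding" only because every JVM supports it, and the class and method names are made up for the example.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.StandardCharsets;

public class EncodingMismatchDemo {

    // Returns true if the bytes form valid UTF-8. A fresh decoder reports
    // malformed or unmappable input by throwing instead of replacing it.
    static boolean decodesAsUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder().decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // The documentation text from the schema, written with Unicode escapes
        // so that this file compiles regardless of its own encoding.
        String doc = "Tady je pou\u017Eita \u010De\u0161tina se v\u0161\u00EDm v\u0161udy";

        byte[] writtenAsUtf8 = doc.getBytes(StandardCharsets.UTF_8);
        byte[] writtenAsLatin1 = doc.getBytes(StandardCharsets.ISO_8859_1);

        System.out.println("UTF-8 bytes read as UTF-8:      " + decodesAsUtf8(writtenAsUtf8));
        System.out.println("ISO-8859-1 bytes read as UTF-8: " + decodesAsUtf8(writtenAsLatin1));
    }
}
```

The second check fails because the single byte that ISO-8859-1 produces for "í" is not a valid UTF-8 sequence on its own.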

I found the solution described here.

Reason

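com.sun.codemodel.writer.CodeWriter#openSource creates the OutputStreamWriter without an explicit encoding, so the writer falls back to the platform default, and then uses the writer's encoding to set up the CharsetEncoder: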

{
    OutputStreamWriter bw = new OutputStreamWriter(openBinary(pkg,fileName));
    (...)
    CharsetEncoder encoder = EncoderFactory.createEncoder(bw.getEncoding());
}

It should instead build a CharsetEncoder from the user-requested encoding (possibly falling back to the default platform encoding) and then create the OutputStreamWriter with this encoder:

{
    CharsetEncoder encoder = EncoderFactory.createEncoder( getUserDefinedEncoding() );
    OutputStreamWriter bw = new OutputStreamWriter(openBinary(pkg,fileName), encoder);
}
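
The proposed order (encoder first, writer second) can be tried out in isolation. This is a minimal sketch using only JDK classes, not the actual codemodel code: openSource and write here are hypothetical helpers, the in-memory stream replaces openBinary(pkg,fileName), and the null check stands in for "fall back to the platform default".

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public class ExplicitEncodingWriter {

    // Hypothetical helper: build the encoder from the requested charset first,
    // then hand it to the writer, so both always agree on the encoding.
    static Writer openSource(OutputStream out, Charset requested) {
        Charset charset = (requested != null) ? requested : Charset.defaultCharset();
        CharsetEncoder encoder = charset.newEncoder();
        return new OutputStreamWriter(out, encoder);
    }

    // Writes the text through openSource and returns the resulting bytes.
    static byte[] write(String text, Charset requested) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (Writer writer = openSource(buffer, requested)) {
            writer.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        // A generated Javadoc comment containing Czech characters.
        String comment = "/** pou\u017Eita \u010De\u0161tina */";
        byte[] bytes = write(comment, StandardCharsets.UTF_8);

        // Reading the bytes back as UTF-8 restores the original text.
        System.out.println(new String(bytes, StandardCharsets.UTF_8).equals(comment)); // prints "true"
    }
}
```

With the charset passed in explicitly, the output no longer depends on which platform runs the build.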


Solution

You have to set up the build to use the native encoding of your OS, so that the generated sources are read back in the same encoding they were written in.

I've added a profile (for Mac):

<profile>
    <activation>
        <os>
            <family>mac</family>
        </os>
    </activation>

    <properties>
        <project.build.sourceEncoding>MacRoman</project.build.sourceEncoding>
    </properties>
</profile>
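
To find out which encoding a given machine treats as native, the JVM can be asked directly. This is a small sketch with a made-up class name. Note that the reported value depends on the JDK version and launch flags (newer JDKs default to UTF-8 regardless of the OS), so treat the output as a hint for the profile value and not as a definitive answer.

```java
import java.nio.charset.Charset;

public class ShowPlatformEncoding {

    // Name of the charset the JVM uses when no encoding is given explicitly.
    static String defaultCharsetName() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + defaultCharsetName());
        // The system property the default is traditionally derived from.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```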
