Discussion:
ESQL and utf-8 encoding
Lopke, Michael
2005-01-21 17:19:56 UTC
Hi,

Has anyone here used ESQL with UTF-8 encoded data? I can connect to my database and retrieve the correct data, but somewhere along the way the data appears to be interpreted as ISO-8859-1. I'm not sure I have all of the configuration correct.

For example, the Chinese character
電

shows up as this:
é›»å


In my sitemap.xmap I have the following:

<map:generators default="file">
  <map:generator label="content,data" logger="sitemap.generator.file" name="file" pool-grow="4" pool-max="32" pool-min="8" src="org.apache.cocoon.generation.FileGenerator"/>
  <map:generator label="content,data" logger="sitemap.generator.serverpages" name="xsp" pool-grow="2" pool-max="32" pool-min="4" src="org.apache.cocoon.generation.ServerPagesGenerator"/>
</map:generators>



<map:serializers default="html">
  ...
  <map:serializer name="xml"
                  src="org.apache.cocoon.serialization.XMLSerializer"
                  mime-type="text/xml; charset=utf-8">
    <encoding>UTF-8</encoding>
  </map:serializer>
</map:serializers>


.
<!-- the XSP pages -->
<map:match pattern="*.xml">
  <map:generate type="xsp" src="xsp/{1}.xsp"/>
  <map:serialize type="xml"/>
</map:match>

The snippet in my XSP file looks like this:
...
<esql:results>
  <esql:row-results>
    <data>
      <esql:get-string column="display">
        <esql:encoding>UTF-8</esql:encoding>
      </esql:get-string>
    </data>
  </esql:row-results>
</esql:results>



It looks like the generator is interpreting the data as ISO-8859-1 and passing it through the pipeline as such. If I put the same data into an XML file as my source but change the encoding declaration at the top to ISO-8859-1, I can reproduce the problem.
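
The symptom described above can be reproduced outside Cocoon. The following is a minimal sketch, not code from the poster's setup: the class name `MojibakeDemo` and the method `misdecode` are hypothetical. It encodes 電 (U+96FB) as UTF-8 and then decodes those three bytes with a single-byte charset. It prints code points instead of glyphs so the console encoding cannot distort the result.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Cocoon or the ESQL logicsheet.
class MojibakeDemo {

    /** Encodes the text as UTF-8, then decodes those bytes with the given (wrong) charset. */
    static String misdecode(String text, Charset wrong) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, wrong);
    }

    /** Prints each UTF-16 code unit of the string as U+XXXX. */
    static void printCodePoints(String label, String s) {
        StringBuilder sb = new StringBuilder(label).append(':');
        for (char c : s.toCharArray()) {
            sb.append(String.format(" U+%04X", (int) c));
        }
        System.out.println(sb);
    }

    public static void main(String[] args) {
        String original = "\u96FB"; // the character from the example

        // ISO-8859-1 maps byte 0x9B to the control character U+009B.
        printCodePoints("ISO-8859-1", misdecode(original, StandardCharsets.ISO_8859_1));

        // windows-1252 maps byte 0x9B to U+203A, which is what a browser usually displays.
        printCodePoints("windows-1252", misdecode(original, Charset.forName("windows-1252")));
    }
}
```

One character turning into three is the typical sign that UTF-8 bytes were decoded one byte at a time somewhere between the database and the browser.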

Thanks,
Mike Lopke
Lopke, Michael
2005-01-25 01:18:12 UTC
Hi,

I'm following up on my previous post. I have done quite a bit of reading in the mail archives, which has been helpful, but I'm still stuck.

I placed some code in my original xsp file that looks like this:

<xsp:logic>
  String debug_thing = EsqlHelper.getStringFromByteArray(
      _esql_query.getResultSet().getBytes("display"),
      "" + "UTF-8",
      "");

  System.out.println("DEBUG " + debug_thing);
</xsp:logic>

This was nested inside the <esql:query>. What I'm finding is that the string printed out to the file is good UTF-8. My problem is that the output I get in my browser is still incorrect. Any ideas? I even modified the container encoding in the web.xml file and had no luck.
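
When a decoded string looks right in one place and wrong in another, inspecting the raw bytes can help narrow things down. The sketch below is an illustration under assumptions: the class `ByteDump` and its `toHex` method are hypothetical, and double encoding is only one possible way stored data can be damaged. It shows what the bytes of 電 look like when encoded once as UTF-8 and when text that was already mis-decoded as ISO-8859-1 is encoded as UTF-8 a second time.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Cocoon or the ESQL logicsheet.
class ByteDump {

    /** Formats bytes as space-separated uppercase hex pairs. */
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String good = "\u96FB"; // the character from the example

        // Correct storage: one character, three UTF-8 bytes.
        byte[] once = good.getBytes(StandardCharsets.UTF_8);

        // Simulated damage: the three bytes were read as ISO-8859-1 text
        // and that text was encoded as UTF-8 again.
        String misread = new String(once, StandardCharsets.ISO_8859_1);
        byte[] twice = misread.getBytes(StandardCharsets.UTF_8);

        System.out.println(toHex(once));  // E9 9B BB
        System.out.println(toHex(twice)); // C3 A9 C2 9B C2 BB
    }
}
```

The same `toHex` call could be applied to the result of `getBytes("display")` inside the `<xsp:logic>` block to see which of the two patterns the column actually holds.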

Thanks,
Mike Lopke




Aurélien DEHAY
2005-01-25 08:58:19 UTC
Hi.

Does your database support UTF-8 encoding? I know I had this problem
with postgres when I didn't create the database with the -E unicode parameter.

Rgds.
Lopke, Michael
2005-01-25 18:30:58 UTC
Hi,

Yes, the database supports UTF-8 encoding. It's Oracle 9i. I can use the same driver and database with a standard Java servlet and the data works. Also, I am able to print the data out to a file with some Java code I inserted into the XSP page using the <xsp:logic> tag.

I've deployed Cocoon using WebLogic, so I half wonder if this is causing the problems.

Thanks,
Mike

Martinson, Theresa
2005-01-25 18:59:54 UTC
We experienced a similar problem with the character encoding on the http request processed by Cocoon. We also were attempting to use UTF-8 encoding but found that the encoding would always default to ISO-8859-1. Looking at the request in a debugger, we found that the actual http request wrapped by the Cocoon HttpRequest did not have the character encoding properly set. We corrected this by modifying CocoonServlet to set the character encoding on the wrapped request to the form-encoding value specified in the web.xml. In order to set the character encoding, we needed to use the 2.3 version of the servlet jar. This solved our encoding problems for page display and for request parameter interpretation.

We understand that this is not directly on point with your problem, but perhaps it may provide a basis for thought.
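
To illustrate why the request character encoding matters for parameter interpretation, here is a minimal sketch. It is a stand-in only: it uses `java.net.URLDecoder` from the standard library instead of the servlet container's own decoding, and the class name `RequestParamDemo` and the sample parameter value are assumptions. In a servlet, the equivalent choice is made through `ServletRequest.setCharacterEncoding`, which was added in the Servlet 2.3 API.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; it does not use Cocoon or the servlet API.
class RequestParamDemo {

    /** Decodes a percent-encoded parameter value using the named charset. */
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // The UTF-8 bytes of U+96FB as a browser would send them in a form submission.
        String raw = "%E9%9B%BB";

        String asUtf8 = decode(raw, "UTF-8");
        String asLatin1 = decode(raw, "ISO-8859-1");

        // One character when decoded as UTF-8, three when decoded as ISO-8859-1.
        System.out.println("UTF-8 length: " + asUtf8.length());
        System.out.println("ISO-8859-1 length: " + asLatin1.length());
    }
}
```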

Good luck.

Theresa

Lopke, Michael
2005-01-27 23:49:22 UTC
Thanks for all the help. It appears my data was corrupted in the database.

Mike

