RapidMiner

Problem while uploading files (with special characters)

Contributor II

Hello there,

We are facing various problems with RapidAnalytics. I would like to describe them here in the hope that someone has experienced the same issues and knows how to solve them.

To begin with, I tried to upload a CSV file containing special characters (Swedish, which should be representable in UTF-8 as well as ISO-8859-1 and Windows-1252). In RapidMiner itself this works without problems.
When I copy and paste the imported file from the local repository to the RapidAnalytics repository (home directory), the upload appears to succeed at first. But when I try to open the file from the RapidAnalytics repository (for example by double-clicking it), I get:

"Message: Cannot parse I/O-Object: com.rapidminer.repository.RepositoryException: Cannot download IOObject: 500: Internal Server Error"

The server at this time states the following:
"13:52:14,438 INFO  [de.rapidanalytics.ejb.RepositoryStorageEJBImpl] admin submitted new object to /home/hoepken/data/OverallBookings_Ausschnitt1.
13:52:14,463 INFO  [de.rapidanalytics.ejb.RepositoryEJBImpl] Creating recursively: home/hoepken/data
13:52:14,467 INFO  [de.rapidanalytics.ejb.RepositoryEJBImpl] admin created entry '/home/hoepken/data/OverallBookings_Ausschnitt1 of type 'data'.
13:52:14,482 INFO  [com.rapidminer.example.db.ExampleSetToDB] Dropping data tables for es_65
13:52:14,504 INFO  [com.rapidminer.example.db.ExampleSetToDB] Cannot determine number of subtables. Probably tables already deleted: java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist

13:52:14,512 INFO  [com.rapidminer.example.db.ExampleSetToDB] Failed to drop table: java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
. (Ignoring)
13:52:14,520 INFO  [com.rapidminer.example.db.ExampleSetToDB] Failed to drop table: java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
. (Ignoring)
13:52:14,528 INFO  [com.rapidminer.example.db.ExampleSetToDB] Failed to drop table: java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
. (Ignoring)
13:52:14,865 WARNING [de.rapidanalytics.entity.IOObjectVersion] Cannot store example set: java.sql.SQLException: ORA-12899: value too large for column "RAPIDANALYTICS"."es_65_meta"."attributeName" (actual: 22, maximum: 21)
: java.sql.SQLException: ORA-12899: value too large for column "RAPIDANALYTICS"."es_65_meta"."attributeName" (actual: 22, maximum: 21)"


---

I get the first INFO messages ("Failed to drop table") with every database (over the past few days I tried MySQL, PostgreSQL, and Oracle).


The file size does not seem to be the problem: I also cut the CSV file down to a few rows, so that it is only 72 KB, and the error still occurs.


I would be very glad if anyone could help. If you need more information or log files, please let me know and I will send them immediately.
Thanks in advance!

Ralph-J. Andris
University of Applied Sciences Ravensburg-Weingarten
8 REPLIES
Contributor II

Re: Problem while uploading files

More information about the system:

OS: Windows Server 2008 64-bit
RA version: 1.1 Bundle (automatic installer)
Database: Oracle 11g Express Edition (also tried with MySQL and PostgreSQL)

Fresh installation of RA. No problems discovered during installation.
We piped stdout to a text file and went through it, finding no obvious problems (except for this "duplicate bug" :-))
Super Contributor

Re: Problem while uploading files (with special characters)

Hi,

Something does not fit here. The error message is about storing, and you are talking about downloading.

Did you configure your database to use UTF-8 in all places (database, connection, etc.)? You can edit server/default/deploy/rapidanalytics-ds.xml to add the necessary parameters to the connection URL if that is required.

Best,
Simon
Contributor II

Re: Problem while uploading files (with special characters)

Hello Mr. Fischer,

thanks for your reply.
I still think the problem lies in uploading the file, not in downloading it. I only used the download example to help localize the problem.

After I upload the file, I do not even get the metadata for the uploaded object, nor can I download it.
The error messages that I posted already occur while uploading the file.

The database is configured to UTF8 and the connection works well.

Thanks for your time!

Best,
Ralph
Super Contributor

Re: Problem while uploading files (with special characters)

Hi,

when you get a 500, there must be an exception in the logs. Can you identify that in the server.log?

Best,
Simon
Contributor II

Re: Problem while uploading files (with special characters)

Hello

 

I'm getting the same error.

 

server.log shows:

Caused by: java.sql.SQLException: ORA-12899: value too large for column "RAPIDM7"."es_46939_meta"."attributeName" (actual: 16, maximum: 8)

 

RM Studio error message is:

Cannot save repository data

Cannot store data in repository at entry '//Raptor.98/home/zzzzz'

Reason: Cannot upload object. 500 internal Server Error

 

We are using RM Studio 7.2 64bit and RM Server 7.2 64bit Linux 

Oracle version is 11.2.0.1

 

Oracle language parameters are (as in the sys.props$ table):

(1)NCHAR Character set - NLS_NCHAR_CHARACTERSET=UTF8

(2)Character set - NLS_CHARACTERSET=UTF8

(3)Language - NLS_LANGUAGE=KOREAN_KOREA.UTF8 

 

Operators used in RM Studio:

(1) Read Excel

(2) Store (to server repository)

 

My personal analysis:

This error occurs when there are Korean letters in the Excel header row (the first row). Korean letters in the following rows cause no problem; only Korean letters in the first row generate the error.

The Excel table is supposed to be saved in tables such as "es_47542_meta", "es_47542_nominal_mapping", and "es_47542_data_1". The table "es_47542_meta" has a column named attributeName, which should contain the header text. This is the column that fails: when the header row contains any Korean character, the length calculated for the attributeName column comes out too small.

We are using UTF-8, so every Korean character needs 3 bytes. If I insert a dummy (fictitious) header into the header row that is long enough to cover the longest Korean header, the process works fine. For example, if the longest Korean header has three Korean letters, it requires nine bytes (3 x 3). If there is also an English header of at least nine characters, the attributeName column is created at least nine bytes wide and the process works fine.

It seems that the system (either RM or Oracle) counts characters rather than UTF-8 bytes when it determines the column width for Korean headers.
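
The suspected mismatch can be illustrated with a minimal Python sketch. It assumes (this is not confirmed by the logs) that the attributeName column is sized from the character count of the longest header while the database enforces the limit in bytes. The function names and the sample headers are hypothetical and chosen only for illustration.

```python
def utf8_len(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


def column_width_by_chars(headers):
    """Width a column would get if it were sized by character count (assumption)."""
    return max(len(h) for h in headers)


def headers_exceeding(headers, width_bytes):
    """Headers whose UTF-8 byte length does not fit into width_bytes."""
    return [h for h in headers if utf8_len(h) > width_bytes]


# Illustrative header row: one ASCII header and two Korean headers.
headers = ["aaabb", "사과", "햄버거"]

width = column_width_by_chars(headers)         # sized in characters
too_long = headers_exceeding(headers, width)   # checked in bytes

print(width, too_long)

# The workaround described above: add a dummy ASCII header that is as long
# as the largest byte length, so the character-based width is big enough.
dummy = "x" * max(utf8_len(h) for h in headers)
padded = headers + [dummy]
print(headers_exceeding(padded, column_width_by_chars(padded)))
```

If the assumption holds, a character-based width of 5 is too small for a two-letter Korean header (6 bytes), which is the kind of off-by-a-few error ORA-12899 reports.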

 

 

test1.jpg

 

I used ojdbc7-12.1.0.jar as the JDBC driver for the RM Server.

If I change it to ojdbc6.jar, it shows the same error.

Contributor II

Re: Problem while uploading files (with special characters)

 

This problem is not resolved yet.

The following is what I tried in order to fix it. None of it worked; the same error message is still shown.

 

1. standalone.conf

 

JAVA_OPTS="$JAVA_OPTS -Dfile.encoding=UTF-8"
JAVA_OPTS="$JAVA_OPTS -DNLS_LANG=KOREAN_KOREA.UTF8"
JAVA_OPTS="$JAVA_OPTS -Dsun.jnu.encoding=utf-8"
JAVA_OPTS="$JAVA_OPTS -Djavax.servlet.request.encoding=UTF8"
JAVA_OPTS="$JAVA_OPTS -Dorg.apache.catalina.connector.URI_ENCODING=UTF-8"

 

2. standalone.xml

 

<system-properties>
<property name="org.apache.catalina.connector.URI_ENCODING" value="UTF-8"/>
<property name="file.encoding" value="UTF-8"/>
<property name="org.apache.catalina.connector.USE_BODY_ENCODING_FOR_QUERY_STRING" value="true"/>
<property name="sun.jnu.encoding" value="UTF-8"/>
</system-properties>

 

<datasource jta="true" jndi-name="java:/jdbc/RapidAnalyticsDS" . . . . . . . . . . . . . . 

    <connection-property name="char.encoding">UTF-8</connection-property>
    <connection-property name="useUnicode">yes</connection-property>
    <connection-property name="connectionCollation">utf8_bin</connection-property>
    <connection-property name="characterSetResults">UTF-8</connection-property>

 

 

3. server.log shows the following (entries related to UTF-8 character encoding)

        NLS_LANG = KOREAN_KOREA.UTF8

        [Standalone] =

        java.version = 1.8.0_101

        java.vm.name = Java HotSpot(TM) 64-Bit Server VM

        os.name = Linux

        os.version = 2.6.32-504.el6.x86_64

        file.encoding = UTF-8

        sun.io.unicode.encoding = UnicodeLittle

        sun.jnu.encoding = UTF-8

        user.language = ko

        javax.servlet.request.encoding = UTF-8

        org.apache.catalina.connector.URI_ENCODING = UTF-8

        user.timezone = ROK

 

Can anybody help us, please?

 

Contributor II

Re: Problem while uploading files (with special characters)

 

server.log shows the following errors:

 

1. case one

14:05:17,936 INFO [de.rapidanalytics.ejb.RepositoryStorageEJBImpl] (http-/0.0.0.0:8080-1) admin submitted new object to /hangul1/kortest1.
14:05:18,061 INFO [com.rapidminer.server.example.db.ExampleSetToDB] (http-/0.0.0.0:8080-1) Dropping data tables for es_123
14:05:18,468 INFO [com.rapidminer.server.example.db.ExampleSetToDB] (http-/0.0.0.0:8080-1) Failed to drop table: java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
. (Ignoring)
14:05:18,499 INFO [com.rapidminer.server.example.db.ExampleSetToDB] (http-/0.0.0.0:8080-1) Failed to drop table: java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
. (Ignoring)
14:05:18,733 WARNING [de.rapidanalytics.entity.IOObjectVersion] (http-/0.0.0.0:8080-1) Cannot store example set: java.sql.SQLException: ORA-12899: value too large for column "RAPID"."es_123_meta"."attributeName" (actual: 6, maximum: 5)
: java.sql.SQLException: ORA-12899: value too large for column "RAPID"."es_123_meta"."attributeName" (actual: 6, maximum: 5)

 

 

 

2. case two:

13:09:57,120 INFO [de.rapidanalytics.ejb.RepositoryStorageEJBImpl] (http-/0.0.0.0:8080-2) admin submitted new object to /hangul1/kortest101.
13:09:57,143 INFO [de.rapidanalytics.ejb.RepositoryEJBImpl] (http-/0.0.0.0:8080-2) admin created entry '/hangul1/kortest101 of type 'data'.
13:09:57,194 INFO [com.rapidminer.server.example.db.ExampleSetToDB] (http-/0.0.0.0:8080-2) Dropping data tables for es_643
13:09:57,228 INFO [com.rapidminer.server.example.db.ExampleSetToDB] (http-/0.0.0.0:8080-2) Cannot determine number of subtables. Probably tables already deleted: java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist

13:09:57,254 INFO [com.rapidminer.server.example.db.ExampleSetToDB] (http-/0.0.0.0:8080-2) Failed to drop table: java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
. (Ignoring)
13:09:57,257 INFO [com.rapidminer.server.example.db.ExampleSetToDB] (http-/0.0.0.0:8080-2) Failed to drop table: java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
. (Ignoring)
13:09:57,260 INFO [com.rapidminer.server.example.db.ExampleSetToDB] (http-/0.0.0.0:8080-2) Failed to drop table: java.sql.SQLSyntaxErrorException: ORA-00942: table or view does not exist
. (Ignoring)
13:09:57,385 WARNING [de.rapidanalytics.entity.IOObjectVersion] (http-/0.0.0.0:8080-2) Cannot store example set: java.sql.SQLException: ORA-12899: value too large for column "RAPM"."es_643_meta"."attributeName" (actual: 6, maximum: 5)
: java.sql.SQLException: ORA-12899: value too large for column "RAPM"."es_643_meta"."attributeName" (actual: 6, maximum: 5)

at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:450) [ojdbc7-12.1.0.2.jar:12.1.0.2.0]
at oracle.jdbc.driver.T4CTTIoer.processError(T4CTTIoer.java:399) [ojdbc7-12.1.0.2.jar:12.1.0.2.0]
at oracle.jdbc.driver.T4C8Oall.processError(T4C8Oall.java:1017) [ojdbc7-12.1.0.2.jar:12.1.0.2.0]
at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:655) [ojdbc7-12.1.0.2.jar:12.1.0.2.0]
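
To compare the "actual" and "maximum" values across many such log entries, the ORA-12899 message can be parsed with a small script. This is only a sketch: the regular expression is written against the message format visible in the log lines above, and the function name is hypothetical.

```python
import re

# Pattern for messages such as:
# ORA-12899: value too large for column "RAPM"."es_643_meta"."attributeName" (actual: 6, maximum: 5)
ORA_12899 = re.compile(
    r'ORA-12899: value too large for column '
    r'"(?P<schema>[^"]+)"\."(?P<table>[^"]+)"\."(?P<column>[^"]+)" '
    r'\(actual: (?P<actual>\d+), maximum: (?P<maximum>\d+)\)'
)


def parse_ora_12899(line):
    """Return the fields of an ORA-12899 message as a dict, or None if absent."""
    match = ORA_12899.search(line)
    if match is None:
        return None
    info = match.groupdict()
    info["actual"] = int(info["actual"])
    info["maximum"] = int(info["maximum"])
    return info


sample = (
    'Cannot store example set: java.sql.SQLException: ORA-12899: value too large '
    'for column "RAPM"."es_643_meta"."attributeName" (actual: 6, maximum: 5)'
)
parsed = parse_ora_12899(sample)
print(parsed)
```

Whether Oracle reports these numbers in bytes or in characters depends on the length semantics of the column, so the values should be read together with the column definition.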

Contributor II

Re: Problem while uploading files (with special characters)

 

Last time I used RapidMiner Server 7.2 on Linux, with the operational data stored in Oracle 12.1.0.2.0 with character set UTF8.

I ran the same test using RapidMiner Server 7.2 on Windows 2012 with Oracle 12.1.0, with the database created with character set AL32UTF8.

It shows the same error.

This time I captured the TCP traffic to trace the actual data sent from Studio to the Server.

It shows that POST was used and that the Korean letters were encoded in UTF-8.

 

<< HTTP Packet sent from studio to Server >>

POST /api/rest/resources/hangul1/kortest1 HTTP/1.1
Accept-Charset: UTF-8
Content-Type: application/vnd.rapidminer.ioo
User-Agent: Java/1.8.0_51
Host: 192.168.80.202:8080
Accept: text/html, image/gif, image/jpeg, *; q=.2, */*; q=.2
Connection: keep-alive
Authorization: Basic YWRtaW46Y2hhbmdlaXQ=
Content-Length: 569

*q.................aaabb........polynominal....single_value......................... ........

 

 

The last part includes the Korean letters, encoded in UTF-8 with 3 bytes per letter.

 

<< Binary data Analysis >>

00000180 75 65 00 00 00 05 00 00 00 00 00 00 00 06 EC 82 ue...... ........
00000190 AC EA B3 BC 00 00 00 01 00 00 00 09 ED 96 84 EB ........ ........
000001A0 B2 84 EA B1 B0 00 00 00 02 00 00 00 0C ED 8C 8C ........ ........
000001B0 EC 9D B8 EC 95 A0 ED 94 8C 00 00 00 03 00 00 00 ........ ........
000001C0 0F EC 98 A4 EC A7 95 EC 96 B4 EC A7 AC EB BD 95 ........ ........

 

EC 82 AC (3 bytes) is equivalent to the Korean letter '사'

EA B3 BC (3 bytes) is equivalent to the Korean letter '과'

ED 96 84 (3 bytes) is equivalent to the Korean letter '햄'

EB B2 84 (3 bytes) is equivalent to the Korean letter '버'
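
The byte sequences from the dump can be checked with a short Python sketch. The record layout used here (a 4-byte big-endian index, a 4-byte big-endian byte length, then the UTF-8 bytes) is only inferred from the dump above and is an assumption, not documented RapidMiner behavior; the function name is hypothetical.

```python
import struct

# Bytes copied from the dump, starting at offset 0x186.
FRAGMENT = bytes.fromhex(
    "00 00 00 00 00 00 00 06 EC 82 AC EA B3 BC "
    "00 00 00 01 00 00 00 09 ED 96 84 EB B2 84 EA B1 B0"
)


def read_records(buf):
    """Read (index, byte_length, text) records under the assumed layout."""
    records = []
    pos = 0
    while pos + 8 <= len(buf):
        index, length = struct.unpack_from(">II", buf, pos)
        pos += 8
        text = buf[pos:pos + length].decode("utf-8")
        pos += length
        records.append((index, length, text))
    return records


records = read_records(FRAGMENT)
for index, length, text in records:
    # The length prefix counts bytes, not characters.
    print(index, length, len(text), text)
```

Under this reading, the length prefixes in the packet (6 and 9) are byte counts, while the decoded strings have 2 and 3 characters.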