Problem with text processing plugin
Hi all,
I encountered a problem with the RM text processing plugin.
The program was working fine before, but it failed for some text files containing non-ASCII characters.
The setup uses the "Process Documents from Files" operator, which contains the following chain:
Transform Cases -> Tokenize -> Filter Stopwords -> Stem -> Filter Tokens (by Length)
Is this a bug in the text processing plugin, or is something wrong with my setup/program? Thanks.
--------------------------------------------------------------------------------------------------
SEVERE: Process failed: operator cannot be executed (The name "lnêäûð6ûonxßvâisÿˆïqwòb-ûfåàãwcû-kžîžìeî" is not legal for JDOM/XML Namespace prefixs: Namespace prefixes cannot contain the character "ˆ".). Check the log messages...
org.jdom.IllegalNameException: The name "lnêäûð6ûonxßvâisÿˆïqwòb-ûfåàãwcû-kžîžìeî" is not legal for JDOM/XML Namespace prefixs: Namespace prefixes cannot contain the character "ˆ".
...
...
---------------------------------------------------------------------------------------------------
Exception in thread "main" org.jdom.IllegalNameException: The name "home" is not legal for JDOM/XML attributes: XML names cannot begin with the character "h".
at org.jdom.Attribute.setName(Attribute.java:361)
at org.jdom.Attribute.<init>(Attribute.java:228)
at org.jdom.Attribute.<init>(Attribute.java:276)
at org.jdom.DefaultJDOMFactory.attribute(DefaultJDOMFactory.java:93)
at org.jdom.input.SAXHandler.startElement(SAXHandler.java:544)
at org.ccil.cowan.tagsoup.Parser.push(Parser.java:794)
at org.ccil.cowan.tagsoup.Parser.rectify(Parser.java:1061)
at org.ccil.cowan.tagsoup.Parser.stagc(Parser.java:1016)
at org.ccil.cowan.tagsoup.HTMLScanner.scan(HTMLScanner.java:388)
at org.ccil.cowan.tagsoup.Parser.parse(Parser.java:449)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:453)
at org.jdom.input.SAXBuilder.build(SAXBuilder.java:851)
at com.rapidminer.operator.text.io.filereader.HTMLFileReader.readStream(HTMLFileReader.java:72)
at com.rapidminer.operator.text.io.filereader.AbstractFileReader.readFile(AbstractFileReader.java:37)
at com.rapidminer.operator.text.io.FileDocumentInputIterator.next(FileDocumentInputIterator.java:94)
at com.rapidminer.operator.text.io.FileDocumentInputIterator.next(FileDocumentInputIterator.java:43)
at com.rapidminer.operator.text.io.AbstractDocumentInputOperator.doWork(AbstractDocumentInputOperator.java:228)
at com.rapidminer.operator.Operator.execute(Operator.java:833)
at com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
at com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:709)
at com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:379)
at com.rapidminer.operator.Operator.execute(Operator.java:833)
at com.rapidminer.Process.run(Process.java:925)
at com.rapidminer.Process.run(Process.java:848)
at com.rapidminer.Process.run(Process.java:807)
at com.rapidminer.Process.run(Process.java:802)
at com.rapidminer.Process.run(Process.java:792)
at Filter.filter(PornFilter.java:84)
at Filter.main(PornFilter.java:128)
Answers
This seems to be an encoding problem. Did you try another encoding type? It can be set with the expert parameter "encoding".
If this does not help, can you please post a short process that helps us reproduce the error? How to post a process is described here: http://rapid-i.com/rapidforum/index.php/topic,4654.0.html
Furthermore, is it possible to also send part of the data that produced the error? Without it, the error is hard to reproduce.
Best,
Nils
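Before changing the "encoding" parameter, it can help to check which charsets a failing file actually decodes under. The following is a minimal sketch in plain Java, independent of RapidMiner; the class name EncodingCheck and the method decodesCleanly are hypothetical names chosen for illustration.

```java
import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper, not part of RapidMiner or the text processing plugin.
public class EncodingCheck {

    /** Returns true if every byte sequence in data is valid under the given charset. */
    public static boolean decodesCleanly(byte[] data, Charset charset) {
        try {
            charset.newDecoder()
                   // REPORT makes the decoder throw instead of silently substituting characters.
                   .onMalformedInput(CodingErrorAction.REPORT)
                   .onUnmappableCharacter(CodingErrorAction.REPORT)
                   .decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java EncodingCheck <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        Charset[] candidates = {
            StandardCharsets.US_ASCII,
            StandardCharsets.UTF_8,
            // ISO-8859-1 maps every byte to a character, so it always decodes;
            // a "true" here says nothing about whether the text is meaningful.
            StandardCharsets.ISO_8859_1
        };
        for (Charset cs : candidates) {
            System.out.println(cs.name() + ": " + decodesCleanly(data, cs));
        }
    }
}
```

If a file fails under both US-ASCII and UTF-8, it is either stored in a legacy single-byte encoding or it is not a text file at all; the sketch cannot tell these two cases apart.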
I have exactly the same problem. Here is part of the data that produced the error:
http://www.4shared.com/rar/AEL4eo-k/textmining.html
Thank you.
What kind of encoding do the HTML files have? When I open them, their content is not valid HTML.
Best,
Nils
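To find out which input files are not really HTML before handing them to the operator, a crude pre-check can be run over the directory. The sketch below is plain Java and only an assumption about what a useful heuristic might look like: it reports whether a file starts with a "<" after an optional UTF-8 byte order mark and whitespace. The class name MarkupSniffer and the method startsLikeMarkup are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper; a rough heuristic, not a real HTML validator.
public class MarkupSniffer {

    /** Returns true if the first non-whitespace byte (after an optional UTF-8 BOM) is '<'. */
    public static boolean startsLikeMarkup(byte[] data) {
        int i = 0;
        // Skip a UTF-8 byte order mark (EF BB BF) if present.
        if (data.length >= 3
                && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB
                && (data[2] & 0xFF) == 0xBF) {
            i = 3;
        }
        // Skip leading ASCII whitespace.
        while (i < data.length
                && (data[i] == ' ' || data[i] == '\t' || data[i] == '\r' || data[i] == '\n')) {
            i++;
        }
        return i < data.length && data[i] == '<';
    }

    public static void main(String[] args) throws IOException {
        // Each argument is treated as a path to one file to check.
        for (String name : args) {
            byte[] data = Files.readAllBytes(Paths.get(name));
            System.out.println(name + ": " + (startsLikeMarkup(data) ? "looks like markup" : "does not look like markup"));
        }
    }
}
```

Files flagged as not looking like markup are candidates for exclusion or for a different reader; a file that passes this check can still be malformed HTML.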
Hi,
I am able to execute the process through its .rmp file, but I get the error below when I try to run it:
<em class="error">The operator class 'IntentsOperator' is unknown.</em>
com.rapidminer.io.process.XMLImporter.attribute_not_found_unknown
com.rapidminer.io.process.XMLImporter.attribute_not_found_unknown
com.rapidminer.io.process.XMLImporter.attribute_not_found_unknown
com.rapidminer.io.process.XMLImporter.attribute_not_found_unknown
com.rapidminer.io.process.XMLImporter.attribute_not_found_unknown
com.rapidminer.io.process.XMLImporter.attribute_not_found_unknown
com.rapidminer.io.process.XMLImporter.attribute_not_found_unknown
com.rapidminer.io.process.XMLImporter.attribute_not_found_unknown
<em class="error">The output port <var>intents</var> is unknown at operator <var>intents</var>.</em>
-- ADDING MACROS--
test : test
No filename given for result file, using stdout for logging results!
Process C:\Users\vc126m\Documents\rapidminercommandexecutor\.RapidMiner5\repositories\Local Repository\processes\intents test.rmp starts
Process failed: The dummy operator intents (replacing IntentsOperator) cannot be executed.
Here: Process[1] (Process)
subprocess 'Main Process'
==> +- intents[1] (dummy)
Process not successful
341 [Thread-3] INFO org.eclipse.jetty.util.log - Logging initialized @5395ms
384 [Thread-3] INFO spark.embeddedserver.jetty.EmbeddedJettyServer - == Spark has ignited ...
385 [Thread-3] INFO spark.embeddedserver.jetty.EmbeddedJettyServer - >> Listening on 0.0.0.0:4567
387 [Thread-3] INFO org.eclipse.jetty.server.Server - jetty-9.3.6.v20151106
421 [Thread-3] INFO org.eclipse.jetty.server.ServerConnector - Started ServerConnector@554404d{HTTP/1.1,[http/1.1]}{0.0.0.0:4567}
421 [Thread-3] INFO org.eclipse.jetty.server.Server - Started @5476ms
Process finished with exit code 1
Hi,
Your RM engine does not find the extension with your custom operator "intents". You need to make sure that this extension is loaded as well.
Best,
Martin
Dortmund, Germany
Hello mschmitz, thank you very much for your reply. Could you please explain in detail how I can add that extension?
Thanks,
venkat
Hello, could you please help me with how to add the extension? I am using RM version 5.3.013.
So it sounds like you created your own extension and compiled it? Is it a JAR file? If so, you will have to place it into your /.RapidMiner/extensions directory and then restart RapidMiner. I'm not sure where that is in v5.3.
Did you use the developer's guide to make the extension? https://docs.rapidminer.com/developers/
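To confirm which JAR files actually sit in the extensions directory and what they declare about themselves, their manifests can be listed. This is a minimal sketch using only the standard java.util.jar API; the class name ExtensionJarInspector and the default directory name "extensions" are assumptions, and the sketch does not know which manifest entries RapidMiner expects from an extension.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Map;
import java.util.TreeMap;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

// Hypothetical helper, not part of RapidMiner.
public class ExtensionJarInspector {

    /** Reads the main manifest attributes of a JAR; returns an empty map if there is no manifest. */
    public static Map<String, String> mainAttributes(Path jar) throws IOException {
        Map<String, String> result = new TreeMap<>();
        try (JarFile jarFile = new JarFile(jar.toFile())) {
            Manifest manifest = jarFile.getManifest();
            if (manifest != null) {
                for (Map.Entry<Object, Object> e : manifest.getMainAttributes().entrySet()) {
                    result.put(e.getKey().toString(), e.getValue().toString());
                }
            }
        }
        return result;
    }

    public static void main(String[] args) throws IOException {
        // Assumption: pass the real extensions directory as the first argument.
        Path dir = Paths.get(args.length > 0 ? args[0] : "extensions");
        try (DirectoryStream<Path> jars = Files.newDirectoryStream(dir, "*.jar")) {
            for (Path jar : jars) {
                System.out.println(jar.getFileName());
                mainAttributes(jar).forEach((k, v) -> System.out.println("  " + k + ": " + v));
            }
        }
    }
}
```

Comparing the output for a custom JAR with that of an extension known to load can show whether the custom JAR was packaged as an extension at all.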
I added the rapidminer.jar file to the ./RM/extensions directory, but I am still getting the same error:
May 24, 2017 2:27:20 PM com.rapidminer.io.process.XMLImporter addMessage
INFO: <em class="error">The operator class 'entity-extract' is unknown.</em>
May 24, 2017 2:27:20 PM com.rapidminer.io.process.XMLImporter parseOperator
INFO: The parameter 'host_name' is unknown for operator 'entity extract' (" dummy ")."
May 24, 2017 2:27:20 PM com.rapidminer.io.process.XMLImporter parseOperator
INFO: The parameter 'host_port' is unknown for operator 'entity extract' (" dummy ")."
May 24, 2017 2:27:20 PM com.rapidminer.io.process.XMLImporter parseOperator
INFO: The parameter 'path' is unknown for operator 'entity extract' (" dummy ")."
May 24, 2017 2:27:20 PM com.rapidminer.io.process.XMLImporter parseOperator
INFO: The parameter 'groupid' is unknown for operator 'entity extract' (" dummy ")."
May 24, 2017 2:27:20 PM com.rapidminer.io.process.XMLImporter parseOperator
INFO: The parameter 'libraryname' is unknown for operator 'entity extract' (" dummy ")."
May 24, 2017 2:27:20 PM com.rapidminer.io.process.XMLImporter parseOperator
INFO: The parameter 'languagename' is unknown for operator 'entity extract' (" dummy ")."
May 24, 2017 2:27:20 PM com.rapidminer.io.process.XMLImporter parseOperator
INFO: The parameter 'Username' is unknown for operator 'entity extract' (" dummy ")."
May 24, 2017 2:27:20 PM com.rapidminer.io.process.XMLImporter parseOperator
INFO: The parameter 'password' is unknown for operator 'entity extract' (" dummy ")."
May 24, 2017 2:27:20 PM com.rapidminer.io.process.XMLImporter addMessage
INFO: <em class="error">The input port <var>write</var> is unknown at operator <var>entity extract</var>.</em>
May 24, 2017 2:27:20 PM com.rapidminer.io.process.XMLImporter addMessage
INFO: <em class="error">The output port <var>entityextract</var> is unknown at operator <var>entity extract</var>.</em>
May 24, 2017 2:27:20 PM CommandLine serverLoop
SEVERE: -- ADDING MACROS--
May 24, 2017 2:27:20 PM CommandLine serverLoop
SEVERE: test : test
May 24, 2017 2:27:20 PM com.rapidminer.tools.WrapperLoggingHandler log
INFO: No filename given for result file, using stdout for logging results!
May 24, 2017 2:27:20 PM com.rapidminer.Process run
INFO: Process C:\Users\vc126m\Documents\rapidminercommandexecutor\.RapidMiner5\repositories\Local Repository\processes\entityextract.rmp starts
May 24, 2017 2:27:20 PM CommandLine serverLoop
SEVERE: Process failed: The dummy operator entity extract (replacing entity-extract) cannot be executed.
com.rapidminer.operator.UserError: The dummy operator entity extract (replacing entity-extract) cannot be executed.
at com.rapidminer.operator.DummyOperator.doWork(DummyOperator.java:88)
at com.rapidminer.operator.Operator.execute(Operator.java:867)
at com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
at com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:711)
at com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:375)
at com.rapidminer.operator.Operator.execute(Operator.java:867)
at com.rapidminer.Process.run(Process.java:949)
at com.rapidminer.Process.run(Process.java:873)
at com.rapidminer.Process.run(Process.java:832)
at com.rapidminer.Process.run(Process.java:827)
at com.rapidminer.Process.run(Process.java:817)
at CommandLine.serverLoop(CommandLine.java:153)
at CommandLine.results(CommandLine.java:202)
at CommandLine.main(CommandLine.java:220)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:147)
May 24, 2017 2:27:20 PM CommandLine serverLoop
SEVERE: Here: Process[1] (Process)
subprocess 'Main Process'
+- Read CSV[1] (Read CSV)
==> +- entity extract[1] (dummy)
May 24, 2017 2:27:20 PM CommandLine serverLoop
SEVERE: Process not successful
I wonder if there's a Studio versioning issue. v5.3 is pretty old; have you tried it with v7.5?