Operator 'Get Pages' not running on AI Hub
Hi
I have a process running on an AI Hub where I have the operator 'Get Pages' (ext. Web Mining) embedded.
When I run the process in RM Studio everything is fine.
When I run the process on AI Hub but start it from RM Studio ('Run Process on AI Hub'), everything is fine.
But when I kick off the web service I created, the operator 'Get Pages' seems to cause trouble. Other web services are running. And when I disable 'Get Pages', the web service runs as well. So I strongly believe it has something to do with how the process runs on AI Hub.
This is the error message which I get on running the web-service:
de.rapidanalytics.ejb.service.ServiceDataSourceException Error executing process /home/bot/test_pages for service test_pages: com.rapidminer.operator.web.io.MultiThreadedCookieManager cannot be cast to com.rapidminer.operator.web.io.MultiThreadedCookieManager
The funny thing I found out is that if I run the process from the repository on AI Hub, it runs successfully. But if I test the web service, it does not work.
This is the process I used for testing. When I disable the operator 'Get Pages' everything works fine.
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.10.001">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.10.001" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="retrieve" compatibility="9.10.001" expanded="true" height="68" name="Retrieve step_3_urls_after_python_short" width="90" x="112" y="136">
        <parameter key="repository_entry" value="/home/user/some_table_with_urls"/>
      </operator>
      <operator activated="true" class="web:retrieve_webpages" compatibility="9.7.000" expanded="true" height="68" name="Get Pages" width="90" x="447" y="136">
        <parameter key="link_attribute" value="links"/>
        <parameter key="random_user_agent" value="true"/>
        <parameter key="connection_timeout" value="10000"/>
        <parameter key="read_timeout" value="10000"/>
        <parameter key="follow_redirects" value="true"/>
        <parameter key="accept_cookies" value="original server"/>
        <parameter key="cookie_scope" value="global"/>
        <parameter key="request_method" value="GET"/>
        <parameter key="delay" value="none"/>
        <parameter key="delay_amount" value="1000"/>
        <parameter key="min_delay_amount" value="0"/>
        <parameter key="max_delay_amount" value="500"/>
      </operator>
      <connect from_op="Retrieve step_3_urls_after_python_short" from_port="output" to_op="Get Pages" to_port="Example Set"/>
      <connect from_op="Get Pages" from_port="Example Set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
I don't know how to proceed.
Thanks for all the help!
Best
Mathis
Best Answer
methusi Member Posts: 5 Learner I
For the ones wondering - I could fix my problem by taking another route. Instead of calling a web service, I schedule the process with the schedule API:
POST to server/executions/schedule with the corresponding headers and body
In the body, I do not set an execution time and I set force=true - this immediately starts the execution.
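For anyone who wants to try the same route, here is a rough sketch of what such a request could look like in Python. Only the `executions/schedule` path, the POST method, `force=true`, and the missing execution time come from the post above. The host name, the Bearer token header, and the `location` field name are assumptions for illustration, so check the API documentation of your AI Hub version for the real header and body schema. The process path is the one from the error message.

```python
import json
import urllib.request


def build_schedule_request(base_url, token, process_location):
    """Build (but do not send) a POST request to the schedule endpoint."""
    # From the post: force is true and no execution time is set,
    # so the execution should start immediately.
    # Assumption: "location" is a hypothetical field name for the process path.
    body = {
        "location": process_location,
        "force": True,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/executions/schedule",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumption: token-based auth; your server may expect something else.
            "Authorization": "Bearer " + token,
        },
        method="POST",
    )


if __name__ == "__main__":
    # Hypothetical host and token placeholder.
    req = build_schedule_request(
        "https://aihub.example.com", "<token>", "/home/bot/test_pages"
    )
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # To actually send it: urllib.request.urlopen(req)
```

The sketch only builds the request so it can be inspected before anything is sent to the server.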
Answers
However, if it is run as a web service, then it doesn't run on a JobAgent, but on the Server itself.