"Problems connecting operators in R5 (Java Application)"

Karahedra Member Posts: 6 Contributor II
edited June 2019 in Help
Hello,

I'm completely new to RapidMiner, so I'm sorry if I'm asking about something obvious.
I need to download and process HTML pages in a Java application. Following part of the R4.6 tutorials, I managed to put together some of the operators I need (though I'm not sure I did it the right way), but I can't figure out how to connect them.

Here is the code. I used the text and web plugins.

public Miner(List<Vulnerability> datasourcelist) {
    RapidMiner.init();
    Process process = new Process();
    process.getRootOperator().setParameter(ProcessRootOperator.PARAMETER_LOGFILE, "log");
    Operator op;
    ExecutionUnit u;
    int counter = 0;
    try {
        for (Vulnerability vuln : datasourcelist) {
            for (String ref : vuln.getRefs()) {
                process.getRootOperator().addSubprocess(counter);
                u = process.getRootOperator().getSubprocess(counter);
                op = OperatorService.createOperator("get_webpage");
                op.setEnabled(true);
                op.setExpanded(true);
                op.setParameter("random_user_agent", "true");
                op.setParameter("url", ref);
                u.addOperator(op);
                op = OperatorService.createOperator("extract_html_text_content");
                op.setEnabled(true);
                op.setExpanded(true);
                u.addOperator(op);
                op = OperatorService.createOperator("tokenize");
                op.setEnabled(true);
                op.setExpanded(true);
                op.setParameter("mode", "specify characters");
                op.setParameter("characters", ".:");
                u.addOperator(op);
                op = OperatorService.createOperator("filter_tokens_by_content");
                op.setEnabled(true);
                op.setExpanded(true);
                op.setParameter("condition", "matches");
                op.setParameter("string", "[a-z]");
                op.setParameter("regular_expression", "[a-zA-Z]");
                u.addOperator(op);
                op = OperatorService.createOperator("write_csv");
                op.setEnabled(true);
                op.setExpanded(true);
                op.setParameter("csv_file", "test_csv.csv");
                u.addOperator(op);
                counter++;
            }
        }
        System.out.println(process.getRootOperator().createProcessTree(0));
        process.run();
    } catch (OperatorCreationException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (OperatorException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
And this is the error I get:

com.rapidminer.operator.UserError: No data was deliverd at port extract_html_text_content.document.
at com.rapidminer.operator.ports.impl.AbstractPort.getData(AbstractPort.java:78)
I haven't given any input to the process, since all the data should come from the get_webpage operators.


Thanks
Andrea

Answers

  • haddock Member Posts: 849 Maven
    Hi there,

    4.6 was much loved and has now retired, which is a mixed blessing for you, as the Web Mining and Text crunching plugins have also been updated and are now called Extensions. There are non-trivial architectural differences which you should look into. Time to upgrade I fear!

  • Karahedra Member Posts: 6 Contributor II
    I probably wasn't clear in my explanation: I already use R5 (or at least I try to :P). I went with the 4.6 tutorials only because I can't read German, but that left me unable to understand how some things should be done, and quite doubtful about the correctness of the parts I managed to put together.
  • haddock Member Posts: 849 Maven
    Cool, I'd take one of the RM 5.00 plugins apart to see how it can be done, and invest in Sebastian's paper on the subject of extensions; but there are many ways to ...

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    if you want to use the RapidMiner API, you should be aware that there have been many changes between 4.x and 5.0! We dropped the implicit data pass-through and replaced it with the explicit flow layout, and this has some impact on the API as well. Operators must now be supplied with each data object explicitly, by getting the port and setting the data there.
    After the great success of the Extension White Paper (it even outperforms the Free Webinar in terms of profit), I'm going to write an Integration White Paper. But I wouldn't wait for it... just take a look here in the forum at how long it took me to write the first one.

    Greetings,
      Sebastian
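
The difference between implicit pass-through and explicit flow can be illustrated with a small toy model. This is only a sketch under stated assumptions: the classes `InputPort`, `OutputPort`, and `Step` below are hypothetical stand-ins written for this example, not RapidMiner's own classes, and the port names are made up. It shows the general idea that a step only sees data if something was connected to, or set on, its input port; otherwise it fails in much the same way as the error quoted above.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Toy model of explicit port wiring. All names here are hypothetical;
// nothing in this block is part of the RapidMiner API.
final class PortWiringSketch {

    /** Holds the value delivered by whatever output port is connected to it. */
    static final class InputPort {
        private final String name;
        private Object data;
        InputPort(String name) { this.name = name; }
        void receive(Object value) { data = value; }
        Object getData() {
            if (data == null) {
                // Nothing was wired to this port, so there is nothing to read.
                throw new IllegalStateException("No data was delivered at port " + name);
            }
            return data;
        }
    }

    /** Forwards a value to the single input port it is connected to, if any. */
    static final class OutputPort {
        private InputPort target;
        void connectTo(InputPort input) { target = input; }
        void deliver(Object value) {
            if (target != null) {
                target.receive(value);
            }
        }
    }

    /** One processing step: reads its input port, writes its output port. */
    static final class Step {
        final InputPort in;
        final OutputPort out = new OutputPort();
        private final Function<Object, Object> work;
        Step(String portName, Function<Object, Object> work) {
            this.in = new InputPort(portName);
            this.work = work;
        }
        void execute() { out.deliver(work.apply(in.getData())); }
    }

    /** Strips tags, then splits on '.' and ':'; wiring between the steps is optional. */
    @SuppressWarnings("unchecked")
    static List<String> run(String html, boolean wire) {
        Step extract = new Step("extract.document",
                v -> ((String) v).replaceAll("<[^>]*>", ""));
        Step tokenize = new Step("tokenize.document",
                v -> Arrays.asList(((String) v).split("[.:]")));
        InputPort result = new InputPort("result");

        if (wire) {
            extract.out.connectTo(tokenize.in); // the explicit connection
        }
        tokenize.out.connectTo(result);

        extract.in.receive(html); // set the data on the first port by hand
        extract.execute();
        tokenize.execute();       // throws if the steps were never wired
        return (List<String>) result.getData();
    }

    public static void main(String[] args) {
        System.out.println(run("<p>a.b:c</p>", true)); // prints [a, b, c]
        try {
            run("<p>a.b:c</p>", false);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

In the unwired run, the second step has nothing on its input port and fails, which mirrors the situation in the original code where operators were added to a subprocess but never connected to each other.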
  • Karahedra Member Posts: 6 Contributor II
    Hello,
    this is what I needed, thanks.
    Apparently I've become more shortsighted than usual, since I didn't notice the connection and receive methods of ports. Now my test code seems to be working just fine (I missed your paper as well).

    Thanks again for your help and for the ready answers
  • pop Member Posts: 21 Maven
    Hi Sebastian,

    I bought the white paper and found it very useful, but my interest is more in integration. I just want to voice my support for this Integration White Paper. I will definitely buy it.
    Thanks for the great job!