"Problems connecting operators in R5 (Java Application)"
Hello,
I'm completely new to RapidMiner, so I'm sorry if I'm asking about something obvious.
I need to download and process HTML pages in a Java application. Following part of the R4.6 tutorials, I managed to put together some of the operators I need (although I'm not sure I did it the right way), but I can't figure out how to connect them.
Here is the code. I used the text and web plugins.
The error I get is shown after the code.
public Miner(List<Vulnerability> datasourcelist) {
    RapidMiner.init();
    Process process = new Process();
    process.getRootOperator().setParameter(ProcessRootOperator.PARAMETER_LOGFILE, "log");
    Operator op;
    ExecutionUnit u;
    int counter = 0;
    try {
        for (Vulnerability vuln : datasourcelist) {
            for (String ref : vuln.getRefs()) {
                // One subprocess per reference URL
                process.getRootOperator().addSubprocess(counter);
                u = process.getRootOperator().getSubprocess(counter);

                // Download the page
                op = OperatorService.createOperator("get_webpage");
                op.setEnabled(true);
                op.setExpanded(true);
                op.setParameter("random_user_agent", "true");
                op.setParameter("url", ref);
                u.addOperator(op);

                // Extract the text content from the HTML
                op = OperatorService.createOperator("extract_html_text_content");
                op.setEnabled(true);
                op.setExpanded(true);
                u.addOperator(op);

                // Split the text into tokens at '.' and ':'
                op = OperatorService.createOperator("tokenize");
                op.setEnabled(true);
                op.setExpanded(true);
                op.setParameter("mode", "specify characters");
                op.setParameter("characters", ".:");
                u.addOperator(op);

                // Keep only the tokens that match the condition
                op = OperatorService.createOperator("filter_tokens_by_content");
                op.setEnabled(true);
                op.setExpanded(true);
                op.setParameter("condition", "matches");
                op.setParameter("string", "[a-z]");
                op.setParameter("regular_expression", "[a-zA-Z]");
                u.addOperator(op);

                // Write the result to a CSV file
                op = OperatorService.createOperator("write_csv");
                op.setEnabled(true);
                op.setExpanded(true);
                op.setParameter("csv_file", "test_csv.csv");
                u.addOperator(op);

                counter++;
            }
        }
        System.out.println(process.getRootOperator().createProcessTree(0));
        process.run();
    } catch (OperatorCreationException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    } catch (OperatorException e) {
        // TODO Auto-generated catch block
        e.printStackTrace();
    }
}
I haven't given the process any input, since all the data should come from the get_webpage operators.
com.rapidminer.operator.UserError: No data was deliverd at port extract_html_text_content.document.
at com.rapidminer.operator.ports.impl.AbstractPort.getData(AbstractPort.java:78)
Thanks
Andrea
Answers
4.6 was much loved and has now been retired, which is a mixed blessing for you: the Web Mining and Text crunching plugins have also been updated and are now called Extensions. There are non-trivial architectural differences that you should look into. Time to upgrade, I fear!
If you want to use the RapidMiner API, you should be aware that there have been many changes between 4.x and 5.0. We dropped the implicit data pass-through and replaced it with the explicit flow layout, and this has some impact on the API as well. Operators no longer pick up their input implicitly: each data object must be delivered to an operator by getting the relevant port and setting the data there.
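To illustrate the idea of explicit ports, here is a minimal, self-contained Java sketch. It is a toy model and not the RapidMiner API: the classes `InputPort`, `OutputPort`, and `Step`, and the methods `connectTo`, `receive`, and `deliver`, are hypothetical names chosen for illustration only. It shows why a chain of unconnected steps fails with a "no data delivered" style error, and how wiring each output to the next input avoids it.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class PortWiringSketch {

    /** Receiving side of a connection; remembers the last object delivered to it. */
    static final class InputPort {
        private final String name;
        private String data;

        InputPort(String name) { this.name = name; }

        void receive(String object) { this.data = object; }

        String getData() {
            if (data == null) {
                throw new IllegalStateException("No data was delivered at port " + name);
            }
            return data;
        }
    }

    /** Sending side; forwards a result only if it has been connected. */
    static final class OutputPort {
        private InputPort target;

        void connectTo(InputPort input) { this.target = input; }

        void deliver(String object) {
            if (target != null) {
                target.receive(object);
            }
        }
    }

    /** A processing step with one input port and one output port (hypothetical). */
    static final class Step {
        final InputPort input;
        final OutputPort output = new OutputPort();
        private final Function<String, String> work;

        Step(String name, Function<String, String> work) {
            this.input = new InputPort(name + ".input");
            this.work = work;
        }

        void execute() { output.deliver(work.apply(input.getData())); }
    }

    /** Runs two steps in order; with connected == false nothing flows between them. */
    public static String runPipeline(String html, boolean connected) {
        Step extract = new Step("extract", s -> s.replaceAll("<[^>]*>", " ").trim());
        Step tokenize = new Step("tokenize", s -> String.join("|", s.toLowerCase().split("\\s+")));
        InputPort result = new InputPort("result");

        if (connected) {
            // Explicit flow: each output is wired to the next input.
            extract.output.connectTo(tokenize.input);
            tokenize.output.connectTo(result);
        }

        // The first step gets its data by setting it on the port directly.
        extract.input.receive(html);
        List<Step> steps = Arrays.asList(extract, tokenize);
        for (Step step : steps) {
            step.execute();
        }
        return result.getData();
    }

    public static void main(String[] args) {
        System.out.println(runPipeline("<p>Hello <b>World</b></p>", true)); // prints hello|world
        try {
            runPipeline("<p>Hello</p>", false);
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // No data was delivered at port tokenize.input
        }
    }
}
```

In the unconnected run, the first step executes but its result is dropped, so the second step finds nothing at its input. That is the same shape of failure as the error reported in the question.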
After the great success of the Extension White Paper (it even outperforms the Free Webinar in terms of profit), I'm going to write an Integration White Paper. But I wouldn't wait for it... Just look here in the forum at how long it took me to write the first one.
Greetings,
Sebastian
This is what I needed, thanks.
Apparently I've become more shortsighted than usual, since I didn't notice the connection and receive methods of the ports. Now my test code seems to be working just fine. (I missed your paper as well.)
Thanks again for your help and for the quick answers.
I bought the white paper and found it very useful, but my interest is more in integration. I just want to voice my support for this Integration White Paper. I will definitely buy it.
Thanks for the great job!