"Java integration - web mining: Get Page"
Dear All,
I'm interested in using RM's capability in my own Java program. I want to use the 'Web Mining' extension available in RM in order to retrieve web pages and do further processing afterwards.
I can already initialize RM from my program. I then created the 'Get Page' operator and set all the required parameters. My problem now is: how can I retrieve the result of the 'Get Page' operator? In RM5 I can see that the output of 'Get Page' is a Document, but I cannot find Document anywhere, not even in the javadoc. Below is my code so far:
Can anyone help me obtain the result from the 'Get Page' operator?
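import com.rapidminer.RapidMiner;
import com.rapidminer.operator.IOContainer;
import com.rapidminer.operator.Operator;
import com.rapidminer.tools.OperatorService;

RapidMiner.init();
try
{
    // Create the web mining operator and set its parameters
    Operator op = OperatorService.createOperator("web:extract_html_text_content");
    op.setEnabled(true);
    op.setExpanded(true);
    op.setParameter("random_user_agent", "true");
    op.setParameter("url", "http://news.google.com");
    // Run the operator with an empty input container
    IOContainer container = op.apply( new IOContainer() );
}
catch (Exception e)
{
    // Creating and applying an operator can both throw checked exceptions
    e.printStackTrace();
}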
Thank you!
Answers
Two things to note:
1) To be allowed to use RapidMiner in your project, the project needs to be published under the same license as RapidMiner, which is the AGPL.
2) Document is a class from the Text plugin, so you will need to add that plugin to your project as well.
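To illustrate why the Document class has to be on the classpath: a result container is typically queried by class, so the calling code must be able to name that class at compile time. The following is only a minimal, self-contained sketch of that lookup pattern. `ResultObject`, `ResultContainer` and the `Document` defined here are hypothetical stand-ins, not the RapidMiner or Text plugin classes. RapidMiner's IOContainer should offer a comparable class-based `get`, but please check the javadoc of your version before relying on that.

```java
import java.util.ArrayList;
import java.util.List;

public class ResultLookupSketch {
    /** Stand-in for a generic operator result (hypothetical, not a RapidMiner type). */
    interface ResultObject {}

    /** Stand-in for the Text plugin's Document; it only holds the page text. */
    static final class Document implements ResultObject {
        private final String text;
        Document(String text) { this.text = text; }
        String getText() { return text; }
    }

    /** Stand-in for a result container that supports lookup by class. */
    static final class ResultContainer {
        private final List<ResultObject> objects = new ArrayList<>();

        ResultContainer add(ResultObject o) {
            objects.add(o);
            return this;
        }

        /** Returns the first stored object of the requested type, or null if none. */
        <T extends ResultObject> T get(Class<T> type) {
            for (ResultObject o : objects) {
                if (type.isInstance(o)) {
                    return type.cast(o);
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        // Pretend an operator produced this container.
        ResultContainer container =
                new ResultContainer().add(new Document("<html>example</html>"));
        // The lookup names Document.class, so that class must be available.
        Document doc = container.get(Document.class);
        System.out.println(doc.getText());
    }
}
```

In the real program, the stand-in `Document` would be replaced by the class shipped with the Text plugin, which is why that plugin has to be added to the project.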
Also, we have a Development forum for questions like this, so I moved your thread there.
Regards,
Marco