"Web mining Rapidminer robot_filter"
Hello all,
I don't know if this is the right place to post my request. I would like to know how you (RapidMiner users who use it as a web usage mining tool) set up your robot_filter file when importing a web log file.
It works when I type just [g|G]oogle into my robot_filter file, for example. However, I don't really want to do that for a thousand different bots...
So I tried to find a list that I can paste into my file. The website http://www.robotstxt.org/db/all.txt offers the robots list for download in .txt format.
But apparently RapidMiner doesn't like it: I got many errors due to bad characters and wrong enclosure...
So what do I have to do to get a proper robots list that RapidMiner can read?
Thank you in advance,
Antoine
Answers
What does RapidMiner complain about in detail? Unfortunately I'm not too familiar with the web mining operators, but I assume the file must consist of regular expressions. If so, you would need to escape the characters that have a special meaning in regular expressions (brackets, parentheses, dots, plus signs and so on); you will find some advice on this on Google.
Greetings,
Sebastian
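For what it's worth, here is a rough sketch of the escaping idea suggested above, written in Python because it is done outside RapidMiner. It rests on two assumptions that are not confirmed in this thread: that the downloaded list stores each bot as "field: value" lines such as "robot-name: ...", and that the robot_filter file expects one regular expression per line. The function name build_robot_patterns and the sample record are made up for illustration.

```python
import re


def build_robot_patterns(db_text, field="robot-name"):
    """Collect the values of `field` and escape them for use as regexes.

    Assumption: each record in the list has lines of the form
    "<field>: <value>". Lines that do not start with the field are ignored.
    """
    prefix = field + ":"
    patterns = set()
    for line in db_text.splitlines():
        if line.lower().startswith(prefix):
            value = line[len(prefix):].strip()
            if value:
                # re.escape puts a backslash in front of every character
                # that has a special meaning in a regular expression.
                patterns.add(re.escape(value))
    return sorted(patterns)


# Hypothetical record, only to show the shape of the input.
sample = """robot-id: examplebot
robot-name: Example Bot (v1.0)
robot-useragent: ExampleBot/1.0 [+http://example.org]
"""

for pattern in build_robot_patterns(sample):
    print(pattern)
```

To build the real file, you would read the downloaded list into a string, call the function on it, and write the returned patterns to your robot_filter file, one per line. If the errors are about bad characters, it may also help to open the download with an explicit encoding and save the result as plain UTF-8 or ASCII. Whether RapidMiner accepts every escape that Python produces is something to check on a few lines first.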