Loop operator issues
Hello, everyone
I want to crawl a website with the Loop operator. The website treats the first page as the latest page, but the information I want is in the first five pages of the newest.
Can the Loop operator in RapidMiner iterate backwards?
Best Answers
rfuentealba RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
Hello @rur68,
For the sake of simplicity, I'll be using this url instead of the one you provide:
You are iterating numbers 1, 2 and 3. Right?
Well, with number 1, that URL becomes:
With number 2, that URL becomes:
With number 3, that URL becomes:
If you have a known page number (e.g., 500), then you may use the Generate Macro operator, giving a name to the macro generated (like "calculated_page"), with the following code:
500 + 1 - eval(%{iteration})
That way, with number 1 you will get:
With number 2, that URL becomes:
With number 3, that URL becomes:
However, that is for a known number. If you are dealing with an unknown number (e.g., some 1000 new results come in every day and you want to crawl those), then you might be out of luck (but the people in this community are amazing, they might come up with a solution). In that case I would recommend something not-so-rapidminer-ish like httrack on UNIX machines (Linux, Mac) to grab an updated copy of the site, and then use indexes or other tricks up the sleeve to handle the pages as files.
Word of caution: httrack and other site crawlers might be prohibited in your country, your mileage may vary.
Hope this helps. If I can come up with a better solution, I'll be back to this thread.
All the best,
Rodrigo.
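For anyone who wants to check the arithmetic outside RapidMiner, here is a minimal Python sketch of the same reverse mapping as the Generate Macro expression above (last page + 1 - iteration). The function names `reversed_page` and `reversed_pages` are made up for illustration, and 500 is only the example page count from this thread, not a value taken from any real site.

```python
from typing import List


def reversed_page(iteration: int, last_page: int) -> int:
    """Map a 1-based loop iteration to a page number counted from the end.

    Mirrors the macro expression: last_page + 1 - iteration.
    """
    if not 1 <= iteration <= last_page:
        raise ValueError("iteration must be between 1 and last_page")
    return last_page + 1 - iteration


def reversed_pages(last_page: int, count: int) -> List[int]:
    """Return the page numbers visited by `count` loop iterations."""
    return [reversed_page(i, last_page) for i in range(1, count + 1)]


if __name__ == "__main__":
    # 500 is the assumed known page count from the example above.
    print(reversed_pages(500, 3))  # [500, 499, 498]
```

So iterations 1, 2 and 3 land on pages 500, 499 and 498, which is what the "calculated_page" macro should hold inside the loop if the expression evaluates as intended.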
rfuentealba RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn
Great, glad it helped!
Answers
Here is the example process. I want to get the newest three pages of the website, but the process returns the latest three pages.