Google Scholar Citation Extraction
sgenzer
Hello RapidMiners -
So today I had the task of extracting and organizing content from a Google Scholar query. Google does a very good job of preventing scraping and crawling, so you have to start "old school": go to each page of your search results and save the HTML as a text file. Once you have the files, you can clean everything up and organize it. I searched for the keyword "rapidminer" (of course), saved the first 100 pages (tedious but not too bad), and then used the attached process to clean it all up. Maybe some of you will find this useful?
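For anyone who wants to see the idea outside RapidMiner, here is a minimal Python sketch of the "clean it all up" step. It is not the attached process, just an illustration under some assumptions: the saved pages are `.txt` files in one folder, each result title sits in an `<h3 class="gs_rt">` element, and the citation count appears as link text of the form "Cited by N". Those markup details are assumptions about Google Scholar's HTML and may change; the function names are made up for this example.

```python
import re
from html.parser import HTMLParser
from pathlib import Path


class ScholarPageParser(HTMLParser):
    """Collect a title and a "Cited by" count for each result on one saved page."""

    def __init__(self):
        super().__init__()
        self.records = []        # one dict per search result
        self._in_title = False   # True while inside <h3 class="gs_rt"> (assumed markup)

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "h3" and "gs_rt" in classes:
            self._in_title = True
            # cited_by stays 0 when no "Cited by N" text follows the title
            self.records.append({"title": "", "cited_by": 0})

    def handle_endtag(self, tag):
        if tag == "h3":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.records[-1]["title"] += data
        elif self.records:
            match = re.fullmatch(r"\s*Cited by (\d+)\s*", data)
            if match:
                self.records[-1]["cited_by"] = int(match.group(1))


def extract_records(html):
    """Return a list of {"title", "cited_by"} dicts for one page of HTML."""
    parser = ScholarPageParser()
    parser.feed(html)
    parser.close()
    for record in parser.records:
        # Collapse runs of whitespace left over from nested tags and line breaks
        record["title"] = " ".join(record["title"].split())
    return parser.records


def extract_from_folder(folder):
    """Run extract_records over every saved .txt page in a folder."""
    records = []
    for path in sorted(Path(folder).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="replace")
        records.extend(extract_records(text))
    return records
```

Calling `extract_from_folder("scholar_pages")` (a hypothetical folder name) would give one flat list of records that you can sort by `cited_by` or write out as CSV.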
Scott
Comments
Would you please give us the XML version of this model?
I had some problems running it in RapidMiner 8.2.001.
Hi @puserc - the XML is in the attachment to the article. An ".rmp" file in RapidMiner is exactly the same as the XML you see.
I know. The problem is that I couldn't run it directly; there are issues with some nodes. That's why I asked for the XML version.
Just open that .rmp file in any text editor, then copy and paste the XML into the RapidMiner XML panel. That should do the trick.
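Since an .rmp file is plain XML, you can also inspect it with a script before pasting it in. The sketch below is a rough illustration, assuming the usual layout where each step is an `<operator>` element carrying `class` and `name` attributes; `list_operators` is a name made up for this example. Listing the operator classes can help spot which nodes your installation does not recognize.

```python
import xml.etree.ElementTree as ET


def list_operators(rmp_xml):
    """Return (class, name) pairs for every <operator> in a process.

    rmp_xml is the content of an .rmp file, as str or bytes.
    """
    root = ET.fromstring(rmp_xml)
    # iter() walks the whole tree, so nested subprocess operators are included
    return [(op.get("class"), op.get("name")) for op in root.iter("operator")]
```

For example, `list_operators(Path("process.rmp").read_bytes())` (with `from pathlib import Path` and a hypothetical file name) would return the operators in document order.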
Hi @sgenzer
I am new to RapidMiner and have the same task: I want to extract Google Scholar citations. I have worked through the beginner-level RapidMiner tutorial. Could you please explain in a little more detail how you built the process, to give me a head start? It would be a great help to me.
I am also keen to learn text mining in depth in RapidMiner for extracting information from published research articles. Can you or anyone else please also recommend some good learning resources?
Thanks in anticipation
Mudassar
19316071@student.westernsydney.edu.au