"Extracting text from YouTube [SOLVED]"
geschwader
Member Posts: 16 Contributor II
I am trying to use the "Get Pages" operator to extract some text from YouTube pages, but I get the following error:
Everything is fine with my Internet connection and I can watch YT with my browser.
Answers
Can you please post an example process according to http://rapid-i.com/rapidforum/index.php/topic,4654.0.html ?
If I just use the Get Pages operator, everything works fine for me.
Best,
Nils
http://usic.org.ua/upload/8f4ea2c9c93f56c4d624d97b63a81b8c4ebad2f3/Links.csv
Thank you in advance for your help.
There seems to be a problem in your .csv file. The last entry has a whitespace character at the beginning, which is not allowed. If you remove it, the process works fine.
Best,
Nils
It seems you named the attributes using an Eastern European/Cyrillic character set, which gives a very funky "Youtube_extract.csv" at the end :-)
In the last line of your CSV there's the blank space Nils mentioned before the "http://", which caused an error for me too. I also removed the last empty line.
have a nice day!
Alex
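For anyone who wants to clean the link file before it reaches RapidMiner, here is a minimal Python sketch of the two fixes described above (stripping the leading blank before "http://" and dropping the empty last line). The function names and the sample URLs are made up for illustration; it assumes a plain CSV with a header row.

```python
import csv
import io


def clean_link_rows(text):
    """Strip surrounding whitespace from every cell and drop empty rows."""
    cleaned = []
    for row in csv.reader(io.StringIO(text)):
        cells = [cell.strip() for cell in row]
        if any(cells):  # skip rows that are empty after stripping
            cleaned.append(cells)
    return cleaned


def to_csv(rows):
    """Serialize the cleaned rows back to CSV text."""
    out = io.StringIO()
    csv.writer(out, lineterminator="\n").writerows(rows)
    return out.getvalue()


# Placeholder data: the last entry starts with a blank, followed by an empty line.
sample = "Links\nhttp://example.com/a\n http://example.com/b\n\n"
print(to_csv(clean_link_rows(sample)))
```

The cleaned text can then be written back to the .csv file that "Read CSV" loads.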
http://usic.org.ua/upload/2abf7ee83587eab166bf9e956e38b95df362fea4/Links2.csv
And here is what I get:
http://usic.org.ua/upload/e5a607d60d4d88f4ed9a43e678ea03d2b8ba718a/Youtube_extract.csv
No pages extracted.
Could you maybe explain what you're trying to do? :-)
But I simply don't get any pages extracted!
It reads "links_to_check.xls", which contains the raw list of URLs to save.
Then it saves each page as HTML in a directory for further use.
Maybe the coding isn't very elegant, but it works for what I needed.
Some parts of the code were taken from here and there.
Alex
PS : I just started to learn RapidMiner coding 2 days ago. Ask Nils for complicated questions :-)
______________________________________________________
You can try it, it works
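Outside RapidMiner, the same idea (take a list of URLs, save each page as an HTML file in a directory) could be sketched in Python roughly like this. This is not Alex's process, just an illustration using the standard library; the function names, the file naming scheme and the 30-second timeout are my own assumptions.

```python
import pathlib
import urllib.request


def fetch_html(url):
    """Download one page and return its raw bytes (timeout is an arbitrary choice)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def save_pages(urls, out_dir, fetch=fetch_html):
    """Save each URL's content as page_0001.html, page_0002.html, ... in out_dir."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    saved = []
    for index, url in enumerate(urls, start=1):
        target = out / f"page_{index:04d}.html"
        # strip() guards against stray blanks around the URL
        target.write_bytes(fetch(url.strip()))
        saved.append(target)
    return saved
```

The `fetch` argument is only there so that the download step can be swapped out, for example when trying the function without a network connection.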
There is a problem with your process. The "Read CSV" result set has no attribute called "Links2", only an attribute called "Links". Change the "link attribute" parameter of "Get Pages" from Links2 to Links and it should work.
Still, there should be an error instead of just an empty result set.
*edit* With the next update, an error will be thrown if the selected attribute does not exist.
Best,
Nils
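The "fail loudly when the selected attribute is missing" behaviour can be illustrated with a small Python sketch. This is not how RapidMiner implements it; the function name and the sample data are hypothetical, and it assumes a CSV with a header row.

```python
import csv
import io


def read_link_column(text, link_attribute):
    """Return the values of one column, raising if the column does not exist."""
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    if link_attribute not in header:
        # Raise instead of silently returning an empty result.
        raise KeyError(
            f"attribute {link_attribute!r} not found; available: {header}"
        )
    return [row[link_attribute].strip() for row in reader if row[link_attribute]]


# Placeholder data with a single "Links" column.
sample = "Links\nhttp://example.com/a\n"
print(read_link_column(sample, "Links"))
```

Calling `read_link_column(sample, "Links2")` raises a `KeyError` that names the available columns, rather than quietly producing no pages.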
Thank you for your support.