
Website-Content into one cell

ds139 Member Posts: 1 Learner III
edited November 2018 in Help

Hello everyone,

I want to apply text mining methods to song lyrics from a website.

What I have now is:

                                                                               

 Artist        Song             Lyrics
 The Killers   Mr. Brightside   http://lyrics.html

 

What I want is:

 Artist        Song             Lyrics
 The Killers   Mr. Brightside   Coming out of my cage and I'm doing just fine...

 

Do you see what I mean? The lyrics are written inside a <p></p> element, and I want the whole string in one single cell.

I know that I need "Retrieve", "Get Pages" and "Process Documents to Data" (with "Extract Content" inside it), but I don't know how to continue from there.

Which operator puts the content inside the <p> into one cell?

I hope someone can help me, because I need the lyrics for further processing.

Thank you

 

Answers

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    I think you want "Cut Document" rather than (or in addition to) "Extract Content" in this case. First retrieve the pages using "Get Pages", then create your text documents using "Data to Documents". After that you can use "Cut Document" and specify the region of the HTML that you want to extract, using either XPath (if the lyrics are in a named element) or some kind of regex query.
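
For readers who want to check the idea outside RapidMiner, here is a minimal Python sketch of the same extraction step: keep only the text inside <p> elements and join it into one string that fits a single cell. It is not the "Cut Document" operator and not a RapidMiner API. The names `ParagraphText`, `lyrics_from_html` and the `sample` page are hypothetical, and the sketch assumes the lyrics are the only <p> content on the page.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text found inside <p> elements (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # greater than 0 while inside a <p> element
        self.chunks = []    # text fragments in document order

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1
        elif tag == "br" and self.depth:
            # Treat a line break inside the lyrics as a word separator
            self.chunks.append(" ")

    def handle_endtag(self, tag):
        if tag == "p" and self.depth:
            self.depth -= 1
            # Separate consecutive paragraphs with a space
            self.chunks.append(" ")

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def lyrics_from_html(html):
    """Return all <p> text as one whitespace-normalized string."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.chunks).split())


# Assumed sample page; a real lyrics page will have more markup around it
sample = (
    "<html><body><h1>Mr. Brightside</h1>"
    "<p>Coming out of my cage<br>and I'm doing just fine</p>"
    "</body></html>"
)
print(lyrics_from_html(sample))
```

If the page has other <p> elements (navigation, comments), the selection would need to be narrowed, which is what an XPath or regex query would do in "Cut Document".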

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts