
Text Mining

Chris85 Member Posts: 2 Contributor I
edited June 2019 in Help
Hello,

I'm writing my diploma thesis about "structured and unstructured data in Business Intelligence". I need some information about the Text Mining plugin. How many languages are supported? Can I export the analyzed text? And how? As XML, for example? Is there a Text Mining example? How many formats (PDF, XLS) are supported?
Thanks
Best regards
Chris

Answers

  • IngoRM Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    How many languages are supported?
    In principle, every language is supported that a) can be represented by characters at all and b) consists of words that can be detected by some separation character or mechanism. There are, however, some specific operators (preprocessing steps) within the Text Extension that support only a fixed set of languages. One example is stemming, which is supported for German, English, French, Spanish, Portuguese, Italian, Romanian, Dutch, Swedish, Norwegian, Danish, Finnish, Russian, Hungarian, and Turkish. But not every process needs stemming, so most often there is no language restriction at all. That is - in my opinion - one of the major advantages of a statistical approach compared to linguistic approaches.

    Can I export the analyzed text?
    Yes. (Dude! Read our marketing materials!  ;) )

    like a XML?
    In almost every format you can think of. With RapidAnalytics also in XSL which can then again be transformed by XSLT. Or you simply use some of the RapidMiner operators like "Write" which also directly support XML. Or you write into a database. Or...

    How many formats (pdf xls) are supported?
    Again: Read our marketing materials!

    You name the format. I will say 'Yes. This is supported.' In rare cases I would have to answer: 'Huh? Never heard of this one...'  ;D

    Cheers,
    Ingo
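
As a rough illustration of the XML export idea mentioned above, here is a minimal Python sketch that writes word counts to an XML string. It is not RapidMiner's "Write" operator or its output format; the element names `document` and `term` and the function `word_counts_to_xml` are made up for this example.

```python
from collections import Counter
import xml.etree.ElementTree as ET


def word_counts_to_xml(tokens):
    """Serialize word frequencies as a small XML document (hypothetical layout)."""
    root = ET.Element("document")
    # Sort by word so the output order is stable.
    for word, count in sorted(Counter(tokens).items()):
        term = ET.SubElement(root, "term", count=str(count))
        term.text = word
    return ET.tostring(root, encoding="unicode")


xml_text = word_counts_to_xml(["text", "mining", "text"])
print(xml_text)
```

An XML document like this could then be transformed further, for example with XSLT.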
  • Chris85 Member Posts: 2 Contributor I
    Thanks.
    Can I send you my text about RapidMiner so you can give me feedback?
    Thanks
    Best regards
    Chris
  • IngoRM Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi again,

    sure. You can send it here if you like or to contact@rapid-i.com

    Cheers,
    Ingo
  • andrea11 Member Posts: 18 Contributor II
    Ingo Mierswa wrote:

    Hi,

    In principle, every language is supported that a) can be represented by characters at all and b) consists of words that can be detected by some separation character or mechanism. There are, however, some specific operators (preprocessing steps) within the Text Extension that support only a fixed set of languages. One example is stemming, which is supported for German, English, French, Spanish, Portuguese, Italian, Romanian, Dutch, Swedish, Norwegian, Danish, Finnish, Russian, Hungarian, and Turkish. But not every process needs stemming, so most often there is no language restriction at all. That is - in my opinion - one of the major advantages of a statistical approach compared to linguistic approaches.
    Can I ask where to find the language packages? I'm using the Tokenize operator, but I don't know where to find the Italian tokenizer...
    Thanks!
  • colo Member Posts: 236 Maven
    Hi Andrea,

    There is no language restriction for the existing tokenizer. Ingo was referring to the stemmer when he listed the supported languages. The tokenizer should work for Italian text. Do you experience problems when trying to tokenize Italian text?

    Regards
    Matthias
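
To illustrate why tokenization does not depend on the language, here is a minimal Python sketch that splits text at every non-letter character. It only shows the general idea and is not the implementation of RapidMiner's Tokenize operator; the function name `tokenize` and the sample sentence are assumptions made for this example.

```python
import re


def tokenize(text):
    """Return runs of letters; every non-letter character acts as a separator."""
    # [^\W\d_]+ matches one or more Unicode letters
    # (word characters minus digits and the underscore).
    return re.findall(r"[^\W\d_]+", text)


tokens = tokenize("La città è bella, perché no?")
print(tokens)
```

No Italian-specific rule is needed here: accented letters are treated like any other letter, and punctuation and spaces separate the words.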
  • andrea11 Member Posts: 18 Contributor II
    Hi, thanks for your reply!
    I haven't tried it yet, because I have to use the Text:Filter Stopwords (English) operator, which is for English... Maybe there is an Italian version of Text:Filter Stopwords?

    Thanks!
  • colo Member Posts: 236 Maven
    Hi,

    Having a look at the operators, it seems there is no Italian stopword filter. In this case you will probably have to use the "Filter Stopwords (Dictionary)" operator, which allows you to define your own stopwords. Maybe there is a public list of common Italian stopwords available somewhere?

    Regards
    Matthias
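
The dictionary approach can be sketched in a few lines of Python. This is only an illustration of the idea, not how the "Filter Stopwords (Dictionary)" operator is implemented; the function `filter_stopwords` and the tiny word list are assumptions for this example, and the list is far from a complete set of Italian stopwords.

```python
def filter_stopwords(tokens, stopwords):
    """Drop every token that appears in the user-supplied stopword list."""
    # Compare case-insensitively so "La" and "la" are both removed.
    stop = {word.lower() for word in stopwords}
    return [token for token in tokens if token.lower() not in stop]


# Illustrative sample only; a real list would contain many more entries.
sample_stopwords = ["il", "la", "di", "e", "che", "un"]
tokens = ["La", "casa", "di", "Maria", "e", "il", "giardino"]
print(filter_stopwords(tokens, sample_stopwords))
```

In practice the stopword list would usually be read from a file rather than typed into the code.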
  • andrea11 Member Posts: 18 Contributor II
    I was looking for something like that, but I could not find anything... So for now I'll test my project with English text and the English stopword filter. When I have time, I'll see whether it is possible to create a list myself. Thanks!
  • nabilophone11 Member Posts: 11 Contributor II
    Hi everybody,

    Can you please tell me how to insert a list of words (about 600) automatically in RapidMiner 5.1.11? So far I have only found a manual way, using "Generate Attributes".

    Thanks for your help.


    N