
load text files

ghina84 Member Posts: 5 Contributor II
edited October 2019 in Help

Good morning everybody,

From the documentation I found on the website, I cannot work out which operator I should use to load a series of text files (.txt or .xml).

Can you help me please?

Thank you,

Laura

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Laura,
    Surprisingly, the operator you need is called "TextInput". In its parameter texts you list the directories the texts are read from. Each listed directory is searched for text files, and each text file becomes one example. Each directory has to contain all the examples of one label, because the directory structure is used to label the data.

    Greetings,
      Sebastian
  • ghina84 Member Posts: 5 Contributor II

    Surprisingly, I already tried it...

    but instead of getting a matrix like this:

    rows=documents
    columns=terms

    I get a matrix like this:

    rows=id
    columns=documents (i.e. each attribute is one ENTIRE document)

    Is this normal?
  • DPierre Member Posts: 1 Learner II
    Where can I find the TextInput operator?
  • sgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Hi @DPierre, this is an old thread. Try downloading the Text Processing extension from the marketplace and then using "Read Document". There is a good set of tutorials on the Academy for this: https://academy.rapidminer.com/courses/text-and-web-mining-with-rapidminer

    Scott
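
For readers who want to see the two ideas from this thread outside of RapidMiner, here is a minimal Python sketch. It is not how TextInput or Read Document work internally; it only illustrates the layout described above (one directory per label, one text file per example) and the matrix Laura expected (rows = documents, columns = terms). The function names `load_labeled_texts` and `document_term_matrix` are made up for this illustration, the tokenizer is a deliberately crude lowercase split on letters and digits, and only .txt files are read.

```python
from collections import Counter
from pathlib import Path
import re


def load_labeled_texts(root):
    """Return (label, text) pairs.

    Assumed layout: each subdirectory of `root` holds the documents of one
    label, and each .txt file inside it is one document (one example).
    """
    docs = []
    for label_dir in sorted(p for p in Path(root).iterdir() if p.is_dir()):
        for path in sorted(label_dir.glob("*.txt")):
            docs.append((label_dir.name, path.read_text(encoding="utf-8")))
    return docs


def document_term_matrix(texts):
    """Build a count matrix with rows = documents and columns = terms."""
    # One Counter of token frequencies per document (crude tokenizer).
    counts = [Counter(re.findall(r"[a-z0-9]+", t.lower())) for t in texts]
    # The vocabulary is the sorted union of all tokens seen in any document.
    vocab = sorted(set().union(*counts))
    # A Counter returns 0 for terms that a document does not contain.
    matrix = [[c[term] for term in vocab] for c in counts]
    return vocab, matrix
```

Calling `load_labeled_texts("some_folder")` on a folder with one subfolder per label and passing the texts to `document_term_matrix` gives one row per document. If each whole document ends up as a single attribute value instead, as in the second matrix above, the texts have been loaded but not yet split into terms.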