
How to read documents whose file names are stored in an example set

LeiLei Member Posts: 12 Learner I
I would like to read some document files from a folder, but not all of the files in it. The names of the files to be read are saved in an Excel file.

The Read Document operator can read a file when given its file name. I can use the Read Excel operator to load the list of file names into an example set and get each file name. My question is how to pass a file name obtained from the example set to the Read Document operator.

Can anyone help me with this question?

Thank you very much.

Best Answers

  • BalazsBarany Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Solution Accepted
    Hi!

    I hope I understand your description correctly.

    You have file names in an Excel file. You can load them with Read Excel.
    Then you could use Loop Values on the file name attribute and put Read Document inside the loop.

    The "iteration macro" (loop_value by default) contains the current file name. You can insert the contents of a macro, for example in Generate Attributes, by using the macro syntax %{loop_value}.

    Regards,
    Balázs
  • BalazsBarany Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Solution Accepted
    Hi!

    RapidMiner has different kinds of objects passed around in the process, marked by the color of the connection and the connection ports.

    You connected the incoming input of Loop Values - an example set (data table) - to the file input of Read Document. This won't work. You have the loop_value macro inside the loop so you can use that as the file name. Just enter %{loop_value} as the file parameter of Read Document.

    The output of Read Document is a document object, not an example set. If you want to add an attribute (like the file path with Generate Attributes), you will need to convert the document to an example set first. How to do this depends on your use case. For example, you could use one of the Extract operators.

    Regards,
    Balázs
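
For readers who find it easier to follow the logic in code, the two answers above can be sketched in Python. This is only an analogy and not RapidMiner code: the function name read_listed_documents, the column name file_name, and the use of a CSV file in place of the Excel file are all assumptions made for illustration. The variable loop_value plays the role of the iteration macro, and the resulting list of rows stands in for an example set built from the documents.

```python
import csv
from pathlib import Path


def read_listed_documents(name_table, folder, column="file_name"):
    """Read only the files whose names appear in one column of a table.

    name_table: CSV file holding the file names (stand-in for the Excel file).
    folder:     directory that contains the documents.
    column:     hypothetical name of the column with the file names.
    """
    folder = Path(folder)
    with open(name_table, newline="", encoding="utf-8") as handle:
        names = [row[column] for row in csv.DictReader(handle)]

    rows = []
    # Visit each listed name once; dict.fromkeys drops duplicates
    # while keeping the original order (a choice made for this sketch).
    for loop_value in dict.fromkeys(names):
        # The current value is used directly as the file name,
        # like entering %{loop_value} as the file parameter.
        path = folder / loop_value
        text = path.read_text(encoding="utf-8")  # the "document"
        # A document is not a table row yet: convert it to one
        # before attaching extra columns such as the file path.
        rows.append({"file_path": str(path), "text": text})
    return rows
```

Calling read_listed_documents("names.csv", "docs") would return one row per listed file, with the files not named in the list left untouched.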

Answers

  • LeiLei Member Posts: 12 Learner I
    Hi, Balazs,

    Your answer is very helpful.
    I followed your suggestion, but got a problem alert: "Your connection is producing the wrong type of data. Try changing the starting point of the connection". I think there is another problem in my .rmp file.

    I have uploaded my .rmp file here. Could you help me find the mistake I made?