Use pdf file name as attribute
Hello everyone
I want to do some simple text mining on PDF files in RapidMiner (RM), but I'm a little stuck right now.
I created a process that uses the Loop Files and Process Documents operators to read in several PDF files.
As I have a lot of files to analyze, which I also want to compare, I would like to create an attribute which includes the file name to keep track of everything.
I enabled macros and tried to include the file name by generating a new attribute.
The problem is that the generated attribute only consists of the file name of the last file I uploaded and not the name of the corresponding document. How can I ensure that the attribute value is the respective file name of the document?
Or is there a way to just include the metadata_file as an attribute?
I included my process and the first 5 files I want to read.
I would really appreciate any help. Thank you in advance!
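To make the symptom concrete, here is a small Python analogy (not RapidMiner code). It assumes, without having seen the attached process, that the attribute is generated after the loop has finished, so the macro still holds the name of the last file. The function names and sample paths are made up for illustration.

```python
from pathlib import Path


def tag_after_loop(paths):
    """Mimics generating the attribute once, after the loop has finished."""
    rows = []
    file_name = None
    for path in paths:
        file_name = Path(path).name  # stand-in for the loop's file-name macro
        rows.append({"text": f"<text of {file_name}>"})
    # The variable now holds only the value from the final iteration.
    for row in rows:
        row["file_name"] = file_name
    return rows


def tag_inside_loop(paths):
    """Mimics generating the attribute inside the loop, once per document."""
    rows = []
    for path in paths:
        file_name = Path(path).name
        rows.append({"text": f"<text of {file_name}>", "file_name": file_name})
    return rows


# Hypothetical sample paths, not the files from this thread.
paths = ["docs/a.pdf", "docs/b.pdf", "docs/c.pdf"]
print([row["file_name"] for row in tag_after_loop(paths)])
print([row["file_name"] for row in tag_inside_loop(paths)])
```

If this assumption matches the process, the first variant corresponds to what is described above (every row carries the last file name), and the second is the behaviour that is wanted.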
Best Answer
jwpfau Employee-RapidMiner, Member Posts: 303 RM Engineering
Hi Veronika,
yes, you can select Process → Synchronize Meta Data with Real Data.
But then you have to run it once to populate the Meta Data.
Greetings,
Jonas
Answers
Couldn't you throw out the surplus metadata attributes with the Select Attributes operator, using these settings?
type: exclude attributes
attribute filter type: subset
select subset: the metadata fields that you don't need
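Outside RapidMiner, the effect of these settings can be sketched in plain Python: drop a chosen subset of columns and keep everything else. This is only an illustration of the idea, not how the operator is implemented. The helper name and the sample values are made up, and the attribute names other than metadata_file are placeholders.

```python
def exclude_attributes(rows, unwanted):
    """Return copies of the rows without the attributes listed in `unwanted`."""
    unwanted = set(unwanted)
    return [
        {name: value for name, value in row.items() if name not in unwanted}
        for row in rows
    ]


# Hypothetical example set: one row per document.
rows = [
    {"text": "first document", "metadata_file": "a.pdf",
     "metadata_path": "/data/a.pdf", "metadata_date": "2020-01-01"},
    {"text": "second document", "metadata_file": "b.pdf",
     "metadata_path": "/data/b.pdf", "metadata_date": "2020-01-02"},
]

# Keep the file name, drop the other metadata fields.
cleaned = exclude_attributes(rows, ["metadata_path", "metadata_date"])
print(cleaned)
```

Here metadata_file stays on every row, so each document remains traceable to its file.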
Greetings,
Jonas
Thank you for your answer!
I'm not sure what exactly you mean, because the metadata attributes don't show up in the Select Attributes operator.
Is there a way to turn metadata into "real" data?
Greetings
Veronika
Thank you very much, now it works!
Greetings
Veronika