"No files showing up in
Hi All,
I recently started using RapidMiner 5 and want to process a set of .doc files. I have a collection of .doc and .txt files in a folder, so I added the "Process Documents from Files" operator. While adding the directory, I tried to view the files inside it, but no files show up. I also tried a directory containing only a .txt file, and still no files show up. Please guide me.
Answers
Could you post the XML of the process?
regards
Andrew
The parameter "text directories" only allows selecting directories, not single files. So you either have to organize the important files in directories in a reasonable way, or you have to use the "file pattern" parameter to read only certain documents from the specified directory. If you leave the default value as it is (*), all documents will be imported.
Regards
Matthias
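As an illustration of the second option, a fragment like the one below could restrict the import to .txt files. This is only a sketch: the key name file_pattern and the glob-style value are assumptions based on the parameter's displayed name and its default of *, and the class name and directory path are hypothetical.

```xml
<!-- One entry per class: key = class name, value = directory (both hypothetical here) -->
<list key="text_directories">
  <parameter key="my_class" value="C:\data\docs"/>
</list>
<!-- Assumed key for the "file pattern" parameter; the default value is * -->
<parameter key="file_pattern" value="*.txt"/>
```

Both elements would sit inside the "Process Documents from Files" operator element.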
@awchisholm
The following is the XML of my process:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.009">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.1.009" expanded="true" name="Process">
    <process expanded="true" height="-20" width="-50">
      <operator activated="true" class="text:process_document_from_file" compatibility="5.1.002" expanded="true" height="76" name="Process Documents from Files" width="90" x="121" y="116">
        <list key="text_directories"/>
        <process expanded="true">
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
        </process>
      </operator>
      <connect from_op="Process Documents from Files" from_port="example set" to_port="result 1"/>
      <connect from_op="Process Documents from Files" from_port="word list" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
@colo
I'm not trying to access a single file. For a single file, I tried Read Document and was able to read the content. But in my case I need to read all the documents inside a folder. The folder contains .doc, .xls, .xlsx and .txt files.
I even tried a folder containing a single .txt file. In that case the default value (*) should work, but the file inside the folder is still not recognized.
I made a few changes and it works for me. You will need to create two folders, c:\temp\class1 and c:\temp\class2, to put your files in.
regards
Andrew
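The modified XML from this reply is not preserved in the thread. As a rough sketch, assuming the change was to fill the empty text_directories list of the earlier process with one entry per class folder, the relevant fragment might look like the following. The class names class1 and class2 are illustrative and simply mirror the folder names.

```xml
<!-- Replaces the empty <list key="text_directories"/> inside the
     "Process Documents from Files" operator.
     key = class name (illustrative), value = directory holding that class's files -->
<list key="text_directories">
  <parameter key="class1" value="c:\temp\class1"/>
  <parameter key="class2" value="c:\temp\class2"/>
</list>
```

When adapting this, change the key and value of each parameter element to match your own class names and folders.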
I created two folders, each containing one .doc file, and added the "Process Documents from Files" operator. I copied your XML and changed the key and value of the parameter tag. I executed the process but did not get any result.
I got the following problem/warning message at the bottom:
Mandatory input missing at port Process Documents from Files.document1
2 fix options:
Connect to Process Documents from Files.document
Insert operator generating Document...
Location:
Process Documents from Files.document1
The following is the log:
Sep 2, 2011 10:21:21 AM INFO: No filename given for result file, using stdout for logging results!
Sep 2, 2011 10:21:21 AM INFO: Process starts
Sep 2, 2011 10:21:22 AM INFO: Loading initial data.
Sep 2, 2011 10:21:22 AM INFO: Saving results.
Sep 2, 2011 10:21:22 AM INFO: Process finished successfully after 0 s
What still surprises me is why I don't see any files inside the folders. I clicked on "text directories", entered a "class name" and selected a directory. At this stage, if I go further inside the folder, I see no files!
Sorry, this was a silly remark... The workaround of editing the XML directly (see the post from awchisholm) works! I had only forgotten to validate my XML modifications...
Once again, sorry!