"[SOLVED] Process documents from files"
Hello Everyone.
I am trying to process several documents with the "Process Documents from Files" operator. In the first case, all files were in the same directory and everything worked perfectly. In the second case, the files are inside sub-folders, and I didn't get any results.
After investigating, I am now trying the "Loop Files" operator. In the sub-process of the loop operator I have two more operators:
1. Provide Macro as Log Value
2. Process Documents from Files
I don't get any errors, but I don't get any output either. If I place a breakpoint after "Process Documents from Files", I can see that it processes the first directory correctly, but I still can't get the output.
Here is an example:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.012">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.3.012" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="loop_files" compatibility="5.3.012" expanded="true" height="94" name="Loop Files" width="90" x="246" y="120">
<parameter key="directory" value="C:\Users\ojuarez\httrack\Curacao\www.lacuracaonline.com\guatemala\productos\audio-y-video\televisores"/>
<parameter key="recursive" value="true"/>
<parameter key="iterate_over_subdirs" value="true"/>
<process expanded="true">
<operator activated="true" class="provide_macro_as_log_value" compatibility="5.3.012" expanded="true" height="94" name="Provide Macro as Log Value" width="90" x="112" y="120">
<parameter key="macro_name" value="file_name"/>
</operator>
<operator activated="true" class="text:process_document_from_file" compatibility="5.3.001" expanded="true" height="76" name="Process Documents from Files" width="90" x="179" y="345">
<list key="text_directories">
<parameter key="archivos" value="%{file_path}"/>
</list>
<parameter key="extract_text_only" value="false"/>
<process expanded="true">
<connect from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="log" compatibility="5.3.012" expanded="true" height="76" name="Log" width="90" x="514" y="120">
<parameter key="filename" value="C:\Users\ojuarez\httrack\log_1"/>
<list key="log">
<parameter key="filename" value="operator.Provide Macro as Log Value.value.macro_value"/>
</list>
</operator>
<connect from_op="Provide Macro as Log Value" from_port="through 1" to_op="Log" to_port="through 1"/>
<connect from_op="Provide Macro as Log Value" from_port="through 2" to_op="Process Documents from Files" to_port="word list"/>
<connect from_op="Process Documents from Files" from_port="example set" to_port="out 2"/>
<connect from_op="Log" from_port="through 1" to_port="out 1"/>
<portSpacing port="source_file object" spacing="0"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="252"/>
<portSpacing port="sink_out 3" spacing="0"/>
</process>
</operator>
<connect from_op="Loop Files" from_port="out 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Answers
Please check the process attached. I changed the operator to "Process Documents from Data". For text analysis you should activate some tokenizing inside.
Happy mining,
Frank
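For readers without access to the attachment, here is a rough sketch of what "tokenizing inside" could look like. It is not Frank's attached process. It assumes the Text Processing extension's Tokenize operator (class `text:tokenize`) placed in the inner sub-process of the document-processing operator; the operator name and layout coordinates are illustrative.

```xml
<!-- Sketch only: inner sub-process of the document-processing operator,
     with a Tokenize step between the source and sink document ports.
     Layout values (x, y, height, width) are placeholders. -->
<process expanded="true">
  <operator activated="true" class="text:tokenize" compatibility="5.3.001" expanded="true" height="60" name="Tokenize" width="90" x="112" y="30"/>
  <connect from_port="document" to_op="Tokenize" to_port="document"/>
  <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
  <portSpacing port="source_document" spacing="0"/>
  <portSpacing port="sink_document 1" spacing="0"/>
  <portSpacing port="sink_document 2" spacing="0"/>
</process>
```

Without a tokenizing step, each document passes through as a single unsplit text, which usually does not produce a useful word vector.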
I was missing the Append operator at the top level of the process, after the Loop Files operator. Once I added it, everything worked as expected.
I am really grateful for your help, it was driving me crazy!
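For anyone landing here later, the fix described above might look roughly like the following at the top level of the process. This is a sketch reconstructed from the description, not the poster's final process. It assumes the example sets leave Loop Files through "out 2" (as wired in the original post) and that the Append operator's ports are named "example set 1" and "merged set"; layout coordinates are placeholders.

```xml
<!-- Sketch only: top-level sub-process with Append placed after Loop Files.
     The Loop Files operator itself is unchanged and omitted here. -->
<process expanded="true">
  <!-- ... Loop Files operator as in the original post ... -->
  <operator activated="true" class="append" compatibility="5.3.012" expanded="true" height="76" name="Append" width="90" x="380" y="120"/>
  <!-- "out 2" carries the collection of per-directory example sets -->
  <connect from_op="Loop Files" from_port="out 2" to_op="Append" to_port="example set 1"/>
  <connect from_op="Append" from_port="merged set" to_port="result 1"/>
  <portSpacing port="source_input 1" spacing="0"/>
  <portSpacing port="sink_result 1" spacing="0"/>
  <portSpacing port="sink_result 2" spacing="0"/>
</process>
```

In the original process only "out 1" (the Log pass-through) was connected to a result port, so the example sets produced in each iteration never reached the process output.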