How to get the index of a particular Example Set?
Hey Guys,
I am new to RapidMiner. My process follows these steps:
1. I parse a PDF file using the ReadPDF option.
2. I store the resulting IOObject in the repository.
3. I retrieve the IOObject from the repository.
4. The tables in the PDF are stored as Example Sets.
5. Using the Select option, I read the Example Set with index 13.
6. Using a filter, I filter the data in that Example Set.
I want to provide the index to the Select option dynamically, based on a condition: select the Example Set with the most matching attributes.
I am a Java developer. Please let me know whether this can be done in Java or in some other way.
Please guide me on how to go about this.
Your help is greatly appreciated.
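The selection rule described above (pick the table with the most matching attributes) can be sketched in plain Java. This is only an illustration under assumptions: it presumes the attribute names of each table have already been extracted as strings, it uses no RapidMiner API, and the class name `BestTableSelector`, the method `indexOfBestMatch`, and the sample attribute names in `main` are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class BestTableSelector {

    /**
     * Returns the 0-based index of the table whose attribute names overlap
     * most with the wanted names, or -1 if no table shares any wanted name.
     * Ties go to the earliest table.
     */
    static int indexOfBestMatch(List<List<String>> tableAttributes, Set<String> wanted) {
        int bestIndex = -1;
        int bestCount = 0;
        for (int i = 0; i < tableAttributes.size(); i++) {
            Set<String> names = new HashSet<>(tableAttributes.get(i));
            names.retainAll(wanted); // keep only the names we are looking for
            if (names.size() > bestCount) {
                bestCount = names.size();
                bestIndex = i;
            }
        }
        return bestIndex;
    }

    public static void main(String[] args) {
        // Hypothetical attribute names, one inner list per extracted table.
        List<List<String>> tables = Arrays.asList(
                Arrays.asList("Lot", "Date"),
                Arrays.asList("Stress Test", "Test Condition", "Result"),
                Arrays.asList("Stress Test", "Notes"));
        Set<String> wanted = new HashSet<>(Arrays.asList("Stress Test", "Test Condition"));
        System.out.println(indexOfBestMatch(tables, wanted)); // prints 1
    }
}
```

The returned index is 0-based. If it is passed on to the Select option, check whether that option counts from 0 or from 1 and adjust accordingly.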
Best Answer
sgenzer, Community Manager
ok thx @jaya_darne - I just put a bunch of Handle Exceptions in. It's probably not the most elegant solution but it gets the job done.
<?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
<parameter key="logfile" value="C:\Users\1026445\.RapidMiner\repositories\Local Repository\logs\08_09_2017_logs.log"/>
<parameter key="resultfile" value="D:\1026445\Project Internal\CISCO- Jaya\Rapid Miner\L1_test.res"/>
<process expanded="true">
<operator activated="true" class="pdf_table_extraction:pdf2exampleset_operator" compatibility="0.1.004" expanded="true" height="68" name="Read PDF Table" width="90" x="45" y="34">
<parameter key="filename" value="/Users/sgenzer/Desktop/Package Qual report R25070FQ.pdf"/>
</operator>
<operator activated="true" class="loop_collection" compatibility="7.6.001" expanded="true" height="82" name="Loop Collection" width="90" x="179" y="34">
<process expanded="true">
<operator activated="true" class="handle_exception" compatibility="7.6.001" expanded="true" height="82" name="Handle Exception" width="90" x="45" y="34">
<process expanded="true">
<operator activated="true" class="branch" compatibility="7.6.001" expanded="true" height="82" name="Branch (3)" width="90" x="45" y="34">
<parameter key="condition_type" value="attribute_available"/>
<parameter key="condition_value" value="Test Condition"/>
<parameter key="expression" value="if(contains([Stress Test],&quot;Convection&quot;),TRUE,FALSE)"/>
<process expanded="true">
<connect from_port="condition" to_port="input 1"/>
<portSpacing port="source_condition" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_input 1" spacing="0"/>
<portSpacing port="sink_input 2" spacing="0"/>
</process>
<process expanded="true">
<portSpacing port="source_condition" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_input 1" spacing="0"/>
<portSpacing port="sink_input 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">Test Condition exists</description>
</operator>
<connect from_port="in 1" to_op="Branch (3)" to_port="condition"/>
<connect from_op="Branch (3)" from_port="input 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<process expanded="true">
<connect from_port="in 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="handle_exception" compatibility="7.6.001" expanded="true" height="82" name="Handle Exception (2)" width="90" x="179" y="34">
<process expanded="true">
<operator activated="true" class="branch" compatibility="7.6.001" expanded="true" height="82" name="Branch (4)" width="90" x="112" y="34">
<parameter key="condition_type" value="attribute_available"/>
<parameter key="condition_value" value="Stress Test"/>
<parameter key="expression" value="if(contains(Stress,&quot;Convection&quot;),TRUE,FALSE)"/>
<process expanded="true">
<connect from_port="condition" to_port="input 1"/>
<portSpacing port="source_condition" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_input 1" spacing="0"/>
<portSpacing port="sink_input 2" spacing="0"/>
</process>
<process expanded="true">
<portSpacing port="source_condition" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_input 1" spacing="0"/>
<portSpacing port="sink_input 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">Stress Test exists</description>
</operator>
<connect from_port="in 1" to_op="Branch (4)" to_port="condition"/>
<connect from_op="Branch (4)" from_port="input 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<process expanded="true">
<connect from_port="in 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="handle_exception" compatibility="7.6.001" expanded="true" height="82" name="Handle Exception (3)" width="90" x="313" y="34">
<process expanded="true">
<operator activated="true" class="filter_examples" compatibility="7.6.001" expanded="true" height="103" name="Filter Examples" width="90" x="179" y="34">
<list key="filters_list">
<parameter key="filters_entry_key" value="Stress Test.contains.Convection S"/>
<parameter key="filters_entry_key" value="Stress Test.contains.Temperature"/>
<parameter key="filters_entry_key" value="Stress Test.contains.High Temperature"/>
</list>
<parameter key="filters_logic_and" value="false"/>
<description align="center" color="transparent" colored="false" width="126">Stress Test contains Convection S, Temperature, or High Temperature</description>
</operator>
<connect from_port="in 1" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<process expanded="true">
<connect from_port="in 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="handle_exception" compatibility="7.6.001" expanded="true" height="82" name="Handle Exception (4)" width="90" x="447" y="34">
<process expanded="true">
<operator activated="true" class="branch" compatibility="7.6.001" expanded="true" height="82" name="Branch (5)" width="90" x="112" y="34">
<parameter key="condition_type" value="min_examples"/>
<parameter key="condition_value" value="1"/>
<process expanded="true">
<connect from_port="condition" to_port="input 1"/>
<portSpacing port="source_condition" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_input 1" spacing="0"/>
<portSpacing port="sink_input 2" spacing="0"/>
</process>
<process expanded="true">
<portSpacing port="source_condition" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_input 1" spacing="0"/>
<portSpacing port="sink_input 2" spacing="0"/>
</process>
<description align="center" color="transparent" colored="false" width="126">Number of Examples is &#8805;1</description>
</operator>
<connect from_port="in 1" to_op="Branch (5)" to_port="condition"/>
<connect from_op="Branch (5)" from_port="input 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
<process expanded="true">
<connect from_port="in 1" to_port="out 1"/>
<portSpacing port="source_in 1" spacing="0"/>
<portSpacing port="source_in 2" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
</process>
</operator>
<connect from_port="single" to_op="Handle Exception" to_port="in 1"/>
<connect from_op="Handle Exception" from_port="out 1" to_op="Handle Exception (2)" to_port="in 1"/>
<connect from_op="Handle Exception (2)" from_port="out 1" to_op="Handle Exception (3)" to_port="in 1"/>
<connect from_op="Handle Exception (3)" from_port="out 1" to_op="Handle Exception (4)" to_port="in 1"/>
<connect from_op="Handle Exception (4)" from_port="out 1" to_port="output 1"/>
<portSpacing port="source_single" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="Read PDF Table" from_port="collection of pdf data tables as example sets" to_op="Loop Collection" to_port="collection"/>
<connect from_op="Loop Collection" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Scott
Answers
hello @jaya_darne - welcome to the RapidMiner user community. We are happy you are here.
I'm not really sure what you mean by "Using select option i read the Example Set with index 13". Can you help me understand a little better what you're trying to do? What would also be helpful is if you paste your RapidMiner XML process in a reply to this thread using the </> feature in the toolbar. Attaching the PDF would also be helpful if possible.
Thanks.
Scott
I want to write a solution in Java that uses this process to read multiple PDF files that are similar in nature. After reading a PDF, the process should pick the Example Set dynamically, based on the attributes in the tables.
ah ok @jaya_darne I think I understand. It is hard to test without the PDF but perhaps you can look at this and tweak it for your purposes? The secret here is to use the Loop Collection operator.
Scott
Hey, thank you so much. This is a great breakthrough. Can we select on more than one attribute, like Stress Test and Test Condition? Currently we are only using Stress Test.
Also, can we filter on the row values? Like Stress Test having row values
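For the Java side of this question, filtering on row values can be sketched in plain Java as well. This is a minimal illustration, not RapidMiner code: rows are modeled as maps from attribute name to cell text, a row is kept if the chosen column contains any of the keywords, and rows that lack the column are skipped rather than causing an error. The class name `RowFilterSketch`, the method `filterRows`, and the sample rows in `main` are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Map;

final class RowFilterSketch {

    /** Keeps rows whose value in the given column contains at least one keyword. */
    static List<Map<String, String>> filterRows(List<Map<String, String>> rows,
                                                String column,
                                                List<String> keywords) {
        List<Map<String, String>> kept = new ArrayList<>();
        for (Map<String, String> row : rows) {
            String value = row.get(column);
            if (value == null) {
                continue; // column missing in this row: skip it instead of failing
            }
            for (String keyword : keywords) {
                if (value.contains(keyword)) {
                    kept.add(row);
                    break; // one matching keyword is enough
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Invented sample rows, for illustration only.
        List<Map<String, String>> rows = new ArrayList<>();
        rows.add(Collections.singletonMap("Stress Test", "High Temperature Storage"));
        rows.add(Collections.singletonMap("Stress Test", "Vibration"));
        rows.add(Collections.singletonMap("Lot", "A1"));
        List<Map<String, String>> kept =
                filterRows(rows, "Stress Test", Arrays.asList("Convection", "Temperature"));
        System.out.println(kept.size()); // prints 1
    }
}
```

An empty result is a normal outcome here, so callers should check the size of the returned list before processing it further.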
hmm sure. How is this?
Scott
Hey, I am getting an error when I try to run yesterday's solution. I had tried the same thing on my own and got the same error. I am using RM 7.6.001. Attaching the errors.
hello @jaya_darne - ok. Not too surprised. It's not happy because the Filter Examples expects examples even when we send it none. I really cannot test it myself further without the PDF in hand. I need to see how it cycles.
Scott
Hey, attaching the PDF file for your reference.
Hey, thank you so much for the solution!
If I need to use this IOObject in Java for further processing, how do I access the object?
I am aware that there is an RM jar and that the process function is used to run the process we define.
hello @jaya_darne - glad it worked for you. Don't know the answer to your Java question. Hope someone else can chime in.
Scott
Suppose I want to extract two tables from the PDF, one on page 2 and the other on page 3, using a process similar to the one above. How can we achieve this?