Iterating through a flat list of *.dcm/*.tag file pairs and applying some processing to each pair
Hi,
I have a flat list of files like this:
IM001.dcm
IM001.tag
IM002.dcm
IM002.tag
...
I'd like to iterate over this list and apply some processing to each *.dcm/*.tag pair, i.e., inside the "Loop Files" operator I'd like to have access to (IM001.dcm, IM001.tag), (IM002.dcm, IM002.tag), etc.
In Python this is easy, but I'd like to learn how to do this kind of file manipulation in RM.
Is this possible?
Ralph
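For comparison, the Python version Ralph mentions could look like the minimal sketch below. It assumes every image has a sibling file with the same stem and a .tag extension, and it skips a .dcm file whose .tag partner is missing; `pair_files` and the "data" directory are hypothetical names.

```python
from pathlib import Path


def pair_files(directory):
    """Return (dcm, tag) path pairs that share a file stem, sorted by name.

    A .dcm file without a matching .tag file is skipped.
    """
    pairs = []
    for dcm in sorted(Path(directory).glob("*.dcm")):
        tag = dcm.with_suffix(".tag")  # IM001.dcm -> IM001.tag
        if tag.exists():
            pairs.append((dcm, tag))
    return pairs


if __name__ == "__main__":
    # "data" is a placeholder directory name
    for dcm, tag in pair_files("data"):
        print(dcm.name, tag.name)
```

Looping over the .dcm files only and deriving the .tag name is the same idea as the Loop Files answer below.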
Best Answers
yyhuang Administrator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist
That is also easy with the 'Loop Files' operator in RapidMiner. Use a regex filter so that only the .dcm files are selected. Inside the loop, a Generate Macro operator derives the corresponding '.tag' file name from the current file name.
With that .tag file name in a macro, you can load the tag file and extract whatever information you need from it.
Because only the .dcm files are iterated, the loop runs n/2 times for n files, instead of n times when looping over every file.
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.1.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="concurrency:loop_files" compatibility="8.1.000" expanded="true" height="82" name="Loop Files" width="90" x="916" y="34">
<parameter key="directory" value="C:\Users\YuanyuanHuang\Documents\RMCommunity\testLoopFiles"/>
<parameter key="filter_type" value="regex"/>
<parameter key="filter_by_regex" value=".*dcm"/>
<parameter key="enable_macros" value="true"/>
<process expanded="true">
<operator activated="true" class="image:read_image" compatibility="7.0.000" expanded="true" height="68" name="Read Image" width="90" x="246" y="34"/>
<operator activated="true" class="generate_macro" compatibility="8.1.000" expanded="true" height="82" name="Generate Macro" width="90" x="380" y="34">
<list key="function_descriptions">
<parameter key="tag_file_name" value="concat(prefix(%{file_name},index(%{file_name},&quot;.&quot;)),&quot;.tag&quot;)"/>
</list>
</operator>
<connect from_port="file object" to_op="Read Image" to_port="file"/>
<connect from_op="Read Image" from_port="output" to_op="Generate Macro" to_port="through 1"/>
<connect from_op="Generate Macro" from_port="through 1" to_port="output 1"/>
<portSpacing port="source_file object" spacing="0"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="Loop Files" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
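The Generate Macro expression above keeps everything before the first "." in the file name and appends ".tag". Here is a minimal Python sketch of the same string logic, assuming RapidMiner's `index()` returns the position of the first match and `prefix()` returns that many leading characters; both function names are hypothetical.

```python
import os


def tag_name_like_macro(file_name):
    """Mimic concat(prefix(name, index(name, ".")), ".tag"): cut at the FIRST dot."""
    dot = file_name.find(".")
    if dot < 0:
        raise ValueError(f"no '.' in {file_name!r}")
    return file_name[:dot] + ".tag"


def tag_name_by_extension(file_name):
    """Replace only the LAST extension, which also handles names like 'IM.001.dcm'."""
    stem, _ext = os.path.splitext(file_name)
    return stem + ".tag"


print(tag_name_like_macro("IM001.dcm"))     # IM001.tag
print(tag_name_like_macro("IM.001.dcm"))    # IM.tag (cut at the first dot)
print(tag_name_by_extension("IM.001.dcm"))  # IM.001.tag
```

For names like IM001.dcm both functions give the same result. The difference only shows up when a file name contains more than one dot.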
yyhuang
@ralph_brecheise the error message says that some extensions are missing. You will need at least the "Operator Toolbox" and "Converters" extensions from the Marketplace.
If you have any questions, please let us know.
Answers
Yes, in the Loop Files operator you can filter by regex: set the filter type to regex and use something like this as your filter: .*\.tag|.*\.dcm
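To see what such a filter selects, here is a small Python check. It assumes the filter must match the whole file name (Python's `re.fullmatch`, like Java's `String.matches`); whether Loop Files behaves exactly this way is an assumption. It also shows why escaping the dot matters: in `.*.tag` the unescaped dot matches any character.

```python
import re

names = ["IM001.dcm", "IM001.tag", "IM002.dcm", "IM002.tag", "notes.txt", "IM003_tag"]

loose = re.compile(r".*.tag|.*.dcm")     # '.' before 'tag' matches any character
strict = re.compile(r".*\.tag|.*\.dcm")  # '\.' requires a literal dot

print([n for n in names if loose.fullmatch(n)])   # also keeps 'IM003_tag'
print([n for n in names if strict.fullmatch(n)])  # only real .tag/.dcm files
```

Note that this filter selects both file types, so the loop visits every file on its own; the pairing still has to be derived inside the loop.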
Hi yyhuang,
Thanks for the quick reply! Your solution makes a lot of sense.
I'll give it a try. Unfortunately, I couldn't load your process because I don't have the "Read Image" operator, but the overall idea is clear to me.
Ralph
I added a second macro called %{dcm_file_name} inside the "Generate Macro" operator, so I should now be able to use both macros in any downstream operator.
However, I'd like to process the dcm/tag pairs using the "Execute Python" operator. Can I access the macro variables from there? The documentation doesn't seem to mention macros.
Ralph
Hi @ralph_brecheise
Thanks for the follow-up! 'Read Image' is an operator from IMMI, the Image Mining extension. Since you used .dcm files, I thought they could be images....
http://www.burgsys.com/image-analysis-software.php
This is the link to a solid Image Mining extension for RapidMiner. Burgsys released it free of charge under the AGPL license.
A tutorial document can be found in the downloaded folder, and you can manually install the downloaded jar file for the IMMI extension by following this link
Cheers,
YY
Hi,
Thanks for the IMMI tip! I'll look into it.
Do you have any suggestions about that pair-wise processing issue I sneaked into my previous message?
Thanks!
Ralph
@ralph_brecheise, the community search is your friend! Lots of pairwise-related posts here: https://community.rapidminer.com/t5/forums/searchpage/tab/message?advanced=false&allow_punctuation=false&q=pairwise
Thanks for the link but not every post that contains the word "pair-wise" addresses my question. I did search but could not find anything specific.
A more concrete suggestion would be appreciated.
Ralph
Hi @ralph_brecheise
The following example shows how to use macros together with your Python scripts. Credit goes to @JEdward.
The script uses evolutionary optimization to search for the best hyperparameter setup for a Python Random Forest.
The whole process is computationally intensive and takes about 5 minutes to finish on my lappie.
Cheers,
YY
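A minimal sketch of the macro part for the dcm/tag case, assuming (as for other RapidMiner parameters) that %{...} macros in the Execute Python script text are replaced with their current values before the script runs, so each Loop Files iteration sees the current file names. `process_pair` and what it computes are hypothetical placeholders, and "parent_path" is a placeholder for whichever macro holds the folder path.

```python
import os


def process_pair(dcm_path, tag_path):
    """Hypothetical per-pair processing: report the image size and the tag line count."""
    with open(dcm_path, "rb") as f:
        dcm_bytes = f.read()
    with open(tag_path, encoding="utf-8", errors="replace") as f:
        tag_lines = f.read().splitlines()
    return {"dcm": os.path.basename(dcm_path), "dcm_size": len(dcm_bytes),
            "tag_lines": len(tag_lines)}


def rm_main(data):
    # Assumption: RapidMiner substitutes these macros in the script text before
    # running it. A raw string keeps Windows backslashes in the folder path intact.
    folder = r"%{parent_path}"
    result = process_pair(os.path.join(folder, "%{dcm_file_name}"),
                          os.path.join(folder, "%{tag_file_name}"))
    print(result)  # placeholder: replace with your own handling of the result
    return data
```

Keeping the per-pair logic in a plain function means it can be tested outside RapidMiner, and only `rm_main` depends on the macro substitution.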
Great, yyhuang! I'll try that approach and give you a heads-up if I get it working.
Ralph
Hi YY
I'm getting the following error when loading the example process XML. Looks like we're on different versions of RM (I'm using 8.1). Any chance you have an example that's more compatible? Or am I missing an extension?
Cheers, Ralph
There are always about six ways to do anything like this in RapidMiner! Here's another approach (N.B. you will need the Operator Toolbox from the Marketplace):
Scott