Generate multiple new attribute names for a single attribute
Hello Community:
I want to create a new attribute by recoding several values of an existing attribute. Data set:
StudentID | Major1 |
1 | Studio Arts |
2 | Cinematic Arts |
3 | Museum Studies |
4 | Business |
5 | Creative Writing |
6 | Liberal Studies |
I want to create a new attribute Major2 with Studio Arts, Cinematic Arts, and Museum Studies listed as Arts. I want Business, Creative Writing, and Liberal Studies listed as Nonarts. I wrote six lines in Generate Attributes, but it only performs the last change.
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve GenerateAttributeDataset" width="90" x="112" y="187">
<parameter key="repository_entry" value="//F Drive Repository/NASAD HEADS Survey/GenerateAttributeDataset"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="8.0.001" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="187">
<list key="function_descriptions">
<parameter key="Major2" value="replaceAll(Major1,&quot;Studio Arts&quot;,&quot;Arts&quot;)"/>
<parameter key="Major2" value="replaceAll(Major1,&quot;Cinematic Arts&quot;,&quot;Arts&quot;)"/>
<parameter key="Major2" value="replaceAll(Major1,&quot;Museum Studies&quot;,&quot;Arts&quot;)"/>
<parameter key="Major2" value="replaceAll(Major1,&quot;Business&quot;,&quot;Nonarts&quot;)"/>
<parameter key="Major2" value="replaceAll(Major1,&quot;Creative Writing&quot;,&quot;Nonarts&quot;)"/>
<parameter key="Major2" value="replaceAll(Major1,&quot;Liberal Studies&quot;,&quot;Nonarts&quot;)"/>
</list>
</operator>
<connect from_op="Retrieve GenerateAttributeDataset" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
BUT, if I don't create a new attribute, and just re-code Major1, it works.
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve GenerateAttributeDataset" width="90" x="112" y="187">
<parameter key="repository_entry" value="//F Drive Repository/NASAD HEADS Survey/GenerateAttributeDataset"/>
</operator>
<operator activated="true" class="generate_attributes" compatibility="8.0.001" expanded="true" height="82" name="Generate Attributes" width="90" x="313" y="187">
<list key="function_descriptions">
<parameter key="Major1" value="replaceAll(Major1,&quot;Studio Arts&quot;,&quot;Arts&quot;)"/>
<parameter key="Major1" value="replaceAll(Major1,&quot;Cinematic Arts&quot;,&quot;Arts&quot;)"/>
<parameter key="Major1" value="replaceAll(Major1,&quot;Museum Studies&quot;,&quot;Arts&quot;)"/>
<parameter key="Major1" value="replaceAll(Major1,&quot;Business&quot;,&quot;Nonarts&quot;)"/>
<parameter key="Major1" value="replaceAll(Major1,&quot;Creative Writing&quot;,&quot;Nonarts&quot;)"/>
<parameter key="Major1" value="replaceAll(Major1,&quot;Liberal Studies&quot;,&quot;Nonarts&quot;)"/>
</list>
</operator>
<connect from_op="Retrieve GenerateAttributeDataset" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
<connect from_op="Generate Attributes" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
I'd like to create the new attribute and keep the old one, rather than over-write the old one. How can I do that?
Thanks!
Best Answer
sgenzer (Community Manager)
Hi @Bigblackchair (ok, that's the best username I've seen in a while) - I'd do it with a joined lookup table rather than Generate Attributes:
<?xml version="1.0" encoding="UTF-8"?><process version="8.0.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.0.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve lookup data set" width="90" x="45" y="85">
<parameter key="repository_entry" value="//RapidMiner OneDrive/random community stuff/lookup data set"/>
</operator>
<operator activated="true" class="retrieve" compatibility="8.0.001" expanded="true" height="68" name="Retrieve lookup table" width="90" x="45" y="187">
<parameter key="repository_entry" value="//RapidMiner OneDrive/random community stuff/lookup table"/>
</operator>
<operator activated="true" class="join" compatibility="8.0.001" expanded="true" height="82" name="Join" width="90" x="179" y="85">
<parameter key="join_type" value="left"/>
<parameter key="use_id_attribute_as_key" value="false"/>
<list key="key_attributes">
<parameter key="Major1" value="Major1"/>
</list>
</operator>
<connect from_op="Retrieve lookup data set" from_port="output" to_op="Join" to_port="left"/>
<connect from_op="Retrieve lookup table" from_port="output" to_op="Join" to_port="right"/>
<connect from_op="Join" from_port="join" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
ExampleSets attached.
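For anyone who would rather stay inside Generate Attributes: the first process in the question most likely keeps only the last change because every entry recomputes Major2 from the untouched Major1, so each entry overwrites the result of the one before it. A minimal sketch of a workaround is to read Major1 only in the first entry and Major2 in all later ones. It assumes that Generate Attributes evaluates its entries in order and lets later entries read attributes created by earlier ones, which the in-place Major1 version in the question suggests. This is an untested fragment of the `function_descriptions` list, not a full process:

```xml
<list key="function_descriptions">
  <!-- First entry creates Major2 from Major1 -->
  <parameter key="Major2" value="replaceAll(Major1,&quot;Studio Arts&quot;,&quot;Arts&quot;)"/>
  <!-- Later entries read Major2, so earlier replacements are kept -->
  <parameter key="Major2" value="replaceAll(Major2,&quot;Cinematic Arts&quot;,&quot;Arts&quot;)"/>
  <parameter key="Major2" value="replaceAll(Major2,&quot;Museum Studies&quot;,&quot;Arts&quot;)"/>
  <parameter key="Major2" value="replaceAll(Major2,&quot;Business&quot;,&quot;Nonarts&quot;)"/>
  <parameter key="Major2" value="replaceAll(Major2,&quot;Creative Writing&quot;,&quot;Nonarts&quot;)"/>
  <parameter key="Major2" value="replaceAll(Major2,&quot;Liberal Studies&quot;,&quot;Nonarts&quot;)"/>
</list>
```

With this, Major1 should stay as it is and Major2 should end up as Arts or Nonarts. Keep in mind that replaceAll treats its second argument as a regular expression, which is harmless for these plain names. The lookup table still scales better once the list of majors grows.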
Scott
Answers
Scott: Perfect! Thanks, Bigblackchair