Generating a data set for testing
Hello,
I'm a computer engineering student and new to data science. What I want is fairly simple in concept, but I haven't found the right operators for it yet, or maybe I have and don't know how to use them. So here we go:
1. I have 22 attributes. The first 2 are just strings; the other 20 should be numeric values (reals, as in the process below) that vary from 0.2 to 2.8, depending on the attribute.
2. Is there a way to generate a value that depends on what was generated before? An example explains it better: say one example has attribute 1 generated as 1.4, which is 0.4 above the average for that attribute. Then the next one, attribute 2, would be generated as 0.9, that is, its own average of 0.5 plus the difference carried over from the previous attribute (0.5 + 0.4). That would make the generation pseudo-random.
<?xml version="1.0" encoding="UTF-8"?><process version="8.1.001">
<operator activated="true" class="generate_data_user_specification" compatibility="8.1.001" expanded="true" height="68" name="Generate Data by User Specification" width="90" x="313" y="136">
<list key="attribute_values">
<parameter key="ID" value="NOMINAL"/>
<parameter key="Name" value="NOMINAL"/>
<parameter key="P1" value="REAL"/>
<parameter key="P2" value="REAL"/>
<parameter key="P3" value="REAL"/>
<parameter key="P4" value="REAL"/>
<parameter key="P5" value="REAL"/>
<parameter key="P6" value="REAL"/>
<parameter key="P7" value="REAL"/>
<parameter key="P8" value="REAL"/>
<parameter key="P9" value="REAL"/>
<parameter key="P10" value="REAL"/>
<parameter key="P11" value="REAL"/>
<parameter key="P12" value="REAL"/>
<parameter key="P13" value="REAL"/>
<parameter key="P14" value="REAL"/>
<parameter key="P15" value="REAL"/>
<parameter key="P16" value="REAL"/>
<parameter key="P17" value="REAL"/>
<parameter key="P18" value="REAL"/>
<parameter key="P19" value="REAL"/>
<parameter key="P20" value="REAL"/>
</list>
<list key="set_additional_roles">
<parameter key="ID" value="id"/>
<parameter key="Name" value="label"/>
</list>
</operator>
</process>
I am definitely doing something wrong :smileysad:
Best Answer
kypexin RapidMiner Certified Analyst, Member Posts: 291 Unicorn
Hi @pettudor
2.Is there a way to generate with dependency on what was generate before, need an example to explain better,
lets say we have one example with attribute 1 that generated 1.4 that's, 0.4 above average for that specific attribute
I am a bit confused with the description.
The answer to the first part is yes: the 'Generate Attributes' operator lets you construct new attributes from existing ones, and that's pretty easy. You can even do some aggregations first, so that new attributes are based not only on existing values but also on aggregates such as mean, median, sum, etc.
The second part is confusing, though. You say the first attribute would have the value 1.4 for some example, but what exactly is this value based on? You need either to generate the first attribute pseudo-randomly or to base its values on existing data.
Could you please clarify?
Answers
hi @pettudor welcome to the community. So first I want to say CONGRATULATIONS - you're the first "newbie" I have seen in a long while who actually read the directions and posted their XML process with their first post.
So, back to your question: I'm not sure whether you have 22 attributes from your own data set or want to create 22 attributes from random data. If it's the former, just use the "Add Data" wizard in the Repository panel and go through the steps.
If you want to create random data, use the "Generate Data" operator rather than "Generate Data by User Specification".
By default it creates six attributes: five "regular" attributes of real numbers and one "label" attribute of real numbers.
You can then modify these with other operators to make them strings, integers, etc.
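For anyone who wants to see the target shape of the data outside RapidMiner, here is a rough sketch in plain Python. It is not a RapidMiner process, just an illustration under assumptions: the uniform distribution, the 100 rows, the one-decimal rounding, and the `id_`/`name_` string values are all made up for the example. Only the 22 attribute names and the 0.2 to 2.8 range come from the question.

```python
import random

rng = random.Random(42)   # fixed seed so the sketch is repeatable
LOW, HIGH = 0.2, 2.8      # range quoted in the question
N_EXAMPLES = 100          # assumed number of rows

def make_row(index):
    """Build one example: two string attributes plus P1..P20 as reals."""
    row = {"ID": f"id_{index}", "Name": f"name_{index}"}  # placeholder strings
    for k in range(1, 21):
        # Uniform draw is an assumption; the question gives only the range.
        row[f"P{k}"] = round(rng.uniform(LOW, HIGH), 1)
    return row

rows = [make_row(i) for i in range(1, N_EXAMPLES + 1)]
print(rows[0])
```

Each row is a dictionary with 22 keys, which could then be written to CSV and imported through the "Add Data" wizard if that is more convenient.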
Let me know if that makes sense.
Scott
So after generating one attribute with 100 random examples, I just used the Generate Attributes operator, gave it a dependency formula, and Bob's your uncle, I have what I want.
Added the code. Such an easy task in retrospect :catfrustrated:
I must thank you all for your patience in reading this mess of a post. Have a great day.
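For readers who land here later, the idea that resolved this thread (one randomly generated attribute, then a formula that carries its offset from the average over to the next attribute) can be sketched in plain Python. This is not the poster's actual process or formula, which is not shown in the thread. The function names, the uniform draw, and the spread of 0.8 are assumptions; the averages 1.0 and 0.5 are taken from the example in the question.

```python
import random

def dependent_value(prev_value, prev_mean, this_mean):
    """This attribute's mean plus the previous attribute's offset from its mean."""
    return round(this_mean + (prev_value - prev_mean), 2)

def generate_pair(n_examples, mean_first, mean_second, spread, seed=0):
    """Generate a random first attribute and a second one that depends on it."""
    rng = random.Random(seed)
    # First attribute: random around its mean (uniform spread is an assumption).
    first = [round(rng.uniform(mean_first - spread, mean_first + spread), 2)
             for _ in range(n_examples)]
    # Second attribute: derived from the first, so it is not independently random.
    second = [dependent_value(v, mean_first, mean_second) for v in first]
    return first, second

# The worked example from the question: 1.4 is 0.4 above 1.0, so 0.5 + 0.4.
print(dependent_value(1.4, prev_mean=1.0, this_mean=0.5))  # 0.9

first, second = generate_pair(100, mean_first=1.0, mean_second=0.5, spread=0.8)
```

Chaining the same function across P1 to P20 would give every attribute the same offset from its own average, which matches the description in the question.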