Nominal to Binominal in large data sets
Hello, I have a problem preparing my data set.
I am working with a Tehran traffic transaction database.
The data includes these attributes:
ID, HighWayCode, day, AirCondition, TrafficType, Time
The data set has over 530000 records.
I decided to do association rule mining (ARM) on this data set, for example with FP-Growth.
To work with this algorithm, the attributes must be converted to binominal.
day, AirCondition and TrafficType were successfully converted to binominal in RapidMiner,
but RapidMiner crashed when converting HighWayCode to binominal.
My process is: Read Database -> Select Attributes -> Nominal to Binominal -> Write Database.
Can anyone help me solve this problem, please?
I should mention that the HighWayCode attribute has 1000 distinct values.
Answers
I'm afraid more information is needed to provide any help here.
Please post your process xml (to get that, select the xml tab over your RapidMiner process and copy&paste the contents) and the error message from the log. And - if possible - a sample line of data which leads to the crash would be very useful.
Regards,
Marco
When I run this process, memory usage goes very high and RapidMiner hangs.
Example row: 1, Sunday, 12:00-1:00, Fluent, cloudy
This is the process XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input>
      <location/>
    </input>
    <output>
      <location/>
    </output>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="363" width="827">
      <operator activated="true" class="read_database" expanded="true" height="60" name="Read Database" width="90" x="45" y="30">
        <parameter key="connection" value="1"/>
        <parameter key="query" value="SELECT * FROM dbo.BOZ where id&gt;=250000 and id&lt;550000"/>
      </operator>
      <operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes" width="90" x="182" y="18"/>
      <operator activated="true" class="nominal_to_binominal" expanded="true" height="94" name="Nominal to Binominal" width="90" x="372" y="22">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="CodeBozorgRah"/>
      </operator>
      <operator activated="true" class="write_database" expanded="true" height="60" name="Write Database" width="90" x="645" y="91">
        <parameter key="connection" value="1"/>
        <parameter key="table_name" value="BOZ2"/>
        <parameter key="overwrite_mode" value="overwrite first, append then"/>
      </operator>
      <connect from_op="Read Database" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
      <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Write Database" to_port="input"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>
(CodeBozorgRah is the HighWayCode attribute.)
When you convert a polynominal attribute to binominal, a new attribute is created for each distinct value ("attribute = 1", "attribute = 2", etc.). For large data sets with thousands of distinct values, the result will be really large. You can therefore either increase the memory available to RapidMiner (see this) or use a different learning scheme (see the example processes in the samples repository).
Regards,
Marco
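To make the size of the result concrete, here is a minimal Python sketch of the expansion described above. It is not RapidMiner code and says nothing about how RapidMiner stores data internally; it only mimics the "one new attribute per distinct value" behaviour on a tiny sample and counts the resulting cells. The function names and the sample values "A" and "B" are made up for illustration; the figures 530000 and 1000 come from the question.

```python
from typing import Dict, List


def nominal_to_binominal(rows: List[Dict[str, object]], attribute: str) -> List[Dict[str, object]]:
    """Replace one nominal attribute by one true/false column per distinct value.

    Illustrative only: column names follow the "attribute = value" pattern
    mentioned in the answer above.
    """
    distinct_values = sorted({row[attribute] for row in rows})
    converted = []
    for row in rows:
        # Keep every other attribute unchanged.
        new_row = {key: value for key, value in row.items() if key != attribute}
        # Add one boolean column per distinct value of the nominal attribute.
        for value in distinct_values:
            new_row[f"{attribute} = {value}"] = row[attribute] == value
        converted.append(new_row)
    return converted


def dense_cell_count(n_rows: int, n_distinct_values: int) -> int:
    """Number of new cells created when every row gets one column per value."""
    return n_rows * n_distinct_values


# Hypothetical three-row sample with two distinct highway codes.
sample = [
    {"ID": "1", "HighWayCode": "A"},
    {"ID": "2", "HighWayCode": "B"},
    {"ID": "3", "HighWayCode": "A"},
]
converted = nominal_to_binominal(sample, "HighWayCode")
print(converted[0])

# Figures from the question: over 530000 records, 1000 distinct codes.
print(dense_cell_count(530_000, 1_000))
```

With the numbers from the question this comes to 530 million new cells for HighWayCode alone, which is why the three attributes with only a few distinct values converted without trouble while this one did not.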