Nominal to binominal in large datasets

dehghan-v · May 2011

Hello, I have a problem preparing my dataset.
I am working with a Tehran traffic transaction database.
The data has these attributes:
ID,HighWayCode,day,AirCondition,TrafficType,Time

The dataset has over 530,000 records.

I have decided to do association rule mining on this dataset, for example with FP-Growth.
To work with this algorithm, the attributes must be converted to binominal.
day, AirCondition, and TrafficType were converted to binominal successfully in RapidMiner,
but RapidMiner crashed when converting HighWayCode to binominal.

My process is: read the data from the database, Select Attributes, Nominal to Binominal, write to the database.

Can anyone help me solve this problem, please?

I should mention that the HighWayCode attribute has 1000 different values.
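For readers unfamiliar with the conversion described in this post, here is a minimal Python sketch of what turning a nominal attribute into binominal attributes amounts to. It is only an illustration, not RapidMiner's implementation: the function `nominal_to_binominal` and the sample highway codes and traffic types are hypothetical.

```python
def nominal_to_binominal(rows, attribute):
    """Replace one nominal column with one true/false column per distinct value."""
    # Collect every distinct value of the attribute (hypothetical helper, not RapidMiner code).
    values = sorted({row[attribute] for row in rows})
    converted = []
    for row in rows:
        # Keep all other columns unchanged.
        new_row = {key: val for key, val in row.items() if key != attribute}
        # Add one boolean column per distinct value.
        for value in values:
            new_row[f"{attribute} = {value}"] = (row[attribute] == value)
        converted.append(new_row)
    return converted


# Made-up sample rows; the real data set is not shown in the thread.
rows = [
    {"ID": 1, "HighWayCode": "101", "TrafficType": "Fluent"},
    {"ID": 2, "HighWayCode": "102", "TrafficType": "Heavy"},
    {"ID": 3, "HighWayCode": "101", "TrafficType": "Fluent"},
]

converted = nominal_to_binominal(rows, "HighWayCode")
print(converted[0])
```

With two distinct codes, each row gains two new columns. The number of new columns grows with the number of distinct values in the attribute.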

Marco_Boeck · May 2011

Hi,

I'm afraid more information is needed to provide any help here.
Please post your process xml (to get that, select the xml tab over your RapidMiner process and copy&paste the contents) and the error message from the log. And - if possible - a sample line of data which leads to the crash would be very useful.

Regards,
Marco

dehghan-v · May 2011

Hi,
When I run this process, memory usage becomes very high and RapidMiner hangs.

Example row: 1,Sunday,12:00-1:00,Fluent,cloudy

This is the process XML:

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
  <context>
    <input>
      <location/>
    </input>
    <output>
      <location/>
    </output>
    <macros/>
  </context>
  <operator activated="true" class="process" expanded="true" name="Process">
    <process expanded="true" height="363" width="827">
      <operator activated="true" class="read_database" expanded="true" height="60" name="Read Database" width="90" x="45" y="30">
        <parameter key="connection" value="1"/>
        <parameter key="query" value="SELECT * FROM dbo.BOZ where id&gt;=250000 and id&lt;550000"/>
      </operator>
      <operator activated="true" class="select_attributes" expanded="true" height="76" name="Select Attributes" width="90" x="182" y="18"/>
      <operator activated="true" class="nominal_to_binominal" expanded="true" height="94" name="Nominal to Binominal" width="90" x="372" y="22">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="CodeBozorgRah"/>
      </operator>
      <operator activated="true" class="write_database" expanded="true" height="60" name="Write Database" width="90" x="645" y="91">
        <parameter key="connection" value="1"/>
        <parameter key="table_name" value="BOZ2"/>
        <parameter key="overwrite_mode" value="overwrite first, append then"/>
      </operator>
      <connect from_op="Read Database" from_port="output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Binominal" to_port="example set input"/>
      <connect from_op="Nominal to Binominal" from_port="example set output" to_op="Write Database" to_port="input"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
    </process>
  </operator>
</process>

Note: CodeBozorgRah is the HighWayCode attribute.

Marco_Boeck · June 2011

Hi,

When you convert a polynominal attribute to binominal, a new attribute is created for each distinct string ("attribute = 1", "attribute = 2", etc.). For large data sets with thousands of different entries, the result will be really large. You can therefore either increase the memory available to RapidMiner (see this) or use a different learning scheme (see the example processes in the samples repository).

Regards,
Marco
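To put rough numbers on the point above, here is a back-of-the-envelope sketch in Python. It rests on assumptions not confirmed in the thread: that HighWayCode has about 1000 distinct values, that the converted table is stored densely, and that each cell takes 8 bytes. The function name `dense_binominal_bytes` is made up for this illustration.

```python
def dense_binominal_bytes(n_rows, n_distinct_values, bytes_per_cell=8):
    """Estimate the size of the new columns if every cell is stored explicitly.

    bytes_per_cell=8 is an assumption (one double per cell), not a documented
    RapidMiner figure.
    """
    return n_rows * n_distinct_values * bytes_per_cell


n_rows = 530_000      # record count quoted in the question
n_distinct = 1_000    # assumed number of distinct HighWayCode values

total_bytes = dense_binominal_bytes(n_rows, n_distinct)
print(f"{n_distinct} new columns, about {total_bytes / 1e9:.2f} GB")

# Only one of the new columns is true in any given row, so a sparse or
# transaction-style representation would need roughly one entry per row.
sparse_entries = n_rows
print(f"true cells: {sparse_entries} out of {n_rows * n_distinct}")
```

Under these assumptions the new columns alone come to a few gigabytes, which is consistent with the memory usage climbing until the program hangs.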

Altair RapidMiner Community