Typical Workflow for Association Analysis / Classification

SunnyLotusFlowe · June 2010

Hi all,

This is my first post in this forum!

I have a general question: which operators are typically used for association analysis, and which are typically used for classification (for preprocessing and so on)? It would be nice to hear about your experiences with that.

greetings

Lotus

IngoRM · June 2010

Hi,

welcome to RapidMiner and this forum.

Well, it is a bit hard to answer this in general since the operators, especially those for preprocessing, will mainly depend on the format of your data. For the actual modeling step, you will find the operators used for association rule mining in "Modeling" - "Association and Itemset Mining" and those for classification learning in "Modeling" - "Classification and Regression".

For preprocessing, things are harder to answer. For association rule mining, the operator "Pivot" often has to be used to transform transaction data into a basket data format. "Nominal to Binominal" is also a hot candidate. For classification learning, it mainly depends on your data format and the capabilities of the learning scheme. Sometimes you have to discretize your data or transform it into a numerical format before a specific learner can be applied. You can find many examples in the Sample Repository of RapidMiner 5 and also with our new Community Extension on myExperiment.org.
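To make the "Nominal to Binominal" idea concrete, here is a minimal Python sketch (not RapidMiner code): every value of a nominal attribute becomes its own true/false attribute, which is the item-style data that association rule mining works on. The helper name `nominal_to_binominal` and the `"attribute = value"` naming of the new attributes are assumptions for illustration; the actual operator's naming and options may differ.

```python
from typing import Dict, List


def nominal_to_binominal(rows: List[Dict[str, str]], attribute: str) -> List[Dict[str, object]]:
    """Replace one nominal attribute by one true/false attribute per distinct value.

    Hypothetical helper; the "<attribute> = <value>" column naming is an assumption.
    """
    values = sorted({row[attribute] for row in rows})  # all distinct nominal values
    result = []
    for row in rows:
        # Copy every other attribute unchanged.
        new_row: Dict[str, object] = {k: v for k, v in row.items() if k != attribute}
        for value in values:
            # True only for the value this example actually has.
            new_row[f"{attribute} = {value}"] = row[attribute] == value
        result.append(new_row)
    return result


data = [{"ID": "1", "color": "red"}, {"ID": "2", "color": "blue"}]
for r in nominal_to_binominal(data, "color"):
    print(r)
```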

Actually, most of the fun in data mining comes from defining the best preprocessing process for your current task. RapidMiner (and its extensions) now provides about 800 different operators for this; we would not do that if they were not necessary from time to time.

In this sense: have fun. Cheers,
Ingo

SunnyLotusFlowe · June 2010

Oh, this sounds useful to me. Thanks a lot for the information!

greetings

Lotus

SunnyLotusFlowe · June 2010

Hi there,
I have looked a little further into the Pivot operator.

Do you mean that (de-)pivoting performs the following transformation? (I just want to be sure that I have understood what you meant.)
The articles are 'A', 'B' and 'C':

ID | Transaction
1 | A,C
2 | B

->

ID | A | B | C
1 | 1 | 0 | 1
2 | 0 | 1 | 0

Is that correct?

greetings

SunnyLotusFlower

IngoRM · June 2010

Hi,

"Is that correct?"

Almost.

If you really have such a comma-separated format, you would not need the Pivot operator but could simply use the operator "Split".
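A rough Python equivalent of what splitting a comma-separated transaction attribute achieves, as an illustration only: `split_transactions` and the column names are hypothetical, and RapidMiner's "Split" operator has its own parameters and output naming. Each row keeps its identity, and every article becomes a 0/1 column.

```python
from typing import Dict, List


def split_transactions(rows: List[Dict[str, str]], id_column: str = "ID",
                       column: str = "Transaction") -> List[Dict[str, object]]:
    """Turn a comma-separated item list into one 0/1 column per item (hypothetical helper)."""
    # Parse "A,C" into ["A", "C"], ignoring stray whitespace and empty entries.
    items_per_row = [[item.strip() for item in row[column].split(",") if item.strip()]
                     for row in rows]
    all_items = sorted({item for items in items_per_row for item in items})
    result = []
    for row, items in zip(rows, items_per_row):
        present = set(items)
        new_row: Dict[str, object] = {id_column: row[id_column]}
        for item in all_items:
            new_row[item] = 1 if item in present else 0
        result.append(new_row)
    return result


data = [{"ID": "1", "Transaction": "A,C"}, {"ID": "2", "Transaction": "B"}]
for r in split_transactions(data):
    print(r)
```

Unlike pivoting, the number of rows stays the same here: one input row yields exactly one output row.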

A real Pivoting would transform the data set:

ID | Transaction
1 | A
1 | C
2 | B

to the data set

ID | A | B | C
1 | 1 | 0 | 1
2 | 0 | 1 | 0

As you can see, the number of examples has also changed, and there may be more than one example per ID before the transformation.
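The pivot described above can be sketched in a few lines of Python, assuming the data arrives as (ID, item) pairs. `pivot_to_basket` is a hypothetical name, and this is not how RapidMiner's "Pivot" operator is implemented or parameterized. Rows are grouped by ID, so three input rows collapse into two output rows.

```python
from typing import Dict, List, Set, Tuple


def pivot_to_basket(rows: List[Tuple[str, str]]) -> List[Dict[str, object]]:
    """Group (ID, item) rows by ID and build one 0/1 row per ID (hypothetical helper)."""
    baskets: Dict[str, Set[str]] = {}  # dicts keep insertion order (Python 3.7+)
    for tid, item in rows:
        baskets.setdefault(tid, set()).add(item)
    all_items = sorted({item for _, item in rows})
    return [
        {"ID": tid, **{item: int(item in items) for item in all_items}}
        for tid, items in baskets.items()
    ]


long_format = [("1", "A"), ("1", "C"), ("2", "B")]
for row in pivot_to_basket(long_format):
    print(row)
```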

Cheers,
Ingo

SunnyLotusFlowe · June 2010

Aha, OK, I understand the underlying idea of pivoting now.

Furthermore, I have read in the literature about mining quantitative association rules. I have seen that RapidMiner supports a lot of discretization techniques, but I don't understand whether all three techniques are supported.

I mean static discretization, dynamic discretization and distance-based discretization.

Discretize by Binning and Discretize by Size should be the static approaches, at least I think so.
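As a sketch of the two fixed-interval rules mentioned here, assuming "Discretize by Binning" means bins of equal width over the value range and "Discretize by Size" means bins that each hold a fixed number of examples, the following Python shows how differently they treat an outlier. Both fix the intervals before any rules are mined, which fits the usual meaning of "static" discretization. The function names are hypothetical, and details such as boundary handling and ties may differ from the actual operators.

```python
from typing import List


def equal_width_bins(values: List[float], number_of_bins: int) -> List[int]:
    """Assign each value to one of `number_of_bins` equal-width intervals over [min, max].

    Hypothetical helper; the maximum value is put into the last bin.
    """
    low, high = min(values), max(values)
    if high == low:
        return [0] * len(values)
    width = (high - low) / number_of_bins
    return [min(int((v - low) / width), number_of_bins - 1) for v in values]


def bins_of_size(values: List[float], size_of_bins: int) -> List[int]:
    """Sort the values and put `size_of_bins` consecutive values into each bin.

    Hypothetical helper; the last bin may be smaller, and ties are not kept together.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, index in enumerate(order):
        bins[index] = rank // size_of_bins
    return bins


values = [1.0, 2.0, 3.0, 4.0, 100.0, 5.0]
print(equal_width_bins(values, 3))  # [0, 0, 0, 0, 2, 0]
print(bins_of_size(values, 2))      # [0, 0, 1, 1, 2, 2]
```

With equal widths, the outlier 100 stretches the range so that most values land in the first bin; fixed-size bins spread the examples evenly instead.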

greetings

SunnyLotusFlower

SunnyLotusFlowe · June 2010

Hello there,

I found the operator Discretize by Entropy. I suppose that it has no use for association rules. Why would I need minimum-entropy intervals when mining association rules?
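One way to see why this operator feels out of place for association rules: entropy-based discretization is supervised. It picks cut points so that the resulting intervals are as pure as possible with respect to a class (label) attribute, so it needs a label to work with. The Python sketch below finds only the single best cut; `best_entropy_cut` is a hypothetical helper, and the real operator may place several cuts recursively with its own stopping criterion.

```python
import math
from collections import Counter
from typing import List, Optional, Sequence, Tuple


def entropy(labels: Sequence[str]) -> float:
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    return sum((n / total) * math.log2(total / n) for n in Counter(labels).values())


def best_entropy_cut(values: List[float], labels: List[str]) -> Optional[Tuple[float, float]]:
    """Return (cut_point, weighted_entropy) of the single binary cut that minimizes
    the size-weighted class entropy of the two resulting intervals."""
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best: Optional[Tuple[float, float]] = None
    for i in range(1, n):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot cut between two equal values
        left = [label for _, label in pairs[:i]]
        right = [label for _, label in pairs[i:]]
        weighted = (i / n) * entropy(left) + ((n - i) / n) * entropy(right)
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2  # midpoint between neighbours
        if best is None or weighted < best[1]:
            best = (cut, weighted)
    return best


ages = [19.0, 23.0, 35.0, 41.0, 52.0, 60.0]
buys = ["no", "no", "no", "yes", "yes", "yes"]
print(best_entropy_cut(ages, buys))  # (38.0, 0.0)
```

Without the `buys` label there is nothing to measure purity against, which is the practical obstacle when mining association rules over unlabeled baskets.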

greetings Lotus

Altair RapidMiner Community