Typical Workflow for Association Analysis / Classification
SunnyLotusFlower
Member Posts: 37 Contributor II
in Help
Hi all,
this is my first post in this forum!
I have a general question: I want to know which operators are typically used in association analysis and which operators are typically used in classification (for preprocessing and so on). It would be nice to hear about some experiences with that.
greetings
Lotus
Answers
Welcome to RapidMiner and this forum.
Well, it is a bit hard to answer this in general since the operators, especially those for preprocessing, will mainly depend on the format of your data. For the actual modeling step, you will find the operators used for association rule mining in "Modeling" - "Association and Itemset Mining" and those for classification learning in "Modeling" - "Classification and Regression".
For preprocessing, things are harder to answer. For association rule mining, the operator "Pivot" often has to be used to transform transaction data into a basket data format. "Nominal to Binominal" is also a likely candidate. For classification learning, it mainly depends on your data format and the capabilities of the learning scheme. Sometimes you have to discretize your data or transform it into a numerical format before a specific learner can be applied. You can find many examples in the Sample Repository of RapidMiner 5 and also with our new Community Extension on myExperiment.org.
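To make the "Nominal to Binominal" idea concrete, here is a minimal plain-Python sketch of that kind of transformation. It is not RapidMiner code: the function name `nominal_to_binominal`, the column naming scheme "attribute = value", and the sample data are assumptions made only for illustration.

```python
def nominal_to_binominal(examples, attribute):
    """Replace one nominal attribute by one True/False column per value.

    Hypothetical helper for illustration; not a RapidMiner API.
    """
    # All distinct values of the nominal attribute, in a stable order.
    values = sorted({example[attribute] for example in examples})
    result = []
    for example in examples:
        # Copy every attribute except the one being converted.
        new_example = {k: v for k, v in example.items() if k != attribute}
        # Add one two-valued column per nominal value.
        for value in values:
            new_example[f"{attribute} = {value}"] = example[attribute] == value
        result.append(new_example)
    return result


# Made-up sample data.
examples = [
    {"ID": 1, "color": "red"},
    {"ID": 2, "color": "blue"},
    {"ID": 3, "color": "red"},
]
converted = nominal_to_binominal(examples, "color")
for row in converted:
    print(row)
```

Each resulting column has only two possible values, which is the shape that frequent itemset mining usually expects.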
Actually, most of the fun in data mining comes from finding the best preprocessing for the task at hand. RapidMiner (and its extensions) now provides about 800 different operators for this; we would not do that if they were not necessary from time to time.
In this sense: have fun. Cheers,
Ingo
greetings
Lotus
I have looked a little further into the Pivot operator.
You mean that (de-)pivoting performs the following transformation (I just want to be sure that I have understood what you meant):
Articles are 'A', 'B' and 'C'
ID | Transaction   ->   ID | A | B | C
1 | A,C            ->   1 | 1 | 0 | 1
2 | B              ->   2 | 0 | 1 | 0
Is that correct?
greetings
SunnyLotusFlower
If you really have such a comma-separated format, you would not need the Pivot operator but could simply use the operator "Split".
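As a rough illustration of the splitting case (plain Python, not the RapidMiner "Split" operator itself), the sketch below turns comma-separated transaction strings into one 0/1 column per item. The function name `split_transactions` and its parameters are hypothetical.

```python
def split_transactions(rows, sep=","):
    """Turn (id, "A,C") rows into one 0/1 column per item.

    Hypothetical helper for illustration; not a RapidMiner API.
    """
    # Parse each separated string into a set of item names.
    parsed = [
        (row_id, {item.strip() for item in text.split(sep) if item.strip()})
        for row_id, text in rows
    ]
    # Every item that occurs anywhere becomes a column, in sorted order.
    items = sorted(set().union(*(item_set for _, item_set in parsed)))
    table = [
        {"ID": row_id, **{item: int(item in item_set) for item in items}}
        for row_id, item_set in parsed
    ]
    return items, table


rows = [(1, "A,C"), (2, "B")]
items, table = split_transactions(rows)
for line in table:
    print(line)
```

Here the number of rows stays the same, because each ID already occupies exactly one row.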
A real Pivoting would transform the data set:
ID | Transaction
1 | A
1 | C
2 | B
to the data set
ID | A | B | C
1 | 1 | 0 | 1
2 | 0 | 1 | 0
As you can see, the number of examples has also changed, and there might be more than one example per ID before the transformation.
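The same pivoting step can be sketched in a few lines of plain Python. This is only an illustration of the transformation described above, not how the RapidMiner operator is implemented; the function name `pivot_to_baskets` is an assumption.

```python
from collections import defaultdict


def pivot_to_baskets(transactions):
    """Pivot (id, item) pairs into one row per id with a 0/1 column per item.

    Hypothetical helper for illustration; not a RapidMiner API.
    """
    # Group the items by transaction ID.
    baskets = defaultdict(set)
    for trans_id, item in transactions:
        baskets[trans_id].add(item)
    # Every distinct item becomes a column.
    items = sorted({item for _, item in transactions})
    return [
        {"ID": trans_id, **{item: int(item in baskets[trans_id]) for item in items}}
        for trans_id in sorted(baskets)
    ]


long_format = [(1, "A"), (1, "C"), (2, "B")]
basket_rows = pivot_to_baskets(long_format)
for row in basket_rows:
    print(row)
```

Three input rows collapse into two output rows, one per ID, which matches the change in the number of examples noted above.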
Cheers,
Ingo
Furthermore, in the literature I have read about mining quantitative association rules. I have seen that RapidMiner supports a lot of discretization techniques, but I am not sure whether all three techniques are supported.
I mean static discretization, dynamic discretization, and distance-based discretization.
Discretize by Binning and Discretize by Size should be the static approaches, at least I think so.
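For reference, the two static approaches could look roughly like this in plain Python. The sketch assumes that "binning" means equal-width ranges and that "size" means a fixed number of examples per bin; those readings, the function names, and the sample values are assumptions and are not confirmed in this thread.

```python
def discretize_by_binning(values, n_bins):
    """Equal-width bins: split [min, max] into n_bins ranges of the same width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        # All values are identical, so everything falls into one bin.
        return [0 for _ in values]
    # The maximum would land just past the last bin, so clamp it.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def discretize_by_size(values, bin_size):
    """Bins that each hold bin_size examples, filled in sorted order."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    bins = [0] * len(values)
    for rank, index in enumerate(order):
        bins[index] = rank // bin_size
    return bins


# Made-up sample values.
ages = [18, 21, 22, 25, 40, 61]
width_bins = discretize_by_binning(ages, 3)
size_bins = discretize_by_size(ages, 2)
print(width_bins)
print(size_bins)
```

Both functions fix the bin boundaries from the attribute values alone, before any rule mining starts, which is what makes them static.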
greetings
SunnyLotusFlower
I found the operator Discretize by Entropy. I suppose that it is of no use for association rule mining. Why would I need minimum-entropy intervals when mining association rules?
greetings Lotus