"Mining CARs with FP-Growth (Urgent)"
choose_username
Member Posts: 33 Contributor II
Hi there,
I would like to know whether it is possible to mine rules whose right-hand side contains only one specific attribute.
I have a data set of people in which one attribute indicates whether a person earns more or less than 50k dollars.
One class is <=50k and the other is >=50k.
Is it possible to mine association rules with FP-Growth like
Occupation = manager /\ Marital-Status= married => >=50k
This attribute is called 'class'.
For example, with regular expressions?
It is really urgent! If it is possible, what do I have to write in the regular expression field?
Thanks in advance
User
Answers
The table is, for example:
Occupation | Marital-Status | Relationship  | Earning
manager    | married        | husband       | >=50k
Cleaner    | divorced       | Not-in-Family | <=50k
Cleaner    | separated      | Not-in-Family | >=50k
I need association rules like the following two:
Occupation = manager /\ Marital-Status= married /\ Relationship=husband => >=50k
Occupation = Cleaner /\ Marital-Status= divorced /\ Relationship = Not-in-Family => >=50k
The right-hand side must contain only >=50k or <=50k.
_____________________
But with FP-Growth I get:
Occupation = manager /\ Marital-Status= married => Relationship=husband
Occupation = Cleaner /\ Relationship = Not-in-Family => Marital-Status= divorced
Relationship = Not-in-Family => Marital-Status= divorced, Occupation = Cleaner
I get different attributes on the right-hand side, but I need only the earning attribute on the right-hand side.
Greetings
User
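For readers following the thread: rules whose right-hand side is restricted to the class attribute are usually called class association rules (CARs). Below is a minimal brute-force sketch in plain Python, using the three example rows from the post above. It is not FP-Growth and not a RapidMiner operator; the function name mine_class_rules and the thresholds are made up for illustration.

```python
from itertools import combinations

# Toy rows copied from the example table in the post above.
rows = [
    {"Occupation": "manager", "Marital-Status": "married",
     "Relationship": "husband", "Earning": ">=50k"},
    {"Occupation": "Cleaner", "Marital-Status": "divorced",
     "Relationship": "Not-in-Family", "Earning": "<=50k"},
    {"Occupation": "Cleaner", "Marital-Status": "separated",
     "Relationship": "Not-in-Family", "Earning": ">=50k"},
]


def mine_class_rules(rows, class_attr, min_support=0.0, min_confidence=0.0):
    """Hypothetical helper: premises come from the non-class attributes,
    the conclusion is always a value of class_attr."""
    n = len(rows)
    counts = {}  # premise tuple -> {class value: number of matching rows}
    for row in rows:
        items = sorted((a, v) for a, v in row.items() if a != class_attr)
        for size in range(1, len(items) + 1):
            for premise in combinations(items, size):
                per_class = counts.setdefault(premise, {})
                cls = row[class_attr]
                per_class[cls] = per_class.get(cls, 0) + 1

    rules = []
    for premise, per_class in counts.items():
        premise_count = sum(per_class.values())
        for cls, both in per_class.items():
            support = both / n                 # rows matching premise and class
            confidence = both / premise_count  # share of premise rows in class
            if support >= min_support and confidence >= min_confidence:
                rules.append((premise, cls, support, confidence))
    return rules


rules = mine_class_rules(rows, "Earning", min_support=0.3, min_confidence=1.0)
for premise, cls, support, confidence in rules:
    lhs = " /\\ ".join(f"{a} = {v}" for a, v in premise)
    print(f"{lhs} => {cls} (support={support:.2f}, confidence={confidence:.2f})")
```

This enumerates every attribute combination per row, so it only works for a handful of attributes. It is meant to show the shape of the rules being asked for, not a scalable way to mine them.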
There are probably other ways to do this, but here is a way of doing it with a Groovy script. The script converts rules to examples if they contain the 'target' string.
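The Groovy script itself is not reproduced here. As a rough illustration of the filtering idea only, here is a sketch in Python. It assumes rules are plain (premise, conclusion) pairs of strings and that "contain the target" means every conclusion item contains the target string; the function keep_rules_with_target and the 'Earning' item naming are hypothetical.

```python
# Rules as (premise items, conclusion items); the first two are the kind
# of rules reported in the thread, the third is the kind that is wanted.
rules = [
    (["Occupation = manager", "Marital-Status = married"],
     ["Relationship = husband"]),
    (["Occupation = Cleaner", "Relationship = Not-in-Family"],
     ["Marital-Status = divorced"]),
    (["Occupation = manager", "Marital-Status = married",
      "Relationship = husband"],
     ["Earning = >=50k"]),
]


def keep_rules_with_target(rules, target):
    """Keep only rules whose conclusion items all contain the target string."""
    return [
        (premise, conclusion)
        for premise, conclusion in rules
        if conclusion and all(target in item for item in conclusion)
    ]


filtered = keep_rules_with_target(rules, "Earning")
for premise, conclusion in filtered:
    print(" /\\ ".join(premise), "=>", ", ".join(conclusion))

# If the miner never produced a rule with the target on the right-hand
# side, filtering afterwards cannot create one: the result is empty.
print(keep_rules_with_target(rules[:2], "Earning"))
```

Filtering of this kind only selects from rules that were already generated, so it depends on the rule generator producing candidates with the target in the conclusion in the first place.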
And in the target string I should place the name of the attribute that should be on the right-hand side?
Am I correct?
Greetings
User
Script1.groovy:12:expecting anything but "\n", got it anyway @ line 12, column 16: 1 error.
But the Groovy script just takes the association rules and filters them. If no rule is extracted that has my desired right-hand side, then no rule is processed by the Groovy script.
Did I get something wrong?
My problem is that FP-Growth should extract those rules, and it doesn't.
You wish to do classification, using a rule learner.
@ Haddock
Nice script; yes, it runs without errors, and as an added bonus it's fast.
Result
JRIP rules:
===========
(weeks worked in year >= 46) and (dividends from stocks >= 1) and (sex = Male) and (capital gains >= 7688) => label=50000+ (101.0/10.0)
(weeks worked in year >= 49) and (dividends from stocks >= 1) and (sex = Male) and (age >= 35) and (major occupation code = Executive admin and managerial) and (education = Bachelors degree(BA AB BS)) => label=50000+ (62.0/13.0)
(weeks worked in year >= 48) and (dividends from stocks >= 1) and (sex = Male) and (age >= 37) and (major occupation code = Professional specialty) and (instance weight = 1504.5) => label=50000+ (55.0/9.0)
(weeks worked in year >= 49) and (major occupation code = Executive admin and managerial) and (sex = Male) and (education = Masters degree(MA MS MEng MEd MSW MBA)) => label=50000+ (78.0/18.0)
(weeks worked in year >= 46) and (dividends from stocks >= 1) and (sex = Male) and (capital losses >= 1887) => label=50000+ (40.0/11.0)
(weeks worked in year >= 50) and (dividends from stocks >= 1) and (sex = Male) and (num persons worked for employer >= 6) and (wage per hour = 0) and (own business or self employed = 0) and (instance weight >= 1011.69) and (education = Bachelors degree(BA AB BS)) => label=50000+ (38.0/6.0)
(weeks worked in year >= 51) and (major occupation code = Professional specialty) and (sex = Male) and (age >= 32) and (education = Prof school degree (MD DDS DVM LLB JD)) => label=50000+ (48.0/11.0)
(weeks worked in year >= 46) and (capital gains >= 7298) and (capital gains >= 9562) => label=50000+ (77.0/14.0)
(weeks worked in year >= 46) and (major occupation code = Professional specialty) and (education = Doctorate degree(PhD EdD)) => label=50000+ (71.0/29.0)
(weeks worked in year >= 48) and (sex = Male) and (age >= 33) and (education = Bachelors degree(BA AB BS)) and (detailed household and family stat = Spouse of householder) => label=50000+ (37.0/17.0)
(weeks worked in year >= 51) and (age >= 35) and (sex = Male) and (major occupation code = Executive admin and managerial) and (major industry code = Manufacturing-nondurable goods) and (age >= 39) => label=50000+ (20.0/2.0)
(weeks worked in year >= 49) and (dividends from stocks >= 1) and (num persons worked for employer >= 6) and (age >= 35) and (education = Masters degree(MA MS MEng MEd MSW MBA)) and (full or part time employment stat = Children or Armed Forces) => label=50000+ (23.0/7.0)
(weeks worked in year >= 39) and (age >= 35) and (sex = Male) and (num persons worked for employer >= 5) and (education = Bachelors degree(BA AB BS)) and (detailed occupation recode = 2) => label=50000+ (18.0/5.0)
(weeks worked in year >= 46) and (age >= 35) and (sex = Male) and (major occupation code = Professional specialty) and (detailed occupation recode = 4) and (marital stat = Married-civilian spouse present) => label=50000+ (40.0/13.0)
=> label=- 50000 (19292.0/675.0)
Number of Rules : 15
But the problem is that my desired class will not appear in that conclusion overview, because it is very rare in the data set.
The item I wish to conclude on is 'A_B = High'.
Can I use the 'must contain' field in the FP-Growth operator to force the results to contain that attribute?
____________________________________________
These are the itemsets I get from FP-Growth:
Size | Support | Item 1     | Item 2
1    | 0.248   | A_B = High |
2    | 0.953   | A_B = High | Capital_loss
Is the null pointer error coming from the itemset that contains only one item? Does the createAssocRules operator have problems with this?
Is it possible to filter the first row out?
Greetings
User
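On the question of filtering out the first row: a small Python sketch of dropping single-item itemsets and keeping only those that contain the wanted item. The data is copied from the table above; the tuple layout and the function usable_itemsets are assumptions for illustration. This does not show that the one-item itemset causes the error, and it is not a RapidMiner feature.

```python
# (support, items) pairs copied from the itemset table above.
itemsets = [
    (0.248, ("A_B = High",)),
    (0.953, ("A_B = High", "Capital_loss")),
]


def usable_itemsets(itemsets, target_item, min_size=2):
    """Keep itemsets that contain target_item and have at least min_size
    items, so each can be split into a premise and a conclusion."""
    return [
        (support, items)
        for support, items in itemsets
        if len(items) >= min_size and target_item in items
    ]


kept = usable_itemsets(itemsets, "A_B = High")
for support, items in kept:
    premise = [item for item in items if item != "A_B = High"]
    print(", ".join(premise), "=> A_B = High", f"(itemset support {support})")
```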
I think the relation can be identified using a decision tree, with 'Earning' from the first post selected as the label attribute. If there is any relation, it will be reflected in the tree, as Haddock said.
By
Ratheesan
Finding this relation is not association rule learning, but classification.