Connecting Process Documents WordList with FP-Growth FrequentItemSets

imke Member Posts: 12 Learner III
edited October 2019 in Help

Hello,

I am doing text mining and have a result from Process Documents with the term frequencies and a result from FP-Growth with the itemset frequencies. Say I have a word, cookie, which occurs 150 times in my documents. I get this information from the WordList (Process Documents). For this example I have these frequently occurring itemsets:

Size   Support   Item 1   Item 2
1      0.150     cookie
2      0.051     cookie   milk
2      0.017     cookie   marshmallow
2      0.023     cookie   chocolate
2      0.012     cookie   crackers
2      0.014     cookie   strawberries
2      0.011     cookie   raspberries

Now I want to combine the two tables to find out, for the texts in which cookie is written, in what percentage of them cookie and milk occur together. I would also like to know the absolute frequency with which cookie and milk occur together. At the moment the Support only shows this in relation to the whole dataset...

Thanks for helping

Best Answers

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Solution Accepted

    Hi @imke,

    Isn't this how confidence is defined for association rules? If so, just use Create Association Rules on the FP-Growth result.

     

    For details on the AR-Metrics see: https://community.rapidminer.com/t5/RapidMiner-Studio-Knowledge-Base/How-To-Interpret-the-Results-of-Create-Association-Rules/ta-p/32107

     

    BR,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Solution Accepted

    Hi,

     

    It should be: number of documents that contain token A / number of all documents in the corpus.

     

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
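
The two accepted answers can be combined in a small calculation. The following is a minimal Python sketch, not a RapidMiner process: it takes the supports from the table in the question and applies the standard definition conf(A -> B) = supp(A, B) / supp(A). For cookie -> milk this gives 0.051 / 0.150 = 0.34, i.e. milk appears in 34% of the documents that contain cookie. The corpus size `N_DOCUMENTS = 1000` is an assumption made only for illustration (the thread never states it), and the helper names `confidence` and `absolute_count` are hypothetical.

```python
# Supports copied from the FP-Growth table in the question.
support = {
    frozenset(["cookie"]): 0.150,
    frozenset(["cookie", "milk"]): 0.051,
    frozenset(["cookie", "marshmallow"]): 0.017,
    frozenset(["cookie", "chocolate"]): 0.023,
    frozenset(["cookie", "crackers"]): 0.012,
    frozenset(["cookie", "strawberries"]): 0.014,
    frozenset(["cookie", "raspberries"]): 0.011,
}

# ASSUMPTION: the number of documents is not given in the thread.
N_DOCUMENTS = 1000


def confidence(premise, conclusion):
    """conf(premise -> conclusion) = supp(premise and conclusion) / supp(premise)."""
    premise = frozenset(premise)
    both = premise | frozenset(conclusion)
    return support[both] / support[premise]


def absolute_count(itemset, n_documents=N_DOCUMENTS):
    """Number of documents containing the itemset: support * corpus size."""
    return round(support[frozenset(itemset)] * n_documents)


if __name__ == "__main__":
    partners = ["milk", "marshmallow", "chocolate",
                "crackers", "strawberries", "raspberries"]
    for partner in partners:
        conf = confidence(["cookie"], [partner])
        count = absolute_count(["cookie", partner])
        print(f"cookie -> {partner}: confidence {conf:.1%}, "
              f"about {count} documents (for N = {N_DOCUMENTS})")
```

Note that support counts documents, not word occurrences, so the absolute numbers here are document counts. They match the "150 times" from the WordList only if that figure is the document count rather than the total count.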

Answers

  • imke Member Posts: 12 Learner III

    Hi Martin,

    Actually, I have written that explanation a few times, but I misunderstood it. Thanks for your quick answer, now it is clear. But I have another question now:

    The Help explanation says:

    "The support supp(X) of an itemset X is defined as the proportion of transactions in the data set which contain the itemset."

    Does that mean: 'Document Occurences (WordList)' divided by the 'number of texts'?

    And what is the difference between "Total Occurences" and "Document Occurences" in the WordList from Process Documents?

    Thank you again!

     

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist

    Hi,

     

    it should be #documentsHavingTokenA/AllDocuments

     

    Best,

    Martin

    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • imke Member Posts: 12 Learner III

    Hi @mschmitz

    What do you mean by #documentsHavingTokenA/AllDocuments?

    Is there a hyperlink missing?

    Thanks

    Imke

  • imke Member Posts: 12 Learner III

    Hi @mschmitz,

    thank you!

    I have some more questions about FP-Growth and the confidence:

    When I look at my results, I get more itemsets from FP-Growth than from the Create Association Rules operator, although my min confidence is set to 0.001. Why is that?

    And why does the Create Association Rules operator change the order of the itemsets?

    I had milk -> cookie with FP-Growth and now I have cookie -> milk. Does the Create Association Rules operator choose the order with the higher confidence?

    Thanks a lot!

    Imke
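
The quantities discussed in this thread can be tried out on a toy corpus. The sketch below is plain Python following the textbook definitions, and it is not a description of how the RapidMiner operators are implemented. The five documents and the helper names `support` and `rules_from_itemset` are made up for illustration. It shows that a word's total count can exceed the number of documents containing it, and that support uses the latter. It also shows that an itemset is unordered, so a two-item itemset yields two candidate rules whose confidences can differ, while a single-item itemset yields no rule at all. That last point may account for part of the gap between the number of itemsets and the number of rules.

```python
from collections import Counter
from itertools import combinations

# Hypothetical tokenized documents, invented for this example.
documents = [
    ["cookie", "milk", "cookie"],
    ["cookie", "chocolate"],
    ["milk"],
    ["milk", "cookie"],
    ["milk", "tea"],
]
n_docs = len(documents)

# Total count: every occurrence of the token across the corpus.
total_occurrences = Counter(tok for doc in documents for tok in doc)
# Document count: each document counts at most once per token.
document_occurrences = Counter(tok for doc in documents for tok in set(doc))


def support(itemset):
    """Share of documents that contain every item of the itemset."""
    items = set(itemset)
    return sum(1 for doc in documents if items <= set(doc)) / n_docs


def rules_from_itemset(itemset):
    """All splits of an itemset into a non-empty premise and conclusion."""
    items = sorted(itemset)
    rules = []
    for k in range(1, len(items)):
        for premise in combinations(items, k):
            conclusion = tuple(i for i in items if i not in premise)
            conf = support(items) / support(premise)
            rules.append((premise, conclusion, conf))
    return rules


if __name__ == "__main__":
    print("cookie total:", total_occurrences["cookie"],
          "documents:", document_occurrences["cookie"],
          "support:", support(["cookie"]))
    print("rules from {cookie}:", rules_from_itemset(["cookie"]))
    for premise, conclusion, conf in rules_from_itemset(["cookie", "milk"]):
        print(premise, "->", conclusion, f"confidence {conf:.3f}")
```

Which of the two directions a given tool reports, and in what order, depends on its settings and thresholds. The sketch only shows that both directions come from the same unordered itemset.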
