Join step in SPM for post-processing
Hi Folks,
I recently ran GSP to identify frequent sequential patterns in a dataset, and I would like to run some post-processing on the results.
I strongly suspect that some of the resulting frequent sequences are nested within parent sequences (e.g., sequence <a, b, c> and sequence <b, c, d> are actually parts of the same sequence <a, b, c, d>).
How can I either 1) visually inspect the resulting patterns to identify which rows of my dataset were included in each frequent sequence, or 2) run a post-processing step that joins daughter sequences so that only the parent sequence remains in the results?
In other words, how do I:
1) Print the results of the GSP analysis so that I can review the rows from my dataset that were identified as part of each frequent sequence, allowing me to manually identify and eliminate subsequences that are part of parent sequences.
2) Run a post-processing step that joins the daughter sequences before running the process I had previously written to check whether the sequences meet the appropriate criteria (support, etc.). This follows advice by Perera and colleagues (2008). (Apparently I am too new to link the article or even leave the URL, so the title of the article is "Clustering and Sequential Pattern Mining of Online Collaborative Learning Data", published in IEEE.) The intention of this join step is to eliminate subsequences from the results so that only parent sequences remain. To quote Perera et al., "A sequence s1 joins with s2 if the subsequence obtained by dropping the first item of s1 is the same as the subsequence obtained by dropping the last item of s2. For example, <a; b; c; d> is a 4-sequence candidate of the 3-sequences <a; b; c> and <b; c; d>." (p. 766).
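To make both goals concrete, here is a rough Python sketch of what I have in mind. It is only an illustration under my own assumptions, not RapidMiner's GSP operator or the authors' code: every function name is hypothetical, each sequence element is treated as a single item (no itemsets), and a pattern counts as contained in a sequence if its items appear in order with gaps allowed.

```python
from typing import List, Optional, Sequence, Tuple

Pattern = Tuple[str, ...]


def join(s1: Pattern, s2: Pattern) -> Optional[Pattern]:
    """Join rule from the quote above: s1 joins with s2 when s1 without
    its first item equals s2 without its last item."""
    if s1[1:] == s2[:-1]:
        return s1 + s2[-1:]
    return None


def is_subsequence(short: Sequence[str], long: Sequence[str]) -> bool:
    """True if the items of `short` appear in `long` in the same order.
    Assumption: gaps between matched items are allowed."""
    remaining = iter(long)
    # `item in remaining` consumes the iterator, which enforces the order.
    return all(item in remaining for item in short)


def maximal_patterns(patterns: Sequence[Pattern]) -> List[Pattern]:
    """Keep only patterns that are not contained in another, longer pattern."""
    unique = set(patterns)
    return sorted(
        p for p in unique
        if not any(p != q and is_subsequence(p, q) for q in unique)
    )


def supporting_rows(pattern: Pattern, sequences: Sequence[Sequence[str]]) -> List[int]:
    """Indices of the input sequences (rows) that contain the pattern."""
    return [i for i, seq in enumerate(sequences) if is_subsequence(pattern, seq)]
```

With these definitions, join(("a", "b", "c"), ("b", "c", "d")) gives ("a", "b", "c", "d"), and supporting_rows lists the rows behind any one pattern for manual review. One caveat I am aware of: a joined sequence is only a candidate, so its support would still need to be recounted against the data before the daughter sequences are dropped.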
Any advice or guidance is appreciated. If RapidMiner is not an appropriate tool for such an analysis, I am happy to receive advice on using R or another tool.
Furthermore, if this question has already been answered elsewhere, please accept my apologies and point me to the appropriate page.
Thank you!
Joel