"Extract decision tree from Bray-Curtis heatmap dendrogram"
I am performing a microbiome study and have already generated (using another program) a heatmap with dendrograms that cluster samples by bacterial genus using Bray-Curtis dissimilarity, but I'd like to get the decision tree. I know RapidMiner has a decision tree model, but it must use k-means, which is different from Bray-Curtis, and I want to preserve the Bray-Curtis clustering. Is it possible to load my dendrogram into RapidMiner and have it extract the Bray-Curtis decision tree? Thank you very much.
Answers
Hi @jamie_slk,
If you are doing clustering analysis with microbiome data, can you please share some test data?
First, the 'tree' on the heatmap may NOT be a 'decision tree'. It is a visualization of your hierarchical clustering model. If you can export the cluster labels from the other program, you can build predictive models (e.g. a decision tree, random forest, or SVM) on them to find the splits and decision rules that separate the clusters.
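A minimal sketch of that second step, assuming you have exported one cluster label per sample from the other program: the code below fits a small Gini-based decision tree (a simplified CART written from scratch, not RapidMiner's Decision Tree operator) on genus abundances and prints the resulting rules. The genus names, abundances, labels, and the helper names `build_tree`, `rules` and `predict` are hypothetical illustrations.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """Return the (feature_index, threshold) that lowers weighted Gini the most, or None."""
    best, best_score = None, gini(labels)
    for f in range(len(rows[0])):
        values = sorted(set(r[f] for r in rows))
        # Candidate thresholds: midpoints between consecutive distinct values
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[f] <= t]
            right = [y for r, y in zip(rows, labels) if r[f] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(rows, labels, depth=0, max_depth=3):
    """Grow the tree recursively; a leaf holds the majority cluster label."""
    split = best_split(rows, labels) if depth < max_depth else None
    if split is None:
        return Counter(labels).most_common(1)[0][0]
    f, t = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= t]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > t]
    return (f, t,
            build_tree([r for r, _ in left], [y for _, y in left], depth + 1, max_depth),
            build_tree([r for r, _ in right], [y for _, y in right], depth + 1, max_depth))

def predict(tree, row):
    """Walk the tree for one sample and return its predicted cluster."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if row[f] <= t else right
    return tree

def rules(tree, names, indent=""):
    """Render the tree as nested if/else rules."""
    if not isinstance(tree, tuple):
        return [f"{indent}-> cluster {tree}"]
    f, t, left, right = tree
    return ([f"{indent}if {names[f]} <= {t:g}:"] + rules(left, names, indent + "  ")
            + [f"{indent}else:  # {names[f]} > {t:g}"] + rules(right, names, indent + "  "))

# Hypothetical toy data: rows = samples, columns = genus abundances
names = ["Bacteroides", "Prevotella", "Faecalibacterium"]
rows = [[40, 2, 10], [35, 5, 12], [38, 3, 8],
        [5, 30, 9], [4, 42, 11], [6, 35, 10]]
labels = ["A", "A", "A", "B", "B", "B"]  # cluster labels exported from the dendrogram

tree = build_tree(rows, labels)
print("\n".join(rules(tree, names)))
```

In RapidMiner itself you would instead give the exported cluster label the label role (e.g. with Set Role) and pass the example set to the Decision Tree operator; the sketch only illustrates what the tree is learning.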
Regarding the dissimilarity measure, would you consider using Jaccard instead of Bray-Curtis? The Jaccard index is computed as 2B/(1+B), where B is the Bray-Curtis dissimilarity [ref]. Bray-Curtis and Jaccard are rank-order similar, but Jaccard is a metric and probably should be preferred over the default Bray-Curtis, which is semimetric [ref]. RapidMiner core has an operator for hierarchical clustering (Agglomerative Clustering) that supports Jaccard similarity on numerical data.
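To make the relationship concrete, here is a small sketch of both measures on two abundance vectors. Bray-Curtis is computed as Σ|x_i − y_i| / Σ(x_i + y_i), and the Jaccard value uses the 2B/(1+B) conversion quoted above. Because 2B/(1+B) increases monotonically with B, ranking sample pairs by either measure gives the same order, which is why the two are rank-order similar. The sample vectors are made up, and the sketch is not claimed to reproduce RapidMiner's JaccardSimilarity measure exactly.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two non-negative abundance vectors:
    sum(|x_i - y_i|) / sum(x_i + y_i). Two all-zero samples are treated as identical."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0

def jaccard_from_bray_curtis(b):
    """Quantitative Jaccard dissimilarity obtained from Bray-Curtis as 2B / (1 + B)."""
    return 2 * b / (1 + b)

# Hypothetical genus counts for two samples
sample1 = [10, 0, 5]
sample2 = [4, 6, 5]

b = bray_curtis(sample1, sample2)
j = jaccard_from_bray_curtis(b)
print(f"Bray-Curtis = {b:.3f}, Jaccard = {j:.3f}")  # Bray-Curtis = 0.400, Jaccard = 0.571
```

The Jaccard value is always at least as large as the Bray-Curtis value on the same pair (both are 0 for identical samples and 1 for samples sharing no genera), so thresholds chosen for one measure do not carry over directly to the other.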
My process uses the peerj32 data from https://peerj.com/articles/32/#supplemental-information
To run it, you need to install the R Scripting extension and the Operator Toolbox extension from the Marketplace.
The process calls R to compute the Bray-Curtis (BC) dissimilarities and the clustering.
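The R code itself is not reproduced here. As a rough, language-neutral sketch of what that step computes, assuming average linkage (UPGMA, a common choice for Bray-Curtis dendrograms; the linkage used in the actual process may differ), the code below builds the pairwise Bray-Curtis matrix, repeatedly merges the two closest clusters until `k` remain, and returns one cluster label per sample. Those labels are what you would then use as the target for a decision tree. The function names and toy abundances are hypothetical.

```python
from itertools import combinations

def bray_curtis(x, y):
    """Bray-Curtis dissimilarity: sum(|x_i - y_i|) / sum(x_i + y_i)."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0

def average_linkage(samples, k):
    """Agglomerative clustering with average linkage (UPGMA) on Bray-Curtis
    dissimilarities. Stops when k clusters remain and returns one label per sample."""
    n = len(samples)
    d = [[bray_curtis(samples[i], samples[j]) for j in range(n)] for i in range(n)]
    clusters = [[i] for i in range(n)]  # start with every sample in its own cluster
    while len(clusters) > k:
        # Find the pair of clusters with the smallest mean pairwise dissimilarity
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            dist = sum(d[i][j] for i in clusters[a] for j in clusters[b]) / (
                len(clusters[a]) * len(clusters[b]))
            if best is None or dist < best[0]:
                best = (dist, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]  # merge b into a (b > a always)
        del clusters[b]
    labels = [0] * n
    for c, members in enumerate(clusters):
        for i in members:
            labels[i] = c
    return labels

# Hypothetical genus abundances for six samples
samples = [[40, 2, 10], [35, 5, 12], [38, 3, 8],
           [5, 30, 9], [4, 42, 11], [6, 35, 10]]
print(average_linkage(samples, k=2))  # [0, 0, 0, 1, 1, 1]
```

Cutting a dendrogram into k groups corresponds to stopping the merges once k clusters are left, which is what the `while len(clusters) > k` loop does; this brute-force version is fine for a handful of samples but scales poorly to large studies.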
Process code:
Cheers,
YY