"Extract decision tree from Bray-Curtis heatmap dendrogram"
I am performing a microbiome study and have already generated (using another program) a heatmap with dendrograms that cluster samples by bacterial genus using Bray-Curtis dissimilarity, but I'd like to get the decision tree. I know RapidMiner has a decision tree model, but it must use k-means, which is different from Bray-Curtis, and I want to preserve the Bray-Curtis clustering. Is it possible to load my dendrogram into RapidMiner and have it extract the Bray-Curtis decision tree? Thank you very much.
Answers
Hi @jamie_slk,
If you are doing clustering analysis with microbiome data, can you please share some test data?
First, the 'tree' from the heatmap may NOT be a 'decision tree'. It is a visualization of your hierarchical clustering model. If you can get the cluster labels out of the other program, you can build predictive models (e.g. decision tree, random forest, or SVM) on those labels to find the splits and decision rules that separate the clusters.
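To illustrate the idea outside RapidMiner, here is a minimal Python sketch, assuming SciPy and scikit-learn are installed. The genus names and counts are made-up toy data, and average linkage with a cut into two clusters is an assumption; use whatever linkage and number of clusters your heatmap program used. It clusters with Bray-Curtis, then fits a decision tree on the resulting cluster labels:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical genus-level counts: rows are samples, columns are genera.
genera = ["Bacteroides", "Prevotella", "Faecalibacterium"]
counts = np.array([
    [90, 5, 5],
    [85, 10, 5],
    [80, 8, 12],
    [10, 80, 10],
    [5, 90, 5],
    [12, 78, 10],
], dtype=float)

# Pairwise Bray-Curtis dissimilarities in condensed form.
bc = pdist(counts, metric="braycurtis")

# Hierarchical clustering on those dissimilarities (average linkage is an assumption).
dendrogram_links = linkage(bc, method="average")

# Cut the dendrogram into two clusters to get one label per sample.
labels = fcluster(dendrogram_links, t=2, criterion="maxclust")

# Fit a decision tree that predicts the cluster label from the genus counts.
clf = DecisionTreeClassifier(random_state=0).fit(counts, labels)

# Print the learned splits as readable rules.
rules = export_text(clf, feature_names=genera)
print(rules)
```

The printed rules describe which genus abundances separate the Bray-Curtis clusters. If you already exported cluster labels from your other program, you can skip the clustering step and fit the tree on those labels directly.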
Regarding the dissimilarity measure, would you consider using Jaccard instead of Bray-Curtis? The Jaccard index is computed as 2B/(1+B), where B is the Bray-Curtis dissimilarity [ref]. The Bray-Curtis and Jaccard indices are rank-order similar, but the Jaccard index is metric and should probably be preferred over the default Bray-Curtis, which is semimetric [ref]. RapidMiner core has an operator for hierarchical clustering (Agglomerative Clustering) that supports Jaccard similarity on numerical data.
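As a quick check of the 2B/(1+B) relation, here is a small pure-Python sketch. The two abundance vectors are made-up, and the function names are only illustrative. It assumes the abundance-based (quantitative) form of Jaccard, computed as 1 - sum(min)/sum(max), and compares it with the value obtained from Bray-Curtis:

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum|u_i - v_i| / sum(u_i + v_i)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


def jaccard_from_bray_curtis(b):
    """Convert a Bray-Curtis dissimilarity B to Jaccard via 2B / (1 + B)."""
    return 2 * b / (1 + b)


def quantitative_jaccard(u, v):
    """Abundance-based Jaccard dissimilarity: 1 - sum(min) / sum(max)."""
    shared = sum(min(a, b) for a, b in zip(u, v))
    total = sum(max(a, b) for a, b in zip(u, v))
    return 1 - shared / total


# Hypothetical genus counts for two samples.
x = [6, 7, 4]
y = [10, 0, 6]

b = bray_curtis(x, y)             # 13/33
j = jaccard_from_bray_curtis(b)   # 13/23
print(b, j, quantitative_jaccard(x, y))
```

Because 2B/(1+B) increases monotonically with B, both measures rank sample pairs in the same order, which is what "rank-order similar" means here.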
My process uses the peerj32 data from https://peerj.com/articles/32/#supplemental-information
You have to install the R scripts extension and the Operator Toolbox extension from the marketplace to run it.
The process will call R to compute the Bray-Curtis dissimilarities and do the clustering.
Process code:
Cheers,
YY