Select column with non-zero value
Hi everybody!
I've calculated TF-IDF with "Process Documents from Data" and got a matrix with one word per column and one document body per row, where each cell contains the TF-IDF value. Now I filter by cluster (created with k-means) and want to see only the columns with non-zero values. I first thought of summing each column's values (with Aggregate) and keeping only the columns whose sum is greater than zero, but I think summing TF-IDF values is a mistake that would distort the whole analysis. Can you please tell me how to keep only the columns with at least one non-zero value?
Thank you so much!
Answers
If you don't want to use that approach, you would need to loop over each cluster, do an Aggregation using the Max function and remove those attributes that have a max value of zero.
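Outside of RapidMiner, the same idea (group by cluster, take the max of each column, drop the columns whose max is zero) could be sketched roughly like this in plain Python. This is only an illustration, not a RapidMiner process: the function name `nonzero_columns_per_cluster`, the `cluster` key, and the toy data are all made up for the example. It also assumes TF-IDF values are never negative, so a max above zero means at least one non-zero value in that column.

```python
from collections import defaultdict


def nonzero_columns_per_cluster(rows, cluster_key="cluster"):
    """For each cluster, return the columns whose maximum value is above zero."""
    # Group the rows (documents) by their cluster label.
    groups = defaultdict(list)
    for row in rows:
        groups[row[cluster_key]].append(row)

    result = {}
    for label, members in groups.items():
        columns = [c for c in members[0] if c != cluster_key]
        # Assumption: TF-IDF values are non-negative, so max > 0 means
        # at least one document in this cluster contains the word.
        result[label] = [c for c in columns if max(m[c] for m in members) > 0]
    return result


# Hypothetical toy data: one dict per document, one key per word.
docs = [
    {"cluster": "cluster_0", "apple": 0.5, "bank": 0.0, "cat": 0.0},
    {"cluster": "cluster_0", "apple": 0.0, "bank": 0.0, "cat": 0.2},
    {"cluster": "cluster_1", "apple": 0.0, "bank": 0.7, "cat": 0.0},
]

kept = nonzero_columns_per_cluster(docs)
print(kept)
```

Using the max instead of the sum avoids adding TF-IDF values together; it only checks whether any document in the cluster has a non-zero weight for the word.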
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Thank you for your answer! I found the cluster centroid output, as you suggested, but I don't really understand the value in each cell. Can you please explain it to me? I've attached a screenshot of my results.
I noticed you have a lot of clusters, which can make interpretation difficult. You should probably think about whether you really need this many distinct clusters. Alternatively, you could try an approach other than k-means, such as LDA.
Dortmund, Germany