# Select column with non-zero value

Hi everybody!

I've calculated TF-IDF with "Process Documents from Data" and got a matrix with one word per column and one document body per row, where each cell contains the TF-IDF value. I then filter by cluster (the clusters were created with k-means), and I want to see only the columns that contain non-zero values. My first idea was to sum the values of each column (with Aggregate) and keep only the columns whose sum is greater than zero, but I suspect that summing TF-IDF values is a mistake and would distort the rest of the analysis. Can you please tell me how to keep only the columns with at least one value different from zero?

Thank you so much!


## Answers

If you don't want to use that approach, you would need to loop over each cluster, aggregate with the Max function, and remove the attributes whose maximum value is zero.
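For readers who want to check the logic outside RapidMiner, here is a minimal sketch of the per-cluster Max idea in plain Python. The function name `nonzero_columns_per_cluster`, the `cluster` key, and the toy rows are assumptions made up for illustration; they are not RapidMiner operators or anything from the original process. It also assumes TF-IDF values are never negative, so a column maximum of zero means every value in that column is zero.

```python
from collections import defaultdict


def nonzero_columns_per_cluster(rows, cluster_key="cluster"):
    """Map each cluster label to the term columns whose maximum is non-zero.

    `rows` is a list of dicts: one dict per document, holding the cluster
    label under `cluster_key` and one TF-IDF value per term (hypothetical layout).
    """
    col_max = defaultdict(dict)
    for row in rows:
        label = row[cluster_key]
        for term, value in row.items():
            if term == cluster_key:
                continue
            # Running maximum per (cluster, term); starting at 0.0 is only
            # valid because TF-IDF values are assumed to be non-negative.
            current = col_max[label].get(term, 0.0)
            col_max[label][term] = max(current, value)
    # Keep only the terms whose maximum within the cluster is not zero.
    return {
        label: [term for term, maximum in maxima.items() if maximum != 0]
        for label, maxima in col_max.items()
    }


# Toy data: three documents, two clusters, three terms.
rows = [
    {"cluster": "cluster_0", "apple": 0.5, "banana": 0.0, "cherry": 0.2},
    {"cluster": "cluster_0", "apple": 0.0, "banana": 0.0, "cherry": 0.3},
    {"cluster": "cluster_1", "apple": 0.0, "banana": 0.7, "cherry": 0.0},
]
kept = nonzero_columns_per_cluster(rows)
```

With this toy data, `kept["cluster_0"]` contains `apple` and `cherry`, while `kept["cluster_1"]` contains only `banana`. Using the maximum instead of the sum leaves the TF-IDF values themselves untouched; the aggregate serves only as a keep-or-drop test for each column.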

Lindon Ventures

Data Science Consulting from Certified RapidMiner Experts

Thank you for your answer! I found the cluster centroid output, as you suggested, but I don't really understand the value in each cell. Can you explain it to me, please? I attach a screenshot of my results.

I noticed you have a lot of clusters. This can make interpretation difficult, so you should probably also consider whether you really need this many distinct clusters. Alternatively, you could try an approach other than k-means, such as LDA.

