Set of unique strings - ways to organize and structure, group related elements?
Hi. I need to automate a task.
I have a list of strings (where each line is a keyword, i.e. a search query) that goes like this:
cord to connect laptop to tv
how do i connect my laptop to my tv
cable to connect laptop to tv
how to connect laptop to smart tv
connect laptop to tv hdmi windows 10
Each of these strings is unique, in that none of them exactly matches any other, but most of them can be grouped by topic, and most of the topics can be further split into subtopics, and so on. That's what I want to do. I also want as many different ways of grouping as possible, to see all the ways these elements relate to each other. I would like to extract any information that would help to organize and structure this data set.
I already know how to calculate word frequency for my lists in RM. That's a start. I can use the most frequent relevant tokens as topic candidates for manual grouping. But I'm not sure where to go from there if I want to do it automatically. The problem is that all the examples of clustering that I find online deal with documents, not with lists of strings, and I don't think any of those techniques can be used in my case.
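
To make the word-frequency idea concrete, here is a rough sketch in plain Python (outside of RM) of what I do by hand: count how many queries each token appears in, then list the queries under every token that could serve as a topic candidate. The stopword list, the `max_share` cutoff, and the function names are assumptions made up for this illustration, not anything RM provides.

```python
from collections import Counter, defaultdict

# Sample queries taken from the list above.
queries = [
    "cord to connect laptop to tv",
    "how do i connect my laptop to my tv",
    "cable to connect laptop to tv",
    "how to connect laptop to smart tv",
    "connect laptop to tv hdmi windows 10",
]

# Hypothetical stopword list, chosen only for this illustration.
STOPWORDS = {"to", "how", "do", "i", "my"}


def tokenize(query):
    """Lowercase, split on whitespace, and drop stopwords."""
    return [tok for tok in query.lower().split() if tok not in STOPWORDS]


def group_by_token(queries, max_share=0.8):
    """Map each topic-candidate token to the queries that contain it.

    Tokens found in more than `max_share` of all queries are skipped,
    because they do not help split the set into groups.
    """
    # Document frequency: in how many queries does each token occur?
    doc_freq = Counter()
    for query in queries:
        doc_freq.update(set(tokenize(query)))

    groups = defaultdict(list)
    for query in queries:
        for tok in set(tokenize(query)):
            if doc_freq[tok] / len(queries) <= max_share:
                groups[tok].append(query)
    return dict(groups)


groups = group_by_token(queries)
for token, members in sorted(groups.items()):
    print(token, "->", members)
```

A query can land under several tokens at once, which is what I want, since I am after many overlapping ways of grouping rather than a single partition. On the five sample strings, "connect", "laptop" and "tv" occur everywhere and get skipped, so only the distinguishing tokens remain as group labels.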