"Terminology for Data Sampling"

shaihulud Member Posts: 20 Maven
edited May 2019 in Help

Simple question:

I have a scenario where I use cluster analysis to split a set of data into groups of homogeneous entities. Then I extract one entity from each group as a representative. What is the term for that? I would call it something like data sampling, but googling "data sampling" wasn't very successful. For example, the Wikipedia article on "sampling" seems to concentrate on investigating populations and the like.

However, after searching this forum, I think that sampling might be what I am looking for after all. I would appreciate any help with the terminology, and also any recommendations for literature on the topic.
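To make the question concrete, here is a minimal sketch of the second step described above: given data that has already been grouped (by cluster analysis or any other method), keep one representative per group. It assumes numeric feature tuples, Euclidean distance, and that the representative is the member closest to the group mean; the function name `pick_representatives` and the toy data are hypothetical, not anything from RapidMiner.

```python
from collections import defaultdict
from math import dist


def pick_representatives(points, labels):
    """Return {label: point}, choosing per group the member nearest the group mean.

    Assumption: points are equal-length numeric tuples and Euclidean
    distance is meaningful for them. The labels may come from any
    grouping method, not only cluster analysis.
    """
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)

    reps = {}
    for lab, members in groups.items():
        dim = len(members[0])
        # Centroid (mean) of the group; usually not itself a data point.
        centroid = [sum(m[i] for m in members) / len(members) for i in range(dim)]
        # The representative is an actual member: the one closest to the centroid.
        reps[lab] = min(members, key=lambda m: dist(m, centroid))
    return reps


# Hypothetical toy data: two well-separated groups.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.3)]
labels = [0, 0, 0, 1, 1, 1]
print(pick_representatives(points, labels))  # {0: (0.2, 0.1), 1: (5.0, 5.0)}
```

Returning a real member rather than the centroid itself keeps the result usable as an actual entity from the data set, which seems to be what the question asks for.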

  • el_chief Member Posts: 63 Maven
  • shaihulud Member Posts: 20 Maven
    I've already read about cluster sampling, but it seemed to be just a subclass of what I am looking for. First, because it is framed around population data, differentiating by geographical and similar criteria, while my focus is on any kind of data set consisting of objects with attributes. Furthermore, I don't think cluster analysis is the only technique for grouping data. What would be the supertopic of cluster sampling?
  • el_chief Member Posts: 63 Maven
    The word you are looking for may be Stratified or Quota.

    Quota sampling is a subset of stratified sampling, but it makes sure that the proportions of the groups in the sample are similar to their proportions in the population.

  • shaihulud Member Posts: 20 Maven
    OK, this sounds much better, but why is it always about populations?
    Populations are not the only kind of data that needs to be analysed. I am a little bit puzzled about that.
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Another hint might be "prototypes" for each group. At least this is a term that can be used quite generally, everybody gets the idea, and it is frequently used by people working on clustering. Another term describing this might be "relevance vector", from the Relevance Vector Machine, which concentrates on the prototypical points of each class rather than on the points describing the class borders, as Support Vector Machines do.
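The stratified idea from the thread, with each group's share in the sample matching its share in the full data (the property el_chief attributes to quota sampling), can be sketched as proportional allocation. This is a minimal illustration, assuming each record's group is known through a caller-supplied `stratum_of` function and that members are drawn at random within each group; the function name and toy records are hypothetical.

```python
import random
from collections import Counter, defaultdict


def proportional_stratified_sample(records, stratum_of, fraction, seed=0):
    """Draw about `fraction` of each stratum, so group proportions are preserved.

    Assumptions: `stratum_of(record)` returns a sortable group key, and at
    least one record is kept from every group, even a very small one.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    strata = defaultdict(list)
    for r in records:
        strata[stratum_of(r)].append(r)

    sample = []
    for key in sorted(strata):
        members = strata[key]
        k = max(1, round(len(members) * fraction))
        # Simple random sample without replacement inside this stratum.
        sample.extend(rng.sample(members, k))
    return sample


# Hypothetical data: 60 records in group "A", 40 in group "B".
records = [("A", i) for i in range(60)] + [("B", i) for i in range(40)]
sample = proportional_stratified_sample(records, lambda r: r[0], 0.1)
print(dict(Counter(group for group, _ in sample)))  # {'A': 6, 'B': 4}
```

Nothing here depends on the records describing a human population; a stratum is just any attribute-defined group, which bears on the question of why the literature always talks about populations.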
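One common way to make "prototype" concrete is the medoid: the member of a group with the smallest total distance to all other members. It is always an actual data point and is less sensitive to outliers than the mean. The sketch below is an illustration under that assumption (Euclidean distance, a brute-force O(n²) search per group, hypothetical function name); it does not implement the Relevance Vector Machine.

```python
from collections import defaultdict
from math import dist


def medoids(points, labels):
    """Return {label: medoid}, the member minimising summed distance to its group.

    Assumption: Euclidean distance; brute force, fine for small groups only.
    """
    groups = defaultdict(list)
    for p, lab in zip(points, labels):
        groups[lab].append(p)
    return {
        lab: min(members, key=lambda c: sum(dist(c, m) for m in members))
        for lab, members in groups.items()
    }


# Hypothetical toy data: two well-separated groups.
points = [(0.0, 0.0), (0.2, 0.1), (0.1, 0.3), (5.0, 5.0), (5.2, 4.9), (4.8, 5.3)]
labels = [0, 0, 0, 1, 1, 1]
print(medoids(points, labels))  # {0: (0.2, 0.1), 1: (5.0, 5.0)}
```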
