Beginner's question

kmkim319 Member Posts: 5 Contributor I
edited November 2018 in Help

Hi, I'm learning data mining and how to use RapidMiner these days. My first question is: what are the six techniques for categorization analysis? I know of "decision tree", "rule induction", "k-nearest neighbors", "neural network", and "naive Bayes". That is only five techniques, and I'm not even sure they are correct. My second question is: how can I use the data from http://archive.ics.uci.edu/ml/machine-learning-databases/ to apply the six different techniques? I would be really thankful if you could give me one example with the six techniques using a dataset from that website. I hope to get your reply as soon as possible.

Answers

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Hi there and welcome to the community!

     

    I am not sure exactly what reference you have for the idea that there are exactly 6 types of machine learning approaches to categorization problems. There are many different ways of counting machine learning techniques, so I don't think there are necessarily only 6 out there. Here's a helpful list of 10 different algorithms, for example: https://www.analyticsvidhya.com/blog/2015/08/common-machine-learning-algorithms/

     

    In terms of an actual example process in RapidMiner, there are a host of guided tutorials on the website that you should check out, and they include example processes and data. While they don't use the machine learning databases you reference, they are very helpful for learning how to do things in RapidMiner (and don't miss the link at the top to download the data and the processes): https://rapidminer.com/training/videos/

     

    There is one in particular that you might want to watch regarding the classic "Titanic" survival dataset and how to use RapidMiner to approach it: https://rapidminer.com/resource/rapidminer-advanced-analytics-demonstration/

     

    I hope this is helpful!

     

     

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • kmkim319 Member Posts: 5 Contributor I

    Thank you for your kind reply :-) I really appreciate it.

    I found out what the six techniques for categorization are:

    1. decision tree

    2. rule induction

    3. k-NN

    4. naive Bayes

    5. artificial neural network

    6. support vector machine
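
    To make one entry in this list concrete, here is a minimal Python sketch of k-NN (technique 3). It is not a RapidMiner process and is not taken from the thread: the training rows are a handful of values in the style of iris.data, used purely for illustration, and the function name knn_predict is hypothetical.

```python
import math
from collections import Counter

# Illustrative training rows in the style of iris.data:
# (sepal length, sepal width, petal length, petal width), class label.
TRAIN = [
    ((5.1, 3.5, 1.4, 0.2), "Iris-setosa"),
    ((4.9, 3.0, 1.4, 0.2), "Iris-setosa"),
    ((7.0, 3.2, 4.7, 1.4), "Iris-versicolor"),
    ((6.4, 3.2, 4.5, 1.5), "Iris-versicolor"),
    ((6.3, 3.3, 6.0, 2.5), "Iris-virginica"),
    ((5.8, 2.7, 5.1, 1.9), "Iris-virginica"),
]


def knn_predict(train, point, k=3):
    """Predict a label by majority vote among the k nearest training rows."""
    # Sort rows by Euclidean distance to the query point and keep the closest k.
    nearest = sorted(train, key=lambda row: math.dist(row[0], point))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


prediction = knn_predict(TRAIN, (5.0, 3.4, 1.5, 0.2))
print(prediction)
```

    The other five techniques follow the same pattern of learning from labelled rows and then predicting a label for a new row; only the way the model is built differs.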

     

    Thank you also for the link you gave me. However, my university homework is to use a dataset from http://archive.ics.uci.edu/ml and show how each technique works using RapidMiner. The real problem is that I know which operator I have to use for each technique, but I just can't figure out how to get the data from that link into RapidMiner. If the data there were in CSV or Excel format, I could just use the Read CSV or Read Excel operator, but I can't find those formats at that link.

     

    As you can see in the attached image, none of the files are in CSV or Excel format. In that case, how can I use the dataset in RapidMiner?

    dm.png 44.9K
  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    Actually, the iris.data file in that library is simply a csv file with a different extension. You can download it and then import it using the Read CSV operator as usual. I checked several of the other directories there and the pattern appears to be consistent: files that have a .data extension are really just csv files.

     

    Note that in that version of the iris dataset, there are no column names.  The other file in that same library location called "iris.names" is a text file that supplies the column names for that dataset.  So if you want to use RapidMiner with these exact files then you will need to first import the data from the .data file and then rename the attributes based on their given names.
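
    The import-then-rename step can also be sketched outside RapidMiner. The following is a minimal Python sketch, not a RapidMiner process: the two inline rows stand in for the downloaded iris.data file, and the column names are an assumption that should be checked against iris.names.

```python
import csv
import io

# Assumed column names; verify them against iris.names before relying on them.
COLUMN_NAMES = ["sepal_length", "sepal_width", "petal_length", "petal_width", "class"]

# Two inline rows standing in for the header-less iris.data file.
RAW_DATA = "5.1,3.5,1.4,0.2,Iris-setosa\n4.9,3.0,1.4,0.2,Iris-setosa\n"


def read_headerless_csv(text, column_names):
    """Parse header-less CSV text and attach the given column names to each row."""
    # Passing fieldnames tells DictReader that the first line is data, not a header.
    reader = csv.DictReader(io.StringIO(text), fieldnames=column_names)
    return [row for row in reader]


rows = read_headerless_csv(RAW_DATA, COLUMN_NAMES)
print(rows[0]["class"], rows[0]["sepal_length"])
```

    All values come back as strings here, so numeric columns would still need converting before any modelling.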

     

    But if you want an even easier way to use the famous iris dataset, it is actually included in the sample data that ships with RapidMiner! Take a look at the screenshot: you should be able to find it by going to the "Samples" folder in your repository panel, and then you can load it by dragging it into your process pane.

     

    iris dataset.PNG

  • kmkim319 Member Posts: 5 Contributor I

    Again, really appreciate your help.

    I still don't fully understand how to use the .data format, so I just copied everything from the .data file and pasted it into Excel. I split the values into columns using Excel's text-splitting function, as shown in the attached image called "excel". The good news is that I got the data from that link into RapidMiner using the Read Excel operator.

     

    The problem I'm facing now is shown in the attached image called "problem". What I'm trying to do is shown in "notWorkingLikeThis", and my main goal is shown in "myGoal".

    According to the book ("notWorkingLikeThis"), I'm supposed to select the attributes after setting the attribute filter type to all. However, if you look at "problem", the from attribute and to attribute fields are blank even though I set my attribute filter type to all. I don't know why...

     

    I'll use the Iris dataset with a different technique. Thank you for the advice.

     

    P.S.: Right now, I'm working on a decision tree with the German Credit dataset from the link I showed you.

     

    Attachments: excel.png, myGoal.jpg, notWorkingLikeThis.jpg, problem.png

  • Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn

    I'm glad you got the data imported, but all that cutting and pasting in Excel isn't necessary! What I meant was that if you simply download the .data file and then add .csv to the end of the file name, you will be able to use the "Read CSV" operator to read in the data. This works because the file is in fact a csv file; it won't work if the underlying data is not already in csv format. This would work for the iris dataset or the German credit default dataset.
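
    For anyone who prefers to script the renaming, here is a minimal Python sketch under stated assumptions: the throwaway file stands in for a downloaded iris.data, and the helper names copy_as_csv and guess_delimiter are hypothetical. Some files in that repository may use spaces or another separator rather than commas, so the sketch also checks the first line before you settle on a column separator in Read CSV.

```python
import shutil
import tempfile
from pathlib import Path


def copy_as_csv(data_path: Path) -> Path:
    """Copy e.g. iris.data to iris.data.csv, leaving the original untouched."""
    csv_path = data_path.with_name(data_path.name + ".csv")
    shutil.copyfile(data_path, csv_path)
    return csv_path


def guess_delimiter(first_line: str, candidates=(",", ";", "\t", " ")) -> str:
    """Return the candidate separator that occurs most often in the line."""
    return max(candidates, key=first_line.count)


# Demo with a throwaway file standing in for a downloaded iris.data.
workdir = Path(tempfile.mkdtemp())
source = workdir / "iris.data"
source.write_text("5.1,3.5,1.4,0.2,Iris-setosa\n")

result = copy_as_csv(source)
first_line = result.read_text().splitlines()[0]
delimiter = guess_delimiter(first_line)
print(result.name, repr(delimiter))
```

    Copying rather than renaming keeps the original download intact in case the separator guess turns out to be wrong.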

     

    What book are you using? The "Replace (Dictionary)" operator is designed to systematically replace the string contents of an attribute or set of attributes with those contained in another data set (the "dictionary"). Without seeing your actual process it is difficult to determine what you are trying to do or what the failure is. To use this operator you need to define the dictionary attribute that holds the values to be replaced (from) and the one that holds their replacements (to). If you are following an exercise, there should be additional detail regarding the substitution that is being attempted. The help text also includes much more information about how these operators are designed to be used.
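
    The idea behind a dictionary-based replacement can be sketched in a few lines of Python. This is a simplified illustration, not the operator's actual implementation: it swaps whole values only, and the attribute name, the codes, and the labels are all hypothetical.

```python
def replace_with_dictionary(rows, attribute, dictionary, from_key="from", to_key="to"):
    """Replace values of one attribute using a list of from/to dictionary entries."""
    # Build a lookup from old value to new value out of the dictionary rows.
    lookup = {entry[from_key]: entry[to_key] for entry in dictionary}
    # Values without a dictionary entry are left unchanged.
    return [{**row, attribute: lookup.get(row[attribute], row[attribute])} for row in rows]


# Hypothetical data set with coded values in one attribute.
data = [{"id": 1, "status": "A11"}, {"id": 2, "status": "A12"}, {"id": 3, "status": "A99"}]

# Hypothetical dictionary: "from" holds the old values, "to" holds the replacements.
dictionary = [
    {"from": "A11", "to": "low balance"},
    {"from": "A12", "to": "medium balance"},
]

replaced = replace_with_dictionary(data, "status", dictionary)
print([row["status"] for row in replaced])
```

    The sketch needs both a from column and a to column in the dictionary before it can do anything, which mirrors the two parameters described above.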

     

    Good luck with the rest of your assignment!

     

     

     

     

     

     
