Is a decision tree classification modelling or prediction modelling?

koknayaya Member Posts: 20 Contributor I
edited June 2020 in Help
Hi, I really hope someone can explain this to me.

Currently I'm doing a project to classify crimes according to Premises, Place, and Time.

I'm using the normal Decision Tree operator and the W-J48 pruned tree operator. The accuracy is 70%.

However, I'm confused about my project title. Am I applying a classification technique that uses decision tree prediction?

Is a decision tree classification modelling or prediction modelling?

Is this correct for my project? --->
Technique: classification technique
Modelling: prediction modelling

Answers

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hi @koknayaya

    From what I understand, you have a dataset whose output labels (crime types) depend on attributes such as place and time. So, first of all, this is a supervised learning problem, specifically classification. Are you using cross validation on your dataset? If not, you can try 5-fold cross validation. Also, look at performance metrics such as Kappa, RMSE, and AUC alongside accuracy, as together these tell you more than accuracy alone.

    The main difference between classification and prediction is this: if you already know the outputs of all the samples, you are just trying to classify them. If you have a new sample that doesn't have an output label and you are trying to predict it, then it's a prediction.

    Thanks 
    Varun
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing
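
To make the metrics and the 5-fold suggestion in the reply above concrete, here is a minimal sketch in plain Python. It is not what RapidMiner runs internally. The label values are made up for illustration, and the function names (`accuracy`, `cohen_kappa`, `k_fold_indices`) are hypothetical helpers, not operators or library calls.

```python
from collections import Counter

def accuracy(actual, predicted):
    """Fraction of samples whose predicted label equals the actual label."""
    correct = sum(a == p for a, p in zip(actual, predicted))
    return correct / len(actual)

def cohen_kappa(actual, predicted):
    """Agreement between actual and predicted labels, corrected for chance."""
    n = len(actual)
    observed = accuracy(actual, predicted)
    actual_counts = Counter(actual)
    predicted_counts = Counter(predicted)
    # Chance agreement: probability that both columns show the same class
    # if they were independent with these class frequencies.
    expected = sum(
        (actual_counts[c] / n) * (predicted_counts[c] / n)
        for c in actual_counts
    )
    if expected == 1.0:
        # Only one class appears in both columns; kappa is undefined.
        # Returning 0.0 is a choice made for this sketch.
        return 0.0
    return (observed - expected) / (1 - expected)

def k_fold_indices(n_samples, k=5):
    """Yield (train_indices, test_indices) for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = list(range(start, start + size))
        train = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train, test
        start += size

# Made-up labels: the model predicts the majority class almost everywhere.
actual = ["THEFT", "THEFT", "ASSAULT", "THEFT", "ASSAULT",
          "THEFT", "THEFT", "ASSAULT", "THEFT", "THEFT"]
predicted = ["THEFT", "THEFT", "THEFT", "THEFT", "ASSAULT",
             "THEFT", "THEFT", "THEFT", "THEFT", "THEFT"]

acc = accuracy(actual, predicted)       # 8 of 10 correct
kappa = cohen_kappa(actual, predicted)  # much lower than the accuracy
folds = list(k_fold_indices(len(actual), k=5))
```

In this made-up case the accuracy is 0.8 while kappa is only about 0.41, because a model that mostly answers "THEFT" gets many rows right by chance. The folds here are contiguous, so the rows should be shuffled first if the data set is sorted.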

  • koknayaya Member Posts: 20 Contributor I
    Hi @varunm1 ! Hmm, I'm a bit confused now..haha.

    However, these are the results of my project. I'm using the column OFNS_DESC (Crime type) for my label.

  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited January 2019
    Hi @koknayaya

    Yes, this shows that you are classifying the data set: you already have labels (outputs) for all the samples, and you are looking at the classification performance by comparing the original label with the predicted label. The difference between classification and prediction is minor. In pure prediction, you apply the trained decision tree to a new sample that has all the attributes (variables or columns) but no known label (output) to compare against. In terms of your screenshot, you would not have the first column, but you would still have the second column.

    You can ask if you have more doubts.

    Thanks,
    Varun
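
A small sketch of that distinction in plain Python: the same model is used both times, and the only difference is whether a known label exists to compare against. The tree below is hand-written and hypothetical, as are the attribute names `place` and `hour` and all the rows. Only the label name OFNS_DESC comes from the thread.

```python
def predict_offense(row):
    """Hypothetical hand-written tree with two splits (place, then hour)."""
    if row["place"] == "STREET":
        return "ROBBERY" if row["hour"] >= 20 else "THEFT"
    return "BURGLARY"

# Evaluation: every row has a known OFNS_DESC, so predictions can be checked.
labeled_rows = [
    {"place": "STREET", "hour": 22, "OFNS_DESC": "ROBBERY"},
    {"place": "STREET", "hour": 10, "OFNS_DESC": "THEFT"},
    {"place": "RESIDENCE", "hour": 3, "OFNS_DESC": "BURGLARY"},
    {"place": "STREET", "hour": 21, "OFNS_DESC": "THEFT"},
]
predictions = [predict_offense(row) for row in labeled_rows]
matches = sum(p == row["OFNS_DESC"] for p, row in zip(predictions, labeled_rows))
evaluation_accuracy = matches / len(labeled_rows)

# Pure prediction: a new row with attributes only, nothing to compare with.
new_row = {"place": "RESIDENCE", "hour": 14}
new_prediction = predict_offense(new_row)
```

Here `evaluation_accuracy` can be computed because the first column (the real label) exists. For `new_row` there is only the second column, the prediction.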

  • koknayaya Member Posts: 20 Contributor I
    Thank you for your answer @varunm1 !

    My main research objective is classification of crime, but in the decision tree results there is a "prediction" column.

    That's why I'm confused about which category my project falls into: classification or prediction.

    Or is it classification of crime using predictive modelling?

  • koknayaya Member Posts: 20 Contributor I
    this is my process 
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    Hi @koknayaya

    Here is the thing: you are trying to classify. Don't get confused by the prediction column.

    You can see from your process that you are splitting the data into training and testing sets. The training set includes the output labels (in your case OFNS_DESC). When the training set is fed to the Decision Tree algorithm, it generates a model that best fits the training data. To see how this model performs, you then feed it a test data set. When you feed the test data set to the Apply Model operator, it does not use the output label OFNS_DESC; it predicts the outputs from the attributes using the trained decision tree model. That is why you have a predicted labels column. The performance step then compares the original labels of the test data set with the predicted values to show how your model performs on samples (test data) it never saw during training.

    So I suggest you go with classification rather than predictive modeling, since classification is one part of predictive modeling. For a better understanding, please go through the link below. I suggest classification because people sometimes assume that predictive modeling requires probabilistic Bayesian methods and means predicting something unknown in the future, for example weather forecasting or online advertising.

    https://www.techleer.com/articles/120-decision-tree-algorithm-for-a-predictive-model/

    Thanks,
    Varun
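
The split, train, apply, and compare flow described above can be sketched roughly as follows. This is plain Python, not the RapidMiner process, and the learner is a one-level rule (majority label per value of a single attribute), which is a much simpler stand-in for the Decision Tree operator. The rows, the `place` attribute, and the function names are made up; only OFNS_DESC comes from the thread.

```python
from collections import Counter, defaultdict

def train_one_level_rule(rows, attribute, label):
    """Learn the majority label for each value of one attribute."""
    by_value = defaultdict(Counter)
    for row in rows:
        by_value[row[attribute]][row[label]] += 1
    rules = {value: counts.most_common(1)[0][0]
             for value, counts in by_value.items()}
    # Fallback for attribute values never seen in training.
    default = Counter(row[label] for row in rows).most_common(1)[0][0]
    return {"attribute": attribute, "rules": rules, "default": default}

def apply_model(model, rows, label):
    """Predict from the attributes only; the label column is never read."""
    predictions = []
    for row in rows:
        features = {k: v for k, v in row.items() if k != label}
        value = features[model["attribute"]]
        predictions.append(model["rules"].get(value, model["default"]))
    return predictions

# Made-up data set; OFNS_DESC is the label, as in the thread.
data = [
    {"place": "STREET", "OFNS_DESC": "THEFT"},
    {"place": "STREET", "OFNS_DESC": "THEFT"},
    {"place": "STREET", "OFNS_DESC": "ROBBERY"},
    {"place": "RESIDENCE", "OFNS_DESC": "BURGLARY"},
    {"place": "RESIDENCE", "OFNS_DESC": "BURGLARY"},
    {"place": "STORE", "OFNS_DESC": "THEFT"},
    {"place": "STREET", "OFNS_DESC": "THEFT"},
    {"place": "RESIDENCE", "OFNS_DESC": "BURGLARY"},
    {"place": "STREET", "OFNS_DESC": "ROBBERY"},
    {"place": "PARK", "OFNS_DESC": "ASSAULT"},
]

# Split: the first six rows train the model, the last four test it.
train_rows, test_rows = data[:6], data[6:]

model = train_one_level_rule(train_rows, "place", "OFNS_DESC")
predicted = apply_model(model, test_rows, "OFNS_DESC")

# Performance: compare the original test labels with the predicted ones.
actual = [row["OFNS_DESC"] for row in test_rows]
test_accuracy = sum(a == p for a, p in zip(actual, predicted)) / len(actual)
```

The `predicted` list plays the role of the prediction column, and `actual` is the original label column. The model only ever sees `actual` for the training rows, which is why the comparison on the test rows is a fair check.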

  • koknayaya Member Posts: 20 Contributor I
    Hi @Telcontar120

    Yes, I want to help the police with decision making for predictive policing.
  • koknayaya Member Posts: 20 Contributor I
    Alright, thank you so much @varunm1 :) 