This is an update to my earlier post.
I have been experimenting and want to build a model that estimates the economic viability of a rental property.
I cleaned my data and arrived at the final data pool below.
What I want to do now is estimate the economic viability of each property, taking into account its reviews, overall satisfaction and listing date.
However, I am not sure how to create a dependent ("label") variable that represents the economic viability of a property, so that I can use it for modelling.
I considered building it with a nested IF-THEN function:
if(listed_year == "Old" && overall_satisfaction<3 && reviews<5,"Not Viable",if(listed_year == "Old" && overall_satisfaction>=3 && reviews>=5,"Viable",if(listed_year == "New" && overall_satisfaction<3 && reviews<3,"Not Viable",if(listed_year == "New" && overall_satisfaction>=3 && reviews>=3,"Viable","Not Viable"))))
As an example of what I was aiming for: a property that is Old, has a satisfaction rating below 3 and has fewer than 5 reviews is labelled not viable.
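For readability, the nested rule above can also be sketched as a small Python function. This is only a restatement of the formula under a few assumptions: the function name `viability_label` is hypothetical, the column values are passed in as plain arguments, and any `listed_year` other than "Old" or "New" falls through to "Not Viable" just as the final else branch does. The two explicit "Not Viable" branches give the same result as the fallback, so they collapse into a single condition.

```python
def viability_label(listed_year: str, overall_satisfaction: float, reviews: int) -> str:
    """Restate the nested IF-THEN rule; returns "Viable" or "Not Viable"."""
    # Minimum review count depends on listing age: 5 for "Old", 3 for "New".
    min_reviews_by_age = {"Old": 5, "New": 3}

    if listed_year not in min_reviews_by_age:
        # Matches the final else branch of the original formula.
        return "Not Viable"

    min_reviews = min_reviews_by_age[listed_year]
    if overall_satisfaction >= 3 and reviews >= min_reviews:
        return "Viable"

    # Every other combination ends up here, including the two
    # explicit "Not Viable" branches of the original formula.
    return "Not Viable"


if __name__ == "__main__":
    print(viability_label("Old", 2.5, 4))   # old, low satisfaction, few reviews
    print(viability_label("New", 4.0, 3))   # new, satisfied, enough reviews
```

Written this way, it is easier to see that the label depends only on `listed_year`, `overall_satisfaction` and `reviews`.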
Unfortunately, the resulting model reports 99% accuracy, a Kappa of 1 and an AUC of 0.99, which seems too good to be true and suggests the model is overfitting to the data pool.
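One thing that may be worth checking, offered as an assumption rather than a diagnosis: if the label is computed from the same three columns that are then used as features, even a very simple model can score near-perfectly on held-out rows, because it only has to re-learn the rule. The Python sketch below illustrates this on synthetic data. Everything in it is hypothetical: the random rows, the `viability_label` helper, the distance function and the 1-nearest-neighbour "model" are stand-ins, not my actual data or model.

```python
import random


def viability_label(listed_year, overall_satisfaction, reviews):
    # Same rule as the IF-THEN formula, collapsed into one condition.
    min_reviews = {"Old": 5, "New": 3}.get(listed_year)
    if min_reviews is not None and overall_satisfaction >= 3 and reviews >= min_reviews:
        return "Viable"
    return "Not Viable"


def make_rows(n, rng):
    # Synthetic rows (assumption): ratings in 0.5 steps, 0 to 30 reviews.
    rows = []
    for _ in range(n):
        year = rng.choice(["Old", "New"])
        satisfaction = rng.choice([x / 2 for x in range(0, 11)])
        reviews = rng.randint(0, 30)
        rows.append((year, satisfaction, reviews))
    return rows


def distance(a, b):
    # Simple hand-picked distance: a large penalty for a different age group,
    # plus absolute differences in satisfaction and review count.
    year_gap = 0.0 if a[0] == b[0] else 100.0
    return year_gap + abs(a[1] - b[1]) + abs(a[2] - b[2])


def predict_1nn(train, query):
    # Return the label of the closest training row.
    nearest = min(train, key=lambda pair: distance(pair[0], query))
    return nearest[1]


rng = random.Random(0)
train = [(row, viability_label(*row)) for row in make_rows(1000, rng)]
test = [(row, viability_label(*row)) for row in make_rows(300, rng)]

correct = sum(predict_1nn(train, row) == label for row, label in test)
accuracy = correct / len(test)
print(f"held-out accuracy: {accuracy:.3f}")
```

If something like this is what is happening, the high scores would say more about how the label was built than about how well the model generalises.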
I would love some input on how to tackle this. I am also open to hopping on a Zoom or Teams call to discuss it and learn from some of the masters here.