Modelling for Economic Viability

Rustyboltcutter (Member, Posts: 2, Newbie)
Hey Guys,
It's an update to a post I made before.
So I have been experimenting and want to build a model that estimates the economic viability of a rental property.
I cleaned my data and arrived at a final data pool.
What I want to do now is find the economic viability of each property, taking into consideration its reviews, overall satisfaction and listing date.
However, I am not sure how to create a dependent (or "label") variable that captures the economic viability of the property, to use for modelling.
I considered using a nested IF-THEN expression:

if(listed_year == "Old" && overall_satisfaction<3 && reviews<5,"Not Viable",if(listed_year == "Old" && overall_satisfaction>=3 && reviews>=5,"Viable",if(listed_year == "New" && overall_satisfaction<3 && reviews<3,"Not Viable",if(listed_year == "New" && overall_satisfaction>=3 && reviews>=3,"Viable","Not Viable"))))

Basically, as an example, I tried to make a property "Not Viable" if it is Old, has a satisfaction rating below 3 and has fewer than 5 reviews.
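Written out as plain Python, the nested expression above is equivalent to the following function (a sketch for readability only; the argument names simply mirror the attributes used in the expression). Spelling it out makes one detail easy to see: only the two explicit "Viable" branches can ever produce "Viable", and every other combination, including mixed cases, falls through to the final "Not Viable".

```python
def viability_label(listed_year: str, overall_satisfaction: float, reviews: int) -> str:
    """Plain-Python mirror of the nested if() expression in the post."""
    if listed_year == "Old" and overall_satisfaction < 3 and reviews < 5:
        return "Not Viable"
    if listed_year == "Old" and overall_satisfaction >= 3 and reviews >= 5:
        return "Viable"
    if listed_year == "New" and overall_satisfaction < 3 and reviews < 3:
        return "Not Viable"
    if listed_year == "New" and overall_satisfaction >= 3 and reviews >= 3:
        return "Viable"
    # Final else branch: any other combination (e.g. high satisfaction
    # but few reviews) is also labelled "Not Viable".
    return "Not Viable"
```

For example, viability_label("Old", 4.5, 2) returns "Not Viable" even though satisfaction is high, because that mixed case matches none of the explicit branches.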

Sadly, what it does is churn out a model with 99% accuracy, a Kappa of 1 and an AUC of 0.99, which is sort of impossible and suggests that the model is overfitting the data pool.

I would love some input on how to tackle this. I am also open to hopping on Zoom or Teams calls to discuss it and learn from some of the masters here.



    lionelderkrikor (Moderator, RapidMiner Certified Analyst, Member, Posts: 1,195, Unicorn)
    Interesting and challenging project, Rusty!

    First, the results you get (99% accuracy) are entirely logical:
    you have defined your label with an explicit expression of your explanatory attributes (reviews, bedrooms, metro, etc.).
    In practice, during the modelling phase the machine learning algorithms simply rediscover this expression.
    In other words, you have defined the model yourself, so the near-perfect score is expected rather than a sign of overfitting: there is no need for machine learning algorithms.
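To see why such a score says nothing about real viability, here is a small sketch on synthetic, hypothetical data (listed_year is "Old" or "New", satisfaction is an integer from 1 to 5 and reviews an integer from 0 to 20; these ranges are assumptions). The label is computed from the features with the same rule, and the "model" is nothing more than a lookup table that memorises the label seen for each exact feature combination. Because the label is a deterministic function of those features, even this trivial model scores at or near 100% on held-out rows. This is label leakage: the model learns your rule, not economic viability.

```python
import random
from collections import Counter

def viability_label(listed_year, satisfaction, reviews):
    """Same rule as the nested if() expression in the post."""
    if listed_year == "Old":
        min_reviews = 5
    elif listed_year == "New":
        min_reviews = 3
    else:
        return "Not Viable"  # the expression's final else branch
    return "Viable" if satisfaction >= 3 and reviews >= min_reviews else "Not Viable"

def make_rows(n, rng):
    """Hypothetical synthetic properties with labels derived from their features."""
    rows = []
    for _ in range(n):
        features = (rng.choice(["Old", "New"]), rng.randint(1, 5), rng.randint(0, 20))
        rows.append((features, viability_label(*features)))
    return rows

def fit_lookup(train):
    """'Model' that memorises the majority label for each exact feature combination."""
    votes = {}
    for features, label in train:
        votes.setdefault(features, Counter())[label] += 1
    table = {f: c.most_common(1)[0][0] for f, c in votes.items()}
    # Unseen combinations fall back to the overall majority label.
    fallback = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda features: table.get(features, fallback)

def accuracy(model, rows):
    return sum(model(f) == label for f, label in rows) / len(rows)

rng = random.Random(42)
train, test = make_rows(2000, rng), make_rows(500, rng)
model = fit_lookup(train)
print(f"training accuracy: {accuracy(model, train):.3f}")
print(f"held-out accuracy: {accuracy(model, test):.3f}")
```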

    Now I will describe how I see things:

    The first thing is to collect/measure your label (the "economic viability") independently of your other attributes, and I insist on the word "independently".
    How can you define the "economic viability"?
    I would say that it is a kind of ROI on the initial investment (the price of the property): for example, the average annual rent collected over a given period divided by the initial investment.
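Written as a formula (one possible notation, assumed here for illustration): with P the purchase price, R_t the rent collected in year t and T the number of years observed, the averaged ratio is

```latex
\text{ROI} \;=\; \frac{1}{T}\sum_{t=1}^{T}\frac{R_t}{P} \;=\; \frac{\bar{R}}{P},
\qquad \bar{R} \;=\; \frac{1}{T}\sum_{t=1}^{T} R_t
```

Since P is the same every year, averaging the yearly ratios gives the same result as dividing the average annual rent by the price.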

    Here are two examples:
    Case 1:
    Owner 1 bought property_1 for 1,000,000 € in 2010.
    He then collected the following rents:
     - 20,000 € in 2011
     - 18,000 € in 2012
     - 25,000 € in 2013
     - 10,000 € in 2014 (for example, due to the absence of a tenant for several months that year)
     - 23,000 € in 2015

    Thus the average annual rent collected is (20,000 + 18,000 + 25,000 + 10,000 + 23,000)/5 = 19,200 €/year

    and the ROI (ratio) = 19,200 / 1,000,000 = 1.92 %

    So your label value for this particular property will be 1.92 %

    Case 2:
    Owner 2 bought property_2 for 250,000 € in 2010.
    He then collected the following rents:
     - 8,000 € in 2011
     - 7,000 € in 2012
     - 6,500 € in 2013
     - 8,500 € in 2014
     - 8,000 € in 2015

    Thus the average annual rent collected is (8,000 + 7,000 + 6,500 + 8,500 + 8,000)/5 = 7,600 €/year

    and the ROI (ratio) = 7,600 / 250,000 = 3.04 %

    So your label value for this particular property will be 3.04 %
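The two calculations can be reproduced with a few lines of Python. This is a minimal sketch: average_rent_roi is a hypothetical helper name, and it implements exactly the "average annual rent divided by purchase price" definition used in the two cases, with the rents as listed for 2011-2015.

```python
def average_rent_roi(purchase_price, annual_rents):
    """Average annual rent collected divided by the purchase price, in percent."""
    if purchase_price <= 0 or not annual_rents:
        raise ValueError("need a positive price and at least one year of rent")
    average_rent = sum(annual_rents) / len(annual_rents)
    return 100 * average_rent / purchase_price

# Figures from the two cases above (rents for 2011-2015, in euros).
case_1 = average_rent_roi(1_000_000, [20_000, 18_000, 25_000, 10_000, 23_000])
case_2 = average_rent_roi(250_000, [8_000, 7_000, 6_500, 8_500, 8_000])
print(f"case 1: {case_1:.2f} %")  # case 1: 1.92 %
print(f"case 2: {case_2:.2f} %")  # case 2: 3.04 %
```

Computed once per property from its own rent history, this value becomes the label column, independent of reviews or satisfaction.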

    This brings us to the second point: I think we cannot reduce an investment to a binary problem ("viable" OR "not viable").
    I think your problem is a regression problem: your label is a continuous variable (1.92 %, 3.04 %, etc.) that can take any value.

    Once you have collected and calculated the label for each property, you can run regression algorithms, which will try to establish the relationship between your label and your other attributes.

    Note: you can turn this regression problem into a classification problem by defining different ranges of "economic viability",
    for example:
     - between 0.1% and 2%: economic viability = "very low"
     - between 2% and 4.5%: economic viability = "low"
     - between 4.5% and 7%: economic viability = "moderate"
     - between 7% and 9.5%: economic viability = "high"
     - higher than 9.5%: economic viability = "very high"
    In this case you have a 5-class classification problem, and you would use classification algorithms for modelling.
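The binning can be sketched in Python with the standard bisect module. BOUNDS, CLASSES and viability_class are hypothetical names. Because adjacent ranges in the list share their endpoints, the sketch assumes each range includes its lower bound (so exactly 2% is "low"), and it also assumes that values below 0.1%, which the list does not cover, count as "very low".

```python
from bisect import bisect_right

# Upper bounds (in %) of the first four ranges above.
BOUNDS = [2.0, 4.5, 7.0, 9.5]
CLASSES = ["very low", "low", "moderate", "high", "very high"]

def viability_class(roi_percent):
    """Map a continuous ROI label (%) to one of the five viability classes."""
    # bisect_right puts a value equal to a bound into the higher class.
    # Assumption: anything below 0.1 % is also treated as "very low".
    return CLASSES[bisect_right(BOUNDS, roi_percent)]

print(viability_class(1.92))  # very low (case 1)
print(viability_class(3.04))  # low (case 2)
```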

    I hope all these notions are clear, but feel free to ask additional questions!



