Label new data given historical data distributions

asav_yu Member Posts: 15 Maven
I have a list of products to be sold this year, and I run a model to price them. I know everything about them, including the location where they will be sold and their condition (1-10).

I want to forecast the average sale price for next year. I have a list of next year's products, but I do not know their location or condition. How can I add location and condition information, assuming next year follows the same distribution as this year? Product condition follows a normal distribution, and for location I have shares such as "20% are sold at this location".
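To make the requirement concrete, here is a minimal sketch in Python (not RapidMiner) of drawing a location from this year's shares and a condition from a normal distribution. The function name `assign_attributes`, the location names, the shares, and the mean and standard deviation are all hypothetical placeholders, and rounding and clipping the condition to the 1-10 scale is an assumption about how the scale should be handled.

```python
import random


def assign_attributes(products, location_shares, cond_mean, cond_std, seed=0):
    """Draw a location and a condition (1-10) for each product.

    location_shares: dict mapping location -> share observed this year.
    cond_mean, cond_std: parameters of the normal fit to this year's conditions.
    """
    rng = random.Random(seed)  # fixed seed so the draw is reproducible
    locations = list(location_shares)
    weights = [location_shares[loc] for loc in locations]

    labelled = []
    for product in products:
        # Categorical draw: each location is picked in proportion to its share.
        location = rng.choices(locations, weights=weights, k=1)[0]
        # Normal draw, rounded and clipped to the 1-10 scale (an assumption).
        condition = min(10, max(1, round(rng.gauss(cond_mean, cond_std))))
        labelled.append(
            {"product": product, "location": location, "condition": condition}
        )
    return labelled


# Hypothetical inputs, for illustration only.
shares = {"store_a": 0.2, "store_b": 0.5, "store_c": 0.3}
next_year = assign_attributes(["p1", "p2", "p3", "p4"], shares, 6.0, 1.5)
```

Because each draw is random, one option is to repeat the draw with several seeds, price every simulated list with the existing model, and average the resulting forecasts.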

I have ideas on how to do it in Excel, but I am wondering whether there is a more rigorous way to do it in RapidMiner. I appreciate your help!


    yyhuang Administrator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist
    edited September 2019
    Hi @asav_yu,

    Assuming you are predicting the sale price of vintage or second-hand products, you can use similarity analysis to create the labels for the new data.

    Your feature set includes sale location, condition, maker, description, etc. Using a similarity measure, you can find substitute products that lie at a small distance from the target product. For each target product for next year, you might take the 3-5 nearest "neighbors" (by similarity) sold this year, then use a weighted average of their prices as the estimate.
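
The neighbor-based estimate described above could be sketched as follows. This is a plain-Python illustration, not a RapidMiner process. It assumes each product is already encoded as a numeric feature vector built only from attributes known for next year's items. The function name `knn_price_estimate`, the Euclidean distance, and the inverse-distance weights are illustrative choices, not something the reply specifies.

```python
import math


def knn_price_estimate(target, history, k=3):
    """Inverse-distance-weighted average price of the k nearest sold items.

    target:  numeric feature vector of the product to price.
    history: list of (feature_vector, sale_price) pairs from this year.
    """
    def distance(a, b):
        # Euclidean distance; any other similarity measure could be swapped in.
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Keep the k historical items closest to the target.
    neighbours = sorted(history, key=lambda item: distance(target, item[0]))[:k]

    # Closer neighbours get larger weights; the small constant avoids
    # division by zero when a neighbour matches the target exactly.
    weights = [1.0 / (distance(target, feats) + 1e-9) for feats, _ in neighbours]
    weighted_sum = sum(w * price for w, (_, price) in zip(weights, neighbours))
    return weighted_sum / sum(weights)


# Hypothetical data, for illustration only.
sold_this_year = [((1.0, 0.0), 100.0), ((0.0, 1.0), 200.0), ((5.0, 5.0), 1000.0)]
estimate = knn_price_estimate((0.5, 0.5), sold_this_year, k=2)
```

Features on different scales would normally be normalized before computing distances, so that no single attribute dominates the similarity.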

