
Generating Label Data for Financial Time Series

devonkyle Member Posts: 4 Contributor I
edited November 2018 in Help
I'm working on creating a number of financial time series datasets with directional BUY HOLD SELL labels for supervised learning. Can anyone suggest a method of generating the signals/labels for my training data, other than visually charting the data, finding the tops and bottoms of the charts, and manually coding them in? My streams are not trying to predict, say, the closing price of tomorrow, but to generate BUY HOLD SELL signals only. There must be a way to generate these training data signals automatically somehow... Any suggestions?

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Generating good and realistic training data will always require a lot of work. The first option you stated is clear: label the data manually. Another option may be to define precisely the scenarios in which you want to sell, buy, or hold, and generate some datasets for each scenario. This will need a lot of fiddling with random variables, tweaking, etc.

    Best regards,
    Marius
  • devonkyle Member Posts: 4 Contributor I
    Thanks Marius
    I thought someone out there might know of some tools available for this type of work. I actually own software called Trading Solutions (I would never recommend this software: very expensive and very buggy!!), and one of the few clever things it provides is the ability to take a time series dataset and generate what they call an optimized signal, including all the BUY HOLD SELL recommendations, which can be used for modeling and for comparing against out-of-sample data signals. It does a very nice job of this, automatically finding all the peaks and valleys within the data. Unfortunately, when exporting the Optimized Signal to Excel/CSV, it does not include these signals in the data, making the software virtually useless to me.
    I guess I just need to graph the data first, find the corresponding date/time for the peaks/valleys, and manually generate my labels for training. A very slow and tedious process when dealing with large datasets (i.e. 5 - 10 years).
    D
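
The peak/valley labeling described above can be approximated in code rather than by eye. What follows is only a minimal sketch, not what Trading Solutions does internally: it assumes a plain list of closing prices and uses a simple percentage-reversal ("zigzag" style) rule. The function name `label_turning_points` and the `min_move` threshold are hypothetical names chosen for illustration. A high is marked SELL once price has fallen `min_move` below it, a low is marked BUY once price has risen `min_move` above it, and everything else is HOLD.

```python
def label_turning_points(prices, min_move=0.05):
    """Label swing lows BUY, swing highs SELL, everything else HOLD.

    prices   -- sequence of prices in time order (assumed positive)
    min_move -- hypothetical threshold: fractional reversal (0.05 = 5%)
                needed before a high or low counts as a turning point
    """
    labels = ["HOLD"] * len(prices)
    trend = 0    # +1 = upswing, -1 = downswing, 0 = not yet known
    hi = lo = 0  # indices of the highest / lowest price since the last pivot

    for i in range(1, len(prices)):
        if prices[i] > prices[hi]:
            hi = i
        if prices[i] < prices[lo]:
            lo = i

        # A drop of min_move from the running high confirms it as a peak.
        if trend >= 0 and prices[i] <= prices[hi] * (1 - min_move):
            labels[hi] = "SELL"
            trend = -1
            hi = lo = i  # start tracking extremes afresh from here
        # A rise of min_move from the running low confirms it as a valley.
        elif trend <= 0 and prices[i] >= prices[lo] * (1 + min_move):
            labels[lo] = "BUY"
            trend = 1
            hi = lo = i

    return labels


# Made-up example prices, for illustration only.
closes = [100, 104, 110, 107, 101, 98, 99, 106, 112, 109, 103]
for price, label in zip(closes, label_turning_points(closes)):
    print(price, label)
```

Note that a turning point is only confirmed after the reversal has happened, so each label depends on later prices. That is fine for building training labels from historical data, but the labels themselves cannot be computed in real time. The threshold would need tuning per instrument and time frame; a smaller `min_move` yields more BUY/SELL labels.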