Disappearing attributes

Noel Member Posts: 82 Maven
Hi All-

I have a process (attached) where, prior to the data stream getting piped into an Optimize Selection (Evolutionary) operator, there are 23 attributes including an index called "anch_dt" (this is a time series analysis). The first breakpoint I set in the process is at this point. Once the data gets piped into the Optimize Selection (Evolutionary) operator, however, only 14 attributes are present and the index "anch_dt" is one that is missing. I set a second breakpoint at this point in the process.

Where did the rest of the data go?

Any help would be greatly appreciated. I've been banging my head against this obstacle and I suspect mental blindness is preventing me from seeing the obvious error.

Best, Noel

Answers

  • hughesfleming68 Member Posts: 323 Unicorn
    edited April 2019
    Hi Noel, I am assuming that you are trying to predict "anch_dt". If not, and you feel it is essential to your process, then you need to exclude it from the Optimize Selection. From your description, Optimize Selection is working the way it should. Let me look at your process. This may be as simple as setting a label.
  • David_A Administrator, Moderator, Employee, RMResearcher, Member Posts: 297 RM Research
    Hi,

    The thing is that the goal of the "Optimize Selection (Evolutionary)" operator is to remove attributes. So when you run the process, it immediately starts evaluating the performance of your model on a subset of attributes and compares it to others (that's the evolutionary part).

    The problem is that your windowing relies on the complete attribute set, and it might break if either the index or the label attribute is missing. From what I have seen of your process and tried out with some sample data, it should be sufficient to assign special roles (label and ID) to the anch_dt and ccc_bonds_stw_differentiated attributes.

    I hope that helps,
    David
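
To illustrate the point about special roles, here is a minimal Python sketch. It is not RapidMiner code and does not reproduce the operator's internals. It assumes, as the suggestion above implies, that a feature selection only ever drops attributes with the regular role. The `roles` table, the helper names, and the `feat_a` to `feat_c` attributes are hypothetical placeholders; only `anch_dt` and `ccc_bonds_stw` come from the thread.

```python
# Hypothetical role table: attribute name -> role.
# "feat_a".."feat_c" stand in for the other attributes in the process.
roles = {
    "anch_dt": "id",
    "ccc_bonds_stw": "label",
    "feat_a": "regular",
    "feat_b": "regular",
    "feat_c": "regular",
}

def candidate_attributes(roles):
    """Attributes a feature selection is allowed to drop (assumed: regular only)."""
    return [name for name, role in roles.items() if role == "regular"]

def apply_selection(roles, kept_regular):
    """Keep every special-role attribute plus the chosen regular ones."""
    kept = set(kept_regular)
    return [name for name, role in roles.items()
            if role != "regular" or name in kept]

# With roles assigned, the index and label survive any selection.
print(apply_selection(roles, kept_regular=["feat_b"]))

# Without roles, the index is just another candidate and can disappear.
no_roles = {name: "regular" for name in roles}
print(apply_selection(no_roles, kept_regular=["feat_b"]))
```

In the second call every attribute is treated as regular, so `anch_dt` is dropped along with the other unselected attributes, which matches the symptom described in the question.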



  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited April 2019
    @David_A quick question: is it normal that attributes were filtered out before the breakpoint? I see that the breakpoint is set to "before". Does this breakpoint relate to the internal process in Optimize Selection rather than to the point before feature selection?

    Thanks
    Regards,
    Varun
    https://www.varunmandalapu.com/

    Be Safe. Follow precautions and Maintain Social Distancing

  • hughesfleming68 Member Posts: 323 Unicorn
    edited April 2019
    Hi Noel, I can't find your horizon attribute ccc_bonds_stw_differentiated. If you set your horizon attribute to ccc_bonds_stw then your process runs with your missing attributes. Set the windowing operator correctly and that should help.
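
As a rough illustration of why windowing needs both the index and the horizon attribute, the following Python sketch builds sliding windows from time-ordered rows. It is a generic sketch, not the RapidMiner Windowing operator: the function `make_windows`, its parameters, and the sample values are assumptions made up for this example.

```python
def make_windows(rows, index_attr, horizon_attr, window_size, horizon=1):
    """Return (last index in window, window values, target value) triples."""
    # Both attributes must be present in every row, otherwise windowing fails.
    for name in (index_attr, horizon_attr):
        if any(name not in row for row in rows):
            raise KeyError(f"Attribute not found: {name}")
    ordered = sorted(rows, key=lambda r: r[index_attr])
    windows = []
    for start in range(len(ordered) - window_size - horizon + 1):
        window = ordered[start:start + window_size]
        target = ordered[start + window_size + horizon - 1]
        windows.append((window[-1][index_attr],
                        [r[horizon_attr] for r in window],
                        target[horizon_attr]))
    return windows

# Invented sample data using the attribute names from the thread.
rows = [{"anch_dt": d, "ccc_bonds_stw": 10 * d} for d in range(1, 6)]
windows = make_windows(rows, "anch_dt", "ccc_bonds_stw", window_size=2)

# If a selection step has already removed the index, windowing cannot run.
stripped = [{"ccc_bonds_stw": r["ccc_bonds_stw"]} for r in rows]
try:
    make_windows(stripped, "anch_dt", "ccc_bonds_stw", window_size=2)
except KeyError as err:
    print("windowing failed:", err)
```

The same failure occurs if the horizon attribute name does not exist in the data, which is the situation described above for ccc_bonds_stw_differentiated.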
  • Noel Member Posts: 82 Maven
    Hi Alex- As always, thank you for your help. I made the horizon attribute change you suggested, thanks. Unfortunately, I'm still getting the "Attribute not found" error (because anch_dt is missing). Error screen cap below.

    As you said, I'm trying to predict "ccc_bonds_stw" and since the data are all time series, I'm windowing and using "anch_dt" as the index. Apologies if I'm missing something or misunderstood, but I'm still stuck.

    Off topic, I took your suggestion and bought Marcos Lopez de Prado's book. I'm still at the beginning, but it looks like it'll be helpful.

    -Noel
  • hughesfleming68 Member Posts: 323 Unicorn
    Hi Noel, I have your process working but I am out of the office for a couple of hours. I will upload it when I am back in.
  • Noel Member Posts: 82 Maven
    Super, thanks. Beautiful day here in CT. Hope you're getting a taste of it wherever you are.
  • Noel Member Posts: 82 Maven
    Fantastic! Thanks, Alex, I'm good to go.

    If you wouldn't mind one more question, though... I definitely should've tried moving the windowing operator outside the optimize selection, but why didn't the original process work? Based on the settings, was 14 the max number of attributes for the Optimize Selection operator? Also, do indices not find their way into the operator or was it just a coincidence in this case?

    Have a great night and thanks again.
    -Noel
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited April 2019
    The algorithm is filtering out that attribute. Because your index is not set to any special role, the feature selection algorithm treats it as a regular attribute. 14 is not a maximum; based on your data, 14 features are useful. The number of features selected by Optimize Selection depends on their relevance to the prediction: attributes that are not useful are removed.

    Hope this clarifies :smile:
    Regards,
    Varun
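
The point that 14 is an outcome rather than a limit can be sketched with a toy, mutation-only evolutionary search in Python. This is not the algorithm RapidMiner implements: the `RELEVANCE` scores, the per-attribute cost, and the `fitness` function are invented stand-ins for the cross-validated performance a real run would measure.

```python
import random

# Toy relevance scores, invented for illustration only.
RELEVANCE = {"f1": 0.9, "f2": 0.7, "f3": 0.4, "n1": 0.0, "n2": 0.0, "n3": 0.0}
COST_PER_ATTRIBUTE = 0.1  # assumed penalty for carrying an extra attribute

def fitness(mask, names):
    """Stand-in for the model performance obtained with the chosen subset."""
    chosen = [n for n, keep in zip(names, mask) if keep]
    return sum(RELEVANCE[n] for n in chosen) - COST_PER_ATTRIBUTE * len(chosen)

def evolve(names, generations=50, pop_size=6, seed=42):
    """Return the attribute names kept by the best mask found."""
    rng = random.Random(seed)
    # Start from the full attribute set plus a few random subsets.
    population = [tuple(True for _ in names)]
    population += [tuple(rng.random() < 0.5 for _ in names)
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        children = []
        for mask in population:
            flip = rng.randrange(len(names))  # mutate one attribute flag
            children.append(tuple(not keep if i == flip else keep
                                  for i, keep in enumerate(mask)))
        # Elitist survival: keep the best masks among parents and children.
        pool = set(population) | set(children)
        population = sorted(pool, key=lambda m: fitness(m, names),
                            reverse=True)[:pop_size]
    return [n for n, keep in zip(names, population[0]) if keep]

names = list(RELEVANCE)
best = evolve(names)
print(len(best), "attributes kept:", best)
```

Nothing in the sketch sets a target count. How many attributes survive depends only on which subsets score well, so a different data set or fitness measure would produce a different number.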

  • Noel Member Posts: 82 Maven
    @varunm1- Thanks. Not to beat a dead horse (that’s a terrible expression), but the reduction in attributes seemed to happen immediately, upstream of the cross validation. So to me, a novice, it appears that the Optimize Selection operator hasn’t even started its work. If this is the case (and it very well might not be), what caused the attribute reduction?
  • Noel Member Posts: 82 Maven
    @varunm1- I appreciate your help.
  • varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited April 2019
    @Noel I tested your process, and in my previous comment I asked @David_A how the breakpoint works for this operator. My preliminary understanding is that the attributes are reduced by Optimize Selection once the data passes your Filter Examples operator.
    Regards,
    Varun