
influence of adding last index - time series data

Thiru Member Posts: 100 Guru
edited August 2020 in Help
Dear all, I'm working on time series data. Please refer to the enclosed process.

1. Currently, I'm generating features using 'process windows' with 'extract aggregate' as the subprocess. The extracted features are used to train my machine learning model.
2. I've noticed that choosing yes for 'adding last index to windows attribute' in the parameters of the process windows operator improves the performance of the model drastically, from 67% accuracy to 97% accuracy. The only difference I can see is one extra column in the generated features. I can't see how this one column influences the performance of the model.

Is it correct to trust this 97% performance, and can anyone help me understand the role of adding the last index? Thanks.

regds
thiru
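To make the setup in the question concrete, here is a minimal Python sketch of windowing with aggregate features. It is not RapidMiner code: the function `window_features`, its parameters, and the column name `last_index` are hypothetical stand-ins, and the sketch assumes the "last index" is simply the position of the last example in each window.

```python
from statistics import mean

def window_features(series, window_size, step, add_last_index=False):
    """Slide a fixed-size window over `series` and aggregate each window."""
    rows = []
    for start in range(0, len(series) - window_size + 1, step):
        window = series[start:start + window_size]
        row = {"mean": mean(window), "min": min(window), "max": max(window)}
        if add_last_index:
            # Assumption: the extra column holds the position of the last
            # example in the window, so it encodes *when* the window ends.
            row["last_index"] = start + window_size - 1
        rows.append(row)
    return rows

series = [3, 5, 4, 6, 8, 7, 9, 11]
rows = window_features(series, window_size=4, step=2, add_last_index=True)
for row in rows:
    print(row)
```

The extra column carries no information about the values inside the window, only about the window's position in time, which is why a jump in accuracy after adding it deserves a closer look.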


Answers

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi,
    Be careful not to overtrain your model on dates. It can easily happen that the model learns something like "February was good", which is a rule you do not want to use.

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
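As a rough illustration of the warning above, the following Python sketch shows how a time index alone can look highly predictive under a mixed split and then fail on future data. Everything in it is a made-up assumption: the target `label_at` flips every 25 windows, the "model" is a 1-nearest-neighbour lookup on the index, and the interleaved split only mimics a shuffled hold-out. It says nothing about the actual data or process in this thread.

```python
def label_at(i, regime_length=25):
    # Hypothetical target: it flips every `regime_length` windows, so it
    # depends only on *when* the window was taken.
    return "up" if (i // regime_length) % 2 == 0 else "down"

def one_nn_accuracy(train_idx, test_idx):
    """Classify each test window by its nearest training index (1-NN)."""
    hits = 0
    for i in test_idx:
        nearest = min(train_idx, key=lambda j: abs(j - i))
        hits += label_at(nearest) == label_at(i)
    return hits / len(test_idx)

indices = list(range(200))

# Interleaved split, similar in spirit to a shuffled hold-out:
# every test window has training neighbours right next to it in time.
acc_interleaved = one_nn_accuracy(indices[0::2], indices[1::2])

# Chronological split: train on the past, test on the future.
acc_chronological = one_nn_accuracy(indices[:150], indices[150:])

print(acc_interleaved, acc_chronological)
```

In this toy setup the interleaved split scores far higher than the chronological one, even though the only input is the index. Comparing the two kinds of split on the real process is one way to check whether a high accuracy comes from memorised positions in time.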
  • Thiru Member Posts: 100 Guru
    @mschmitz

    Thanks for your reply. As I understand it, this additional column is just a date value repeated for every window in this case (correct me if I'm wrong). I assume it overtrains here, but I do not know for sure.
    By the way, what is the purpose of this parameter in the 'process windows' operator, and can you give some insight into how it affects the performance of the time series model? Thanks.

    regds
    thiru
  • Thiru Member Posts: 100 Guru
    @mschmitz

    The 'Process windows' (or 'Windowing') operator previously had the parameter 'add last index in windows attribute'. In the current version, 9.8.001, that option is no longer available.

    For the same data and process, I was previously getting an accuracy of 67%.
    Now I'm getting 97.8% (and I no longer have the option of using 'add last index').

    I'm not sure I'm doing the right thing. Could you please reconfirm? Thanks.

    thiru
  • MartinLiebigMartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi @Thiru ,
    It's hard to diagnose this without seeing the process and such. I think we changed the parameters of windowing a bit, since you always want to have the last index. Since it is usually a special attribute, it's ignored for learning anyway. Maybe you change it to regular later on?

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
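The point about special versus regular attributes can be sketched in a few lines of Python. This is only an analogy, not RapidMiner's implementation: the `roles` mapping, the role name `id` for the last index, and the helper `regular_columns` are hypothetical. The idea is that a learner only sees columns marked as regular, so the index starts to matter only once its role is changed.

```python
def regular_columns(roles):
    """Return the columns a learner would use as inputs (role 'regular')."""
    return [name for name, role in roles.items() if role == "regular"]

# Assumed roles after windowing: the last index is special, so it is skipped.
roles = {"mean": "regular", "max": "regular", "last_index": "id", "label": "label"}
inputs_before = regular_columns(roles)

# If a later step sets the index to a regular role, it becomes a model input.
roles_changed = dict(roles, last_index="regular")
inputs_after = regular_columns(roles_changed)

print(inputs_before)
print(inputs_after)
```

Checking the role of the last index column right before the learner is a quick way to see whether it is being used as a feature.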