Prediction for next orders, any ideas?
Dear Community!
I have a .csv file with 100,000 rows and 439 columns. The spreadsheet describes customers' habits in using a specific service. Each row contains a customer ID followed by that customer's transaction dates, encoded as day numbers: 1 for Monday, 2 for Tuesday, and so on. Using these past records, I need to predict the next transaction date for every customer.
Here's an example of the database format:
customer_id transaction1 transaction2 ... transaction438
1 1 2 3 4 5 6 7 ... 745 746 747
2 2 7 16 20 21 23 28 ... 412
3 1 2 3 4 5 6 7 ... 285 322
4 5 7 8 12 14 19 21 ... 924 925 926
Any ideas which model I should use to get the best accuracy for this prediction?
NOTE: The database has lots of missing values, depending on how frequently each customer orders.
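Before choosing a model, it can help to have a simple baseline to beat. The sketch below is plain Python, not RapidMiner, and it assumes the file is comma-separated with a header row, and that the "missing values" are empty trailing cells for customers with fewer than 438 transactions. It reads each row into a sorted list of day numbers and predicts the next day as the last transaction day plus that customer's median gap between transactions. The function names and sample rows are hypothetical.

```python
import csv
import io
from statistics import median


def load_sequences(lines):
    """Read wide rows (customer_id, transaction1..transactionN) into
    {customer_id: sorted day numbers}, skipping empty (missing) cells."""
    reader = csv.reader(lines)
    next(reader)  # skip the header row
    sequences = {}
    for row in reader:
        if not row:
            continue  # ignore blank lines
        days = [int(cell) for cell in row[1:] if cell.strip()]
        sequences[int(row[0])] = sorted(days)
    return sequences


def predict_next_day(days):
    """Naive baseline: last transaction day plus the median gap between transactions."""
    if len(days) < 2:
        return None  # not enough history to estimate a gap
    gaps = [later - earlier for earlier, later in zip(days, days[1:])]
    return days[-1] + round(median(gaps))


# Hypothetical sample in the assumed layout (trailing empty cells = missing values)
sample = """customer_id,transaction1,transaction2,transaction3,transaction4
1,1,2,3,4
2,2,7,16,
3,5,12,,
"""
sequences = load_sequences(io.StringIO(sample))
for customer_id, days in sequences.items():
    print(customer_id, predict_next_day(days))
```

Any real model should at least beat this baseline when scored on each customer's held-out last transaction.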
Answers
This looks like some sort of sales projection analysis. I would look at the process I shared here: http://community.rapidminer.com/t5/RapidMiner-Studio/How-to-get-forecast-values-of-future-from-time-series-data/m-p/37698
You would need to do some missing value replacement using the Replace Missing Values operator, and you need to install the Series extension from our Marketplace. Is there seasonality involved?
It is a university homework assignment; we are learning the basics of RapidMiner. We did similar exercises earlier, but there the training data had a label column. This time I have no clue how to predict the outcome without that special column. I thought about some sort of pattern analysis, or about converting the values to a range from 1 to 7 to simplify the problem, but I couldn't get to a real solution.
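One common way to get a label out of data like this is to create it from the history itself: use earlier transactions as features and the next one as the label. The sketch below is plain Python with hypothetical function names and an arbitrarily chosen window of 3. It converts a customer's day numbers into gaps between transactions, then slides a window over them, so each example has the previous gaps plus the weekday of the last known transaction as features, and the next gap as the label. The weekday mapping assumes day 1 is a Monday, as in the question's encoding; the example days are customer 2's first values from the question.

```python
def weekday(day):
    """Map a day number to 1..7 (1 = Monday), assuming day 1 is a Monday."""
    return (day - 1) % 7 + 1


def gaps_between(days):
    """Number of days between consecutive transactions."""
    return [later - earlier for earlier, later in zip(days, days[1:])]


def make_examples(days, window=3):
    """Turn one customer's history into (features, label) training pairs.

    Features: the previous `window` gaps plus the weekday of the last known
    transaction. Label: the gap until the next transaction.
    """
    gaps = gaps_between(days)
    examples = []
    for i in range(len(gaps) - window):
        features = gaps[i:i + window] + [weekday(days[i + window])]
        examples.append((features, gaps[i + window]))
    return examples


def latest_features(days, window=3):
    """Feature row for the unknown next gap, or None if the history is too short."""
    gaps = gaps_between(days)
    if len(gaps) < window:
        return None
    return gaps[-window:] + [weekday(days[-1])]


# Customer 2's first transactions from the question
days = [2, 7, 16, 20, 21, 23, 28]
for features, label in make_examples(days):
    print(features, "->", label)
print("predict the gap for:", latest_features(days))
```

The pooled examples from all customers form an ordinary labeled table for any regression learner; the predicted next date is then the last day number plus the gap the model predicts for `latest_features(days)`.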
I think seasonality doesn't matter, because it's just an example.
If it's sales, you could sum up the values and compute Total Sales per month or week? You can use the dates as your ID and the Total Sales as your Label.
The database contains the transaction days as codes, not quantities, so totals are neither possible nor meaningful.
AH! Did you try the Generalized Sequential Patterns operator?
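To show the idea behind sequential pattern mining, here is a stripped-down, plain-Python sketch in the spirit of GSP. It is not the RapidMiner operator's implementation: it does level-wise support counting over ordered (not necessarily contiguous) subsequences of single items, and omits GSP's prune step, time constraints, and taxonomies. The function names and the weekday sequences below are hypothetical; in practice each sequence could be a customer's transaction days mapped to weekdays.

```python
def contains(sequence, pattern):
    """True if `pattern` appears in `sequence` in order (gaps allowed)."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator up to the first match,
    # so later items must appear after earlier ones.
    return all(item in remaining for item in pattern)


def frequent_patterns(sequences, min_count, max_length=3):
    """Level-wise GSP-style search over sequences of single items.

    Returns {pattern tuple: number of sequences containing it}.
    """
    items = sorted({item for seq in sequences for item in seq})
    candidates = [(item,) for item in items]
    frequent = {}
    length = 1
    while candidates and length <= max_length:
        level = {}
        for cand in candidates:
            count = sum(contains(seq, cand) for seq in sequences)
            if count >= min_count:
                level[cand] = count
        frequent.update(level)
        # Join step: combine k-patterns a and b when a minus its first item
        # equals b minus its last item, giving a (k+1)-pattern candidate.
        candidates = sorted({a + b[-1:] for a in level for b in level if a[1:] == b[:-1]})
        length += 1
    return frequent


# Hypothetical weekday sequences (1 = Monday ... 7 = Sunday), one per customer
sequences = [[1, 3, 5], [1, 5], [3, 5, 1]]
for pattern, count in sorted(frequent_patterns(sequences, min_count=2).items()):
    print(pattern, count)
```

A frequent pattern such as (1, 5) would suggest that customers who order on a Monday tend to order again later on a Friday, which is one way to guess the weekday of the next order.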