"Issues with processing data and clustering operators"
Hi,
I am working on a project in RapidMiner based on the Kaggle Walmart Customer Trip Type prediction competition, but instead of prediction I want to use a clustering algorithm to find the days with the maximum and minimum sales and the departments with the maximum and minimum sales. I am using the same dataset as the Kaggle competition.
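For comparison, finding the maximum and minimum sales per day and per department is an aggregation (group-by) task rather than a clustering task. A minimal sketch in Python, assuming the Kaggle train.csv has Weekday, DepartmentDescription and ScanCount columns, and using ScanCount (items scanned, negative for returns) as a stand-in for sales because the file has no price column. The sample rows are made up for illustration, and `load_rows`, `total_by` and `max_and_min` are hypothetical helper names.

```python
import csv
from collections import defaultdict

# Made-up sample rows; real rows would come from load_rows("train.csv").
SAMPLE_ROWS = [
    {"Weekday": "Friday", "DepartmentDescription": "GROCERY DRY GOODS", "ScanCount": "3"},
    {"Weekday": "Friday", "DepartmentDescription": "DAIRY", "ScanCount": "1"},
    {"Weekday": "Sunday", "DepartmentDescription": "GROCERY DRY GOODS", "ScanCount": "6"},
    {"Weekday": "Sunday", "DepartmentDescription": "DAIRY", "ScanCount": "-1"},
    {"Weekday": "Monday", "DepartmentDescription": "PHARMACY OTC", "ScanCount": "2"},
]


def load_rows(path):
    """Read a CSV file into a list of dicts keyed by column name."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def total_by(rows, key):
    """Sum ScanCount for each distinct value of `key` (returns count as negative)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key]] += int(row["ScanCount"])
    return dict(totals)


def max_and_min(totals):
    """Return the (label, total) pairs with the largest and smallest totals."""
    best = max(totals.items(), key=lambda kv: kv[1])
    worst = min(totals.items(), key=lambda kv: kv[1])
    return best, worst


if __name__ == "__main__":
    print("days:", max_and_min(total_by(SAMPLE_ROWS, "Weekday")))
    print("departments:", max_and_min(total_by(SAMPLE_ROWS, "DepartmentDescription")))
```

In RapidMiner the same grouping is roughly what the Aggregate operator does (group by Weekday or DepartmentDescription, sum of ScanCount), after which the resulting table can be sorted.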
I am new to data analytics and am trying to understand which operators will get me to this result, but I am stuck and cannot move the process forward. Please have a look at the process flow in the attachment and let me know where I am going wrong.
Dataset: https://www.kaggle.com/c/walmart-recruiting-trip-type-classification/data
Regards,
Naman
Answers
@naman_sharma Where is your clustering operator? Could you please share the process XML or the .rmp file?
@dang Please see the attached .rmp file. I tried using k-means for clustering, but it's taking too long to finish: after 2 hours it had completed only 14% of the process.
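To see why k-means run time grows with the size of the data, here is a minimal sketch of plain Lloyd's k-means in Python. It is not RapidMiner's implementation; it only shows the general technique. Every iteration compares each example with each of the k centroids, so one pass costs roughly rows × k × attributes operations, repeated until the assignments stop changing. The function names and the toy points are illustrative assumptions.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length numeric vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, max_iter=100, seed=0):
    """Lloyd's k-means: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: every point is compared with every centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # converged: no point changed cluster
        labels = new_labels
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(kmeans(pts, k=2))
```

Because the cost scales with the number of rows and with k, working on a sample of the data or a smaller k is the usual first step when a run is too slow.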
@naman_sharma the process you shared has no clustering operator attached. Please attach the dataset you used, I don't want to answer a survey from Walmart to unlock the dataset.
@Thomas_Ott I have attached the dataset.
@naman_sharma the process runs in about 40 seconds on my machine, so it might be a problem with memory or the type of license you have.
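I'm not familiar with this dataset, but I noticed that the "visit numbers" attribute is on a huge scale (from 5 to 20,000 or so). Because k-means groups examples by distance, an attribute with that range will dominate the distance calculation and skew the results, so you might want to think about normalizing it if that makes sense for your analysis.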
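A minimal sketch of what normalizing does to a distance-based method, using z-score normalization (rescale each attribute to mean 0 and standard deviation 1). The two attributes and their values below are hypothetical: a "visit number" column with a huge range next to a small "scan count" column. Before normalizing, the distance between rows is almost entirely the visit-number difference; afterwards both attributes contribute on a comparable scale.

```python
import math


def zscore(values):
    """Rescale values to mean 0 and population standard deviation 1."""
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    if std == 0:
        return [0.0 for _ in values]  # constant column carries no information
    return [(v - mean) / std for v in values]


def euclidean(a, b):
    """Euclidean distance, the measure k-means uses to compare examples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical values: visit numbers span thousands, scan counts span single digits.
visits = [5, 10, 20000]
scans = [1, 8, 1]
raw = list(zip(visits, scans))
norm = list(zip(zscore(visits), zscore(scans)))

if __name__ == "__main__":
    # Raw: row 2 looks far away only because of its visit number.
    print(euclidean(raw[0], raw[2]), euclidean(raw[0], raw[1]))
    # Normalized: both differences are now of similar size.
    print(euclidean(norm[0], norm[2]), euclidean(norm[0], norm[1]))
```

Whether normalizing is appropriate depends on what the attribute means; for an identifier-like column it may be better to leave it out of the clustering altogether.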