
Newbie to RapidMiner: Want to apply unsupervised outlier detection

Saurabh_Sawant_24 Member Posts: 7 Learner II
Hi,
I am new to RapidMiner and data science and want to apply unsupervised outlier detection to a transactional dataset of 10 million records.
I want to understand the ideal data processing steps and algorithms for this task. Thanks.

Fixed and Released

Comments

  • Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    There are many different outlier detection methods available, and your description is pretty vague about the task you want to accomplish. The best type of outlier detection also often depends on the relationships in the data and on what exactly constitutes an outlier in your view.
    To get started, I'd download the free Anomaly Detection extension and take a look at the operators it has. One simple method you could start with is HBOS (histogram-based outlier score), which is non-parametric and pretty simple. It identifies outliers based purely on how rarely each attribute's values fall into a given range. If that doesn't suit your needs, you may want to look at density-based measures like LOF (local outlier factor).
    Without more detailed information, it is hard to make a more concrete recommendation. One word of warning, though: with 10MM records you might want to take a sample first and try a few approaches to see how they behave. Otherwise you might have to wait a long time for RapidMiner to process that many records, depending on the hardware resources you have available!

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
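To make the "sample first, then try HBOS" suggestion above concrete, here is a minimal plain-Python sketch of a histogram-based outlier score. It is an illustration only, not the Anomaly Detection extension's implementation: the function name `hbos_scores`, the bin count, and the synthetic two-column "transactions" are all assumptions made for the example.

```python
import math
import random


def hbos_scores(rows, n_bins=10):
    """Histogram-based outlier score per row; higher means more unusual.

    Hypothetical helper: each numeric column gets its own equal-width
    histogram, and a row's score sums log(tallest_bin / own_bin) over columns.
    """
    scores = [0.0] * len(rows)
    for col in range(len(rows[0])):
        values = [row[col] for row in rows]
        low, high = min(values), max(values)
        width = (high - low) / n_bins or 1.0  # constant column: use one bin
        counts = [0] * n_bins
        bins = []
        for value in values:
            index = min(int((value - low) / width), n_bins - 1)
            bins.append(index)
            counts[index] += 1
        tallest = max(counts)
        # Rows in sparse bins receive a large contribution from this column.
        for i, index in enumerate(bins):
            scores[i] += math.log(tallest / counts[index])
    return scores


random.seed(0)
# Synthetic stand-in for a large transaction table (amount, item count).
population = [[random.gauss(100, 15), random.gauss(5, 1)] for _ in range(100_000)]

# Work on a small random sample first instead of the full table.
sample = random.sample(population, 2_000)
sample.append([900.0, 40.0])  # one planted extreme record, for demonstration

scores = hbos_scores(sample)
top = max(range(len(sample)), key=scores.__getitem__)
print("most unusual row:", sample[top], "score:", round(scores[top], 2))
```

Because each column is scored independently, this approach is fast but will miss records that are unusual only in the combination of their values; that is the case where a density-based method such as LOF is worth trying next.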
  • sgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    hi @Saurabh_Sawant_24 - if you're new at this, I'd recommend trying the "Outliers" option in Auto Model. Super fast and simple.



    Scott
  • Saurabh_Sawant_24 Member Posts: 7 Learner II
    edited December 2018
    @sgenzer
    I tried Auto Model and followed the steps as suggested, but it gives the error "No key attributes are specified. Please adjust the parameter 'key attributes'" in the Results section.
    Please help.
  • sgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    hi @Saurabh_Sawant_24 can you please be more specific on how/where you are seeing this message? What are you doing before you get this?

    Scott

  • Saurabh_Sawant_24 Member Posts: 7 Learner II
    @sgenzer 

    Please see the attached error.


  • sgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    edited December 2018
    That looks like a good ol' bug to me. Pushing to Bug Reporting, cc @IngoRM, and I have requested the log file from the user via DM.
  • IngoRM Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,
    Can you please share some more information about your data? How many columns do you have, and what are their types? I am not able to reproduce this error, so for now I am assuming there is something specific in your data set that Auto Model chokes on, which obviously should not be the case.
    Many thanks for your support,
    Ingo