The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here

Preparing data for mining

Emelie_tornkvist123Emelie_tornkvist123 Member Posts: 3 Learner I
Hi!
I am a beginner of this and I am doing my final thesis and would like to test out machine Learning on my process. I have 49 customers data that consist of one measurepoint for heating each day * 638 Days. These are all depended on outside temperature. The goal is to detect if the measurepoints are lower then expected then there is a fault. How do i set rules for this? Do I only do the attributes and see hos the ML is clustering them? I have done my own calculations so  it can recieve measurepoints that are lower then expected but I would like to see if ML also can do that? Hope someone can help me with this? My personal opinion is that this is not suitable for machinelearning since I caqn do my own calculations on this fairly easy and extract this faults and also I only have one attribute to split on and thats the temperature and a ekvation on how the measureponits should be related to outdoor temperature.
Hoping someone can help me on this :)

Best Answer

  • Telcontar120Telcontar120 RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Solution Accepted
    It sounds like you have a simple rule for determining whether there is a fault or not.  If that is the case, then indeed you do not need machine learning.  You can simply code that rule and compare to it.
    If you want to try the machine learning approach you can then take that rule and turn into your label using Set Role and then go for a prediction using a simple learner like Decision Tree.  But I don't think it really gets you anything beyond your rule in this case.
    If the measurements are all independent and the performance outcome can vary day by day even for the same customer (which is what it sounds like) then you would probably just structure your dataset as 638 separate examples.
    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts

Answers

  • Emelie_tornkvist123Emelie_tornkvist123 Member Posts: 3 Learner I
    I can add that I am wondering how I present 49 different customers and 638 different measurpoints on 638 Days. Do I list this 50 cusomers in a row (up and down after eachother) or one on one from right to left?
  • Emelie_tornkvist123Emelie_tornkvist123 Member Posts: 3 Learner I
    Thank you Telcontar120! That really helped me :)
Sign In or Register to comment.