Fraud Detection - Rule Weighting and Thresholds
Hi.
I am looking for some guidance with the following scenario. I have a labeled sample set of data (labeled true/false) of fraudulent and non-fraudulent transactions.
I also have a set of rules that hit (executed) for each transaction. There are two types of rules: rules that are fraud indicators, and rules that are indicators of valid transactions. I have a very simple weighting assigned at the moment. If a fraud indicator is met, I add 1 point. If a valid indicator is met, I subtract 1 point (-1). If the rule condition is not met, 0 points are assigned. I also generate a cumulative score, and that is where I need help: better weighting for each rule, and thresholds on the score.
My question is as follows:
How do I go about feeding the below sample set to RapidMiner and having RapidMiner do the following:
1. Assign a weight (score) to each rule
2. Establish a cumulative score threshold for "valid", "undetermined", and "fraud".
Here is a sample of the data in delimited form. Do I need to reformat the data to make it easier to work with in RapidMiner? If so, what format should I use (boolean)?
Also, what steps/components would I use to produce the desired results?
Thanks in advance for any help and guidance. Also, if there are any tutorials or posts describing this, please let me know. I looked but did not find anything applicable.
TRAN_ID,IS_FRAUD,rule1,rule2,rule3,rule4,rule6,rule5,rule7,rule8,rule9,rule10,pos_rule0,pos_rule1,pos_rule2,pos_rule3,pos_rule7,pos_rule4,pos_rule5,pos_rule6,ModelScore
A00023141,FALSE,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,0,-1,-2
A00023142,FALSE,0,0,0,0,0,0,0,0,0,0,0,-1,0,0,0,-1,0,-1,-3
A00023143,FALSE,0,0,0,0,1,1,0,0,0,0,0,0,0,-1,-1,-1,-1,-1,-4
A00023144,FALSE,0,1,0,0,1,1,0,0,0,0,0,0,0,-1,-1,-1,-1,-1,-3
A00023145,FALSE,0,0,0,0,0,0,0,0,0,0,0,-1,0,0,0,0,0,0,-1
A00023146,FALSE,0,1,0,0,1,1,0,0,0,0,0,0,0,0,-1,-1,0,-1,0
A00023147,FALSE,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,-1,-1,0,-3
A00023148,FALSE,0,0,0,0,0,0,0,0,0,0,-1,0,-1,-1,0,-1,-1,-1,-7
A00023149,FALSE,0,1,0,0,0,0,0,0,0,0,0,0,-1,-1,-1,-1,-1,0,-5
A00023150,FALSE,0,0,0,0,0,0,0,0,0,0,-1,0,-1,-1,-1,-1,-1,0,-7
A00023151,FALSE,0,0,0,0,0,0,1,0,0,0,0,-1,0,0,0,0,0,-1,-1
A00023152,FALSE,0,1,0,0,0,0,0,0,0,0,0,0,-1,-1,0,-1,-1,-1,-5
A00023153,FALSE,0,0,0,0,0,0,0,0,0,0,0,0,-1,-1,0,-1,-1,0,-5
A00023154,FALSE,0,1,0,0,0,0,0,0,0,0,0,0,-1,-1,0,-1,-1,-1,-5
A00023155,FALSE,0,1,0,0,1,0,0,0,0,0,0,0,0,-1,-1,-1,-1,-1,-4
A00023156,TRUE,1,1,0,0,1,1,0,1,0,1,0,0,0,0,0,0,0,0,6
A00023157,TRUE,1,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,2
A00023158,TRUE,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,3
A00023159,TRUE,1,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,3
A00023160,TRUE,1,1,0,0,0,1,0,1,0,1,0,0,0,0,0,0,0,0,5
A00023161,TRUE,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,2
A00023162,TRUE,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,0,0,-1
A00023163,TRUE,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,0,0,-1
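To make the current scheme concrete, here is a minimal Python sketch (outside RapidMiner) of the scoring described above: each rule column already holds +1, -1, or 0, and the cumulative score is their plain sum. The helper names `cumulative_score` and `band`, and the cut-offs `valid_max` and `fraud_min`, are hypothetical placeholders for illustration only; the real thresholds are exactly what the question is asking for. Note that the ModelScore column in the sample does not always equal this plain sum (for example A00023143), so the sketch only uses two rows where it does.

```python
import csv
import io

# Two rows copied from the sample above (header order kept as posted).
SAMPLE = """\
TRAN_ID,IS_FRAUD,rule1,rule2,rule3,rule4,rule6,rule5,rule7,rule8,rule9,rule10,pos_rule0,pos_rule1,pos_rule2,pos_rule3,pos_rule7,pos_rule4,pos_rule5,pos_rule6,ModelScore
A00023141,FALSE,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,-1,0,-1,-2
A00023156,TRUE,1,1,0,0,1,1,0,1,0,1,0,0,0,0,0,0,0,0,6
"""


def cumulative_score(row):
    """Sum the rule columns: +1 per fraud indicator hit, -1 per valid indicator hit."""
    return sum(
        int(value)
        for name, value in row.items()
        if name.startswith(("rule", "pos_rule"))
    )


def band(score, valid_max=-2, fraud_min=2):
    """Map a score to one of three bands.

    valid_max and fraud_min are made-up placeholder cut-offs, not tuned values.
    """
    if score <= valid_max:
        return "valid"
    if score >= fraud_min:
        return "fraud"
    return "undetermined"


rows = list(csv.DictReader(io.StringIO(SAMPLE)))
scores = {row["TRAN_ID"]: cumulative_score(row) for row in rows}

for tran_id, score in scores.items():
    print(tran_id, score, band(score))
```

What I am hoping to get from RapidMiner is a replacement for the fixed +1/-1 values inside the sum, and data-driven values for the two cut-offs.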