The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here
Denormalization of data
I have a data set with a list of transactions with the following attributes:
CustomerID, ProductID, ProductCount
I am trying to transform it into a format where each line corresponds to a CustomerID
and where there is an attribute for each product, like
CustomerID, Product_A_Total_Count, Product_B_Total_Count, Product_C_Total_Count
with counts summed up over all transactions the customer has made with the same product.
I have tried to use Pivot which produces a dataset with the desired structure,
but it does not aggregate the counts of multiple transactions of the same customer with the same product;
rather it picks the count of the last transaction.
Using the ProductCount as WEIGHT makes all the Product_X_COunt attributes disappear entirely from the result.
Am I using the wrong operator?
CustomerID, ProductID, ProductCount
I am trying to transform it into a format where each line corresponds to a CustomerID
and where there is an attribute for each product, like
CustomerID, Product_A_Total_Count, Product_B_Total_Count, Product_C_Total_Count
with counts summed up over all transactions the customer has made with the same product.
I have tried to use Pivot which produces a dataset with the desired structure,
but it does not aggregate the counts of multiple transactions of the same customer with the same product;
rather it picks the count of the last transaction.
Using the ProductCount as WEIGHT makes all the Product_X_COunt attributes disappear entirely from the result.
Am I using the wrong operator?
0
Answers
could you post you process please?
Ciao Sebastian
P.S. Copy and past the xml code of the process from rapidminer. Use the "insert code" button (#) in the forum editor to post code.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.0">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" expanded="true" name="Process">
<process expanded="true" height="325" width="212">
<operator activated="true" class="retrieve" expanded="true" height="60" name="Retrieve" width="90" x="45" y="75">
<parameter key="repository_entry" value="transact"/>
</operator>
<operator activated="true" class="pivot" expanded="true" height="76" name="Pivot" width="90" x="164" y="73">
<parameter key="group_attribute" value="CustomerID"/>
<parameter key="index_attribute" value="ProductID"/>
</operator>
<connect from_op="Retrieve" from_port="output" to_op="Pivot" to_port="example set input"/>
<connect from_op="Pivot" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
The Input table looks like
CustomerID ProductID ProductCount
1 A 4
2 B 5
1 A 6
1 B 7
The resulting Table is
RowNo CustomerID ProductCount_A ProductCount_B
1 1 6 7
2 2 ? 5
I need a 10 (6+4) instead of the 6...
you need to aggregate first. Take a look at this process, it shows you how to do this: Ciao Sebastian