The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here
Sampling or filtering transactions down to last X per customer
I have a large exampleset with every single transaction completed by each customer over a long time frame. I want to filter this down to the last X transactions that each customer did during their lifetime (e.g. up to the last 5 - if they did fewer than this then at least the ones they did).
I really don't want to have to loop through customer by customer as performance will be dire. Is there a smarter/simpler way of doing this? Each transaction has a timestamp and a client id.
Thanks in advance for any suggestions.
I really don't want to have to loop through customer by customer as performance will be dire. Is there a smarter/simpler way of doing this? Each transaction has a timestamp and a client id.
Thanks in advance for any suggestions.
0
Answers
I have a process that loops through each client ID, filters the exampleset down to just the transactions for that client and does a filter example range for each one but as you can imagine performance is pretty darn bad.
Any other possible workarounds? (e.g. a smart way of numbering the transaction sequence per client and then just filtering the whole lot for sequence numbers <= X)?
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.6.000" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="9.6.000" expanded="true" height="68" name="Retrieve Titanic" width="90" x="112" y="34">
<parameter key="repository_entry" value="//Samples/data/Titanic"/>
</operator>
<operator activated="true" class="operator_toolbox:group_into_collection" compatibility="2.4.000-SNAPSHOT" expanded="true" height="82" name="Group Into Collection" width="90" x="246" y="34">
<parameter key="group_by_attribute" value="Passenger Class"/>
<parameter key="group_by_attribute (numerical)" value=""/>
<parameter key="sorting_order" value="none"/>
</operator>
<operator activated="true" class="loop_collection" compatibility="9.6.000" expanded="true" height="82" name="Loop Collection" width="90" x="380" y="34">
<parameter key="set_iteration_macro" value="false"/>
<parameter key="macro_name" value="iteration"/>
<parameter key="macro_start_value" value="1"/>
<parameter key="unfold" value="false"/>
<process expanded="true">
<operator activated="true" class="sort" compatibility="9.6.000" expanded="true" height="82" name="Sort" width="90" x="45" y="34">
<parameter key="attribute_name" value="Passenger Fare"/>
<parameter key="sorting_direction" value="decreasing"/>
</operator>
<operator activated="true" class="filter_example_range" compatibility="9.6.000" expanded="true" height="82" name="Filter Example Range" width="90" x="179" y="34">
<parameter key="first_example" value="1"/>
<parameter key="last_example" value="3"/>
<parameter key="invert_filter" value="false"/>
</operator>
<connect from_port="single" to_op="Sort" to_port="example set input"/>
<connect from_op="Sort" from_port="example set output" to_op="Filter Example Range" to_port="example set input"/>
<connect from_op="Filter Example Range" from_port="example set output" to_port="output 1"/>
<portSpacing port="source_single" spacing="0"/>
<portSpacing port="sink_output 1" spacing="0"/>
<portSpacing port="sink_output 2" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve Titanic" from_port="output" to_op="Group Into Collection" to_port="exa"/>
<connect from_op="Group Into Collection" from_port="col" to_op="Loop Collection" to_port="collection"/>
<connect from_op="Loop Collection" from_port="output 1" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
Dortmund, Germany
Dortmund, Germany
Ah ok, good to know - I misinterpreted your process XML. Client is rather strict about installation of extensions as well and that one is not currently installed - will have to go through their process to get it so unfortunately there will be a lag before I can test it.
Thanks again.
Paul