Doing LinearRegression in a loop? [Solved]
Hi,
I'm having a problem trying to automate something across a dataset that works fine for subsets.
I want to generate linear regression gradients for the weekly sales of a bunch of products. My input data is of the form:
Product, Week, Quantity
"Product 1", "2012-03-02", 34
"Product 1", "2012-03-09", 72
"Product 2", "2012-03-02", 91
"Product 2", "2012-03-09", 27
etc.
I want to generate a result set that looks like:
Product, Trend_Gradient
Product 1, 39.2
Product 2, 15.2
I have it working well enough for a dataset that contains only one product's sales data, but I can't figure out how to loop across the full dataset so that each iteration contains all the entries for one product. Essentially, I want to apply the LinearRegression operator in an SQL "GROUP BY Product_ID" type of process.
Any tips?
This is the process I'm trying at the moment, though something is wrong and it's probably the Loop Values operator.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.006">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
<process expanded="true" height="762" width="685">
<operator activated="true" class="read_csv" compatibility="5.2.006" expanded="true" height="60" name="Read CSV" width="90" x="45" y="30">
<parameter key="csv_file" value="/home/user/Repots/SalesAllProducts/SalesByWeekAllProducts.csv"/>
<parameter key="column_separators" value=","/>
<parameter key="date_format" value="yyyy-MM-dd"/>
<parameter key="first_row_as_names" value="false"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="encoding" value="UTF-8"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="Product.true.polynominal.id"/>
<parameter key="1" value="Date.true.date.attribute"/>
<parameter key="2" value="Sold.true.numeric.label"/>
</list>
</operator>
<operator activated="true" class="loop_values" compatibility="5.2.006" expanded="true" height="94" name="Loop Values" width="90" x="246" y="75">
<parameter key="attribute" value="Product"/>
<process expanded="true" height="780" width="708">
<operator activated="true" class="series:moving_average" compatibility="5.1.002" expanded="true" height="76" name="Moving Average" width="90" x="45" y="30">
<parameter key="attribute_name" value="Sold"/>
<parameter key="window_width" value="4"/>
<parameter key="ignore_missings" value="true"/>
<parameter key="keep_original_attribute" value="false"/>
</operator>
<operator activated="true" class="series:replace_missing_series_values" compatibility="5.1.002" expanded="true" height="76" name="Replace Missing Values" width="90" x="179" y="30">
<parameter key="attribute_name" value="moving_average(Sold)"/>
<parameter key="replacement" value="next value"/>
</operator>
<operator activated="true" class="rename" compatibility="5.2.006" expanded="true" height="76" name="Rename" width="90" x="313" y="30">
<parameter key="old_name" value="moving_average(Sold)"/>
<parameter key="new_name" value="Sold"/>
<list key="rename_additional_attributes"/>
</operator>
<operator activated="true" class="set_role" compatibility="5.2.006" expanded="true" height="76" name="Set Role" width="90" x="447" y="30">
<parameter key="name" value="Sold"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="linear_regression" compatibility="5.2.006" expanded="true" height="94" name="Linear Regression" width="90" x="581" y="30"/>
<connect from_port="example set" to_op="Moving Average" to_port="example set input"/>
<connect from_op="Moving Average" from_port="example set output" to_op="Replace Missing Values" to_port="example set input"/>
<connect from_op="Replace Missing Values" from_port="example set output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Linear Regression" to_port="training set"/>
<connect from_op="Linear Regression" from_port="model" to_port="out 1"/>
<connect from_op="Linear Regression" from_port="exampleSet" to_port="out 2"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="sink_out 1" spacing="0"/>
<portSpacing port="sink_out 2" spacing="0"/>
<portSpacing port="sink_out 3" spacing="0"/>
</process>
</operator>
<connect from_op="Read CSV" from_port="output" to_op="Loop Values" to_port="example set"/>
<connect from_op="Loop Values" from_port="out 1" to_port="result 1"/>
<connect from_op="Loop Values" from_port="out 2" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Answers
You are missing a Filter Examples operator in the loop. Please see the attached process for an example.
Best, Marius
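The attached process is not reproduced in this thread. As a rough sketch of the fix: Loop Values hands the complete example set to its subprocess on every iteration and only exposes the current value through a macro, so the subprocess has to filter the rows itself. The fragment below assumes the iteration macro keeps its default name `loop_value` and that the attribute value filter accepts the product names as written; the operator coordinates are arbitrary. It would go inside the Loop Values subprocess, ahead of Moving Average.

```xml
<!-- Sketch only: place inside the Loop Values subprocess, before Moving Average. -->
<operator activated="true" class="filter_examples" compatibility="5.2.006" expanded="true" height="76" name="Filter Examples" width="90" x="45" y="165">
  <!-- Keep only the rows whose Product equals the value of the current iteration. -->
  <parameter key="condition_class" value="attribute_value_filter"/>
  <!-- loop_value is assumed to be the iteration macro name set on Loop Values. -->
  <parameter key="parameter_string" value="Product=%{loop_value}"/>
</operator>
<!-- These two connections replace the direct "example set" to Moving Average connection. -->
<connect from_port="example set" to_op="Filter Examples" to_port="example set input"/>
<connect from_op="Filter Examples" from_port="example set output" to_op="Moving Average" to_port="example set input"/>
```

With the filter in place, each iteration should train Linear Regression on a single product's rows, so the loop would return one model per product rather than the same model repeated.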
My Product ID is polynominal but other than that, your process is pretty straightforward to adapt.