"Only minimum data can be handled by k-means"
Hi All,
When I run the k-means algorithm, RapidMiner is able to process files only up to 335 MB, not more than that.
Here is my process XML:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.013">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.3.013" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="5.3.013" expanded="true" height="60" name="Read CSV" width="90" x="45" y="75">
        <parameter key="csv_file" value="/root/Desktop/rapidsimpletest/337mb.csv"/>
        <list key="annotations"/>
        <list key="data_set_meta_data_information"/>
      </operator>
      <operator activated="true" class="normalize" compatibility="5.3.013" expanded="true" height="94" name="Normalize" width="90" x="246" y="255"/>
      <operator activated="true" class="k_means" compatibility="5.3.013" expanded="true" height="76" name="Clustering" width="90" x="447" y="120">
        <parameter key="k" value="3"/>
        <parameter key="measure_types" value="MixedMeasures"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Normalize" to_port="example set input"/>
      <connect from_op="Normalize" from_port="example set output" to_op="Clustering" to_port="example set"/>
      <connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
Error description:
Apr 09, 2014 4:01:53 PM com.rapidminer.tools.WrapperLoggingHandler log
WARNING: Read CSV: Could not parse line 24 in input: com.rapidminer.tools.CSVParseException: Value quote misplaced at position 88. Last characters read: ormation,"
Apr 09, 2014 4:01:53 PM com.rapidminer.tools.WrapperLoggingHandler log
WARNING: Read CSV: Could not parse line 24 in input: com.rapidminer.tools.CSVParseException: Value quote misplaced at position 42. Last characters read: Archive,"
Apr 09, 2014 4:01:53 PM com.rapidminer.tools.WrapperLoggingHandler log
WARNING: Read CSV: Could not parse line 24 in input: com.rapidminer.tools.CSVParseException: Value quote misplaced at position 67. Last characters read: Home,,,,"
Apr 09, 2014 4:01:53 PM com.rapidminer.tools.WrapperLoggingHandler log
WARNING: Read CSV: Maximum number of warnings exceeded. Will display no further warnings.
Apr 09, 2014 4:02:06 PM com.rapidminer.gui.ProcessThread run
SEVERE: Process failed: Java heap space
java.lang.OutOfMemoryError: Java heap space
at java.util.Arrays.copyOf(Arrays.java:2367)
at java.lang.AbstractStringBuilder.expandCapacity(AbstractStringBuilder.java:130)
at java.lang.AbstractStringBuilder.ensureCapacityInternal(AbstractStringBuilder.java:114)
at java.lang.AbstractStringBuilder.append(AbstractStringBuilder.java:535)
at java.lang.StringBuffer.append(StringBuffer.java:322)
at java.io.BufferedReader.readLine(BufferedReader.java:351)
at java.io.BufferedReader.readLine(BufferedReader.java:382)
at com.rapidminer.gui.tools.dialogs.wizards.dataimport.csv.LineReader.readLine(LineReader.java:55)
at com.rapidminer.operator.nio.model.CSVResultSet.readNext(CSVResultSet.java:149)
at com.rapidminer.operator.nio.model.CSVResultSet.next(CSVResultSet.java:195)
at com.rapidminer.operator.nio.model.DataResultSetTranslator.read(DataResultSetTranslator.java:148)
at com.rapidminer.operator.nio.model.AbstractDataResultSetReader.createExampleSet(AbstractDataResultSetReader.java:147)
at com.rapidminer.operator.io.AbstractExampleSource.read(AbstractExampleSource.java:52)
at com.rapidminer.operator.io.AbstractExampleSource.read(AbstractExampleSource.java:36)
at com.rapidminer.operator.io.AbstractReader.doWork(AbstractReader.java:126)
at com.rapidminer.operator.Operator.execute(Operator.java:867)
at com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
at com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:711)
at com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:375)
at com.rapidminer.operator.Operator.execute(Operator.java:867)
at com.rapidminer.Process.run(Process.java:949)
at com.rapidminer.Process.run(Process.java:873)
at com.rapidminer.Process.run(Process.java:832)
at com.rapidminer.Process.run(Process.java:827)
at com.rapidminer.Process.run(Process.java:817)
at com.rapidminer.gui.ProcessThread.run(ProcessThread.java:63)
Apr 09, 2014 4:02:06 PM com.rapidminer.gui.ProcessThread run
SEVERE: Here: Process[1] (Process)
subprocess 'Main Process'
==> +- Read CSV[1] (Read CSV)
+- Normalize[0] (Normalize)
+- Clustering[0] (k-Means)
I am trying to build 3 clusters with k-means, but the results do not look right.
Cluster Model
Cluster 0: 22292 items
Cluster 1: 1 items
Cluster 2: 1 items
Total number of items: 22294
Could you please help me find where I am making a mistake?
Thanks.
Venkat
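The "Value quote misplaced" warnings in the log above suggest that some rows of the CSV file contain stray or unterminated quote characters. One rough way to locate such rows before importing is sketched below in Python. This is only an illustration and not part of RapidMiner; the function name `lines_with_unbalanced_quotes` and the sample data are made up, and the check assumes `"` is the quote character. A line with an odd number of quotes is flagged as suspicious.

```python
import io


def lines_with_unbalanced_quotes(lines, quote='"', limit=10):
    """Return (line_number, text) pairs for lines with an odd number of quotes.

    `lines` is any iterable of text lines, for example an open file.
    Scanning stops after `limit` suspicious lines have been collected.
    """
    suspects = []
    for number, line in enumerate(lines, start=1):
        # Escaped quotes ("") come in pairs, so they keep the count even.
        if line.count(quote) % 2 == 1:
            suspects.append((number, line.rstrip("\n")))
            if len(suspects) >= limit:
                break
    return suspects


# Hypothetical sample data; a real file would be opened with open(path).
sample = io.StringIO(
    'id,title,notes\n'
    '1,"Home","ok"\n'
    '2,Archive,"unterminated\n'
    '3,"Information","fine"\n'
)
print(lines_with_unbalanced_quotes(sample))
```

Because the file is read line by line, this check needs very little memory even for large inputs. Note that a quoted field that legitimately spans several lines would also be flagged at its first and last line, so the output is a list of candidates to inspect, not a list of confirmed errors.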
Answers
How many rows and columns do your datasets have?
Best regards,
Marius
I have allocated 1 GB of main memory. My input file contains 143000 rows and 5 columns, and its size is 337 MB.
I am able to process a 335 MB file.
Regards,
Venkat
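A back-of-envelope estimate from these figures may help explain why a 1 GB heap is tight. The Python sketch below rests on stated assumptions and is not a measurement: 1 MB is taken as 2^20 bytes, the file is assumed to be mostly single-byte characters, and the text is assumed to be held in memory as 2-byte Java `char` values. Object overhead and whatever RapidMiner does internally are ignored.

```python
# Figures reported in the thread.
FILE_SIZE_BYTES = 337 * 1024 * 1024   # 337 MB input file
ROWS = 143_000                        # rows in the input file
HEAP_BYTES = 1024 ** 3                # 1 GB given to RapidMiner


def estimate(file_size_bytes, rows, bytes_per_char_in_memory=2):
    """Return (average bytes per row on disk, rough bytes needed in memory).

    Assumes one byte per character on disk and a fixed number of bytes
    per character once the text is loaded into the JVM.
    """
    bytes_per_row = file_size_bytes / rows
    in_memory = file_size_bytes * bytes_per_char_in_memory
    return bytes_per_row, in_memory


per_row, in_memory = estimate(FILE_SIZE_BYTES, ROWS)
print(f"average row size on disk: {per_row:.0f} bytes")
print(f"text held as 2-byte chars: {in_memory / 1024**2:.0f} MB")
print(f"share of a 1 GB heap: {in_memory / HEAP_BYTES:.0%}")
```

Under these assumptions each row averages roughly 2.4 KB, which is large for 5 columns and points to long text fields, and the raw text alone would take about two thirds of the heap before any parsing buffers, normalization, or clustering structures are allocated.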
You should definitely try to increase the amount of available memory.
Best regards,
Marius