"[SOLVED] java.lang.nullPointerException in simple text mining script"
Hello, I'm new to text mining and RapidMiner, but I'm following a tutorial in "Practical Text Mining" and can't make a very simple script work. The process fails and returns a java.lang.NullPointerException error. I'm running Mac OS X 10.6.8, Java 13.7.2, RapidMiner 5.2.006.
I'm using the Read Excel operator to load a simple three-column spreadsheet. The columns are ID, Year, and Abstract. Abstract contains the text I'm trying to mine. I've flagged ID as the id field, and Abstract is flagged as text in the import wizard. There are 901 examples in the example set, and the Read Excel operator appears to be working, because I see my data when hovering over its output node. The data also looks correct going into the Process Documents from Data (PDFD) operator at the exa node.
On the PDFD operator, create word vector is checked (TF-IDF), as is keep text. PDFD contains a subprocess: Transform Case and Tokenize. I've removed all other operators from the program in order to isolate PDFD as the problem. When I hover over the output node of PDFD, it says Examples=0 but still shows my 3 attribute names.
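For readers unfamiliar with what this subprocess is meant to do, here is a minimal Java sketch of the idea: lower-case each text, split it on non-letter characters, and weight each token by TF-IDF. The names (`TfIdfSketch`, `tokenize`, `tfIdf`) are made up for illustration, and the weighting used (term count divided by document length, times ln(N/df)) is one common textbook variant. It is an assumption, not RapidMiner's actual implementation, which may normalize differently.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only; not RapidMiner's implementation.
class TfIdfSketch {

    // Lower-case the text, then split on every run of non-letter characters.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String t : text.toLowerCase(Locale.ROOT).split("[^\\p{L}]+")) {
            if (!t.isEmpty()) {
                tokens.add(t);
            }
        }
        return tokens;
    }

    // One map per document: token -> tf * idf,
    // with tf = count / documentLength and idf = ln(N / documentFrequency).
    static List<Map<String, Double>> tfIdf(List<String> documents) {
        List<List<String>> tokenized = new ArrayList<>();
        Map<String, Integer> docFreq = new HashMap<>();
        for (String doc : documents) {
            List<String> tokens = tokenize(doc);
            tokenized.add(tokens);
            // Count each token once per document.
            for (String t : new HashSet<>(tokens)) {
                docFreq.merge(t, 1, Integer::sum);
            }
        }
        int n = documents.size();
        List<Map<String, Double>> vectors = new ArrayList<>();
        for (List<String> tokens : tokenized) {
            Map<String, Double> vec = new HashMap<>();
            for (String t : tokens) {
                vec.merge(t, 1.0 / tokens.size(), Double::sum);
            }
            for (Map.Entry<String, Double> e : vec.entrySet()) {
                double idf = Math.log((double) n / docFreq.get(e.getKey()));
                e.setValue(e.getValue() * idf);
            }
            vectors.add(vec);
        }
        return vectors;
    }

    public static void main(String[] args) {
        // Hypothetical two-document corpus standing in for the Abstract column.
        List<String> abstracts = List.of(
                "Text mining finds patterns in text.",
                "Clustering groups similar documents.");
        System.out.println(tfIdf(abstracts));
    }
}
```

With this weighting, a token that occurs in every document gets weight 0, so a non-empty input should still yield one vector per document. That is why an output of Examples=0 points to a problem upstream rather than to the weighting itself.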
Here is the xml code:
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.006">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true" height="251" width="413">
<operator activated="true" class="read_excel" compatibility="5.2.006" expanded="true" height="60" name="Read Excel" width="90" x="45" y="75">
<parameter key="excel_file" value="/Users/Bill/Desktop/Literature_Datsset_1994-2005.xls"/>
<parameter key="sheet_number" value="1"/>
<parameter key="imported_cell_range" value="A1:E902"/>
<parameter key="encoding" value="SYSTEM"/>
<parameter key="first_row_as_names" value="true"/>
<list key="annotations">
<parameter key="0" value="Name"/>
</list>
<parameter key="date_format" value=""/>
<parameter key="time_zone" value="SYSTEM"/>
<parameter key="locale" value="English (United States)"/>
<list key="data_set_meta_data_information">
<parameter key="0" value="ID.true.nominal.attribute"/>
<parameter key="1" value="YEAR.true.nominal.attribute"/>
<parameter key="2" value="JOURNAL.true.nominal.attribute"/>
<parameter key="3" value="ABSTRACT.true.text.attribute"/>
</list>
<parameter key="read_not_matching_values_as_missings" value="true"/>
<parameter key="datamanagement" value="double_array"/>
</operator>
<operator activated="true" class="text:process_document_from_data" compatibility="5.2.002" expanded="true" height="76" name="Process Documents from Data" width="90" x="179" y="75">
<parameter key="create_word_vector" value="true"/>
<parameter key="vector_creation" value="TF-IDF"/>
<parameter key="add_meta_information" value="true"/>
<parameter key="keep_text" value="false"/>
<parameter key="prune_method" value="absolute"/>
<parameter key="prunde_below_percent" value="3.0"/>
<parameter key="prune_above_percent" value="30.0"/>
<parameter key="prune_below_absolute" value="3"/>
<parameter key="prune_above_absolute" value="55"/>
<parameter key="prune_below_rank" value="0.05"/>
<parameter key="prune_above_rank" value="0.05"/>
<parameter key="datamanagement" value="double_sparse_array"/>
<parameter key="select_attributes_and_weights" value="false"/>
<list key="specify_weights"/>
<process expanded="true" height="340" width="634">
<operator activated="true" class="text:transform_cases" compatibility="5.2.002" expanded="true" height="60" name="Transform Cases" width="90" x="59" y="109">
<parameter key="transform_to" value="lower case"/>
</operator>
<operator activated="true" class="text:tokenize" compatibility="5.2.002" expanded="true" height="60" name="Tokenize" width="90" x="169" y="110">
<parameter key="mode" value="non letters"/>
<parameter key="characters" value=".:"/>
<parameter key="language" value="English"/>
<parameter key="max_token_length" value="3"/>
</operator>
<operator activated="true" class="text:filter_stopwords_english" compatibility="5.2.002" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="297" y="111"/>
<operator activated="true" class="text:filter_by_length" compatibility="5.2.002" expanded="true" height="60" name="Filter Tokens (by Length)" width="90" x="456" y="104">
<parameter key="min_chars" value="2"/>
<parameter key="max_chars" value="55"/>
</operator>
<connect from_port="document" to_op="Transform Cases" to_port="document"/>
<connect from_op="Transform Cases" from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
<connect from_op="Filter Stopwords (English)" from_port="document" to_op="Filter Tokens (by Length)" to_port="document"/>
<connect from_op="Filter Tokens (by Length)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="k_means" compatibility="5.2.006" expanded="true" height="76" name="Clustering" width="90" x="45" y="165">
<parameter key="add_cluster_attribute" value="true"/>
<parameter key="add_as_label" value="false"/>
<parameter key="remove_unlabeled" value="false"/>
<parameter key="k" value="2"/>
<parameter key="max_runs" value="10"/>
<parameter key="determine_good_start_values" value="false"/>
<parameter key="measure_types" value="BregmanDivergences"/>
<parameter key="mixed_measure" value="MixedEuclideanDistance"/>
<parameter key="nominal_measure" value="NominalDistance"/>
<parameter key="numerical_measure" value="EuclideanDistance"/>
<parameter key="divergence" value="SquaredEuclideanDistance"/>
<parameter key="kernel_type" value="radial"/>
<parameter key="kernel_gamma" value="1.0"/>
<parameter key="kernel_sigma1" value="1.0"/>
<parameter key="kernel_sigma2" value="0.0"/>
<parameter key="kernel_sigma3" value="2.0"/>
<parameter key="kernel_degree" value="3.0"/>
<parameter key="kernel_shift" value="1.0"/>
<parameter key="kernel_a" value="1.0"/>
<parameter key="kernel_b" value="0.0"/>
<parameter key="max_optimization_steps" value="100"/>
<parameter key="use_local_random_seed" value="false"/>
<parameter key="local_random_seed" value="1992"/>
</operator>
<connect from_port="input 1" to_op="Read Excel" to_port="file"/>
<connect from_op="Read Excel" from_port="output" to_op="Process Documents from Data" to_port="example set"/>
<connect from_op="Process Documents from Data" from_port="example set" to_op="Clustering" to_port="example set"/>
<connect from_op="Process Documents from Data" from_port="word list" to_port="result 2"/>
<connect from_op="Clustering" from_port="cluster model" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="source_input 2" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
Thanks in advance for any help you can offer!
Stack trace:
------------
Exception: java.lang.NullPointerException
Message: null
Stack trace:
com.rapidminer.operator.nio.model.ExcelResultSetConfiguration.makeDataResultSet(ExcelResultSetConfiguration.java:275)
com.rapidminer.operator.nio.model.AbstractDataResultSetReader.createExampleSet(AbstractDataResultSetReader.java:127)
com.rapidminer.operator.io.AbstractExampleSource.read(AbstractExampleSource.java:52)
com.rapidminer.operator.io.AbstractExampleSource.read(AbstractExampleSource.java:36)
com.rapidminer.operator.io.AbstractReader.doWork(AbstractReader.java:123)
com.rapidminer.operator.Operator.execute(Operator.java:834)
com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:711)
com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:379)
com.rapidminer.operator.Operator.execute(Operator.java:834)
com.rapidminer.Process.run(Process.java:925)
com.rapidminer.Process.run(Process.java:848)
com.rapidminer.Process.run(Process.java:807)
com.rapidminer.Process.run(Process.java:802)
com.rapidminer.Process.run(Process.java:792)
com.rapidminer.gui.ProcessThread.run(ProcessThread.java:63)
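A note on reading the trace: Java lists the frame where the exception was thrown first. Here that is ExcelResultSetConfiguration.makeDataResultSet, reached through AbstractReader.doWork, which suggests the failure happened while reading the Excel file rather than inside Process Documents from Data. The sketch below pulls the top frame out of pasted trace lines; the class and method names (`TopFrameSketch`, `parseFrame`, `topFrame`) are hypothetical helpers, not part of RapidMiner.

```java
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative helper for reading pasted stack traces.
class TopFrameSketch {

    // Matches frames of the form: fully.qualified.Class.method(File.java:123)
    // An optional leading "at " is tolerated.
    private static final Pattern FRAME = Pattern.compile(
            "^\\s*(?:at\\s+)?([\\w.$]+)\\.([\\w$<>]+)\\(([\\w$]+\\.java):(\\d+)\\)\\s*$");

    // Returns {className, methodName, fileName, lineNumber},
    // or null if the line is not a stack frame.
    static String[] parseFrame(String line) {
        Matcher m = FRAME.matcher(line);
        if (!m.matches()) {
            return null;
        }
        return new String[] { m.group(1), m.group(2), m.group(3), m.group(4) };
    }

    // The first line that parses as a frame is where the exception was thrown.
    static String[] topFrame(List<String> lines) {
        for (String line : lines) {
            String[] frame = parseFrame(line);
            if (frame != null) {
                return frame;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        List<String> trace = List.of(
                "Exception: java.lang.NullPointerException",
                "Stack trace:",
                "com.rapidminer.operator.nio.model.ExcelResultSetConfiguration.makeDataResultSet(ExcelResultSetConfiguration.java:275)",
                "com.rapidminer.operator.nio.model.AbstractDataResultSetReader.createExampleSet(AbstractDataResultSetReader.java:127)");
        String[] top = topFrame(trace);
        System.out.println(top[0] + " -> " + top[1] + " (line " + top[3] + ")");
    }
}
```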
Bill
Answers
There was indeed a bug involved; it should be fixed in the next release.
Regards,
Marco
NullPointerException is a RuntimeException. Runtime exceptions are unchecked, so the compiler does not flag them; they surface only at run time and crash the program if they are not handled properly. When a class is instantiated, its object is stored in memory. A NullPointerException occurs when you try to use a reference that points to no location in memory (null) as though it were referencing an object. Typical cases include calling a method on a null reference or accessing one of its fields.
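As a small, self-contained illustration of the point about null references (the class and method names here are invented for the example and have nothing to do with RapidMiner's code), the following Java sketch triggers a NullPointerException and then shows a guarded alternative:

```java
// Illustrative sketch of how a null reference leads to NullPointerException.
class NullReferenceSketch {

    // Dereferencing a null reference throws NullPointerException at run time;
    // the compiler accepts this code without complaint.
    static int unsafeLength(String s) {
        return s.length();
    }

    // A guard that handles the null case explicitly instead of crashing.
    static int safeLength(String s) {
        return s == null ? 0 : s.length();
    }

    public static void main(String[] args) {
        String missing = null;
        try {
            unsafeLength(missing);
        } catch (NullPointerException e) {
            System.out.println("caught: " + e.getClass().getName());
        }
        System.out.println("guarded length: " + safeLength(missing));
    }
}
```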
Balmer
I had the same problem (RapidMiner 6.5.2). I use several attributes, and having no names for them is confusing, so I tried importing the Excel sheets as CSV instead; it works (slowly) but without errors.
Cheers,
ME. Taillefer