Parsing Text.....Model Output
Sorry to bring this up again, but I am trying to parse a small text file that is the output of a "Write as Text" operator. Could someone give me a few pointers on how to approach this? A link to a tutorial would also be very helpful.
The source is a weight table:
23.06.2016 11:51:49 Results of ResultWriter 'Write as Text' [1]:
23.06.2016 11:51:49 1. Total number of Support Vectors: 148
Bias (offset): 0.16447
w[A-0] = -0.15679
w[B.B-0] = -0.09370
w[C-0] = -0.00203
w[D-0] = -0.01725
w[E-0] = 0.11334
w[F-0] = -0.10510
w[G-0] = 0.07406
w[H-0] = 0.11156
w[I-0] = 0.06108
w[IN-0] = 0.07957
w[J-0] = 0.20053
w[JP-0] = -0.00121
w[L-0] = 0.06061
w[M-0] = 0.17203
w[N-0] = 0.13760
w[QM-0] = 0.17374
w[V-0] = 0.07307
w[WM-0] = 0.08355
number of classes: 2
number of support vectors for class up: 74
number of support vectors for class down: 74
I am trying to put it into two columns in the following format, but I seem to be getting nowhere.
A -0.15679
B.B -0.09370
regards,
Alex
Best Answer
hughesfleming68 Member Posts: 323 Unicorn
I figured this out... finally. The process below works: it extracts the weight table from the LibSVM output and sorts it. It still raises a couple of warnings that I have not looked into. I would appreciate feedback on them if anyone has the time, but this is good enough for now.
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="7.1.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.1.001" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="text:read_document" compatibility="7.1.001" expanded="true" height="68" name="Read Document" width="90" x="112" y="136">
<parameter key="file" value="C:\Users\Alex Fleming\Desktop\WriteModelOutput.txt"/>
</operator>
<operator activated="true" class="text:process_documents" compatibility="7.1.001" expanded="true" height="103" name="Process Documents" width="90" x="380" y="187">
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="7.1.001" expanded="true" height="68" name="Tokenize" width="90" x="112" y="30">
<parameter key="mode" value="regular expression"/>
<parameter key="characters" value="["/>
<parameter key="expression" value="\n"/>
</operator>
<operator activated="true" class="text:filter_tokens_by_content" compatibility="7.1.001" expanded="true" height="68" name="Filter Tokens (by Content)" width="90" x="313" y="34">
<parameter key="string" value="w["/>
</operator>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_op="Filter Tokens (by Content)" to_port="document"/>
<connect from_op="Filter Tokens (by Content)" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<operator activated="true" class="text:wordlist_to_data" compatibility="7.1.001" expanded="true" height="82" name="WordList to Data" width="90" x="581" y="289"/>
<operator activated="true" class="select_attributes" compatibility="7.1.001" expanded="true" height="82" name="Select Attributes" width="90" x="782" y="391">
<parameter key="attribute_filter_type" value="single"/>
<parameter key="attribute" value="word"/>
</operator>
<operator activated="true" class="split" compatibility="7.1.001" expanded="true" height="82" name="Split" width="90" x="916" y="391">
<parameter key="split_pattern" value="="/>
</operator>
<operator activated="true" class="set_role" compatibility="7.1.001" expanded="true" height="82" name="Set Role" width="90" x="1050" y="391">
<parameter key="attribute_name" value="word_1"/>
<parameter key="target_role" value="id"/>
<list key="set_additional_roles">
<parameter key="word_2" value="label"/>
</list>
</operator>
<operator activated="true" class="rename" compatibility="7.1.001" expanded="true" height="82" name="Rename" width="90" x="1184" y="391">
<parameter key="old_name" value="word_1"/>
<parameter key="new_name" value="Name"/>
<list key="rename_additional_attributes">
<parameter key="word_2" value="Weight"/>
</list>
</operator>
<operator activated="true" class="sort" compatibility="7.1.001" expanded="true" height="82" name="Sort" width="90" x="1318" y="391">
<parameter key="attribute_name" value="Weight"/>
<parameter key="sorting_direction" value="decreasing"/>
</operator>
<connect from_op="Read Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
<connect from_op="Process Documents" from_port="word list" to_op="WordList to Data" to_port="word list"/>
<connect from_op="WordList to Data" from_port="example set" to_op="Select Attributes" to_port="example set input"/>
<connect from_op="Select Attributes" from_port="example set output" to_op="Split" to_port="example set input"/>
<connect from_op="Split" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Rename" to_port="example set input"/>
<connect from_op="Rename" from_port="example set output" to_op="Sort" to_port="example set input"/>
<connect from_op="Sort" from_port="example set output" to_port="result 1"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
</process>
</operator>
</process>
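For comparison, the same steps the process performs (split the text into lines, keep the lines containing `w[`, split on `=`, sort by weight) can be sketched outside RapidMiner in Python. This is only a rough sketch and not part of the process above: the function name `parse_weights` and the regular expression are my own assumptions, and it assumes the `-0` suffix inside the brackets is an index that can be dropped to match the `A -0.15679` format asked for in the question. Unlike a split on `=` alone, it converts the weight to a number so that the sort is numeric.

```python
import re

# Matches lines such as "w[B.B-0] = -0.09370".
# Assumption: the trailing "-0" inside the brackets is an index and can be dropped.
WEIGHT_LINE = re.compile(
    r"^w\[(?P<name>.+?)(?:-\d+)?\]\s*=\s*(?P<weight>[-+]?\d*\.?\d+)$"
)


def parse_weights(text):
    """Return (name, weight) pairs sorted by weight, largest first."""
    rows = []
    for line in text.splitlines():
        match = WEIGHT_LINE.match(line.strip())
        if match:  # header, bias and summary lines do not match and are skipped
            rows.append((match.group("name"), float(match.group("weight"))))
    return sorted(rows, key=lambda row: row[1], reverse=True)


sample = """Bias (offset): 0.16447
w[A-0] = -0.15679
w[B.B-0] = -0.09370
w[J-0] = 0.20053
number of classes: 2
"""

# Print the two-column layout: name, tab, weight.
for name, weight in parse_weights(sample):
    print(f"{name}\t{weight:.5f}")
```

With the sample above, the rows come out in the order J, B.B, A.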
Answers
Hello @hughesfleming68 (Alex),
Which operator produced that output?
Does that operator have a "wei" port? That is a weights port, and you can use the "Weights to Data" operator to convert its output into a regular row/column table.
That may be an easier way.
That would certainly be an easier way, and it does work if I use the mySVM Linear operator, as Sebastian Land very kindly pointed out to me last week. In my case, I would rather use LibSVM for this particular process, and there the "wei" port is not available. If there is a way to get identical output from both linear learners, I don't know about it. Ideally, I would like to take the output from a "mod" port and feed it to an operator that is more flexible than "Write as Text".
I am hoping that parsing might work, but I am already anticipating problems with that. I am also looking at Groovy scripting and, ultimately, writing an extension or modifying an existing one.
This would not be a big deal if it were one text file, but I am generating more than 20 a day and then slicing them up the old-fashioned way with pen and paper. That 20 could easily become 40 or 50, so I am willing to spend some time on this.
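To give an idea of what handling 20 to 50 files a day might look like outside RapidMiner, here is a minimal Python sketch. Everything in it is an assumption for illustration: the function name `convert_folder`, the `.txt` and `.tsv` extensions, the UTF-8 encoding, and the idea that the `-0` suffix in `w[A-0]` can be dropped. It reads every text file in a folder and writes one two-column, tab-separated file per input, sorted by weight.

```python
import re
from pathlib import Path

# Matches lines such as "w[A-0] = -0.15679" (assumed layout of the text output).
LINE = re.compile(r"^w\[(.+?)(?:-\d+)?\]\s*=\s*([-+]?\d*\.?\d+)$")


def convert_folder(source_dir, target_dir):
    """Write one two-column .tsv per .txt model dump; return the paths written."""
    source, target = Path(source_dir), Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for txt in sorted(source.glob("*.txt")):
        rows = []
        for line in txt.read_text(encoding="utf-8").splitlines():
            match = LINE.match(line.strip())
            if match:  # skip timestamps, bias and summary lines
                rows.append((match.group(1), float(match.group(2))))
        rows.sort(key=lambda row: row[1], reverse=True)  # largest weight first
        out = target / (txt.stem + ".tsv")
        out.write_text(
            "".join(f"{name}\t{weight:.5f}\n" for name, weight in rows),
            encoding="utf-8",
        )
        written.append(out)
    return written
```

Calling `convert_folder("model_outputs", "weight_tables")` (both folder names are hypothetical) would leave one `.tsv` file next to each day's batch, ready to be read back in as a table.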
Lastly, many thanks to the RapidMiner team for building and supporting such an amazing tool!
regards,
Alex