Dear all, I need help. I have a log file (a plain text file) containing about 500 lines, and I need to count how many times words and statements occur in it.
Ahmedte1234
The lines in the file look like this:
1 Jan 10:00 the chassis normal status
1 Jan 10:30 log I'd lost
1 Jan 12:30 interface down
1 Jan 1:00 power off system
2 Jan 11:00 the high temperature
2 Jan 2:00 the user log in successfully
There are a lot of statements like these; some of them are useful and some are not.
So the output should look like this:
Down appears 10 times
Power off appears 1 time
Interface down appears 3 times
I also need an algorithm that suggests the most frequent words and how many times each appears in the file, and a way to reduce the output with a certain pattern.
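Outside RapidMiner, the counting asked for here can be sketched in a few lines of Python. This is a minimal sketch, assuming every line starts with a "<day> <month> <hh:mm>" timestamp as in the samples above; the noise pattern `NOISE`, the phrase list, and the helper names `message`, `useful_messages`, `count_phrases` and `top_words` are hypothetical choices for illustration. Lines matching the noise pattern are dropped first (one possible reading of "reduce with a certain pattern"), then each phrase is counted once per remaining line that contains it as whole words.

```python
import re
from collections import Counter

# Sample lines from the question; for the real log, read the file instead.
SAMPLE_LINES = [
    "1 Jan 10:00 the chassis normal status",
    "1 Jan 10:30 log I'd lost",
    "1 Jan 12:30 interface down",
    "1 Jan 1:00 power off system",
    "2 Jan 11:00 the high temperature",
    "2 Jan 2:00 the user log in successfully",
]

# Assumed line layout: "<day> <month> <hh:mm> <message>".
TIMESTAMP = re.compile(r"^\d{1,2} [A-Za-z]{3} \d{1,2}:\d{2}\s+")

# Hypothetical pattern for statements that are not useful; adjust as needed.
NOISE = re.compile(r"log in successfully|normal status", re.IGNORECASE)


def message(line):
    """Strip the leading timestamp and return the lower-cased message."""
    return TIMESTAMP.sub("", line).strip().lower()


def useful_messages(lines):
    """Reduce the log: drop lines matching NOISE, keep the message text."""
    return [message(line) for line in lines if not NOISE.search(line)]


def count_phrases(lines, phrases):
    """Count the useful lines that contain each phrase as whole words."""
    counts = Counter()
    for text in useful_messages(lines):
        for phrase in phrases:
            if re.search(r"\b" + re.escape(phrase.lower()) + r"\b", text):
                counts[phrase] += 1
    return counts


def top_words(lines, n=5):
    """Return the n most frequent words (letters only) in the useful lines."""
    words = Counter()
    for text in useful_messages(lines):
        words.update(re.findall(r"[a-z]+", text))
    return words.most_common(n)


if __name__ == "__main__":
    phrases = ["down", "power off", "interface down"]
    counts = count_phrases(SAMPLE_LINES, phrases)
    for phrase in phrases:
        print(f"{phrase} appears {counts[phrase]} time(s)")
    print(top_words(SAMPLE_LINES, 3))
```

To run it on the real file, replace `SAMPLE_LINES` with the lines read from the log (for example `open(path, encoding="utf-8").read().splitlines()`) and adjust `NOISE` to the statements you consider not useful.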
Best Answer
varunm1
You can go through the FP-Growth tutorial in RapidMiner. Type "FP" in the operator search, then drag and drop the FP-Growth operator into your process; when you click on the operator, you can see its tutorials in the Help window, as in the screenshot below. There is also a tutorial on the RapidMiner Academy: https://academy.rapidminer.com/learn/video/text-association-rules
Regards,
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
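FP-Growth itself is best run from the RapidMiner operator suggested in the best answer. To show what "frequent word sets" means for log lines, here is a minimal Python sketch that treats each line as a set of words and counts, by brute force, how many lines contain each word and each word pair. This is not the FP-Growth algorithm (FP-Growth avoids enumerating candidates by building an FP-tree); for a given minimum support it simply reports the same frequent word sets up to `max_size`. The timestamp layout, the made-up `LOG` lines and the function names are assumptions for illustration.

```python
import re
from collections import Counter
from itertools import combinations

# Assumed layout "<day> <month> <hh:mm> <message>"; these lines are made up.
LOG = [
    "1 Jan 12:30 interface down",
    "1 Jan 13:10 interface down",
    "1 Jan 1:00 power off system",
    "2 Jan 11:00 the high temperature",
    "2 Jan 12:00 link down",
]

TIMESTAMP = re.compile(r"^\d{1,2} [A-Za-z]{3} \d{1,2}:\d{2}\s+")


def to_transaction(line):
    """One log line -> the set of lower-case words in its message."""
    return frozenset(re.findall(r"[a-z]+", TIMESTAMP.sub("", line).lower()))


def frequent_itemsets(lines, min_support=0.4, max_size=2):
    """Brute-force support counting (not FP-Growth) for word sets up to max_size.

    Support = fraction of lines that contain every word in the set.
    """
    baskets = [to_transaction(line) for line in lines]
    counts = Counter()
    for basket in baskets:
        for size in range(1, max_size + 1):
            # sorted() gives each word set one canonical tuple form
            counts.update(combinations(sorted(basket), size))
    n = len(baskets)
    return {
        itemset: count / n
        for itemset, count in counts.items()
        if count / n >= min_support
    }


if __name__ == "__main__":
    for itemset, support in sorted(frequent_itemsets(LOG).items()):
        print(itemset, round(support, 2))
```

On the five made-up lines with `min_support=0.4`, only `down` (support 0.6), `interface` and the pair `down, interface` (0.4 each) pass the threshold, while `power, off` (0.2) does not. Brute force is fine for a few hundred short lines; for larger logs an FP-Growth implementation scales better.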
Answers
First, install the "Text Processing" and "Web Mining" extensions from the Marketplace in RapidMiner. To count how often each word is repeated in your document, you first need to read your text file into RapidMiner. Then you can use the XML process below to extract these details from your data; attach your own text file instead of the one used in this XML. To load the XML:
1. Copy the XML code below.
2. Open a new blank process in RapidMiner and enable the XML panel by going to View --> Show Panel --> XML in the menu bar.
3. Paste the code into the XML panel of the new process and click the green tick mark; the process then appears as shown in the figure below.
4. Delete the "Retrieve files" operator and connect the file you imported into RapidMiner in its place.
I also attached the result of the process based on the sample data you provided. The term occurrences in the word list give the number of times each word is repeated in your file. There are also multiple community samples that explain how TF-IDF works.
<?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
<parameter key="logverbosity" value="init"/>
<parameter key="random_seed" value="2001"/>
<parameter key="send_mail" value="never"/>
<parameter key="notification_email" value=""/>
<parameter key="process_duration_for_mail" value="30"/>
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Retrieve files" width="90" x="112" y="85">
<parameter key="repository_entry" value="//Local Repository/RapidMIner/files"/>
</operator>
<operator activated="true" class="nominal_to_text" compatibility="9.2.001" expanded="true" height="82" name="Nominal to Text" width="90" x="246" y="136">
<parameter key="attribute_filter_type" value="all"/>
<parameter key="attribute" value=""/>
<parameter key="attributes" value=""/>
<parameter key="use_except_expression" value="false"/>
<parameter key="value_type" value="nominal"/>
<parameter key="use_value_type_exception" value="false"/>
<parameter key="except_value_type" value="file_path"/>
<parameter key="block_type" value="single_value"/>
<parameter key="use_block_type_exception" value="false"/>
<parameter key="except_block_type" value="single_value"/>
<parameter key="invert_selection" value="false"/>
<parameter key="include_special_attributes" value="false"/>
</operator>
<operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="447" y="136">
<parameter key="create_word_vector" value="true"/>
<parameter key="vector_creation" value="TF-IDF"/>
<parameter key="add_meta_information" value="true"/>
<parameter key="keep_text" value="false"/>
<parameter key="prune_method" value="none"/>
<parameter key="prune_below_percent" value="3.0"/>
<parameter key="prune_above_percent" value="30.0"/>
<parameter key="prune_below_rank" value="0.05"/>
<parameter key="prune_above_rank" value="0.95"/>
<parameter key="datamanagement" value="double_sparse_array"/>
<parameter key="data_management" value="auto"/>
<parameter key="select_attributes_and_weights" value="false"/>
<list key="specify_weights"/>
<process expanded="true">
<operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="179" y="85">
<parameter key="mode" value="non letters"/>
<parameter key="characters" value=".:"/>
<parameter key="language" value="English"/>
<parameter key="max_token_length" value="3"/>
</operator>
<connect from_port="document" to_op="Tokenize" to_port="document"/>
<connect from_op="Tokenize" from_port="document" to_port="document 1"/>
<portSpacing port="source_document" spacing="0"/>
<portSpacing port="sink_document 1" spacing="0"/>
<portSpacing port="sink_document 2" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve files" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
<connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
<connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
<connect from_op="Process Documents from Data" from_port="word list" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
</process>
</operator>
</process>
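To check what the word list reports, or to reproduce the counts outside RapidMiner, here is a minimal Python sketch of the same idea: split each document on non-letters (roughly what the Tokenize operator's "non letters" mode does), count total occurrences and the number of documents containing each word, and compute a textbook TF-IDF weight, tf × log(N / df). It assumes each line of the log becomes one document; RapidMiner may scale or normalize its TF-IDF values differently, so the weights will not necessarily match its output. The function names are hypothetical.

```python
import math
import re
from collections import Counter

# Each sample line from the question is treated as one document (an assumption).
DOCS = [
    "1 Jan 10:00 the chassis normal status",
    "1 Jan 10:30 log I'd lost",
    "1 Jan 12:30 interface down",
    "1 Jan 1:00 power off system",
    "2 Jan 11:00 the high temperature",
    "2 Jan 2:00 the user log in successfully",
]


def tokenize(text):
    """Split on non-letter characters and lower-case the tokens."""
    return [token.lower() for token in re.findall(r"[A-Za-z]+", text)]


def word_list(docs):
    """Map each word to (total occurrences, number of documents containing it)."""
    total, in_docs = Counter(), Counter()
    for doc in docs:
        tokens = tokenize(doc)
        total.update(tokens)
        in_docs.update(set(tokens))
    return {word: (total[word], in_docs[word]) for word in total}


def tf_idf(docs):
    """Textbook TF-IDF per document: (count / doc length) * log(N / df)."""
    tokenized = [tokenize(doc) for doc in docs]
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))
    n = len(docs)
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        length = len(tokens) or 1  # avoid dividing by zero on empty documents
        vectors.append(
            {word: (c / length) * math.log(n / df[word]) for word, c in counts.items()}
        )
    return vectors


if __name__ == "__main__":
    ranked = sorted(word_list(DOCS).items(), key=lambda kv: -kv[1][0])
    for word, (total, n_docs) in ranked:
        print(word, total, n_docs)
```

Here `jan` occurs in all six lines, so its total occurrence count (6) is the highest, yet its TF-IDF weight is 0 because log(6/6) = 0. That is why raw occurrence counts, not TF-IDF values, answer the "how many times does it appear" question.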
Hope this helps. Please let me know if you are looking for something different.
Varun