Work on disease data
student_compute
Friends, I have worked on text data before.
Now I have a dataset about a disease with 18 features, 106 samples, and two classes.
79 samples are healthy (no disease) and 25 are from sick patients.
2 samples have an unknown class.
I would like to know whether I should normalize and preprocess the data.
Should I use oversampling or undersampling?
Is this possible in RapidMiner?
Can you suggest a typical process for this?
Thank you.
As always, I am grateful for your help.
Have a nice day, everyone.
Answers
I can have a look at your data. Send it to me as a PM and I will answer ASAP.
Cheers
Sven
Thanks
These are basic suggestions; more specific advice would depend on the data. As we don't have your data, you can try the following.
Normalization: check whether the features have different value ranges (for example, one feature has values between 1 and 10 and another has values between 1000 and 10000). If so, normalize; otherwise it is not necessary.
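As a rough illustration of the range check above, here is a minimal plain-Python sketch of min-max scaling, which is one common way to normalize. The function name `min_max_normalize` and the sample values are assumptions made for this example; it is not the RapidMiner operator itself.

```python
def min_max_normalize(column):
    """Rescale a list of numbers to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        # A constant feature has no spread; map every value to 0.0.
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


# Hypothetical values: one feature in 1..10, another in 1000..10000.
small_range = [1, 4, 10]
large_range = [1000, 5500, 10000]

# After scaling, both features live on the same 0..1 scale.
print(min_max_normalize(small_range))
print(min_max_normalize(large_range))
```

Distance-based learners such as k-NN are the typical case where this matters, because an unscaled feature with a large range would otherwise dominate the distance.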
Preprocessing: separate the samples with missing labels from the rest of the data. Later you can use the trained model to predict their labels. If possible, use feature selection techniques to check whether all 18 features are important.
Over- or undersampling: first try without sampling and check how the models perform. If sampling turns out to be necessary, I recommend SMOTE for upsampling. As your data set is small, I don't think downsampling is a good idea.
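To make the SMOTE idea more concrete, here is a simplified plain-Python sketch: each synthetic point is placed on the line segment between a minority sample and one of its nearest minority neighbours. The function name `smote`, its parameters, and the sample points are assumptions for illustration, not the RapidMiner operator's implementation.

```python
import math
import random


def smote(minority, n_new, k=5, seed=0):
    """Create n_new synthetic points by interpolating between minority samples."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples to interpolate")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Hypothetical minority-class samples with two features each.
minority = [[1.0, 2.0], [2.0, 3.0], [3.0, 1.0], [2.5, 2.5]]
new_points = smote(minority, n_new=3, k=2)
print(new_points)
```

Upsampling like this is normally applied to the training data only, so that synthetic points never end up in the data used to measure performance.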
Build models using cross validation and apply the feature selection techniques inside this operator.
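The reason for keeping such steps inside the validation can be sketched in plain Python. The helpers `stratified_folds` and `cross_validate`, the `fit`/`predict` callables, and the majority-class baseline are all hypothetical names for this example. The class counts (79 healthy, 25 sick) come from the question; the feature rows are placeholders.

```python
import random


def stratified_folds(labels, k, seed=0):
    """Split sample indices into k folds that keep class proportions similar."""
    rng = random.Random(seed)
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, i in enumerate(indices):
            folds[pos % k].append(i)
    return folds


def cross_validate(rows, labels, k, fit, predict):
    """Return mean accuracy; `fit` only ever sees the training part of a fold."""
    accuracies = []
    for test_idx in stratified_folds(labels, k):
        test_set = set(test_idx)
        train_idx = [i for i in range(len(rows)) if i not in test_set]
        # Anything learned from data (feature selection, scaling, sampling)
        # belongs inside `fit`, so the test fold never influences it.
        model = fit([rows[i] for i in train_idx], [labels[i] for i in train_idx])
        hits = sum(predict(model, rows[i]) == labels[i] for i in test_idx)
        accuracies.append(hits / len(test_idx))
    return sum(accuracies) / len(accuracies)


def fit_majority(train_rows, train_labels):
    """Baseline 'model': remember the most frequent training label."""
    return max(set(train_labels), key=train_labels.count)


def predict_majority(model, row):
    return model


labels = ["healthy"] * 79 + ["sick"] * 25
rows = [[0.0] for _ in labels]  # placeholder features
baseline = cross_validate(rows, labels, 5, fit_majority, predict_majority)
print(round(baseline, 3))
```

The majority-class baseline is worth keeping in mind with imbalanced classes: a real model should clearly beat it before its accuracy is taken seriously.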
Finally, yes, all of these things are possible in RapidMiner.
Hope this helps
Varun
https://www.varunmandalapu.com/
Be Safe. Follow precautions and Maintain Social Distancing
Thank you very much to all the friends who posted.
I changed my data. I actually picked a different dataset and collected 100 samples.
I still have 18 features. There are 23 samples in the unsuccessful class, 75 in the successful class, and 2 with an unknown class.
These data are about four ball sportsmen.
Can you help me with this type of data now?
Should I increase or decrease my data with oversampling or undersampling? How does this work in RapidMiner?
Is my process correct?
Here is the sample process I created.
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_excel" compatibility="8.1.000" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
<parameter key="excel_file" value="C:\data.xlsx"/>
<list key="annotations"/>
<list key="data_set_meta_data_information"/>
</operator>
<operator activated="true" class="replace_missing_values" compatibility="8.2.000" expanded="true" height="103" name="Replace Missing Values" width="90" x="179" y="34">
<list key="columns"/>
</operator>
<operator activated="true" class="normalize" compatibility="8.2.000" expanded="true" height="103" name="Normalize" width="90" x="313" y="34"/>
<operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role" width="90" x="447" y="34">
<parameter key="attribute_name" value="class"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="concurrency:cross_validation" compatibility="8.2.000" expanded="true" height="145" name="Cross Validation" width="90" x="581" y="34">
<process expanded="true">
<operator activated="true" class="k_nn" compatibility="8.2.000" expanded="true" height="82" name="k-NN" width="90" x="45" y="34">
<parameter key="k" value="3"/>
<parameter key="nominal_measure" value="JaccardSimilarity"/>
</operator>
<connect from_port="training set" to_op="k-NN" to_port="training set"/>
<connect from_op="k-NN" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="8.2.000" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
<list key="class_weights"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="performance 1"/>
<connect from_op="Performance" from_port="example set" to_port="test set results"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
</process>
</operator>
<connect from_op="Read Excel" from_port="output" to_op="Replace Missing Values" to_port="example set input"/>
<connect from_op="Replace Missing Values" from_port="example set output" to_op="Normalize" to_port="example set input"/>
<connect from_op="Normalize" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Cross Validation" to_port="example set"/>
<connect from_op="Cross Validation" from_port="model" to_port="result 1"/>
<connect from_op="Cross Validation" from_port="example set" to_port="result 2"/>
<connect from_op="Cross Validation" from_port="performance 1" to_port="result 3"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
</process>
</operator>
</process>
I would be grateful for your help, friends.
(I searched the forum myself, but I did not find a solution to my problem.)
How should I choose features? And how do I build a model with a neural network?
Thank you.
I'm waiting for your help.
Friends, can you help me?
Your sample process looks fine. If you want to incorporate feature selection and upsampling, here is a process for you. You need to try different things (different algorithms, different parameter settings) and check whether the performance improves. As your data set is small, I don't recommend downsampling. Again, as we don't have access to the data, our suggestions are only approximate.
Varun
Thank you so much for your guidance.
How can I use a CNN or ANN to predict future data?
I tried everything, but I could not get this to work.
Can you help me this time?
Thank you very, very much.
You need to install the Deep Learning extension in RapidMiner, go through the tutorials, and apply them in a similar fashion. Many factors affect an algorithm's performance, and you need to study the basic principles behind these algorithms to understand how they work and how to improve their performance.
Varun
Thank you.
I installed the neural network
and used it in the process below.
<?xml version="1.0" encoding="UTF-8"?><process version="8.2.000">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="8.2.000" expanded="true" name="Process">
<parameter key="encoding" value="SYSTEM"/>
<process expanded="true">
<operator activated="false" class="read_excel" compatibility="8.1.000" expanded="true" height="68" name="Read Excel" width="90" x="45" y="34">
<parameter key="excel_file" value="C:\data.xlsx"/>
<parameter key="encoding" value="SYSTEM"/>
<list key="annotations"/>
<list key="data_set_meta_data_information"/>
</operator>
<operator activated="true" class="retrieve" compatibility="8.2.000" expanded="true" height="68" name="Retrieve Sonar" width="90" x="45" y="136">
<parameter key="repository_entry" value="//Samples/data/Sonar"/>
</operator>
<operator activated="true" class="replace_missing_values" compatibility="8.2.000" expanded="true" height="103" name="Replace Missing Values" width="90" x="179" y="34">
<list key="columns"/>
</operator>
<operator activated="true" class="normalize" compatibility="8.2.000" expanded="true" height="103" name="Normalize" width="90" x="313" y="34"/>
<operator activated="true" class="set_role" compatibility="8.2.000" expanded="true" height="82" name="Set Role" width="90" x="447" y="34">
<parameter key="attribute_name" value="class"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="optimize_selection" compatibility="8.2.000" expanded="true" height="103" name="Optimize Selection" width="90" x="514" y="136">
<process expanded="true">
<operator activated="true" class="concurrency:cross_validation" compatibility="8.2.000" expanded="true" height="145" name="Cross Validation" width="90" x="313" y="34">
<process expanded="true">
<operator activated="true" class="operator_toolbox:smote" compatibility="1.3.000" expanded="true" height="82" name="SMOTE Upsampling" width="90" x="45" y="34"/>
<operator activated="true" class="k_nn" compatibility="8.2.000" expanded="true" height="82" name="k-NN" width="90" x="179" y="34">
<parameter key="k" value="3"/>
</operator>
<connect from_port="training set" to_op="SMOTE Upsampling" to_port="exa"/>
<connect from_op="SMOTE Upsampling" from_port="ups" to_op="k-NN" to_port="training set"/>
<connect from_op="k-NN" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="8.2.000" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
<list key="class_weights"/>
</operator>
<connect from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
<connect from_op="Performance" from_port="performance" to_port="performance 1"/>
<connect from_op="Performance" from_port="example set" to_port="test set results"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
</process>
</operator>
<connect from_port="example set" to_op="Cross Validation" to_port="example set"/>
<connect from_op="Cross Validation" from_port="performance 1" to_port="performance"/>
<portSpacing port="source_example set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_performance" spacing="0"/>
</process>
</operator>
<operator activated="true" class="concurrency:cross_validation" compatibility="8.2.000" expanded="true" height="145" name="Cross Validation (2)" width="90" x="648" y="85">
<process expanded="true">
<operator activated="true" class="neural_net" compatibility="8.2.000" expanded="true" height="82" name="Neural Net" width="90" x="179" y="34">
<list key="hidden_layers"/>
<parameter key="training_cycles" value="100"/>
<parameter key="learning_rate" value="0.5"/>
<parameter key="momentum" value="0.4"/>
</operator>
<connect from_port="training set" to_op="Neural Net" to_port="training set"/>
<connect from_op="Neural Net" from_port="model" to_port="model"/>
<portSpacing port="source_training set" spacing="0"/>
<portSpacing port="sink_model" spacing="0"/>
<portSpacing port="sink_through 1" spacing="0"/>
</process>
<process expanded="true">
<operator activated="true" class="apply_model" compatibility="8.2.000" expanded="true" height="82" name="Apply Model (2)" width="90" x="45" y="34">
<list key="application_parameters"/>
</operator>
<operator activated="true" class="performance_classification" compatibility="8.2.000" expanded="true" height="82" name="Performance (2)" width="90" x="179" y="34">
<list key="class_weights"/>
</operator>
<connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
<connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
<connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance (2)" to_port="labelled data"/>
<connect from_op="Performance (2)" from_port="performance" to_port="performance 1"/>
<connect from_op="Performance (2)" from_port="example set" to_port="test set results"/>
<portSpacing port="source_model" spacing="0"/>
<portSpacing port="source_test set" spacing="0"/>
<portSpacing port="source_through 1" spacing="0"/>
<portSpacing port="sink_test set results" spacing="0"/>
<portSpacing port="sink_performance 1" spacing="0"/>
<portSpacing port="sink_performance 2" spacing="0"/>
</process>
</operator>
<connect from_op="Retrieve Sonar" from_port="output" to_op="Replace Missing Values" to_port="example set input"/>
<connect from_op="Replace Missing Values" from_port="example set output" to_op="Normalize" to_port="example set input"/>
<connect from_op="Normalize" from_port="example set output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="Optimize Selection" to_port="example set in"/>
<connect from_op="Optimize Selection" from_port="example set out" to_op="Cross Validation (2)" to_port="example set"/>
<connect from_op="Cross Validation (2)" from_port="model" to_port="result 1"/>
<connect from_op="Cross Validation (2)" from_port="example set" to_port="result 3"/>
<connect from_op="Cross Validation (2)" from_port="performance 1" to_port="result 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
</process>
</operator>
</process>
But I do not know whether my model is right.
And how do I use an ANN or CNN? I did not find operators with these names.
Will the neural network, and therefore my model, become more accurate if I increase the learning rate and momentum?
(One question: does the operator set the number of hidden layers, inputs, and outputs itself, or can I define them as well?)
And is it possible to use Zernike moments in RapidMiner?
I am asking for your help and guidance.
Thank you.
Please, I need your help. Thanks!