"Question about Clustering Data before running a model"

blueearth · September 2012

Hi, I have a problem with a biological data set. Let me explain it with a simple example.
I have 10 proteins, and every two proteins belong to one organism. For example, proteins 1 and 2 belong to human, proteins 3 and 4 belong to mouse, and so on.
The five organisms are my labels, and my goal is to build a model that predicts these five organisms.
The problem is that when I run this data, every protein is analyzed independently, so the final result consists of 10 proteins belonging to 5 organisms. But every two proteins are linked and should be analyzed together. What I want is for every two proteins from the same organism to go into one group, so that I get 5 groups, defined by organism, which are then analyzed by the model.
Is there any way to cluster these proteins, and similar data, like this?


MariusHelf · September 2012

Hi,

can you please post your table structure with one or two rows of example data, and the desired outcome?

And can you please use dots and line breaks? Otherwise your posts are pretty hard to understand.

Best,
Marius

blueearth · September 2012

OH sorry

;D Here is a sample list

Type   | Cluster                                                    | Length | Weight  | Isoelectric point | Aliphatic index
Reston | 5. Reston Ebola virus strain Pennsylvania, complete genome | 739    | 83.452  | 5.18              | 77.93
Reston | 5. Reston Ebola virus strain Pennsylvania, complete genome | 329    | 36.409  | 7.56              | 84.498
Zaire  | 8. Zaire ebolavirus strain Zaire 1995, complete genome     | 251    | 28.235  | 9.88              | 104.94
Zaire  | 8. Zaire ebolavirus strain Zaire 1995, complete genome     | 2212   | 252.788 | 8.73              | 90.018
Sudan  | 9. Sudan ebolavirus strain Gulu, complete genome           | 738    | 81.804  | 5                 | 80.745
Sudan  | 9. Sudan ebolavirus strain Gulu, complete genome           | 329    | 36.116  | 7.67              | 85.441

What I need is for samples in the same cluster to be analyzed together.

MariusHelf · September 2012

Ok, so if I understand you correctly, what you want (in other words) is to join each pair of adjacent lines.
You can do so by installing the Series extension and using the Windowing operator. Set both window_size and step_size to 2, because you always have 2 lines which belong together.
You may have to add a Select Attributes operator, or some Rename and Set Role operators, after the Windowing operator, but that should be pretty straightforward.

Does that operator do what you need?

Best, Marius
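
To picture what joining adjacent lines with window_size = step_size = 2 means, here is a minimal plain-Python sketch. It is not RapidMiner code and does not claim to reproduce the Windowing operator's exact output; the function name `window_rows` and the `-0` / `-1` attribute suffixes are assumptions made for illustration. The rows are taken from the sample list above, reduced to three columns.

```python
# Illustrative sketch only: a plain-Python picture of non-overlapping
# windowing (window_size = step_size = 2), not the RapidMiner operator itself.

def window_rows(rows, window_size=2, step_size=2):
    """Join each run of `window_size` consecutive rows into one wide row.

    Attribute names get a positional suffix ("-0", "-1", ...); this naming
    scheme is an assumption made for the sketch.
    """
    joined = []
    # Stop early enough that every window is complete.
    for start in range(0, len(rows) - window_size + 1, step_size):
        wide = {}
        for offset in range(window_size):
            for name, value in rows[start + offset].items():
                wide[f"{name}-{offset}"] = value
        joined.append(wide)
    return joined


# First four rows of the sample list, reduced to three columns.
rows = [
    {"type": "Reston", "length": 739, "weight": 83.452},
    {"type": "Reston", "length": 329, "weight": 36.409},
    {"type": "Zaire", "length": 251, "weight": 28.235},
    {"type": "Zaire", "length": 2212, "weight": 252.788},
]

windows = window_rows(rows)
for wide_row in windows:
    print(wide_row)
```

Each joined row carries the label twice (`type-0` and `type-1` in this sketch), which is the kind of leftover that a later attribute selection or role assignment would have to clean up. The sketch also relies on the two rows of a pair being adjacent and on every group having exactly two rows.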

blueearth · September 2012

What exactly does the Windowing operator do? I mean, what happens to the attributes of two proteins in the same cluster? Should I check the single attribute option?
My second question: what if the number of rows per cluster is not always 2?
Couldn't this be done with the Set Role operator, by assigning the batch or cluster role to the Cluster column?
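
For groups whose size is not always 2, one alternative to fixed-size windowing is to group rows by the Cluster value and summarize each group. The following is a hedged plain-Python sketch of that idea, not an answer given in this thread and not RapidMiner code; the function name `aggregate_by_cluster`, the choice of the mean as the summary, and the shortened cluster keys are all assumptions. RapidMiner's Aggregate operator, with Cluster as the group-by attribute, may be the closest counterpart, but that has not been checked against this data.

```python
from collections import defaultdict
from statistics import mean


def aggregate_by_cluster(rows, key="cluster", label="type"):
    """Collapse all rows sharing a cluster key into one summary row.

    Works for any group size. Using the mean of each numeric attribute is
    an assumption of this sketch; another summary may suit the data better.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row)

    summary = []
    for cluster, members in groups.items():
        numeric = [name for name in members[0] if name not in (key, label)]
        out = {key: cluster, label: members[0][label], "count": len(members)}
        for name in numeric:
            out[f"mean_{name}"] = mean(m[name] for m in members)
        summary.append(out)
    return summary


# Rows from the sample list; the second Sudan row is left out on purpose
# so that one group has a different size. Cluster keys are shortened.
rows = [
    {"type": "Reston", "cluster": "5", "length": 739, "weight": 83.452},
    {"type": "Reston", "cluster": "5", "length": 329, "weight": 36.409},
    {"type": "Zaire", "cluster": "8", "length": 251, "weight": 28.235},
    {"type": "Zaire", "cluster": "8", "length": 2212, "weight": 252.788},
    {"type": "Sudan", "cluster": "9", "length": 738, "weight": 81.804},
]

summary = aggregate_by_cluster(rows)
for group in summary:
    print(group)
```

Unlike windowing, this does not depend on row order or on a fixed group size, but it replaces the individual protein attributes with a per-group summary, so some information is lost.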
