
"Question about Clustering Data before running a model"

blueearth Member Posts: 42 Contributor II
edited June 2019 in Help
Hi, I have a problem with biological data, which I will explain with a simple example.
I have 10 proteins, and every two proteins belong to one organism.
For example, proteins 1 and 2 belong to human, proteins 3 and 4 belong to mouse, and so on.
I have five organisms, which make up my label, and my goal is to build a model that predicts these five organisms.
The problem is that when I run this data, every protein is analyzed independently, so the final result consists of 10 proteins belonging to 5 organisms. But every two proteins are linked and should be analyzed together. What I want is for every two proteins from the same organism to go into one group, so that I get 5 groups, one per organism, and the model analyzes those groups.
Is there any way to cluster these proteins, and similar data, in this way?


Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    can you please post your table structure with one or two rows of example data, and the desired outcome?

    andcanyoupleaseusedotsandlinebreaksotherwiseyourpostsareprettyhardtounderstand.

    Best,
    Marius
  • blueearth Member Posts: 42 Contributor II
    OH sorry :)   ;D Here is a sample list
    Type   | Cluster                                                    | Length | Weight  | Isoelectric point | Aliphatic index
    Reston | 5. Reston Ebola virus strain Pennsylvania, complete genome | 739    | 83.452  | 5.18              | 77.93
    Reston | 5. Reston Ebola virus strain Pennsylvania, complete genome | 329    | 36.409  | 7.56              | 84.498
    Zaire  | 8. Zaire ebolavirus strain Zaire 1995, complete genome     | 251    | 28.235  | 9.88              | 104.94
    Zaire  | 8. Zaire ebolavirus strain Zaire 1995, complete genome     | 2212   | 252.788 | 8.73              | 90.018
    Sudan  | 9. Sudan ebolavirus strain Gulu, complete genome           | 738    | 81.804  | 5                 | 80.745
    Sudan  | 9. Sudan ebolavirus strain Gulu, complete genome           | 329    | 36.116  | 7.67              | 85.441

    What I need is for samples in the same cluster to be analyzed together.
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Ok, so if I understand you correctly, what you want (in other words) is to join every two adjacent lines into one.
    You can do so by installing the Series extension and using the Windowing operator. Set both window_size and step_size to 2, because you always have 2 lines that belong together.
    You may have to add a Select Attributes operator, or some Rename and Set Role operators, after the Windowing operator, but that should be pretty straightforward.

    Does that operator do what you need?

    Best, Marius
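
For readers who want to see what "joining each two adjacent lines" produces, here is a minimal sketch in Python with pandas. It is not the RapidMiner Windowing operator, only an illustration of the same idea: a window of 2 rows that advances 2 rows at a time, so each pair becomes one wide row. The function name `join_adjacent_rows`, the suffixed column names such as `Length-0` and `Length-1`, and the shortening of the Cluster text to its leading number are assumptions made for this example.

```python
import pandas as pd

# Sample rows from the post above; the Cluster text is shortened to its
# leading number (an assumption made to keep the example compact).
rows = pd.DataFrame({
    "Type": ["Reston", "Reston", "Zaire", "Zaire", "Sudan", "Sudan"],
    "Cluster": [5, 5, 8, 8, 9, 9],
    "Length": [739, 329, 251, 2212, 738, 329],
    "Weight": [83.452, 36.409, 28.235, 252.788, 81.804, 36.116],
    "Isoelectric point": [5.18, 7.56, 9.88, 8.73, 5.0, 7.67],
    "Aliphatic index": [77.93, 84.498, 104.94, 90.018, 80.745, 85.441],
})


def join_adjacent_rows(df, window_size=2, id_cols=("Type", "Cluster")):
    """Flatten each block of `window_size` consecutive rows into one wide row.

    Hypothetical helper: it mimics a window of `window_size` rows with a
    step of the same size, so the blocks do not overlap.
    """
    if len(df) % window_size != 0:
        raise ValueError("row count must be a multiple of window_size")
    feature_cols = [c for c in df.columns if c not in id_cols]
    wide_rows = []
    for start in range(0, len(df), window_size):
        block = df.iloc[start:start + window_size]
        # Identifier columns are taken from the first row of the block.
        wide = {c: block.iloc[0][c] for c in id_cols}
        # Every feature is repeated once per position inside the window.
        for pos in range(window_size):
            for c in feature_cols:
                wide[f"{c}-{pos}"] = block.iloc[pos][c]
        wide_rows.append(wide)
    return pd.DataFrame(wide_rows)


joined = join_adjacent_rows(rows)
print(joined)
```

Each organism now occupies a single row that carries the attributes of both of its proteins, which is the layout a learner needs in order to treat the pair as one example. The sketch relies on the pairs being adjacent and on every group having exactly two rows.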
  • blueearth Member Posts: 42 Contributor II
    What exactly does the Windowing operator do? I mean, what happens to the attributes of two proteins in the same cluster? Should I check the single attribute option?
    My second question: what if the number of rows per cluster is not always 2?
    Can't this be done by using the Set Role operator and selecting the batch or cluster role for the Cluster column?