How to structure data for cluster analysis
Hi everyone!
I am an MBA student and I would like to cluster companies based on text files taken from their websites, to see which companies are most similar. However, I don't know how to structure the data.
Would it be best to copy those texts into Excel cells (one cell per text)? Or how should I do this? I want to be able to tokenize and stem the text later on and then generate TF-IDF vectors.
I also couldn't find an instruction video that does cluster analysis with text files, only ones using Excel files with numerical and categorical variables, so if anyone knows a good tutorial, that would help too.
Thanks in advance already!
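For anyone reading along, here is a rough sketch in plain Python of the layout described above: one record per company, with the whole website text in a single field, followed by tokenizing, stemming, and TF-IDF weighting. Everything in it is an assumption made for illustration: the company names and texts are invented, `crude_stem` is a toy stand-in for a real stemmer such as Porter, and the weighting is the basic tf * log(N / df) variant rather than what any particular tool uses.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical data: one record per company, one text field per record.
# This is the same shape as an Excel sheet with an ID column and a text column.
documents = {
    "company_a": "We build cloud software for accounting teams.",
    "company_b": "Cloud accounting software built for finance teams.",
    "company_c": "Organic coffee roasted daily and shipped fresh.",
}

def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())

def crude_stem(token):
    """Toy suffix stripping; a placeholder for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def tf_idf(docs):
    """Return {doc_id: {term: weight}} with weight = tf * log(N / df)."""
    tokens = {
        doc_id: [crude_stem(t) for t in tokenize(text)]
        for doc_id, text in docs.items()
    }
    n_docs = len(docs)
    doc_freq = Counter()
    for terms in tokens.values():
        doc_freq.update(set(terms))  # count each term once per document
    vectors = {}
    for doc_id, terms in tokens.items():
        counts = Counter(terms)
        vectors[doc_id] = {
            term: (count / len(terms)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return vectors

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def most_similar_pair(vectors):
    """Return the pair of document IDs with the highest cosine similarity."""
    pairs = combinations(sorted(vectors), 2)
    return max(pairs, key=lambda p: cosine(vectors[p[0]], vectors[p[1]]))

vectors = tf_idf(documents)
print(most_similar_pair(vectors))
```

The resulting TF-IDF vectors are what a clustering algorithm such as k-means would take as input. The pairwise similarity at the end is only a quick way to check that similar companies end up close together.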