The Altair Community is migrating to a new platform to provide a better experience for you. In preparation for the migration, the Altair Community is on read-only mode from October 28 - November 6, 2024. Technical support via cases will continue to work as is. For any urgent requests from Students/Faculty members, please submit the form linked here
Answers
as usual you cannot say how things will work before you tested it. One mostly applies the X-Validation to get a performance estimation if it's a classification or regression task. Clusterings might be evaluated using cluster measures. You might optimize the number of dimensions by iterating over this parameter and test every combination. There are operators for this, all starting with Optimize Parameters.
But anyway, I doubt that the SVD will work very well on Text Datasets, simply because it might take much to long time to compute the Singular Value Decomposition of such a huge matrix, as they frequently occur in text mining.
Greetings,
Sebastian
There seems to be a lot of literature about the use of SVD with text, but indeed the time might be prohibitive. Is there a way in RM to get the singular values themselves (I have read one can plot their squares and see where they level off to determine the best # of dimensions)?
Thanks!
I'm not sure about this, but aren't they displayed in the model of the Singular Value Decomposition?
Greetings,
Sebastian
as I saw now, all this information is discarded in RapidMiner and hence currently cannot be shown in the visualization of the preprocessing model. If you take a look at the result of the PCA, it's in deed possible to use these values for displaying. I could add that relative easily I guess, but this and next week, there won't be the time for it. I'm very busy with customer projects, that bye the way are about text mining. So I'm curious about the runtime of SVD with many features and the amount of memory it needs. As far as I can see, it needs the complete matrix, so it will crash with my around 40.000 word attributes. Did you made any experience with that? What about the classification performance, is it worth implementing a special SVD for sparse matrices?
Greetings,
Sebastian
regarding your question: "is it worth implementing a special SVD for sparse matrices?"
I think absolutely YES. it worth for sure. A couple of days ago I tested the SVD operator on my text dataset with 23000 features on a relatively high performance machine. After 10 hours the algorithm was finished!!!
As far as I know, LSA algorithm has tackled with this problem. It just use an approximation of term-document matrix (which is so smaller than the original matrix). So, I kindly suggest that RM team try to embed an LSA operator in RM.
cheers.