Applying an operation to a large example set
Hi,
I have an example set with 10,000 examples and 3,800 attributes. These are document file names and the TF-IDF values for 3,800 terms in those documents. I want to raise each TF-IDF value to the power of 0.75. Is there a simple, fast way to do this?
What I have tried is looping through each of the attributes and generating a new attribute that is the TF-IDF value raised to the power of 0.75, then looping through the resulting collection and using the Recall, Join, and Remember operators to join each example set in the collection to the previous ones as I iterate through the loop. The problem is that this slows down and eventually stalls or crashes as the iterations increase and the joined example set gets larger and larger. So I am wondering if there is a more efficient way to do the (seemingly) simple thing of applying one operation like this to every value in the example set.
I should also mention that I looked at the Generate Function Set operator. This looks like what I want, except that the specific operation I want to do is not included as one of the choices in that operator.
Thanks in advance for your help.
Answers
Groovy is the answer. Use the Script operator with this code. I ran an experiment with 10,000 examples by 3,800 attributes and it took 2 minutes on my laptop. Obviously, others' results may vary.
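The script is not reproduced in this thread. As a rough illustration of the idea behind the answer (one in-place pass over every value, instead of generating attributes and re-joining example sets), here is a minimal Python sketch. It is not the Groovy script referred to above and does not use the RapidMiner API: the list of dicts is only a stand-in for the example set, and `raise_numeric_values`, `file_name`, and the `term_*` names are hypothetical.

```python
def raise_numeric_values(rows, exponent=0.75, skip=("file_name",)):
    """Raise every numeric value to `exponent`, in place, in a single pass.

    `rows` is a list of dicts standing in for an example set (an assumption
    made for illustration only). Columns named in `skip` and non-numeric
    values are left untouched. Values are assumed to be non-negative, as
    TF-IDF weights are.
    """
    for row in rows:
        for name, value in row.items():
            if name in skip:
                continue
            # bool is a subclass of int in Python, so exclude it explicitly.
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                # Overwrite the existing value; no new column, no join.
                row[name] = value ** exponent
    return rows


# Tiny hypothetical sample: two documents, two term columns.
docs = [
    {"file_name": "a.txt", "term_1": 0.0, "term_2": 16.0},
    {"file_name": "b.txt", "term_1": 1.0, "term_2": 81.0},
]
raise_numeric_values(docs)
print(docs)
```

Because each value is overwritten where it sits, the work is a single pass over rows times attributes and the data does not grow, which is the likely reason this style of approach avoids the slowdown seen with repeated joins.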
Regards,
Andrew
Thanks! I think that will work for me.
mikeb