"RM text mining - NLP?"

dan_agape Member Posts: 106 Maven
edited June 2019 in Help

Just a general question: does RM also use Natural Language Processing based techniques for text mining (in addition to the statistical ones)? If not, are they on the RM team's radar for development in the near future? Thanks.

Best regards
Dan

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Dan,
    in fact, we started making the needed architectural changes to the Text Processing Extension two weeks ago, and once we have these capabilities we will add more and more NLP functionality to the extension.
    For example, we are planning to allow access to UIMA operations, as we already do for Weka.

    Greetings,
      Sebastian
  • dan_agape Member Posts: 106 Maven
    Sounds exciting.

    Cheers,
    Dan
  • text_miner Member Posts: 11 Contributor II
    Sebastian,

    I'm also looking forward to the updated text processing plugin.

    Do you have any other information about what will be new or changed in the updated plugin? Any new operators outside of the NLP operators? Also, how extensive are the architectural changes going to be? I've written some operators that make use of the text processing plugin. Is it likely that I will need to update my code when the updated plugin is released? Finally, is there a target date for the release of the updated plugin?

    Thanks!
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    we are going to clarify the usage of token sequences and texts a little bit. The document should support multiple levels of tokenization so that we do not lose information about which sentence a token belongs to.
    Further in the future, there will be support for hierarchical data in attributes to support parse trees. This will come together with some tree kernels for SVM. SVMstruct might be added as well.

    If you are missing operators and these operators might be useful for someone else, please tell us your ideas or even share your code. We could then make the functionality available for everybody.

    Greetings,
      Sebastian
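
For readers wondering what "multiple levels of tokenization" might look like in practice, here is a minimal Python sketch. It is not the Text Processing Extension's actual design or API; the `Token` class, the `tokenize_two_levels` function, and the naive regex-based sentence splitting are all assumptions made for illustration. The idea is only that each word token keeps a reference to the sentence it came from.

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Token:
    """Hypothetical token record (not a RapidMiner class)."""
    text: str
    sentence_index: int  # which sentence the token came from
    position: int        # position of the token within that sentence


def tokenize_two_levels(document: str) -> List[Token]:
    # Level 1: naive sentence split after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", document.strip()) if s]
    tokens = []
    for s_idx, sentence in enumerate(sentences):
        # Level 2: word tokens inside each sentence, tagged with the sentence index.
        for pos, word in enumerate(re.findall(r"\w+", sentence)):
            tokens.append(Token(word, s_idx, pos))
    return tokens


tokens = tokenize_two_levels("RM mines text. NLP adds structure!")
for token in tokens:
    print(token.sentence_index, token.position, token.text)
```

Because every token carries its sentence index, a later step can still group tokens by sentence even after the text has been flattened into a token sequence.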