
"Filtering by term frequency"

samtpfote (Member, Posts: 1, Learner III)
edited June 2019 in Help
Hello everybody,

I would like to get all terms in an HTML document collection that appear in more than 99% of the documents.

But how can I:
  -  get the number of documents in my collection, and
  -  calculate the value (number of documents containing the term) / (total number of documents)?
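Written out, the value in question is a term's document frequency divided by the size of the collection. With D the set of documents and t a term:

$$\mathrm{ratio}(t) = \frac{\lvert \{ d \in D : t \in d \} \rvert}{\lvert D \rvert}, \qquad \text{keep } t \text{ if } \mathrm{ratio}(t) > 0.99$$

For example, in a collection of 1,000 documents a term must occur in at least 991 of them to pass the filter.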

It would be really great if someone could help me!

Answers

  • MariusHelf (RapidMiner Certified Expert, Member, Posts: 1,869, Unicorn)
    Hello samtpfote,

    you can use the Wordlist to Data operator to convert the word list output of Process Documents into a dataset. Then you can use Generate Attributes and Filter Examples to compute and extract all the information you need.

    The total number of documents corresponds to the number of examples in the exa (example set) output of Process Documents. You can store that number in a macro with the Extract Macro operator.

    If you have further questions, please come back!

    All the best,
    Marius
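The workflow described above (count the documents, compute each term's share of documents, keep the terms above the threshold) can be sketched in plain Python outside RapidMiner. This is a minimal illustration, not RapidMiner's implementation: it assumes the documents are already tokenized into lists of terms (the job Process Documents does), and the function name `frequent_terms` and its parameters are hypothetical.

```python
from collections import Counter


def frequent_terms(documents, threshold=0.99):
    """Return {term: ratio} for terms whose share of documents exceeds `threshold`.

    `documents` is assumed to be a list of token lists (one list per document).
    ratio = (number of documents containing the term) / (total number of documents)
    """
    # Total number of documents: the analogue of counting the examples in the
    # exa output of Process Documents and storing the count in a macro.
    n_docs = len(documents)
    if n_docs == 0:
        return {}

    # Document frequency: each term is counted at most once per document,
    # which is why the tokens are turned into a set first.
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))

    # Generate Attributes analogue: derive the ratio for every term.
    ratios = {term: count / n_docs for term, count in doc_freq.items()}

    # Filter Examples analogue: keep only terms strictly above the threshold.
    return {term: r for term, r in ratios.items() if r > threshold}


if __name__ == "__main__":
    docs = [["html", "body", "a"], ["html", "body"], ["html", "p"]]
    print(frequent_terms(docs))                 # only "html" is in every document
    print(frequent_terms(docs, threshold=0.5))  # "html" and "body"
```

Note that with 100 documents or fewer, "more than 99%" can only be satisfied by terms that occur in every document, so the threshold only starts to tolerate missing documents in larger collections.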