Random sampling of a large corpus

frankc · September 2014

How can you pick a random sample from a large corpus of files to perform pre-processing and text mining with the text mining extension? Is there an operator that does that?

Frank

homburg · September 2014

Hi frankc,

just a quick question: do you want to read only a random set of files, or read all files and then draw a random set of documents from them?

Cheers,
Helge
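Helge's first option, reading only a random set of files, can be sketched outside RapidMiner in plain Python. This is not how RapidMiner does it. It assumes the corpus is a single directory of `.txt` files, and the `sample_files` and `read_sample` helpers and their parameters are hypothetical, for illustration only. Only the directory listing is scanned and only the sampled files are opened, so the reading cost grows with the sample size rather than with the corpus size.

```python
import random
from pathlib import Path


def sample_files(corpus_dir, k, seed=None, pattern="*.txt"):
    """Return k randomly chosen file paths from corpus_dir (hypothetical helper).

    Only the directory listing is read here; file contents are not touched,
    so this stays cheap even for a very large corpus.
    """
    # Sort so that, for a fixed seed, the result does not depend on the
    # order in which the file system happens to list the files.
    paths = sorted(Path(corpus_dir).glob(pattern))
    rng = random.Random(seed)  # dedicated generator, independent of global state
    # random.sample draws without replacement; clamp k to the corpus size.
    return rng.sample(paths, min(k, len(paths)))


def read_sample(corpus_dir, k, seed=None):
    """Read only the sampled files and return their text (hypothetical helper)."""
    return [p.read_text(encoding="utf-8") for p in sample_files(corpus_dir, k, seed)]
```

Passing the same `seed` again returns the same files, which helps when the pre-processing has to be rerun on an identical sample.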

bkriever · October 2014

You should be able to use the Sample operator with "use local random seed" enabled to draw a random sample of documents, just as you would with a non-text data set.
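The idea behind sampling with a local random seed can be sketched in Python for documents that are already loaded into memory. This is not RapidMiner's implementation. The `sample_documents` helper and its `ratio` and `local_seed` parameters are hypothetical names chosen to mirror a relative sample drawn with a seed that belongs to this one step. Because the generator is private to the step, the same seed gives the same subset no matter what other random calls happen elsewhere in the process.

```python
import random


def sample_documents(documents, ratio, local_seed):
    """Draw a reproducible random subset of documents (hypothetical helper).

    ratio is the fraction of documents to keep (0.0 to 1.0); local_seed seeds
    a generator private to this step, so global random state has no effect.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    k = round(len(documents) * ratio)
    rng = random.Random(local_seed)
    # Sample indices rather than documents and sort them so the subset keeps
    # the original document order, which makes runs easier to compare.
    indices = sorted(rng.sample(range(len(documents)), k))
    return [documents[i] for i in indices]
```

Keeping the original order is a design choice for readability. Drop the `sorted` call if the downstream steps should see the documents shuffled.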

Altair RapidMiner Community