Process Documents from Files: Include all subdirectories
Hi all,
In the parameters box of "Process Documents from Files" I can set the "text directories".
I have a lot of HTML files on my local hard drive, in a folder called "webpage" with many subfolders (~100), too many to add them all separately. I am missing a checkbox that would include all the subfolders. Is there a way I can achieve this? I have also created a CSV file with the columns "class name" and "directory"; it would be great if I could import it.
Cheers, Roger
Best Answer
Telcontar120 (RapidMiner Certified Analyst, RapidMiner Certified Expert)
If you don't want to enter the directories manually in that operator (which you can do, and it sounds like you understand that), another option is to use "Loop Files", which does have a "recursive directory" parameter option, and do your document processing inside the loop.
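For readers who want to see what a recursive file loop amounts to, here is a minimal Python sketch of the same idea outside RapidMiner. It is not the "Loop Files" operator itself. The function names and the choice to use the first-level subfolder as the class label are assumptions made for illustration only.

```python
from pathlib import Path


def find_html_files(root):
    """Return every .html/.htm file under root, descending into all subfolders."""
    root = Path(root)
    return sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.suffix.lower() in {".html", ".htm"}
    )


def label_from_subfolder(path, root):
    """Hypothetical labelling rule: use the first-level subfolder name as the class.

    Files that sit directly in root get an empty label.
    """
    rel = Path(path).relative_to(root)
    return rel.parts[0] if len(rel.parts) > 1 else ""
```

Calling `find_html_files("webpage")` would walk the whole "webpage" tree in one pass, so none of the ~100 subfolders needs to be listed by hand.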
Answers
Thank you Brian. Yes, I know I can enter them manually, but I have like 100 subfolders and it would take me too long.
I'll give the "Loop Files" a try!
Here is a hastily cobbled-together example showing how you might use Loop Files with your CSV of directories and class labels.
This might be useful if, for some reason, your directories are in different locations rather than all nested nicely.
It uses macros to populate the required parameters.
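The CSV-driven approach can also be sketched in plain Python, for anyone who wants to check the logic outside RapidMiner. This is only an illustration, not the process described above: the column headers "class name" and "directory" come from the original question, while the function names, the comma delimiter, the UTF-8 encoding, and the "*.html" pattern are assumptions.

```python
import csv
from pathlib import Path


def read_directory_table(csv_path):
    """Read (class name, directory) pairs from a CSV with those two headers."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return [(row["class name"], row["directory"]) for row in csv.DictReader(f)]


def collect_labelled_files(csv_path, pattern="*.html"):
    """Yield (class name, file path) for every matching file under each listed directory.

    Each directory is searched recursively, so the directories may live in
    unrelated locations and may themselves contain subfolders.
    """
    for class_name, directory in read_directory_table(csv_path):
        for path in sorted(Path(directory).rglob(pattern)):
            if path.is_file():
                yield class_name, path
```

Each yielded pair plays the role the macros play in the process: one value for the class label and one for the file to process.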
Thank you JEdward! Will give it a try.
Brian, it worked nicely. Great!
Here's my solution (text and terms):
Only terms: