"Extract information from weblog (how to handle 31 text files for 3GB)"
makchishing
Member Posts: 6 Contributor II
Hi all,
I am trying to extract the IP and agent information from 31 zipped log files (around 320 MB in total).
My steps are as follows:
1) Unzip the files to about 3 GB of text (it seems zipped files cannot be read by RapidMiner?)
2) Use the read server log process (it works fine for a few small files only; it seems the process reads all files into RAM, and 3 GB of text cannot be handled well)
3) Process: store to repository
4) Process: aggregate
5) Process: export to CSV
Can anyone give me some tips, please? ;D
Answers
Save the extracted data into the repository in parts (part1, part2, ... partn), and when you are done, combine the repository entries to make the final entry containing your extracted data.
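For comparison, the same part-by-part idea can be sketched outside RapidMiner in plain Python. This is only an illustration under several assumptions that are not stated in the thread: the files are gzip-compressed (for .zip archives the standard `zipfile` module would be needed instead), the lines follow the common Apache/NGINX "combined" log format, and the function and column names are made up for the example. Each file is streamed line by line, so only the running counts are held in RAM, never the full 3 GB of text.

```python
import csv
import gzip
import re
from collections import Counter

# Assumption: "combined" log format, e.g.
# 1.2.3.4 - - [10/Oct/2024:13:55:36 +0000] "GET / HTTP/1.1" 200 123 "-" "Mozilla/5.0"
# Adjust the pattern if your logs look different.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def count_file(path):
    """Stream one gzipped log file and count (ip, agent) pairs."""
    counts = Counter()
    # gzip.open in text mode decompresses on the fly, one line at a time.
    with gzip.open(path, "rt", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = LOG_LINE.match(line)
            if match:  # lines that do not fit the pattern are skipped
                counts[(match["ip"], match["agent"])] += 1
    return counts


def aggregate(paths, out_csv):
    """Combine the per-file counts (the 'parts') and export them to CSV."""
    total = Counter()
    for path in paths:
        total.update(count_file(path))  # merge one part into the running total
    with open(out_csv, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["ip", "agent", "hits"])
        for (ip, agent), hits in total.most_common():
            writer.writerow([ip, agent, hits])
    return total
```

A call such as `aggregate(sorted(glob.glob("logs/*.gz")), "ip_agent_counts.csv")` (with `import glob`, and a hypothetical `logs/` folder) would process all 31 files in turn. Memory use then depends on the number of distinct IP/agent pairs rather than on the size of the logs.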
Actually, I want to do something very simple: read through the 3 GB of text, then extract and aggregate some substrings from it.
If RapidMiner has to run each process entirely in RAM (I mean, it must read all the text into RAM first),
I would rather load the 3 GB of text into a database first and do the aggregation myself.
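The database route mentioned here could look roughly like the following sketch, which uses SQLite from the Python standard library. The thread does not name a database or a log format, so both are assumptions: the table name `hits`, the function names, and the "combined" log pattern are illustrative only. Rows are fed to the database from a generator, so the text file is never held in RAM as a whole.

```python
import re
import sqlite3

# Assumption: Apache/NGINX "combined" log format; adjust for other formats.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def parsed_rows(log_path):
    """Yield (ip, agent) tuples one line at a time from an unzipped log file."""
    with open(log_path, "r", encoding="utf-8", errors="replace") as handle:
        for line in handle:
            match = LOG_LINE.match(line)
            if match:
                yield match["ip"], match["agent"]


def load_and_aggregate(log_path, db_path=":memory:"):
    """Load the parsed rows into SQLite, then let SQL do the aggregation."""
    conn = sqlite3.connect(db_path)  # pass a file name to keep the data on disk
    conn.execute("CREATE TABLE IF NOT EXISTS hits (ip TEXT, agent TEXT)")
    with conn:  # commits the inserts as a single transaction
        conn.executemany("INSERT INTO hits VALUES (?, ?)", parsed_rows(log_path))
    rows = conn.execute(
        "SELECT ip, agent, COUNT(*) AS n FROM hits "
        "GROUP BY ip, agent ORDER BY n DESC, ip, agent"
    ).fetchall()
    conn.close()
    return rows
```

For a 3 GB input, passing a real file name as `db_path` instead of the default `":memory:"` keeps the table on disk, which is the point of moving the work to a database.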