"Extract information from weblog (how to handle 31 text files for 3GB)"
makchishing
Member Posts: 6 Contributor II
Hi all,
I want to extract the IP and user-agent information from 31 log files that are zipped (around 320 MB compressed). My steps so far:
1) Unzip to a 3 GB text file (it seems RapidMiner cannot read the zipped files directly?)
2) Use the Read Server Log operator (it works fine for small files only; the process seems to read the whole file into RAM, so a 3 GB text file cannot be handled well)
3) Process: Store to repository
4) Process: Aggregate
5) Process: Export to CSV
Can anyone give me some tips, please? ;D
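If RapidMiner keeps running out of memory, one workaround is to do the extraction step outside RapidMiner in a streaming fashion, so only one line is in RAM at a time. Below is a minimal Python sketch, assuming Apache combined log format and gzip-compressed files (the logs/*.gz path and the ip_agent.csv output name are placeholders); if your 31 files are inside a .zip archive instead, Python's zipfile module can stream them the same way.

```python
# Minimal sketch: stream IP and user-agent out of gzipped logs.
# Assumes Apache combined log format; paths are placeholders.
import csv
import glob
import gzip
import re

# IP is the first field; the user agent is the last double-quoted field.
LINE_RE = re.compile(r'^(\S+) .* "([^"]*)"$')

with open("ip_agent.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["ip", "agent"])
    for path in glob.glob("logs/*.gz"):           # the 31 zipped log files
        with gzip.open(path, "rt", errors="replace") as f:
            for line in f:                        # one line at a time: bounded RAM
                m = LINE_RE.match(line.rstrip())
                if m:
                    writer.writerow([m.group(1), m.group(2)])
```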
Answers
Save the extracted data into the repository in chunks (part1, part2, ... partn), and when you are done, combine the repository entries into a final entry containing all of your extracted data.
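In plain Python, the same part-by-part idea looks roughly like the sketch below. It assumes the extraction step already produced one CSV per source file (the parts/part*.csv path and the ip/agent column names are assumptions), and it aggregates incrementally with a Counter, so only the distinct (ip, agent) pairs, not the full 3 GB, ever sit in memory.

```python
# Rough analogue of the part1..partn idea: aggregate part files one at a time.
import csv
import glob
from collections import Counter

counts = Counter()
for part in glob.glob("parts/part*.csv"):         # one CSV per source log file
    with open(part, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            counts[(row["ip"], row["agent"])] += 1   # aggregate incrementally

with open("aggregated.csv", "w", newline="", encoding="utf-8") as out:
    writer = csv.writer(out)
    writer.writerow(["ip", "agent", "hits"])
    for (ip, agent), hits in counts.most_common():
        writer.writerow([ip, agent, hits])
```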
Actually, I want to do something very simple: read through the 3 GB of text, then extract and aggregate some substrings in it.
If RapidMiner's processes must run entirely in RAM (i.e., read all the text into memory first),
I would rather load the 3 GB of text into a database first and do the aggregation myself.
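For the database route, a lightweight option is SQLite, since it needs no server and the GROUP BY runs on disk rather than in Python's memory. A minimal sketch, assuming the same part CSVs as above (the table layout and file names are placeholders):

```python
# Minimal sketch of the database route with SQLite (no server required).
import csv
import glob
import sqlite3

con = sqlite3.connect("weblog.db")
con.execute("CREATE TABLE IF NOT EXISTS hits (ip TEXT, agent TEXT)")

for part in glob.glob("parts/part*.csv"):
    with open(part, newline="", encoding="utf-8") as f:
        rows = ((r["ip"], r["agent"]) for r in csv.DictReader(f))
        con.executemany("INSERT INTO hits VALUES (?, ?)", rows)
con.commit()

# The aggregation happens inside SQLite, not in Python's memory.
query = ("SELECT ip, agent, COUNT(*) FROM hits "
         "GROUP BY ip, agent ORDER BY COUNT(*) DESC")
for ip, agent, n in con.execute(query):
    print(ip, agent, n)
con.close()
```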