Large data sample handling issue
bingojosjtu
Member Posts: 5 Contributor II
Hi
I recently encountered a problem when using the outlier detection function (LOF, to be specific).
Condition:
My data set has about 178,000 examples in total and around 10-12 attributes.
My computer has 8 GB of RAM and an i7-2600 CPU, with plenty of hard disk space.
Scenario:
I let the process run overnight, but the next morning the program reported that it could not handle the process because the computer's memory is too small for the task.
It stopped at the outlier detection step, which I know is very slow, but I did not expect it to fail outright because of memory size. (A rough estimate of the memory footprint is sketched below.)
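For what it's worth, here is a back-of-the-envelope estimate of why 8 GB falls short, assuming the LOF implementation materializes a full pairwise distance matrix (an assumption on my part; implementations differ in how much they keep in memory):

```python
# Back-of-the-envelope memory estimate for a naive LOF run that builds
# the full pairwise distance matrix (an assumption: not every
# implementation does this, but it would explain the failure mode).
n = 178_000          # number of examples
bytes_per_value = 8  # one double-precision distance

matrix_bytes = n * n * bytes_per_value
print(f"Full distance matrix: {matrix_bytes / 2**30:.0f} GiB")
# -> about 236 GiB, far beyond 8 GB of RAM, and beyond 32 GB as well
```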
Questions:
Q1: For a given sample size and number of attributes, how am I supposed to know the memory requirement (or upper limit) of a particular procedure beforehand?
Q2: Is there any way to solve this issue other than shrinking my data sample at the current stage? (One possible alternative is sketched below.)
Q3: If I increase my RAM to 16 or 32 GB, will that help?
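Regarding Q2: this is only a sketch, not something I have run on the full data, but libraries such as scikit-learn avoid the full distance matrix by using tree-based neighbor search, so memory scales with n * k rather than n * n. The data below is a random stand-in for my real set:

```python
import numpy as np
from sklearn.neighbors import LocalOutlierFactor

# Random stand-in for the real data: 178,000 examples, 12 attributes.
X = np.random.rand(178_000, 12)

# LocalOutlierFactor keeps only the k nearest neighbors per point
# (found with a tree-based search), so memory stays near n * k
# values instead of an n * n full distance matrix.
lof = LocalOutlierFactor(n_neighbors=20)
labels = lof.fit_predict(X)             # -1 = outlier, 1 = inlier
scores = -lof.negative_outlier_factor_  # higher = more outlier-like
print((labels == -1).sum(), "examples flagged as outliers")
```

As for Q3, whether more RAM helps seems to depend on whether the operator's memory use is quadratic; if it is, then per the estimate above even 32 GB will not hold a full distance matrix for 178,000 examples.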
BTW, I have submitted the job to the cloud server (the 32 GB version); I hope that with its computing resources this issue can be solved.
Thank you!
RMer