Dealing with really large text / csv files
I have to deal with a huge tab-separated file containing thousands of lines but also thousands of columns, and the latter is bringing my system to its knees. It takes ages to load even a small number of records, probably because the system wants to validate all of the content for all of these attributes.
So my idea was to load it as a text file, then chop it into rows using a regex looking for end of line, and then chop into attributes looking for the tab character, no questions asked.
Part 1 was easy, and indeed my file now loads in seconds rather than many minutes. But I'm not sure how to actually convert the rest of my data to columns using the cut command, as this would require me to name all of my columns.
Is there a better way to achieve this, perhaps using different logic or operators?
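For illustration only, the split-first idea described above can be sketched in plain Python, outside of RapidMiner. This is a rough sketch, not a RapidMiner process: the function name `split_rows` and the generated column names (`att1`, `att2`, ...) are assumptions made up for the example. It shows that the columns do not have to be named by hand, since the names can be generated from the widest row.

```python
def split_rows(text, sep="\t"):
    """Split raw text into rows and fields with no type checking at all."""
    # Break the text into lines and skip empty ones.
    lines = [line for line in text.splitlines() if line]
    # Split every line on the separator; all values stay plain strings.
    rows = [line.split(sep) for line in lines]
    # Generate placeholder column names from the widest row
    # (the "attN" naming scheme is a hypothetical choice for this sketch).
    width = max((len(row) for row in rows), default=0)
    names = [f"att{i + 1}" for i in range(width)]
    # Pad short rows with empty strings so every row has the same width.
    padded = [row + [""] * (width - len(row)) for row in rows]
    return names, padded


sample = "a\tb\tc\n1\t2\t3\n4\t5\n"
names, rows = split_rows(sample)
print(names)
print(rows)
```

Because nothing is parsed or validated, the cost is roughly one pass over the text; any type conversion can then be applied later, and only to the columns that are actually needed.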