I have a data set that contains about 4000 attributes. I'm attempting to cluster the attributes using K-NN and then connect that to a decision tree to see whether it can classify the labels derived from the clustering. This is all embedded in an Optimize Parameters operator, which changes the value of k on each run (from 2 to 10 in three steps), the aim being to get the accuracy of the decision tree as high as possible. I have installed the Parallel Processing Extension on my computer and was wondering: is there anything special I have to do to make it process the data across the cores of the PC? I have not been able to get any results back from the experiment because it takes up a huge amount of resources (almost 20 GB of memory).
You need to use the Loop Parameters (Parallel) operator; however, your memory consumption will multiply with the number of threads you use. In any case, decision trees are a really bad choice for very wide data, i.e. data with many attributes. They are also unstable and can change completely when the underlying data changes only marginally. Instead, normalize your data and train e.g. a linear SVM on it (provided that you have numerical data with two classes). By inspecting the weights that the SVM assigns to the attributes, you get a much better idea of which attributes have which impact.
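To illustrate the normalize-then-inspect-weights idea outside of RapidMiner, here is a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the toy data, the attribute names `att_0` to `att_4`, the helper names, and the simple stochastic subgradient trainer, which stands in for whatever linear SVM implementation you actually use.

```python
import random
import statistics


def standardize(rows):
    """Z-score each column so every attribute has mean 0 and std 1."""
    columns = list(zip(*rows))
    means = [statistics.fmean(col) for col in columns]
    # Guard against constant columns (std 0) to avoid dividing by zero.
    stds = [statistics.pstdev(col) or 1.0 for col in columns]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def train_linear_svm(X, y, lam=0.01, lr=0.01, epochs=50, seed=0):
    """Fit w, b by stochastic subgradient descent on the L2-regularized
    hinge loss. Labels must be -1 or +1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    b = 0.0
    order = list(range(len(X)))
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
            # Regularization shrinks every weight a little on each step.
            w = [wj * (1.0 - lr * lam) for wj in w]
            if y[i] * score < 1.0:
                # Margin violated: move the hyperplane toward this example.
                w = [wj + lr * y[i] * xj for wj, xj in zip(w, X[i])]
                b += lr * y[i]
    return w, b


def rank_attributes(names, w):
    """Sort attribute names by absolute weight, largest first."""
    return sorted(zip(names, w), key=lambda pair: abs(pair[1]), reverse=True)


def make_toy_data(n_rows=200, n_attrs=5, seed=1):
    """Hypothetical data: only att_0 carries the class signal, and it lives
    on a much larger scale than the noise attributes."""
    rng = random.Random(seed)
    X, y = [], []
    for _ in range(n_rows):
        label = rng.choice([-1, 1])
        signal = label * (1.0 + abs(rng.gauss(0.0, 1.0))) * 50.0
        X.append([signal] + [rng.gauss(0.0, 1.0) for _ in range(n_attrs - 1)])
        y.append(label)
    return X, y


names = [f"att_{j}" for j in range(5)]
X_raw, y = make_toy_data()
X = standardize(X_raw)  # without this, weights are not comparable across attributes
w, b = train_linear_svm(X, y)
for name, weight in rank_attributes(names, w):
    print(f"{name}: {weight:+.3f}")
```

Attributes with large absolute weights push the decision the most, and the sign tells you toward which class. The weights are only comparable because the data was normalized first; on raw data, an attribute measured on a large scale would get a misleadingly small weight.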