I'm back with another knowledge base article. This time, we'd like to present one of the many useful features of Old World Computing's Jackhammer Extension: indexing, which makes handling collections in RapidMiner more convenient. First, we will discuss
Indexed Collections; Indexed Models will follow in another article.
The idea behind Indexed Collections is simple, yet powerful: building on the existing object collections functionality of RapidMiner, the Jackhammer Extension enables you to add group information
to the objects, thereby indexing them. This gives your results a clear structure and makes information readily accessible, without a cumbersome search through folders until you find what you are looking for.
Comparing the left and right results windows, you can see that the results are now available in a much more ordered and structured fashion: instead of having to click your way through many folders, all with the same name, you can
access the correct folder in just one step. This not only speeds up finding information, it also makes further processing and modeling steps more efficient and more precise.
We will illustrate this feature with an example: The king of Predictia is interested in knowing beforehand how much rain will fall in the upcoming season, for crop planning purposes. Charged
with the task of making predictions, you have obtained centuries' worth of precipitation reports from every corner of the kingdom (Predictia started recording the weather much earlier than other countries) and
entered them into RapidMiner in order to build predictive models on them later. Right now, however, you are drowning in a flood of values with no indication of where or when any of them was
measured, and it is dawning on you that this data, plentiful as it may be, will not be very useful unless you add this information.
Without Indexed Collections, you can go two ways: either you add attributes like Location and Month to the data entries, or you use the normal (i.e. not indexed) Collections to group the
information. Both options, however, have disadvantages. When you add the supplementary information as attributes, you always have to run through the entire ExampleSet to find information regarding a certain place or time.
Furthermore, in later analytical steps, you will have to apply extensive filters to make sense of the data. With collections, you can group your entries, but you have no way of knowing which folder is which: “Folder
1, Folder 2, Folder 3” does not reveal much about the content. You will have to click through all of them and keep checking back with your original data if you want to use them in your later analysis.
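To make the two drawbacks concrete, here is a minimal sketch in plain Python. It is only an analogy and not RapidMiner or Jackhammer code: the place names, the rainfall values, and the helper `rain_for` are all invented for illustration.

```python
# Plain-Python analogy only; this is NOT RapidMiner or Jackhammer code.
# All names and values are hypothetical.

# Option 1: location and month stored as attributes on every entry.
rows = [
    {"location": "Northshire", "month": "May", "rain_mm": 61.0},
    {"location": "Northshire", "month": "June", "rain_mm": 48.5},
    {"location": "Southmoor", "month": "May", "rain_mm": 12.0},
]


def rain_for(rows, location, month):
    """Scan every row and keep only the ones matching place and time."""
    return [
        r["rain_mm"]
        for r in rows
        if r["location"] == location and r["month"] == month
    ]


# Option 2: a plain collection groups the values, but only by position.
collection = [
    [61.0, 48.5],  # "Folder 1": the list itself does not say which place this is
    [12.0],        # "Folder 2"
]

northshire_may = rain_for(rows, "Northshire", "May")  # full scan plus filter
first_group = collection[0]  # fast, but you must remember what index 0 means
```

The first option touches every row for each question; the second is direct, but the position carries no meaning on its own.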
Indexed Collections surpass both of these approaches. Your data is neatly organized into folders for easy access, providing an overview and structure. Because of their clear designations,
you can efficiently find and use relevant information for your analysis. What’s more, you now have direct access to the data for your later analyses without having to filter out unwanted items or loop over the whole
collection: with the operator Select by Key, you can easily and conveniently access exactly what you need.
The operators for working with Indexed Collections include:
Combine Indexed Objects: copies the IOObjects of each input with their respective group information into a single indexedIOObjectsContainer.
Extend Indexed Objects: extends the provided indexedIOObjectsContainer by the provided IOObject and group information.
Select by Key: retrieves the IOObject that was assigned to the provided group information.
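The behavior of these three operators can be pictured with a small Python sketch. This is an analogy under stated assumptions, not the extension's implementation: a dictionary stands in for the indexedIOObjectsContainer, a (location, month) tuple stands in for the group information, and the function names are hypothetical mirrors of the operator names.

```python
# Plain-Python analogy only; this is NOT the Jackhammer API.
# A dict plays the role of the indexed container, a tuple plays the
# role of the group information. All names and values are hypothetical.
import copy


def combine_indexed_objects(*inputs):
    """Copy each (group_info, obj) pair into one indexed container."""
    container = {}
    for group_info, obj in inputs:
        container[group_info] = copy.deepcopy(obj)
    return container


def extend_indexed_objects(container, group_info, obj):
    """Return a container extended by one more object and its group info."""
    extended = dict(container)
    extended[group_info] = obj
    return extended


def select_by_key(container, group_info):
    """Retrieve the object that was assigned to the given group info."""
    return container[group_info]


container = combine_indexed_objects(
    (("Northshire", "May"), [61.0]),
    (("Southmoor", "May"), [12.0]),
)
container = extend_indexed_objects(container, ("Northshire", "June"), [48.5])
june_rain = select_by_key(container, ("Northshire", "June"))
```

The point of the analogy is that the lookup goes straight to the wanted entry by its group information, with no filtering and no loop over the whole collection.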