How do I make an existing process (that referred to an Excel import) now recognise the Read Excel pathway?

SkyTrader Member Posts: 88 Contributor II
Hi there, so I started creating processes while watching a video called "Elaborate Your Time Series Analysis."

I now realise that there's a difference between importing an Excel file into RM (which I originally opted for) and reading an Excel file via an operator like Read Excel, with the file path pointing to the Excel file on your computer.

The problem is that all my processes and operators so far refer to an Excel file I physically imported using the import wizard. I have since deleted that file from the repository, because my Excel files will be regularly updated and I realised it would be better to use "Read Excel" and "Store" and refer to a file on my MacBook that I'll keep updating.

So my question is: can I make the operators I've already built point to the file on my MacBook (instead of the physically imported file in the repository), and if so, how?

I have a "Retrieve" operator (Dow Jones daily info from an Excel data file) in one of the processes, feeding the next operator, which does a Fourier Transform. Obviously "Retrieve" can no longer access the original imported file (I deleted it), but the parameters panel on the top right only lets this operator look in the repository. Do I have to start all over creating these processes and operators?

Also, if I use the "Store" operator connected to the "Read Excel" operator, won't "Store" stop RM from reading the very latest version of the Excel file on my MacBook (assuming I can repoint these operators to that file)?

Cheers for any help!

Answers

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi!

    The best approach is to create a process with the Read Excel operator set up with the wizard. This will always read the current data from the Excel table and return it as a table on the process output.

    Then you can put this into your other processes using Execute Process (or just dragging and dropping from the repository) and replace your existing Retrieve with the static data. Then you'll always work with the current data.

    Regards,
    Balázs
  • SkyTrader Member Posts: 88 Contributor II
    Thanks @BalazsBarany !

    I've read the "Execute Process" description but the tutorial has completely confused me.

    So I can drop the Read Excel operator into a process and then use the Execute Process operator?

    And the processes I'd already created from a direct Excel import will need this "Execute Process" added to them? I'm still not clear how to do that in terms of the order/hierarchy of operators.

    Also:
    "Then you can put this (Read Excel) into your other processes using Execute Process (or just dragging and dropping from the repository) ...."

    Again, I'm still not sure about the order/hierarchy for doing this.

    "and replace your existing Retrieve with the static data."

    Do I physically replace my existing Retrieve with the static data (which I assume is the Read Excel operator)? But then how does Store fit into all this, particularly in light of my original question?

    Thanks again,

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi,

    I was referring to the two ways of executing another process in RapidMiner.
    1. Insert Execute Process, select the process to execute from the repository.
    2. Just drag and drop it from the repository into the process. An Execute Process operator will be automatically set up with the repository path. (If you drag other repository objects, they will be opened with Retrieve, but not processes.)

    Just try it, you will see how it works.

    In the other processes, right now you have Retrieve referring to a static example set in the repository. You want to execute your process with Read Excel instead of this. So you replace Retrieve with the Execute Process that reads the Excel file.

    If you want to make sure that you read the same data in different processes (e.g. if your file is updated frequently and the processes take a long time), you should do it a bit differently:
    One import process: Read Excel => Store
    In the other processes: Retrieve the stored data.

    Then you can control which state of the Excel file is being imported and processed.
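The difference between reading the file directly and retrieving a stored copy can be sketched as a plain-Python analogy. This is not RapidMiner code: `read_source`, `store_snapshot`, `retrieve_snapshot` and `write_prices` are hypothetical stand-ins for Read Excel, Store, Retrieve and editing the spreadsheet, a CSV file stands in for the Excel workbook, a dict stands in for the repository, and the prices are placeholder values. A direct read sees the latest edits; a retrieve returns the data as it was when the import process last ran.

```python
import csv
import os
import tempfile

# Hypothetical in-memory "repository" (stands in for the RapidMiner repository).
repository = {}


def read_source(path):
    """Analogy for Read Excel: parse the file on disk every time it is called."""
    with open(path, newline="") as f:
        return list(csv.DictReader(f))


def store_snapshot(rows, entry):
    """Analogy for Store: save a copy of the rows under a repository entry."""
    repository[entry] = [dict(row) for row in rows]


def retrieve_snapshot(entry):
    """Analogy for Retrieve: return the stored copy, not the current file."""
    return repository[entry]


def write_prices(path, prices):
    """Hypothetical helper that simulates editing the spreadsheet on disk."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["date", "close"])
        writer.writeheader()
        writer.writerows(prices)


def demo():
    """Return (rows seen by a direct read, rows in the stored snapshot)."""
    path = os.path.join(tempfile.mkdtemp(), "prices.csv")
    write_prices(path, [{"date": "day 1", "close": "100.0"}])  # placeholder values

    # Import process: read the file, then store a snapshot of it.
    store_snapshot(read_source(path), "prices_snapshot")

    # Later, the file on disk gains a new row.
    write_prices(path, [{"date": "day 1", "close": "100.0"},
                        {"date": "day 2", "close": "101.5"}])

    live = read_source(path)                         # re-reads the updated file
    snapshot = retrieve_snapshot("prices_snapshot")  # returns the stored copy
    return len(live), len(snapshot)


if __name__ == "__main__":
    print(*demo())  # prints: 2 1
```

In this analogy, storing a copy does not prevent the file from being read again later; the stored snapshot simply stays as it was until the import step (read, then store) is run again.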

    You can watch the first two Academy videos under Data Access and Preparation:
    https://academy.rapidminer.com/learning-paths/get-started-with-rapidminer-and-machine-learning

    Regards,

    Balázs
  • SkyTrader Member Posts: 88 Contributor II
    edited August 2020
    Thanks Balázs. At this early stage of using RM, having been able to follow the Time Series videos reasonably well, I was just looking for a "connect Operator A to Operator B" type of answer; you're assuming I can physically connect/find everything, which I can't!

    I've asked a specific question and I don't understand from your answer how to move forward with my project.

    Okay, so I've learnt the hard way that the video "Elaborate Your Time Series Analysis" doesn't take into account that I'll be constantly updating my Excel file. Now that I'm using the Read Excel operator, I want to keep using the many processes I created back when I imported the Excel file into the repository, such as the Fourier Transform process I now have.

    I understand that I must use Execute Process, but please understand that the operator descriptions in RM on the right-hand side of the window are not always easy to understand or follow, particularly for this operator:

    E.g. the description:
    "This operator (Execute Process) can be used to embed a complete process definition of a saved process into the current process definition."

    current process definition??

    Which is why I tend to watch videos...

    "1. Insert Execute Process, select the process to execute from the repository."

    I looked up the only video I could find on Execute Process; it's from 2018 and already out of date, so I don't know how to "select the process to execute from the repository." In the video that's easy because there are two sections, Process and Context, but I can't find the latter: https://www.youtube.com/watch?v=X2HYB2j3AX8 at 2:20.

    2. Just drag and drop it from the repository into the process. 
    Drag what into which process?

    I actually think it may be quicker to delete the processes I learnt from the Elaborate video (created by importing the Excel file into the repository) and start again with Read Excel.

    "So you replace Retrieve with the Execute Process that reads the Excel file."

    I get it conceptually; I'm just not sure how to physically make these connections when I had this:



    So the Execute Process operator somehow has the process that reads the Excel file attached/imported into it, although I still have no idea how to do that?

    Thank you for your patience.
