Unable to read data when a process is executed in AIHub.

caesar Member Posts: 6 Newbie
edited July 2021 in Help
Hi.

I have tried to follow the suggestions and answers given in other questions, but to no avail.

Is there a set of steps detailed anywhere that I can follow to allow a process developed in Studio to access files when run in AIHub?

Attempt 1: File added through Add Content in the project Contents tab. Once I updated Studio, I saw the file there. When I drag it into the Process editor in Studio, I get a Read File.

Loop Parameters.input 1 (input 1)
Meta data: File
  • Filename: reg1.csv
  • Source: //aihub-thesis/reg1.csv

Generated by: Open File (2).file
Data: Repository location: //aihub-thesis/reg1.csv

In Studio, that produces an error saying my process is getting the wrong type of data. Fair enough, so I tried Read CSV and set some values, but then I get Missing Label. Again, I understand why: I have not set a label. However, if I run the Import Configuration Wizard, I am asked to specify a file on my laptop; I am not asked to set parameters based on the Open File operator feeding into Read CSV.

Attempt 2: File added through Import Data in Studio. All works fine; I can change types, etc.

Process runs fine in Studio.

When I attempt to run it in AIHub, it asks me to create a snapshot. I do, and in the Contents listing in the AIHub Server interface I can see my file with an rmhdf5table extension.

Immediately upon starting, the request fails with:

The repository did not deliver the requested data. This can be caused by wrong file names, network errors, file system errors or broken entries in the repository.

and

Jul 11, 2021 8:04:40 PM com.rapidminer.tools.ResultService init
INFO: No filename given for result file, using stdout for logging results!
Jul 11, 2021 8:04:40 PM com.rapidminer.execution.jobcontainer.execution.ExecutionProcessListener processStarts
INFO: Execution of process started
Jul 11, 2021 8:04:40 PM com.rapidminer.Process execute
INFO: Process //aihub-thesis/loops_mine_backward.rmp starts
Jul 11, 2021 8:04:40 PM com.rapidminer.execution.jobcontainer.execution.ExecutionProcessListener processStartedOperator
INFO: Started operator : Process
Jul 11, 2021 8:04:40 PM com.rapidminer.execution.jobcontainer.execution.ExecutionProcessListener processStartedOperator
INFO: Started operator : Retrieve multi1-for-aihub
Jul 11, 2021 8:04:40 PM com.rapidminer.execution.jobcontainer.execution.ExecutionProcessListener processEnded
INFO: Execution of process stopped
Jul 11, 2021 8:04:40 PM com.rapidminer.execution.jobcontainer.execution.SimpleExecutor
SEVERE: Cannot retrieve repository data from entry 'multi1-for-aihub'. Reason: Cannot load data from 'multi1-for-aihub': com.rapidminer.versioning.repository.exceptions.DataRetrievalException: com.rapidminer.storage.hdf5.HdfReaderException: No valid HDF5 signature found. Please refer to the 'error.log' file for more details.
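
As a quick diagnostic for the "No valid HDF5 signature found" message, a minimal Python sketch like the one below can check whether a file copied out of the project really starts with the standard 8-byte HDF5 signature. This only tests the generic HDF5 container format; it assumes nothing about how RapidMiner lays out an rmhdf5table file internally, and the function names and the file name in the usage note are hypothetical.

```python
from typing import Optional

# Standard HDF5 format signature: the 8 bytes that open a valid superblock.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"


def find_hdf5_signature(data: bytes) -> Optional[int]:
    """Return the offset of the HDF5 signature, or None if it is absent.

    The HDF5 format allows the superblock at offset 0, 512, 1024, 2048, ...
    (doubling each time), so only those offsets are probed.
    """
    offset = 0
    while offset + len(HDF5_SIGNATURE) <= len(data):
        if data[offset:offset + len(HDF5_SIGNATURE)] == HDF5_SIGNATURE:
            return offset
        offset = 512 if offset == 0 else offset * 2
    return None


def file_has_hdf5_signature(path: str, probe_bytes: int = 1 << 16) -> bool:
    """Read the first probe_bytes of a file and look for the signature there."""
    with open(path, "rb") as handle:
        head = handle.read(probe_bytes)
    return find_hdf5_signature(head) is not None
```

For example, `file_has_hdf5_signature("multi1-for-aihub.rmhdf5table")` (a hypothetical local copy of the entry) returning `False` would suggest that the file in the snapshot is not a complete HDF5 file, for instance because it is truncated or is a placeholder rather than the actual data.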


What am I doing wrong?

OR

Is there something I can follow that will allow me to run my process?

Below is my file listing, shown in case it helps to point me to how I can access the data I am trying to use.

Any advice will be appreciated.

Thank you.

Answers

  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist
    Hi,
    Please try to use relative paths, not absolute ones, in your Retrieve operator.
    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • caesar Member Posts: 6 Newbie
    edited July 2021
    Hi, thanks for the suggestion. Do you mean I should select "Resolve relative to ..." when I double-click the Retrieve operator?

    If so, I am getting the same error.




    By the way, I am using Studio 9.8 plus the 9.8 image on Azure.
  • MartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,507 RM Data Scientist
    Yes, that is what I mean. Is the error different now? Please make sure to save the process!

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • caesar Member Posts: 6 Newbie
    I have rechecked that I saved my process, created a snapshot and made sure everything is in sync, but still no success.

    Could you advise how to:

    1. Import data (in a way that will allow me to set my labels, data types, etc.)
    2. Use the appropriate operator to connect to my imported data.

    that will allow it to run in AIHub?

    I ask that in case I have made any mistake in my previous steps - perhaps it is better for me to follow steps you know / expect to work?

    Thanks!
  • caesar Member Posts: 6 Newbie
    Can anyone help?
  • kamolchanok_tan Member Posts: 3 Contributor I
    edited September 2021
    Hi Caesar

    I used to have this problem when moving to Projects. I solved it with the following steps:

    1. Read the data that you want to retrieve via the repository in the AI Hub project from a database, using for example a Radoop or JDBC connection. This means you need to insert the data into the database first.
    2. Use the Store operator to store the data into the repository in the Project.
    3. Don't forget to check and uncheck the box Resolve relative to "XXX". This is tricky; I need to do it every time.
    4. Create a snapshot and add it to AI Hub.
    5. You MUST run the process on AI Hub only. The key point is to create the repository via the AI Hub Server.

    You will then see the history after running the process via the Server.

    6. Now you can retrieve from the repository in the AI Hub Project.
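
To illustrate the relative-versus-absolute path advice given earlier in the thread, here is a small Python sketch of how a relative entry name could resolve against the folder that contains the process. It only demonstrates the general idea and is not RapidMiner's actual resolution code; the function name `resolve_entry` and the `//Local Repository/data/x` location are hypothetical, while the process location and entry name are taken from the log in the question.

```python
import posixpath


def resolve_entry(process_location: str, entry: str) -> str:
    """Resolve a repository entry against the folder containing the process.

    Assumption for this sketch: locations starting with '//' are absolute
    ('//repository/path'); anything else is relative to the process folder.
    """
    if entry.startswith("//"):
        return entry  # absolute location: tied to one specific repository
    # Drop the leading '//' so posixpath does not treat it specially.
    folder = posixpath.dirname(process_location[2:])
    return "//" + posixpath.normpath(posixpath.join(folder, entry))


# Relative entry: follows the process wherever the project is checked out.
print(resolve_entry("//aihub-thesis/loops_mine_backward.rmp", "multi1-for-aihub"))
# Absolute entry: always points at the named repository, even if it is not
# available where the process runs.
print(resolve_entry("//aihub-thesis/loops_mine_backward.rmp", "//Local Repository/data/x"))
```

The design point is that a relative entry keeps working when the same project is opened under a different repository name, whereas an absolute location breaks as soon as the named repository does not exist on the machine executing the process.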


