🎉 🎉 RAPIDMINER 9.10 IS OUT!!! 🎉🎉

Download the latest version helping analytics teams accelerate time-to-value for streaming and IIOT use cases.

CLICK HERE TO DOWNLOAD

Google big query can't connect

BautastenBautasten Member Posts: 7 Contributor I
Hi,

I'm trying out RapidMiner 9.5 and need to connect to google BigQuery but having problems or it's because I don't know how, so I'm using the Google cloud services connection and have the in-database extension installed. I create the connection and test it and everything works fine but I don't see have big query tables, I guess they should be below the connection in the repository or need I do something else with the connection?


any ideas?
Tagged:
Jasmine_

Best Answers

  • phellingerphellinger Employee, Member Posts: 101   RM Engineering
    edited April 22 Solution Accepted
    @btibert,

    the Access Scope(s) is a setting in the connection itself. 

    Now that you installed the In-Database Processing extension, you can choose “Google BigQuery Select ...” scope as well for this connection. A single connection can serve acces both to Google Storage and to BigQuery.

    I hope this helps.
    Peter

Answers

  • Marco_BoeckMarco_Boeck Team Lead Software Engineering Administrator, Moderator, Employee, Member, University Professor Posts: 1,963   RM Engineering
    Hi,

    You need to use the connection, please see the documentation here: https://docs.rapidminer.com/latest/studio/connect/database/jdbc/

    Regards,
    Marco
    Jasmine_
  • BautastenBautasten Member Posts: 7 Contributor I
    I don't find any operator that can use the Google cloud services connection to get the data from bigquery (only data storage), is it possible the use that connection or do you need to use a JDBC-connection, sry for the newbie questions...
    Jasmine_
  • Marco_BoeckMarco_Boeck Team Lead Software Engineering Administrator, Moderator, Employee, Member, University Professor Posts: 1,963   RM Engineering
    Hi,

    My apologies, I misread the question. I have pinged someone with experience with those connections.

    Regards,
    Marco
    Jasmine_
  • BautastenBautasten Member Posts: 7 Contributor I
    thx a lot that did the trick!

    Just a follow up question, how do I supply my own sql-query to get biqquery-data into rapid miner?
    Jasmine_
  • phellingerphellinger Employee, Member Posts: 101   RM Engineering
    edited January 2020
    You can use the other In Database operators to filter, aggregate, etc. For custom query, you currently need to create a view in BigQuery that you will see listed in Retrieve after a refresh.

    Best,
    Peter
    Jasmine_
  • BautastenBautasten Member Posts: 7 Contributor I
    OK, thx!
    Jasmine_
  • btibertbtibert Member, University Professor Posts: 116  Guru
    edited April 19
    I had this exact question, but I am still not certain that I follow the solution.  Would it be possible for someone to show worked example in RM 9.9?  Some of my python courses use Google Big Query, and I would love to include similar tasks in a course that I teach with RM. 
  • phellingerphellinger Employee, Member Posts: 101   RM Engineering
    @btibert,

    There is an extension that can be installed from the Marketplace, called In-Database Processing. (Not to be confused with InDatabase Extension, a different product.)

    If you install this extension, you should see many new operators after a Studio restart.
    The In Database Nest meta-operator allows you to use a Google Cloud Services connection (either connect it to the input port or select it using the parameters). You can put a Retrieve (In Database) operator inside the nest. This operator's schema name and table name parameters should list your datasets and tables, respectively (after the validation thread completes, see the progress bar in the bottom right corner).

    Is there a point in this setup that fails in your case? E.g. if your datasets or tables don't show up, does the process (after connecting the outputs) still run if you manually enter valid, existing dataset / table combinations?

    Please let me know.

    Best,
    Peter

    dang
  • btibertbtibert Member, University Professor Posts: 116  Guru
    Thanks for this.  I am certain my Google Connection works because I am able to list my buckets and get files as expected.  I didn't have the extension loaded, so I installed, but I am not able to list my datasets and tables.   I attempted this via a direct connection using the Google source, as well as selecting it as a parameter without the connection.

    When I click refresh to get the datasets and tables, the lower right basically blips, as if it completed very quickly, but there isn't any info in the drop down options.  

    The error that I am getting is:

    Process failed: Insufficient Access Scope(s). Operator expects 'Google BigQuery Select and Create Tables', but the actual is 'Google Storage Read and Write'.

    But I have the operator Retrieve (In Database) within the subprocess of the Nest.

  • btibertbtibert Member, University Professor Posts: 116  Guru
    edited April 22
    Thanks, I am on a different machine, and it worked as expected.  What is interesting is that the existing Google Auth connection on this machine required that I re-auth; that didn't happen on my other Macbook.  Not implying anything, but I will report back tomorrow on the differences and/or if it's as simple as creating a new connection.  I suspect that is the case given flow and two inputs as you noted above.

    Will log a new ticket/case/query if there is something about the other machine.  Thanks for the guidance.
    phellinger
Sign In or Register to comment.