Creating database connections for RM Studio in AWS Cloud
Hi, we've just started running RM Studio in AWS cloud (EC2 Windows instance) but are having problems creating Orcale databse connections to both our on-premise datawarehouse and datawarehouse running in same virtual private clound (RDS Unix instance). Anybody come across this before? Any ideas on how to get around it are much appreciated? Thanks Mike
Best Answer
-
MichaelWall Member Posts: 9 Contributor II
Just to close this off, I finally resolved this with some help from the Rapidminer Tech guys. There were two issues:
1) The default JDBC URL string for Oracle needed some tweaking to add extra // after the @ and also to change the schema separator to /. I now have a custom Oracle database connection type set up.
2) By default Rapidminer tries to route the connection through a proxy, but all my connections are inside a a corporate network so no proxy required. The various database instances are open to each other, so by being helpful and looking for a proxy Rapidminer was actually stopping the connections from being made. In preferences the proxy needs to be set to Direct (no proxy).
1
Answers
This becomes tricky for sure if you're running AWS and want to connect back to a DB on your local network. I know with Radoop you have to create all these SOCKS proxies and tunnels to connect to AWS and I wonder if that's the same way. Maybe @phellinger can shed some light on this.
Thanks, I suspected it might not be straight forward. We're on a bit of a learning curve with both AWS and RM!
Connecting back to the on-premise Oracle datawarehouse requires network access and access rights for the database user. Probably it is not allowed to connect to the datawarehouse from outside, there can be some firewall or such protection. If the IP and the port is open, then you can check if the database user has privileges to connect from outside.
Connecting to an RDS instance requires proper security group settings on AWS. In the most simple cases you should add the EC2 security group of the RM instance to the RDS security group in order to grant access.
Thanks for the replies. I did manage to sort out the security group issues, so now the EC2 intance does have connectivity to both our Oracle database environments. I installed SQL Developer and can run queries against both environments.
Unfortunately Rapidminer Studio still cannot establish oracle database connections. It gets the error 'IO Error: The Network Adapter could not establish the connection'. The issue has to be with Rapdminer Studio, as SQL Developer is able to make JDBC connections from same box using same database credentials? Is there any way of trouble shooting the network adapter on Rapidminer Studio?
Thanks
Mike
First of all I propose to check the JDBC URL diplayed by the RapidMiner Studio's "Manage Database Connection" dialog . There are a lot of fine tune settings for this that change the URL. Please compare the URL with the URL used by the SQL Developer and try to find the differences. This connection is a plain JDBC connection so there couldn't be too many architectural differences.
Zoltán Sz.
Scott
On the same Windows EC2 instance:
In SQL Developer I have a basic Oracle connection (role is default) : Name: DCA_TKDAP01, Username: XXX, Password: xxx, hostname: nn.nnn.nn.nnn, port: 1521, service name: TKDAP01. The connection details appear as XXX@//nn.nnn.nn.nnn/TKDAP01. This works just fine. The same connection details even work if I switch to using SID: TKDAP01 instead of service name: TKDAP01.
In Rapidminer Studio I have the default Oracle connection set as: Name: DCA_TKDAP01, Database System: Oracle, hostname: nn.nnn.nn.nnn, port: 1521, database scheme: TKDAP01, Username: XXX Password: xxx. The URL details appear as jdbc:oracle:thin:@nn.nnn.nn.nnn:1521:TKDAP01. The credentials I am entering into RM are the same as those that work in SQL Developer, same hostname, same username and password. RM produces an error 'IO Error: The Network Adapter could not establish the connection'.
I've tried creating alternate Oracle database drivers in RM with tweaked URL prefix and or schema separator, but none have worked.
Thanks
Mike
Using command line terminal, I could get in contact with a database on ec2 aws.
Next,
ubuntu@ip-XXX-XX-XX-XXX:~$ psql -U ubuntu -d postgres
psql (10.1)
Type "help" for help.
postgres=> help
You are using psql, the command-line interface to PostgreSQL.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit
postgres=> \?
My question, how can I connect this postgres db to my local RM studio considering the fact that I need to login for AWS with key (pem), followed by ssh -i etc.To make it short, a db on ec2 aws; how can I access the database to be able to use it on my local RM studio?
Cheers
Sven
basically what you need is a permanent or semi-permanent connection between your PC and your AWS machine. You can do it with SSH port forwarding, maybe OpenVPN or by exposing the PostgreSQL service on the public IP of your cloud server (not recommended).
If you go the SSH route, you need a port forwarding of port 5432 to localhost:5432.
ssh -L 5432:localhost:5432 yourawsmachine
Then you can configure your RapidMiner Studio to access the database on localhost and port 5432. The SSH "tunnel" needs to be running when you use Studio.
If you get errors about the user not being in pg_hba.conf or something similar, then you need to edit /etc/postgresql/10/main/pg_hba.conf (or similar) to allow connections from localhost with the authentication you are using (probably password, so the entry would be md5 or scram-sha256). But the default configuration should let you in.
Regards,
Balázs
cheers
Sven
I have to admit, it is too complex for me. As scientist, the context is not abnormal and might provide a solution for the future in other domains. I notice a huge explosion of databases integrated in daily clinical practice as eg genomics (http://www.cbioportal.org/). As such my request seems not that extraterrestrial.
The setting
The database (MIMIC-III) is available on AWS.
My amazon ID has been added to the AWS EC2 AMI.
The current zone of this AMI is US-WEST-2 (Oregon).
There is a DB user: mimicuser
Password: xxxxxxx
The ubuntu password is yyyyyyy
The ubuntu user has a password because there are a set of tools that can be used with it. The user by default is “locked”, meaning that the password cannot be sued to authenticate via ssh.
The available tools that use the ubuntu password are:
https://<IP>/rstudio/
https://<IP>/jupyter/
There is also the Querybuilder tool without any limitations running in this server.
https://<IP>/
To access these websites you have to add the port 443 to the network security group rules.
Is there any manual with the steps needed to arrange the connection in order my local RM studio could be used study the db.
Hoping to progress I send you my greetings
Sven