Qluster connects to all of your supported data sources and loads the data from them into your destination. We recommend Qluster SaaS on Google Cloud if most of your data sources and destinations are on Google Cloud.
Each data source has one or more connectors that run independently, validate and process data and persist the data to the destination.
At the minimum, Qluster needs the following:
NOTE: If you prefer an easier way for the Qluster team to set up your resources, please follow the instructions here.
Qluster's Philosophy about your data:
Your data belongs to you. Therefore, even Qluster's configurations will stay within your infrastructure.
In this guide, we will:
On the Google Cloud console, ensure you are on the correct project, and then go to the SQL page.
If it tells you "In order to create an instance, you have to enable the Compute Engine API first", then click on the "Enable API" button.
Choose an instance ID aka instance name. Don't include sensitive or personally identifiable information in your instance name; it's publicly visible.
Keep track of the password you are creating (we will call it the Postgres DB password), and choose the latest Postgres version that appears in the list.
If you want to save money, you may want to choose a smaller Postgres instance than the one Google Cloud recommends. At the minimum, you will want an instance with 3.75 GB of memory and no shared cores.
If you have signed up for Qluster on a multi-tenant Kubernetes deployment, you will want to choose us-west1 (Oregon) as your region. If you want to save money, choose a single-zone deployment. In that case, please select us-west1-a as your primary zone.
If you have signed up for Qluster on a dedicated Kubernetes deployment, choose any region you wish.
Click on the "customize your instance" link to expand it. Here you can choose a different instance size depending on your needs. Qluster needs at least an instance with 3.75GB of memory.
Make sure to keep the connection type as "Public IP enabled". You still need to allow specific IP addresses to reach your database, so there is no security concern here.
Next, click on "add flag".
Set the max_connections to 25000 in the dialogue box that pops up.
Click on create instance. It may take up to 20 minutes for the instance to be created.
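For readers who prefer scripting over the console, the choices above map roughly onto a single gcloud sql instances create call. The following is a minimal sketch under stated assumptions, not Qluster's official procedure: it only builds the argument list and does not run anything. The instance name, password, and Postgres version string are placeholders, and the db-custom-1-3840 tier (1 dedicated vCPU, 3.75 GB) is an assumption about which machine type satisfies the minimum described above.

```python
def build_create_instance_command(
    instance_id: str,
    root_password: str,
    database_version: str = "POSTGRES_15",  # assumption: use the latest version offered to you
    region: str = "us-west1",
    zone: str = "us-west1-a",
    vcpus: int = 1,
    memory_gb: float = 3.75,
    max_connections: int = 25000,
) -> list[str]:
    """Return a gcloud argument list for a single-zone Cloud SQL Postgres instance."""
    memory_mb = int(memory_gb * 1024)  # custom tiers express memory in MB (3.75 GB -> 3840)
    return [
        "gcloud", "sql", "instances", "create", instance_id,
        f"--database-version={database_version}",
        f"--tier=db-custom-{vcpus}-{memory_mb}",
        f"--region={region}",
        f"--zone={zone}",
        "--availability-type=zonal",  # single-zone deployment
        "--assign-ip",                # keep public IP enabled
        f"--database-flags=max_connections={max_connections}",
        f"--root-password={root_password}",
    ]


# Hypothetical instance name and password, for illustration only.
command = build_create_instance_command("my-qluster-config-db", "change-me")
print(" ".join(command))
```

Building the command as a list keeps it easy to review before passing it to something like subprocess.run, and avoids shell quoting problems.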
When Qluster is given the Cloud SQL Editor role, it can completely manage the configuration database for you and scale it as needed. This option requires the least involvement from your engineering team.
On the Google Cloud console, ensure you are on the correct project, and then go to the IAM page.
It may say "ADD" or "Grant Access" at the top. Click on it.
Enter getqluster.com as the principal. Then click on "select a role".
Enter Cloud SQL Editor as the role. We want to limit this role to only the database you are using for Qluster. So click on "ADD IAM CONDITION".
Enter a title for the condition. Use any title you want, for example "Limit Qluster Access To One DB". The title is only for your own internal usage.
Then click on the condition type >Resource > Name. You may want to use tags in this step instead of the Name. That is up to you. In this tutorial, we will use Name.
Please make sure to choose "Ends with" as the operator. Then, use the "database instance name" that you chose in Section A in the value field. Double check that you have not put any extra spaces in the value field when copy-pasting the database instance name (aka instance ID). Then click Save.
Finally, click on save again.
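To make the condition concrete, here is a minimal sketch of the kind of IAM binding the steps above should produce. It assumes the console's Resource > Name condition with the "Ends with" operator corresponds to the expression resource.name.endsWith("..."), that the Cloud SQL Editor role has the ID roles/cloudsql.editor, and that the principal is stored as a domain-style member string. The function name and the instance name are hypothetical.

```python
def build_conditional_binding(principal: str, role: str, instance_id: str, title: str) -> dict:
    """Build an IAM binding limited to resources whose name ends with instance_id."""
    cleaned = instance_id.strip()  # guard against stray spaces from copy-pasting
    if not cleaned:
        raise ValueError("instance_id must not be empty")
    return {
        "role": role,
        "members": [principal],
        "condition": {
            "title": title,
            "expression": f'resource.name.endsWith("{cleaned}")',
        },
    }


binding = build_conditional_binding(
    principal="domain:getqluster.com",     # assumption: domain-style member string
    role="roles/cloudsql.editor",
    instance_id=" my-qluster-config-db ",  # hypothetical name, with accidental spaces
    title="Limit Qluster Access To One DB",
)
print(binding["condition"]["expression"])
```

Stripping whitespace before building the expression mirrors the "no extra spaces" check above: a trailing space inside the quoted value would cause the condition to match nothing.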
This section is the same as section B, except Qluster only gets client access. If you have done section B, you still need to do this section too.
On the Google Cloud console, ensure you are on the correct project, and then go to the IAM page.
It may say "ADD" or "Grant Access" at the top. Click on it.
Enter the service account that Qluster has created for you as the principal. If you don't know your service account, contact Qluster support. In this example, we are using a generic service account email address. Once you fill in the principal, click on "select a role".
Enter Cloud SQL Client as the role. We want to limit this role to only the database you are using for Qluster. So click on "ADD IAM CONDITION".
Enter a title for the condition. Use any title you want, for example "Limit Qluster Access To One DB". The title is only for your own internal usage.
Then click on the condition type >Resource > Name. You may want to use tags in this step instead of the Name. That is up to you. In this tutorial, we will use Name.
Please make sure to choose "Ends with" as the operator. Then, use the "database instance name" in the value field. Double check that you have not put any extra spaces in the value field when copy-pasting the database instance name (aka instance ID). Then click Save.
Finally, click on save again. Then send a message to your Qluster support and include both the project name and the database instance name.
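The same grant can be sketched as a gcloud projects add-iam-policy-binding call. This is an illustration under assumptions, not a replacement for the console steps: the project ID, service account email, and instance name are placeholders, roles/cloudsql.client is assumed to be the ID of the Cloud SQL Client role, and the condition expression assumes the "Ends with" operator on the resource name. The code only builds the argument list.

```python
def build_grant_client_command(
    project_id: str,
    service_account_email: str,
    instance_id: str,
    title: str = "Limit Qluster Access To One DB",
) -> list[str]:
    """Return a gcloud argument list granting Cloud SQL Client on one instance only."""
    expression = f'resource.name.endsWith("{instance_id.strip()}")'
    return [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=serviceAccount:{service_account_email}",
        "--role=roles/cloudsql.client",
        f"--condition=expression={expression},title={title}",
    ]


# All three values below are hypothetical placeholders.
command = build_grant_client_command(
    "my-project",
    "qluster-example@my-project.iam.gserviceaccount.com",
    "my-qluster-config-db",
)
for part in command:
    print(part)
```

The project ID and instance name passed here are the same two values to include in the message to Qluster support.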
When Qluster detects new data files, it first encrypts and backs up the raw data on an object storage service such as Google Cloud Storage.
Qluster needs to be able to read, write, and delete objects from this bucket.
Choose a name for your bucket, such as "my-company-qluster-backup".
You can leave everything else at its default value.
Click on Create.
Click on Confirm if it asks you about no public access to the bucket.
Once the bucket is created, please click on the permissions tab. Then click on the "Grant Access" or "Add Access" button that appears.
Add the service account that Qluster provided to you. In this example, we are using a generic service account.
Assign the "Legacy Object Owner" and "Legacy Bucket Owner" roles to this service account. These roles are only assigned to this bucket. If you have other buckets in the same Google Cloud account, Qluster will not have access to them.
Then click on Save.
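As a rough scripted equivalent of the permission steps above, the sketch below builds one gcloud storage buckets add-iam-policy-binding command per role. It assumes the two console roles correspond to the role IDs roles/storage.legacyObjectOwner and roles/storage.legacyBucketOwner; the service account email is a placeholder. Nothing is executed.

```python
# Assumed role IDs for "Legacy Object Owner" and "Legacy Bucket Owner".
LEGACY_ROLES = (
    "roles/storage.legacyObjectOwner",
    "roles/storage.legacyBucketOwner",
)


def build_bucket_grant_commands(bucket_name: str, service_account_email: str) -> list[list[str]]:
    """Return one gcloud argument list per role, scoped to a single bucket."""
    member = f"serviceAccount:{service_account_email}"
    return [
        [
            "gcloud", "storage", "buckets", "add-iam-policy-binding",
            f"gs://{bucket_name}",
            f"--member={member}",
            f"--role={role}",
        ]
        for role in LEGACY_ROLES
    ]


# Hypothetical service account, for illustration only.
for command in build_bucket_grant_commands(
    "my-company-qluster-backup",
    "qluster-example@my-project.iam.gserviceaccount.com",
):
    print(" ".join(command))
```

Because each binding targets gs://my-company-qluster-backup rather than the project, it does not extend to other buckets in the same account.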
Qluster pulls data from data sources and pushes it to destinations. The data source is an object storage service such as Google Cloud Storage. You may be asking why Qluster needs both a backup storage bucket, which we built in the previous section, and a raw data bucket. The reason is that a data source is generally a storage space where you give third parties write access so they can drop off files for you. We don't want to mix this bucket with the one where you back up all the raw data gathered across many data sources.
Qluster needs to be able to read and delete objects from this bucket.
The steps to create the raw data bucket are the same as those for creating the backup data bucket above. Please choose a different name for the raw data bucket than the backup bucket.
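Before creating the second bucket, a quick sanity check on the two names can catch mistakes early. The sketch below is an assumption-laden helper, not part of Qluster: it checks only a subset of the Cloud Storage naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores, and dots; starting and ending with a letter or digit) and that the two names differ. The raw data bucket name is hypothetical.

```python
import re

# Subset of the Cloud Storage bucket naming rules; other restrictions are not checked.
_BUCKET_NAME = re.compile(r"^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$")


def check_bucket_names(backup_bucket: str, raw_bucket: str) -> list[str]:
    """Return a list of problems found; an empty list means the names look usable."""
    problems = []
    for label, name in (("backup", backup_bucket), ("raw data", raw_bucket)):
        if not _BUCKET_NAME.match(name):
            problems.append(f"{label} bucket name {name!r} breaks the basic naming rules")
    if backup_bucket == raw_bucket:
        problems.append("the raw data bucket must not reuse the backup bucket name")
    return problems


print(check_bucket_names("my-company-qluster-backup", "my-company-qluster-raw"))
print(check_bucket_names("my-company-qluster-backup", "my-company-qluster-backup"))
```

The first call should report no problems, while the second should flag the reused name.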
Once you have created the resources above, please get in touch with Qluster's support with the following information: