Contact Qluster

Phone: 424.272.8920
Hours: 10am PST - 6pm PST

Problem

Qluster is designed to solve the unruly file ingestion problem.

1. Incoming file data structures are highly varied and change unexpectedly
2. Expensive engineering resources are spent on brittle, traditional data pipeline approaches
3. The data receiver bears the onus of cleaning and consolidating the data

Qluster provides a Data Firewall that protects your data pipeline from bad data, reduces your engineering effort, and helps you sleep better at night.

Many production applications still rely on raw data from third parties or non-tech-savvy companies, delivered as tabular files (CSV, etc.) or JSON streams. Any of these inputs can suddenly arrive corrupted and be ingested anyway.

1. Incoming raw data can change frequently and without notice, because the sender changes their implementation without telling anyone, e.g. after hiring a new contractor.
2. Proper definitions of data requirements and validation rules rarely exist, so the mess accumulates on the receiver's side and trust in the quality of the data goes down.
3. Engineering teams find themselves reacting with stopgap fixes for these issues instead of focusing on their core application.
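To make the first point concrete, here is a minimal sketch of how a receiver might notice that a sender's CSV layout changed between two deliveries. It uses only the Python standard library; the function names and the sample data are illustrative assumptions, not part of Qluster.

```python
import csv
import io


def read_header(csv_text: str) -> list[str]:
    """Return the column names from the first row of a CSV document."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader, [])


def header_drift(expected: list[str], received: list[str]) -> dict[str, object]:
    """Report columns that disappeared, appeared, or changed order."""
    missing = [c for c in expected if c not in received]
    added = [c for c in received if c not in expected]
    # Compare the order of the columns that both deliveries share.
    shared_expected = [c for c in expected if c in received]
    shared_received = [c for c in received if c in expected]
    return {
        "missing": missing,
        "added": added,
        "reordered": shared_expected != shared_received,
    }


# Hypothetical deliveries: the sender renamed one column and added another.
last_delivery = "id,name,amount\n1,Ada,10.5\n"
new_delivery = "id,full_name,amount,currency\n1,Ada,10.5,USD\n"

drift = header_drift(read_header(last_delivery), read_header(new_delivery))
print(drift)
```

A check like this only catches structural changes. Values that are well formed but wrong still need validation rules of their own.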

Data can’t reach its full potential when it is deemed unreliable. With raw data ingestion, you are lucky if your pipeline simply breaks when it ingests a suddenly corrupted file, because at least you know about it right away. More often, the bad data gets through, someone downstream uses it, and the resulting wrong decisions cost your company money and credibility.

Solution

Qluster Data Firewall: The first layer of defense against bad data

1. Qluster sits between the sender and receiver of raw data with minimal change in the existing pipelines. Using machine learning, Qluster learns from good data so it can proactively detect bad data and quarantine it.
2. Using Qluster’s no-code solution, domain knowledge holders review and resolve data issues directly and immediately, without waiting for anyone else, such as engineers.
3. As data validation rules mature and evolve over time, Qluster provides a workflow for the sender of data to easily view these rules and understand why certain data was quarantined.
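The learn, detect, and quarantine flow described above can be illustrated with a deliberately simple sketch. It is not Qluster's machine learning: it only learns a numeric range per column from known-good rows and quarantines rows that fall far outside it, keeping a human-readable reason for each. All names, the `tolerance` parameter, and the sample rows are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class ColumnProfile:
    """Numeric range observed for one column in known-good rows."""
    low: float
    high: float


def learn_profile(good_rows: list[dict], columns: list[str]) -> dict[str, ColumnProfile]:
    """Record the minimum and maximum seen per column in good data."""
    profile = {}
    for col in columns:
        values = [float(row[col]) for row in good_rows]
        profile[col] = ColumnProfile(min(values), max(values))
    return profile


def find_issues(row: dict, profile: dict[str, ColumnProfile], tolerance: float = 0.5) -> list[str]:
    """Return the reasons a row looks bad; an empty list means it passes."""
    issues = []
    for col, p in profile.items():
        raw = row.get(col)
        try:
            value = float(raw)
        except (TypeError, ValueError):
            issues.append(f"{col}: not a number ({raw!r})")
            continue
        # Allow some slack beyond the learned range before flagging.
        margin = (p.high - p.low) * tolerance
        if not (p.low - margin <= value <= p.high + margin):
            issues.append(f"{col}: {value} is outside the learned range")
    return issues


def split_batch(rows: list[dict], profile: dict[str, ColumnProfile]):
    """Separate accepted rows from quarantined rows and their reasons."""
    accepted, quarantined = [], []
    for row in rows:
        issues = find_issues(row, profile)
        if issues:
            quarantined.append((row, issues))
        else:
            accepted.append(row)
    return accepted, quarantined


# Hypothetical data: three good rows, then a new batch to screen.
good = [{"amount": "10"}, {"amount": "20"}, {"amount": "30"}]
batch = [{"amount": "25"}, {"amount": "2500"}, {"amount": "n/a"}]

profile = learn_profile(good, ["amount"])
accepted, quarantined = split_batch(batch, profile)
print(len(accepted), "accepted;", len(quarantined), "quarantined")
```

Keeping the reasons next to each quarantined row is what lets a sender see why their data was held back, which is the point of the third item above.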

Connections

Vendors can use AWS S3, Google Cloud Storage, SFTP, Dropbox, Google Drive, Google Sheets, or RESTful clients to push data to Qluster.
If you need a connection that you don't see above, feel free to drop us a line. We are here to solve your problems.

Destinations

Qluster supports Postgres, Snowflake, Redshift, and many other relational databases as destinations.

You can also have Qluster push data to object storage services such as AWS S3, Google Cloud Storage, and others.

For advanced use cases, the data can be pushed to external APIs.

A lot of work has gone into making sure that Qluster never locks your Postgres tables and keeps the historic data accessible.

We empower engineers to build sophisticated data cleaning and validation logic in any language and run it as Docker images within Qluster.

It can be as easy as having the input file mounted into the Docker container for the custom code to take over, or as comprehensive as using Qluster's API directly.
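As a rough sketch of the simpler option, custom code that takes over a mounted input file could look like the script below. The `INPUT_FILE` and `OUTPUT_FILE` environment variables and the `/data/...` default paths are hypothetical; the actual mount location and hand-off contract are defined by Qluster and are not described on this page.

```python
import csv
import os


def clean_row(row: dict) -> dict:
    """Trim whitespace and turn common 'empty' markers into empty strings."""
    empty_markers = {"", "n/a", "null", "none"}
    cleaned = {}
    for key, value in row.items():
        if key is None:
            # csv.DictReader stores surplus cells under the key None; drop them.
            continue
        text = (value or "").strip()
        cleaned[key.strip()] = "" if text.lower() in empty_markers else text
    return cleaned


def clean_file(input_path: str, output_path: str) -> int:
    """Read a CSV file, clean every row, write the result, return the row count."""
    with open(input_path, newline="", encoding="utf-8") as src:
        reader = csv.DictReader(src)
        rows = [clean_row(r) for r in reader]
        fieldnames = [name.strip() for name in (reader.fieldnames or [])]
    with open(output_path, "w", newline="", encoding="utf-8") as dst:
        writer = csv.DictWriter(dst, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)


def main() -> None:
    # Hypothetical locations: assumed here, not taken from Qluster's docs.
    input_path = os.environ.get("INPUT_FILE", "/data/input.csv")
    output_path = os.environ.get("OUTPUT_FILE", "/data/output.csv")
    count = clean_file(input_path, output_path)
    print(f"cleaned {count} rows")
```

Calling `main()` at the bottom of the script would make it usable as the container's entry point. Because the logic lives in `clean_file`, it can also be tested locally against sample files before the image is built.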

Full Control With Admin Console

Admins have a 360° view of the data pipelines. They can create new datasets, data sources, destinations, etc. They can resolve data issues across data sources, approve database migrations, and view all the data in all destinations.

The admin console answers the question of who did what, and when.

User Console

Historically, one of the pains of data ingestion has been that the onus of cleaning data falls on the recipient. With Qluster's user interface, we empower anyone, including the data senders (i.e. vendors), to do their fair share of data cleaning! Vendor users can only access their own data and resolve issues specific to that data.

Notifications

Receive Slack notifications as soon as an issue requires your attention. You can control how often you receive the notifications.

Vendors can receive notifications for their own data as well!

Email and PagerDuty notification integrations are coming soon.

Deployments

Google Cloud

Hosted Qluster on GCP is operational. We support Google Cloud Storage both as a data source and as backup storage.

AWS

Hosted Qluster on AWS is operational for enterprise deployments.
SaaS deployment is in Beta.

Azure

Hosted Qluster on Azure is coming in 2023.

Qluster is brought to you by the creators of DeepDiff

We create solid data tools. DeepDiff has over 6.4 million monthly downloads.
Companies such as Google, Netflix, and Uber are among the major users of DeepDiff.

Uber's Peloton uses DeepDiff

Netflix's Security Monkey uses DeepDiff

Google DeepMind's Pysc2 uses DeepDiff

Cisco's Data Center uses DeepDiff

Ready to see Qluster in action?

Request a demo