The technology industry is increasingly using a new term – data clouds – to describe the fast-emerging world of big-data management and analytics in the cloud.
However, in a market already crowded with database platforms, data warehouses, data lakes, and other technologies, just how new and different are data clouds?
References to data clouds are popping up everywhere. Oracle offers the Oracle Data Cloud. Cloudera pitches enterprise data clouds. And Pinecone Systems mentioned data clouds when it launched its new vector database in January.
The company making the most noise is Snowflake, which references data clouds more than a dozen times on its website home page. Snowflake has a Data Cloud Academy and even a podcast titled the Rise of the Data Cloud.
What exactly is a data cloud? Each vendor has its own spin, but what data clouds have in common is an all-encompassing data architecture and a business objective to put more data and insights into the hands of more people.
Let’s start with Snowflake, which frames the discussion as “one cloud, many workloads, no data silos.” The Snowflake Data Cloud comprises data warehouses and data lakes, which support data engineering, data science/analytics, and applications. Also, data marketplaces and data sharing are central to the Snowflake model, enabling data to be pulled in from myriad sources and shared far and wide.
Snowflake this week announced several new capabilities for its data cloud, including an expanded data marketplace with more than 500 data listings, new privacy controls and other governance improvements, and a usage dashboard.
Must-have capabilities
Google Cloud is also driving the trend. In May, Google Cloud hosted its first Data Cloud Summit, where it announced new services, now in preview, that customers can use to establish data clouds. They are:
- Dataplex, a data fabric for integrating and managing data without duplication or movement, for improved data consistency.
- Datastream, a “change data capture” and replication service for database replication, event-driven actions, and real-time analytics.
- Analytics Hub, for combining data sets and sharing data and insights, including dashboards and machine learning models, inside and outside an organization.
Together, these are among the requisite capabilities for data clouds: integration, replication, and a central source for access and sharing.
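For readers unfamiliar with the "change data capture" idea mentioned above, here is a minimal, generic sketch of the concept: instead of copying a whole table, a source system emits a stream of insert, update, and delete events, and a replica stays current by applying them in order. This is not Datastream's API or any vendor's implementation; the `ChangeEvent` type, the `apply_changes` function, and the sample customer records are hypothetical names invented purely for illustration.

```python
from dataclasses import dataclass
from typing import Any, Dict, Iterable, Optional


@dataclass(frozen=True)
class ChangeEvent:
    """One captured change from a source table (hypothetical shape)."""
    op: str                               # "insert", "update", or "delete"
    key: str                              # primary key of the affected row
    row: Optional[Dict[str, Any]] = None  # new row contents; unused for deletes


def apply_changes(replica: Dict[str, Dict[str, Any]],
                  events: Iterable[ChangeEvent]) -> Dict[str, Dict[str, Any]]:
    """Apply change events, in order, to an in-memory replica keyed by primary key."""
    for event in events:
        if event.op in ("insert", "update"):
            # Inserts and updates both write the latest version of the row.
            replica[event.key] = dict(event.row or {})
        elif event.op == "delete":
            # Deleting a row that is already absent is treated as a no-op.
            replica.pop(event.key, None)
        else:
            raise ValueError(f"unknown operation: {event.op!r}")
    return replica


# Illustrative event stream (made-up data).
changes = [
    ChangeEvent("insert", "cust-1", {"name": "Ada", "tier": "basic"}),
    ChangeEvent("insert", "cust-2", {"name": "Lin", "tier": "basic"}),
    ChangeEvent("update", "cust-1", {"name": "Ada", "tier": "premium"}),
    ChangeEvent("delete", "cust-2"),
]

replica = apply_changes({}, changes)
print(replica)  # {'cust-1': {'name': 'Ada', 'tier': 'premium'}}
```

Because only the changes travel, the replica can feed event-driven actions or near-real-time analytics without repeatedly reloading the full data set. A production service would also have to handle ordering, retries, and schema changes, which this sketch leaves out.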
Deutsche Bank, Equifax, and Loblaw are early adopters of Google Cloud’s data cloud solutions. Gil Perez, Chief Innovation Officer at Deutsche Bank, said in a statement that a data cloud will help the financial firm “unify data across our entire organization and innovate faster for our customers.”
Unifying data is a resonant theme with data clouds. Sudhir Hasbe, product management leader for Google Cloud Data Analytics, told me in a recent conversation that customers view data clouds as a way to bring together AI, machine learning, business intelligence, and analytical and operational data.
“The way customers are thinking about data clouds is pretty extensive — everything that they’re doing,” he said. “They are looking at it like, there’s a cloud for all of their data and how they can get value out of it. It’s a holistic view of data, not just a platform or infrastructure.”
All data for all users
That’s how data clouds differ from long-established platforms such as data warehouses and data lakes. Data clouds are more comprehensive — the sum total of databases, data warehouses, lakes, and fabrics across an enterprise. Essential components include data integration and replication, data engineering, sharing, and analytics, all overlaid with end-to-end security and governance.
Snowflake sums up the data cloud this way: “For all of your data and all of your users.” (For more on the approach that Snowflake and other vendors are taking, check out this Cloud Wars Live podcast episode.)
Not every cloud database vendor has “data clouds” in its lexicon. For example, Databricks talks about many of the same underlying technologies and principles, but I’m hard-pressed to find specific references to data clouds by Databricks.
And there will be some who consider data clouds little more than a new catchphrase for long-standing data strategies and implementations.
But naysayers may find themselves in the minority as more people rally around the belief that businesses can do more to maximize their use of data — and talk of data clouds as a better and more holistic way to do that.