Data mesh is a relatively new, decentralized approach to data architecture. It lets end users access data where it lives, rather than first moving the data into a central data lake or data warehouse. Domain-specific teams manage and serve data as a product for others to consume. The goal is to allow data products to be created from virtually any data source with minimal intervention from data engineers.
Data mesh rests on four principles to achieve this goal:
- Domain-Oriented Ownership
- Data as a Product
- Self-Service Data Infrastructure
- Federated Computational Governance
Starburst can be used to implement a data mesh. The following sections outline how Starburst supports each of the four principles.
Domain-Oriented Ownership
In a data mesh, data teams are organized by domain, that is, by subject area. Teams publish data products that other teams can access and use to derive new data products of their own. Starburst aims to let teams spend less effort building infrastructure and data pipelines to serve data products, and more effort using familiar tools such as SQL to prepare data products for end users.
To achieve this, Starburst provides a large set of connectors that allow each domain to reach data wherever it lives, and in whatever format, through a single SQL query interface.
Figure 1 shows examples of data sources that can be accessed from Starburst.
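Starburst builds on the open-source Trino engine, which exposes each configured connector as a catalog and addresses tables as catalog.schema.table. As a rough, runnable analogy only (this is not Starburst), the sketch below uses Python's built-in sqlite3 module: two attached in-memory databases stand in for two separately stored sources, and one SQL connection queries both by qualified name. The database names `sales` and `marketing`, the tables, and the data are hypothetical.

```python
import sqlite3

def open_two_sources() -> sqlite3.Connection:
    """Open one SQL connection that can see two separately stored, hypothetical sources."""
    conn = sqlite3.connect(":memory:")
    # Each ATTACH of ':memory:' creates a new, independent database. Here each one
    # stands in for a source that Starburst would reach through its own connector.
    conn.execute("ATTACH DATABASE ':memory:' AS sales")
    conn.execute("ATTACH DATABASE ':memory:' AS marketing")

    conn.execute("CREATE TABLE sales.orders (customer_id INTEGER, amount REAL)")
    conn.executemany(
        "INSERT INTO sales.orders VALUES (?, ?)",
        [(1, 120.0), (2, 75.5), (1, 30.0)],
    )
    conn.execute("CREATE TABLE marketing.customers (customer_id INTEGER, region TEXT)")
    conn.executemany(
        "INSERT INTO marketing.customers VALUES (?, ?)",
        [(1, "EMEA"), (2, "APAC")],
    )
    conn.commit()
    return conn

conn = open_two_sources()
# The attached database name qualifies each table, playing the role that the
# catalog name plays in a Starburst query such as catalog.schema.table.
total = conn.execute("SELECT SUM(amount) FROM sales.orders").fetchone()[0]
regions = conn.execute("SELECT region FROM marketing.customers ORDER BY customer_id").fetchall()
print(total)    # 225.5
print(regions)  # [('EMEA',), ('APAC',)]
```

The point of the analogy is that the consumer writes ordinary SQL against source-qualified names instead of building a pipeline to gather the data first.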
Data as a Product
Once a data source is connected, Starburst lets you curate data products from it for other users to access.
Users can browse the published data products as shown in Figure 2.
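The idea of a published, browsable data product can be sketched as a small data structure. This is a hypothetical model for illustration, not Starburst's data product API: a product pairs the SQL that defines it with ownership metadata, and a registry lets other users list what has been published, optionally by domain. All class names, field names, and example products are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

@dataclass(frozen=True)
class DataProduct:
    """Hypothetical record for a curated data product (not Starburst's API)."""
    name: str
    domain: str       # owning domain, e.g. "sales"
    owner: str        # accountable team
    description: str
    query: str        # SQL that defines the product over its source

class ProductRegistry:
    """Hypothetical in-memory registry of published data products."""

    def __init__(self) -> None:
        self._products: Dict[str, DataProduct] = {}

    def publish(self, product: DataProduct) -> None:
        # Product names must be unique so consumers can refer to them unambiguously.
        if product.name in self._products:
            raise ValueError(f"data product {product.name!r} is already published")
        self._products[product.name] = product

    def browse(self, domain: Optional[str] = None) -> List[DataProduct]:
        """List published products sorted by name, optionally filtered by domain."""
        matches = (p for p in self._products.values() if domain is None or p.domain == domain)
        return sorted(matches, key=lambda p: p.name)

registry = ProductRegistry()
registry.publish(DataProduct(
    "orders_daily", "sales", "sales-data-team", "Daily order totals",
    "SELECT order_date, SUM(amount) FROM orders GROUP BY order_date"))
registry.publish(DataProduct(
    "campaign_reach", "marketing", "marketing-analytics", "Customers reached per campaign",
    "SELECT campaign_id, COUNT(DISTINCT customer_id) FROM touches GROUP BY campaign_id"))
print([p.name for p in registry.browse()])         # ['campaign_reach', 'orders_daily']
print([p.name for p in registry.browse("sales")])  # ['orders_daily']
```

Keeping the owner and domain on every record reflects the domain-oriented ownership principle: each product has an accountable team that consumers can identify while browsing.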
Self-Service Data Infrastructure
Starburst’s SQL query interface allows users to discover, understand, and evaluate the trustworthiness of data products. Figure 3 shows the SQL query interface being used to query an Amazon S3-based data product.
The SQL query interface also lets you join data products backed by different technologies. For example, Figure 4 shows a PostgreSQL-based data product joined with an Amazon S3-based data product on a common field. The result of this join can be treated as a new, derived data product, which can also be registered in Starburst.
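The join-then-register pattern can be tried locally with the same sqlite3 analogy. In Starburst the two inputs would be, for example, a PostgreSQL catalog and an Amazon S3-backed catalog. Here two attached SQLite databases named `pg` and `s3` stand in for them, and a view stands in for registering the derived product. The names, columns, and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Stand-ins for two independently stored sources (hypothetical names).
conn.execute("ATTACH DATABASE ':memory:' AS pg")
conn.execute("ATTACH DATABASE ':memory:' AS s3")

conn.execute("CREATE TABLE pg.customers (customer_id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO pg.customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])
conn.execute("CREATE TABLE s3.orders (order_id INTEGER, customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO s3.orders VALUES (?, ?, ?)",
    [(10, 1, 120.0), (11, 2, 75.5), (12, 1, 30.0)],
)
conn.commit()

# The derived data product: a join on the common field (customer_id), saved as a
# view so consumers can query it by name. SQLite only allows a TEMP view to
# reference tables that live in other attached databases.
conn.execute("""
    CREATE TEMP VIEW customer_revenue AS
    SELECT c.name AS customer, SUM(o.amount) AS revenue
    FROM pg.customers AS c
    JOIN s3.orders AS o ON o.customer_id = c.customer_id
    GROUP BY c.name
""")

rows = conn.execute(
    "SELECT customer, revenue FROM customer_revenue ORDER BY customer"
).fetchall()
print(rows)  # [('Acme', 150.0), ('Globex', 75.5)]
```

Because the view stores only the query, the derived product reads the underlying sources each time it is queried instead of holding its own copy of the joined data.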
Federated Computational Governance
Data mesh proposes a federated model of data governance built on shared responsibility between the domains and the central IT organization. The aim is to address governance, risk, and compliance concerns while still giving the domains adequate autonomy.
Starburst provides connectors to data governance and data catalog tools such as Collibra and Alation, which help users discover, understand, and evaluate the trustworthiness of data products.
Starburst also reduces the need to copy data between systems, because its query engine can read across data sources and can therefore reduce or replace traditional ETL/ELT pipelines. Every copy of the data requires its entitlements to be reapplied, and each reapplication is an opportunity for a mistake that could lead to a data breach. With Starburst, data is mostly queried at the source, so fewer copies exist and that risk is reduced. This concept, known as data minimization, makes data privacy, security, and governance more achievable for organizations that combine Starburst with a data mesh.
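One way to picture the federated split is a hypothetical sketch in plain Python. A central guardrail is defined once and applied to every product automatically. Each domain decides who may read its own products. This does not model any Starburst, Collibra, or Alation feature; the PII column tags, group names, and function names are assumptions.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, Set

# Central (global) policy, defined once: columns tagged as PII are masked in
# every data product, whichever domain owns it. Hypothetical tag set.
GLOBAL_PII_COLUMNS: FrozenSet[str] = frozenset({"email", "phone"})

@dataclass
class DomainPolicy:
    """Access rule that a domain sets for one of its own data products (hypothetical)."""
    product: str
    allowed_groups: Set[str]

def read_row(row: Dict[str, str], user_groups: Set[str], policy: DomainPolicy) -> Dict[str, str]:
    # Domain autonomy: the owning domain decides which groups may read its product.
    if not user_groups & policy.allowed_groups:
        raise PermissionError(f"no access to data product {policy.product!r}")
    # Central guardrail, enforced in code on every read: PII values are masked.
    return {col: ("***" if col in GLOBAL_PII_COLUMNS else val) for col, val in row.items()}

policy = DomainPolicy("customers", {"analysts"})
print(read_row({"name": "Acme", "email": "ops@acme.example"}, {"analysts"}, policy))
# {'name': 'Acme', 'email': '***'}
```

The "computational" part is that the global rule is enforced by the platform on every read rather than depending on each domain to remember to apply it.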
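A toy model makes the data minimization argument concrete. If each copy of a dataset carries its own grant list, a user can read the data through any copy. A revocation that is not reapplied to every copy therefore leaves access open. The location and user names below are hypothetical.

```python
from typing import Dict, Set

def effective_readers(copies: Dict[str, Set[str]]) -> Set[str]:
    """Return everyone who can read the data through at least one copy.

    `copies` maps a (hypothetical) storage location to the users granted read access there.
    """
    readers: Set[str] = set()
    for grants in copies.values():
        readers |= grants
    return readers

# Queried at the source: one location, one grant list to maintain.
# Bob's access has just been revoked there.
at_source = {"postgres.customers": {"alice"}}

# Copied by an ETL job: the copy kept the grants it had when it was made,
# and the revocation was never reapplied to it.
with_copies = {
    "postgres.customers": {"alice"},
    "warehouse.customers_copy": {"alice", "bob"},
}

print(sorted(effective_readers(at_source)))    # ['alice']
print(sorted(effective_readers(with_copies)))  # ['alice', 'bob']
```

Each additional copy adds another grant list that must be kept in sync, which is why fewer copies make entitlements easier to get right.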
Sources
https://www.starburst.io/resources/starburst-data-products/
https://blog.starburst.io/data-mesh-and-starburst-domain-oriented-ownership-architecture
https://blog.starburst.io/data-mesh-and-starburst-data-as-a-product
https://blog.starburst.io/data-mesh-starburst-self-service-data-infrastructure
https://blog.starburst.io/data-mesh-federated-computational-governance