Embracing an Open and Connected Data Foundation
Learn more about the importance of an open and connected data foundation, data as a product and Teradata VantageCloud Lake storage options.
Modern cloud platforms have revolutionized the data ecosystem, enabling exponential growth in AI/ML innovation. Servers, storage, networks, and software are readily available, connected, and ready for consumption. However, with great power comes great responsibility, as every CPU cycle, IO operation, and byte of data stored or moved incurs charges. Traditional siloed architecture patterns struggle to contain these costs and provide value for the future.
With this in mind, companies are increasingly recognizing that their existing data architectures are inadequate for future needs. They must rethink how they manage and utilize data. A modern data architecture should be open and connected, allowing for seamless integration and collaboration across multiple data sources.
Re-evaluating their architecture is a challenge for organizations already facing increased complexity, redundancy, and cost. At Teradata, we advocate for a future-ready architecture that emphasizes:
- Transitioning from “data as an asset” to trusted “Data as a Product”
- Cost-effective collaboration through a Connected Data Foundation
This approach ensures enterprises can seamlessly integrate AI into decision-making, operations, customer interactions, and business model transformation.
Data as a product
Organizations are shifting from viewing data merely as an asset to treating it as a trusted product. This involves packaging trusted data for easy access and reuse, so that business domains such as sales and marketing can collaborate on data cost-effectively.
Data products are comprehensive entities that are carefully constructed, provisioned for specific purposes, and accompanied by detailed metadata that outlines semantics, cost, quality, availability, and performance.
- Raw data: Raw data is uncurated information collected from various sources and kept in its original format. Data engineers assess and curate it to ensure security and discoverability.
- Discoverable and secured data: Data products designed for discoverability and security facilitate trusted data utilization for various data science applications, with features for tracking historical changes, ensuring interoperability, and maintaining time series data for comprehensive analysis and integration.
- Reusable and governed data: Reusable and governed data products are essential for data and business analysts, integrating core business concepts and ensuring consistency and efficiency in data utilization across various domains.
- Highly consumable data: Highly consumable data products are specialized packages designed for specific analytical purposes, offering targeted insights for AI and decision-making but with restricted applicability beyond their initial use cases.
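As an illustration only, the metadata that accompanies a data product (semantics, cost, quality, availability, and performance) could be captured in a small descriptor like the sketch below. The class, the field names, and their units are assumptions made for this example; they are not part of any Teradata API.

```python
from dataclasses import dataclass
from enum import Enum


class CurationLevel(Enum):
    """The four categories of data product described above."""
    RAW = "raw"
    DISCOVERABLE_SECURED = "discoverable and secured"
    REUSABLE_GOVERNED = "reusable and governed"
    HIGHLY_CONSUMABLE = "highly consumable"


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor; field names and units are illustrative."""
    name: str
    domain: str                 # owning business domain, e.g. "sales"
    level: CurationLevel
    semantics: str              # plain-language meaning of the data
    monthly_cost_usd: float     # assumed cost measure
    quality_score: float        # assumed scale: 0.0 (worst) to 1.0 (best)
    availability_target: float  # assumed fraction of time available
    p95_latency_ms: int         # assumed performance target

    def __post_init__(self) -> None:
        # Reject metadata that falls outside the assumed scales.
        if not 0.0 <= self.quality_score <= 1.0:
            raise ValueError("quality_score must be between 0.0 and 1.0")
        if not 0.0 < self.availability_target <= 1.0:
            raise ValueError("availability_target must be in (0.0, 1.0]")


def products_for_domain(catalog, domain):
    """Return the names of all products owned by the given domain."""
    return [p.name for p in catalog if p.domain == domain]


# Made-up catalog entries, purely for demonstration.
catalog = [
    DataProduct("orders_raw", "sales", CurationLevel.RAW,
                "Order events as received from source systems",
                120.0, 0.70, 0.99, 5000),
    DataProduct("campaign_response", "marketing",
                CurationLevel.HIGHLY_CONSUMABLE,
                "Response per campaign, prepared for one scoring model",
                40.0, 0.95, 0.999, 200),
]
print(products_for_domain(catalog, "sales"))
```

Keeping the metadata alongside the product, rather than in a separate document, is what lets consumers in other domains judge whether a product is fit for their purpose before they use it.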
The connected data foundation
A connected data foundation optimizes performance for various user groups, ensuring that data is accessible and usable across the organization. Open and connected data platforms, like Teradata VantageCloud Lake, enable the establishment of a single logical Connected Data Store. This store connects data products to any required services within the AI/ML Ecosystem. Business domains can introduce their own services using standard APIs, ensuring that the data products they produce are secure and cost-effective.
Core principles of the connected data foundation
- Optimization of workloads and personas: The foundation optimizes workloads for different user groups, such as data engineers and data scientists. It ensures that each group’s needs are met from a price-performance perspective, aligning with their specific service level agreements (SLAs).
- Default placement in object store and open table format: Data products and raw data are placed in the object store in an open table format by default, providing maximum flexibility and access. This approach doesn’t require immediate migration of existing data; instead, it emphasizes setting up new environments in object storage for better information sharing and engine operations.
- Ownership and management by domain owners: Data products are managed by domain owners who understand the data and possess the skills to handle it effectively. They ensure data is updated within a logical connected data store, using open and optimized formats based on performance and data temperature. Data temperature, which indicates how frequently data is accessed, influences its placement and execution.
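To make the idea of data temperature concrete, the sketch below classifies a data product by how often it is read. The three labels and the threshold values are assumptions chosen for illustration; the article does not define specific temperature bands, and a real deployment would tune them to its own access patterns.

```python
from enum import Enum


class Temperature(Enum):
    HOT = "hot"
    WARM = "warm"
    COLD = "cold"


def classify_temperature(reads_per_day: float,
                         hot_threshold: float = 100.0,
                         warm_threshold: float = 1.0) -> Temperature:
    """Map an access frequency to a temperature label.

    The default thresholds are hypothetical, not product settings.
    """
    if reads_per_day < 0:
        raise ValueError("reads_per_day cannot be negative")
    if warm_threshold > hot_threshold:
        raise ValueError("warm_threshold cannot exceed hot_threshold")
    if reads_per_day >= hot_threshold:
        return Temperature.HOT
    if reads_per_day >= warm_threshold:
        return Temperature.WARM
    return Temperature.COLD


# Made-up access counts for three example data products.
for name, reads in [("orders_today", 2500), ("orders_last_quarter", 12),
                    ("orders_2015", 0.1)]:
    print(name, classify_temperature(reads).value)
```

A domain owner could use a label like this as one input when deciding whether a product stays in its default open format or moves to an optimized storage layer.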
The Connected Data Foundation is structured into various layers or zones, each providing different levels of data curation, reuse, and access. These layers are deployed onto optimized and open storage formats, including open table and open file formats, offering significant flexibility in architectural design.
Data product placement considerations
Data product placement varies based on access requirements, performance needs, and concurrency demands. Teradata has embraced an open and connected ecosystem that provides access to multiple data sources and allows for seamless integration and collaboration across various data products.
Teradata advocates object storage in an open table format as the default placement for data products while maintaining the ability to use different storage layers. This approach offers flexibility in performance and access across all users and technologies in the AI Data ecosystem. Optimized formats or storage layers are available to address the most critical performance or concurrency requirements.
VantageCloud Lake storage options
VantageCloud Lake addresses the full range of enterprise preferences when it comes to storage options. Each storage type provides a different price-performance profile, enabling alignment of service level agreements and budget. Some enterprises opt for a hybrid approach, blending open and optimized formats to balance flexibility and performance. Others commit fully to open formats, gradually transitioning to open table formats for their advanced features.
- Open File Formats: These formats, including Parquet, Avro, and CSV, are industry standards typically stored in object stores. They offer flexibility and interoperability, making them a popular choice for many organizations. The open nature of these formats allows for easy integration with various tools and platforms, fostering a versatile data environment.
- Open Table Formats: Building on open file formats, Open Table formats like Iceberg and Delta Lake introduce ACID compliance, ensuring data integrity and consistency. These formats are gaining traction as they combine the benefits of open file formats with robust transactional capabilities, making them suitable for complex data operations.
- Object File System: Teradata’s optimized object file system is designed for performance, leveraging a default columnar environment within an object storage system. It is tailored for high-speed data retrieval and efficient storage management, catering to workloads that demand quick access and processing.
- Block File System: Known for its high performance, Teradata’s block storage utilizes SSDs to deliver rapid data retrieval with indexing capabilities. This layer is ideal for high-performance transactional and operational intelligence workloads, where speed and efficiency are paramount.
Teradata’s ability to integrate these diverse storage layers allows seamless access to optimized and external data storage solutions. This flexibility is crucial as technology evolves, ensuring that customers can adapt and optimize their data strategies effectively.
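The placement guidance above can be summarized as a simple decision rule: start from an open table format in object storage, and move to an optimized layer only when performance or workload type demands it. The sketch below is one hedged reading of that guidance. The `Requirements` fields and the order of the checks are assumptions made for illustration; this is not a Teradata API or an official sizing method.

```python
from dataclasses import dataclass
from enum import Enum


class StorageOption(Enum):
    OPEN_FILE_FORMAT = "open file format"      # e.g. Parquet, Avro, CSV
    OPEN_TABLE_FORMAT = "open table format"    # e.g. Iceberg, Delta Lake
    OBJECT_FILE_SYSTEM = "object file system"  # optimized, columnar
    BLOCK_FILE_SYSTEM = "block file system"    # SSD-backed, indexed


@dataclass(frozen=True)
class Requirements:
    """Hypothetical description of what a data product needs."""
    needs_acid: bool = True             # transactional consistency
    performance_critical: bool = False  # fast retrieval and processing
    operational_workload: bool = False  # transactional or operational


def suggest_storage(req: Requirements) -> StorageOption:
    """Suggest a storage option; the most demanding need wins."""
    if req.operational_workload:
        return StorageOption.BLOCK_FILE_SYSTEM
    if req.performance_critical:
        return StorageOption.OBJECT_FILE_SYSTEM
    if req.needs_acid:
        return StorageOption.OPEN_TABLE_FORMAT
    return StorageOption.OPEN_FILE_FORMAT


# With no special requirements, the default is an open table format.
print(suggest_storage(Requirements()).value)
print(suggest_storage(Requirements(performance_critical=True)).value)
```

Checking the most demanding requirement first keeps the open table format as the default outcome, which mirrors the recommendation to treat optimized layers as exceptions reserved for critical performance or concurrency needs.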
Conclusion
Embracing an open and connected data foundation is essential for modern organizations to stay competitive in today's data-driven world. By treating data as a product and leveraging a connected data platform like Teradata VantageCloud Lake, businesses can ensure seamless integration, collaboration, and optimized performance across various user groups. This approach not only enhances data accessibility and usability but also provides the flexibility needed to adapt to evolving technological landscapes. As companies continue to innovate and grow, a robust and connected data foundation will be the cornerstone of their success.