
Embracing an Open and Connected Data Foundation

Learn more about the importance of an open and connected data foundation, data as a product and Teradata VantageCloud Lake storage options.

September 26, 2024 | 5 min read

Today’s modern cloud platforms have revolutionized the data ecosystem, enabling exponential growth in AI/ML innovation. Servers, storage, networks, and software are readily available, connected, and ready for consumption. However, with great power comes great responsibility, as every CPU cycle, IO operation, and byte of data stored or moved incurs charges. Traditional siloed architecture patterns struggle to contain these costs and provide value for the future.

With this in mind, companies are increasingly recognizing that their existing data architectures are inadequate for future needs. They must rethink how they manage and utilize data. A modern data architecture should be open and connected, allowing for seamless integration and collaboration across multiple data sources.

Re-evaluating an architecture is challenging for organizations because of the increased complexity, redundancy, and cost involved. At Teradata, we advocate for a future-ready architecture that emphasizes:

  • Transitioning from “data as an asset” to trusted “Data as a Product” 
  • Cost-effective collaboration through a Connected Data Foundation 

This approach ensures enterprises can seamlessly integrate AI into decision-making, operations, customer interactions, and business model transformation.

Data as a product 

Organizations are shifting from viewing data merely as an asset to treating it as a trusted product. This involves packaging trusted data for easy access and reuse for cost-effective data collaboration across different business domains such as sales or marketing.

Data products are comprehensive entities that are carefully constructed, provisioned for specific purposes, and accompanied by detailed metadata that outlines semantics, cost, quality, availability, and performance. They range across four levels of curation:

  • Raw data: Uncurated information collected from various sources and kept in its original format. Data engineers assess and curate it to ensure security and discoverability.
  • Discoverable and secured data: Data products designed for discoverability and security, which facilitate trusted data use across data science applications. They track historical changes, ensure interoperability, and maintain time series data for comprehensive analysis and integration.
  • Reusable and governed data: Data products for data and business analysts that integrate core business concepts and ensure consistent, efficient data use across domains.
  • Highly consumable data: Specialized packages designed for specific analytical purposes. They offer targeted insights for AI and decision-making but have limited applicability beyond their initial use cases.
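To make the idea of a data product more concrete, the sketch below models one as a dataset descriptor that carries the metadata listed above (semantics, cost, quality, availability, and performance) plus its curation level. This is only an illustration: the class names, field names, units, and the completeness check are assumptions made for this example, not a Teradata schema or API.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class CurationLevel(Enum):
    """The four curation levels described above (identifiers are illustrative)."""
    RAW = "raw"
    DISCOVERABLE_SECURED = "discoverable_and_secured"
    REUSABLE_GOVERNED = "reusable_and_governed"
    HIGHLY_CONSUMABLE = "highly_consumable"


@dataclass
class DataProduct:
    """Hypothetical descriptor: a dataset plus the metadata that makes it a product."""
    name: str
    domain: str               # owning business domain, e.g. "sales"
    level: CurationLevel
    semantics: str            # plain-language meaning of the data
    monthly_cost_usd: float   # cost; the unit is an assumption
    quality_score: float      # 0.0 to 1.0; the scale is an assumption
    availability_sla: str     # e.g. "99.9%"
    performance_sla: str      # e.g. "p95 under 2s"

    def missing_metadata(self) -> List[str]:
        """Return the names of text metadata fields that were left blank."""
        required = ("semantics", "availability_sla", "performance_sla")
        return [name for name in required if not getattr(self, name).strip()]


product = DataProduct(
    name="customer_orders",
    domain="sales",
    level=CurationLevel.REUSABLE_GOVERNED,
    semantics="One row per confirmed customer order",
    monthly_cost_usd=120.0,
    quality_score=0.97,
    availability_sla="99.9%",
    performance_sla="p95 under 2s",
)
print(product.missing_metadata())  # → []
```

A check like `missing_metadata` is one simple way a domain team could refuse to publish a product whose descriptive metadata is incomplete.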

The connected data foundation 

A connected data foundation optimizes performance for various user groups, ensuring that data is accessible and usable across the organization. Open and connected data platforms, like Teradata VantageCloud Lake, enable the establishment of a single logical Connected Data Store. This store connects data products to any required services within the AI/ML ecosystem. Business domains can introduce their own services using standard APIs, ensuring that the resulting data products are secure and cost-effective.

Core principles of the connected data foundation 

  1. Optimization of workloads and personas: The foundation optimizes workloads for different user groups, such as data engineers and data scientists. It ensures that each group’s needs are met from a price performance perspective, aligning with their specific service level agreements (SLAs). 
  2. Default placement in object store and open table format: Data products and raw data are placed in the object store, in an open table format, by default, providing maximum flexibility and access. This approach doesn’t require immediate migration of existing data. Instead, it emphasizes setting up new environments in object storage for better information sharing and engine operations. 
  3. Ownership and management by domain owners: Data products are managed by domain owners who understand the data and have the skills to handle it effectively. They ensure data is updated within a logical connected data store, using open and optimized formats based on performance and data temperature. Data temperature, indicating how frequently data is accessed, influences its placement and execution. 
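Data temperature, mentioned in the third principle, can be illustrated with a small sketch that classifies a dataset as hot, warm, or cold from its recent access history. The function name, the 30-day window, and the thresholds are all assumptions chosen for this example. They are not values used by VantageCloud Lake.

```python
from datetime import datetime, timedelta
from typing import Iterable


def classify_temperature(
    access_times: Iterable[datetime],
    now: datetime,
    window_days: int = 30,        # look-back window (assumed)
    hot_per_day: float = 10.0,    # accesses per day at or above this are "hot" (assumed)
    warm_per_day: float = 1.0,    # accesses per day at or above this are "warm" (assumed)
) -> str:
    """Classify a dataset by its average daily accesses over the look-back window."""
    window_start = now - timedelta(days=window_days)
    recent = [t for t in access_times if window_start <= t <= now]
    per_day = len(recent) / window_days
    if per_day >= hot_per_day:
        return "hot"
    if per_day >= warm_per_day:
        return "warm"
    return "cold"


now = datetime(2024, 9, 26)
# One access every two hours for the last 30 days: 360 accesses, 12 per day.
frequent = [now - timedelta(hours=h) for h in range(0, 720, 2)]
print(classify_temperature(frequent, now))  # → hot
```

In practice a domain owner would feed a classification like this into the placement decision, keeping frequently accessed products on faster storage and leaving colder ones in the default open format.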

The Connected Data Foundation is structured into various layers or zones, each providing different levels of data curation, reuse, and access. These layers are deployed onto optimized and open storage formats, including open table and open file formats, offering significant flexibility in architectural design. 

Data product placement considerations

Data product placement varies based on access requirements, performance needs, and concurrency demands. Teradata has embraced an open and connected ecosystem, providing data access to multiple data sources, allowing for seamless integration and collaboration across various data products. 

Teradata advocates object storage in an open table format as the default placement for data products while maintaining the ability to use different storage layers. This approach offers flexibility in performance and access across all users and technologies in the AI Data ecosystem. Optimized formats or storage layers are available to address the most critical performance or concurrency requirements. 

VantageCloud Lake storage options 

VantageCloud Lake addresses the full range of enterprise preferences when it comes to storage options. Each storage type provides a different price-performance profile, enabling alignment with service level agreements and budget. Some enterprises opt for a hybrid approach, blending open and optimized formats to balance flexibility and performance. Others commit fully to open formats, gradually transitioning to open table formats for their advanced features.

  • Open File Formats: These formats, including Parquet, Avro, and CSV, are industry standards typically stored in object stores. They offer flexibility and interoperability, making them a popular choice for many organizations. The open nature of these formats allows for easy integration with various tools and platforms, fostering a versatile data environment. 
  • Open Table Formats: Building on open file formats, open table formats like Iceberg and Delta Lake introduce ACID transactions, ensuring data integrity and consistency. These formats are gaining traction as they combine the benefits of open file formats with robust transactional capabilities, making them suitable for complex data operations. 
  • Object File System: Teradata’s optimized object file system is designed for performance, leveraging a default columnar environment within an object storage system. It is tailored for high-speed data retrieval and efficient storage management, catering to workloads that demand quick access and processing. 
  • Block File System: Known for its high performance, Teradata’s block storage utilizes SSDs to deliver rapid data retrieval with indexing capabilities. This layer is ideal for high-performance transactional and operational intelligence workloads, where speed and efficiency are paramount. 

Teradata’s ability to integrate these diverse storage layers allows seamless access to optimized and external data storage solutions. This flexibility is crucial as technology evolves, ensuring that customers can adapt and optimize their data strategies effectively.
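The placement guidance above (an open table format by default, with optimized layers reserved for the most demanding workloads) can be sketched as a simple decision function. This is a deliberately simplified illustration: the `Workload` fields, the order of the checks, and the function name are assumptions for this example, and real placement decisions would weigh cost, concurrency, and SLAs in more detail.

```python
from dataclasses import dataclass
from enum import Enum


class StorageOption(Enum):
    """The four storage options described above."""
    OPEN_FILE_FORMAT = "open file format (e.g. Parquet, Avro, CSV)"
    OPEN_TABLE_FORMAT = "open table format (e.g. Iceberg, Delta Lake)"
    OBJECT_FILE_SYSTEM = "Teradata object file system"
    BLOCK_FILE_SYSTEM = "Teradata block file system"


@dataclass(frozen=True)
class Workload:
    """Hypothetical description of what a data product's consumers need."""
    needs_acid: bool = True          # transactional integrity on shared data
    latency_critical: bool = False   # demands quick access and processing
    operational: bool = False        # transactional or operational intelligence


def choose_storage(workload: Workload) -> StorageOption:
    """Pick a storage option; the rule order is an assumption, not vendor guidance."""
    if workload.operational:
        return StorageOption.BLOCK_FILE_SYSTEM
    if workload.latency_critical:
        return StorageOption.OBJECT_FILE_SYSTEM
    if not workload.needs_acid:
        return StorageOption.OPEN_FILE_FORMAT
    # Default placement: object storage in an open table format.
    return StorageOption.OPEN_TABLE_FORMAT


print(choose_storage(Workload()).value)
print(choose_storage(Workload(operational=True)).value)
```

Starting from the open default and moving a product to an optimized layer only when a workload requires it keeps data shareable while still meeting the strictest performance needs.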

Conclusion 

Embracing an open and connected data foundation is essential for modern organizations to stay competitive in today's data-driven world. By treating data as a product and leveraging a connected data platform like Teradata VantageCloud Lake, businesses can ensure seamless integration, collaboration, and optimized performance across various user groups. This approach not only enhances data accessibility and usability but also provides the flexibility needed to adapt to evolving technological landscapes. As companies continue to innovate and grow, a robust and connected data foundation will be the cornerstone of their success. 

 


About Barry Silvester

Barry Silvester is the Senior Manager of Technical Product Marketing and a subject matter expert for VantageCloud Lake and AI Unlimited. With a rich background in Product Management, Barry previously led both software and hardware programs focused on business continuity, security, and Linux operating systems.


About Nathan Green

Nathan is the Worldwide Data Architecture Leader for Teradata. He is focused on providing trusted advice to customers on how to effectively build a diverse, sustainable, scalable and connected ecosystem for analytics, given the rapid change and evolution in tools and technologies within the analytics landscape. 

He believes passionately that using analytics effectively can and will transform an enterprise. Teradata’s software, experience, IP, tools and people are the best in the market and, coupled with partner tools, platforms and technologies, can enable new and innovative analytic capabilities. 

