Embracing an Open and Connected Data Foundation

Modern data architectures should allow for seamless integration and collaboration across sources.

February 6, 2025 | 5 min read

Today’s cloud platforms have revolutionized the data ecosystem, enabling rapid growth in AI/ML innovation. Compute resources, storage, networks, and software are readily available, connected, and ready for consumption.

However, every unit of compute, input/output access, or byte moved between cloud service providers (CSPs), cloud regions, and even on-premises locations incurs cost. Traditional siloed architecture design patterns struggle to contain these costs and provide future value.

Companies must rethink how they manage and utilize data. A modern data architecture should be open and connected, allowing for seamless integration and collaboration across multiple data sources.

At Teradata, we advocate for a future-ready architecture that emphasizes:

  • Transitioning from “data as an asset” to trusted “data as a product”
  • Collaborating cost-effectively through a connected data foundation

This approach ensures enterprises can seamlessly integrate AI into decision-making, operations, customer interactions, and business model transformation.

Before we dive into the power of connected data foundations, let’s gain a better understanding of data foundations and data as a product.

What is a data foundation?

A data foundation encompasses the infrastructure, processes, and strategies that support the effective collection, management, storage, organization, and utilization of data within an organization. With a robust data foundation, an organization can derive valuable insights, make informed decisions, and support immediate operational needs and long-term strategic goals.

Benefits and advantages of data foundations

A robust data foundation offers numerous benefits and advantages for organizations:

  • Improved decision-making with accurate, timely, and integrated data 
  • Enhanced data quality that reduces errors 
  • Increased efficiency that streamlines data management 
  • A unified view of data across the organization, fostering consistency and trust 
  • Improved data governance, compliance with regulatory requirements, and data security 
  • Greater scalability and flexibility that support data integration
  • Enhanced collaboration and communication across the business

These benefits empower organizations to use data more effectively, driving innovation and competitive advantage.

Challenges of building a data foundation 

Building a robust data foundation poses several challenges:

  • Data quality: Cleaning data to ensure its accuracy and completeness 
  • Data integration: Merging data from various sources 
  • Data governance: Establishing effective data management policies 
  • Scalability: Ensuring that a data foundation can handle increasing data volumes 
  • Cost: Investing in the time and resources needed 
  • Training: Onboarding staff to new processes 

What is data as a product?

Data as a product is a strategic approach that views data as a trusted asset, rather than a cost center. This involves packaging trusted data for easy access and reuse, enabling cost-effective data collaboration across an organization.

Data products are comprehensive entities: they’re carefully constructed, provisioned for specific purposes, and accompanied by detailed metadata outlining semantics, cost, quality, availability, and performance. They span several levels of curation:

  • Raw data is non-curated information collected from various sources and maintained in its original format. Data engineers assess and curate it to ensure security and discoverability
  • Discoverable and secured data facilitates trusted data utilization for data science applications, with features for tracking historical changes, ensuring interoperability, and maintaining time series data for comprehensive analysis and integration
  • Reusable and governed data is essential for data and business analysts, integrating core business concepts and ensuring consistency and efficiency in data utilization across domains
  • Highly consumable data is a specialized package designed for specific analytical purposes, offering targeted insights for AI and decision-making, but with restricted applicability beyond its initial use cases
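
To make the idea of a data product carrying its own metadata more concrete, here is a minimal Python sketch. It is an illustration only: the class, the field names, the units, and the sample values are all hypothetical and are not part of any Teradata API. It simply pairs the four curation levels listed above with the metadata categories mentioned earlier (semantics, cost, quality, availability, and performance).

```python
from dataclasses import dataclass
from enum import Enum


class CurationLevel(Enum):
    """The four levels of data product curation listed above."""
    RAW = "raw"
    DISCOVERABLE_SECURED = "discoverable and secured"
    REUSABLE_GOVERNED = "reusable and governed"
    HIGHLY_CONSUMABLE = "highly consumable"


# Primary audience for each level, as described in the list above.
PRIMARY_AUDIENCE = {
    CurationLevel.RAW: "data engineers",
    CurationLevel.DISCOVERABLE_SECURED: "data scientists",
    CurationLevel.REUSABLE_GOVERNED: "data and business analysts",
    CurationLevel.HIGHLY_CONSUMABLE: "AI and decision-making applications",
}


@dataclass(frozen=True)
class DataProduct:
    """Hypothetical descriptor; every field name here is an assumption."""
    name: str
    owner_domain: str
    level: CurationLevel
    semantics: str            # what the data means in business terms
    monthly_cost: float       # cost metadata, in placeholder units
    quality_score: float      # 0.0 to 1.0, higher is better
    availability_pct: float   # availability target, e.g. 99.9
    p95_latency_ms: float     # performance metadata

    def audience(self) -> str:
        """Return the primary audience for this product's curation level."""
        return PRIMARY_AUDIENCE[self.level]


# Sample values are made up purely for illustration.
orders = DataProduct(
    name="customer_orders",
    owner_domain="sales",
    level=CurationLevel.REUSABLE_GOVERNED,
    semantics="One row per confirmed customer order",
    monthly_cost=120.0,
    quality_score=0.97,
    availability_pct=99.9,
    p95_latency_ms=250.0,
)
print(f"{orders.name} ({orders.level.value}) serves {orders.audience()}")
```

Keeping the descriptor immutable (`frozen=True`) is a design choice in this sketch: consumers can rely on the published metadata not changing underneath them.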

What is a connected data foundation?

A connected data foundation optimizes performance for various user groups, ensuring that data is accessible and usable across the organization. Open and connected data platforms, such as Teradata VantageCloud Lake, make it possible to establish a single, logical connected data store that connects data products to the services they require within the AI/ML ecosystem. Business domains can introduce their own services using standard application programming interfaces (APIs), ensuring that products are secure and cost-effective.

Core principles of connected data foundations 

A connected data foundation optimizes workloads for different personas, such as data engineers and data scientists. It ensures that each group’s needs are met from a price-to-performance perspective, aligning with specific service-level agreements (SLAs).

Data products and raw data are placed in object storage in an open table format by default, providing maximum flexibility and access. This approach doesn’t require immediate migration of existing data. Instead, it emphasizes setting up new environments in object storage for better information sharing and engine operations.

Data products are managed by domain owners who understand and can handle the data effectively. They ensure data is updated within a logical connected data store, using open and optimized formats based on performance and data temperature. Data temperature, indicating how frequently data is accessed, influences its placement and execution.
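
As a rough illustration of how data temperature might feed into a placement decision, consider the Python sketch below. The thresholds, the function names, and the rule that only hot data moves to an optimized format are assumptions made for this example; a real deployment would derive them from its own SLAs and access statistics.

```python
# Hypothetical thresholds, in accesses per day; tune these to real workloads.
HOT_THRESHOLD = 100
WARM_THRESHOLD = 1


def classify_temperature(accesses_per_day: float) -> str:
    """Bucket a dataset as hot, warm, or cold by how often it is accessed."""
    if accesses_per_day < 0:
        raise ValueError("accesses_per_day must be non-negative")
    if accesses_per_day >= HOT_THRESHOLD:
        return "hot"
    if accesses_per_day >= WARM_THRESHOLD:
        return "warm"
    return "cold"


def suggest_format(temperature: str) -> str:
    """Open format by default; an optimized format only for hot data."""
    return "optimized" if temperature == "hot" else "open"


for name, rate in [("clickstream", 500), ("monthly_report", 5), ("archive", 0.1)]:
    temperature = classify_temperature(rate)
    print(f"{name}: {temperature} -> {suggest_format(temperature)} format")
```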

The connected data foundation is structured into various layers or zones, each providing different levels of data curation, reuse, and access. These layers are deployed to optimized and open storage formats, including open table and open file formats, offering significant flexibility in architectural design. 

Data product placement considerations

Data product placement varies based on access requirements, performance needs, and concurrency demands. Teradata has embraced an open and connected ecosystem that provides access to multiple data sources, allowing for seamless integration and collaboration across data products.

Teradata advocates object storage in an open table format as the default placement for data products while maintaining the ability to use different storage layers. This approach offers flexibility in performance and access across all users and technologies in the AI data ecosystem. Optimized formats or storage layers are available to address the most critical performance or concurrency requirements. 

VantageCloud Lake storage options

VantageCloud Lake addresses the full range of enterprise storage preferences: 

  • Block storage: Known for its high performance, block storage utilizes solid state drives (SSDs) to deliver rapid data retrieval with indexing capabilities. This layer is ideal for high-performance transactional and operational intelligence workloads, in which speed and efficiency are paramount. 
  • Object storage: Teradata’s optimized object file system is designed for performance, leveraging a default columnar environment within an object storage system. This format is tailored for high-speed data retrieval and efficient storage management, catering to workloads that demand quick access and processing.
  • Open file formats: These formats, including Apache Parquet, Apache Avro, and comma-separated values (CSV), are industry standards typically stored in object stores. They offer flexibility and interoperability, making them a popular choice for many organizations. Their open nature allows for easy integration with various tools and platforms, fostering a versatile data environment. 
  • Open table formats: Open table formats like Apache Iceberg and Delta Lake add atomicity, consistency, isolation, and durability (ACID) guarantees, ensuring data integrity and consistency. These formats are gaining traction because they combine the benefits of open file formats with robust transactional capabilities, making them suitable for complex data operations.
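
To give a feel for what transactional capabilities add on top of plain files, here is a toy Python sketch of an atomic, snapshot-based commit with an optimistic concurrency check. It is a deliberately simplified model and not how Apache Iceberg or Delta Lake are implemented; the class and method names are hypothetical, and real formats handle metadata, schema evolution, and durability in far more detail.

```python
from dataclasses import dataclass, field


class CommitConflict(Exception):
    """Raised when another writer committed first."""


@dataclass
class ToyTable:
    # Each snapshot is an immutable tuple of data file names.
    # Version 0 is the empty table.
    snapshots: list = field(default_factory=lambda: [()])

    @property
    def current_version(self) -> int:
        return len(self.snapshots) - 1

    def read(self, version=None) -> tuple:
        """Readers see one complete snapshot, never a partial write."""
        if version is None:
            version = self.current_version
        return self.snapshots[version]

    def commit(self, base_version: int, new_files: list) -> int:
        """Publish new files only if the table is unchanged since base_version."""
        if base_version != self.current_version:
            raise CommitConflict(
                f"table is at version {self.current_version}, not {base_version}"
            )
        # Appending a whole new snapshot is the single "publish" step.
        self.snapshots.append(self.snapshots[base_version] + tuple(new_files))
        return self.current_version


table = ToyTable()
base = table.current_version                  # two writers both start here
table.commit(base, ["part-001.parquet"])      # first writer succeeds
try:
    table.commit(base, ["part-002.parquet"])  # second writer has a stale base
except CommitConflict as err:
    print("conflict:", err)
print(table.read())
```

Because older snapshots are kept, a reader can also ask for an earlier version, for example `table.read(0)`, which hints at how snapshot-based tables can track historical changes.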

Each storage type provides a different price-performance profile, enabling alignment of SLAs and budget. Some enterprises opt for a hybrid approach, blending open and optimized formats to balance flexibility and performance. Others commit fully to open formats, gradually transitioning to open table formats for their advanced features.
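
One way to picture aligning storage choice with SLAs and budget is the Python sketch below, which picks the cheapest option that still meets a latency target and fits a monthly budget. All cost and latency figures are made-up placeholders chosen for illustration; they are not Teradata pricing or benchmark results, and the function and class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StorageProfile:
    name: str
    cost_per_tb_month: float   # placeholder cost units, not real pricing
    typical_latency_ms: float  # placeholder latency, not a benchmark


# Illustrative numbers only; these are assumptions, not vendor figures.
PROFILES = [
    StorageProfile("block storage", 100.0, 10.0),
    StorageProfile("optimized object storage", 40.0, 50.0),
    StorageProfile("open table format", 25.0, 200.0),
    StorageProfile("open file format", 20.0, 400.0),
]


def cheapest_meeting_sla(
    max_latency_ms: float, size_tb: float, monthly_budget: float
) -> Optional[StorageProfile]:
    """Return the lowest-cost profile that satisfies the SLA and the budget."""
    candidates = [
        p for p in PROFILES
        if p.typical_latency_ms <= max_latency_ms
        and p.cost_per_tb_month * size_tb <= monthly_budget
    ]
    # min() with default=None returns None when nothing qualifies.
    return min(candidates, key=lambda p: p.cost_per_tb_month, default=None)


choice = cheapest_meeting_sla(max_latency_ms=60, size_tb=10, monthly_budget=1000)
print(choice.name if choice else "no option meets the SLA within budget")
```

Returning `None` when nothing qualifies makes the trade-off explicit: either the SLA or the budget has to be relaxed.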

Teradata’s ability to integrate these diverse storage layers enables seamless access to optimized and external data storage solutions. This flexibility is crucial as technology evolves, ensuring that organizations can adapt and optimize their data strategies.

Innovate and grow with open and connected data foundations

Embracing an open and connected data foundation is essential for modern organizations to compete and scale. By treating data as a product, managing data as an organizational asset, and leveraging connected data platforms like VantageCloud Lake, businesses can ensure seamless integration, collaboration, and performance. As companies innovate and grow, open and connected data foundations will drive their success.

About Barry Silvester

Barry Silvester is the Senior Manager of Technical Product Marketing and a subject matter expert for VantageCloud Lake and AI Unlimited. With a rich background in Product Management, Barry previously led both software and hardware programs focused on business continuity, security, and Linux operating systems.

About Nathan Green

Nathan is the Worldwide Data Architecture Leader for Teradata. He is focused on providing trusted advice to customers on how to effectively build a diverse, sustainable, scalable and connected ecosystem for analytics, given the rapid change and evolution in tools and technologies within the analytics landscape. 

He believes passionately that using analytics effectively can and will transform an enterprise. Teradata’s software, experience, IP, tools, and people are the best in the market and, coupled with partner tools, platforms, and technologies, can enable new and innovative analytic capabilities.
