Friday, 27 September 2024

Understanding the Contextual Awareness of data

    Image attribution: Thomas Nordwest, CC BY-SA 4.0, via Wikimedia Commons

  • What sources does this data come from?
  • How was this data collected in the first place?
  • Do you clearly understand the structure and format of your data?
  • What specific business objectives does this data help you achieve?
  • Do you know that there are relationships between this dataset and others, if you do which?
  • What limitations or biases there may exist in this data?
  • How frequently is this data updated or refreshed, is it stale?
  • What context is important to consider when trying to understand this data?
  • What questions are you trying to answer with this data?
  •  
    The above questions can help us achieve data zen, or the contextual awareness we all seek.

    Contextual Awareness of data refers to the ability to understand and interpret data within its relevant situation, including the source, structure, and intended use. It is the 'world of interest' it tries to describe. It goes beyond merely collecting data; it encompasses a comprehensive understanding of what the data represents, where it originates, and how it relates to other datasets. This awareness is crucial for organisations to derive meaningful insights, as data without context can lead to misinterpretation and ineffective decision-making.

    The importance of contextual awareness cannot be overstated. Without a clear understanding of the data’s context, organisations risk making decisions based on incomplete or misleading information. For instance, data collected from disparate sources may appear accurate in isolation but may contradict one another when analysed together. By fostering contextual awareness, organisations can ensure they ask the right questions and identify relevant insights, ultimately driving strategic initiatives and operational efficiency.

    Furthermore, contextual awareness enhances data quality assessment and improves the overall data governance framework. When organisations know the context of their data, when they know the semantics or the meaning of their data, they can better assess its reliability and relevance. This clarity allows for more informed decision-making processes, ensuring that data insights are actionable and aligned with business objectives. In today's data-driven landscape, cultivating contextual awareness is not just useful—it is essential for achieving the competitive edge and fostering a data-driven culture within the organisations.

    Several tools like conceptual data models, data profiling, talking to colleagues, data catalogs and ontologies can all help you in the journey of understanding your data.

    So ask loads of questions the next time you see that data set!


    Tuesday, 9 January 2024

    Unraveling Data Architecture: Data Fabric vs. Data Mesh

    In this first post of this year, I would like to talk about modern data architectures and specifically about two prominent models, Data Fabric and Data Mesh. Both are potential solutions for organisations working with complex data and database systems.
    While both try to make our data lives better and try to bring us the abstraction we need when working with data, they are different in their approach.
    But when would be best to use one over the other? Let's try to understand that.
    Definitions Data Fabric and Data Mesh
    Some definitions first. I come up with these after reading up on the internet and various books on the subject.
    Data Fabric is a centralised data architecture focusing on creating a unified data environment. In it's environment data fabric integrates different data sources and provides seamless data access, governance and management and often it does this via tools and automation. Keywords to remember from the Data Fabric definition are centralised, unified, seamless data access.

    Data Mesh, is a paradigm shift, it is a completely different way of doing things. Is based on domains, most likely inspired by the Domain Driven Design (DDD) example, is capturing data products in domains by decentralising data ownership (autonomy) and access. That is, the domains are the owners of the data and are responsible themselves for creating and maintaining their own data products. The responsibility is distributed. Key words to take away from Data Mesh are decentralised, domain ownership, autonomy and data products.

    Criteria for Choosing between Data Fabric and Data Mesh

    Data Fabric might be preferred when there is a need for centralised governance and control over the data entities. When a unified view is needed across all database systems and data sources in the organisation or departments. Transactional database workloads are very suitable for this type of data architecture where consistency and integrity in their data operations is paramount.

    Data Mesh can be more suitable for organisational cultures or departments where scalability and agility is a priority. Because of its domain-driven design, Data Mesh might be a better fit for organisations or departments that are decentralised, innovative, and require their business units to swiftly and independently decide how to handle their data. Analytical workloads and other big data workloads may be more suitable to Data Mesh data architectures.

    Ultimately, the decision-making process between these data architectures hinges on the load of data processing and the alignment of diverse data sources. There's no universal solution applicable to all scenarios or one size fits all. Organisations and departments in organisations operate within unique cultural and environmental contexts, often necessitating thorough research, proof of concept, and pattern evaluation to identify the optimal architectural fit.

    Remember, in the realm of data architecture, the data workload reigns supreme - it dictates the design.