Glossary

Editor's Note

Each glossary term should be defined in a short paragraph if possible. Ideally, link back to the chapter that first introduces the concept so that it can be explored in context.

Structured Data

See: Structured Data & Visualisation - Introducing a basic dataset

Feature

A feature is an attribute of data that is used in modelling. In general terms, any property of a record can be called an attribute; once that attribute has been transformed or classified so that it can be used in machine learning or model development, it becomes a feature. A database or dataset that restructures or presents data so that its features are exposed is referred to as a feature store. A feature store has a different structure to something like a star schema, which highlights how much the structure or presentation of the same data can vary depending on how we want to use it.
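As a rough illustration of the step from attribute to feature, the sketch below turns two raw attributes into numeric features using only the Python standard library. The records, the field names (`signup_date`, `plan`, `tenure_days`, `is_premium`) and the reference date are all hypothetical, chosen for the example rather than taken from any dataset in this book.

```python
from datetime import date

# Hypothetical raw records: each key is an attribute of the record.
records = [
    {"customer_id": 1, "signup_date": "2021-03-15", "plan": "basic"},
    {"customer_id": 2, "signup_date": "2023-11-02", "plan": "premium"},
]

# Assumed "as of" date, fixed so that the example is reproducible.
REFERENCE_DATE = date(2024, 1, 1)


def to_features(record):
    """Derive model-ready features from a record's raw attributes."""
    signup = date.fromisoformat(record["signup_date"])
    return {
        "customer_id": record["customer_id"],
        # A date attribute becomes a numeric feature: days since signup.
        "tenure_days": (REFERENCE_DATE - signup).days,
        # A text attribute becomes a binary feature.
        "is_premium": 1 if record["plan"] == "premium" else 0,
    }


features = [to_features(r) for r in records]
for row in features:
    print(row)
```

The raw attributes are unchanged in `records`; `features` is simply a second presentation of the same data, shaped for modelling.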

Anaconda

Anaconda is a distribution of Python and R specifically designed for data science and machine learning. It includes package management and deployment tools and provides a wide range of pre-installed libraries, including pandas, NumPy, and others. Anaconda simplifies environment management and package installation.

How to learn more: Get Started with Anaconda

Data Science

Data science is a broad term that encompasses the study and application of the mathematics, statistics, computer science and technology relating to data.

Data science can be as simple as basic statistics, such as a 'measure' over data (for example, the sum of sales values in a table), or as complex as the theory and technical implementation of AI systems.

As the field has evolved, it has incorporated more advanced techniques and technologies, including machine learning and artificial intelligence, to analyse large datasets for predictive modelling, pattern recognition, and decision-making.

Data science provides the tools, techniques, and theories required to allow us to extract insights from data - from the simple to the highly complex.
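To make the simple end of that range concrete, here is a minimal sketch of a 'measure' as described above: a summation of sales values, computed overall and then per region. The rows and the `region` and `sales` field names are made up for the example, and plain Python is used rather than any particular data library.

```python
from collections import defaultdict

# Hypothetical sales rows; "sales" holds the values being measured.
sales_rows = [
    {"region": "North", "sales": 120.0},
    {"region": "South", "sales": 80.0},
    {"region": "North", "sales": 50.0},
]

# The measure over the whole table: a single summed value.
total_sales = sum(row["sales"] for row in sales_rows)

# The same measure broken down by region.
sales_by_region = defaultdict(float)
for row in sales_rows:
    sales_by_region[row["region"]] += row["sales"]

print(total_sales)
print(dict(sales_by_region))
```

The same summation gives different answers depending on how the rows are grouped, which is why a measure is usually described together with the grouping it is calculated over.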

TODO

Work in progress - many terms are being defined as I write this content.

  • Measure, e.g. in a fact table. Star schema. Dimension.