Note: Other entities may use the terms ‘Business Glossary’ and ‘Data Glossary’ interchangeably. This is not the same as a ‘Data Dictionary’ (description of a data set or model).
AWS (Amazon Web Services) - A comprehensive cloud computing platform that provides servers, database storage, remote computing, security, and application services. Within AWS there are hundreds of products and other services available to businesses, developers, and analysts.
Source: Amazon Web Services
AWS Terminology - Because AWS offers an expansive set of services and toolkits, practitioners may use several terms and names that are unique to the platform. Definitions of these terms can be found in AWS’ external glossary and in the AWS documentation.
Agile development - productOps uses an Agile development process. Agile consists of biweekly sprints with well-defined tasks that conclude with a demonstration of functional software. Each sprint may contain design, development, testing, operations, and other tasks. productOps provides tools for managing the Agile process that will allow for transparency between us and our clients.
Assets - Any resource owned or controlled by a company that is capable of yielding economic value.
Business equity - The value of assets after deducting all liabilities (i.e. how much a business is worth).
Concentration Ratio - A measurement used to understand the level of competition that is present in an industry or marketplace. This measurement is representative of the size of a company in relation to the industry.
Financial Ratio Analysis - The evaluation and interpretation of a company’s financial data using finance/ accounting ratios. These values may come from balance sheets, income statements, statements of cash flow, statement of retained earnings, etc.
KPI (Key performance indicator)/ KSI (Key Success Indicator) - Metrics identified to measure specific goals for a business. These can be general, or specific to the company/ project. Examples may include revenue, employment statistics, or levels of efficiency.
Market share - The percent of total sales in an industry that can be attributed to a particular company.
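As a rough illustration of the Market share and Concentration Ratio entries, the sketch below computes each company's share of industry sales and then sums the shares of the largest companies. The company names and sales figures are invented, and the n-firm form of the ratio (adding up the top n shares) is one common convention assumed here, not something this glossary specifies.

```python
def market_shares(sales_by_company):
    """Return each company's percent of total industry sales."""
    total = sum(sales_by_company.values())
    return {name: 100.0 * sales / total for name, sales in sales_by_company.items()}


def concentration_ratio(sales_by_company, n=4):
    """Sum the market shares (in percent) of the n largest companies."""
    shares = sorted(market_shares(sales_by_company).values(), reverse=True)
    return sum(shares[:n])


# Hypothetical industry with five companies (sales in any consistent unit).
example_sales = {"A": 400, "B": 300, "C": 150, "D": 100, "E": 50}
example_shares = market_shares(example_sales)
example_cr4 = concentration_ratio(example_sales, n=4)
```

A ratio close to 100 suggests that a few companies account for most sales; a low ratio suggests a more competitive market.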
Net Present Value - The difference between the present value of cash inflows and the present value of cash outflows over a period of time. Because it accounts for the time value of money, it is used to estimate the profitability of projects or investments.
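A minimal sketch of the usual net present value calculation, assuming one cash flow per period and a constant discount rate: each cash flow is divided by (1 + rate) raised to the number of periods until it occurs, and the results are summed. The function name and the project figures are illustrative only.

```python
def net_present_value(rate, cash_flows):
    """Discount each cash flow back to today and sum the results.

    cash_flows[0] occurs now (usually a negative outlay); cash_flows[t]
    occurs t periods from now. rate is the per-period discount rate.
    """
    return sum(cf / (1.0 + rate) ** t for t, cf in enumerate(cash_flows))


# Hypothetical project: pay 1000 today, receive 600 in each of the next two years.
example_npv = net_present_value(0.10, [-1000, 600, 600])
```

A positive result suggests the project returns more than the discount rate requires; a negative result suggests it does not.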
Revenue/ Top-line Growth - Gross sales/ revenue before any expenses are subtracted.
Revenue/ Bottom Line Growth - Net income after all expenses; another name for what is commonly thought of as profit or net earnings.
ROI (Return on Investment) - A measure of profitability: how much capital (money, assets, time, etc.) is gained when compared to an initial investment.
SOW (Statement of Work) - A document provided by productOps to a client, outlining the scope and parameters of a project. This may include key objectives, deliverables, assumptions, cost estimates, and timeline.
WACC (Weighted Average Cost of Capital) - The rate that a company is expected to pay on average to finance its assets. This measurement is dictated by external market needs, not management. WACC can be used to test whether an ROI can meet or exceed the cost of invested capital (equity + debt).
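The comparison between ROI and WACC described above can be sketched as follows. The WACC formula used here (each financing source weighted by its share of equity + debt, with the cost of debt reduced by the tax rate) is the common textbook form and is an assumption, as are the function names and all of the figures.

```python
def roi(final_value, cost):
    """Return on investment as a fraction: net gain relative to the initial cost."""
    return (final_value - cost) / cost


def wacc(equity, debt, cost_of_equity, cost_of_debt, tax_rate=0.0):
    """Weighted average cost of capital.

    Weights each financing source by its share of total capital (equity + debt);
    the cost of debt is reduced by the tax rate, a common textbook convention.
    """
    total = equity + debt
    equity_part = (equity / total) * cost_of_equity
    debt_part = (debt / total) * cost_of_debt * (1.0 - tax_rate)
    return equity_part + debt_part


# Hypothetical figures: 600 of equity at 10%, 400 of debt at 5%, 25% tax rate.
example_wacc = wacc(600, 400, 0.10, 0.05, tax_rate=0.25)
example_roi = roi(final_value=1150, cost=1000)
clears_hurdle = example_roi >= example_wacc
```

In this made-up case the investment returns 15% against a 7.5% cost of capital, so it clears the hurdle.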
AI (Artificial Intelligence) - The development of computer systems and their ability to perform tasks that have historically required human intelligence. This includes visual perception, speech recognition, decision making, and language translation.
Source: Oxford Languages
AIDC (Automatic Identification and Data Capture) - A family of technologies that identify, verify, record, communicate, and store information on discrete, packaged, or containerized items. This automated process allows information to be gathered quickly and accurately. Common examples include barcodes, QR codes, and RFID tags.
Behavioral analytics - An area of data analytics that provides insight into the actions of people. This can be applied to the fields of marketing, application development, and security.
Data Aggregation - Compiling data from multiple sources in order to combine and summarize the datasets for processing.
Data Architecture - A framework that shows the infrastructure for how data is acquired, transported, stored, queried, and secured.
Data Architecture Document - Diagrams, systems and definitions, ingestion and data transforms, systems of record, entity relationship diagram, etc.
Data Catalog - An organized inventory of data assets within an organization, designed to help users quickly find the most appropriate data for analytical or business purposes. This inventory is made searchable through the use of metadata.
Data Cleansing - The process of detecting and correcting problematic information from a record set, table, or database. Once inaccurate, irrelevant, or incomplete data is identified, it is then replaced, modified, or removed from the set.
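To make the detect-then-correct cycle concrete, here is a small sketch that cleans a list of customer records. The field names, the sample records, and the rules themselves (an email must contain "@", a repeated id counts as a duplicate) are assumptions chosen for illustration; real cleansing rules depend on the dataset.

```python
def cleanse(records):
    """Return cleaned copies of records, dropping ones that cannot be repaired."""
    cleaned = []
    seen_ids = set()
    for record in records:
        customer_id = record.get("id")
        email = (record.get("email") or "").strip().lower()
        # Incomplete: no id or no email, so the record is removed.
        if customer_id is None or not email:
            continue
        # Inaccurate: an email without "@" is treated as invalid and removed.
        if "@" not in email:
            continue
        # Irrelevant: a repeated id is treated as a duplicate and removed.
        if customer_id in seen_ids:
            continue
        seen_ids.add(customer_id)
        # Modified: the email is stored in its normalized form.
        cleaned.append({"id": customer_id, "email": email})
    return cleaned


raw = [
    {"id": 1, "email": "  Ana@Example.com "},
    {"id": 2, "email": None},
    {"id": 3, "email": "not-an-email"},
    {"id": 1, "email": "ana@example.com"},
]
example_clean = cleanse(raw)
```

Only the first record survives, with its email trimmed and lowercased.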
Data Platform - A tool used for the collection, storage, and management of data. A data platform (DP) allows for the analysis and generation of insight into operations and improves the overall leverage of an organization’s data.
Data: Semi Structured - Data that contains tags or other markers to separate semantic elements and enforce hierarchies of records, but does not conform to a formal structure of data models. Examples include JSON and XML.
Source: Enterprise Big Data Framework
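As an illustration of semi-structured data, the sketch below parses a small JSON document and flattens its nested fields into a single row, which is one way such data is often prepared for a tabular (structured) store. The record contents and the dotted-key naming are assumptions made for this example.

```python
import json

# Hypothetical semi-structured record: keys mark each element and nesting
# expresses hierarchy, but no fixed table schema is enforced.
raw_json = '{"order_id": 17, "customer": {"name": "Ana", "tags": ["vip"]}, "note": null}'


def flatten(obj, prefix=""):
    """Flatten nested dicts into a single-level dict with dotted keys."""
    flat = {}
    for key, value in obj.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=path + "."))
        else:
            flat[path] = value
    return flat


example_row = flatten(json.loads(raw_json))
```

Lists such as `customer.tags` are left as-is here; how to store them in a table is a separate design decision.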
Data: Structured - The most ‘traditional’ form of data storage, structured data adheres to a pre-defined data model. It conforms to a tabular format with relationships between rows and columns.
Source: Enterprise Big Data Framework
Data: Unstructured - Information that either does not have a pre-defined model or is not organized in a pre-defined manner. Unstructured information is typically text-heavy, but may contain data such as dates, numbers, and facts as well. This results in irregularities and ambiguities that make it more difficult to understand using traditional programs than data stored in structured databases. Common examples of unstructured data include audio files, video files, and NoSQL databases.
Source: Enterprise Big Data Framework
Descriptive Analytics - Tracking key performance indicators (KPIs) to understand and/or communicate the current state of business and operations.
Distributed Processing/ Computing - The technique of leveraging multiple processors over a network in order to share data and spread workload. Distributed computing offers advantages in scalability, performance, and cost-effectiveness.
Elasticsearch - A distributed, free and open search and analytics engine for all types of data, including textual, numerical, geospatial, structured, and unstructured.
ETL (Extract, transform, load) - The process used to extract data from different sources, transform the data into a usable and trusted resource, and then load that data into systems accessible to end-users.
Source: Databricks
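The three ETL stages can be sketched with the Python standard library alone. This is a toy pipeline under stated assumptions: the CSV text, the `payments` table, and the cleaning rules are invented, and an in-memory SQLite database stands in for whatever system end-users would actually query.

```python
import csv
import io
import sqlite3

# Hypothetical source data; in practice this would come from files, APIs, or databases.
SOURCE_CSV = "name,amount\nana, 10.5\nbo,not-a-number\ncy,4\n"


def extract(text):
    """Extract: read raw rows from a CSV source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: normalize names, parse amounts, and skip rows that fail to parse."""
    clean = []
    for row in rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        clean.append((row["name"].strip().title(), amount))
    return clean


def load(rows, connection):
    """Load: write the transformed rows into a table end-users can query."""
    connection.execute("CREATE TABLE IF NOT EXISTS payments (name TEXT, amount REAL)")
    connection.executemany("INSERT INTO payments VALUES (?, ?)", rows)
    connection.commit()


connection = sqlite3.connect(":memory:")
load(transform(extract(SOURCE_CSV)), connection)
loaded = connection.execute("SELECT name, amount FROM payments ORDER BY name").fetchall()
```

Keeping each stage as its own function makes it easier to test the transform rules separately from the source and the destination.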
Future Vision - A roadmap of where the system could evolve in the future, with quantifiable impacts expressed as metrics wherever possible and appropriate.
Graph Database - A database designed to treat the relationships between data as equally important to the data itself. It is ideal for managing highly connected data and complex queries.
IoT (Internet of Things) - Physical objects that are able to connect with one another and share data about the way they are used and the environment around them.
Landscape Diagram - Executive operational overview. A client’s business roles & personas in the form of supply and demand, overlaid with major system components.
Linked data - Relationships or connections between structured data from different sources, such as a database or the web. This connection facilitates the machine-readability of data.
Key Value Stores/ Key Value Database - A simple database that uses an associative array as its fundamental data model. Each key is associated with only one value in a collection. The relationship is referred to as a key-value pair.
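A Python dict is itself an associative array, so the key-value model can be sketched in a few lines. The class name, method names, and sample keys below are hypothetical; production key-value stores add persistence, networking, and concurrency control on top of this idea.

```python
class KeyValueStore:
    """Tiny in-memory key-value store backed by a dict (an associative array)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Writing an existing key replaces its value: one value per key.
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Removing a missing key is a no-op rather than an error.
        self._data.pop(key, None)


store = KeyValueStore()
store.put("user:42", "Ana")
store.put("user:42", "Ana Silva")  # overwrites the earlier value
```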
Load balancing - The distribution of tasks or traffic over a set of resources. Load balancing increases efficiency by optimizing response times and preventing overloads.
Source: Amazon Web Services
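One of the simplest distribution strategies is round robin, where each incoming request goes to the next resource in a fixed rotation. The sketch below assumes that strategy purely for illustration (real load balancers may also weigh server health or current load), and the server names are made up.

```python
from collections import Counter
from itertools import cycle


def round_robin(requests, servers):
    """Assign each request to the next server in a repeating rotation."""
    rotation = cycle(servers)
    return [(request, next(rotation)) for request in requests]


# Hypothetical server names and request ids.
assignments = round_robin(range(6), ["server-a", "server-b", "server-c"])
load_per_server = Counter(server for _, server in assignments)
```

With six requests and three servers, each server receives two requests, so no single resource is overloaded.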
Machine Generated Data - Information automatically generated by a computer process, application, or other mechanism without the active intervention of a human.
Metadata - Data about data. Metadata is a fundamental tool for understanding and managing the data.
ML (Machine Learning) - A component of artificial intelligence, machine learning is the study of computer algorithms that can improve automatically through experience and by the use of data.
M2M (Machine to Machine data) - The underlying system to IoT (Internet of Things), M2M refers to data exchanges between various devices, typically without human participation.
NLP (Natural Language Processing) - The ability of computers to understand text and spoken word with the nuance and understanding innate to human communication. Data scientists use a range of techniques to advance and improve this capability, including sentiment analysis, topic modeling, and keyword extraction algorithms.
Source: Towards Data Science
NoSQL - A database whose method for storage and retrieval is modeled by means other than the tabular relations used in relational databases.
ODS (Operational data store) - A database designed to integrate data from multiple sources, for the purpose of performing additional operations on the data, reporting, controls, and operational decision support. Typically designed to contain low-level data with limited history captured in real time or near real time.
OLAP (Online analytical processing) - An approach to answer multi-dimensional analytical queries swiftly in computing. OLAP is a powerful technology for data discovery, including capabilities for limitless report viewing, complex analytical calculations, and predictive “what if” scenario (budget, forecast) planning. OLAP helps companies extract insights from their transaction data, which is then used in data driven decision making.
OLTP (Online Transaction Processing) - A type of data processing that consists of executing a number of transactions occurring concurrently. While OLTP enables the real time execution of large numbers of transactions, OLAP usually involves querying these transactions in a database for analytical purposes.
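The contrast between the two workloads can be sketched with a single in-memory SQLite table: OLTP-style code performs many small writes, each in its own transaction, while OLAP-style code runs a summarizing query over the accumulated rows. The table, columns, and figures are hypothetical, and SQLite is used here only for convenience; dedicated OLAP systems are built quite differently.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE sales (region TEXT, product TEXT, amount REAL)")

# OLTP-style work: many small writes, each committed as its own transaction.
orders = [("west", "widget", 20.0), ("west", "gadget", 35.0), ("east", "widget", 15.0)]
for order in orders:
    with db:  # commits on success, rolls back if the block raises
        db.execute("INSERT INTO sales VALUES (?, ?, ?)", order)

# OLAP-style work: one read-only query that summarizes across a dimension.
totals_by_region = db.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
```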
Predictive Analytics - The use of trend data to assess the likelihood of future outcomes.
Prescriptive Analytics - The process through which past performance and identified issues/ inefficiencies are used to make recommendations for similar situations in the present or future.
Relational Database - A collection of data items with pre-defined relationships between them. With the identifiers of primary and foreign keys, this data can be accessed in many different ways without reorganizing the database tables themselves. Relational databases are primarily queried and updated using SQL (Structured Query Language).
Source: Amazon Web Services
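A brief sketch of primary and foreign keys in practice, using Python's built-in sqlite3 module. The `customers` and `orders` tables and their contents are invented for this example; the point is that the key relationship lets one query combine both tables without restructuring either.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
db.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
db.execute(
    "CREATE TABLE orders ("
    "id INTEGER PRIMARY KEY, "
    "customer_id INTEGER REFERENCES customers(id), "
    "total REAL)"
)
db.execute("INSERT INTO customers VALUES (1, 'Ana'), (2, 'Bo')")
db.execute("INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 5.0), (12, 2, 8.0)")

# The foreign key orders.customer_id links each order to its customer,
# so a join can answer per-customer questions from the unchanged tables.
orders_per_customer = db.execute(
    "SELECT c.name, COUNT(o.id), SUM(o.total) "
    "FROM customers c JOIN orders o ON o.customer_id = c.id "
    "GROUP BY c.id ORDER BY c.name"
).fetchall()
```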
S3 (Amazon Simple Storage Service) - An AWS service that provides scalable object level storage. S3 has many types of storage classes, depending on what data retrieval and availability needs a user may have.
Source: Amazon Web Services
Silos - Occurs when information systems are isolated or incapable of reciprocal operation with related systems. This may occur due to data inconsistencies or lack of information sharing.
SQL (Structured Query Language) - The standard language used to store, manipulate, and retrieve data in a relational database.
Storytelling - The dissemination and presentation of data through a compelling and meaningful narrative. Storytelling puts the data in necessary context and communicates impact.
CCPA (California Consumer Privacy Act) - Consumer privacy rights include:
The right to know about the personal information a business collects about them and how it is used and shared;
The right to delete personal information collected from them (with some exceptions);
The right to opt-out of the sale of their personal information; and
The right to non-discrimination for exercising their CCPA rights.
Businesses are required to give consumers notices explaining their privacy practices. This applies to for-profit businesses with a significant footprint in California (see website for specific requirements).
Source: State of California
Data Governance - The management processes that ensure the consistency and trustworthiness of data, and that it is being used efficiently and effectively. This includes the establishment of, and adherence to, roles, policies, standards, processes, metrics, and reporting.
DDDM (Data Driven Decision Making) - The process of making organizational decisions using facts, metrics, and data. Using data to guide these strategic decisions ensures alignment with goals, objectives, and initiatives. The process:
Know your mission and identify objectives
Identify data sources
Clean and organize data
Exploration and analysis
Conclusions and insights
Take action/ make decisions
Source: Northeastern University
Data Governance Plan - The internal data standards and policies that ensure the availability, usability, integrity and security of the data in enterprise systems.
GDPR (General Data Protection Regulation) - European Union rules relating to the protection of individuals with regard to the processing of personal data and its free movement. These regulations apply to the processing of personal data in the context of the activities of an establishment of a controller or a processor in the European Union, regardless of whether the processing takes place in the Union.
Source: GDPR text via info.eu
SSOT (Single Source of Truth, One True Source) - The practice of aggregating the data from many systems within an organization into a single location. Ensures that data is only mastered or edited in one place, and that all decisions are made on the same data.
Business Analyst - Strategic and operational partner. The business analyst combines data science, domain research, client insights, and economics to help identify unique challenges and solutions to key business objectives.
CDO (Chief Data Officer) - Oversees a data governance program. Responsibilities may include:
Governance: Advising on, monitoring, and governing
Operations: Enabling data usability, availability, and efficiency
Innovation: Driving enterprise digital transformation innovation, cost reduction, and revenue generation
Analytics: Supporting analytics and reporting on products, customers, operations, and markets
CDO (Chief Digital Officer) - Helps an organization drive growth through the use of modern online technologies and data. This may include the development and implementation of a digital strategy, and the conversion of traditional or analog processes into more technologically based practices.
CIO (Chief Information Officer) - A member of senior or executive management who oversees information technology and innovation within an organization. The roles and responsibilities of a CIO may also be filled by a CTO (Chief Technology Officer), with the exception that a CIO may place an emphasis on internal technology.
CTO (Chief Technology Officer) - A member of senior or executive management who oversees the technical needs and development of an organization. The roles and responsibilities of a CTO may also be filled by a CIO (Chief Information Officer), with the exception that a CTO may place an emphasis on external/ client facing technology.
Data Custodian - Responsible for the safe custody, transport, and storage of data, as well as the implementation of rules surrounding governance. While a Data Steward is responsible for what is stored in the data field, a custodian oversees the technical environment and database structure.
Source: Carnegie Mellon University
Data Engineer - Responsible for managing data. This includes ingest and transformation of data to prepare it for analytical or operational use.
Data Scientist - Involved in the gathering and curating of data for analysis. This individual may articulate requirements for the collection and structure of information, as well as develop new ways of capturing and creating data. Their goal is to further the knowledge, understanding, and usability of your data.
Data Steward - A data governance role, responsible for ensuring the quality and usability of an organization’s data assets, so that the data is accessible, usable, safe, and trusted.
Source: Science Direct
Engineer - Responsible for building and maintaining the software necessary to interface with data (i.e. a data platform).
Governance Team - Includes data stewards, and can include executives and IT staff, who work together to create and implement the standards and policies for governing data.
Project Manager - The primary contact at productOps for a client. They will maintain adherence to the project schedule and scope, and ensure proper and consistent staffing.