Today's global organizations need to implement a data warehousing solution to centrally store and make sense of customer, sales and analytics data housed in various disparate systems. With the wide spread adoption of computing within the enterprise, the rate at which information is being created has grown exponentially.
Bill Inmon defines a data warehouse as "A subject-oriented, integrated, time-variant and non-volatile collection of data in support of management's decision making process."
You can use a data warehouse to analyze a particular subject area. For example, sales can be a particular subject. It integrates data from multiple data sources and thus standardizes the structure and attributes this information to be analyzed in a common format. Business intelligence (BI) is the art of analyzing information in a data warehouse to identify trends and make intelligent decisions.
Data warehouses are typically used to store historical data. In most transactional systems it makes sense to keep only the current or the most relevant data due to database performance reasons. You should never alter the historical data to maintain its integrity.
Data warehouse is an important asset for organizations to maintain efficiency, profitability and gain competitive advantage. Organizations collect data through many touch points - Online, Call Center, Sales Leads, Inventory Management. The data collected has different degrees of value and business relevance. As data is created, it has to be managed through a well-defined Data Life Cycle Process.
OLTP databases are where operational data is stored. OLTP databases can reside in transactional applications such as Enterprise Resource Management (ERP), Customer Relationship Management (CRM), Supply Chain and Point of Sale systems. These OLTP databases are designed for transaction speed and accuracy.
Metadata helps maintain the sanctity and accuracy of data entering into the data lifecycle process. It helps ensure that data is in the right format and relevant. The data cleansing process will go a lot quicker if you do a good job with the metadata to begin with.
Before data enters the data warehouse, the Extraction, Transformation and Loading (ETL) process ensures that the data passes the data quality threshold. ETLs are also responsible for running scheduled tasks that extract data from OLTPs.
The data warehouse repository is the database that stores active data of business value for an organization. The data warehouse modeling design is optimized for data analysis. There are variants of data warehouses - data marts and ODS. Data marts are not physically any different from data warehouses. Think of a data mart as a smaller data warehouse that focuses on a particular department instead of the entire company.
Data warehouses act as a repository for historical data and not always efficient for providing up-to-date analysis. This is where Operational Data Stores (ODS) come in. ODS are used to hold recent data before migration to the Data Warehouse and thus you can tap into them to analyze information that is not in the OLTP but happened recently.
The last and most critical pieces are the front-end applications that business users use to interact with data stored in the repositories. Data Mining is the discovery of useful patterns in data to help with prediction analysis and classification. It helps answer questions such as "What is the likelihood that a customer will migrate to a competitor?"
Online Analytical Processing (OLAP) is used to analyze historical data and slice the business information. Reporting tools are used to keep track of key performance indicators (KPI).
Data Visualization tools are used to display data from the data repository. Often data visualization is combined with Data Mining and OLAP tools. Data Visualization can allow you to manipulate data to show relevancy and patterns.
A data warehouse implementation enables your team to have easy access to information required to identify trends and gain a better understanding about the environment that your business operates in.
Data warehouses also increase the consistency of the data and allow it to be checked over and over to determine how relevant it is. Because most data warehouses are integrated, you can pull data from many different areas of your business, for instance human resources, finance, IT and accounting.