As data continues to grow and diversify, many organisations are finding that traditional methods of managing information are becoming outdated. Data management, storage and analytics have become an even greater priority. As data volumes increase, storage is also becoming more complex and diverse. Businesses are challenged by disconnected silos across multiple storage sources, which lead to poor visibility, low performance and limited management capabilities. Data lakes can provide significant value here, helping enterprises manage this complexity and adopt new types of analytics such as big data processing and machine learning.
What Is A Data Lake?
Aberdeen research has found that the amount of data coming into organisations has increased by 25% every year for the last five years. Given how data is transforming and challenging businesses, the capabilities that data lakes offer are more important than ever. A data lake is essentially a centralised repository that allows you to store all your structured and unstructured data at any scale. You can store your data as-is, without first having to structure it, and run different types of analytics - from dashboards and visualisations to big data processing, real-time analytics and machine learning - to guide better business decisions.
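To put the 25% annual growth figure quoted above in perspective, the short calculation below compounds it over five years. It is only an illustration: it assumes the same 25% rate applies every year and uses the starting volume as a baseline of 1.

```python
# Compound growth: 25% more incoming data each year for five years.
annual_growth = 0.25  # figure quoted from the Aberdeen research above
years = 5

# Each year multiplies the previous year's volume by 1.25.
multiplier = (1 + annual_growth) ** years
print(f"After {years} years: {multiplier:.2f}x the starting volume")
```

Under those assumptions, an organisation would be taking in roughly three times as much data as it did five years earlier.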
Data Lakes Compared To Data Warehouses
All data repositories have a similar core function: housing data for business reporting and analysis. But their purpose, the types of data they store, where that data comes from and who has access to it differ. In general, data comes into these repositories from systems that generate data - CRM, ERP, HR, financial applications and other sources. The data records created by those systems are processed against business rules and then sent to a data warehouse, data lake or other data storage area. Once all the data from the disparate business applications is collated onto one data platform, it can be used in business analytics tools to identify trends or deliver insights that help make business decisions.
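The flow described above (source systems, business rules, one collated platform, then analysis) can be sketched in a few lines of Python. Everything here is hypothetical: the record fields, the single business rule and the function names are invented for illustration and do not correspond to any particular product.

```python
# Hypothetical records from two source systems (fields are invented).
crm_records = [
    {"source": "CRM", "customer": "Acme", "email": "ops@acme.example"},
    {"source": "CRM", "customer": "", "email": "unknown"},
]
erp_records = [
    {"source": "ERP", "customer": "Acme", "invoice_total": 1200.0},
]

def passes_business_rules(record):
    """Example rule (an assumption): every record must name a customer."""
    return bool(record.get("customer"))

def collate(*sources):
    """Combine records from all sources, keeping only those that pass the rules."""
    platform = []
    for records in sources:
        platform.extend(r for r in records if passes_business_rules(r))
    return platform

platform = collate(crm_records, erp_records)

# A trivial "analysis": count records per customer across all systems.
counts = {}
for record in platform:
    counts[record["customer"]] = counts.get(record["customer"], 0) + 1
print(counts)
```

In practice the rules, the storage layer and the analytics tools are far richer, but the order of the steps is the same.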
Smaller organisations may only need a simple SQL data mart or data store to manage their data, while mid-sized to large organisations may, depending on their requirements, need both a data warehouse and a data lake, as the two serve different needs and use cases. This data repository cheat sheet that TechTarget put together is quite useful.
A data warehouse is a database optimised to analyse relational data coming from transactional systems and line-of-business applications. The data structure and schema are defined in advance to optimise for fast queries, and the results are typically used for operational reporting and analysis. Data is cleaned, enriched and transformed so it can act as the “single source of truth” that users can trust.
A data lake is different because it stores both relational data from line-of-business applications and non-relational data from mobile apps, IoT devices and social media. The structure of the data, or schema, is not defined when the data is captured. This means you can store all of your data without careful upfront design and without needing to know which questions you might want answered in the future. Different types of analytics, such as SQL queries, big data analytics, full-text search, real-time analytics and machine learning, can then be used to uncover insights.
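The difference between defining the schema in advance (warehouse) and applying it only when the data is read (lake) can be illustrated with a minimal Python sketch. The column names, sample records and helper functions below are all hypothetical, and plain Python lists stand in for real storage systems.

```python
import json

# Hypothetical warehouse schema, fixed before any data is loaded.
WAREHOUSE_COLUMNS = ("customer_id", "order_total")

def load_into_warehouse(table, record):
    """Accept a record only if it matches the predefined columns exactly."""
    if set(record) != set(WAREHOUSE_COLUMNS):
        raise ValueError(f"record does not match schema: {sorted(record)}")
    table.append(tuple(record[col] for col in WAREHOUSE_COLUMNS))

def land_in_lake(lake, raw_line):
    """Store the raw text as-is; no structure is imposed at capture time."""
    lake.append(raw_line)

def read_from_lake(lake, fields):
    """Apply a schema at query time, skipping lines that lack the fields."""
    for raw_line in lake:
        try:
            record = json.loads(raw_line)
        except json.JSONDecodeError:
            continue  # not JSON, e.g. free text from social media
        if isinstance(record, dict) and all(f in record for f in fields):
            yield {f: record[f] for f in fields}

warehouse, lake = [], []
load_into_warehouse(warehouse, {"customer_id": 1, "order_total": 99.5})

land_in_lake(lake, '{"customer_id": 1, "order_total": 99.5}')
land_in_lake(lake, '{"device": "sensor-7", "temperature": 21.4}')
land_in_lake(lake, "great service, would buy again")

# Two different questions asked of the same raw data, decided after capture.
orders = list(read_from_lake(lake, ["customer_id", "order_total"]))
sensors = list(read_from_lake(lake, ["device", "temperature"]))
```

The warehouse loader rejects anything that does not fit its columns, whereas the lake accepts all three inputs and leaves it to each query to decide which structure it cares about.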
As businesses with data warehouses see the benefits of data lakes, they are evolving their warehouses or data stores to include data lakes, enabling diverse query capabilities, data science use cases and advanced capabilities for discovering new information models. Gartner calls this evolution the “Data Management Solution for Analytics” or “DMSA”, a category in which Microsoft Azure is cited as one of the Leaders, noted for its leadership in cloud data management. In fact, according to Gartner, Microsoft is growing in this space at twice the rate of the overall market.
Data Lakes - What Should Be Considered?
For organisations building data lakes as part of their analytics platforms, here are some important capabilities and considerations:
- Unlimited storage, scalability & fast performance for analytics data
- Ease of integration & data ingestion with your current architecture
- Enterprise-grade security (encryption, network-level security & access control)
- Affordability - cloud-based data lakes are a great solution here
- Ensure the data lake can accommodate your data types & what you want to do with the data
- Ensure you have a strategy and process for data management & data governance, as this can be more difficult where data is unstructured
- Consider the tools and skills that exist within your business: building & maintaining a data lake is not the same as working with databases and requires big data architecture expertise
- Consider planning ahead in data lake design - although it is not required, structuring data schemas upfront can help ensure better data quality
Whatever you require, whether a simple SQL data mart or assistance in automating, building or managing a data warehouse or data lake, Inside Info can assist as Qlik Data Integration specialists.