Data Warehouse and Data Lake projects have historically been “multi year – multi million” projects done by IT. Change in requirements as we go along are a challenge. Business used to make do with individual data marts to solve their needs. A new breed of innovative business driven agile data warehousing to solve needs of business is evolving with the new paradigms enabled by cloud.
Organizations get challenged to get value from their huge ERP investments and start their data analytics journey first with Excel and then with a BI tool like Qlik, PowerBI or Tableau. As their data grows and there is need to manage newer sources at volume they look for ETL and Datawarehousing. Legacy on premise options included OLTP databases like Oracle which face performance challenges for OLAP loads or expensive columnar stores like Teradata, Exadata, Netezza, Vertica or the more reasonably priced SQL Server.
The organizations invested in traditional data warehouse today face several challenges
- Inability to consume semi structured and unstructured data
- Oversizing to prevent performance bottlenecks
- Large upfront investments and long deployment cycles
These challenges were handled by cloud data warehouses such as AWS Redshift, Azure Synapse and Snowflake. Cloud data warehouses have revolutionized the way organizations handle data. The cloud brings in the elasticity, zero infra hassles and easy administration. What it also does is moving your costs from capex to opex mode. With providers giving country specific data centres, the compliance aspect to is taken care making it more lucrative for organizations to make the shift.
How can such modern cloud data warehouse address above mentioned challenges?
Structured, semi structured, and unstructured data
Today’s organizations have multiple applications such as ERP, CRM, HRMS, SCM, LMS and many more. Some of the applications are cloud based and the only way to get data from such apps would be through APIs. Such data when extracted may not be in structured form and is categorized as semi-structured and could be in form of XML, json, text files etc.
Is the data warehouse capable of storing such data? If not, then probably the business users will struggle to get a 360-degree view of data and miss crucial insights as well as resort back to using Excel. As use cases broaden in an organization with AI/ML being done on images, videos and voice, the unstructured data too has become equally important and ideally your DW should be able to store this.
The ability of some of the modern cloud warehouses to store all kind of data make its an ideal option. Organizations can use such warehouses to address all the possible use cases ranging from reporting, diagnostic, descriptive, predictive, and prescriptive giving them an edge over competition.
Performance bottlenecks
How often have we heard about business users complaining about performance of systems. This leads to lower adoption and great amount of heartburn between IT and business. The reasons for performance could be many ranging from slower queries, wrongly sized hardware, network issues, too many user requests. Some of the performance issues are seen typically during month ends or during certain seasons/occasions.
How does cloud warehouse solve some of the issues? If your organization faces challenges which occur only for short period of times or are infrastructure related, then migration to cloud warehouse makes it all the more meaningful. Cloud is elastic and thus give organizations greater flexibility to scale at a short notice and maintain the performance level. That is where platforms like Amazon Redshift shot to fame and hold formidable positions.
Optimize data warehouse investments with pay as you go
Although all cloud data warehouses are scalable, there are some subtle differences.
The two main components of a cloud data warehouse are storage and compute. In simple words storage is the area where the data rests and compute are the resources consumed to interact with the data. It could be the resources consumed for writing data from source to DW or it could be resources consumed to read the data via a query.
Typically, more the storage, more the compute provisioned and vice versa. Such a solution although pay per use, may not be cost effective because with increasing data you also end up paying for more compute even if you are unlikely to use all of it.
Innovate architectures of cloud data warehouse such as Snowflake provide true pay as you go by separating the storage and compute components. Such a model gives great flexibility to organizations to deal with situations when they experience heavy loads such as month end reporting. This is time when all the teams such as Sales, Finance, HR and others have heavy usage. There are higher number of queries coupled with increased concurrency of users. For most of the data warehouses you need to factor this and have a data warehouse designed to cater for the maximum load.
Let’s see how this can be effectively managed if the storage and compute layers are separate. For storage the costing is proportional to data size and this give a predictability of cost incurred. For special occasions such as month end reporting to cater to the increased queries and concurrency the compute can be increased only for the period when the load is high. Once load subsides the compute resources can be reduced to save costs. Such a model gives greatest flexibility in cost without compromising on performance. The compute layer can be auto switched off when there is no compute request and automatically turned on when there is request.
Other factors to consider while deciding your cloud data warehouse
- Integrations available for analytical and predictive tools (eg. Qlik, PowerBI, DataRobot)
- Ability to store multiple data types (structure, semi-structured, unstructured) as well as ELT/ETL integrations
- Pricing model -Is it true pay as you go or are you paying for unused resources and getting locked in
- Vendor lock-in (how easy is it to move to another cloud provider)
- Maintenance and administration overheads
Conclusion
Cost optimization is a hot topic in boardroom discussions. IT is challenged to reduce cost while still delivering the needs of business in a speedy manner. Leveraging data to achieve this needs all data integrated in a seamless fashion along with external data. ‘Pay as you Go Datawarehousing is the solution for achieving this. You don’t have to worry about making huge upfront payments to software vendors and then waiting and wondering if you will get any benefits. Using the OPEX by the minute model, your benefits and payments are aligned. At the same time you do not have to oversize the purchase for what you will need a year later. Scale up and scale down is within minutes!!!
Cloud Data warehouses are here to stay and thrive faster. Learn More about them