You can debate how much the pandemic contributed to the accelerating adoption of Artificial Intelligence (AI) throughout 2021, but there is no denying that Covid has pushed leaders to speed up research and work in this field. Managing uncertainty over the past two years has been a major reason for our clients to keep a data-driven business model as their top strategic priority: it keeps them relevant and competitive and lets them respond actively and effectively to rapidly shifting situations.
However, we all face a myriad of technology solutions and tools of ever-increasing complexity. To help you navigate this sheer volume of information, I have prepared a brief summary of my own perspective on what lies ahead in the coming year.
When putting this article together, I found it really helpful to summarise the more than 30 conversations I had while recording our Data Stand-up! podcast. I spoke with successful entrepreneurs, CIOs, CDOs and Lead Data Scientists from all around the world, and all of them brought valuable perspectives to the question: where is data going in 2022 when it comes to supporting business strategies?
So, what does 2022 have in store for us? Let’s dive in!
1. Data Lakehouses
Put simply, there have been two “traditional” ways to operationalise data analytics at a business level, depending on the underlying infrastructure used and the type of data being fed into it:
- Structured datasets and Data Warehouses: This approach retrieves datasets with a consistent schema (e.g. data from business applications such as CRMs) and imports them into a Data Warehouse, which then feeds Business Intelligence tools. These “warehousing architectures” struggle with advanced data use cases. For instance, their inability to store unstructured data for machine learning development is a downside that cannot be overlooked. Furthermore, proprietary warehouse data formats do not integrate well with some open-source data science and engineering tools such as Spark.
- Unstructured, semi-structured datasets and Data Lakes: Data Lakes were designed to store unprocessed data or unstructured files such as pictures, audio or video that do not fit neatly into data warehouses. Importing raw data directly into a Data Lake, without any cleansing or pre-processing in between, comes in handy when dealing with these files. The majority of data generated today is unstructured, so it is now imperative to use tools that can process and store unstructured data. The drawback of Data Lakes is that data quality and governance standards are hard to maintain: they sometimes become “Data Swamps” full of unprocessed information lacking a consistent schema, which makes it difficult to search, find and extract data at will.
The reality is that both scenarios need to “coexist”: integrating and unifying a Data Warehouse and a Data Lake becomes a requirement, as analytics teams need both structured and unstructured data to be indexed and stored. Any modern company needs the best of both worlds, building a cost-efficient, resilient enterprise ecosystem that flexibly supports its analytical demands. In other words, Data Engineers should be able to configure data pipelines and grant Data Scientists retrieval access regardless of the underlying infrastructure, so that they can carry out their downstream analytics work. This is the idea and vision behind the “Data Lakehouse”:
A unified architecture that provides the flexibility, cost-efficiency, and ease of use of Data Lakes with the data granularity, consistency, reliability, and durability of data warehouses, enabling subsequent analyses, ML and BI projects.
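To make the Lakehouse idea a little more concrete, here is a toy Python sketch. It is not how Databricks, AWS or Snowflake implement anything: the `ToyLakehouse` class, the `CUSTOMER_SCHEMA` table and the JSON-lines export are hypothetical, invented purely for illustration. It only shows the two behaviours the definition combines: a raw zone that accepts files of any shape, as a Data Lake does, and curated tables that reject rows not matching a declared schema, as a Data Warehouse does.

```python
import json
from dataclasses import dataclass, field

# Hypothetical schema for a curated "customers" table (illustrative only).
CUSTOMER_SCHEMA = {"customer_id": int, "country": str, "lifetime_value": float}


@dataclass
class ToyLakehouse:
    """Toy model of the Lakehouse idea (not a real product's API):
    a raw zone that accepts anything, plus schema-checked curated tables."""

    raw_zone: dict = field(default_factory=dict)  # path -> raw bytes, no schema
    tables: dict = field(default_factory=dict)    # table name -> list of rows
    schemas: dict = field(default_factory=dict)   # table name -> {column: type}

    def put_raw(self, path: str, payload: bytes) -> None:
        # Lake behaviour: store images, audio, logs, etc. exactly as received.
        self.raw_zone[path] = payload

    def create_table(self, name: str, schema: dict) -> None:
        self.schemas[name] = schema
        self.tables[name] = []

    def append(self, name: str, row: dict) -> None:
        # Warehouse behaviour: reject rows that do not match the declared schema.
        schema = self.schemas[name]
        if set(row) != set(schema):
            raise ValueError(f"columns {sorted(row)} do not match {sorted(schema)}")
        for col, typ in schema.items():
            if not isinstance(row[col], typ):
                raise TypeError(f"column {col!r} must be {typ.__name__}")
        self.tables[name].append(row)

    def promote(self, path: str, table: str) -> int:
        # Curate a raw JSON-lines object into a table, skipping invalid rows.
        loaded = 0
        for line in self.raw_zone[path].decode("utf-8").splitlines():
            try:
                self.append(table, json.loads(line))
                loaded += 1
            except (ValueError, TypeError):
                continue
        return loaded


lh = ToyLakehouse()
lh.put_raw("raw/photos/cat.jpg", b"\xff\xd8\xff\xe0")  # unstructured, kept as-is
lh.put_raw(
    "raw/crm/export.jsonl",
    b'{"customer_id": 1, "country": "ES", "lifetime_value": 120.5}\n'
    b'{"customer_id": "two", "country": "UK"}\n',
)
lh.create_table("customers", CUSTOMER_SCHEMA)
print(lh.promote("raw/crm/export.jsonl", "customers"))  # 1
```

In a real Lakehouse these guarantees come from a transactional table format such as Delta Lake sitting on low-cost object storage; the sketch keeps everything in memory only so that it stays self-contained.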
There are a few providers out there that offer top-notch Data Lakehouse solutions. Databricks seems to be leading the race, as it was the original creator of the Lakehouse architecture (through Delta Lake). Amazon Web Services (AWS) is another strong contender with its own Lakehouse architecture (Lake Formation + AWS Analytics). Snowflake is also a relevant provider of this emerging “hybrid” infrastructure.
I predict that the Data Lakehouse architecture will remain in the spotlight in 2022, as companies focus on Data Engineering even more than before. There is already huge demand for data architects and engineers in charge of platforms, pipelines and DevOps.
2. Low-code and No-code AI. Is it really the future?
Data Science is no longer just a research field: it has been many years since it was validated as a powerful tool that every area of the business wants a piece of. However, the market continues to struggle to fill new openings, as demand for talent still exceeds supply.
Low-code and no-code platforms were, and still are, one of the most promising ways to turn this around, as they empower non-technical business professionals to “act” as Data Scientists. Moreover, these tools offer an added benefit: more people across the organisation may begin to understand what can be done with data and, therefore, better know which questions can realistically be asked.
Some well-known solutions such as DataRobot, H2O AutoML, BigML or ML Studio allow people with little to no programming experience to develop practical data applications, but…
Is it realistic for people who haven’t learned how to code to implement functional and safe analytical systems or AI solutions? Yes, but only if these non-technical professionals are guided and supported.
These days you may find a marketing executive building an NLP solution for sentiment analysis or a hypermarket operations manager building a demand prediction system, but I must share a word of caution based on recent experience. Codeless does not mean maths-less. Background knowledge of the processes and mathematics behind data transformation, feature engineering and algorithms is needed to ideate and implement effective solutions correctly. My take here:
The adoption of these tools will continue to grow, and low-code solutions will remain a relevant trend in 2022. However, new roles (QA, Coaches, Evangelists, etc.) will also need to be defined around the adoption of these tools.
Many have quickly realised that the supervision and guidance of qualified data professionals is critical, more so when explainable and transparent AI is an upcoming legal prerequisite.
3. Augmented and hybrid human workforce
Employees have understandably been concerned about robots taking over during the last few years, especially after Gartner claimed that one in three jobs would be taken over by software or robots, in some form of AI, by 2025. It seems common sense that organisations should have made clear earlier on that AI is meant to augment our capabilities, giving us more time for creative and strategic thinking, rather than simply replace people. In my view, Machine Learning will now start to genuinely enhance the lives of employees. Boring and repetitive admin tasks will fade into obscurity and soon be long gone.
I believe that 2022 will be the year when we begin to see AI, in the form of digital co-workers, genuinely welcomed by people in organisations. Whether you choose to call them robots, RPA systems or digital co-workers, AI will allow us all to make quicker decisions, automate processes and handle vast amounts of information at scale much faster.
In order to remain competitive, businesses of all kinds will have to start designing a hybrid workforce model where humans and “digital co-workers” work hand in hand.
We should still be realistic and expect automation to fully replace some jobs, but I do hope that reinvented jobs and new positions will balance out the jobs lost.
Cultural adoption barriers still pose a major challenge, but despite popular pessimistic beliefs and potential drawbacks, the redefined augmented workforce is one of the key trends to keep an eye on during 2022 and beyond.
4. Efficiency vs complexity
Whilst a huge share of the research efforts and R&D data initiatives at the FANG companies is directed towards pushing the boundaries of narrow AI in pursuit of General AI, developing, training and running such complex models has inevitably had a negative collateral impact on the environment.
Due to the computational power required to fuel some massively parameterised models, it is no surprise that data centres are beginning to represent a significant share of global CO2 emissions. For reference, back in 2018 the largest AI models had 94 million parameters, and this grew to 1.6 trillion in 2021 as the larger players pushed the boundaries of complexity. Today, the models with trillions of parameters are language and vision models. Models such as GPT-3 can comprehend natural language, but they also require a lot of computational power to run. This has motivated leading organisations to explore how they can effectively reduce the carbon footprint of their Machine Learning.
Big players have started to look at ways of developing efficient models, and this has had an impact on the Data Science community: teams now seem to be looking for simpler models that perform as well as complex ones on specific problems.
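To put this growth into perspective, the quick calculation below uses only the two parameter counts quoted in the text. The per-year figure is a compound average that I derived for illustration only; actual growth was not smooth from one year to the next.

```python
# The two figures quoted in the text: largest model sizes in 2018 and 2021.
params_2018 = 94e6    # 94 million parameters
params_2021 = 1.6e12  # 1.6 trillion parameters

growth = params_2021 / params_2018  # overall multiplication factor
years = 2021 - 2018
annual = growth ** (1 / years)      # compound average growth factor per year

print(f"Overall growth: ~{growth:,.0f}x")           # Overall growth: ~17,021x
print(f"Average growth per year: ~{annual:.1f}x")  # Average growth per year: ~25.7x
```

In other words, the largest models grew roughly 17,000-fold in three years, or about 26-fold per year on average.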
A relatively simple Bayesian model may sometimes perform as well as a 3D-CNN while using significantly less data and computational power. In this context, “model efficiency” will be another key aspect of modern data science.
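The text does not say which Bayesian model it has in mind, so as an illustration I have picked Gaussian naive Bayes, one of the simplest. The sketch below (plain Python, no libraries; the class name and toy data are my own assumptions) shows how few numbers such a model needs: one mean and one variance per feature per class, plus a class prior. It is not a benchmark against a 3D-CNN, only a demonstration of how small a useful model can be.

```python
import math
from collections import defaultdict


class GaussianNaiveBayes:
    """Minimal Gaussian naive Bayes: per class, a log-prior plus one
    (mean, variance) pair per feature. Illustrative sketch only."""

    def fit(self, X, y):
        by_class = defaultdict(list)
        for row, label in zip(X, y):
            by_class[label].append(row)
        n = len(X)
        self.params = {}
        for label, rows in by_class.items():
            stats = []
            for col in zip(*rows):  # iterate over feature columns
                mean = sum(col) / len(col)
                # Small variance floor avoids division by zero on constant features.
                var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
                stats.append((mean, var))
            self.params[label] = (math.log(len(rows) / n), stats)
        return self

    @staticmethod
    def _log_likelihood(row, stats):
        # Sum of independent Gaussian log-densities (the "naive" assumption).
        return sum(
            -0.5 * math.log(2 * math.pi * var) - (x - mean) ** 2 / (2 * var)
            for x, (mean, var) in zip(row, stats)
        )

    def predict(self, X):
        # Pick the class with the highest log-prior + log-likelihood.
        return [
            max(self.params,
                key=lambda c: self.params[c][0] + self._log_likelihood(row, self.params[c][1]))
            for row in X
        ]

    def n_parameters(self):
        # Two numbers per feature per class, plus one prior per class.
        return sum(2 * len(stats) + 1 for _, stats in self.params.values())


X = [[1.0, 2.1], [1.2, 1.9], [0.9, 2.0], [3.0, 4.1], [3.2, 3.9], [2.9, 4.0]]
y = ["low", "low", "low", "high", "high", "high"]
model = GaussianNaiveBayes().fit(X, y)
print(model.predict([[1.1, 2.0], [3.1, 4.0]]))  # ['low', 'high']
print(model.n_parameters())                     # 10
```

Ten parameters versus the billions discussed above is the whole point: when a problem is simple enough, a model like this trains instantly on a laptop.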
5. Multi-purpose modelling
It takes a lot of datasets, hard-to-get talent, costly computing resources and valuable time to ideate, develop and train AI models. Data teams know well the effort it takes to deploy a model that works properly and accurately, so Data Scientists understand that every part of the development work should, where possible, be reused in other modelling exercises.
We have seen this happening in many industries, and the trend seems to point towards training capable general-purpose models that can handle very diverse datasets and therefore solve thousands of different tasks. This is likely to grow incrementally over the next few years.
These multimodal models could be conceived and designed from the outset as highly efficient, reusable tools.
These AI models would combine many ideas that may have been pursued independently in the past. For instance, Google is already following this vision with a next-generation AI architecture it has named Pathways.
You should not be surprised if you read about substantial progress in the research field of multi-purpose modelling in the next few months.
6. People Analytics
Dissatisfaction with job conditions, reassessments of work-life balance, and lifestyle changes brought about by the hardships of the pandemic led to the Great Resignation, an informal name for the widespread phenomenon of hundreds of thousands of workers leaving their jobs during the COVID-19 era. Also called the Big Quit, it has mostly been discussed in relation to the US workforce, but the trend is now international. The full effects of the pandemic are still unpredictable, but organisations have been forced to wake up and now seem committed to understanding their people.
Companies are looking for effective ways to gain this understanding of their employees, and many have come to the realisation that People Analytics could be the answer. In my view, there are two main drivers that have encouraged leaders to consider People Analytics:
- The KPIs that define business value have changed over the past years. In the past, value was tied to tangible assets such as warehouse stock, money in the bank or owned real estate, but nowadays it is highly tied to having a talented workforce that can be an industry reference and that nurtures innovation. This relates to the previous trend about workforce changes, where creativity will become increasingly important, hence the need for a motivated and innovative team that thinks outside the box.
- Data Technology and AI now form the backbone of the strategic decision-making toolkit at most advanced companies.
People analytics has become a data-driven tool that allows businesses to measure and track their workforce behaviour in relation to their strategy.
People analytics is built upon collecting and analysing individual talent data, allowing companies to understand the evolving workplace while also surfacing insights that drive customer behaviour and engagement. Moreover, it helps management and HR units steer the holistic people strategy by prescribing future actions. These actions may include, but are not limited to, improving talent-related decisions, improving workforce processes and promoting a positive employee experience.
In the past, People Analytics was only adopted by large enterprises with big budgets; only recently have mid-size organisations joined in too. As of 2020, more than 70 percent of organisations were investing in people analytics solutions to integrate the resulting insights into their decision-making. I am pretty certain that this percentage will increase significantly over the coming months.
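As a deliberately simple example of the kind of analysis People Analytics starts from, the sketch below computes an attrition rate per department. The records and their field layout are hypothetical, and the rate is simplified to leavers divided by headcount in the records; HR teams often divide by average headcount over the period instead.

```python
from collections import Counter

# Hypothetical HR records: (employee_id, department, left_during_period)
records = [
    ("e1", "Sales", True), ("e2", "Sales", False), ("e3", "Sales", True),
    ("e4", "Engineering", False), ("e5", "Engineering", False),
    ("e6", "Engineering", True), ("e7", "Engineering", False),
]


def attrition_by_department(rows):
    """Share of each department's employees who left during the period.
    Simplified: divides by headcount in the records, not average headcount."""
    headcount = Counter(dept for _, dept, _ in rows)
    leavers = Counter(dept for _, dept, left in rows if left)
    return {dept: leavers[dept] / headcount[dept] for dept in headcount}


rates = attrition_by_department(records)
print({dept: round(rate, 2) for dept, rate in rates.items()})
# {'Sales': 0.67, 'Engineering': 0.25}
```

Aggregating at department level, rather than reporting on individuals, is also the kind of choice that keeps this sort of analysis closer to employees' privacy expectations.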
7. Data marketplaces
If data is now seen as the new oil and the most valuable asset for any company, data marketplaces may become a mainstream way to exchange and trade information in 2022.
Even though companies in some sectors still jealously guard their data, others have spotted an opportunity in exchanging information. Platforms such as Snowflake’s Data Marketplace allow businesses to become data providers, enabling them to easily share and monetise large datasets. Enterprises that generate large or highly unique datasets as part of their day-to-day activities may find it worthwhile to explore this route as a new source of additional revenue.
In contrast, a few years back it was common for medium and large businesses to fully outsource data analytics projects to an IT provider that would occasionally use third-party data without consent. Now that everyone understands that data is the most valuable asset, it will be exchanged and shared at will, but always with the expectation of something in return.
Nevertheless, companies that aim to capitalise on this opportunity need to devise a robust strategy for it by carefully assessing all legal and privacy implications. Similarly, they will have to build processes that automate the required data transformations so that data exports comply with existing regulations.
The rise in AI applications will contribute to the widespread adoption of this trend. Complex models need to be fed vast amounts of data, and many organisations will also use these exchanges as a way to develop and train models.
2022 might be the year when The Economist’s well-known 2017 statement that data is the new oil comes closer to business reality, with the first ‘commodity exchanges’ for data.
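The automated transformations mentioned above could start with something as simple as the sketch below, which drops direct identifiers and replaces linking keys with a keyed hash (HMAC-SHA256, from Python's standard `hmac` module) before a dataset leaves the company. The column lists and the `prepare_export` helper are hypothetical and would come from your own data classification. Pseudonymisation on its own does not make an export compliant, so it complements, rather than replaces, the legal and privacy assessment.

```python
import hashlib
import hmac

# Hypothetical column classification for one export (illustrative only).
DIRECT_IDENTIFIERS = {"name", "email", "phone"}  # dropped entirely
LINKING_KEYS = {"customer_id"}                   # replaced by a keyed hash


def prepare_export(rows, secret_key: bytes):
    """Drop direct identifiers and pseudonymise linking keys with HMAC-SHA256,
    so a buyer can join rows across files without seeing the original IDs."""
    cleaned = []
    for row in rows:
        out = {}
        for col, value in row.items():
            if col in DIRECT_IDENTIFIERS:
                continue  # never leaves the company
            if col in LINKING_KEYS:
                value = hmac.new(secret_key, str(value).encode(), hashlib.sha256).hexdigest()
            out[col] = value
        cleaned.append(out)
    return cleaned


rows = [{"customer_id": 42, "name": "Ana", "email": "ana@example.com",
         "country": "ES", "spend": 120.5}]
export = prepare_export(rows, secret_key=b"keep-this-secret")
print(sorted(export[0]))  # ['country', 'customer_id', 'spend']
```

A keyed hash is used instead of a plain hash because short IDs such as `42` could otherwise be recovered by simply hashing every possible value; using a different key per buyer also stops two buyers from linking their datasets.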
So what data trends are here to stay and what is coming next?
Truthfully, no one can tell you for sure, and this is just my opinion, so we will have to play along and see!
There are almost certainly more than these 7 trends, but I have chosen to focus on the high-level ones in order to provide a rough prediction of what may shape corporate strategies and business plans around the world. To recap, these are the 7 trends we have discussed and that you can expect to see in the Analytics and Artificial Intelligence space in 2022:
- Data Lakehouses as a hybrid architecture that allows efficient processing and analysis of structured, semi-structured and unstructured data sets.
- Low and no-code data solutions will continue to be a way to democratise Data Science, but new supervisor roles may appear around them.
- An AI-enhanced workforce will continue to rise, with analytical mechanisms and automation becoming the norm.
- Model efficiency and simplicity will be a defining metric more than ever.
- Data Science teams will show significant interest in Multi-purpose AI as a way to efficiently reuse pieces of modelling work from previously tested developments.
- People Analytics will be one of the most sought-after data initiatives that can realistically support business goals.
- Data Marketplaces and data exchanges will present a new revenue opportunity for businesses that generate large or unique datasets.