Bedrock Humanised Intelligence
  • Data by Design
  • Our work
  • Knowledge
    • Podcast
    • Articles
    • La Diaspora
    • La Pipa TV
  • People
    • Team
    • Careers
  • LA PIPA
  • Let’s talk
  • Data by Design
  • Our Work
  • Knowledge
    • Podcasts
    • Articles
    • La Diaspora
    • La Pipa TV
  • People
    • Team
    • Careers
  • La Pipa
Let's talk

Category: Articles

Back to Articles
Abr 29 — 2022

Driving adoption in data science solutions

Driving adoption in data science solutions

Have you ever seen or participated in a project where a great tool was developed and then no one ever used it? It’s a story that is repeated time and time again: when a project is delivered and adoption is not as high as would have been desired. How is it possible if the solution works perfectly and fully meets the specifications?

Maybe because the interface looked confusing to the final users, maybe because they didn´t perceive the value that it would bring, maybe because the users didn’t feel confident about what the solution was doing in the back-end. Or maybe, it is a combination of all these, plus many other reasons. 

Successful data solutions must ensure the involvement of all parties in the development phase or risk becoming “a cool demo collecting dust in a drawer”. Human-centred solutions make all parties feel involved and proud of the process, which naturally drives adoption.

This article includes some steps and thoughts about what’s needed to ensure that a tool has widespread adoption once it’s finished. And by adoption I don’t mean making the use of the tool compulsory (which could also get 100% adoption!) but adoption because the users feel involved in the creation process and see value in the solution.

How to develop meaningful solutions

The starting point should be to understand the tool’s users. Ask the people who are actually going to use the tool: what are your day-to-day problems? Where do you think that there is margin for improvement? Additionally, involve yourself in their processes to really know what they are dealing with; listening to someone’s problem is not the same as actually seeing the problem with your own eyes. This step of the project is crucial, make informed decisions and do not rush them, doing it well the first time saves time and money in the long run.

Once we know the first version of what we want to develop, we need to start actually developing it. As the agile principles say, we should deliver working software frequently and welcome changing requirements. Expectations evolve with business needs, and data driven teams must adapt quickly to ensure they are driving value where needed. Even if no new suggestions are made, keeping everyone updated about the development process is an investment that’s useful in and of itself. A high focus should always be kept on the validation of this code: it’s a two way process, the code should be delivered, the stakeholders should test it, and then feedback should be sent. Otherwise this is rendered almost useless.

If communication with the end user is constant from the beginning and everyone makes sure that they are on the same page, then the requirements are less likely to change. Then if something is found along the way, via constant delivery, it’s corrected early. What’s often most highlighted about this is that it makes the tool better -which is indeed true- but it’s just as important to note that it also helps make everyone feel that they own the solution, so they’ll want to use it, and they’ll also want other people within the organisation to use it.

So in summary, we should involve everyone during the development of the tool and listen to what they think. This way we will get adoption not only by them, but also by everyone in the company.

Highlights for data science solutions

Although in the title I talked about data science solutions, so far everything could apply to any technical (or non-technical) solution. So where does the difference lie when applying this for data science solutions? There are plenty of different factors, and I’d like to highlight these two:

  • Fear of being substituted by an AI that does your job: this is something that I’ve seen in quite a few projects, where the humans who are going to be helped by the AI are scared that they will be replaced. To overcome this they need to understand perfectly what the AI can and cannot do, and why although it’s a great help, it isn’t a substitute for humans. If the users are scared of the tool, adoption will be impossible.
  • The results of these projects are fully dependent on the quality of the data, and this data needs to be provided and explained by humans. Very often, different people and departments need to collaborate in order to get all the data required for a project, so we must involve those people from the beginning and explain the importance of the task.

In summary, the way to get adoption for data science solutions is through constant communication where everyone listens to each other: periodic meetings to explain the status of the project and receive feedback, individual meetings with different stakeholders/departments, and always being reachable by anyone who might have relevant feedback (which is everyone!). This helps develop these projects faster and with better results, both in terms of the quality of the product delivered and in terms of the use of this product.

Back to Articles

Abr 7 — 2022

Outliers in data preprocessing: Spotting the odd one out!

Outliers in data preprocessing: Spotting the odd one out!

More than 2.5 quintillion bytes of data were generated daily in 2020 (source: GS Analytics). To put this in perspective, a quintillion is a million million million or, more simply, a 1 followed by 18 zeros. It is therefore not surprising that a significant amount of this data is subject to errors. Data scientists are aware of this and routinely check their databases for values that stand out from the rest. This process referred to as “outlier identification” in the case of numerical variables, has become a standard step in data preprocessing.

The search for outliers

The search for univariate outliers is quite straightforward. For instance, if we are dealing with human heights and most of the individuals´ measurements are expected to range between 150cm and 190cm, then heights such as 1,70cm and 1700cm must be understood to be annotation errors. Aside from such gross outliers, which should definitely be cleaned when performing data preprocessing tasks, there is still room for outliers that are inherent to the type of data we are dealing with. For instance, some people could be 140cm or 200cm tall. This type of outlier is typically identified with rules of thumb such as the absolute value of the z-score being greater than 3. Unless there is an obvious reason (such as an annotation error), these outliers should not be removed/cleaned in general, still, it is important to identify them and monitor their influence on the modelling task to be performed.

Multivariate outliers

A more difficult problem arises when we are dealing with multivariate data. For example, imagine that we are dealing with human heights and weights and that we have obtained the data represented in the scatterplot below. The individual marked in red is not a univariate outlier in either of the two dimensions separately, however, when jointly considering both height and weight this individual clearly stands out from the rest.

A popular technique for the identification of multivariate outliers is based on the use of the Mahalanobis distance, which is just a measure of how far a point x is from the centre of the data. Mathematically speaking, the formula is as follows:

where mu represents the mean vector (i.e., the centre of the data) and Sigma the covariance matrix, both of them typically being estimated from the data by the sample mean vector and the sample covariance matrix.

Interestingly, the Mahalanobis distance may be used for drawing tolerance ellipses of points that are at a certain Mahalanobis distance from the centre of the data, thus allowing us to easily identify outliers. For instance, returning to the example of human height and weight, it can be seen that the individual marked in red is actually the most outlying point when taking into account the graphical shape of our dataset.

In fact, one could understand the Mahalanobis distance as the multivariate alternative to the z-score. More precisely, ‘being at a Mahalanobis distance d from the centre’ is the multivariate equivalent of ‘being d standard deviations away from the mean’ in the univariate setting. Therefore, under certain assumptions, such as the data being obtained from a multivariate Gaussian distribution, it is possible to estimate the proportion of individuals lying inside and outside a tolerance ellipse. In the case above, we are representing a 95% tolerance ellipse, meaning that around 95% of the data points are expected to lie inside the ellipse if the data is obtained from a multivariate Gaussian distribution.

The identification of multivariate outliers becomes even more problematic as the number of dimensions increases because it is no longer possible to represent the data points in a scatterplot. In such a case, we should rely on two/three-dimensional scatterplots for selected subsets of the variables or for new carefully-constructed variables obtained from dimensional reduction techniques. Quite conveniently, the Mahalanobis distance may still be used as a tool for identifying multivariate outliers in higher dimensions, even when it is no longer possible to draw tolerance ellipses. For this purpose, it is common to find graphics such as the one below, where the indices of the individuals on the dataset are plotted against their corresponding Mahalanobis distances. The blue dashed horizontal line represents the same level as that marked by the tolerance ellipse above. It is easy to spot the three individuals lying outside the tolerance ellipse by looking at the three points above the blue dashed horizontal line and, in particular, the individual marked in red is shown again to clearly stand out from the other data points.

As a drawback of this method for the identification of multivariate outliers, some authors have pointed out that the Mahalanobis distance is itself very influenced by the outliers. For instance, imagine that five additional individuals — also marked in red in the scatterplot below — are added to the dataset. The tolerance ellipse (in red) has now been broadened and contains the individual previously considered as the most outlying. To avoid this problem, we may replace the sample mean vector and the sample covariance matrix in the definition of the Mahalanobis distance by other alternatives that are not strongly influenced by the outliers. A popular option is the Minimum Covariance Estimator (MCD) for jointly estimating the mean vector and the covariance matrix, which will identify a tolerance ellipse that is closer to the original ellipse (the blue one) than to the ellipse heavily influenced by the outliers (the red one).

Another potential drawback for the identification of multivariate outliers is the shape of the dataset since the Mahalanobis distance only takes account of linear relationships between variables. More specifically, the Mahalanobis distance should not be used when there is clear evidence that there exist several clusters of individuals in the data or, more generally, if the shape of the dataset is not somehow elliptical. In this case, we may want to tap into different techniques such as “depth-based” and “density-based” outlier detection techniques.

Conclusion

To summarise, in this article we have seen a prominent technique for outlier identification that should be performed as a data preprocessing task. Additionally, data analysts or data scientists may also be interested in reducing the influence of outliers on the resulting model by considering techniques that are less sensitive to the presence of outliers (for such purposes, the reader is directed to classic books on robust statistics). However, the study of outliers should not end there since it is also important to ultimately analyse the influence of the outliers on the performance of the analytical model. More precisely, one must be careful with the so-called influential points, which are outliers that, when deleted from the dataset, noticeably change the resulting model and its outputs. Further analysis of the reasons why these influential points appear in the dataset must be performed, not only by the data professionals but also by experts with vast specific domain knowledge on the nature of the problem.

Back to Articles

Jan 20 — 2022

7 trends that will define transformational programs and data initiatives in 2022

You can argue about how much the pandemic had to do with the increasing pace at which Artificial Intelligence (AI) was adopted throughout 2021, but what you cannot argue with is that Covid has pushed leaders to accelerate research and work in this field. Managing uncertainty for the past two years has been a major reason for our clients to keep a data-driven business model as their top strategic priority to stay relevant and competitive, empowering them to actively and effectively respond to rapidly shifting situations. However, all of us are faced with a myriad of technology solutions and tools of increasing technical complexity. To help navigate this sheer amount of information, I have prepared a brief summary of my own perspective on what lies ahead for the next year. When putting this article together I found summarising the more than 30 conversations I had when recording our Data Stand-up! podcast really helpful. I spoke with successful entrepreneurs, CIOs, CDOs, Lead Data Scientists from all around the world, and all of them brought a great share of perspectives on the question: Where is data going in 2022 when it comes to supporting business strategies. So, what does 2022 have in store for us? Let‘s dive in!

1. Data Lake Houses

Putting it simply, there have been two “traditional” ways to operationalise data analytics at a business level in terms of the underlying infrastructure used and the type of data being fed:
  • Structured datasets and Data Warehouses: This is about retrieving datasets that display a consistent schema (i.e. data from business applications such as CRMs) that is imported into a Data Warehouse storage solution that then feeds Business Intelligence tools. These “Warehousing architectures” particularly struggle with advanced data use cases. For instance, their inability to store unstructured data for machine learning development is a downside that cannot be overlooked. Furthermore, proprietary Data do not match well with some open-source data science and engineering tools like Spark.
  • Unstructured, semi-structured datasets and Data Lakes: Data lakes were designed to store unprocessed data or unstructured data files such as pictures, audio or video that cannot fit as neatly into data warehouses. Retrieving raw data and importing it directly into a Data Lake without any cleansing or pre-processing in between becomes handy when dealing with these files. The majority of data being generated today is unstructured so it is now imperative to use tools that enable processing and storing unstructured sets. Data lake’s drawback is the difficulty in maintaining data quality and governance standards, sometimes becoming “Data Swamps” full of unprocessed information lacking a consistent schema. This makes it difficult to search, find and extract data at will.
The reality is that both scenarios need to “coexist”, integrating and unifying a Data Warehouse and Data Lake becomes a requirement as analytics teams need structured and unstructured data both indexed and stored. Any modern company needs the best of both worlds by building a cost-efficient resilient enterprise ecosystem that flexibly supports its analytical demands. Meaning, any Data Engineer should be able to configure data pipelines and grant retrieval access to Data Scientists regardless of underlying infrastructure in order to perform their downstream analytics job duties. This is the idea and vision behind The“Data Lakehouse”:
A unified architecture that provides the flexibility, cost-efficiency, and ease of use of Data Lakes with the data granularity, consistency, reliability, and durability of data warehouses, enabling subsequent analyses, ML and BI projects.
There are a few providers out there that offer top-notch Data Lakehouse solutions. Databricks seem to be leading the race and is the industry leader as it was the original creator of the Lakehouse architecture (i.e. Delta Lake). Amazon Web Services (AWS) is another winning horse with a Lakehouse architecture (i.e. Lake Formation + AWS Analytics). Snowflake is also a relevant provider of this emerging “hybrid” infrastructure. I predict that the Data Lakehouse architecture will continue to be in the spotlight in 2022 as companies will also focus on Data Engineering even more than previously. There is already a huge demand for data architects and engineers in charge of platforms, pipelines and DevOps.

2. Low-code and No-code AI. Is it really the future?

Data Science is not just a research field anymore and it has been many years since it was validated as a powerful tool that every area of the business wants a piece of. However, the market continues to struggle to keep up with the filling of new openings as talent demand still exceeds supply. Low-code or no-code platforms were and still are one of the promising solutions to turn this around as they empower non-technical business professionals to “act” as Data Scientists. Moreover, these tools present an added benefit: More people across the organisation may begin to understand what can be done with data and, therefore, know better what questions can be realistically asked. Some well-known solutions such as DataRobot, H2O AutoML, BigML or ML Studio allow the development of practical data applications with little to no programming experience but…
Is it realistic for people who haven’t learned how to code to implement functional and safe analytical systems or AI solutions? Yes, but only if these non-technical professionals are guided and supported.
These days you may find a marketing executive building an NLP solution for sentiment analysis or a Hypermarket operations manager building a demand prediction system, but I must share a word of caution based on recent experience. Codeless does not mean maths-less. Background knowledge of the processes and mathematics behind data transformation, feature engineering and algorithms is needed for the correct ideation and implementation of effective solutions. These days you may find a marketing executive building an NLP solution for sentiment analysis or a Hypermarket operations manager building a demand prediction system, but I must share a word of caution based on recent experience. Codeless does not mean maths-less. Background knowledge of the processes and mathematics behind data transformation, feature engineering and algorithms is needed for the correct ideation and implementation of effective solutions. My take here:
These tools´ adoption will continue to grow and low-code solutions will continue to be a relevant trend in 2022. However, the definition of new roles ( QA, Coaches, Evangelists, etc.) surrounding the adoption of these tools will be needed too.
Many have quickly realised that the supervision and guidance of qualified data professionals is critical, more so when explainable and transparent AI is an upcoming legal prerequisite.

3. Augmented and hybrid human workforce

Employees have been understandably concerned about robots taking over during the last few years, especially when Gartner claimed that one in three jobs will be taken by software or robots, as some form of AI, by 2025. It seems common sense that organisations should have highlighted earlier that AI is only aimed to augment our capabilities, providing us with more time for creative and strategic thinking tasks, and not just replacing people. In my view, Machine Learning will now start to really enhance the lives of employees. Boring and repetitive admin tasks will fade into obscurity and soon be long gone. I believe that 2022 will be the year when we begin to see that AI, in the form of digital-co workers, is really welcomed by people at organisations. Whether you choose to call them robots, RPA systems, or digital co-workers, AI will allow us all to make quicker decisions, automate processes and process vast amounts of information at scale much faster.
In order to remain competitive, businesses of all kinds will have to start designing a hybrid workforce model where humans and “digital co-workers” work hand in hand.
We should still be realistic about expecting automation to fully replace some jobs, but I do hope that reinvented jobs and new positions will balance out all the jobs lost. Cultural adoption barriers still pose a major challenge, but despite popular pessimistic beliefs and potential drawbacks, the redefined augmented workforce is one of the key trends to keep an eye on during 2022 and beyond.

4. Efficiency vs complexity

Whilst a huge chunk of the research efforts and R&D data initiatives by FANGs are directed towards pushing the boundaries of narrow AI in the pursuit of General AI, developing, training and running complex models in this pursuit has inevitably had a negative collateral impact on the environment. Due to the computational power required to fuel some hyper-parameterized models, it is no surprise that data centres are beginning to represent a significant chunk of global CO2 emissions. For reference, back in 2018, the number of parameters in the largest AI models was 94 million parameters and this grew to 1.6 trillion in 2021 as these larger players pushed the boundaries of complexity. Today, these trillions of parameters are language and image or vision-based. Models such as GPT-3 can comprehend natural language, but also require a lot of computational power to function. This has motivated leading organisations to explore how they can effectively reduce their Machine Learning carbon footprint.
Big players have started to look at ways of developing efficient models and this has had an impact in the Data Science community as teams seem to now be looking for simpler models that perform as well as complex ones for solving specific problems.
A relatively simple Bayesian model may sometimes perform as well as a 3D-CNN while using significantly less data and computational power. In this context, “model efficiency” will be another key aspect of modern data science.

5. Multi-purpose modelling

It takes a lot of data sets, hard-to-get talent, costly computing resources and valuable time to ideate, develop and train AI models. Data teams are very familiar with the effort that it takes to deploy a model that works properly and accurately, hence Data Scientists understand that every aspect of the development work should be reapplied if possible in other modelling exercises. We have seen this happening in many industries and this trend seems to be pointing in the direction of training capable general-purpose models that are able to handle very diverse data sets and therefore solve thousands of different tasks. This is something that may be incremental over the next few years.
These multimodal models could be thought of and designed from the beginning to be highly efficient reapplicable tools.
These AI models would be combining many ideas that may have been pursued independently in the past. For instance, Google is already following this vision in a next-generation kind of data architecture and umbrella that they have named Pathways. You should not be surprised if you read about substantial progress in the research field of multi-purpose modelling in the next few months.

6. People Analytics

Dissatisfaction with job conditions, reassessments of work-life balance, and lifestyle alterations due to the hardships of the pandemic led to the Great Resignation, an informal name for the widespread phenomenon of hundreds of thousands of workers leaving their jobs during the COVID-19 era. Also called the Big Quit, it has often been mentioned when referring to the US workforce, but this trend is now international. All pandemic effects are still unpredictable but organisations have been forced to wake up and now seem committed to understanding their people. Companies are looking for effective ways to gain this comprehension of their employees. Many have come to the realisation that People Analytics could be the answer. In my view, there are two main drivers that have encouraged leaders to consider People Analytics:
  • The KPIs that define business value have changed during the past years. In the past, it was related to tangible stuff such as warehouse stock, money in the bank, owned real estate, etc., but value nowadays is highly tied to having a talented workforce that can be an industry reference and that nurtures innovation. This relates to the previous trend about workforce changes where creativity will become more and more important hence the need to have a motivated and innovative team that thinks outside the box.
  • Data Technology and AI now form the backbone of the strategic decision-making toolkit at most advanced companies.
People analytics has become a data-driven tool that allows businesses to measure and track their workforce behaviour in relation to their strategy.
People analytics is built upon the collection of individual talent data and the subsequent analysis of the same, allowing companies to comprehend the evolving workplace, but also surfacing insights that drive customer behaviour and engagement. Moreover, it assists the management and HR units to manage and steer the holistic people strategy by prescribing future actions. These actions may be related, but not limited to, improving talent-related decisions, improving workforce processes and promoting positive employee experience. People Analytics was only adopted by large enterprises with big budgets in the past and it has not been until recently that mid-size organisations joined in too. As of 2020, more than 70 percent of organisations were investing in people analytics solutions to integrate resulting insights into their decision-making. I am pretty certain that this percentage will increase significantly during the next months

7. Data marketplaces

If data is now understood as the new oil and the most valuable asset for any company, data marketplaces may become a mainstream way in 2022 when it comes to exchange and trade information. Even though some companies in specific sectors still jealously guard their data, others have spotted an opportunity in exchanging information. Some platforms such as Snowflake’s Data Marketplace allow businesses to become data providers, enabling them to easily share and monetise large data sets. For enterprises that generate large datasets or highly unique ones as part of their day-to-day activities, some companies may find that it is worthwhile to explore this route as a new way of generating additional revenue. In contrast, a few years back, it was common that medium and large businesses would fully outsource data analytics projects to an IT provider that would eventually use the 3rd parties data without consent. Now that everyone has understood that data is the most valuable asset, data will be exchanged and shared at will, but always with the expectation of something in return. Nevertheless, companies that aim to capitalise on this opportunity need to ideate a robust strategy for it by carefully assessing all legal and privacy implications. Similarly, they will have to build processes that automate the required data transformations so that data exports comply with existing regulations
The rise in AI applications will contribute to the widespread adoption of this trend. Complex models require vast amounts of data to be fed and many will also use these exchanges as a way of developing and training models.
2022 might be the year when the well-known statement by the Economist from 2017 about data being the new Oil will come closer to business reality with the first ‘commodity exchanges’.

Conclusions

There are almost certainly more than these 7 trends, but I have chosen to focus on the high-level ones in order to provide a rough prediction of what may shape corporate strategies and business plans around the world. Now to recap the 7 trends that we have discussed and that you could expect to see in the Analytics and the Artificial Intelligence space in 2022:
  • Data Lakehouses as a hybrid architecture that allows efficient processing and analysis of structured, semi-structured and unstructured data sets.
  • Low and no-code data solutions will continue to be a way to democratise Data Science, but new supervisor roles may appear around them.
  • AI-enhanced workforce will continue to rise where analytical mechanisms and automation are the norm
  • Model efficiency and simplicity will be a defining metric more than ever.
  • Data Science teams will demonstrate significant interest in >Multi-purpose AI as a way to efficiently reutilise pieces of modelling work from previous tested developments.
  • People Analytics will be one of the most sought after data initiatives that can realistically support business goals
  • Data Marketplaces and data exchanges will present a new revenue opportunity for businesses that generate large or unique data sets
  So what data trends are here to stay and what is coming next? Truthfully, no one could tell you and this is just my opinion so we will have to play along and see!
Back to Articles

Jan 11 — 2021

TRANSFORMERS: multi-purpose AI models in disguise (Part 2)

TRANSFORMERS: multi-purpose AI models in disguise

In the first part of this article, we took a look at the Transformer model and its main use cases related to NLP, which have hopefully broadened your understanding of the topic at hand. If you have not read it yet, I suggest you give it a brief glance first since it will help you understand its current standing.

In the second part of this article, we will present novel model architectures and research employing Transformers in several fields unrelated to NLP, as well as showing some code examples of the capabilities of these remarkable new approaches.

 

APPLYING THE TRANSFORMER TO OTHER AI TASKS

As previously mentioned, the Transformer architecture provides a suitable framework designed to take advantage of long-term relationships between words. This allows the model to find patterns and meanings in the sentences, and makes it suited for many tasks in NLP. The most common ones are:

 

  • Text classification into categories, such as obtaining the sentiment of a text
  • Question answering, where the model can extract information from a text when prompted to do so
  • Text generation, such as GPT-3
  • Translation; Google Translate already employs this technology
  • Summarization of a text into few words or sentences

 

After the success of the Transformer model applied to NLP tasks, people began to wonder: If it can find long-term relationships in the data and be trained efficiently, then could it be as efficient in other tasks besides  ? This is the start of the current movement of research, where this model is used as the backbone for many algorithms in AI and machine learning previously dominated by other techniques. Some amazing contributions in other AI fields are:

 

Lip reading and text transcription:

A recurring problem in society is related to text transcription, especially for hearing impaired people. The current advances are divided into two groups: using an audio track and transcribing it, or directly interpreting the words from the person’s lip movements. The latter problem is a lot harder to resolve since many factors are involved.

 

Potentially, this solution could provide help in situations where recording audio is impossible, such as in noisy areas. In this regard, most researchers use CNN or LSTM as the main model interpreting this as a pure Computer Vision task. Recently, some studies such as (https://ieeexplore.ieee.org/document/9172849) have been published where Transformer-based solutions to this problem are presented, providing better results than the current state-of-the-art solutions.

Traffic route prediction at sea:

Figure x: maritime traffic density map as of 2015. (https://www.researchgate.net/figure/2015-worldwide-maritime-traffic-density-map-The-density-is-evaluated-as-the-number-of_fig1_317201419)

 

One of the main focuses of AI models is the prediction of routes, either for individual people or traffic. However, in this regard, the models employed are mostly trained on land traffic since it is easier to obtain data and model it. Regarding sea traffic, the scarcity and dependence on external factors such as weather or sea currents makes it more difficult to provide accurate predictions for the next few hours.

Most of the employed models rely on LSTM or CNN for the same previous reasons, but these models struggle when dealing with long-term predictions and they don’t take into account the specific characteristics of data obtained at sea. A recent study (https://arxiv.org/abs/2109.03958) presents a novel algorithm that takes into account the data’s nuances and provides a vessel trajectory prediction using a Transformer model. The accuracy of the predictions is well above the alternative models available, where long-term predictions are mandatory.

 

Object detection:

This is a subset of Computer Vision and one of the most common AI tasks. In this task, the model can detect certain objects in an image or video and draw a box around them; some common examples are your phone’s face recognition functionality when you take a picture or unlock it, or CCTV detection of license plates.

In this regard, the models that have been employed in the past are mostly based on CNN since these excel at finding relationships in images; the most common ones being SSD and Faster R-CNN. As a result, most algorithms currently used in these tasks have some variation of this model architecture.

However, as was the case for the other tasks, the Transformer architecture has also been experimented with for finding patterns in images. This has lead to several approaches where CNN and Transformer are used jointly, like Facebook’s DETR (https://arxiv.org/abs/2005.12872), or purely Transformer-based architectures like Vision Transformer (https://arxiv.org/abs/2010.11929). The most impactful research in the past few months has been the novel approach of shifted windows in the Swin Transformer (https://arxiv.org/abs/2103.14030), achieving cutting edge results on a number of categories involving image analysis.

LEARN BY SEEING: OBJECT DETECTION WITH DETR

For most of these models, the code and training data is publicly available and open-sourced, which eases their use for inference and fine-tuning. As an example, we will show below how to load and use the DETR model for a specific image.

First, install the dependencies (Transformer, timm) and load an image of a park using its URL:

Figure x: image of pedestrians in a park.

# Install dependencies
!pip install -q transformers
!pip install -q timm

# Load the needed libraries to load images
from PIL import Image
import requests

# In our case, we selected an image of a park
url = ‘https://www.burnaby.ca/sites/default/files/acquiadam/2021-06/Parks-Fraser-Foreshore.jpg’im = Image.open(requests.get(url, stream=True).raw)

# Show the image
im

Then, we apply the feature extractor to resize and normalize the image so the model can interpret it correctly. This will use the simplest DETR model, with the ResNet-50 backbone:

from transformers import DetrFeatureExtractor

feature_extractor = DetrFeatureExtractor.from_pretrained(“facebook/detr-resnet-50”)

encoding = feature_extractor(im, return_tensors=”pt”)

encoding.keys()

Next, load the pre-trained model and pass the image through:

from transformers import DetrForObjectDetection

model = DetrForObjectDetection.from_pretrained(“facebook/detr-resnet-50”)

outputs = model(**encoding)

And that’s it! Now we only have to interpret the results and represent the detected objects with some boxes:

import matplotlib.pyplot as plt

# colors for visualization

COLORS = [[0.000, 0.447, 0.741], [0.850, 0.325, 0.098], [0.929, 0.694, 0.125],[0.494, 0.184, 0.556], [0.466, 0.674, 0.188], [0.301, 0.745, 0.933]]

# Define an auxiliary plotting function
def plot_results(pil_img, prob, boxes):
plt.figure(figsize=(16,10))
plt.imshow(pil_img)

ax = plt.gca()
colors = COLORS * 100

for p, (xmin, ymin, xmax, ymax), c in zip(prob, boxes.tolist(), colors):

ax.add_patch(plt.Rectangle((xmin, ymin), xmax – xmin, ymax -ymin, fill=False, color=c, linewidth=3))

cl = p.argmax()
text = f'{model.config.id2label[cl.item()]}: {p[cl]:0.2f}’
ax.text(xmin, ymin, text, fontsize=15,
bbox=dict(facecolor=’yellow’, alpha=0.5))
plt.axis(‘off’)
plt.show()
import torch

# keep only predictions of queries with 0.9+ confidence (excluding no-object class)
probas = outputs.logits.softmax(-1)[0, :, :-1]
keep = probas.max(-1).values > 0.9

# rescale bounding boxestarget_sizes = torch.tensor(im.size[::-1]).unsqueeze(0)postprocessed_outputs = feature_extractor.post_process(outputs, target_sizes)bboxes_scaled = postprocessed_outputs[0][‘boxes’][keep]# Show the detection results
plot_results(im, probas[keep], bboxes_scaled)

Figure x: image of pedestrians in a park.

The accuracy of these models is remarkable! Even smaller objects, which are harder to detect for the usual neural networks, are identified correctly. Thank you @NielsRogge (https://github.com/NielsRogge) for the awesome implementation (https://github.com/NielsRogge/Transformers-Tutorials) of these models in the Transformers library!

These examples are just the tip of the iceberg of this research movement. The high flexibility of this architecture and the numerous advantages provided are well suited for a number of AI tasks, and multiple advances are being made on a daily basis on multiple fronts. Recently, Facebook AI published a new paper presenting the scalability of these models for CV tasks that has stirred the community quite a bit; you can also check it out here (https://medium.com/syncedreview/a-leap-forward-in-computer-vision-facebook-ai-says-masked-autoencoders-are-scalable-vision-32c08fadd41f).

Will this be the future of all AI models? Is the Transformer the best solution for all tasks, or will it be resigned to its NLP applications? One thing is for sure: for the time being, the Transformer is here to stay!

Back to Articles

Jan 10 — 2021

TRANSFORMERS: multi-purpose AI models in disguise (Part 1)

TRANSFORMERS: multi-purpose AI models in disguise

Novel applications of this powerful architecture set the bar for future AI advances.

If you have dug deep into machine learning algorithms, you will probably have heard of terms such as neural networks or natural language processing (NLP). Regarding the latter, a powerful model architecture has appeared in the last few years that has disrupted the text mining industry: The Transformer. This model has altered the way researchers focus on analysing texts, introducing a novel analysis that has improved the models used previously. In the NLP field, it has become the game-changer mechanism and it is the main focus of research around the world. This has brought the model wide recognition, especially through developments such as OpenAI’s GPT-3 model for the generation of text.

Moreover, it has also been concluded that the architecture of Transformers is highly adaptable, hence applicable to tasks that may seem totally unrelated to each other. These applications could drive the development of new machine learning algorithms that rely on this technology.

The goal of this article is to present the Transformer in this new light, showing common applications and solutions that employ this model, but also remarking on the new and novel uses of this architecture that take into account its many advantages and high versatility.

So, a brief introduction to the Transformer, its beginnings and the most common uses will be presented next. In the second part of this article, we will delve deeper into the new advances being made by the research community, presenting some exciting new use cases and code examples along the way.

It should be noted that AI solutions sometimes lack the responsibility and rigour  required when practising Data Science. The undesired effect is that models can retain the inherent bias of the data sets used to train them, and this can lead to fiascos such as Google’s Photos app. (https://www.bbc.com/news/technology-33347866). I recommend you check out my colleague’s Jesús Templado article on responsible AI and some hands-on criteria to follow when ideating, training or fine-tuning these models.

(https://medium.com/bedrockdbd/part-i-why-is-responsible-ai-a-hot-topic-these-days-da037dbee705).

 
 

TRANSFORMER: APPEARANCE & RESEARCH

NLP is one of the cornerstones of Data Science, and it is involved in most of our daily routines: web search engines, online translations or social networks are just some examples where AI algorithms are applied in the understanding of textual data. Until 2017, most research in this field was focused on developing better models based on recurrent and convolutional neural networks. These models were the highest performers in terms of accuracy and explainability at the time, albeit at the cost of enormous processing power and long training times. This meant the focus of the whole research community was on how to make these models perform better, or how to reduce the machine processing costs. However, a bottleneck was quickly being reached in terms of computational power, and novel ways of analysing text were needed more than ever.

In December 2017, the Transformer model architecture was proposed by Google Brain and Google Research members in the paper Attention is all you need (https://arxiv.org/abs/1706.03762), providing a new approach to NLP tasks through self-attention technology. This architecture completely outperformed previous models, both in terms of accuracy and training time, and quickly became the state-of-the-  architecture for these applications.

One question may come to your mind: How does a Transformer work? How and why is it better? Although we will avoid highly technical explanations, a basic grasp of the fundamentals for each model is needed to understand its many advantages.

Figure x: schema of a neural network. (https://www.w3schools.com/ai/ai_neural_networks.asp)

Neural networks are connections of nodes that represent relationships between data. They consist of input nodes where data is introduced, intermediate layers where it is processed, and output nodes where the results are obtained. Each of these nodes performs an operation on the data (specifically a regression) that affects the final result.

Figure 2: Graphical comparison between a neural network and a RNN. The loop provides the time dimension to the model.

Recurrent neural networks or RNN also take into account the time dimension of the data, where the outcome is influenced by the previous value. This allows the previous state of the data to be kept and sent into the next value. A variation of the RNN named LSTM or long short-term memory also takes into account multiple points, so the result avoids short-term memory issues with the model that the RNN usually presents.

Figure 3: schematic view of a CNN. Feature learning involves the training process, while classification is the model output.

Convolutional neural networks or CNN apply a mathematical transformation called convolution to the data over a sliding window; this essentially looks at small sections of the data to understand its overall structure, finding patterns or features. The architecture is especially useful for Computer Vision applications, where objects are detected after looking at pieces of each picture.

Recurrence is the main advantage of these models and makes them particularly suited for Computer Vision applications, but it becomes a burden when dealing with text analysis and NLP. The computational power increase when dealing with more complex word relationships and context quickly became a limiting factor for the direct application of these models.

 

The advantage of the Transformer is replacing recurrence for Attention. Attention in this context is a relation mechanism that works “word-to-word”, computing the relationship of each word with the rest, including itself. Since this m  components, the computational cost needed is lower than recurrence methods.

In the original Transformer architecture, this mechanism is actually a multi-headed attention that runs these operations in parallel to both speed the calculations, as well as to learn different interpretations for the same sentence. Although other factors are involved, this fact is the main reason why the Transformer takes less time to be trained and produces better results than its counterparts, and the reason why it is the predominant algorithm in NLP.

If you want to learn more about the original Transformer and its most famous variants, I suggest you take a look at Transformers for Natural Language Processing by Denis Rothman; it includes a hands-on explanation and coding lines for each step performed by the model, which helps to understand its inner workings.

Another great thing about the Transformer research community is the willingness to share and spread knowledge. The online community HuggingFace provides a model repository, a Python library and plenty of documentation to use and train new models based on the available frameworks developed by researchers. They also provide a course for those interested in learning about their platform, so this should be the first stop for you, as an interested reader, if you aim to learn more about the current state-of-the-art models!

Using these models is also very easy with the help of their library, in just a few lines of code we can use pre-trained models for different tasks. One of those is the use of over 1000 translation models developed by the University of Helsinki:

# Import the libraries
from transformers import MarianMTModel, MarianTokenizer
import torch

# Load a pretrained “English to Spanish” model
tokenizer = MarianTokenizer.from_pretrained(“Helsinki-NLP/opus-mt-en-es”)

model = MarianMTModel.from_pretrained(“Helsinki-NLP/opus-mt-en-es”)

# Input a sentence
input = tokenizer(“Transformers are a really cool tool for multiple NLP tasks, but they can do so much more!!”, return_tensors = ‘pt’, padding = True)

# Print the results
print(tokenizer.batch_decode(model.generate(**input), skip_special_tokens=True)[0])

The output is the sentence: Los transformadores son una herramienta realmente genial para múltiples tareas NLP, pero pueden hacer mucho más!!

Our team at Bedrock has been able to leverage these models to deliver powerful business solutions to People Analytics companies, further reinforcing their utility in the professional environment!

Stay tuned for the next part of this article, where we will present cutting-edge uses of the Transformer in other areas of application of AI, where previously other models reigned supreme.

Back to Articles

Aug 19 — 2021

The opportunity to apply responsible AI (Part 1): Guidelines, Data Science tools, legal initiatives, and tips.

Intro

Dramatic increases in computing power have led to a surge of Artificial Intelligence applications with immense potential in industries as diverse as health, logistics, energy, travel and sports. As corporations continue to operationalise Artificial Intelligence (AI), new applications present risks and stakeholders are increasingly concerned about the trust, transparency and fairness of algorithms. The ability to explain the behaviour of each analytical model and its decision-making pattern, while avoiding any potential biases, are now key aspects when it comes to assessing the effectiveness of AI-powered systems. For reference, bias is understood as the prejudice hidden in the dataset used to design, develop and train algorithms, which can eventually result in unfair predictions, inaccurate outcomes, discrimination and other similar consequences. Computer systems cannot validate data on their own, but are empowered to confirm decisions and here lies the beginning of the problem. Traditional scientists understand the importance of context in the validation of curated data sets. However, despite our advances in AI, the one thing we cannot program a computer to do is to understand context and we consistently fail in programming all of the variables that come into play in the situations that we aim to analyse or predict.
“A computer cannot understand context and we consistently fail in programming all of the variables that come into play in the situations that we aim to analyse or predict.”

Historical episodes of failed algorithmia and black boxes

Since the effectiveness of AI is now measured by the creators´ ability to explain the algorithm’s output and decision-making pattern, “Black boxes” that offer little discernible insight into how outcomes are reached are not acceptable anymore. Some historical episodes that brought us all here have demonstrated how critical it is to look into the inner workings of AI.
  • Sexist Headhunting: We need to go back to 2014 to understand where all this public awareness on Responsible AI began. Back then, a group of Scottish Amazon engineers developed an AI algorithm to improve headhunting, but one year later that team realised that its creation was biased in favour of men. The root cause was that their Machine Learning models were trained to scout candidates by finding terms that were fairly common in the resumés of past successful job applicants, and because of the industry´s gender imbalance, the majority of historical hires tended to be male. In this particular case, the algorithm taught itself sexism, wrongly learning that male job seekers were better suited for newly opened positions.
  • Racist facial recognition: Alphabet, widely known for its search engine company Google, is one of the most powerful companies on earth, but also came into the spotlight in May 2015.

Mr Alcine tweeted Google about the fact its app had misclassified his photo.

The brand was under fire after its Photo App mislabelled a user´s picture. Jacky Alcine, a black Web developer, tweeted about the offensive incorrect tag, attaching the picture of himself and a friend who had both been labelled as “gorillas” . This event quickly went viral.

 

  • Unfair decision-making in Court: In July 2016, the Wisconsin Supreme Court ruled that AI-calculated risk scores can be considered by judges during sentencing. COMPAS, a system built for augmented decision-making, is based on a complex regression model that tries to predict whether or not a perpetrator is likely to reoffend. The model predicted double the number of false positives for reoffending for African American ethnicities than for Caucasian ethnicities, most likely due to the historical data used to train the model. If the model had been well adjusted at the beginning, it could have worked to reduce unfair incarceration of African Americans rather than increasing it. Also in 2016, an investigation run by ProPublica found that there were some other algorithms used in US courts that tended to incorrectly dispense harsher penalties to black defendants than white ones based on predictions provided by ML models. These models were used to score the likelihood of these same people committing future felonies. Results from these risk assessments are provided to judges in the form of predictive scores during the criminal sentencing phase to make decisions about who is set free at each stage of the justice system, when assigning bail amounts or when taking fundamental decisions about imprisonment or freedom.
  • Apple´s Credit Card. Launched in August 2019, this product quickly ran into problems as users noticed that it seemed to offer lower credit to women. Even more astonishing was that no one from Apple was able to detail why the algorithm was providing this output. Investigations showed that the algorithm did not even use gender as an input, so how could it be discriminating without knowing which users were women and which were men? It is entirely possible for algorithms to discriminate on gender, even when they are programmed to be “blind” to that variable. A “gender-blinded” algorithm may be biased against women because it may be drawing data inputs that originally correlated with gender. Moreover, “forcing” blindness to a critical variable such as gender only makes it more difficult to identify and prevent biases on those variables.
  • Most recently, mainly around 2020, AI-enhanced video surveillance has raised some of the same issues that we have just read about such as a lack of transparency, paired with the potential to worsen existing racial disparities. Technology enables society to monitor and “police” people in real time, making predictions about individuals based on their movements, emotions, skin colour, clothing, voice, and other parameters. However, if this technology is not tweaked to perfection, false or inaccurate analytics can lead to people being falsely identified, incorrectly perceived as a threat and therefore hassled, blacklisted, or even sent to jail. This example became particularly relevant during the turmoil caused by the Black Lives Matter riots and the largest tech firms quickly took action: IBM ended all facial recognition programs to focus on racial equity in policing and law enforcement and Amazon suspended active contracts for a year to reassess the usage and accuracy of their biometric technology to better govern the ethical use of their facial recognition systems.

All these are examples of what should never happen. Humans can certainly benefit from AI, but we need to pay attention to all the implications around the advancements of technology.

Transparency vs effective decision-making: The appropiate trade-off

For high volume, relatively “benign” decision-making applications, such as a TV series recommendation in an Over-The-Top streaming platform, a “black box” model may be seem valid. For critical decision-making models that relate to mortgages, work requests or a trial resolution, black boxes are not an acceptable option.

After reading in the previous 5 examples where AI is ineffectively used to support decisions on who gets a job interview, who is granted parole, and even for making life-or-death decisions, it is clear that there’s a growing need to ensure that interpretability, explainability and transparency aspects are addressed thoroughly. This being said, “Failed algorithmia” does not imply that humans should not strive to automate or augment their intelligence and decision-making, but that it must be done carefully by following clever and strict development guidelines.

AI was born to augment human intelligence, but we need to ensure that it does not evolve towards automating our biases too. AI systems should be deemed trustworthy, relate to human empowerment, technical robustness, accountability, safety, privacy, governance, transparency, diversity, fairness, non-discrimination and societal and enviromental well-being.

“AI was born to augment human intelligence, but we need to ensure that it does not evolve towards automating our biases too.”

This responsibility also applies to C-level leaders and top executives. Global organisations aren’t leading by example yet and still show no willingness or need to expose their models’ reasoning or to establish boundaries for algorithmic bias . All sorts of mathematical models are still being used by tech companies that aren’t transparent enough about how they operate, probably because even those data and AI specialists who know their algorithms are at a risk of bias are still keen to achieve their end goal rather than taking it out.

So, what can be done about all this?

There are some data science tools, best practices, and tech tips that we follow and use at Bedrock.

I will be talking about all this in the second part of this article as well as about the need for guidelines and legal boundaries in the Data Science & AI field.

Back to Articles

Aug 19 — 2021

The opportunity to apply responsible AI (Part 2): Guidelines, Data Science tools, legal initiatives, and tips.

Intro

In the first part of this article we discussed the potential harm and risks of some Artificial Intelligence applications that have demonstrated immense potential across many industries. We concluded that the ability to explain each algorithm’s behaviour and its decision-making pattern is now key when it comes to assessing the effectiveness of AI-powered systems.

In this second part we will be providing some tips, tools and techniques to tackle this challenge. Likewise, we will be commenting on promising initiatives that are happening in the EU and worldwide around responsible AI. Lastly, we will comment on how responsible AI is an opportunity rather than a burden for organisations.

 

Technical guidelines and best practices

As professionals that operate in this field and that can be held accountable for what we develop, we should always ask ourselves two key questions:

  1. What does it take for this algorithm to work?
  2. How could this algorithm fail, and for whom?

Moreover, those developing the algorithms should ensure the data used to train the model is bias-free, and not leaking any of their own biases either. Here are a couple of tips to minimise bias:

  • Any datasets used must represent the ideal state and not the current one, as randomly sampled data may have biases since we live in an unfair way. Therefore, we must proactively ensure that data used represents everyone equally.
  • The evaluation phase should include a thorough “testing stage” by social groups, filtering these groups by gender, age, ethnicity, income, etc. when population samples are included in the development of the model or when the outcome may affect people.

 

What tools Data Scientists have

There are tools and techniques that professionals from our field use when they need to explain complex ML models.

  • SHAP (SHapley Additive exPlanation): Its technical definition is based on the Shapley value, which is the average marginal contribution of a feature value over all possible coalitions. In plain English: It works by considering all possible predictions by using all possible combinations of inputs and by breaking down the final prediction into the contribution of each attribute.
  • IBM’s AIX360 or AI Fairness 360: An open-source library that provides one of the most complete stacks to simplify the interpretability of machine learning programs and allows the sharing of the reasoning of models on different dimensions of explanations along with standard explainability metrics. It was developed by IBM Research to examine, report, and mitigate discrimination across the full AI application lifecycle. It is likely that we will see some of the ideas behind this toolkit being incorporated into mainstream deep learning frameworks and platforms.
  • What-IF-TOOL: A platform to visually probe the behaviour of trained machine learning models with minimal coding requirements.
  • DEON: A relatively simple ethics checklist for responsible data science.
  • Model Cards: Proposed by Google Research provides confirmation that the intent of a given model matches its original use case. Model Cards can help stakeholders to understand conditions under which the analytical model is safe and also safe to implement.

 

The AI greenfield requires strict boundaries

AI represents a huge opportunity for society and corporations, but the modelling processes should be regulated to ensure that new applications and analytical mechanisms always ease and improve everyone’s life. There is not any legal framework that helps to tackle this major issue, that sets boundaries and/or that provides bespoke guidelines. Likewise, there is not any international consensus that allows consistent ruling, audit or review of what is right and wrong in AI. In fact there is not even national consensus within countries.

Specific frameworks such as The Illinois’ Biometric Information Privacy Act (BIPA) in the US are a good start. The BIPA has been a necessary pain for tech giants as it forbids the annotation of biometric data like facial recognition images, iris scans or fingerprints without explicit consent.

There are ambitious initiatives such as OdiseIA that shed some light on what to do across industries and aim to build a plan to measure the social and ethical impact of AI. But this is not nearly enough because of the immediate need of international institutions to establish global consistency. If a predictive model recommends rejecting a mortgage, can the responsible data science and engineering team detail the logical process and explain to a regulator why it was rejected? Can the leading data scientist prove that the model is reliable within a given acceptable range of fairness? Can they prove that the algorithm is not biased?

The AI development process must be somehow regulated, establishing global best-practices as well as a mandatory legal framework around this science. Regulating the modelling process can mean several things: from hiring an internal compliance team that supports data and AI specialists to outsourcing some sort of audit for every algorithm created or implemented.

AI could be regulated in the same way The European Medicines Agency (EMA) in the EU follows specific protocols to ensure the safety, efficacy and adversarial effects for drugs.

 

Emerging legal initiatives: Europe leading the way

On 8th April 2019 the EU High Level Expert Group on Artificial Intelligence proactively set the Ethics Guidelines for Trustworthy AI that were applicable to model development. They established that AI should always be designed to be:

  1. Lawful: Respecting applicable laws and regulations.
  2. Ethical: Respecting human ethical principles.
  3. Robust: Both from a technical and sustainable perspective

The Algorithmic Accountability Act in the USA that dates from November 2019 is another example of a legal initiative that also aimed to set a framework for the development of algorithmic decision-making systems and has also served as a reference to other countries, public institutions and governments.

Fast forward to the present day, the European Commission proposed on 21st April 2021 new rules and actions with the ambition of turning Europe into the global hub for trustworthy AI by combining the first-ever legal framework on AI and a new Coordinated Plan with Member States. This new plan aims to guarantee the safety and fundamental rights of people and businesses, while strengthening AI uptake, investment and innovation across Europe. New rules will be applied in the same way across all European countries following a risk-based approach, and an Intelligence Board will facilitate implementation and drive the development of AI standards.

 

The opportunity in regulation

Governance in AI, such as that which the EU is driving, should not be considered as an evil. If performed accurately, AI regulation will level the playing field, will create a sense of certainty, will establish and strengthen trust and will promote competition. Moreover, governance would allow us to legally frame the boundaries on acceptable risks and benefits of AI monetisation while ensuring that any project is planned for success.

“AI regulation will level the playing field, will create a sense of certainty, will establish and strengthen trust and will promote competition.”

Regulation actually opens a new market for consultancies that help other companies and organisations manage and audit algorithmic risks. Cathy O’Neil, a mathematician and the author of Weapons of Math Destruction, a book that highlights the risk of algorithmic bias in dozens of contexts, heads the Online Risk Consulting & Algorithmic Auditing (ORCAA), a company that was set up to help companies identify and correct any potential biases in the algorithms they use.

Counting on a potential international legislator or auditor would also allow those that achieve the “Audited player label” to project a positive brand image while remaining competitive. Using an analogy that relates to drug development: Modern society relies on medicines prescribed by doctors because there is an inherited trust in their qualifications, and because doctors believe in the compulsory clinical trial processes that each drug goes through before hitting the market.

 

Final thoughts

Simply put, AI has no future without us humans. Systems collecting the data typically have no way to validate the data they collect and in which context the data is recorded and collected. Data has no intuition, strategic thinking or instincts. Technological advancements are shaping the evolution of our society, but each and every one of us is responsible for paying close attention to how AI, as one of these main advancements, is used for the benefit of the greater good.

If you and your organisation want to be ahead of the game, don’t wait for regulation to come to you, but take proactive steps prior to any imposed regulatory shifts:

  • It must be well understood that data is everything. Scientists strive to ensure the quality of any data set used to validate a hypothesis and go to great lengths to eliminate unknown factors that could alter their experiments. Controlled environments are the essence of well-designed analytical modelling.
  • Design, adapt and improve your processes to learn how to establish an internal “auditing” framework. Something like a minimal viable checklist that allows your team to work on fair AI while others are still trying to squeeze an extra 1% accuracy from a ML model. Being exposed to the risk of deploying a biased algorithm that may potentially harm your customers, scientific reputation and your P&L is not appealing.
  • Design and build repositories to document all newly created governance and regulatory internal processes so that all work is accessible and can be fully disclosed to auditors or regulators when needed, increasing external trust and loyalty to your scientific work.
  • Maintaining diverse teams, both in terms of backgrounds, demographics and in terms of skills is important for avoiding unwanted bias. While in the STEM world women and people of colour remain under-represented, these may be the first people to notice these issues if they are part of the core modelling and development team.
  • Be a promoter and activist for change in the field. Ensure that your communications team and technical leaders take part in AI ethics associations or debates of the like. This will allow your organisation to be rightly considered a force for change.

All these are AI strategic mechanisms that we use at Bedrock and that allow the legal and fair utilisation of data. The greatest risk for you and your business not only lies in ignoring the potential of AI, but also in not knowing how to navigate AI with fairness, transparency, interpretability and explainability.

Responsible AI in the form of internal control, governance and regulation should not be perceived as a technical process gateway or as a burden on your board of directors, but as a potential competitive advantage, representing a value-added investment that still is unknown for many. An organisation that successfully acts on its commitment to ethical AI is poised to become a thought leader in this field.

Back to Articles

Jul 23 — 2021

How using adaptive methods can help your network perform better

How using adaptive methods can help your network perform better

Intro

An Artificial Neural Network (ANN) is a statistical learning algorithm that is framed in the context of supervised learning and Artificial Intelligence. It is composed of a group of highly connected nodes called neurons that connect an input layer (input), and an output layer (output). In addition, there may be several hidden layers between the previous two, a situation known as deep learning.

Algorithms like ANNs are everywhere in modern life, helping to optimise lots of different processes and make good business decisions. If you want to read a more detailed introduction to Neural Network algorithms, check out our previous article, but if you’re feeling brave enough to get your hands dirty with mathematical details about ways to optimise them, you’re in the right place!

Optimisation techniques: Adaptive methods

When we train an artificial neural network, what we are basically doing is solving an optimisation problem. A well optimised machine learning algorithm is a powerful tool, it can achieve better accuracy while also saving time and resources. But, if we neglect the optimisation process, we can cause very negative consequences. For instance, the algorithm might seem perfect during the tests and fail resoundingly in the real world, or we could have incorrect underlying assumptions about our data and amplify them when we implement the model. For this reason, it is extremely important to spend time and effort optimising a machine learning algorithm and, especially, a neural network.

The objective function that we want to optimise, – in particular, minimise-, in this case is the cost function or loss function J, which depends on the weights \omega of the network. The value of this function is the one that informs us of our network’s performance, that is, how well it solves the regression problem or classification that we are dealing with. Since a good model will make as few errors as possible, we want the cost function to reach its minimum possible value.

If you have ever read about neural networks, you will be familiar with the classic minimisation algorithm: the gradient descent. In essence, gradient descent is a way to minimise an objective function – J(\omega) in our case – by updating its parameters in the opposite direction of the gradient of the objective function with respect to these parameters.

Unlike other simpler optimisation problems, the J function can depend on millions of parameters and their minimisation is not trivial. During the optimisation process for our neural network, it is common to encounter some difficulties like overfitting or underfitting, choosing the right moment to stop the training process, getting stuck in local minima or saddle points or having a pathological curvature situation. In this article we will explore some techniques to solve these two last problems.

Neural networks

Remember that gradient descent updates the weights \omega of the network in a step t + 1 th as follows mechanisms, concluding what happened and what is likely to happen next.

In order to avoid these problems, we can input some variations in this formula. For instance, we could alter the learning rate \alpha, modify the component relative to the gradient or even modify both terms. There are many different variations that modify the previous equation, trying to adapt it to the specific problem in which they are applied; this is the reason why these are called adaptive methods.

Let’s take a closer look at some of the most commonly used techniques:

  1. Adaptive learning rate

The learning rate \alpha is the network´s hyperparameter that controls how much the model must change, based on the cost function value, each time the weights are updated; it dictates how quickly the model adapts to the problem. As we mentioned earlier, choosing this value is not trivial. If \alpha is too small, the training stage takes longer and the process may not even converge, while if it is too large, the algorithm will oscillate and may diverge.

Although the common approach taking \alpha = 0.01 provides good results, it has been shown that the training process improves when \alpha stops being constant and starts depending on the iteration “t”. Below are three options that rephrase \alpha’s expression:

Exponential decay

Inverse decay

Potential decay

The constant parameter “k” controls how \alpha_t decreases and it is usually set by trial and error. In order to choose the initial value of \alpha, \alpha_0, there are also known techniques, but they are beyond the scope of this article.

Another simpler approach that is often used to adapt \alpha consists in reducing it by a constant factor every certain number of epochs – training cycles through the full training dataset -. For example, dividing it by two every ten epochs. Lastly, the option proposed in [1] is shown below,

where \alpha is kept constant during the first \tau iterations and then decreases with each iteration t.

 

Adaptive optimisers

  • Momentum

We have seen that when we have a pathological curvature situation, the descent of the gradient has problems in the ravines [Image 2],  , in the parts of the area where the curvature of the cost function is much greater along one dimension than the others. In this scenario, the gradient descent oscillates between the ridges of the ravine and progresses more slowly towards the optimum. To avoid this, we could use optimisation methods such as Newton’s known method, but this may significantly raise the computational power requirements since it would have to evaluate the Hessian matrix of the cost function for thousands of parameters.

The momentum technique was developed to dampen these oscillations and accelerate convergence of the training. Instead of only considering the value of the gradient at each step, this technique accumulates information about the gradient in previous steps to determine the direction in which to advance. The algorithm is set as follows:

where \beta \in [0,1] and m_0 is equal to zero.

If we set \beta = 0 in the previous equation, we see that we recover the plain gradient descent algorithm!

As we perform more iterations, the information of gradients from older stages has a lower associated weight; we are making an exponential moving average of the value of the weights! This technique is more efficient than the simple moving average since it quickly adapts the value of the prediction of fluctuations in recent data.

 

  • RMSProp

The Root Mean Square Propagation technique, better known as RMSProp, also deals with accelerating convergence to a minimum, but in a different way from Momentum. In this case we do not adapt the gradient term explicitly:

We have now introduced “v_t” as the exponential moving average of the square of gradients. As an initial value it’s common to take v = 0 and the constant parameters equal to \beta = 0.9 and \epsilon = 10^{-7}.

Let’s imagine that we are stuck at a local minimum and the values of the gradient are close to zero. In order to get out of this “minimum zone” we would need to accelerate the oscillations by increasing \alpha. Reciprocally, if the value of the gradient is large, this means that we are at a point with a lot of curvature, so in order to not exceed the minimum, we then want to decrease the size of the step. By dividing \alpha by that factor we are able to incorporate information about the gradient in previous steps and increase \alpha when the magnitude of the gradients is small.

 

  • ADAM

The AdaptativeMomentOptimization algorithm, better known as ADAM, combines the ideas of the two previous optimisers above.

\beta_1 corresponds to the parameter of the Momentum and \beta_2 to the RMSProp.

We are adding two additional hyperparameters to optimise in addition to \alpha, so some might find this formulation counterproductive, but it is a price to be paid if we aim to accelerate the training process. Generally, the values taken by default are \beta_1 = 0.9, \beta_2 = 0.99 and \epsilon = 10^{-7}.

It has been empirically shown that this optimiser can converge faster to the minimum than other famous techniques like the Stochastic Gradient Descent.

Lastly, it is worth noting that it is common to make a bias correction in ADAM’s equation: This is because at the first stages ​​we would not have much available data from previous ones, and then the formula would be reformulated with

\beta_1 corresponds to the parameter of the Momentum and \beta_2 to the RMSProp.

We are adding two additional hyperparameters to optimise in addition to \alpha, so some might find this formulation counterproductive, but it is a price to be paid if we aim to accelerate the training process. Generally, the values taken by default are \beta_1 = 0.9, \beta_2 = 0.99 and \epsilon = 10^{-7}.

It has been empirically shown that this optimiser can converge faster to the minimum than other famous techniques like the Stochastic Gradient Descent.

Lastly, it is worth noting that it is common to make a bias correction in ADAM’s equation: This is because at the first stages ​​we would not have much available data from previous ones, and then the formula would be reformulated with

Conclusions

In summary, the goal of this article is to introduce some of the problems that may arise when we wish to optimise a neural network and the most well-known adaptive techniques to tackle them. We’ve seen that the combination of a dynamic alpha with an adaptive optimiser can help the network learn much faster and perform better. We should remember, however, that Data Science is a field in constant evolution and while you were reading this article, a new paper may have been published trying to prove how a new optimiser can perform a thousand times better than all the ones mentioned here!

In future articles we will look at how to tackle the dreaded problem of an overfitting model and the vanishing gradient. Until then, if you need to optimise a neural network, don’t settle for the default configuration, use these examples to try to adapt it to your specific real problem or business application 🙂

REFERENCIAS:

[1] Bengio, Y. 2012. Practical Recommendations for Gradient-Based Training of Deep Architectures. arXiv:1206.5533v2.

[2] Intro to optimization in deep learning: Momentum, RMSProp and Adam – Ayoosh Kathuria https://blog.paperspace.com/intro-to-optimization-momentum-rmsprop-adam/

[3] Kingma, Diederik and Jimmy Ba. 2014. Adam: A method for stochastic optimization. arXiv

preprint arXiv:1412.6980.

[4] Zhen Xu, Andrew M. Dai, Jonas Kemp, Luke Metz. 2019. Learning an Adaptive Learning Rate Schedule. arXiv:1909.09712v1.

Back to Articles
Apr 27 — 2021

Quantum Physics II — Data collection, modelling, and analysis at CMS CERN

HOW DATA ANALYSIS AT CERN CAN HELP DETECT DARK MATTER A comprehensive guide of the CERN workflow in new Physics discoveries.

CMS ALGORITHMS FOR ANALYTICS AND DARK MATTER DETECTION

In this second part of the previous introductory article, we’ll tackle the more in-depth description of data collection, object modelling, and data analysis at CMS. The general workflow behind these kinds of experiments is complex, but I’ll try to give a brief description of each part so you can get a general idea of the whole process.

Analysis of a collision and important concepts

We‘ve been talking about particles and detectors, but an important question remains unanswered: what exactly is colliding? The answer is protons, or more exactly, bunches of protons. Accelerated through the LHC by multiple solenoid setups, these bunches come across one another in the detectors, and a collision happens. This proton beam has no perpendicular component, meaning it only moves in a circular manner through the LHC, due to the solenoids’ magnetic field; this is done on purpose since studying the collision in the perpendicular plane means the conservation of energy applies and all momentum must be null, meaning if there’s an imbalance an invisible particle has been produced. A cross-section of the detector can be seen below, with its components (here). Some examples of these particles are neutrinos, but also possible dark matter or other theoretical particles. So now, what’s the next step?

Analysis of a collision and important concepts We‘ve been talking about particles and detectors,

At the interaction vertex, these particles collide and decay products flow through the different parts of the detector, constructed so that measures are taken with high precision and allow for the (mostly) unequivocal identification of particles. However, not everything is as straightforward as it seems, and several events need to be taken into account; I’ll briefly explain them below.

One of the most important magnitudes in colliders is luminosity, which is interpreted as the number of collisions per unit of time and surface. As of 2018, when Run 2 ended, the luminosity value meant that every 25 ns a collision occurred and that almost 150 femtobarns of cumulative data (each femtobarn equals 100 billion collisions!) were analysed.

This raises a significant problem: it’s obvious that more collisions are needed to assess the veracity of models, but if there are too many, the detector won’t be able to keep up with them; this is further complicated since the interactions are among proton bunches. The concept is known as pile-up, and it refers to the average number of collisions that occur each time the detector tries to check the decay products of an interaction. With current data, this value is around 20 collisions, but CERN is allocating its resources to an upgrade called High-Luminosity LHC, which will improve on the integrated luminosity and, in turn, increase the pile-up to almost 200; this is unfeasible with current hardware and software, meaning this project will need serious backing and development; however, the benefits far outweigh the difficulties.

Okay, and now that we know a lot of collisions occur at almost the same time, how is CMS able to discern the decay products of the interaction it wants? Not only that, but the amount of data produced is orders of magnitude over what current hardware and software can process and store. This is the core process of CERN data gathering, and it’s taken care of by several software and hardware solutions commonly referred to as trigger. It’s a really interesting and core part of the data gathering process, but it’s too extensive and technical to discuss here, so I’ll explain the methodology behind it briefly and leave some documentation here, here, and here in case you’re interested. I’ll also leave a diagram below that summarises this information.

The first phase filters down a lot of data using detector hardware to register adequate data in a really short time (almost a millionth of a second!), taking into account information from the muon chambers and calorimeters specifically, and is known as the level-1 trigger or L1. This phase can cut down the total recorded data from 40 MHz up to just 100 kHz, offering the first cut towards useful data. Next, a second set of commercial processors and parallel machines takes that data and further refines it based on precision parameters, operating offline; this is called the high-level trigger or HLT. Then, it creates a backup of the data and sends it to the associated research institutions for analysis. Even at this last stage, recorded data is still around the 1 GB/s mark, showing just how much the trigger and further resources are needed.

CMS Level-1 trigger overview.
Schematic view of the CMS Data Acquisition System

Collision object identification

Now that collision data has been recorded, the next step is to identify the particles based on their traces through the detector. This will allow scientists to reconstruct the particles that have appeared as a result of the collision, using the data collected at the different parts of the detector. These reconstructions of particles at CMS employ an algorithm called Particle Flow; the algorithm itself is really complex and takes into account many measurements and variables to discriminate between particles and correctly tag them, so please check the documentation provided if you’re interested.

With this algorithm, photons and leptons (especially muons) are easily tagged with the algorithm, but more complex objects such as hadrons are harder to tag. The difficulty associated with this jet tagging is related to the nature of the Strong force and a concept called sea quarks that we’ll discuss briefly next.

In a collision between two protons, or more accurately between proton quarks, the immediate thought is that only the constituent quarks can interact, so only up and down quarks would interact. However, a measure of the mass of the proton and these quarks shows that most of the mass doesn’t come from its constituents but from internal bounding energy. This means that this excess energy, if the collider energy is enough, can produce other quarks such as quarks bottom and top, both several times the mass of quarks u and d, and these quarks may be the ones that interact at the collision point; an example of a collision may be seen in the image below.

Schematic interaction of two colliding protons and their partons.

An important concept of jet tagging is that, usually, the decay-inducing quark is unknown since the hadronisation process is so chaotic, but a certain type of jet can be identified. Jets coming from the decay of a b quark have a characteristic second vertex, which has enough travel distance from the first as to be measured (the reason is that b quarks have a longer lifetime than other quarks); for this reason, it’s perceived as a separate important event and receives the name b-tagging. This event has a specific algorithm created to detect it called CVS.

 

Data simulation and model checking

Once meaningful data has been retrieved and objects are recreated, it’s time to check this data against previously known results for the SM. The way CERN takes care of this comparison is by employing data simulations with MonteCarlo samples. These simulations include all data related to processes’ cross-sections (defined as the probability of decay related to all possible decays), decays, and detector components (to the point of knowing the location of each pixel!) so that the uncertainty of these controllable events is minimised and meaningful conclusions can be made; if we want to measure a cross-section for dark matter, which may be very low, these uncertainties could be either the defining point of discovery or just statistical variations.

The algorithms simulate particles moving through the detector, interacting with each other, showing decay channels to high perturbation theory orders, and in general being very precise with the locations and efficiency of each and every part of the detector. The main algorithms used in this simulation are Powheg and aMC@NLO, both of them built on Pythia. Afterwards, the software Geant4 simulates the particle interactions with the CMS detector. These algorithms provide SM-accurate processes for all the different backgrounds needed in the analysis, which we will explain next.

Now that we have collected real collision data and have data simulations, the next step is to define the process we want to study, like a certain decay of particles producing dark matter. To check if dark matter production is possible in this model, the investigator must include in its data all the possible backgrounds; in this context, a background is a decay that leaves the same traces in the detector as the main process we want to study.

It’s mandatory at this state to include blinding to the data, meaning that we shouldn’t include real data until the end of the study; this prevents the investigator from being biased.

Finally, the goal of these discovery projects can be summarised in one sentence: after including the simulation data, the investigator selects measures like the number of jets, tagged b quarks, missing transverse energy, etc. or defines new ones that they think could potentially be used to discriminate the signal (the studied process) against the backgrounds; this means, for example, selecting variable intervals where the dark matter processes are abundant while background processes are not.

Afterwards, the investigator includes the real data and checks if the results are in agreement with the SM. The way this is done is by a hypothesis test of:

  • H0: SM physics
  • H1: BSM physics

This includes doing some advanced statistical analysis, like creating a modified likelihood ratio to obtain confidence intervals (CI) of the cross-sections of these processes. This is very technical so I’ll leave some documentation here that hints at the entire process, but the basic idea behind it is that we can compare the obtained results with what is expected and:

If they’re similar, then the investigator has fine-tuned the limits of the cross-section of the model for future references, and further studies of that model are made easier.
If they aren’t similar, then they might have discovered something new! In particle physics, this case is presented when differences between theoretical and experimental cross-sections exceed 2𝝈 uncertainty, but it’s not classified as a discovery until the 5𝝈 threshold; 𝝈 in this context refers to the confidence interval associated, and this article gives a really good explanation of this concept. This shows that, in particle physics, we are really certain of the results obtained.

As an example, the most recent case of this 5𝝈 threshold was the discovery of the Higgs boson back in 2012.

Conclusion

The goal of this article is to show the reader the workflow of a CMS investigator researching a certain process, either a search for new particles like dark matter or already studied processes. From the detector components to the data collection, simulation and analysis, I hope you have acquired a general understanding of these concepts, albeit very superficial. In that regard, the literature regarding this subject is written by CERN physicists, so assumptions about all these steps are regularly made, and the average reader will be lost in the concepts and common slang employed.

I hope this article has helped you get a better understanding of this workflow, and that it’ll maybe spark some interest in particle physics, helping you understand further research done and news stories about it.

Back to Articles
Apr 27 — 2021

Quantum Physics I — the CERN data workflow in new Physics discoveries.

HOW DATA ANALYSIS AT CERN CAN HELP DETECT DARK MATTER A comprehensive guide of the CERN workflow in new Physics discoveries.

HOW DATA ANALYSIS AT CERN CAN HELP DETECT DARK MATTER

A comprehensive guide of the CERN workflow in new Physics discoveries.

Are you interested in learning more about particle physics? You might have heard terms like neutrinos, quarks, or dark matter mentioned before and want to know more about them. However, the literature and articles involved mainly use terms and concepts that the reader is supposed to already know, which makes them inaccessible to anyone not in this field.

In this article, I’ll provide an easy-to-understand explanation about everything you’d need to know to understand the main points of these articles: the main results, how they’ve been obtained, and the methodology behind their collection. It will be divided into two parts:

  • The first one will focus on the basic concepts regarding this subject: a basic Standard Model (SM) overview and a description of dark matter and the Compact Muon Solenoid (CMS) detector.
  • The second one will focus on the details of data collection and analysis at CERN; this one is where the Data Science component will be.

My experience with this subject comes from my undergraduate project in Physics dedicated to a dark matter model verification with LHC Run 2 data, where my love for experimental particle physics was consolidated. I hope this article will help you understand the basics of these studies at CERN and get you started in further publications on this topic.

Standard Model Basics

We’ll start with a short description of the Standard Model (SM), the theory that describes the structure of matter and its interactions. It’s worth noting that the terminology model comes from the 1970s, when this theory didn’t have enough experimental evidence to support it as such, while nowadays it does.

The SM postulates the existence of two kinds of particles: fermions, which compose all visible matter, and baryons, that mediate the fundamental interactions (ElectroMagnetism, Strong and Weak forces, and Gravity; integrating this last one is still one of the biggest mysteries of modern physics). According to this theory, every interaction between two particles is mediated via the exchange of a boson. Below you can see a simple diagram of these particles and their classifications. The main difference between these two types of particles is their spin: fermions have half-integer spins, while bosons have integer spins, and this is the main reason the physics around these particles are so different.

Standard Model of Elementary Particles
Fermions are further divided into two families of 6 particles each: leptons (electron, muon, tau, and their respective neutrinos; these are particles with little mass and no electric charge that appear in the decay of their associated lepton) and quarks; the main difference between these is that quarks are affected by the Strong force, while leptons aren’t, and the reason for this comes from the charges of these particles. Another kind of particle not shown in the previous graphic is the antiparticle, which is the same particle, but with opposite electric charge; you may know this by its much cooler name of antimatter. Furthermore, quarks cannot be found as independent particles and must form bigger particles called hadrons; the most famous ones being the proton and neutron, composed of quark configurations uud and udd respectively; this phenomenon of hadronization of quarks is called confinement and is a really interesting topic for those interested. Since the fundamental interactions are mediated by bosons, these must couple with the corresponding fermions, but that is only possible if they share the same type of charge. To further explain this, let’s examine each of these forces:
  • ElectroMagnetism: this force is mediated by the photon (γ) between particles that share electric charge; this includes all quarks, all leptons except neutrinos, and both bosons W⁺ and W⁻.
  • Weak force: this is the main force behind the decay of some particles into others, like radioactivity, and is mediated by the bosons W⁺, W⁻, and the boson Z⁰ (the superscript indicates their electric charge). The charge needed for this interaction is the weak charge, and all fermions have it.
  • Strong force: this is the force that binds the atomic nucleus together, and is mediated by the gluon (g), which has no electric or weak charges. However, a big difference with the previous interactions is that the gluon possesses colour (this is the name given to the strong charge). Its name comes from the fact that this interaction is several times stronger than electromagnetism, thus showing how the nucleus can exist even when it’s made out of protons that should be repelling one another. Only quarks have colour, and as such are the only particles affected by the strong force.
The last piece of the puzzle in the SM is the Higgs boson, which couples to every particle (including itself!) except neutrinos, and is responsible for the mass of the particle. This may seem like a lot of information at once, but don’t worry, the goal of this overview of the SM is to give some context and familiarise you with the forces and particles that rule our universe, you can always reread it later.

What is Dark Matter?

We’ve talked about dark matter in this article before, but we haven’t really given any sort of description of it; we’ll tackle this briefly in this section, discussing the evidence that supports its existence as well.

Several sources, even going back as far as the 1930s, show that astronomical calculations involving galaxy masses and rotational speeds don’t match with the expected results from the observable masses they have. One example of this is the Cosmic Microwave Background or CMB, where we learned that baryonic matter (meaning stars, planets, humans, etc.) only makes up ~5% of the total universe mass; here is some documentation on CMB, gravitational lensing and the Hubble Law that expands on this matter.

Is there a possibility that this is actually some sort of known particle? Some possible candidates could be :
  • Antimatter: this is impossible since the matter-antimatter annihilation process shows very characteristic 𝜸-rays, and these are not seen.
  • Black holes: again, it can’t be since black holes curve light around them, and dark matter doesn’t affect photons.
So, now we know most of the universe is made out of some sort of matter/energy that doesn’t interact electromagnetically, which makes it pretty much invisible to most detectors on Earth, but does interact gravitationally; if it didn’t, then we wouldn’t even know it existed. Since this is the case, the naming convention for this new type of matter is dark matter, the dark alluding to the fact that it doesn’t interact normally with baryonic matter. As you can guess, knowing that around 90% of the universe is made out of something we don’t know makes this one of the main research topics of modern-day physics, including searches in particle accelerators like LHC.

The CMS detector of the LHC

The LHC is the biggest, most powerful particle collider in the world, and provides the best opportunity to discover new particles, like dark matter. In this article, we’ll focus on CMS (Compact Muon Solenoid), but the data collection and general workflow apply to the other three main detectors: ATLAS, ALICE, and LHCb. Its name comes from the muon chambers that, combined with the solenoid, offer the best muon detection resolution available today. There are several components in CMS that can detect all traces of different particles leaving the collision point; if you want a more detailed explanation take a look at this article. There are lots of components, as you can see, but the important thing you have to remember is that their purpose is to analyse the different tracks that different particles leave: electric charge, movement traces, energy, etc. With all of these concepts introduced, we’re ready to dive deeper into the specifics of data collection at CERN in our second part!

Posts navigation

Older posts

Newsletter
Discover the latest on our work, scientific breakthroughs, content, and events.

By clicking "subscribe" I accept the privacy policy.

Data by Design
Our work
Podcast
Articles

Team
Careers
La Pipa
Contact

Let's talk

+34 984 886 003
+44 20 4525 1661

©BEDROCK Intelligence S.L.U. | hello@bedrockdbd.com | Privacy Policy & Cookies