Digital Ad Spending crossed 100 Billion Dollars last year in the US alone. Online Advertising over the past decade has become increasingly personalised due to more user data being available. This allows ad-rendering platforms to curate available advertisements to suit user’s preferences. This can prove to be a difficult task because first, there are a large number of advertisements to choose from. We need an algorithm that can alternate between different ads for different users while optimising the CTR. CTR stands for click-through rate: a metric that measures the number of clicks advertisers receive on their ads per number of impressions.
Higher CTR indicates the success of the algorithm. Second, the new advertisements are added and old ones are removed quickly from the active pool of ads. The algorithm initially has no knowledge about user behaviour for the new advertisements. Third, the model needs to learn in real time or initially the model has no knowledge about the user behaviour, it learns in real-time. To gain that knowledge the algorithm needs to explore, i.e randomly try out new ads for the users. Hence the problem comes down to learning from past user behaviour to show the best ads but at the same time exploring new advertisements while maximising Click Through Rate(CTR). There are multiple domains that face similar challenges, Research Funding, Clinical Trials and so on. All these problems fall under a category known as Contextual Multi-Armed Bandit Problem.
In the past, many strategies have been developed to model the Multi-armed bandit problem and provide optimised solutions, Banditron, Confidit, Random Forest to name a few. Most of these strategies have two components: The base learner and the Exploration algorithm. These allow the algorithm to balance exploitation and exploration to provide optimal results. Many efficient contextual bandit solutions have been proposed that provide strong theoretical guarantees. However, it is difficult to extend these solutions to real-world problems due to the presence of multiple tunable parameters that require prior knowledge of the data to set their values. We provide a new Contextual Bandit algorithm, which makes use of adaptive fuzzy inference systems using Thompson sampling-based approach. Several existing models are trained to validate the expected value of reward for a given arm. The proposed algorithm is tested against many existing bandit algorithms on multiple datasets namely, Statlog, Covertype, Mushroom and Adult Income.
Further, we have used our model to detect fraudulent transactions in credit card transactions. The motivation to model fraud detection in credit card transactions as a contextual multi-armed bandit problem is the nature of the learning environment. Normally, credit card transactions are collected on a day to day basis, which is then evaluated by an ensemble model which marks the transactions that are most likely to be fraudulent. These are then evaluated by human experts. However, the sooner a fraudulent transaction is detected the better it is for the company, hence a model that can predict fraudulent transactions in real time can reduce losses and improve performance. Multi-armed bandit problem (MAB) is a special class of sequential optimization problems.
It is a problem where a fixed set of resources should be allocated to a fixed number of competing choices in a way that maximises the expected reward. Consider a scenario where a gambler has to choose between k-bandit machines on each turn. He, however, doesn’t have any knowledge about the reward distribution for each bandit machine, which means he needs to play each machine at least once to gain information about it. He has to alternate between exploiting the best machine based on his current knowledge or explore new machines to expand his knowledge base. This problem gets its name from this scenario. It is generally applied to areas where resources have to be allocated with an optimizing objective. We run into the Multi-Armed bandit problem whenever we run into the dilemma of exploring or exploiting in the problem space.