Used Logistic Regression, Random Forest, and XGBoost to predict the outcome of Search & Destroy games from the Call of Duty World League for the 2018 and 2019 seasons.

Overview

Call of Duty World League: Search & Destroy Outcome Predictions

CWL Image

Growing up as an avid Call of Duty player, I was always curious about what factors led to a team winning or losing a match. Was it strictly based on the number of kills each player obtained? Was it who played the objective more? Or was it something different? Finally, after years of waiting, I decided that it was time to find my answers. Coupling my love for Call of Duty and my passion for data science, I began to investigate predicting the outcome of Search & Destroy games from the Call of Duty World League's 2018 and 2019 seasons.

Using Python, I created a Logistic Regression binary classification model that provided insight into the significant factors that lead teams to win Search & Destroy matches. Did you know that every time a player has exactly two kills in a round, the team's odds of winning increase by 59%? Or that every time a team defuses the bomb, their odds of winning the match increase by 54%? What about when someone on the team commits suicide? The team's odds of winning the match decrease by a whopping 43%!

I also built an XGBoost model and a Random Forest model to see how accurately I could predict a Search & Destroy match outcome. The XGBoost model was ~89% accurate when predicting match outcomes on test data! This model found that one of the least important variables for predicting a team's win or loss is whether the team had a sneak defuse at any point during the match. Although sneak defuses are beneficial to a team's success, it would be more impactful if players eliminated all enemies in the round before defusing the bomb.

Project Goals

  1. Learn about essential factors that play into a team's outcome for Search & Destroy matches
  2. See how well I can predict a team's wins and losses for Search & Destroy matches

What did I do?

I used data from 17 different CWL tournaments spanning two years. If you are curious, you can find each dataset within this Activision repository hosted here. I excluded the data from the 2017 CWL Championships tournament because this set does not have all the Search & Destroy variables that the other datasets have. The final dataset had 3,128 observations with 30 variables. In total, there are 1,564 Search & Destroy matches in this dataset. All variables are continuous; there were no categorical variables within the final data used for modeling besides the binary indicator for the match's outcome.

To reach the first goal of this project, I created a Logistic Regression model to learn about the crucial factors that can either help a team win or pull a team toward a loss. To reach the second goal of this project, I elected to use both Random Forest and XGBoost models for classification to try and find the best model possible at predicting match outcomes.

How did I do it?

Logistic Regression

After joining the data, I grouped the observations by match and team and then filtered for only Search & Destroy games. That way, the data contains one observation for the winning team and one for the losing team of each Search & Destroy match. I used a set of 14 variables for the model development process: Deaths, Assists, Headshots, Suicides, Hits, Bomb Plants, Bomb Defuses, Bomb Sneak Defuses, Snd Firstbloods, Snd 2-kill round, Snd 3-kill round, Snd 4-kill round, 2-piece, & 3-piece. If you are curious, you can find an explanation of each variable in the entire dataset in the Activision repository linked above.
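The filter-and-group step can be sketched with pandas. This is a minimal illustration, not the project's actual code: the column names (`match_id`, `team`, `mode`, `win`, `headshots`, `bomb_defuses`) and the rows are hypothetical stand-ins for the player-level CWL files, and the sketch filters before grouping, which gives the same rows as long as every match has a single game mode.

```python
import pandas as pd

# Hypothetical player-level rows; the real CWL files have many more columns.
players = pd.DataFrame({
    "match_id": ["m1", "m1", "m1", "m1", "m2", "m2"],
    "team": ["A", "A", "B", "B", "A", "B"],
    "mode": ["Search & Destroy"] * 4 + ["Hardpoint"] * 2,
    "win": [1, 1, 0, 0, 0, 1],
    "headshots": [3, 2, 1, 0, 5, 4],
    "bomb_defuses": [1, 0, 0, 0, 0, 0],
})

# Keep only Search & Destroy rows.
snd = players[players["mode"] == "Search & Destroy"]

# Collapse players into one row per match and team:
# the outcome is shared by teammates, the counts are summed.
team_rows = snd.groupby(["match_id", "team"], as_index=False).agg(
    win=("win", "max"),
    headshots=("headshots", "sum"),
    bomb_defuses=("bomb_defuses", "sum"),
)
print(team_rows)
```

Each match then contributes exactly two rows, one per team, which is why the final dataset has twice as many observations as matches.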

Because the goal is to classify wins and losses correctly, I used the area under the receiver operating characteristic curve (AUROC) as the metric for choosing the best model. I chose AUROC because it balances the True Positive Rate against the False Positive Rate. The Logistic Regression model with the highest AUROC on training data used the following variables: Assists, Headshots, Suicides, Bomb Defuses, Snd 2-kill round, Snd 3-kill round, & Snd 4-kill round. I then used this model to predict the test data, which produced the following ROC curve:

Logistic AUROC Graph
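For readers who want to see how an AUROC value like this is computed, here is a minimal sketch using scikit-learn. It runs on synthetic count data generated inside the snippet, not on the CWL dataset, and the three features and their effect sizes are assumptions chosen only for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic stand-in for three team-level count variables (not the CWL data).
X = rng.poisson(lam=[4.0, 1.0, 0.3], size=(400, 3)).astype(float)

# Assumed effects: the first two columns help, the third hurts.
logits = 0.05 * X[:, 0] + 0.4 * X[:, 1] - 0.6 * X[:, 2] - 0.4
y = (rng.random(400) < 1.0 / (1.0 + np.exp(-logits))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# AUROC is computed from predicted win probabilities, not from hard 0/1 labels.
auc = roc_auc_score(y_test, model.predict_proba(X_test)[:, 1])
print(f"test AUROC: {auc:.3f}")
```

An AUROC of 0.5 corresponds to random guessing and 1.0 to perfect separation of wins from losses.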

It is worth noting that this model was 75% accurate when predicting wins and losses on test data. I expected it to perform worse given the small number of variables used, but these variables appear to do an excellent job of distinguishing wins from losses in Search & Destroy matches. You can find the actual values in the confusion matrix built by this model here.
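As a reminder of how accuracy relates to a confusion matrix, the sketch below uses scikit-learn on ten made-up labels. The labels are toy values invented for the example and are unrelated to the model's actual predictions.

```python
from sklearn.metrics import accuracy_score, confusion_matrix

# Toy labels (1 = win, 0 = loss); these are NOT the project's predictions.
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]

# scikit-learn orders the matrix as [[TN, FP], [FN, TP]] for labels 0 and 1.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

# Accuracy is the share of correct predictions: (TP + TN) / total.
acc = accuracy_score(y_true, y_pred)
print(cm)
print(f"accuracy: {acc:.2f}")
```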

Random Forest & XGBoost

For the second goal of this project, I used both Random Forest and XGBoost classification models to see just how well I could predict the outcome of a match. Neither of these algorithms shares the assumptions of Logistic Regression, so I used the complete set of 14 variables for each technique. I first built both models without optimizing hyperparameters so that each algorithm had a baseline. After this, I ran a grid search over the hyperparameters of each model to find the best possible tune for the data.
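The baseline-then-grid-search workflow can be sketched as follows. This is an illustration under assumptions: it uses synthetic data with 14 features, shows only the Random Forest half, and the hyperparameter grid is a small made-up example rather than the grid used in the project. `xgboost.XGBClassifier` follows the scikit-learn estimator interface, so it could be passed to `GridSearchCV` in the same way with its own parameter grid.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, train_test_split

# Synthetic data with 14 features, standing in for the 14 match variables.
X, y = make_classification(
    n_samples=300, n_features=14, n_informative=6, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Baseline: default hyperparameters, no tuning.
baseline = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Hypothetical grid; cross-validated AUROC on the training data picks the winner.
param_grid = {"n_estimators": [50, 100], "max_depth": [3, None]}
search = GridSearchCV(
    RandomForestClassifier(random_state=0),
    param_grid,
    scoring="roc_auc",
    cv=3,
)
search.fit(X_train, y_train)

# The test set is only touched once, after the model has been selected.
test_auc = roc_auc_score(y_test, search.best_estimator_.predict_proba(X_test)[:, 1])
print(search.best_params_, round(search.best_score_, 3), round(test_auc, 3))
```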

I found that the optimized XGBoost model had a higher AUROC value than the optimized Random Forest model on training data, so I used the XGBoost model to predict the test data. This model produced the following ROC curve:

XGBoost AUROC Graph

As expected, this model did much better than the Logistic Regression for predicting match outcomes! This model is ~89% accurate when predicting wins and losses on test data. You can find the confusion matrix for this model here.

What did I find?

From the Logistic Regression model, I found that a team's odds of winning the entire match increase by ~5% every time someone gets a kill with a headshot and by ~54% every time the bomb gets defused. A team's odds of winning also increase by 59% every time a player has exactly two kills in a round, by ~115% every time a player has exactly three kills in a round, and by ~121% every time a player has exactly four kills in a round. I also found that a team's odds of winning the entire match decrease by 43% every time a player commits suicide and (oddly enough) by 0.34% every time a player receives an assist.
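Percentages like these come from exponentiating logistic regression coefficients: a coefficient β multiplies the odds by exp(β), so the percent change in odds is (exp(β) − 1) × 100. The sketch below shows the conversion in both directions. The coefficients it prints are back-calculated from the percentages reported above, not read from the fitted model, and the helper names are hypothetical.

```python
import math


def pct_change_in_odds(coef: float) -> float:
    """Percent change in odds for a one-unit increase in a variable."""
    return (math.exp(coef) - 1.0) * 100.0


def coef_from_pct(pct: float) -> float:
    """Inverse conversion: the coefficient implied by a percent change in odds."""
    return math.log(1.0 + pct / 100.0)


# Percentages reported above; coefficients are implied, not taken from the model.
reported = {"two-kill round": 59.0, "bomb defuse": 54.0, "suicide": -43.0}

for name, pct in reported.items():
    beta = coef_from_pct(pct)
    print(f"{name}: implied coefficient {beta:+.3f}, odds change {pct_change_in_odds(beta):+.1f}%")
```

Because the effect is multiplicative, repeated events compound rather than add: under this model, two bomb defuses multiply the odds by 1.54 × 1.54 ≈ 2.37, not by 2.08.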

I recommend that professional COD teams looking to raise their Search & Destroy win percentage find and recruit players with a high number of bomb defuses and headshots in Search & Destroy games. If I were a coach, I would be looking to grab Arcitys, Zer0, Clayster, Rated, & Silly. These five players have high counts of headshots and defuses in Search & Destroy matches.

If you are curious to learn about the essential variables in the XGBoost model, head over here!

Owner
Brett Vogelsang
M.S. Candidate at the Institute for Advanced Analytics at NC State University.