top of page
Search

Segmenting and Classifying the best Strikers

  • Writer: franklin obiefule
    franklin obiefule
  • Oct 30, 2024
  • 4 min read

Updated: Nov 9, 2024


ree

Link to Github


TOOL USED

PYTHON


DATASET SOURCE ATTRIBUTION

Shahriar's Sight Academy [Udemy: Data Analysis Career Path; 72 Days of Data Analyst Bootcamp]


INTRODUCTION

Soccer Strikers play a pivotal role in deciding the fate of matches and championships. Identifying the best strikers among a pool of talent involves comprehensive data analysis techniques of various factors ranging from performance metrics to personal attributes.


BACKGROUND & MOTIVATION

As a football lover, I am excited to take on this project as it gave me more insights on how coaches, football analysts and scouts can gain valuable insights into the characteristic of top performing strikers and make informed decisions in team selection, recruitment and strategic planning.


The dataset comprises of various variables related to 500 strikers, encompassing both demographic information and performance metrics.


DATA COLLECTION

Total number of strikers in this dataset is 500. Key metrics include nationality, footedness, goals scored, assists, shot accuracy, dribbling success, consistency, versatility, big game performance, off-field conduct and penalty success rate.


DATASET LIMITATIONS

  • Checking for missing values in any column and using median to replace numeric columns and 'most frequent' for nominal columns.

  • Check for correct data types and assign integer data types for specific variables.

  • Performing descriptive analysis on the dataset.


ANALYSIS

Data Cleaning

Download the attached dataset and load it into Jupyter notebook. Load all the relevant and necessary packages for the required tasks.



ree


ree

Check for missing values within any column and use SimpleImputer to impute the missing values.



ree

Use strategy 'median' for numeric and 'most frequent' for nominal columns.

ree

Check for correct data types and assign integer data types for specific variables: 'Goals Scored', 'Assists', 'Shots on Target', 'Movement off the Ball', 'Hold-up Play', 'Aerial Duels Won', 'Defensive Contribution', 'Big Game Performance', 'Impact on Team Performance', 'Off-field Conduct'.

ree

ree

ree

Descriptive Analysis

Perform descriptive analysis on the dataset. Round the output values by 2 decimal points.


ree

Data Visualization

Perform percentage analysis on the variable 'footedness' .


ree

Create a pie chart on the output using matplotlib. Visualize the distribution of players' footedness across different nationalities in a countplot of seaborn.


ree

Statistical Analysis

Determine which nationality strikers have the highest average number of goals scored?


ree

Calculate the average conversion rate for players based on their footedness


ree

What is the distribution of players' footedness across different nationalities?

ree

Visualize the distribution of players' footedness across different nationalities using a stacked bar chart

ree

Create a correlation matrix with a heatmap


ree

ree

Check if there is any significant difference in consistency rates among strikers from various nationalities. Before doing the appropriate tests, must check for assumptions.


ree

Check if there is any significant correlation between strikers' hold-up play and consistency rate. Must check for the assumptions.


ree

ree

Check if strikers' hold-up play significantly influences their consistent rate


ree

Feature Engineering

Create a new feature - Total contribution score


ree

Encode the Footedness and marital status by LabelEncoder


ree

Create the dummies for Nationality and add with the data


ree

Cluster Analysis

Perform KMeans Clustering


ree

ree

ree

ree

ree

Data Preprocessing for ML

New feature mapping


ree

Seleacting and Scaling features

ree

Train test Split and Predictive Classification Analytics(building a logistic regression ML Model to predict strikers type)


ree

Creating a confusion matrix


ree

RESULTS

  1. What is the maximum goal scored by an individual striker?

34

  1. What is the portion of Right-footed strikers within the dataset?

53.4%

  1. Which nationality strikers have the highest average number of goals scored?

Brazil and Spain

  1. What is the average conversion rate for left-footed player?

0.198086

  1. How many left footed players are from France?

42

  1. What is the correlation co-efficient between hold up play and consistency score?

0.147

  1. What is the p-value for the shapiro wilk test of consistency score? Is it normally distributed?

0.451. Yes, normally distributed (p > 0.05)

  1. What is the p-value for the levene's test of ANOVA analysis? Is the heteroscedasticity assumed?

0.808. Yes, the heteroscedasticity is accepted (p > 0.05)

  1. Is there any significant correlation between strikers' Hold-up play and consistency rate?

Yes, there is a weak positive but significant correlation between strikers' Hold-up play and consistency rate .

  1. Describe the beta value of Hold-up Play you have found in your regression analysis.

The beta value should be 0.0015. It describes if the Hold-up Play scores increases by 1 score, the Consistency score increases by 0.0015 points.

  1. What is the average Total contribution score you get for the best strikers?

123.39

  1. What is the accuracy score of your LGR model? How many regular strikers your model predicted correctly? How many best strikers your model predicted incorrectly?

97% accuracy.

42 regular strikers model predicted correctly.

3 best strikers  model predicted incorrectly.


CONCLUSION

Through a comprehensive analysis of the dataset, we've gained valuable insights into the characteristics and performance metrics of strikers. By segmenting and classifying the strikers based on their attributes and performance, we've provided a framework for identifying top-performing strikers and predicting their performance type. This project serves as a valuable resource for football professionals and enthusiasts alike, aiding in talent identification, team selection, and strategic planning.


REFERENCES

  • Shahriar's Sight Academy [Udemy: Data Analysis Career Path; 72 Days of Data Analyst Bootcamp][Transfermarkt](https://www.transfermarkt.com/) provide player statistics.

  • [FIFA](https://www.fifa.com/) and [UEFA](https://www.uefa.com/) for official player performance data. "Pattern Recognition and Machine Learning" by Christopher Bishop.



 
 
 

Comments

Couldn’t Load Comments
It looks like there was a technical problem. Try reconnecting or refreshing the page.
IMG_2214.jpg

 Franklin  Obiefule

I am a data analyst with specializations in Excel, SQL,  PowerBi and Python

bottom of page