Segmenting and Classifying the Best Strikers

  • Writer: franklin obiefule
  • Oct 30, 2024
  • 4 min read

Updated: Nov 9, 2024


Link to Github


TOOL USED

PYTHON


DATASET SOURCE ATTRIBUTION

Shahriar's Sight Academy [Udemy: Data Analysis Career Path; 72 Days of Data Analyst Bootcamp]


INTRODUCTION

Soccer strikers play a pivotal role in deciding the fate of matches and championships. Identifying the best strikers in a pool of talent calls for careful data analysis of many factors, ranging from performance metrics to personal attributes.


BACKGROUND & MOTIVATION

As a football lover, I was excited to take on this project because it showed me how coaches, football analysts and scouts can learn the characteristics of top-performing strikers and make informed decisions on team selection, recruitment and strategic planning.


The dataset comprises variables describing 500 strikers, covering both demographic information and performance metrics.


DATA COLLECTION

The dataset contains 500 strikers. Key variables include nationality, footedness, goals scored, assists, shot accuracy, dribbling success, consistency, versatility, big game performance, off-field conduct and penalty success rate.


DATASET LIMITATIONS

  • Check every column for missing values; impute numeric columns with the median and nominal columns with the most frequent value.

  • Check for correct data types and assign integer data types to specific variables.

  • Perform descriptive analysis on the dataset.


ANALYSIS

Data Cleaning

Download the attached dataset and load it into a Jupyter notebook. Import the packages needed for the tasks that follow.






Check for missing values within any column and use SimpleImputer to impute the missing values.



Use strategy 'median' for numeric columns and 'most_frequent' for nominal columns.
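The original notebook screenshots are not reproduced here, so the following is only a minimal sketch of the imputation step. The toy DataFrame and its values are made up for illustration; the real dataset has many more columns.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Toy stand-in for the strikers data (values are invented for illustration).
df = pd.DataFrame({
    "Goals Scored": [12.0, np.nan, 20.0, 15.0],
    "Assists": [4.0, 7.0, np.nan, 5.0],
    "Footedness": ["Right-footed", np.nan, "Left-footed", "Right-footed"],
})

# Count missing values per column before imputing.
print(df.isna().sum())

numeric_cols = df.select_dtypes(include="number").columns
nominal_cols = df.select_dtypes(exclude="number").columns

# Median for numeric columns.
df[numeric_cols] = SimpleImputer(strategy="median").fit_transform(df[numeric_cols])

# Most frequent value for nominal columns.
df[nominal_cols] = SimpleImputer(strategy="most_frequent").fit_transform(df[nominal_cols])

print(df)
```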

Check for correct data types and assign integer data types for specific variables: 'Goals Scored', 'Assists', 'Shots on Target', 'Movement off the Ball', 'Hold-up Play', 'Aerial Duels Won', 'Defensive Contribution', 'Big Game Performance', 'Impact on Team Performance', 'Off-field Conduct'.
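A short sketch of the type conversion, assuming the columns have already been imputed so they contain no missing values. Only three of the listed columns are shown, with invented values.

```python
import pandas as pd

# Toy data: counts stored as floats, as often happens after imputation.
df = pd.DataFrame({
    "Goals Scored": [12.0, 15.0, 20.0],
    "Assists": [4.0, 7.0, 5.0],
    "Hold-up Play": [70.0, 65.0, 80.0],
})

# Inspect the current data types.
print(df.dtypes)

# Cast the selected columns to integers.
int_cols = ["Goals Scored", "Assists", "Hold-up Play"]
df = df.astype({col: "int64" for col in int_cols})

print(df.dtypes)
```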



Descriptive Analysis

Perform descriptive analysis on the dataset. Round the output values to 2 decimal places.
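In pandas this is usually a one-liner. The sketch below uses a small invented table rather than the real data.

```python
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "Goals Scored": [10, 14, 21, 34],
    "Assists": [3, 8, 5, 9],
})

# Count, mean, std, min, quartiles and max, rounded to 2 decimal places.
summary = df.describe().round(2)
print(summary)
```

The `max` row of this summary is where a question such as "maximum goals scored by a striker" can be read off.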


Data Visualization

Perform percentage analysis on the variable 'footedness'.
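One way to get the percentages is `value_counts(normalize=True)`. The column name 'Footedness' and the category labels below are assumptions about how the dataset spells them.

```python
import pandas as pd

# Toy data; the label spellings are assumed.
df = pd.DataFrame({
    "Footedness": ["Right-footed", "Left-footed", "Right-footed",
                   "Right-footed", "Left-footed"],
})

# Share of each category as a percentage.
footedness_pct = (df["Footedness"].value_counts(normalize=True) * 100).round(2)
print(footedness_pct)
```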


Create a pie chart of the output using matplotlib. Visualize the distribution of players' footedness across different nationalities using a seaborn countplot.
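A possible version of both charts, side by side. The data is invented, the column names are assumptions, and the styling is a matter of taste rather than what the original notebook used.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy data for illustration only.
df = pd.DataFrame({
    "Nationality": ["Brazil", "Spain", "France", "Brazil", "France", "Spain"],
    "Footedness": ["Right-footed", "Left-footed", "Left-footed",
                   "Right-footed", "Right-footed", "Right-footed"],
})

counts = df["Footedness"].value_counts()

fig, (ax_pie, ax_count) = plt.subplots(1, 2, figsize=(11, 4))

# Pie chart of the overall footedness share.
ax_pie.pie(counts, labels=counts.index, autopct="%1.1f%%", startangle=90)
ax_pie.set_title("Footedness share")

# Countplot of footedness within each nationality.
sns.countplot(data=df, x="Nationality", hue="Footedness", ax=ax_count)
ax_count.set_title("Footedness by nationality")

fig.tight_layout()
```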


Statistical Analysis

Determine which nationality's strikers have the highest average number of goals scored.


Calculate the average conversion rate for players based on their footedness
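Both group averages can be computed with `groupby`. This is a sketch on invented data; 'Nationality', 'Footedness' and 'Conversion Rate' are assumed column names.

```python
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "Nationality": ["Brazil", "Brazil", "Spain", "France", "France"],
    "Footedness": ["Right-footed", "Left-footed", "Right-footed",
                   "Left-footed", "Right-footed"],
    "Goals Scored": [20, 16, 18, 12, 14],
    "Conversion Rate": [0.20, 0.18, 0.25, 0.22, 0.21],
})

# Average goals per nationality, highest first.
avg_goals = df.groupby("Nationality")["Goals Scored"].mean().round(2)
avg_goals = avg_goals.sort_values(ascending=False)

# Keep every nationality that shares the top average, so ties are not hidden.
top_nationalities = avg_goals[avg_goals == avg_goals.max()].index.tolist()

# Average conversion rate per footedness.
avg_conversion = df.groupby("Footedness")["Conversion Rate"].mean()

print(avg_goals)
print(top_nationalities)
print(avg_conversion)
```

Filtering on the maximum instead of taking only the first row matters when two nationalities tie for the highest average.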


What is the distribution of players' footedness across different nationalities?

Visualize the distribution of players' footedness across different nationalities using a stacked bar chart
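A cross-tabulation answers the distribution question, and pandas can plot it directly as a stacked bar chart. The data below is invented and the column names are assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside Jupyter
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "Nationality": ["Brazil", "Brazil", "France", "France", "France", "Spain"],
    "Footedness": ["Right-footed", "Left-footed", "Left-footed",
                   "Left-footed", "Right-footed", "Right-footed"],
})

# Rows are nationalities, columns are footedness categories, cells are counts.
footedness_by_nation = pd.crosstab(df["Nationality"], df["Footedness"])
print(footedness_by_nation)

# Stacked bar chart of the same table.
ax = footedness_by_nation.plot(kind="bar", stacked=True, figsize=(7, 4))
ax.set_ylabel("Number of strikers")
ax.set_title("Footedness across nationalities")
```

A single cell, such as the number of left-footed players from France, can then be looked up with `footedness_by_nation.loc["France", "Left-footed"]`.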

Create a correlation matrix with a heatmap
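A minimal sketch of the correlation heatmap, restricted to numeric columns. The values are invented and the colour map is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Toy data for illustration only.
df = pd.DataFrame({
    "Goals Scored": [10, 14, 21, 18, 25],
    "Assists": [3, 8, 5, 9, 7],
    "Hold-up Play": [60, 72, 65, 80, 77],
    "Nationality": ["Brazil", "Spain", "France", "Brazil", "Spain"],
})

# Pearson correlations between the numeric columns only.
corr = df.select_dtypes(include="number").corr().round(2)

fig, ax = plt.subplots(figsize=(6, 5))
sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1, ax=ax)
ax.set_title("Correlation matrix")
```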




Check whether consistency rates differ significantly among strikers of different nationalities. Before running the test, check its assumptions.
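A sketch of this workflow with SciPy, assuming a one-way ANOVA is the test in question: Shapiro-Wilk for normality, Levene's test for equal variances, then the ANOVA itself. The data is randomly generated and the column name 'Consistency' is an assumption.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Synthetic data: three nationalities drawn from the same distribution.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "Nationality": np.repeat(["Brazil", "France", "Spain"], 30),
    "Consistency": rng.normal(loc=0.75, scale=0.1, size=90),
})

# Assumption 1: normality of the outcome (p > 0.05 means normality is not rejected).
shapiro_stat, shapiro_p = stats.shapiro(df["Consistency"])

# One array of consistency scores per nationality.
groups = [g["Consistency"].to_numpy() for _, g in df.groupby("Nationality")]

# Assumption 2: homogeneity of variances (p > 0.05 means equal variances are not rejected).
levene_stat, levene_p = stats.levene(*groups)

# One-way ANOVA: do the group means differ?
f_stat, anova_p = stats.f_oneway(*groups)

print(f"Shapiro-Wilk p = {shapiro_p:.3f}")
print(f"Levene p = {levene_p:.3f}")
print(f"ANOVA p = {anova_p:.3f}")
```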


Check whether there is a significant correlation between strikers' hold-up play and consistency rate. Check the assumptions first.
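One possible version of this check, assuming Pearson's correlation with a Shapiro-Wilk normality test on each variable beforehand. The data is synthetic, so the numbers it prints are not the project's results.

```python
import numpy as np
from scipy import stats

# Synthetic data with a weak positive relationship.
rng = np.random.default_rng(1)
hold_up_play = rng.normal(loc=70, scale=10, size=200)
consistency = 0.6 + 0.002 * hold_up_play + rng.normal(loc=0, scale=0.1, size=200)

# Assumption check: normality of each variable.
_, hold_up_normal_p = stats.shapiro(hold_up_play)
_, consistency_normal_p = stats.shapiro(consistency)

# Pearson correlation coefficient and its p-value.
r, p_value = stats.pearsonr(hold_up_play, consistency)

print(f"Shapiro-Wilk p (hold-up play) = {hold_up_normal_p:.3f}")
print(f"Shapiro-Wilk p (consistency) = {consistency_normal_p:.3f}")
print(f"r = {r:.3f}, p = {p_value:.4f}")
```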



Check whether strikers' hold-up play significantly influences their consistency rate.
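A simple linear regression answers this question. The sketch below uses `scipy.stats.linregress` on synthetic data, which may not be the library the original notebook used. The slope plays the role of the beta value for hold-up play.

```python
import numpy as np
from scipy import stats

# Synthetic data: consistency rises by 0.002 per hold-up play point, plus noise.
rng = np.random.default_rng(2)
hold_up_play = rng.normal(loc=70, scale=10, size=100)
consistency = 0.6 + 0.002 * hold_up_play + rng.normal(loc=0, scale=0.02, size=100)

# Regress consistency on hold-up play.
result = stats.linregress(hold_up_play, consistency)

# slope: change in consistency for a 1-point increase in hold-up play.
# pvalue: tests the null hypothesis that the slope is zero.
print(f"beta = {result.slope:.4f}, intercept = {result.intercept:.3f}")
print(f"p-value = {result.pvalue:.4g}, R^2 = {result.rvalue ** 2:.3f}")
```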


Feature Engineering

Create a new feature - Total contribution score
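The exact formula for this score is not stated in the post. The sketch below assumes, purely for illustration, that it is the row-wise sum of a few performance columns; the real notebook may combine different columns or weight them.

```python
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "Goals Scored": [20, 10],
    "Assists": [5, 3],
    "Shots on Target": [30, 18],
    "Big Game Performance": [7, 4],
})

# ASSUMPTION: the score is an unweighted sum of these columns.
contribution_cols = ["Goals Scored", "Assists", "Shots on Target", "Big Game Performance"]
df["Total contribution score"] = df[contribution_cols].sum(axis=1)

print(df)
```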


Encode Footedness and Marital Status with LabelEncoder
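A minimal sketch of the encoding. The category labels, including the Yes/No values for marital status, are assumptions about the dataset.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Toy data; the label spellings are assumed.
df = pd.DataFrame({
    "Footedness": ["Right-footed", "Left-footed", "Right-footed"],
    "Marital Status": ["Yes", "No", "No"],
})

# One encoder per column, kept so the integer codes can be mapped back later.
encoders = {}
for col in ["Footedness", "Marital Status"]:
    encoders[col] = LabelEncoder()
    df[col] = encoders[col].fit_transform(df[col])

print(df)
print(list(encoders["Footedness"].classes_))
```

LabelEncoder assigns codes in sorted order of the labels, so here 'Left-footed' becomes 0 and 'Right-footed' becomes 1.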


Create dummy variables for Nationality and add them to the data
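With pandas this can be sketched as `get_dummies` followed by `concat`. The data is invented and the `Nationality_` prefix is my own naming choice.

```python
import pandas as pd

# Toy data for illustration only.
df = pd.DataFrame({
    "Nationality": ["Brazil", "Spain", "France", "Brazil"],
    "Goals Scored": [20, 18, 12, 16],
})

# One 0/1 column per nationality.
dummies = pd.get_dummies(df["Nationality"], prefix="Nationality", dtype=int)

# Append the dummy columns to the original data.
df = pd.concat([df, dummies], axis=1)

print(df)
```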


Cluster Analysis

Perform KMeans Clustering
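A sketch of the clustering step, assuming two clusters because the results distinguish best from regular strikers. The feature selection and the data are invented; the original notebook may have used more features or chosen the number of clusters differently.

```python
import pandas as pd
from sklearn.cluster import KMeans

# Toy data with two clearly separated groups.
df = pd.DataFrame({
    "Goals Scored": [10, 11, 9, 24, 26, 25],
    "Total contribution score": [80, 85, 78, 125, 130, 122],
})

# ASSUMPTION: two clusters, fitted on these two features only.
features = df[["Goals Scored", "Total contribution score"]]
kmeans = KMeans(n_clusters=2, n_init=10, random_state=42)
df["Clusters"] = kmeans.fit_predict(features)

print(df)
```

The cluster numbers themselves are arbitrary: which group is labelled 0 and which 1 can change between runs or random seeds.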






Data Preprocessing for ML

New feature mapping
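The mapping turns cluster numbers into readable striker types. The sketch below assumes the cluster with the higher average Total contribution score is the "best" one; the column name 'Strikers types' and the label strings are hypothetical.

```python
import numpy as np
import pandas as pd

# Toy data: cluster labels as they might come out of KMeans.
df = pd.DataFrame({
    "Clusters": [0, 0, 1, 1],
    "Total contribution score": [80, 85, 125, 122],
})

# ASSUMPTION: the cluster with the higher mean score holds the best strikers.
cluster_means = df.groupby("Clusters")["Total contribution score"].mean()
best_cluster = cluster_means.idxmax()

df["Strikers types"] = np.where(
    df["Clusters"] == best_cluster, "Best strikers", "Regular strikers"
)

print(cluster_means)
print(df)
```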


Selecting and scaling features

Train-test split and predictive classification analytics (building a logistic regression ML model to predict striker type)
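A compact sketch of scaling, splitting and fitting the classifier on synthetic data. The two features, the 80/20 split and the random seeds are assumptions. The scaler is fitted on the training split only, to avoid leaking test information, which may differ from the order used in the original notebook.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic data: two features for 100 regular (0) and 100 best (1) strikers.
rng = np.random.default_rng(42)
regular = rng.normal(loc=[10, 60], scale=[3, 8], size=(100, 2))
best = rng.normal(loc=[22, 80], scale=[3, 8], size=(100, 2))
X = np.vstack([regular, best])
y = np.array([0] * 100 + [1] * 100)

# Hold out 20% of the rows for testing, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Standardize features using statistics from the training split only.
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)

# Fit the logistic regression and score it on the held-out rows.
model = LogisticRegression()
model.fit(X_train_scaled, y_train)
y_pred = model.predict(X_test_scaled)
accuracy = accuracy_score(y_test, y_pred)

print(f"Accuracy: {accuracy:.2%}")
```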


Creating a confusion matrix
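A small sketch of how the confusion matrix is read, using hand-made labels where 0 stands for a regular striker and 1 for a best striker (that coding is an assumption).

```python
from sklearn.metrics import confusion_matrix

# Hand-made true and predicted labels: 0 = regular, 1 = best.
y_test = [0, 0, 0, 1, 1, 1, 0, 1]
y_pred = [0, 0, 1, 1, 1, 0, 0, 1]

# Rows are true classes, columns are predicted classes.
cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
tn, fp, fn, tp = cm.ravel()

print(cm)
print(f"Regular strikers predicted correctly: {tn}")
print(f"Best strikers predicted incorrectly: {fn}")
```

With this layout, the top-left cell counts regular strikers predicted correctly and the bottom-left cell counts best strikers the model missed.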



RESULTS

  1. What is the maximum number of goals scored by an individual striker?

34

  2. What proportion of the strikers in the dataset are right-footed?

53.4%

  3. Which nationality's strikers have the highest average number of goals scored?

Brazil and Spain

  4. What is the average conversion rate for left-footed players?

0.198086

  5. How many left-footed players are from France?

42

  6. What is the correlation coefficient between hold-up play and consistency score?

0.147

  7. What is the p-value of the Shapiro-Wilk test on consistency score? Is it normally distributed?

0.451. Yes, the scores can be treated as normally distributed (p > 0.05).

  8. What is the p-value of Levene's test for the ANOVA? Can homogeneity of variances be assumed?

0.808. Yes, homogeneity of variances (homoscedasticity) can be assumed (p > 0.05).

  9. Is there any significant correlation between strikers' hold-up play and consistency rate?

Yes, there is a weak positive but statistically significant correlation between strikers' hold-up play and consistency rate.

  10. Describe the beta value of Hold-up Play found in the regression analysis.

The beta value is 0.0015: for each 1-point increase in the Hold-up Play score, the Consistency score increases by 0.0015 points.

  11. What is the average Total contribution score for the best strikers?

123.39

  12. What is the accuracy of the logistic regression model? How many regular strikers did the model predict correctly? How many best strikers did it predict incorrectly?

97% accuracy.

The model predicted 42 regular strikers correctly.

The model predicted 3 best strikers incorrectly.


CONCLUSION

Through a comprehensive analysis of the dataset, we've gained valuable insights into the characteristics and performance metrics of strikers. By segmenting and classifying the strikers based on their attributes and performance, we've provided a framework for identifying top-performing strikers and predicting their performance type. This project serves as a valuable resource for football professionals and enthusiasts alike, aiding in talent identification, team selection, and strategic planning.


REFERENCES

  • Shahriar's Sight Academy [Udemy: Data Analysis Career Path; 72 Days of Data Analyst Bootcamp]

  • [Transfermarkt](https://www.transfermarkt.com/) for player statistics.

  • [FIFA](https://www.fifa.com/) and [UEFA](https://www.uefa.com/) for official player performance data.

  • "Pattern Recognition and Machine Learning" by Christopher Bishop.



 
 
 


 Franklin  Obiefule

I am a data analyst with specializations in Excel, SQL,  PowerBi and Python
