
Accenture Data Analytics Simulation

  • Writer: franklin obiefule
  • Nov 8, 2024
  • 3 min read

Updated: Nov 9, 2024


LINK TO GITHUB


TOOL USED

Excel, PowerPoint


SOURCE ATTRIBUTION


INTRODUCTION

This is a virtual data analysis project simulation at ACCENTURE for a client called SOCIAL BUZZ. SOCIAL BUZZ is a social media and content creation company established in 2010. It is located in San Francisco and has a staff of 250.


BACKGROUND & MOTIVATION

Over the last 5 years, Social Buzz has reached over 500 million active users each month. They have scaled quicker than anticipated and need the help of an advisory firm to oversee their scaling process effectively. Due to their rapid growth and the digital nature of their core product, the amount of data they create, collect and must analyse is huge. Every day, over 100,000 pieces of content ranging from text and images to videos and GIFs are posted. All of this data is highly unstructured and requires extremely sophisticated and expensive technology to manage and maintain.

Our job at Accenture includes:

  1. An audit of their big data practice

  2. Recommendations for a successful IPO

  3. An analysis of their content categories that highlights the top 5 categories with the largest aggregate popularity.


I was motivated to take on this project because it showed me what a real-life project could look like. The processes needed for a successful data analysis project include:

  • Data Understanding

  • Data Cleaning

  • Data Modelling

  • Data Analysis

  • Discover & Present Insights


DATA COLLECTION

Seven datasets were collected: User, Profile, Location, Session, Content, Reaction and Reaction Types.



DATASET LIMITATION

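The raw datasets had the following limitations, each of which was handled during cleaning:

  • Some rows had missing values and had to be removed

  • Some values within a column had the wrong data type and had to be changed

  • Some columns were irrelevant to the task and had to be removed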


ANALYSIS

Here's a model of the 7 datasets



Definitions of different data types:

  • String - Sequence of characters, digits, or symbols—always treated as text

  • UUID - Universally Unique Identifiers

  • Array - List with a number of elements in a specific order—typically of the same type

  • Integer - Numeric data type for numbers without fractions

  • Timestamp - Number of seconds that have elapsed since midnight (00:00:00 UTC), 1st January 1970 (Unix time)

Source: Direct extract from Amplitude
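To make the definitions above concrete, here is a minimal Python sketch of one value of each data type. Every value and variable name in it is hypothetical and chosen only for illustration; none of them comes from the Social Buzz datasets.

```python
import uuid
from datetime import datetime, timezone

# All values below are made up for illustration.
content_id = uuid.UUID("12345678-1234-5678-1234-567812345678")  # UUID
category = "animals"                                            # String
interests = ["cooking", "travel", "science"]                    # Array (ordered, same type)
age = 30                                                        # Integer
created_at = 1609459200                                         # Timestamp (Unix seconds)

# A Unix timestamp counts seconds since 1970-01-01 00:00:00 UTC,
# so it can be converted to a readable date and time.
created_dt = datetime.fromtimestamp(created_at, tz=timezone.utc)
print(created_dt.isoformat())
```

In a spreadsheet, the same conversion would need a formula, which is one reason changing data types shows up as a cleaning step.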


The relevant data sets for this task are Reaction, Content, and Reaction Types.

To clarify why I made this selection:

  • Read carefully, the brief states that the client wanted to see “An analysis of their content categories showing the top 5 categories with the largest popularity”.

  • As explained in the data model, popularity is quantified by the “Score” given to each reaction type.

  • We therefore need data showing the content ID, category, content type, reaction type, and reaction score.

  • So, to figure out popularity, we’ll have to add up the scores for each content category and see which categories have the largest totals.

But! Before we begin to work with the data sets, we’ll need to ensure that the data is clean and ready for analysis…

Raw dataset for Reaction

Raw dataset for Reaction Types

Raw dataset for Content

I cleaned the data by:

  • removing rows that have values which are missing,

  • changing the data type of some values within a column, and

  • removing columns which are not relevant to this task.
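I did the cleaning in Excel, but the three steps above can be sketched in plain Python as well. This is only an illustration: the column names, the sample rows, and the helper `clean_rows` are assumptions made for the example, not the actual Social Buzz data or the exact procedure I followed.

```python
from datetime import datetime

# Hypothetical raw rows; column names and values are illustrative only.
raw_reactions = [
    {"Content ID": "c1", "User ID": "u1", "Type": "love", "Datetime": "2021-04-22 15:17:15"},
    {"Content ID": "c1", "User ID": "u2", "Type": "", "Datetime": "2020-11-07 09:43:50"},
    {"Content ID": "c2", "User ID": "u3", "Type": "disgust", "Datetime": "2021-06-17 12:22:51"},
]


def clean_rows(rows, keep_columns, datetime_column):
    """Keep relevant columns, drop incomplete rows, and parse the datetime text."""
    cleaned = []
    for row in rows:
        # Remove columns that are not relevant to the task.
        kept = {col: row.get(col) for col in keep_columns}
        # Remove rows that have a missing value in any remaining column.
        if any(value is None or str(value).strip() == "" for value in kept.values()):
            continue
        # Change the data type of the datetime column from text to datetime.
        kept[datetime_column] = datetime.strptime(kept[datetime_column], "%Y-%m-%d %H:%M:%S")
        cleaned.append(kept)
    return cleaned


cleaned_reactions = clean_rows(raw_reactions, ["Content ID", "Type", "Datetime"], "Datetime")
print(len(cleaned_reactions))  # the row with the empty "Type" is dropped
```

Note that this sketch drops the irrelevant columns first, so a row is only removed when a value is missing in a column that is actually kept.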


Data Modelling

Okay, I'm nearly there!

Now I want to figure out the top 5 categories. To complete data modelling, I followed these steps:


1. Create a final data set by merging the three tables together

  • Using the Reaction table as my base table, I first joined the relevant columns from the Content data set, and then from the Reaction Types data set.

  • Function used: VLOOKUP
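For readers who prefer code, the VLOOKUP-style merge can be approximated with dictionary lookups in Python. This is a rough sketch under assumptions: the column names, the sample rows, the scores, and the helper `merge_tables` are all hypothetical and only mirror the idea of looking up each Reaction row in the other two tables.

```python
# Hypothetical cleaned tables; names, rows, and scores are illustrative only.
reactions = [
    {"Content ID": "c1", "Type": "love"},
    {"Content ID": "c2", "Type": "disgust"},
    {"Content ID": "c1", "Type": "like"},
]
content_by_id = {
    "c1": {"Content Type": "photo", "Category": "animals"},
    "c2": {"Content Type": "video", "Category": "science"},
}
score_by_reaction_type = {"love": 65, "like": 50, "disgust": 0}


def merge_tables(reaction_rows, content_lookup, score_lookup):
    """Use Reaction as the base table and look up Content, then Reaction Types."""
    merged = []
    for row in reaction_rows:
        # Like VLOOKUP on Content ID: fetch the matching Content columns.
        content = content_lookup.get(row["Content ID"], {})
        merged.append({
            "Content ID": row["Content ID"],
            "Reaction Type": row["Type"],
            "Content Type": content.get("Content Type"),
            "Category": content.get("Category"),
            # Like VLOOKUP on reaction type: fetch the matching score.
            # A missing key gives None, much as VLOOKUP would give #N/A.
            "Score": score_lookup.get(row["Type"]),
        })
    return merged


final_rows = merge_tables(reactions, content_by_id, score_by_reaction_type)
for merged_row in final_rows:
    print(merged_row)
```

Keeping Reaction as the base table means the merged data has exactly one row per reaction, which is what the scoring step needs.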

     

2. Figure out the Top 5 performing categories

  • Add up the total scores for each category.

  • Function used: SUMIF
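The SUMIF step amounts to grouping rows by category, summing the scores, and sorting the totals. A minimal Python sketch of that idea follows; the sample rows and the helper `top_categories` are hypothetical and stand in for the merged data set.

```python
from collections import defaultdict

# Hypothetical merged rows; categories and scores are illustrative only.
merged_rows = [
    {"Category": "animals", "Score": 65},
    {"Category": "science", "Score": 50},
    {"Category": "animals", "Score": 30},
    {"Category": "food", "Score": 10},
    {"Category": "science", "Score": 20},
]


def top_categories(rows, n=5):
    """Sum the scores per category (like SUMIF) and return the n largest totals."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["Category"]] += row["Score"]
    # Sort by total score, largest first, and keep the first n categories.
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


ranking = top_categories(merged_rows, n=2)
print(ranking)
```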


The end result should be one spreadsheet which contains:

  1. A cleaned dataset

  2. The top 5 categories


The cleaned dataset


The top 5 categories


Data Visualisation and Storytelling


Insights





Video Presentation


RESULTS

The top 5 categories in descending order are:

  1. Animals: 21.3% (aggregate score of 68,624)

  2. Science: 20.3% (aggregate score of 65,405)

  3. Healthy eating: 19.6% (aggregate score of 63,138)

  4. Technology: 19.6% (aggregate score of 63,035)

  5. Food: 19.1% (aggregate score of 61,598)
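The percentages listed above are consistent with each category's share of the combined score of the top 5 categories. That reading is an inference from the numbers rather than something stated in the brief, and the short Python check below reproduces the figures under that assumption.

```python
# Aggregate scores as reported in the results above.
top5_scores = {
    "Animals": 68624,
    "Science": 65405,
    "Healthy eating": 63138,
    "Technology": 63035,
    "Food": 61598,
}

# Assumption: each percentage is the category's share of the top-5 total.
combined_score = sum(top5_scores.values())
shares = {
    name: round(100 * score / combined_score, 1)
    for name, score in top5_scores.items()
}

for name, share in shares.items():
    print(f"{name}: {share}%")
```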


CONCLUSION

Animals and science are the top 2 categories, which suggests that people enjoy “real-life” and factual content.

Technology is also among the top 5 categories, which suggests that people love science-related and practical content.

Since food and healthy eating are both popular categories, insights from them could be used to boost engagement. For example, you could run a campaign with content focused on these categories or work with healthy eating brands to promote content.




Franklin Obiefule

I am a data analyst with specializations in Excel, SQL, Power BI and Python
