Accenture Data Analytics Simulation
- franklin obiefule
- Nov 8, 2024
Updated: Nov 9, 2024

LINK TO GITHUB
TOOL USED
Excel, PowerPoint
SOURCE ATTRIBUTION
INTRODUCTION
This is a virtual data analysis project simulation at ACCENTURE for a client called SOCIAL BUZZ. SOCIAL BUZZ is a company in the social media and content creation industry, established in 2010. It is located in San Francisco and has 250 staff.
BACKGROUND & MOTIVATION
Over the last 5 years, Social Buzz has grown to over 500 million active users each month. They have scaled quicker than anticipated and need the help of an advisory firm to oversee their scaling process effectively. Due to their rapid growth and the digital nature of their core product, the amount of data they create, collect and must analyse is huge. Every day, over 100,000 pieces of content, ranging from text and images to videos and GIFs, are posted. All of this data is highly unstructured and requires extremely sophisticated and expensive technology to manage and maintain.
Our job at Accenture includes:
An audit of their big data practice
Recommendations for a successful IPO
An analysis of their content categories that highlights the top 5 categories with the largest aggregate popularity.
I was motivated to take on this project because it showed me what a real-life project could look like. A successful data analysis project involves the following processes:
Data Understanding
Data Cleaning
Data Modelling
Data Analysis
Discover & Present Insights
DATA COLLECTION
Seven datasets were collected: User, Profile, Location, Session, Content, Reaction and Reaction Types.


DATASET LIMITATION
The raw datasets could not be analysed as provided. They required:
Removing rows that have missing values
Changing the data types of some values within a column
Removing irrelevant columns
ANALYSIS
Here's a model of the 7 datasets

Definitions of different data types:
String - Sequence of characters, digits, or symbols—always treated as text
UUID - Universally Unique Identifiers
Array - List with a number of elements in a specific order—typically of the same type
Integer - Numeric data type for numbers without fractions
Timestamp - Number of seconds that have elapsed since midnight (00:00:00 UTC), 1st January 1970 (Unix time)
Source: Direct extract from Amplitude
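To make the definitions above concrete, here is a minimal Python sketch of one value of each type. The variable names and sample values are hypothetical and are not taken from the Social Buzz data.

```python
import uuid
from datetime import datetime, timezone

# Hypothetical sample values, for illustration only.
content_id = uuid.uuid4()          # UUID: a universally unique identifier
category = "animals"               # String: always treated as text
interests = ["food", "science"]    # Array: ordered elements, typically of one type
score = 70                         # Integer: a number without a fraction
unix_seconds = 86_400              # Timestamp: seconds since 1970-01-01 00:00:00 UTC

# Convert the Unix timestamp into a readable UTC date and time.
as_datetime = datetime.fromtimestamp(unix_seconds, tz=timezone.utc)
print(as_datetime.isoformat())  # 1970-01-02T00:00:00+00:00
```

A timestamp of 86,400 seconds is exactly one day after the Unix epoch, which is why it converts to 2 January 1970.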
The Reaction, Content, and Reaction Types datasets are the relevant ones for this task.
To clarify why I made this selection:
Read carefully, the brief states that the client wanted to see “An analysis of their content categories showing the top 5 categories with the largest popularity”.
As explained in the data model, popularity is quantified by the “Score” given to each reaction type.
We therefore need data showing the content ID, category, content type, reaction type, and reaction score.
So, to figure out popularity, we’ll have to add up the scores for each content category and find which categories have the largest totals.
But! Before we begin to work with the data sets, we’ll need to ensure that the data is clean and ready for analysis…
Raw dataset for Reaction

Raw dataset for Reaction Types

Raw dataset for Content

I cleaned the data by:
removing rows that have values which are missing,
changing the data type of some values within a column, and
removing columns which are not relevant to this task.
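I did the cleaning in Excel, but the three steps above can also be sketched in Python. This is only an illustration under assumptions: the column names, the sample rows, and the date format are hypothetical and may not match the real Social Buzz files.

```python
from datetime import datetime

# Hypothetical raw rows; column names and values are assumptions.
raw_rows = [
    {"Content ID": "c1", "User ID": "u1", "Type": "like", "Datetime": "2021-04-22 15:17:15"},
    {"Content ID": "c2", "User ID": "u2", "Type": "", "Datetime": "2020-11-07 09:43:50"},
    {"Content ID": "c3", "User ID": "u3", "Type": "love", "Datetime": "2021-06-17 12:22:51"},
]

# Columns assumed to be relevant to the task; "User ID" is treated as irrelevant.
RELEVANT_COLUMNS = ["Content ID", "Type", "Datetime"]


def clean(rows):
    """Drop incomplete rows, keep relevant columns, and convert data types."""
    cleaned = []
    for row in rows:
        # Step 1: skip rows with a missing value in any relevant column.
        if any(not (row.get(col) or "").strip() for col in RELEVANT_COLUMNS):
            continue
        # Steps 2 and 3: convert the date text to a datetime and
        # keep only the relevant columns.
        cleaned.append({
            "Content ID": row["Content ID"].strip(),
            "Type": row["Type"].strip(),
            "Datetime": datetime.strptime(row["Datetime"], "%Y-%m-%d %H:%M:%S"),
        })
    return cleaned


cleaned = clean(raw_rows)
print(len(cleaned))  # 2
```

The second sample row is dropped because its reaction type is empty, and the remaining rows no longer carry the "User ID" column.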
Data Modelling
Okay, I'm nearly there!
Now I want to figure out the top 5 categories. To complete data modelling, I followed these steps:
1. Create a final data set by merging the three tables together
Using the Reaction table as my base table, I first joined the relevant columns from the Content data set, and then from the Reaction Types data set.
Function used: VLOOKUP
2. Figure out the Top 5 performing categories
Add up the total scores for each category.
Function used: SUMIF
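The two steps above were done in Excel with VLOOKUP and SUMIF. As a rough equivalent, the same logic can be sketched in Python with dictionary lookups and a running total per category. All IDs, categories, and scores below are made up for illustration, and the function names merge_tables and top_categories are hypothetical.

```python
from collections import defaultdict

# Hypothetical sample data; the real tables are much larger.
reactions = [
    {"Content ID": "c1", "Reaction Type": "love"},
    {"Content ID": "c2", "Reaction Type": "like"},
    {"Content ID": "c1", "Reaction Type": "like"},
    {"Content ID": "c3", "Reaction Type": "love"},  # c3 has no Content row
]
content_by_id = {
    "c1": {"Category": "animals", "Content Type": "photo"},
    "c2": {"Category": "science", "Content Type": "video"},
}
score_by_type = {"love": 3, "like": 2}  # assumed scores, not the real ones


def merge_tables(reactions, content_by_id, score_by_type):
    """Look up content details and score for each reaction (like VLOOKUP)."""
    merged = []
    for reaction in reactions:
        content = content_by_id.get(reaction["Content ID"])
        score = score_by_type.get(reaction["Reaction Type"])
        # Rows without a match are skipped here; VLOOKUP would return #N/A.
        if content is None or score is None:
            continue
        merged.append({**reaction, **content, "Score": score})
    return merged


def top_categories(rows, n=5):
    """Add up scores per category (like SUMIF) and return the n largest."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["Category"]] += row["Score"]
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)[:n]


merged = merge_tables(reactions, content_by_id, score_by_type)
top = top_categories(merged)
print(top)  # [('animals', 5), ('science', 2)]
```

The Reaction rows act as the base table, just as in the spreadsheet, so every output row is one reaction enriched with its category and score.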
The end result should be one spreadsheet which contains:
A cleaned dataset
The top 5 categories
The cleaned dataset

The top 5 categories

Data Visualisation and Storytelling
Link to PowerPoint file on Github https://github.com/franklintheanalyst1/franklintheanalyst1/blob/main/Top%205%20categories%20of%20Social%20Media%20Buzz.pptx
Insights



Video Presentation
RESULTS
The top 5 categories in descending order are:
Animals: 21.3% (aggregate score of 68,624)
Science: 20.3% (aggregate score of 65,405)
Healthy eating: 19.6% (aggregate score of 63,138)
Technology: 19.6% (aggregate score of 63,035)
Food: 19.1% (aggregate score of 61,598)
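The percentages above appear to be each category's share of the combined score of the top 5 categories. That reading is inferred from the numbers and is not stated in the brief. A short Python check under that assumption:

```python
# Aggregate scores as reported in the results above.
scores = {
    "Animals": 68624,
    "Science": 65405,
    "Healthy eating": 63138,
    "Technology": 63035,
    "Food": 61598,
}

# Combined score of the top 5 categories.
total = sum(scores.values())

# Share of each category, as a percentage rounded to one decimal place.
shares = {name: round(100 * score / total, 1) for name, score in scores.items()}

for name, share in shares.items():
    print(f"{name}: {share}%")
```

The combined score is 321,800, and the computed shares match the reported 21.3%, 20.3%, 19.6%, 19.6% and 19.1%.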
CONCLUSION
Animals and science are the top 2 categories, showing that people enjoy “real-life” and factual content.
Technology is also among the top 5 categories, showing that people love science-related and practical content.
Since food and healthy eating are popular categories, insights from them could be used to boost engagement. For example, you could run a campaign with content focused on these categories or work with healthy eating brands to promote content.