A project aimed at identifying trends in consumer usage of Bellabeat devices, with the goal of leveraging these insights to foster new growth opportunities for Bellabeat and enhance its marketing strategy.
Author: Jesslyn Jane. This is a project I have worked through as part of the final capstone in Google Data Analytics Professional Certificate.
Introduction
I am wearing the hat of a junior data analyst, working in the marketing analyst team at Bellabeat, a high-tech manufacturer of health-focused products for women. Bellabeat is a successful small company, but they have the potential to become a larger player in the global smart device market. Urška Sršen, cofounder and Chief Creative Officer of Bellabeat, believes that analyzing smart device fitness data could help unlock new growth opportunities for the company. They offer different smart devices that collect data on activity, sleep, stress, and reproductive health to empower women with knowledge about their own health and habits. The main objective of this case is to analyze smart devices fitness data and determine how it could help unlock new growth opportunities for Bellabeat. We will focus on one of Bellabeat’s products: Bellabeat app. The Bellabeat app provides users with health data related to their activity, sleep, stress, menstrual cycle, and mindfulness habits. This data can help users better understand their current habits and make healthy decisions. The Bellabeat app connects to their line of smart wellness products.
Method Approach
In this project, I will use 6 steps to ensure its completion:
- Ask
- Prepare
- Process
- Analyze
- Share
- Act
Phase 1: Ask
Business Task: Identify some trends in how consumers use the Bellabeat devices and how these trends can help improve new opportunities growth for Bellabeat as well as marketing strategy.
Phase 2: Prepare
2.1 Dataset Used
In this project, I will be using datasets from FitBit Fitness Tracker Data, which has been published by Möbius in Kaggle under the CC0: Public Domain Creative Common License.
2.2 Dataset Summary
I downloaded zip files provided, then extracted to csv files. There are 18 datasets, but I only used 6 datasets for this study. Each document represents different quantitative data tracked by Fitbit.
2.3 Dataset Limitations and Integrity
I used excel to take a glimpse of the data. The data collected was only during a 31-day period (04-12-2016 to 05-12-2016), so it was quite outdated as fitness trackers matured a lot since then. No demographic information (gender, location, age) collected makes the data bias even higher.
Phase 3: Process
To begin processing the data, I used SQL in BigQuery as one of the data analytics tools, to import the dataset, do the process of cleaning and organizing. The cleaning process included adjusting data type formats, removing duplicates and null data. I extracted the clean data to new csv and stored it. I documented the whole process of cleaning.
3.1 Using Spreadsheet to Convert Data Type
Converting data type can be done by using Microsoft SQL Server Management Studio, which can be downloaded here. If you are interested in doing SQL in this server, you can seek assistance from Sayantan Bagchi’s Bellabeat Case Study.
However, using BigQuery to convert the data type is quite challenging, as you need to convert it using CAST() function everytime in each query statement. As I am still learning and finding ways up until now, I decided to use Spreadsheet to change the data type of all date column into “Date” format.
3.2 Importing Datasets
After converting some date-related data into “Date” format, I opened Bigquery Console, then select “Create Project”. Typed down the name of the project you are going to explore, in this case I used data-analytics-00. I created a new dataset for Bellabeat and named it bellabeat. Inside bellabeat dataset, I imported the .csv datasets I previously downloaded from FitBit Fitness Tracker Data (Note: Don’t forget to unzip the file).
After that, I started my work by finding the total number of users’ id. All the query syntax can be found here.
3.3 Comparing Datasets
There are 3 datasets that seem to have some similarities: daily_calories, daily_intensities, daily_steps with daily_activity. I checked if the id and date of those 3 tables are the same with the one in daily_activity.
It showed that data in calories, intensities, and steps have the same number of rows and described the exact data. We could conclude that the daily data of calories, intensities, and steps are the same with daily_activity, so we only focus on daily_activity data.
The remaining datasets that would be used in this study:
- daily_activity
- sleep_day
- weight_log
3.4 Checking Start-End Date and Id
I checked the start and end date in each dataset, as well as the length of id.
It showed that all datasets have the same start and end date: start 2016-04-12 and end 2016-05-12. In term of id’s length, all datasets also showed the same length: 10 characters.
3.5 Cleaning Data
3.5.1 Find Duplicates
Duplicates might decrease the quality of data, hereby I had to find and remove them.
According to the result of finding duplicates, it showed that there are 3 duplicate rows in sleep_day dataset. We need to create a new sleep_day table, and remove the duplicates in the new table. In this case, I named the new table: sleep_day_new.
3.5.2 Find Unsuitable Data
During the checking and cleaning process, I found that there were some zero data in TotalSteps column inside the daily_activity dataset. Therefore, I decided to check and remove those zero value. I created new table and named it daily_activity_new, so that the previous dataset still remained.
3.5.3 Find Null Data
Null values also contributed in the quality of the data. I checked whether the datasets have null data, then removed them.
Phase 4: Analyze
After cleaning the data, 3-clean dataset will be used to do the analysis process. In this process, I organized and formatted the data, performed some calculations, and identified trends as well as relationships between each variable.
4.1 User Type Distribution by Tracker-Wear
Firstly, I wanted to breakdown the users by how much they wore their FitBit Fitness Tracker. I created three user types:
- Vigorous User → wear their tracker for 21 – 31 days
- Moderate User → wear their tracker for 11 – 20 days
- Gentle User → wear their tracker for 1 -10 days
Key Takeaway: More than 80% of the users used Fitbit Fitness Tracker frequently – between 21 to 31 days (Vigorous User).
4.2 Look Deep into Activity
4.2.1 Average activity minutes by day of week
In this daily_activity_new dataset, I wanted to analyze some parameters. The first one was the average active minutes by day of week.
Key Takeaway:
- Total average active minutes shows slightly difference throughout the week
- Sedentary Minutes are the highest type of active minutes
4.2.2 User type distribution by total steps
Tyron (2003) pointed out in his journal that having steps everyday might be a perfect parameter to acquire data about our physical activity. Therefore, we will use total steps data as part of our analysis.
Tudor-Locke and Bassett (2004) proposed a steps-per-day classification, such as:
- Sedentary Lifestyle → total steps per day <5000
- Physically Inactive → total steps per day 5000 - 7499
- Moderately Active → total steps per day 7500 - 9999
- Physically Active → total steps per day 10000 – 12499
- Physically Active → total steps per day >= 12500
I made the user type distribution based on the above categories.
Key Takeaway:
- More than a third of the users are considered moderately active
4.2.3 Average Steps, distances, calories by day of week
Next, I wanted to know the average of steps, distances, and calories spent per week.
Key Takeaway:
- The highest average step and distance per day was on Sunday and Wednesday with almost 9 thousand steps and 6 km distance.
- Average calories shows little difference throughout the week.
4.2.4 Average total steps vs calories
I wanted to look if there was a correlation between total steps and calories.
Key Takeaway:
- There is high correlation between average total step and calories
4.3 Look Deep into Sleep
4.3.1 Average sleep time and awake time by day of week
In this sleep_day_new dataset, I wanted to analyze 2 parameters. The first one was the average sleep time and awake per week.
Key Takeaway:
- There isn’t a whole lot of difference between each day in terms of average time in bed and time awake.
- The highest average time in bed within a week falls on Tuesday.
Further analysis:
- Check on whether this pattern was the same every month
4.3.2 Average total minutes asleep, steps, calories
Next, I combined daily_activity_new and sleep_day_new to compare the total minutes asleep, steps, and calories. I used inner join formula because not all the sleep data are in the daily activity table, so I only needed to calculate the overlayed data.
Key Takeaway:
- There is no correlation between average total steps and average amount of minutes users sleep.
- No correlation is found between average total minutes of sleep and average calories.
4.4 Look Deep into Weight
4.4.1 Average weight vs non-sedentary minutes
In this weight_log_new dataset, I wanted to see the trend of individuals’ weight by their activity minutes.
Key Takeaway:
- User with low exercise shows overweighted (134 kg).
- Users with weight of <70 kg had been more active.
Phase 5: Share
The share phase is done by presenting this report.
Some conclusions I can draw from the data:
- In a period of a month (April to May 2016), there were 83% of users in Vigorous category when it came to how frequently they wore their tracker.
- Most of the users were not active, as sedentary minutes show the highest value.
- 73,5% users were not reaching 10.000 steps a day.
- Wednesday and Sunday made up to the most active day.
- Higher steps spent means that higher calories was burnt.
- The highest average time in bed within a week falls on Tuesday.
- There is no significant correlation between steps, calories, and total minutes asleep.
- The more intense users worked out, the better for them to reach their weight goal.
Phase 6: Act
The act phase would be done by the production and marketing team of the company. The main takeaway will be the top three recommendations:
-
Marketing team can showcase about the importance of doing sport activities daily, by highlighting the strong correlation between total active minutes with healthy weight, so that users might build a self-conscious about using the product regularly.
-
Production team can improve the notification feature of the tracker app as reminders for the users to achieve their goal and increase total steps each day.
-
Production team can provide reward system, based on total amount of steps reaching daily, weekly, monthly, to boost the energy and motivation of workout.
Journal Reference
- Tryon WW. Activity measurement in psychology and medicine. New York: Springer Science & Business Media; 2013.
- Tudor-Locke CE, Bassett DR. How many steps are enough? Pedometer-determined physical activity indices. Sports Med. 2004;34(1):1–8.