Project Overview
Cyclistic is a successful American bicycle-sharing program that was established in 2016. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system at any time.
Cyclistic’s marketing strategy has primarily focused on building general awareness and appealing to broad consumer segments. The program offers a variety of pricing plans, including single-ride passes, full-day passes, and annual memberships. Cyclistic classifies its riders into two groups based on pricing plans: casual riders (users who purchase single-ride passes or full-day passes) and annual members (users who purchase an annual membership).
Cyclistic’s flexible pricing plans attract a larger customer base, but financial analysts have determined that annual members are more profitable. Moreover, casual riders are already aware of the Cyclistic program and have chosen Cyclistic to meet their mobility needs. This suggests that a marketing campaign aimed at converting existing customers is likely to be more effective at expanding the business than one that targets only new customers.
Therefore, Cyclistic’s marketing analytics team is interested in understanding how casual riders and annual members use Cyclistic bikes differently. By understanding these differences, the marketing analytics team can develop more targeted marketing strategies to convert casual riders into annual members.
Dataset
This project analyzes Cyclistic’s historical bike trip data from January to December 2022, downloaded and organized in CSV format from the Divvy system data website. The 2022 data comprises twelve monthly datasets, each containing detailed trip information. These datasets hold the following fields for each trip:
- ride_id: unique ID number for all rides
- rideable_type: type of bike
- started_at: date and time the ride started
- ended_at: date and time the ride ended
- start_station_name: name of the station where the ride started
- start_station_id: ID number of the station where the ride started
- end_station_name: name of the station where the ride ended
- end_station_id: ID number of the station where the ride ended
- start_lat: latitude of the location where the ride started
- start_lng: longitude of the location where the ride started
- end_lat: latitude of the location where the ride ended
- end_lng: longitude of the location where the ride ended
- member_casual: type of user
These credible public datasets, collected and updated monthly by the company since 2013, offer valuable insights into Cyclistic’s usage patterns across different customers. While data privacy regulations restrict access to personally identifiable information (e.g., linking pass purchases to credit cards), the rich data remains crucial for understanding how customer types use Cyclistic bikes.
However, before utilizing the data for analysis, it requires processing. This involves cleaning inconsistencies and errors, and transforming it into a suitable format for analysis.
Note: The datasets have a different name because Cyclistic is a fictional company. For the purposes of this case study, the datasets are appropriate and will enable you to answer the business questions. The data has been made available by Motivate International Inc. under this license.
Methodology
This project follows the six-step data analysis framework, moving from pinpointing the key business questions to extracting actionable recommendations, with each step building toward the insights Cyclistic needs.
To efficiently manage and analyze the extensive data, this project uses R and RStudio. These tools make it possible to clean, transform, and explore millions of trip records, extracting the insights that inform the recommendations at the end of this report.
Data Cleaning
This section dives into the data cleaning process, essential for preparing the dataset for analysis. It covers key steps such as importing the data, removing duplicates, correcting structural errors, handling missing values, adding relevant columns, and dropping unnecessary data. Each step contributes to ensuring high-quality, reliable data for accurate insights.
Import the Data
I begin by loading the essential libraries: tidyverse for modeling, transforming, and visualizing the data, along with skimr for summary statistics and scales for formatting plot labels.
library(tidyverse)
library(skimr)
library(scales)
Then, I import all 12 monthly datasets and merge them into a single dataset named trip_data. To achieve this, I leverage two key functions: list.files to generate a list of all file paths for efficient reading, and read_csv to iteratively read each file into memory. Once all files are read, I use the bind_rows function to stitch them together into a unified dataset.
trip_data <- list.files(path = "./data/", pattern = "*-divvy-tripdata.csv", full.names = TRUE) %>%
lapply(read_csv) %>%
bind_rows %>%
arrange(started_at)
I then take a quick look at the dataset. Cyclists pedaled their way to more than 5 million trips on Cyclistic bikes in 2022, demonstrating the significant impact of bike-sharing programs.
dim(trip_data)
## [1] 5667717 13
head(trip_data)
Remove Duplicates
I use the duplicated and sum functions to check for duplicate trip records. The results reveal no duplicate ride IDs, indicating excellent data integrity and providing a reliable foundation for further analysis.
sum(duplicated(trip_data$ride_id))
## [1] 0
Correct Structural Errors
The next step is to address structural errors, those inconsistencies in data formatting that can lead to misinterpretations. This often takes the form of typos, inconsistent capitalization, or duplicate entries. With this in mind, I’ll primarily focus on cleaning up the member type, bike type, and start and end station columns, as they contain text data and are particularly susceptible to such issues.
Member Type
The member_casual column has two distinct values: casual and member. This perfectly aligns with our goal of comparing bike usage between annual members and casual riders, as the two user types are clearly identified in the data.
member_type <- count(trip_data, member_casual, name = "count")
member_type
Bike Type
To understand the distribution of bike types used, I take a look at the rideable_type column. While classic_bike and electric_bike each account for over 2.6 million rides, docked_bike shows a significantly lower count at roughly 170,000.
bike_type <- count(trip_data, rideable_type, name = "count")
bike_type
Investigating further, I can confirm that docked_bike refers to the same type of bike as classic_bike. Therefore, I replace docked_bike with classic_bike in the column. This correction leaves only classic_bike and electric_bike as valid categories for further analysis.
trip_data_v2 <- trip_data %>%
mutate(rideable_type = str_replace_all(rideable_type, "docked_bike", "classic_bike"))
bike_type_v2 <- count(trip_data_v2, rideable_type, name = "count")
bike_type_v2
Start and End Station
To thoroughly examine station names, I create separate start_station and end_station data.
start_station <- trip_data_v2 %>%
count(start_station_name, name = "count") %>%
arrange(start_station_name)
start_station
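The text refers to both start_station and end_station summaries, but only the start-station code is shown here; an analogous end-station tally, following the same pattern, would be:
end_station <- trip_data_v2 %>%
  count(end_station_name, name = "count") %>%
  arrange(end_station_name)
end_station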
After verifying their validity, I identify two areas requiring correction:
- Removing Test Stations: A total of eight test stations are found in the data: “Pawel Bialowas - Test- PBSC charging station”, “Hastings WH 2”, “DIVVY CASSETTE REPAIR MOBILE STATION”, “Base - 2132 W Hubbard Warehouse”, “Base - 2132 W Hubbard”, “NewHastings”, “WestChi”, and “WEST CHI-WATSON”. I filter these out using the filter function to ensure accuracy in the analysis.
test_station_list <- c("Pawel Bialowas - Test- PBSC charging station",
"Hastings WH 2",
"DIVVY CASSETTE REPAIR MOBILE STATION",
"Base - 2132 W Hubbard Warehouse",
"Base - 2132 W Hubbard",
"NewHastings",
"WestChi",
"WEST CHI-WATSON")
trip_data_v2 <- trip_data_v2 %>%
  filter(!(start_station_name %in% test_station_list |
             end_station_name %in% test_station_list))
- Addressing Inconsistencies within Station Names: Typos (e.g., “Michgan” instead of “Michigan”), special symbols (e.g., ‘*’), and directional words (e.g., “north” or “south”) are present in station names. To address these inconsistencies, I use the str_replace_all function to remove them, ensuring consistent data for further analysis.
words <- c("*", " - Charging", " (Temp)", "amp;", "Public Rack - ",
" - north corner", " - south corner", " - midblock south", " - midblock",
" - North", " - South", " - East", " - West",
" - NE", " - NW", " - SE", " - SW",
" - N", " - S", " - E", " - W",
" NE", " NW", " SE", " SW")
# Strip each unwanted word or suffix from both station name columns
for (word in words) {
  trip_data_v2 <- trip_data_v2 %>%
    mutate(start_station_name = str_replace_all(start_station_name, fixed(word, ignore_case = TRUE), ""),
           end_station_name = str_replace_all(end_station_name, fixed(word, ignore_case = TRUE), ""))
}

# Remove any remaining single-letter direction (e.g. " N", " W") at the end of a name
trip_data_v2 <- trip_data_v2 %>%
  mutate(start_station_name = str_replace_all(start_station_name, regex(" [NSEW]$", ignore_case = TRUE), ""),
         end_station_name = str_replace_all(end_station_name, regex(" [NSEW]$", ignore_case = TRUE), ""))
After applying these corrections, a re-examination of the start_station_v2 and end_station_v2 data confirms the successful removal of test stations and inconsistencies, providing a clean and consistent dataset for subsequent analysis.
start_station_v2 <- trip_data_v2 %>%
count(start_station_name, name = "count") %>%
arrange(start_station_name)
start_station_v2
Handle Missing Data
I use the colSums and is.na functions to identify missing values within the dataset. The analysis reveals six columns with missing data: start_station_name (833,025 rows), start_station_id (833,025 rows), end_station_name (891,896 rows), end_station_id (891,896 rows), end_lat (5,858 rows), and end_lng (5,858 rows).
colSums(is.na(trip_data_v2))
## ride_id rideable_type started_at ended_at
## 0 0 0 0
## start_station_name start_station_id end_station_name end_station_id
## 833025 833025 891896 891896
## start_lat start_lng end_lat end_lng
## 0 0 5858 5858
## member_casual
## 0
To address these missing values, I prioritize start_station_name and end_station_name, as their missing entries can be imputed using geographic coordinates as a reference. I refrain from imputing the remaining four columns: start_station_id and end_station_id will be removed in the subsequent step due to their irrelevance to the analysis, while end_lat and end_lng lack suitable reference data for imputation and will also be discarded.
start_station_location <- trip_data_v2 %>%
count(start_lat, start_lng , start_station_name, name = "count") %>%
arrange(start_lat, start_lng)
start_station_location
To effectively handle missing station names, I create four temporary columns holding rounded start and end coordinates, beginning at five decimal places and working down to two. These columns facilitate accurate matching and assignment of station names to missing entries, prioritizing those with more precise location data. This iterative approach ensures that the most reliable matches are filled in first.
# Round coordinates progressively (5 down to 2 decimal places) and use trips that
# share the same rounded location to fill in missing station names and IDs
digit <- 5
while (digit > 1) {
  trip_data_v2 <- trip_data_v2 %>%
    mutate(start_lat_round = round(start_lat, digits = digit),
           start_lng_round = round(start_lng, digits = digit),
           end_lat_round = round(end_lat, digits = digit),
           end_lng_round = round(end_lng, digits = digit))

  # Fill missing start station details within groups sharing the same rounded start location
  trip_data_v2 <- trip_data_v2 %>%
    group_by(start_lat_round, start_lng_round) %>%
    fill(start_station_name, .direction = "downup") %>%
    fill(start_station_id, .direction = "downup") %>%
    ungroup()

  # Do the same for the end station details
  trip_data_v2 <- trip_data_v2 %>%
    group_by(end_lat_round, end_lng_round) %>%
    fill(end_station_name, .direction = "downup") %>%
    fill(end_station_id, .direction = "downup") %>%
    ungroup()

  digit <- digit - 1
}

# Drop the temporary rounded-coordinate columns
trip_data_v2 <- trip_data_v2 %>%
  select(!c(start_lat_round, start_lng_round, end_lat_round, end_lng_round))
The imputation process yields a significant reduction in missing values: start_station_name and start_station_id drop from 833,025 to 11,821 missing records each, while end_station_name and end_station_id fall from 891,896 to 41,386 each.
colSums(is.na(trip_data_v2))
## ride_id rideable_type started_at ended_at
## 0 0 0 0
## start_station_name start_station_id end_station_name end_station_id
## 11821 11821 41386 41386
## start_lat start_lng end_lat end_lng
## 0 0 5858 5858
## member_casual
## 0
start_station_location_v2 <- trip_data_v2 %>%
count(start_lat, start_lng, start_station_name, name = "count") %>%
arrange(start_lat, start_lng)
start_station_location_v2
To ensure a focus on complete trips, I subsequently remove all remaining rows with missing values using the drop_na function. This results in 5,614,669 remaining trips, a relatively minor reduction from the original dataset.
trip_data_v2 <- drop_na(trip_data_v2)
dim(trip_data_v2)
## [1] 5614669 13
Add Relevant Columns
To facilitate time-series analysis, I create three new columns: ride_length_min, day_of_week, and month.
trip_data_v2$ride_length_min <- as.double(difftime(trip_data_v2$ended_at, trip_data_v2$started_at, units = "mins"))
trip_data_v2$day_of_week <- wday(trip_data_v2$started_at, label = TRUE)
trip_data_v2$month <- format(trip_data_v2$started_at, "%b")
However, neither column is yet ordered the way I want: month is stored as plain text, and day_of_week does not follow a Monday-first week. Therefore, I use the ordered function to set them to their natural order (Monday to Sunday for days and January to December for months), ensuring their suitability for further analysis.
trip_data_v2 <- within(trip_data_v2, {
day_of_week <- ordered(day_of_week, levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
month <- ordered(month, levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
})
Drop Unnecessary Data
The final step involves identifying and removing data that could distort the analysis. This is crucial to ensure the validity of the numbers. Outliers are identified using the interquartile range (IQR) method, and their relevance is assessed based on their impact on the analysis and their potential for being errors. If an outlier is deemed irrelevant or a mistake, it is removed from the dataset. This process involves dropping irrelevant columns, correcting erroneous inputs, and filtering out potential outliers.
Irrelevant Columns
Following the previous step’s conclusion, I use the select function to eliminate the start_station_id and end_station_id columns from the dataset, as they are not meaningful for this project’s analysis.
trip_data_v2 <- trip_data_v2 %>%
select( !c(start_station_id, end_station_id) )
Erroneous Inputs
The summary output below reveals inconsistencies in ride lengths, particularly unusually short or long durations. To address these, I filter out invalid rides based on the following criteria:
- Rides with zero start/end geographic coordinates
- Rides with less than 60 seconds in duration (including negative times) could indicate false starts or attempts to secure the bike, and will be excluded from further analysis.
- Rides exceeding 24 hours in length are considered invalid outliers, as users are not expected to keep bikes for longer than a day.
Note: See the Divvy System Data and Divvy article for a detailed explanation of trip durations.
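The duration summary printed below is not paired with a visible code chunk in this rendering; it can be reproduced with base R’s summary function applied to the ride-length column:
summary(trip_data_v2$ride_length_min)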
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -10353.35 5.80 10.25 16.29 18.38 32035.45
trip_data_v2 <- trip_data_v2 %>%
  filter(!(start_lat == 0 | start_lng == 0 | end_lat == 0 | end_lng == 0 |
             ride_length_min < 1 | ride_length_min > 1440))
A thorough re-examination of the data confirms the successful removal of all previously identified errors.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 6.05 10.48 16.55 18.67 1439.37
Potential Outliers
Before removing outliers, I examine the distribution of ride_length_min using a boxplot. The numerous outliers obscure the underlying data trend. These extreme values, while technically part of the dataset, don’t reflect the typical ride duration.
ggplot(data = trip_data_v2, aes(x = member_casual, y = ride_length_min, fill = member_casual)) +
geom_boxplot() +
coord_flip() +
theme(legend.position = "none") +
labs(x = "Member type",
y = "Ride length (in minutes)",
title = "Box plot showing 'ride_length_min' before removing outliers")
To identify them, I use the interquartile range (IQR) method. The IQR is the difference between the third quartile (Q3) and the first quartile (Q1), and measures the spread of the middle 50% of the data. In this case, Q1 is 6.05 minutes and Q3 is 18.67 minutes, making the IQR 12.62 minutes.
quantiles <- as.numeric(quantile(trip_data_v2$ride_length_min, probs = c(0.25, 0.50, 0.75), na.rm = FALSE))
iqr_value <- IQR(trip_data_v2$ride_length_min)
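The labelled Q1, Q3, and IQR values below are printed output; one way to produce them (the original printing code is not shown) is:
print(paste("Q1:", quantiles[1], "minutes"))
print(paste("Q3:", quantiles[3], "minutes"))
print(paste("IQR:", iqr_value, "minutes"))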
## [1] "Q1: 6.05 minutes"
## [1] "Q3: 18.66667 minutes"
## [1] "IQR: 12.61667 minutes"
Next, I build a “fence” around this central portion of the data: the lower fence is Q1 minus 1.5 times the IQR, and the upper fence is Q3 plus 1.5 times the IQR. Any value outside the fence is treated as an outlier. From the output below, the lower fence is -12.88 minutes and the upper fence is 37.59 minutes.
lower_fence <- quantiles[1] - ( 1.5 * iqr_value )
upper_fence <- quantiles[3] + ( 1.5 * iqr_value )
## [1] "Lower Fence: -12.875 minutes"
## [1] "Upper Fence: 37.59167 minutes"
So, any value less than -12.88 minutes or greater than 37.59 minutes is considered an outlier and removed from the data.
trip_data_v2 <- trip_data_v2 %>%
  filter(!(ride_length_min < lower_fence | ride_length_min > upper_fence))
Filtering out these outliers significantly cleans the data. The maximum ride length after removal drops to 37.58 minutes, confirming the successful elimination of extreme values. This refined dataset provides a more accurate representation of typical ride durations for further analysis.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 5.783 9.717 11.958 16.150 37.583
ggplot(data = trip_data_v2, aes(x = member_casual, y = ride_length_min, fill = member_casual)) +
geom_boxplot() +
coord_flip() +
theme(legend.position = "none") +
labs(x = "Member type",
y = "Ride length (in minutes)",
title = "Box plot showing 'ride_length_min' after removing outliers")
Validate the Data
After data cleaning, I re-examined the dataset, finding it now contains 5,091,142 trip records, a reduction of 576,575 (10%) compared to the original 5,667,717. There are 14 columns in the data, all of which have no missing values, ensuring a complete dataset for thorough analysis.
skim_without_charts(trip_data_v2)
Data summary | |
---|---|
Name | trip_data_v2 |
Number of rows | 5091142 |
Number of columns | 14 |
Column type frequency: | |
character | 5 |
factor | 2 |
numeric | 5 |
POSIXct | 2 |
Group variables | None |
Variable type: character
skim_variable | n_missing | complete_rate | min | max | empty | n_unique | whitespace |
---|---|---|---|---|---|---|---|
ride_id | 0 | 1 | 16 | 16 | 0 | 5091142 | 0 |
rideable_type | 0 | 1 | 12 | 13 | 0 | 2 | 0 |
start_station_name | 0 | 1 | 9 | 53 | 0 | 1337 | 0 |
end_station_name | 0 | 1 | 9 | 53 | 0 | 1343 | 0 |
member_casual | 0 | 1 | 6 | 6 | 0 | 2 | 0 |
Variable type: factor
skim_variable | n_missing | complete_rate | ordered | n_unique | top_counts |
---|---|---|---|---|---|
day_of_week | 0 | 1 | TRUE | 7 | Sat: 793689, Thu: 770594, Wed: 734805, Fri: 725683 |
month | 0 | 1 | TRUE | 12 | Jul: 721924, Aug: 699739, Jun: 678645, Sep: 633443 |
Variable type: numeric
skim_variable | n_missing | complete_rate | mean | sd | p0 | p25 | p50 | p75 | p100 |
---|---|---|---|---|---|---|---|---|---|
start_lat | 0 | 1 | 41.90 | 0.05 | 41.65 | 41.88 | 41.90 | 41.93 | 42.06 |
start_lng | 0 | 1 | -87.65 | 0.03 | -87.83 | -87.66 | -87.64 | -87.63 | -87.53 |
end_lat | 0 | 1 | 41.90 | 0.05 | 41.65 | 41.88 | 41.90 | 41.93 | 42.06 |
end_lng | 0 | 1 | -87.65 | 0.03 | -87.83 | -87.66 | -87.65 | -87.63 | -87.53 |
ride_length_min | 0 | 1 | 11.96 | 8.08 | 1.00 | 5.78 | 9.72 | 16.15 | 37.58 |
Variable type: POSIXct
skim_variable | n_missing | complete_rate | min | max | median | n_unique |
---|---|---|---|---|---|---|
started_at | 0 | 1 | 2022-01-01 00:00:05 | 2022-12-31 23:59:26 | 2022-07-23 13:00:58 | 4352010 |
ended_at | 0 | 1 | 2022-01-01 00:01:48 | 2023-01-01 00:27:52 | 2022-07-23 13:15:00 | 4362055 |
To verify whether the data follows a normal distribution, I use the Empirical Rule, which states that in a normal distribution roughly 68% of data points fall within one standard deviation of the mean, 95% within two, and 99.7% within three. I create a calculate_percentage function to measure the share of the data within specific standard deviation ranges. Its output reveals that 72.10%, 94.08%, and 99.38% of the data fall within one, two, and three standard deviations of the mean, respectively, indicating a reasonably close resemblance to a normal distribution. This suggests the data can be reliably analyzed with a range of statistical methods.
summary_stats <- summarise(trip_data_v2,
sd = sd(ride_length_min),
mean = mean(ride_length_min),
count = n())
calculate_percentage <- function(n_sd) {
filtered_count <- trip_data_v2 %>%
filter(between(ride_length_min,
summary_stats$mean - n_sd * summary_stats$sd,
summary_stats$mean + n_sd * summary_stats$sd)) %>%
summarise(count = n())
round((filtered_count$count / summary_stats$count) * 100, 2)
}
percentage_sd1 <- calculate_percentage(1)
percentage_sd2 <- calculate_percentage(2)
percentage_sd3 <- calculate_percentage(3)
## [1] "One standard deviation: 72.10%"
## [1] "Two standard deviations: 94.08%"
## [1] "Three standard deviations: 99.38%"
Analysis
This section presents an analysis of Cyclistic’s historical trip data from January to December 2022, with the objective of identifying the differences in the use of Cyclistic bikes between annual members and casual riders.
Overall Summary of Cyclistic Rides in 2022
The table below presents a statistical summary of ride durations for Cyclistic’s users, covering both casual riders and annual members.
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 5.783 9.717 11.958 16.150 37.583
The time duration of Cyclistic’s rides can also be visualized by a histogram. The histogram shows that the distribution of ride times is positively skewed, with a high cluster of lower values and a spread-out tail on the right. This means that there are a few rides that are much longer than the majority of rides.
Because of the positive skew, the mean ride time is not a representative measure of the typical ride time. The median is less affected by the outliers, so it is used as the measure of central tendency in the descriptive analysis that follows.
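The histogram itself is not embedded in this rendering; a sketch that would produce a comparable plot (the 1-minute binwidth is my own choice) is:
ggplot(trip_data_v2, aes(x = ride_length_min)) +
  geom_histogram(binwidth = 1, fill = "steelblue") +
  labs(x = "Ride length (in minutes)",
       y = "Number of rides",
       title = "Distribution of ride lengths in 2022")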
Casual Riders vs Annual Members
Casual riders appear to ride for longer periods of time than annual members. The median time duration for casual riders is 11.52 minutes, which is higher compared to the median time duration of 8.72 minutes for annual members. However, the data also shows that casual riders took a total of 1,946,822 rides, while annual members took 3,144,320 rides. From these findings, it can be concluded that annual members tend to be more regular users of the bike-sharing service. On the other hand, casual riders are more likely to take longer trips when they do use the service, even though they use it less frequently.
Time Duration by Users
Ride Counts by Users
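The figures named above are not embedded here; the median durations and ride counts cited in this comparison can be reproduced with a grouped summary along these lines (a sketch, not the original chunk):
trip_data_v2 %>%
  group_by(member_casual) %>%
  summarise(median_ride_min = median(ride_length_min),
            rides = n())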
Day of Week
Casual riders tend to take longer bike rides than annual members, especially on weekends, when their median ride time climbs to roughly 13 minutes, while annual members’ ride times stay at a consistent level throughout the week. In terms of ride counts, annual members use Cyclistic bikes more frequently on weekdays, with usage gradually decreasing as the weekend approaches, whereas casual riders take more rides on weekends. This suggests that the longer ride times of casual riders are closely tied to their increased weekend usage.
Time Duration by Users and Day of Week
Ride Counts by Users and Day of Week
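As with the previous figures, the day-of-week charts are not embedded in this rendering; a bar chart of ride counts per weekday, split by member type, could be sketched as follows (colours and labels are my own choices):
trip_data_v2 %>%
  count(member_casual, day_of_week, name = "rides") %>%
  ggplot(aes(x = day_of_week, y = rides, fill = member_casual)) +
  geom_col(position = "dodge") +
  scale_y_continuous(labels = comma) +
  labs(x = "Day of week", y = "Number of rides", fill = "Member type",
       title = "Ride counts by users and day of week")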
Hour
Casual riders are most likely to take their longest rides in the early afternoon, from 1 PM to 3 PM, after which their ride times decrease slightly. Annual members, on the other hand, maintain consistent ride times throughout the day, with no significant spikes or dips. By number of rides, annual members use Cyclistic bikes at a steady rate across the day, with three peaks in usage: at 8 AM, 12 PM, and 5 PM. Casual riders, in contrast, start riding at 5 AM and gradually increase their usage until it peaks at 5 PM.
Time Duration by Users and Hour
Ride Counts by Users and Hour
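The hourly patterns described above can be visualised by extracting the start hour with lubridate’s hour function; this is a sketch under the same assumptions, not the original chunk:
trip_data_v2 %>%
  mutate(start_hour = hour(started_at)) %>%
  count(member_casual, start_hour, name = "rides") %>%
  ggplot(aes(x = start_hour, y = rides, colour = member_casual)) +
  geom_line() +
  scale_y_continuous(labels = comma) +
  labs(x = "Hour of day (0-23)", y = "Number of rides", colour = "Member type",
       title = "Ride counts by users and hour")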
Month
The findings show that the median ride time for both casual riders and annual members is relatively consistent throughout the year, with the exception of three months: March, April, and May. Casual riders nonetheless tend to spend more time on bikes than annual members in every month. The number of Cyclistic rides also grows from month to month, peaking in July for casual riders and August for annual members, before declining sharply as the weather cools. This indicates that ridership for both groups is strongly seasonal, concentrated in the warmer months.
Time Duration by Users and Month
Ride Counts by Users and Month
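A monthly breakdown like the one discussed above can be tabulated with a grouped summary (again a sketch rather than the original chunk):
trip_data_v2 %>%
  group_by(member_casual, month) %>%
  summarise(median_ride_min = median(ride_length_min),
            rides = n(),
            .groups = "drop")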
Conclusion
This section provides the concluding thoughts and recommendations of this report.
Key Findings
Based on the data collected, it can be concluded that:
- Casual riders: They are more likely to use Cyclistic bikes for leisure purposes, including enjoyment, fitness, and exploring the city. They spend longer periods on the bikes and take more rides on weekends. They ride throughout the day, with usage peaking toward the evening.
- Annual members: They tend to use Cyclistic bikes for specific purposes, such as commuting to work. They spend shorter periods on the bikes and take more rides on weekdays. Their rides cluster around rush hours, especially 8 AM, 12 PM, and 5 PM.
- Warmer months: Both casual riders and annual members take more rides from spring onward (beginning in March), with usage peaking in summer (July for casual riders and August for annual members).
Recommendations
The report makes the following recommendations:
- Price Incentives: Offer incentives for casual riders to become annual members, such as discounts, free ride credits, or gift cards. This would make it a more attractive option for casual riders who are not sure if they will use Cyclistic bikes enough to justify the full annual membership price.
- Personalized Marketing: Personalize the marketing messages that we send to casual riders. This will help to ensure that we are reaching them with the right message at the right time. For example, we could send a message to casual riders who have taken a certain number of rides in a month, or who have used Cyclistic bikes during rush hour.
- New Member Plans: Consider adding a monthly or quarterly membership option. This would also be a good fit for casual riders who only bike during certain times of the year, such as spring and summer.