Cyclistic Bike-Share Data Analysis

Thanakorn Thanakraikiti

Published on 15 August 2023 | Updated on 27 December 2023

Project Overview

Cyclistic is a successful American bicycle-sharing program that was established in 2016. Since then, the program has grown to a fleet of 5,824 bicycles that are geotracked and locked into a network of 692 stations across Chicago. The bikes can be unlocked from one station and returned to any other station in the system at any time.

Cyclistic’s marketing strategy has primarily focused on building general awareness and appealing to broad consumer segments. The program offers a variety of pricing plans, including single-ride passes, full-day passes, and annual memberships. Cyclistic classifies its riders into two groups based on pricing plans: casual riders (users who purchase single-ride passes or full-day passes) and annual members (users who purchase an annual membership).

Cyclistic’s flexible pricing plans attract a larger customer base, but financial analysts have determined that annual members are more profitable. However, casual riders are already aware of the Cyclistic program and have chosen Cyclistic to meet their mobility needs. This suggests that a marketing campaign that targets existing customers is likely to be more effective at expanding the business than a campaign that targets only new customers.

Therefore, Cyclistic’s marketing analytics team is interested in understanding how casual riders and annual members use Cyclistic bikes differently. By understanding these differences, the marketing analytics team can develop more targeted marketing strategies to convert casual riders into annual members.

Dataset

This project analyzes Cyclistic’s historical bike trip data from January to December 2022, downloaded and organized in CSV format from the Divvy system data website. The 2022 data comprises twelve monthly datasets, each containing detailed trip information. These datasets hold the following fields for each trip:

  1. ride_id: unique ID for each ride
  2. rideable_type: type of bike
  3. started_at: date and time the ride started
  4. ended_at: date and time the ride ended
  5. start_station_name: name of the station where the ride started
  6. start_station_id: ID of the station where the ride started
  7. end_station_name: name of the station where the ride ended
  8. end_station_id: ID of the station where the ride ended
  9. start_lat: latitude of the location where the ride started
  10. start_lng: longitude of the location where the ride started
  11. end_lat: latitude of the location where the ride ended
  12. end_lng: longitude of the location where the ride ended
  13. member_casual: type of user

These credible public datasets, collected and updated monthly by the company since 2013, offer valuable insights into Cyclistic’s usage patterns across different customers. While data privacy regulations restrict access to personally identifiable information (e.g., linking pass purchases to credit cards), the rich data remains crucial for understanding how customer types use Cyclistic bikes.

However, before utilizing the data for analysis, it requires processing. This involves cleaning inconsistencies and errors, and transforming it into a suitable format for analysis.

Note: The datasets have different names because Cyclistic is a fictional company. For the purposes of this case study, the datasets are appropriate and will enable you to answer the business questions. The data has been made available by Motivate International Inc. under this license.

Methodology

This project follows the six-step data analysis process: ask, prepare, process, analyze, share, and act. Each step, from pinpointing the key business questions to extracting actionable recommendations, builds toward the final insights.

To manage and analyze the roughly 5.7 million trip records efficiently, this project uses R and RStudio. These tools make it practical to clean, transform, and visualize the data at scale, producing the insights that inform the recommendations at the end of this report.

Data Cleaning

This section dives into the data cleaning process, essential for preparing the dataset for analysis. It covers key steps such as importing the data, removing duplicates, correcting structural errors, handling missing values, adding relevant columns, and dropping unnecessary data. Each step contributes to ensuring high-quality, reliable data for accurate insights.

Import the Data

I begin by loading the essential libraries: tidyverse for transforming and visualizing the data, skimr for summarizing the dataset, and scales for formatting plot labels.

library(tidyverse)
library(skimr)
library(scales)

Then, I start importing all 12 monthly datasets and merging them into a single dataset named trip_data. To achieve this, I leverage two key functions:

  1. list.files to generate a list of all file paths for efficient reading, and
  2. read_csv to iteratively read each file into memory.

Once all files are read, I use the bind_rows function to combine them into a single dataset, sorted by start time.

trip_data <- list.files(path = "./data/", pattern = "*-divvy-tripdata.csv", full.names = TRUE) %>% 
  lapply(read_csv) %>%   # read each monthly CSV into a list of data frames
  bind_rows() %>%        # combine them into one data frame
  arrange(started_at)

I would like to take a quick look at the dataset. Cyclists pedaled their way to more than 5 million trips on Cyclistic bikes in 2022, demonstrating the significant impact of bike-sharing programs.

dim(trip_data)
## [1] 5667717      13
head(trip_data)

Remove Duplicates

I use the duplicated and sum functions to check for duplicate trip records. The result reveals no duplicate ride IDs, indicating good data integrity and providing a reliable foundation for further analysis.

sum(duplicated(trip_data$ride_id))
## [1] 0

Correct Structural Errors

The next step is to address structural errors, those inconsistencies in data formatting that can lead to misinterpretations. This often takes the form of typos, inconsistent capitalization, or duplicate entries. With this in mind, I’ll primarily focus on cleaning up the member type, bike type, and start and end station columns, as they contain text data and are particularly susceptible to such issues.

Member Type

The member_casual column has two distinct values: casual and member. This perfectly aligns with our goal of comparing bike usage between annual members and casual riders, as distinct user types are clearly identified in the data.

member_type <- count(trip_data, member_casual, name = "count")
member_type

Bike Type

To understand the distribution of bike types used, I take a look at the rideable_type column. While classic_bike and electric_bike each account for over 2.6 million rides, docked_bike shows a significantly lower count of about 170,000.

bike_type <- count(trip_data, rideable_type, name = "count")
bike_type

Investigating further, I can confirm that docked_bike refers to the same type of bike as classic_bike. Therefore, I replace docked_bike with classic_bike in the column. This correction leaves only classic_bike and electric_bike as valid categories for further analysis.

trip_data_v2 <- trip_data %>% 
  mutate(rideable_type = str_replace_all(rideable_type, "docked_bike", "classic_bike"))
bike_type_v2 <- count(trip_data_v2, rideable_type, name = "count")
bike_type_v2

Start and End Station

To thoroughly examine station names, I create separate start_station and end_station data.

start_station <- trip_data_v2 %>% 
  count(start_station_name, name = "count") %>% 
  arrange(start_station_name)

start_station

After verifying their validity, I identify two areas requiring correction:

  1. Removing Test Stations: A total of eight test stations are found in the data: “Pawel Bialowas - Test- PBSC charging station”, “Hastings WH 2”, “DIVVY CASSETTE REPAIR MOBILE STATION”, “Base - 2132 W Hubbard Warehouse”, “Base - 2132 W Hubbard”, “NewHastings”, “WestChi”, and “WEST CHI-WATSON”. I filter these out using the filter function to ensure accuracy in our analysis.
test_station_list <- c("Pawel Bialowas - Test- PBSC charging station", 
                       "Hastings WH 2", 
                       "DIVVY CASSETTE REPAIR MOBILE STATION", 
                       "Base - 2132 W Hubbard Warehouse", 
                       "Base - 2132 W Hubbard", 
                       "NewHastings", 
                       "WestChi", 
                       "WEST CHI-WATSON")

trip_data_v2 <- trip_data_v2 %>% 
  filter(!(start_station_name %in% test_station_list | 
           end_station_name %in% test_station_list))
  2. Addressing Inconsistencies within Station Names: Typos (e.g., “Michgan” instead of “Michigan”), special symbols (e.g., ‘*’), and directional words (e.g., “north” or “south”) are present in station names. To address these inconsistencies, I use the str_replace_all function to remove them, ensuring consistent data for further analysis.
words <- c("*", " - Charging", " (Temp)", "amp;", "Public Rack - ", 
           " - north corner", " - south corner", " - midblock south", " - midblock", 
           " - North", " - South", " - East", " - West", 
           " - NE", " - NW", " - SE", " - SW", 
           " - N", " - S", " - E", " - W", 
           " NE", " NW", " SE", " SW")

for (word in words) {
  trip_data_v2 <- trip_data_v2 %>% 
    mutate(start_station_name = str_replace_all(start_station_name, fixed(word, ignore_case = TRUE), "")) %>%
    mutate(end_station_name   = str_replace_all(end_station_name, fixed(word, ignore_case = TRUE), ""))
}

trip_data_v2 <- trip_data_v2 %>% 
  mutate(start_station_name = str_replace_all(start_station_name, regex("\\s[NSEW]$", ignore_case = TRUE), ""), 
         end_station_name   = str_replace_all(end_station_name,   regex("\\s[NSEW]$", ignore_case = TRUE), ""))

After applying these corrections, a re-examination of the start_station_v2 and end_station_v2 data confirms the successful removal of test stations and inconsistencies, providing a clean and consistent dataset for subsequent analysis.

start_station_v2 <- trip_data_v2 %>% 
  count(start_station_name, name = "count") %>% 
  arrange(start_station_name)

start_station_v2

Handle Missing Data

I use the colSums and is.na functions to identify missing values within the dataset. The analysis reveals six columns with missing data: start_station_name (833,025 rows), start_station_id (833,025 rows), end_station_name (891,896 rows), end_station_id (891,896 rows), end_lat (5,858 rows), and end_lng (5,858 rows).

colSums(is.na(trip_data_v2))
##            ride_id      rideable_type         started_at           ended_at 
##                  0                  0                  0                  0 
## start_station_name   start_station_id   end_station_name     end_station_id 
##             833025             833025             891896             891896 
##          start_lat          start_lng            end_lat            end_lng 
##                  0                  0               5858               5858 
##      member_casual 
##                  0

To address these missing values, I prioritize start_station_name and end_station_name, as their missing entries can be imputed using geographic coordinates as a reference. I refrain from imputing the remaining four columns: start_station_id and end_station_id will be removed in the subsequent step due to their irrelevance to the analysis, while end_lat and end_lng lack suitable reference data for imputation and will also be discarded.

start_station_location <- trip_data_v2 %>% 
  count(start_lat, start_lng, start_station_name, name = "count") %>% 
  arrange(start_lat, start_lng)

start_station_location

To effectively handle missing station names, I create four temporary columns representing increasingly coarse levels of start and end coordinates (rounded from 5 decimal places down to 2). These columns facilitate matching and assignment of station names to missing entries, prioritizing those with more precise location data. This iterative approach ensures that the most reliable matches are filled in first.

digit <- 5

while (digit > 1) {
  trip_data_v2 <- trip_data_v2 %>% 
    mutate(start_lat_round = round(start_lat, digits = digit), 
           start_lng_round = round(start_lng, digits = digit), 
           end_lat_round   = round(end_lat,   digits = digit), 
           end_lng_round   = round(end_lng,   digits = digit))
  
  trip_data_v2 <- trip_data_v2 %>% 
    group_by(start_lat_round, start_lng_round) %>% 
    fill(start_station_name, .direction = "downup") %>% 
    fill(start_station_id,   .direction = "downup") %>% 
    ungroup()
  
  trip_data_v2 <- trip_data_v2 %>% 
    group_by(end_lat_round, end_lng_round) %>%  
    fill(end_station_name, .direction = "downup") %>% 
    fill(end_station_id,   .direction = "downup") %>% 
    ungroup()
  
  digit <- digit - 1
}

trip_data_v2 <- trip_data_v2 %>% 
  select(!c(start_lat_round, start_lng_round, end_lat_round, end_lng_round))

The imputation process yields a significant reduction in missing values: start_station_name and start_station_id drop to 11,821 missing records each, while end_station_name and end_station_id fall to 41,386 missing records each.

colSums(is.na(trip_data_v2))
##            ride_id      rideable_type         started_at           ended_at 
##                  0                  0                  0                  0 
## start_station_name   start_station_id   end_station_name     end_station_id 
##              11821              11821              41386              41386 
##          start_lat          start_lng            end_lat            end_lng 
##                  0                  0               5858               5858 
##      member_casual 
##                  0
start_station_location_v2 <- trip_data_v2 %>% 
  count(start_lat, start_lng, start_station_name, name = "count") %>% 
  arrange(start_lat, start_lng)

start_station_location_v2

To ensure a focus on complete trips, I subsequently remove all remaining rows with missing values using the drop_na function. This results in 5,614,669 remaining trips, representing a relatively minor reduction from the original dataset.

trip_data_v2 <- drop_na(trip_data_v2)
dim(trip_data_v2)
## [1] 5614669      13

Add Relevant Columns

To facilitate time-series analysis, I create three new columns: ride_length_min, day_of_week, and month.

trip_data_v2$ride_length_min <- as.double(difftime(trip_data_v2$ended_at, trip_data_v2$started_at, units = "mins"))
trip_data_v2$day_of_week <- wday(trip_data_v2$started_at, label = TRUE)
trip_data_v2$month <- format(trip_data_v2$started_at, "%b")

However, day_of_week and month are not yet categorized in their natural order. Therefore, I utilize the ordered function to set their levels explicitly (Monday to Sunday for days and January to December for months), ensuring their suitability for further analysis.

trip_data_v2 <- within(trip_data_v2, {
  day_of_week <- ordered(day_of_week, levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
  month <- ordered(month, levels = c("Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"))
})

Drop Unnecessary Data

The final step involves identifying and addressing outliers in the data. This is crucial to ensure the validity of the numbers and prevent distortion of the analysis. Outliers are identified using the interquartile range (IQR) method, and their relevance is assessed based on their impact on the analysis and their potential for being errors. If an outlier is deemed irrelevant or a mistake, it is removed from the dataset. This process involves dropping irrelevant columns, correcting erroneous inputs, and filtering out potential outliers.

Irrelevant Columns

Following the previous step’s conclusion, I use the select function to eliminate the start_station_id and end_station_id columns from the dataset, as they are not meaningful for this project’s analysis.

trip_data_v2 <- trip_data_v2 %>% 
  select( !c(start_station_id, end_station_id) )

Error Inputs

The summary output below reveals inconsistencies in ride lengths, including negative and multi-day durations. To address these, I filter out invalid rides based on the following criteria:

  • Rides with zero start/end geographic coordinates, which indicate invalid location records.
  • Rides shorter than 60 seconds (including negative durations), which could indicate false starts or users re-docking a bike to secure it; these are excluded from further analysis.
  • Rides exceeding 24 hours, which are considered invalid outliers, as users are not expected to keep bikes for longer than a day.

Note: See the Divvy System Data and Divvy article for a detailed explanation of trip durations.

##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max. 
## -10353.35      5.80     10.25     16.29     18.38  32035.45
trip_data_v2 <- trip_data_v2 %>% 
  filter(!(start_lat == 0 | start_lng == 0 | end_lat == 0 | end_lng == 0 | 
           ride_length_min < 1 | ride_length_min > 1440))

A thorough re-examination of the data confirms the successful removal of all previously identified errors.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    6.05   10.48   16.55   18.67 1439.37

Potential Outliers

Before removing outliers, I examine the distribution of ride_length_min using a boxplot. The numerous outliers obscure the underlying data trend. These extreme values, while technically part of the dataset, don’t reflect the typical ride duration.

ggplot(data = trip_data_v2, aes(x = member_casual, y = ride_length_min, fill = member_casual)) + 
  geom_boxplot() +
  coord_flip() + 
  theme(legend.position = "none") + 
  labs(x = "Member type", 
       y = "Ride length (in minutes)", 
       title = "Box plot showing 'ride_length_min' before removing outliers")

To identify them, I use the interquartile range (IQR) method. The IQR measures the spread of the middle 50% of the data: the difference between the first and third quartiles (Q1 and Q3). In this case, Q1 is 6.05 minutes and Q3 is 18.67 minutes, making the IQR 12.62 minutes.

quantiles <- as.numeric(quantile(trip_data_v2$ride_length_min, probs = c(0.25, 0.50, 0.75), na.rm = FALSE))
iqr_value <- IQR(trip_data_v2$ride_length_min)
## [1] "Q1: 6.05 minutes"
## [1] "Q3: 18.66667 minutes"
## [1] "IQR: 12.61667 minutes"

Next, I build a “fence” around this central portion of the data; any value outside it is treated as an outlier. The fence is constructed by subtracting 1.5 times the IQR from Q1 and adding 1.5 times the IQR to Q3. From the output below, the lower fence is -12.88 minutes and the upper fence is 37.59 minutes.

lower_fence <- quantiles[1] - ( 1.5 * iqr_value )
upper_fence <- quantiles[3] + ( 1.5 * iqr_value )
## [1] "Lower Fence: -12.875 minutes"
## [1] "Upper Fence: 37.59167 minutes"

So, any value less than -12.88 minutes or greater than 37.59 minutes is considered an outlier and removed from the data.

trip_data_v2 <- trip_data_v2 %>%
  filter(!(ride_length_min < lower_fence | ride_length_min > upper_fence))

Filtering out these outliers significantly cleans the data. The maximum ride length after removal drops to 37.58 minutes, confirming the successful elimination of extreme values. This refined dataset provides a more accurate representation of typical ride durations for further analysis.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   5.783   9.717  11.958  16.150  37.583
ggplot(data = trip_data_v2, aes(x = member_casual, y = ride_length_min, fill = member_casual)) + 
  geom_boxplot() +
  coord_flip() + 
  theme(legend.position = "none") + 
  labs(x = "Member type", 
       y = "Ride length (in minutes)", 
       title = "Box plot showing 'ride_length_min' after removing outliers")

Validate the Data

After data cleaning, I re-examined the dataset, finding it now contains 5,091,142 trip records, a reduction of 576,575 (10%) compared to the original 5,667,717. There are 14 columns in the data, all of which have no missing values, ensuring a complete dataset for thorough analysis.

skim_without_charts(trip_data_v2)
Data summary
Name trip_data_v2
Number of rows 5091142
Number of columns 14
_______________________
Column type frequency:
character 5
factor 2
numeric 5
POSIXct 2
________________________
Group variables None

Variable type: character

skim_variable n_missing complete_rate min max empty n_unique whitespace
ride_id 0 1 16 16 0 5091142 0
rideable_type 0 1 12 13 0 2 0
start_station_name 0 1 9 53 0 1337 0
end_station_name 0 1 9 53 0 1343 0
member_casual 0 1 6 6 0 2 0

Variable type: factor

skim_variable n_missing complete_rate ordered n_unique top_counts
day_of_week 0 1 TRUE 7 Sat: 793689, Thu: 770594, Wed: 734805, Fri: 725683
month 0 1 TRUE 12 Jul: 721924, Aug: 699739, Jun: 678645, Sep: 633443

Variable type: numeric

skim_variable n_missing complete_rate mean sd p0 p25 p50 p75 p100
start_lat 0 1 41.90 0.05 41.65 41.88 41.90 41.93 42.06
start_lng 0 1 -87.65 0.03 -87.83 -87.66 -87.64 -87.63 -87.53
end_lat 0 1 41.90 0.05 41.65 41.88 41.90 41.93 42.06
end_lng 0 1 -87.65 0.03 -87.83 -87.66 -87.65 -87.63 -87.53
ride_length_min 0 1 11.96 8.08 1.00 5.78 9.72 16.15 37.58

Variable type: POSIXct

skim_variable n_missing complete_rate min max median n_unique
started_at 0 1 2022-01-01 00:00:05 2022-12-31 23:59:26 2022-07-23 13:00:58 4352010
ended_at 0 1 2022-01-01 00:01:48 2023-01-01 00:27:52 2022-07-23 13:15:00 4362055

To verify whether the data follows a normal distribution, I use the Empirical Rule, which states that in a normal distribution 68% of data points fall within one standard deviation of the mean, 95% within two, and 99.7% within three. I create the calculate_percentage function to measure the percentage of data within each range. Its output shows that 72.10%, 94.08%, and 99.38% of the data fall within one, two, and three standard deviations of the mean, respectively. These figures are reasonably close to the benchmarks, suggesting the cleaned data approximates a normal distribution and can be analyzed with standard statistical methods.

summary_stats <- summarise(trip_data_v2,
                           sd = sd(ride_length_min),
                           mean = mean(ride_length_min),
                           count = n())

calculate_percentage <- function(n_sd) {
  filtered_count <- trip_data_v2 %>%
  filter(between(ride_length_min, 
                 summary_stats$mean - n_sd * summary_stats$sd, 
                 summary_stats$mean + n_sd * summary_stats$sd)) %>%
  summarise(count = n())
  round((filtered_count$count / summary_stats$count) * 100, 2)
}

percentage_sd1 <- calculate_percentage(1)
percentage_sd2 <- calculate_percentage(2)
percentage_sd3 <- calculate_percentage(3)
## [1] "One standard deviation: 72.10%"
## [1] "Two standard deviations: 94.08%"
## [1] "Three standard deviations: 99.38%"

Analysis

This section presents an analysis of Cyclistic’s historical trip data from January to December 2022, with the objective of identifying the differences in the use of Cyclistic bikes between annual members and casual riders.

Overall Summary of Cyclistic Rides in 2022

The table below presents a statistical summary of ride duration across all of Cyclistic’s users, both casual riders and annual members.

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   5.783   9.717  11.958  16.150  37.583

The time duration of Cyclistic’s rides can also be visualized by a histogram. The histogram shows that the distribution of ride times is positively skewed, with a high cluster of lower values and a spread-out tail on the right. This means that there are a few rides that are much longer than the majority of rides.

Because of the positive skew, the mean ride time is not an accurate measure of the typical ride time. The median is more robust, as it is less affected by extreme values, so the median ride time is used for the descriptive analysis that follows.
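A minimal sketch of that histogram, using the ggplot2 tools already loaded with the tidyverse (the one-minute bin width is my choice, not from the original analysis):

```r
# Histogram of ride lengths; the long right tail motivates using the median
ggplot(trip_data_v2, aes(x = ride_length_min)) + 
  geom_histogram(binwidth = 1) + 
  labs(x = "Ride length (in minutes)", 
       y = "Number of rides", 
       title = "Histogram showing the distribution of 'ride_length_min'")
```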

Casual Riders vs Annual Members

Casual riders ride for longer periods than annual members: their median ride duration is 11.52 minutes, compared with 8.72 minutes for annual members. However, annual members took far more rides: 3,144,320 versus 1,946,822 for casual riders. From these findings, annual members tend to be the more regular users of the bike-sharing service, while casual riders use it less frequently but take longer trips when they do.

Time Duration by Users

Ride Counts by Users
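The medians and ride counts quoted above can be reproduced with a grouped summary along these lines (a sketch using the cleaned dataset’s column names):

```r
trip_data_v2 %>% 
  group_by(member_casual) %>% 
  summarise(ride_count = n(), 
            median_ride_length = median(ride_length_min))
```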

Day of Week

Casual riders take longer bike rides than annual members, especially on the weekend, when their rides average about 13 minutes longer; annual members’ ride lengths, in contrast, stay consistent across the week. By ride counts, the pattern reverses: annual members ride most on weekdays, with usage gradually decreasing as the weekend approaches, while casual riders take more rides on weekends. This suggests that casual riders’ longer ride times are closely tied to their increased weekend usage.

Time Duration by Users and Day of Week

Ride Counts by Users and Day of Week
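A sketch of how the weekly ride-count comparison above could be plotted, using the packages loaded earlier (the dodged-bar layout and comma labels from scales are my choices):

```r
# Ride counts per day of week, split by member type
trip_data_v2 %>% 
  count(member_casual, day_of_week, name = "ride_count") %>% 
  ggplot(aes(x = day_of_week, y = ride_count, fill = member_casual)) + 
  geom_col(position = "dodge") + 
  scale_y_continuous(labels = label_comma()) +   # label_comma() comes from scales
  labs(x = "Day of week", 
       y = "Number of rides", 
       title = "Ride counts by users and day of week")
```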

Hour

Casual riders take their longest rides in the early afternoon, from 1 PM to 3 PM, after which durations decrease slightly. Annual members’ ride lengths, in contrast, stay consistent throughout the day, with no significant spikes or dips. By number of rides, annual members show three peaks in usage: at 8 AM, at 12 PM, and at 5 PM, whereas casual riders start riding around 5 AM and gradually increase their usage until it peaks at 5 PM.

Time Duration by Users and Hour

Ride Counts by Users and Hour
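The hourly figures above can be derived by extracting the hour from the start timestamp, for example (a sketch; hour() comes from lubridate, loaded alongside the tidyverse):

```r
# Ride counts and median durations per hour of day, split by member type
trip_data_v2 %>% 
  mutate(hour = hour(started_at)) %>% 
  group_by(member_casual, hour) %>% 
  summarise(ride_count = n(), 
            median_ride_length = median(ride_length_min), 
            .groups = "drop")
```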

Month

The findings show that the median ride time for both casual riders and annual members is relatively consistent throughout the year, with the exception of three months: March, April, and May. Casual riders, however, consistently spend more time on bikes than annual members. The data also shows that the number of Cyclistic rides increases month over month, peaking in July for casual riders and August for annual members, before declining sharply toward winter. This points to a strong seasonal pattern in Cyclistic usage.

Time Duration by Users and Month

Ride Counts by Users and Month
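The monthly comparison above follows the same grouped-summary pattern, using the ordered month column created during cleaning (a sketch):

```r
# Ride counts and median durations per month, split by member type
trip_data_v2 %>% 
  group_by(member_casual, month) %>% 
  summarise(ride_count = n(), 
            median_ride_length = median(ride_length_min), 
            .groups = "drop")
```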

Conclusion

This section provides the concluding thoughts and recommendations of this report.

Key Findings

Based on the data collected, it can be concluded that:

  • Casual riders: They are more likely to use Cyclistic bikes for leisure purposes, including enjoyment, fitness, and exploring the city. They ride for longer periods and take more rides on weekends. They ride throughout the day, with usage peaking in the evening.
  • Annual members: They tend to use Cyclistic bikes for specific purposes, such as commuting to work. They ride for shorter periods and take more rides on weekdays, concentrated around the rush hours of 8 AM, 12 PM, and 5 PM.
  • Warmer months: Both casual riders and annual members ride more from early spring (March) onward, with usage peaking in late summer (July and August).

Recommendations

The report makes the following recommendations:

  1. Price Incentives: Offer incentives for casual riders to become annual members, such as discounts, free ride credits, or gift cards. This would make membership more attractive to casual riders who are unsure whether they will use Cyclistic bikes enough to justify the full annual price.
  2. Personalized Marketing: Personalize the marketing messages sent to casual riders, reaching them with the right message at the right time. For example, target casual riders who have taken a certain number of rides in a month, or who have used Cyclistic bikes during rush hour.
  3. New Member Plans: Consider adding a monthly or quarterly membership option. This would suit casual riders who only bike during certain times of the year, such as spring and summer.