Author: Nele Albers
Date: August 2024
This notebook is meant to reproduce:
Required files:
Created files:
Authored by Nele Albers, Francisco S. Melo, Mark A. Neerincx, Olya Kudina, and Willem-Paul Brinkman.
Let's first import the packages we need.
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import scipy
import seaborn as sns
Let's define some constants we need.
feat_sel = [0, 1, 2] # selected base state features
NUM_ACTIONS = 2
NUM_VALUES_PER_STATE_FEATURE = [3, 2, 2]
states = [[i, j, k] for i in range(NUM_VALUES_PER_STATE_FEATURE[0]) for j in range(NUM_VALUES_PER_STATE_FEATURE[1]) for k in range(NUM_VALUES_PER_STATE_FEATURE[2])]
num_states = len(states)
We load the data frame.
data = pd.read_csv("Data/data_rl_samples_abstracted" + str(feat_sel) + str(NUM_VALUES_PER_STATE_FEATURE) + ".csv",
converters={'s0': eval, 's1': eval})
Let's set the style for the plots.
med_fontsize = 22
small_fontsize = 18
extrasmall_fontsize = 15
sns.set()
sns.set_style("white")
sns.set_context("paper", rc={"font.size":small_fontsize,"axes.titlesize":med_fontsize,"axes.labelsize":med_fontsize,
'xtick.labelsize':small_fontsize, 'ytick.labelsize':small_fontsize,
'legend.fontsize':extrasmall_fontsize,'legend.title_fontsize': extrasmall_fontsize})
all_people = data["rand_id"].unique()
NUM_PEOPLE = len(all_people)
NUM_SAMPLES = len(data)
print("Total number of samples: " + str(NUM_SAMPLES) + ".")
print("Total number of people: " + str(NUM_PEOPLE) + ".")
Total number of samples: 2326. Total number of people: 679.
Now we look at the mean effort for each of the 37 preparatory activities.
activities = ["Creating motivational slogans/quotes for quitting smoking",
"Creating motivational slogans/quotes for becoming more physically active",
"Testimonial on becoming more physically active",
"Desired future self after quitting smoking - Writing",
"Desired future self after becoming more physically active - Writing",
"Reasons for quitting smoking",
"Reasons for becoming more physically active",
"Personal rule for not smoking",
"Personal rule for becoming more physically active",
"How friends and/or family will receive one's desired future self after quitting smoking",
"How friends and/or family will receive one's desired future self after becoming more physically active",
"Focusing on past successes for quitting smoking",
"Focusing on past successes for becoming more physically active",
"Role model for others by quitting smoking",
"Role model for others by becoming more physically active",
"Tracking smoking behavior",
"Tracking physical activity behavior",
"Feared future self when not quitting smoking - Writing",
"Feared future self when not becoming more physically active - Writing",
"Feared future self when not quitting smoking - Picture",
"Feared future self when not becoming more physically active - Picture",
"Visualizing smoking as a battle",
"Visualizing becoming more physically active as a battle",
"Desired future self after quitting smoking - Picture",
"Desired future self after becoming more physically active - Picture",
"Education on sleep",
"Routines that cause cravings",
"Thinking of high-risk situations and how to cope with them",
"Alternative behaviors for cravings",
"Progressive muscle relaxation",
"Breathing exercise",
"Exchanging a passive activity for an active one",
"Thinking of solutions to barriers to becoming physically active",
"Education on recommended physical activity",
"Plan for becoming more physically active",
"Positive diary",
"Focusing on past success in general"]
for activity, activity_name in enumerate(activities):
data_activity = data[data["activity"] == activity]
print("***" + activity_name + "***")
print("Mean:", np.round(data_activity["effort"].mean(), 2), "SD:", np.round(data_activity["effort"].std(), 2), "(N=", len(data_activity), ")")
***Creating motivational slogans/quotes for quitting smoking*** Mean: 6.04 SD: 2.91 (N= 71 ) ***Creating motivational slogans/quotes for becoming more physically active*** Mean: 5.45 SD: 3.34 (N= 78 ) ***Testimonial on becoming more physically active*** Mean: 5.41 SD: 3.3 (N= 76 ) ***Desired future self after quitting smoking - Writing*** Mean: 6.06 SD: 2.76 (N= 70 ) ***Desired future self after becoming more physically active - Writing*** Mean: 5.25 SD: 2.74 (N= 72 ) ***Reasons for quitting smoking*** Mean: 6.0 SD: 2.76 (N= 72 ) ***Reasons for becoming more physically active*** Mean: 5.48 SD: 2.83 (N= 60 ) ***Personal rule for not smoking*** Mean: 5.27 SD: 3.28 (N= 70 ) ***Personal rule for becoming more physically active*** Mean: 5.7 SD: 2.56 (N= 74 ) ***How friends and/or family will receive one's desired future self after quitting smoking*** Mean: 6.62 SD: 2.42 (N= 21 ) ***How friends and/or family will receive one's desired future self after becoming more physically active*** Mean: 5.94 SD: 2.9 (N= 17 ) ***Focusing on past successes for quitting smoking*** Mean: 5.51 SD: 2.73 (N= 65 ) ***Focusing on past successes for becoming more physically active*** Mean: 5.4 SD: 2.36 (N= 73 ) ***Role model for others by quitting smoking*** Mean: 4.8 SD: 2.72 (N= 71 ) ***Role model for others by becoming more physically active*** Mean: 6.0 SD: 2.68 (N= 72 ) ***Tracking smoking behavior*** Mean: 6.05 SD: 2.52 (N= 80 ) ***Tracking physical activity behavior*** Mean: 5.12 SD: 2.91 (N= 74 ) ***Feared future self when not quitting smoking - Writing*** Mean: 6.21 SD: 2.58 (N= 70 ) ***Feared future self when not becoming more physically active - Writing*** Mean: 5.89 SD: 2.61 (N= 75 ) ***Feared future self when not quitting smoking - Picture*** Mean: 5.55 SD: 2.68 (N= 75 ) ***Feared future self when not becoming more physically active - Picture*** Mean: 5.9 SD: 2.67 (N= 73 ) ***Visualizing smoking as a battle*** Mean: 5.2 SD: 2.45 (N= 80 ) ***Visualizing becoming more physically active as a battle*** Mean: 5.73 SD: 2.38 (N= 60 ) ***Desired future self after quitting smoking - Picture*** Mean: 5.86 SD: 2.54 (N= 77 ) ***Desired future self after becoming more physically active - Picture*** Mean: 6.34 SD: 2.45 (N= 71 ) ***Education on sleep*** Mean: 5.58 SD: 3.07 (N= 79 ) ***Routines that cause cravings*** Mean: 6.11 SD: 3.1 (N= 18 ) ***Thinking of high-risk situations and how to cope with them*** Mean: 5.58 SD: 2.07 (N= 12 ) ***Alternative behaviors for cravings*** Mean: 6.6 SD: 2.29 (N= 57 ) ***Progressive muscle relaxation*** Mean: 5.32 SD: 3.17 (N= 75 ) ***Breathing exercise*** Mean: 6.05 SD: 2.39 (N= 64 ) ***Exchanging a passive activity for an active one*** Mean: 6.13 SD: 2.45 (N= 82 ) ***Thinking of solutions to barriers to becoming physically active*** Mean: 5.71 SD: 2.7 (N= 83 ) ***Education on recommended physical activity*** Mean: 5.67 SD: 2.9 (N= 12 ) ***Plan for becoming more physically active*** Mean: 5.77 SD: 2.89 (N= 13 ) ***Positive diary*** Mean: 5.86 SD: 3.04 (N= 71 ) ***Focusing on past success in general*** Mean: 6.25 SD: 2.68 (N= 63 )
And now the overall mean effort.
print("Mean:", np.round(data["effort"].mean(), 2), "SD:", np.round(data["effort"].std(), 2), "(N=", len(data), ")")
Mean: 5.74 SD: 2.75 (N= 2326 )
Now we want to compute the mean effort for each of the combinations of values for the selected state features.
Let's first define the variables we need.
data_train = data[["s0", "s1", "a", "effort"]].values.tolist()
And now we compute the mean effort for each of the combinations we want.
reward_func = np.zeros((num_states, NUM_ACTIONS))
reward_func_count = np.zeros((num_states, NUM_ACTIONS))
all_effort_values = []
for s_ind, s in enumerate(states):
all_effort_values_state = [[], [], [], [], []]
for data_index in range(NUM_SAMPLES):
if list(np.take(np.array(data_train[data_index][0]), feat_sel)) == s:
r = data_train[data_index][3]
a = data_train[data_index][2]
reward_func[s_ind, a] += r
reward_func_count[s_ind, a] += 1
all_effort_values_state[a].append(r)
# Normalize reward function
for a in range(NUM_ACTIONS):
reward_func[s_ind, a] /= reward_func_count[s_ind, a]
all_effort_values.append(all_effort_values_state)
And let's plot this.
patterns = ["", "/"]
colors = ["#cceeff", "#ffcccc"]
action_labels = ["No feedback", "Feedback"]
x_vals = np.arange(num_states)
width = 1/NUM_ACTIONS - 0.02
fig, ax = plt.subplots(figsize=(10,5))
alpha = 0.95
for action in range(NUM_ACTIONS):
plt.bar(x_vals + action * width, reward_func[:, action], width, color = colors[action],
hatch = patterns[action], label=action_labels[action])
# Add Bayesian credible intervals
for state in range(num_states):
vals = all_effort_values[state][action]
# Need at least two values
if len(vals) > 1:
conf = scipy.stats.bayes_mvs(all_effort_values[state][action], alpha = alpha)[0].minmax
ax.vlines(x=state + action * width, ymin=conf[0], ymax=conf[1], color='black')
ax.set_ylim([0, 9.99])
ax.set_ylabel("Mean effort")
ax.set_xlabel("Combination of state feature values")
plt.xticks(x_vals + 0.5*width, ["000", "001", "010", "011", "100", "101", "110", "111", "200", "201", "210", "211"])
plt.legend(ncols=1, loc='upper left', bbox_to_anchor=(1, 0.7))
plt.savefig("Figures/mean_effort.pdf", dpi=1500,
bbox_inches='tight', pad_inches=0)
And let's compute the number of samples we got per action and combination of selected state feature values.
count = np.zeros((num_states, NUM_ACTIONS))
for s_ind, s in enumerate(states):
for data_index in range(NUM_SAMPLES):
if list(np.take(np.array(data_train[data_index][0]), feat_sel)) == s:
a = data_train[data_index][2]
count[s_ind, a] += 1
Let's plot this.
patterns = ["", "/"]
colors = ["#cceeff", "#ffcccc"]
action_labels = ["No feedback", "Feedback"]
x_vals = np.arange(num_states)
width = 1/NUM_ACTIONS - 0.02
fig, ax = plt.subplots(figsize=(10,5))
# Create a horizontal line for 10 samples
ax.hlines(y=10, color='gray', linestyle = '--', xmin = 0, xmax = x_vals[-1] + NUM_ACTIONS * width)
# Plot the numbers of samples
for action_idx in range(NUM_ACTIONS):
plt.bar(x_vals + action_idx * width, count[:, action_idx], width, color = colors[action_idx],
hatch = patterns[action_idx], label = action_labels[action_idx])
ax.set_ylim([0, 600])
ax.set_ylabel("Number of samples")
ax.set_xlabel("Combination of state feature values")
plt.xticks(x_vals + 0.5 * width, ["000", "001", "010", "011", "100", "101", "110", "111", "200", "201", "210", "211"])
plt.legend(ncols=1, loc='upper left', bbox_to_anchor=(1, 1.1))
plt.savefig("Figures/number_of_samples.pdf", dpi=1500,
bbox_inches='tight', pad_inches=0)
In sessions 2-5, participants were asked how likely they would have returned to the session if they were taking part in an unpaid smoking cessation program on a scale from -5 to 5.
Let's compute the mean rating per session.
for session in range(1, 5):
mean_dropout = data[data['session'] == session]["dropout_response"].mean()
sd_dropout = data[data['session'] == session]["dropout_response"].std()
min_dropout = data[data['session'] == session]["dropout_response"].min()
max_dropout = data[data['session'] == session]["dropout_response"].max()
# Need to add 1 to the session because the return likelihoods are part of the transition samples that begin in the previous session
print("Session " + str(session + 1) + ":", "M =", round(mean_dropout, 2), ", SD =", round(sd_dropout, 2), "Range:", min_dropout,
"-", max_dropout, ", N =", len(data[data['session'] == session]))
Session 2: M = 1.57 , SD = 2.73 Range: -5 - 5 , N = 679 Session 3: M = 1.79 , SD = 2.59 Range: -5 - 5 , N = 599 Session 4: M = 1.81 , SD = 2.78 Range: -5 - 5 , N = 544 Session 5: M = 2.11 , SD = 2.68 Range: -5 - 5 , N = 504
Now we reproduce the participant characteristics we report in Table C5 in the Appendix.
We only want one entry per person, so we only use the data from the first session.
data_s1 = data[data['session'] == 1]
Let's merge the smoking and vaping frequency columns.
data_s1["PIV_Smoking_Freq"] = data_s1["PIV_Smoking_Freq"].fillna('')
data_s1["Vaping_Freq"] = data_s1["Vaping_Freq"].fillna('')
data_s1["Freq"] = data_s1["PIV_Smoking_Freq"] + data_s1["Vaping_Freq"]
/tmp/ipykernel_60/945386589.py:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy data_s1["PIV_Smoking_Freq"] = data_s1["PIV_Smoking_Freq"].fillna('') /tmp/ipykernel_60/945386589.py:2: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy data_s1["Vaping_Freq"] = data_s1["Vaping_Freq"].fillna('') /tmp/ipykernel_60/945386589.py:3: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame. Try using .loc[row_indexer,col_indexer] = value instead See the caveats in the documentation: https://pandas.pydata.org/pandas-docs/stable/user_guide/indexing.html#returning-a-view-versus-a-copy data_s1["Freq"] = data_s1["PIV_Smoking_Freq"] + data_s1["Vaping_Freq"]
First the age.
print("***Age***")
print("Mean =", np.round(data_s1["Age"].mean(), 2), "SD =", np.round(data_s1["Age"].std(), 2))
print("Range:", data_s1["Age"].min(), "-", data_s1["Age"].max())
***Age*** Mean = 36.3 SD = 11.21 Range: 19 - 71
Now we consider the gender.
possible_values = ["Man (including Trans Male/Trans Man)", "Woman (including Trans Female/Trans Woman)", "Non-binary (would like to give more detail)"]
print("***Gender***")
for v in possible_values:
num = len(data_s1[data_s1["Gender"] == v])
print(v + ":", num, "(" + str(round(num/len(data_s1) * 100, 2)) + "%)")
***Gender*** Man (including Trans Male/Trans Man): 335 (49.34%) Woman (including Trans Female/Trans Woman): 330 (48.6%) Non-binary (would like to give more detail): 14 (2.06%)
Now we look at the highest completed education level, smoking frequency, TTM-stage for becoming physically active, and weekly exercise amount.
For the TTM-stage for becoming physically active, the responses are mapped to the stages as follows:
chars = {"Highest education level completed": ["No formal qualifications", "Secondary education (e.g. GED/GCSE)",
"High school diploma/A-levels", "Technical/community college",
"Undergraduate degree (BA/BSc/other)", "Graduate degree (MA/MSc/MPhil/other)",
"Doctorate degree (PhD/other)", "Don't know / not applicable"],
"Freq": ["Once a day", "2-5 times a day", "6-10 times a day", "11-19 times a day", "More than 20 times a day"],
"ttm_pa": ["No, and I do NOT intend to in the next 6 months", "No, but I intend to in the next 6 months", "No, but I intend to in the next 30 days", "Yes, I have been for LESS than 6 months",
"Yes, I have been for MORE than 6 months"],
"Weekly_Exercise": ["Never (0 – 60 minutes per week)", "Sometimes (60 – 150 minutes per week)", "Often (more than 150 minutes per week)"]}
for char in chars:
print("\n***" + char + "***")
for v in chars[char]:
num = len(data_s1[data_s1[char] == v])
print(v + ":", num, "(" + str(round(num/len(data_s1) * 100, 2)) + "%)")
***Highest education level completed*** No formal qualifications: 5 (0.74%) Secondary education (e.g. GED/GCSE): 61 (8.98%) High school diploma/A-levels: 139 (20.47%) Technical/community college: 90 (13.25%) Undergraduate degree (BA/BSc/other): 263 (38.73%) Graduate degree (MA/MSc/MPhil/other): 107 (15.76%) Doctorate degree (PhD/other): 9 (1.33%) Don't know / not applicable: 5 (0.74%) ***Freq*** Once a day: 30 (4.42%) 2-5 times a day: 109 (16.05%) 6-10 times a day: 136 (20.03%) 11-19 times a day: 160 (23.56%) More than 20 times a day: 244 (35.94%) ***ttm_pa*** No, and I do NOT intend to in the next 6 months: 24 (3.53%) No, but I intend to in the next 6 months: 182 (26.8%) No, but I intend to in the next 30 days: 146 (21.5%) Yes, I have been for LESS than 6 months: 98 (14.43%) Yes, I have been for MORE than 6 months: 228 (33.58%) ***Weekly_Exercise*** Never (0 – 60 minutes per week): 190 (27.98%) Sometimes (60 – 150 minutes per week): 301 (44.33%) Often (more than 150 minutes per week): 187 (27.54%)
And now the number of vapers and smokers.
num_smokers = len(data_s1[data_s1["PIV_Smoking_Freq"] == ""])
num_vapers = len(data_s1[data_s1["PIV_Smoking_Freq"] != ""])
print("Number of people participating as smokers:", num_smokers, "(" + str(round(num_smokers/(num_smokers + num_vapers) * 100, 2)) + "%)")
print("Number of people participating as vapers:", num_vapers, "(" + str(round(num_vapers/(num_smokers + num_vapers) * 100, 2)) + "%)")
Number of people participating as smokers: 352 (51.84%) Number of people participating as vapers: 327 (48.16%)