Author: Nele Albers
Date: June 2024
Here we select three of the base state features.

Authored by Nele Albers, Francisco S. Melo, Mark A. Neerincx, Olya Kudina, and Willem-Paul Brinkman.

Required files:
- Data/all_states
- Data/data_rl_samples.csv
Let's first import the packages we need.
import numpy as np
import pandas as pd
import pickle
import random
# For RL-related computations
import compute_dynamics_feat_sel as dyn
And we define constants we use throughout.
NUM_ACTIONS = 2  # Number of possible actions
Let's load the data. This includes the dataframe with the non-abstracted RL samples as well as a file that contains all non-abstracted states (i.e., including the states of people who did not return for session 2). We need the latter to compute percentiles for the state features.
def eval_with_nan(x):
    """Parse a stringified list from the CSV, turning 'nan' entries into np.nan."""
    x = x.replace('nan', '"nan"')
    lst = pd.eval(x)
    lst = [np.nan if i == 'nan' else i for i in lst]
    return lst
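# Quick sanity check on a hypothetical input string (a stringified state
# with a missing value, as it would appear in the CSV): 'nan' entries are
# parsed into np.nan.
print(eval_with_nan("[4, nan, 5.0, 1]"))  # -> [4, nan, 5.0, 1]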
with open("Data/all_states", "rb") as f:
all_states_for_abstraction = pickle.load(f)
data = pd.read_csv("Data/data_rl_samples.csv",
converters={'s0': eval_with_nan, 's1': eval_with_nan})
data.head()
|   | rand_id | session | s0 | a | effort | s1 | activity | dropout_response | s0_imp | s0_se | ... | s0_diff | s0_session | s1_imp | s1_se | s1_hs | s1_energy | s1_diff | s1_session | cons_id | Prev_Feedback_Count |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | P622 | 1 | [4, 4, 5, 5, 1.75, 0] | 0 | 8 | [6.0, 5.0, 9.0, 7.0, 2.6319444444444446, 1] | 4 | 0 | 4 | 4 | ... | 1.750000 | 0 | 6.0 | 5.0 | 9.0 | 7.0 | 2.631944 | 1 | 0 | 0 |
| 1 | P904 | 1 | [1, 4, 0, 6, 1.8125, 0] | 0 | 0 | [3.0, 4.0, -1.0, 8.0, 2.25, 1] | 15 | 0 | 1 | 4 | ... | 1.812500 | 0 | 3.0 | 4.0 | -1.0 | 8.0 | 2.250000 | 1 | 2 | 0 |
| 2 | P665 | 1 | [8, 9, 9, 3, 2.2916666666666665, 0] | 0 | 8 | [9.0, 8.0, 10.0, 7.0, 1.8125, 1] | 29 | 3 | 8 | 9 | ... | 2.291667 | 0 | 9.0 | 8.0 | 10.0 | 7.0 | 1.812500 | 1 | 1 | 0 |
| 3 | P991 | 1 | [9, 9, 10, 5, 1.8611111111111112, 0] | 0 | 8 | [10.0, 8.0, 10.0, 5.0, 0.5902777777777778, 1] | 36 | 0 | 9 | 9 | ... | 1.861111 | 0 | 10.0 | 8.0 | 10.0 | 5.0 | 0.590278 | 1 | 3 | 0 |
| 4 | P239 | 1 | [9, 6, 9, 6, 1.0138888888888888, 0] | 0 | 9 | [10.0, 8.0, 10.0, 7.0, 1.7430555555555556, 1] | 22 | 4 | 9 | 6 | ... | 1.013889 | 0 | 10.0 | 8.0 | 10.0 | 7.0 | 1.743056 | 1 | 4 | 0 |

5 rows × 22 columns
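Each s0/s1 entry is the raw state vector; its positions correspond to the six candidate features listed further below, and the last element appears to encode the intervention phase (compare the s0_session and s1_session columns). A quick inspection of the first sample:

s0_example = data["s0"].iloc[0]
print(s0_example)       # [4, 4, 5, 5, 1.75, 0]
print(len(s0_example))  # 6 candidate state features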
Below we give an overview of the number of samples we have and the number of people they are from.
all_people = list(data['rand_id'].unique())
NUM_PEOPLE = len(all_people)
print("Total number of samples: " + str(len(data)) + ".")
print("Total number of people: " + str(NUM_PEOPLE) + ".")
Total number of samples: 2326.
Total number of people: 679.
Now we select base state features.
We have these candidate features:
0: importance of preparing for quitting smoking/vaping
1: self-efficacy for preparing for quitting smoking/vaping
2: appreciation of human feedback
3: energy
4: difficulty of assigned activity
5: phase of the intervention
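To make later printouts easier to read, we can map feature indices to names with a small lookup (an illustrative helper based on the list above, not part of the original pipeline):

FEATURE_NAMES = {
    0: "importance of preparing for quitting smoking/vaping",
    1: "self-efficacy for preparing for quitting smoking/vaping",
    2: "appreciation of human feedback",
    3: "energy",
    4: "difficulty of assigned activity",
    5: "phase of the intervention",
}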
MAX_TRAIN_SESSION = 4 # 4th session
MIN_TRAIN_SESSION = 1 # First session
NUM_FEAT_TO_SELECT = 3 # Number of features to select
CANDIDATE_FEATURES = [0, 1, 2, 3, 4, 5] # Features to select from
VALS_PER_FEAT_TO_SELECT = [3, 2, 2] # Number of values per selected feature
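# Illustrative consistency check (not part of the original notebook):
# we need one bin count for each feature we select.
assert len(VALS_PER_FEAT_TO_SELECT) == NUM_FEAT_TO_SELECT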
# Use only data on specified sessions as training data
data_train = data.copy(deep=True)
data_train = data_train[(data_train['session'] <= MAX_TRAIN_SESSION) & (data_train["session"] >= MIN_TRAIN_SESSION)]
# Mean effort spent
effort_mean = data_train["effort"].mean()
print("Average effort response: " + str(round(effort_mean, 2)))
# Select state features
random.seed(1) # For reproducibility
feat_sel, _ = dyn.feature_selection_notabstracted(
    data_train[["s0", "s1", "a", "effort"]].values.tolist(),
    effort_mean,
    CANDIDATE_FEATURES,
    vals_per_feat_to_select=VALS_PER_FEAT_TO_SELECT,
    all_states_for_abstraction=all_states_for_abstraction,
    num_act=NUM_ACTIONS)
# Map the selected indices back to the original candidate feature indices
# (needed in case non-trailing features are removed from the candidate list).
feat_sel = [CANDIDATE_FEATURES[feat_sel[i]] for i in range(NUM_FEAT_TO_SELECT)]
print("\nChosen features:", feat_sel)
Average effort response: 5.74

First feature selected: 0
First feature -> min. p-value: 0.0007
Feature selected: 2
Criterion: Min p-value: 0.0
Feature selected: 1
Criterion: Min p-value: 0.1444

Chosen features: [0, 2, 1]
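The procedure thus selects features 0, 2, and 1, in that order. Using the illustrative FEATURE_NAMES lookup from above, we can print the chosen features by name:

for feat in feat_sel:
    print(str(feat) + ": " + FEATURE_NAMES[feat])

This prints importance of preparing for quitting smoking/vaping (0), appreciation of human feedback (2), and self-efficacy for preparing for quitting smoking/vaping (1).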