Human feedback overview

Author: Nele Albers

Date: May 2025

Here we give an overview of:

  • the number of people noticing human feedback according to the post-questionnaire,
  • the number of people reading human feedback messages according to the post-questionnaire,
  • the number of people clicking on the reading confirmation links in the feedback messages, and
  • the number of people with data from the next session after receiving vs. not receiving human feedback.

Required files:

  • Data/data_rl_samples_abstracted[0, 1, 2][3, 2, 2].csv
  • Data/feedback_reading_confirmation_anonym.csv
  • Data/postquestionnaire_anonym.csv
  • Data/sessionsdata_anonym.csv

Authored by Nele Albers, Francisco S. Melo, Mark A. Neerincx, Olya Kudina, and Willem-Paul Brinkman.

Load packages

Let's load the packages we need.

In [2]:
import numpy as np
import pandas as pd

Load data

Let's load the data we need.

In [3]:
df = pd.read_csv("Data/data_rl_samples_abstracted[0, 1, 2][3, 2, 2].csv")
df_post = pd.read_csv("Data/postquestionnaire_anonym.csv")
df_session = pd.read_csv("Data/sessionsdata_anonym.csv")
df_conf = pd.read_csv("Data/feedback_reading_confirmation_anonym.csv")

Let's separately save the data where people did vs. did not get human feedback.

In [4]:
df_0 = df[df["a"] == 0]
df_0 = df_0.reset_index(drop=True)

df_1 = df[df["a"] == 1]
df_1 = df_1.reset_index(drop=True)
print("Number of interaction samples with human feedback:", len(df_1))
Number of interaction samples with human feedback: 465

Noticing human feedback according to post-questionnaire

Here we check how many of the people who completed the post-questionnaire and received human feedback reported that they noticed the human feedback.

In [4]:
ids_human_feedback = list(df_1["rand_id"].unique())  # IDs of people who got human feedback
ids_postq_human_feedback = df_post[df_post["rand_id"].isin(ids_human_feedback)]["rand_id"].to_list()
ids_postq_noticed_feedback_before = df_post[(df_post["Support_noticed"] == 1) & (df_post["rand_id"].isin(ids_human_feedback))]["rand_id"].to_list()
ids_postq_noticed_feedback_after = df_post[(df_post["Support_noticed"] == 2) & (df_post["rand_id"].isin(ids_human_feedback))]["rand_id"].to_list()

print("***Noticed human feedback***")
print("Number of people who got human feedback:", len(ids_human_feedback))
print("Number of people in post-questionnaire who got human feedback:", len(ids_postq_human_feedback))
print("Percentage who noticed human feedback before next session:", round(len(ids_postq_noticed_feedback_before)/len(ids_postq_human_feedback) * 100, 2), "%")
print("Percentage who noticed human feedback after next session:", round(len(ids_postq_noticed_feedback_after)/len(ids_postq_human_feedback) * 100, 2), "%")
print("Percentage who did notice human feedback overall:", round((len(ids_postq_noticed_feedback_before) + len(ids_postq_noticed_feedback_after))/len(ids_postq_human_feedback) * 100, 2), "%")
    
***Noticed human feedback***
Number of people who got human feedback: 359
Number of people in post-questionnaire who got human feedback: 270
Percentage who noticed human feedback before next session: 55.93 %
Percentage who noticed human feedback after next session: 26.67 %
Percentage who did notice human feedback overall: 82.59 %
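
The same kind of percentages can also be read off in one step with `value_counts(normalize=True)`. The sketch below uses a small made-up table in place of `df_post`: the toy IDs and answers are assumptions for illustration, and only the column names `rand_id` and `Support_noticed` and the answer codes 1 and 2 are taken from the code above.

```python
import pandas as pd

# Toy stand-in for df_post; the values are made up for illustration.
toy_post = pd.DataFrame({
    "rand_id": ["a", "b", "c", "d", "e"],
    "Support_noticed": [1, 2, 1, 0, 1],
})
# Hypothetical IDs of people who got human feedback.
toy_ids_human_feedback = ["a", "b", "c", "d"]

# Keep only post-questionnaire rows of people who got human feedback.
toy_subset = toy_post[toy_post["rand_id"].isin(toy_ids_human_feedback)]

# Share of each answer code in percent (unrounded, so sums stay exact).
shares = toy_subset["Support_noticed"].value_counts(normalize=True) * 100

pct_before = shares.get(1, 0.0)  # code 1: noticed before the next session
pct_after = shares.get(2, 0.0)   # code 2: noticed after the next session

print("Noticed before next session:", round(pct_before, 2), "%")
print("Noticed after next session:", round(pct_after, 2), "%")
print("Noticed overall:", round(pct_before + pct_after, 2), "%")
```

Rounding only when printing avoids small discrepancies between the overall percentage and the sum of the two rounded parts.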

Reading human feedback according to post-questionnaire

Here we check how many of the people who completed the post-questionnaire and received human feedback reported that they read the human feedback messages.

In [5]:
ids_postq_read_feedback_always = df_post[(df_post["Support_read"] == 1) & (df_post["rand_id"].isin(ids_human_feedback))]["rand_id"].to_list()
ids_postq_read_feedback_sometimes = df_post[(df_post["Support_read"] == 2) & (df_post["rand_id"].isin(ids_human_feedback))]["rand_id"].to_list()

print("***Read human feedback***")
print("Number of people who got human feedback:", len(ids_human_feedback))
print("Number of people in post-questionnaire who got human feedback:", len(ids_postq_human_feedback))
print("Percentage who read human feedback always:", round(len(ids_postq_read_feedback_always)/len(ids_postq_human_feedback) * 100, 2), "%")
print("Percentage who read human feedback sometimes:", round(len(ids_postq_read_feedback_sometimes)/len(ids_postq_human_feedback) * 100, 2), "%")
print("Percentage who did read human feedback overall:", round((len(ids_postq_read_feedback_always) + len(ids_postq_read_feedback_sometimes))/len(ids_postq_human_feedback) * 100, 2), "%")
***Read human feedback***
Number of people who got human feedback: 359
Number of people in post-questionnaire who got human feedback: 270
Percentage who read human feedback always: 70.37 %
Percentage who read human feedback sometimes: 11.48 %
Percentage who did read human feedback overall: 81.85 %

Clicking on reading confirmation link in feedback messages

Here we check for how many interaction samples with human feedback people clicked on the reading confirmation link in the feedback message at least once.

In [6]:
print("***Percentage clicking on reading confirmation link***")

num_samples_clicked = 0
for i in range(len(df_1)):
    df_conf_i = df_conf[(df_conf["SESSION"] == df_1.at[i, "session"]) & (df_conf["rand_id"] == df_1.at[i, "rand_id"])]
    if len(df_conf_i) > 0:
        num_samples_clicked += 1
print("Number of interaction samples with confirmed reading:", num_samples_clicked)
print("Percentage of interaction samples with confirmed reading:", round(num_samples_clicked/len(df_1) * 100, 2), "%")
    
***Percentage clicking on reading confirmation link***
Number of interaction samples with confirmed reading: 380
Percentage of interaction samples with confirmed reading: 81.72 %
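
The loop above can also be written as a merge. This is a sketch on made-up toy data, assuming the same column names as above (`rand_id` and `session` in the samples, `rand_id` and `SESSION` in the confirmation data). It is meant to give the same kind of count as the loop, not to replace it.

```python
import pandas as pd

# Toy stand-ins for df_1 and df_conf; the values are made up for illustration.
toy_samples = pd.DataFrame({
    "rand_id": ["a", "a", "b", "c"],
    "session": [1, 2, 1, 3],
})
toy_conf = pd.DataFrame({
    "rand_id": ["a", "a", "c", "d"],
    "SESSION": [1, 1, 3, 2],  # person "a" clicked twice in session 1
})

# Reduce the clicks to unique (person, session) pairs so that repeated
# clicks do not duplicate samples in the merge.
conf_pairs = (
    toy_conf[["rand_id", "SESSION"]]
    .drop_duplicates()
    .rename(columns={"SESSION": "session"})
)

# A left merge keeps every sample; the indicator column records whether
# a matching click exists for that sample.
merged = toy_samples.merge(
    conf_pairs, on=["rand_id", "session"], how="left", indicator=True
)
num_clicked = int((merged["_merge"] == "both").sum())

print("Number of interaction samples with confirmed reading:", num_clicked)
print("Percentage of interaction samples with confirmed reading:",
      round(num_clicked / len(toy_samples) * 100, 2), "%")
```

Dropping duplicate clicks first matters: without it, a person who clicked twice in one session would appear twice after the merge and inflate the count.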

Dropout right after receiving human feedback vs. not receiving human feedback

Based on the RL-samples, we check for what percentage of people we have data from the next session, depending on whether they did or did not receive human feedback. The share of people without data from the next session is a proxy for dropout after receiving vs. not receiving human feedback. This proxy is noisy, because some people may have started the next session without providing at least the effort response, in which case we did not pay them for the next session and have no data on them from that session. Moreover, our dataframe with RL-samples only includes samples of people who provided at least a response for the first state variable (i.e., importance) in the next session.

In [7]:
print("***Percentage of people with data in next session***")
df_session_1 = df_session[df_session["session_num"] == 1]
df_session1_support = df_session_1[(df_session_1["response_type"] == "human_support_after_session") & (df_session_1["response_value"] == "1")]
df_session1_nosupport = df_session_1[(df_session_1["response_type"] == "human_support_after_session") & (df_session_1["response_value"] == "0")]
df_session_2 = df_session[df_session["session_num"] == 2]
ids_s1_support_s2 = df_session1_support[df_session1_support["rand_id"].isin(df_session_2["rand_id"].to_list())]["rand_id"].to_list()
ids_s1_nosupport_s2 = df_session1_nosupport[df_session1_nosupport["rand_id"].isin(df_session_2["rand_id"].to_list())]["rand_id"].to_list()
print("Percentage of people with human support in session 1 with data in session 2:", round(len(ids_s1_support_s2)/len(df_session1_support) * 100, 2), "%")
print("Percentage of people with no human support in session 1 with data in session 2:", round(len(ids_s1_nosupport_s2)/len(df_session1_nosupport) * 100, 2), "%")

df_session2_support = df_session_2[(df_session_2["response_type"] == "human_support_after_session") & (df_session_2["response_value"] == "1")]
df_session2_nosupport = df_session_2[(df_session_2["response_type"] == "human_support_after_session") & (df_session_2["response_value"] == "0")]
df_session12_nosupport = df_session2_nosupport[df_session2_nosupport["rand_id"].isin(df_session1_nosupport["rand_id"].to_list())]
df_session12_somesupport_a = df_session2_nosupport[df_session2_nosupport["rand_id"].isin(df_session1_support["rand_id"].to_list())]
df_session12_somesupport = pd.concat([df_session12_somesupport_a, df_session2_support])
df_session_3 = df_session[df_session["session_num"] == 3]
ids_s2_support_s3 = df_session2_support[df_session2_support["rand_id"].isin(df_session_3["rand_id"].to_list())]["rand_id"].to_list()
ids_s2_nosupport_s3 = df_session2_nosupport[df_session2_nosupport["rand_id"].isin(df_session_3["rand_id"].to_list())]["rand_id"].to_list()
ids_s1s2_nosupport_s3 = df_session12_nosupport[df_session12_nosupport["rand_id"].isin(df_session_3["rand_id"].to_list())]["rand_id"].to_list()
ids_s1s2_somesupport_s3 = df_session12_somesupport[df_session12_somesupport["rand_id"].isin(df_session_3["rand_id"].to_list())]["rand_id"].to_list()
print("\nPercentage of people with human support in session 2 with data in session 3:", round(len(ids_s2_support_s3)/len(df_session2_support) * 100, 2), "%")
print("Percentage of people with no human support in session 2 with data in session 3:", round(len(ids_s2_nosupport_s3)/len(df_session2_nosupport) * 100, 2), "%")
print("Percentage of people with no human support in session 1 + 2 with data in session 3:", round(len(ids_s1s2_nosupport_s3)/len(df_session12_nosupport) * 100, 2), "%")
print("Percentage of people with some human support in session 1 + 2 with data in session 3:", round(len(ids_s1s2_somesupport_s3)/len(df_session12_somesupport) * 100, 2), "%")

df_session3_support = df_session_3[(df_session_3["response_type"] == "human_support_after_session") & (df_session_3["response_value"] == "1")]
df_session3_nosupport = df_session_3[(df_session_3["response_type"] == "human_support_after_session") & (df_session_3["response_value"] == "0")]
df_session123_nosupport = df_session3_nosupport[df_session3_nosupport["rand_id"].isin(df_session12_nosupport["rand_id"].to_list())]
df_session123_somesupport_a = df_session3_nosupport[df_session3_nosupport["rand_id"].isin(df_session12_somesupport["rand_id"].to_list())]
df_session123_somesupport = pd.concat([df_session123_somesupport_a, df_session3_support])
df_session_4 = df_session[df_session["session_num"] == 4]
ids_s3_support_s4 = df_session3_support[df_session3_support["rand_id"].isin(df_session_4["rand_id"].to_list())]["rand_id"].to_list()
ids_s3_nosupport_s4 = df_session3_nosupport[df_session3_nosupport["rand_id"].isin(df_session_4["rand_id"].to_list())]["rand_id"].to_list()
ids_s1s2s3_nosupport_s4 = df_session123_nosupport[df_session123_nosupport["rand_id"].isin(df_session_4["rand_id"].to_list())]["rand_id"].to_list()
ids_s1s2s3_somesupport_s4 = df_session123_somesupport[df_session123_somesupport["rand_id"].isin(df_session_4["rand_id"].to_list())]["rand_id"].to_list()
print("\nPercentage of people with human support in session 3 with data in session 4:", round(len(ids_s3_support_s4)/len(df_session3_support) * 100, 2), "%")
print("Percentage of people with no human support in session 3 with data in session 4:", round(len(ids_s3_nosupport_s4)/len(df_session3_nosupport) * 100, 2), "%")
print("Percentage of people with no human support in session 1 + 2 + 3 with data in session 4:", round(len(ids_s1s2s3_nosupport_s4)/len(df_session123_nosupport) * 100, 2), "%")
print("Percentage of people with some human support in session 1 + 2 + 3 with data in session 4:", round(len(ids_s1s2s3_somesupport_s4)/len(df_session123_somesupport) * 100, 2), "%")

df_session4_support = df_session_4[(df_session_4["response_type"] == "human_support_after_session") & (df_session_4["response_value"] == "1")]
df_session4_nosupport = df_session_4[(df_session_4["response_type"] == "human_support_after_session") & (df_session_4["response_value"] == "0")]
df_session1234_nosupport = df_session4_nosupport[df_session4_nosupport["rand_id"].isin(df_session123_nosupport["rand_id"].to_list())]
df_session1234_somesupport_a = df_session4_nosupport[df_session4_nosupport["rand_id"].isin(df_session123_somesupport["rand_id"].to_list())]
df_session1234_somesupport = pd.concat([df_session1234_somesupport_a, df_session4_support])
df_session_5 = df_session[df_session["session_num"] == 5]
ids_s4_support_s5 = df_session4_support[df_session4_support["rand_id"].isin(df_session_5["rand_id"].to_list())]["rand_id"].to_list()
ids_s4_nosupport_s5 = df_session4_nosupport[df_session4_nosupport["rand_id"].isin(df_session_5["rand_id"].to_list())]["rand_id"].to_list()
ids_s1s2s3s4_nosupport_s5 = df_session1234_nosupport[df_session1234_nosupport["rand_id"].isin(df_session_5["rand_id"].to_list())]["rand_id"].to_list()
ids_s1s2s3s4_somesupport_s5 = df_session1234_somesupport[df_session1234_somesupport["rand_id"].isin(df_session_5["rand_id"].to_list())]["rand_id"].to_list()
print("\nPercentage of people with human support in session 4 with data in session 5:", round(len(ids_s4_support_s5)/len(df_session4_support) * 100, 2), "%")
print("Percentage of people with no human support in session 4 with data in session 5:", round(len(ids_s4_nosupport_s5)/len(df_session4_nosupport) * 100, 2), "%")
print("Percentage of people with no human support in session 1 + 2 + 3 + 4 with data in session 5:", round(len(ids_s1s2s3s4_nosupport_s5)/len(df_session1234_nosupport) * 100, 2), "%")
print("Percentage of people with some human support in session 1 + 2 + 3 + 4 with data in session 5:", round(len(ids_s1s2s3s4_somesupport_s5)/len(df_session1234_somesupport) * 100, 2), "%")
    
***Percentage of people with data in next session***
Percentage of people with human support in session 1 with data in session 2: 87.26 %
Percentage of people with no human support in session 1 with data in session 2: 86.09 %

Percentage of people with human support in session 2 with data in session 3: 87.69 %
Percentage of people with no human support in session 2 with data in session 3: 89.19 %
Percentage of people with no human support in session 1 + 2 with data in session 3: 90.16 %
Percentage of people with some human support in session 1 + 2 with data in session 3: 86.61 %

Percentage of people with human support in session 3 with data in session 4: 91.38 %
Percentage of people with no human support in session 3 with data in session 4: 91.84 %
Percentage of people with no human support in session 1 + 2 + 3 with data in session 4: 91.94 %
Percentage of people with some human support in session 1 + 2 + 3 with data in session 4: 91.55 %

Percentage of people with human support in session 4 with data in session 5: 91.67 %
Percentage of people with no human support in session 4 with data in session 5: 93.63 %
Percentage of people with no human support in session 1 + 2 + 3 + 4 with data in session 5: 92.76 %
Percentage of people with some human support in session 1 + 2 + 3 + 4 with data in session 5: 93.5 %
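
The per-session steps above repeat the same pattern, so they could be wrapped in a small helper. The sketch below is an assumption about how such a helper might look, not the code used to produce the output above. The function name `pct_with_next_session_data`, the toy data, and the `"effort"` response type label are hypothetical; the column names and the `human_support_after_session` response type are taken from the code above. It covers only the per-session comparison, not the cumulative "no support" and "some support" groups.

```python
import pandas as pd

def pct_with_next_session_data(df_session, session_num, support_value):
    """Percentage of people with the given support value in a session
    who have any data in the following session."""
    df_curr = df_session[df_session["session_num"] == session_num]
    df_next = df_session[df_session["session_num"] == session_num + 1]
    df_group = df_curr[
        (df_curr["response_type"] == "human_support_after_session")
        & (df_curr["response_value"] == support_value)
    ]
    if len(df_group) == 0:
        return float("nan")  # no one in this group, so no percentage
    has_next = df_group["rand_id"].isin(df_next["rand_id"])
    return round(has_next.mean() * 100, 2)

# Toy stand-in for df_session; the values are made up for illustration.
toy_session = pd.DataFrame({
    "rand_id": ["a", "b", "c", "d", "a", "c", "d"],
    "session_num": [1, 1, 1, 1, 2, 2, 2],
    "response_type": ["human_support_after_session"] * 4 + ["effort"] * 3,
    "response_value": ["1", "1", "0", "0", "5", "3", "4"],
})

pct_support = pct_with_next_session_data(toy_session, 1, "1")
pct_nosupport = pct_with_next_session_data(toy_session, 1, "0")
print("With human support in session 1 and data in session 2:", pct_support, "%")
print("Without human support in session 1 and data in session 2:", pct_nosupport, "%")
```

As in the code above, the support values are compared as strings (`"1"` and `"0"`), and the denominator is the number of rows in the group for the current session.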