Overview of Creating Metrics
metric_overview.Rmd
Calculating metrics with make_metric
tntpmetrics contains a simple function called make_metric to attach a new column/variable to your data with the value of the scored common metric. The new column will always have the prefix cm_ followed by the name of the metric. For example, the engagement metric is simply the sum of the four engagement survey items. To use make_metric, simply provide the data set you want to use and the metric you want calculated, making sure to put the latter in quotes. The result is your data with a new variable cm_engagement. The function also tells you how many rows of data did not have a construct created because at least one of the survey items was missing.
make_metric(data = ss_data_initial, metric = "engagement") %>%
  select(response_id, starts_with("eng_"), starts_with("cm_")) %>%
  head() %>%
  kable()
#> [1] "6 Row(s) in data were NOT used because missing at least one value needed to create common measure."
| response_id | eng_interest | eng_like | eng_losttrack | eng_moreabout | cm_engagement | cm_binary_engagement |
|---|---|---|---|---|---|---|
| 3 | 2 | 2 | 2 | 2 | 8 | TRUE |
| 39 | 2 | 3 | 2 | 2 | 9 | TRUE |
| 73 | 2 | 2 | 2 | 2 | 8 | TRUE |
| 84 | 2 | 2 | 2 | 2 | 8 | TRUE |
| 85 | 2 | 3 | 2 | 2 | 9 | TRUE |
| 94 | 2 | 2 | 2 | 2 | 8 | TRUE |
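For intuition, the engagement score above is just the row sum of the four items. A quick manual check (a sketch for illustration, not how the package computes it internally):

# cm_engagement should equal the sum of the four engagement items.
ss_data_initial %>%
  make_metric(metric = "engagement") %>%
  mutate(manual_sum = eng_interest + eng_like + eng_losttrack + eng_moreabout) %>%
  summarise(all_match = all(cm_engagement == manual_sum, na.rm = TRUE))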
Binary version of common metric
Note that in the above, two new variables were created: cm_engagement and cm_binary_engagement. For many common metrics, there is a cut-point on the metric scale at or above which scores take on a special meaning. For engagement, for example, scores of 8 or above imply that the student in that particular response was “engaged”. The variable cm_binary_engagement will be TRUE if the engagement score is at or above this cut-point and FALSE if not. For most common metrics, the guidance is to set goals around the actual metric score, not the binary version, as the binary version reduces the nuance of the data. However, we know teams are interested in these binary classifications, so cm_binary_ variables are always created when you run make_metric, as long as the metric has a defined cut-point. (The metric tntpcore does not have a defined cut-point.)
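As a sanity check, the binary flag should line up with the cut-point. A minimal sketch, assuming the engagement cut-point of 8 described above:

# cm_binary_engagement should equal cm_engagement at or above the cut-point.
scored <- make_metric(data = ss_data_initial, metric = "engagement")
all(scored$cm_binary_engagement == (scored$cm_engagement >= 8), na.rm = TRUE)
# expect TRUE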
Checking for common errors
make_metric automatically checks for the most common data issues.
Misspelled Variables
First, it requires that the data have the variable names spelled exactly as above. There is nothing special about these variable names, and the function had to choose some as the default. If your data has the variable names spelled differently, then you’ll have to change them before using make_metric. Otherwise, you’ll get an error:
ss_data_initial %>%
  rename(eng_interest_wrongspelling = eng_interest) %>%
  make_metric(metric = "engagement")
#> Error: Data set data is missing the following variable(s): eng_interest
#> Make sure spelling is correct.
Which variable names are needed for each metric can always be found by typing ?make_metric; they are also detailed in the articles for each metric.
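If your columns use different names, a rename before scoring is usually all that is needed. A minimal sketch, where the misspelled column is a made-up stand-in for whatever your data actually uses:

# Simulate a data set whose column name doesn't match the expected one.
ss_data_misspelled <- ss_data_initial %>%
  rename(eng_interest_wrongspelling = eng_interest)

# Map your name to the expected one, then score as usual.
ss_data_misspelled %>%
  rename(eng_interest = eng_interest_wrongspelling) %>%
  make_metric(metric = "engagement")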
Items on the wrong scale
Second, make_metric will check each item to ensure it’s on the expected scale. For student survey items, it expects the scale of 0-3 outlined above. If any data value is outside of this scale, you’ll get an error telling you which variables are out of scale and the proper scale on which they should be:
ss_data_initial %>%
  mutate(
    eng_interest = eng_interest + 1,
    eng_like = eng_like - 1
  ) %>%
  make_metric(metric = "engagement")
#> Error: In data the following variable(s) have a value out of scale:
#> eng_like, eng_interest
#> They should only take values of 0, 1, 2, 3
You will also get an error if your scales are not numeric:
ss_data_initial %>%
  mutate(
    eng_interest = case_when(
      eng_interest == 0 ~ "Not True",
      eng_interest == 1 ~ "A Little True",
      eng_interest == 2 ~ "Mostly True",
      eng_interest == 3 ~ "Very True"
    )
  ) %>%
  make_metric(metric = "engagement")
#> Error: In data the following variable(s) have a value out of scale:
#> eng_interest
#> They should only take values of 0, 1, 2, 3
The scales needed for each metric are detailed in the metric articles.
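If your items arrive as text, recode them to the numeric scale before scoring. A minimal sketch, where ss_data_text is a hypothetical version of the data with text responses and the labels are assumed to match your survey export:

# Map text labels back to the expected 0-3 scale, then score as usual.
# (ss_data_text and the labels are illustrative; adjust to your data.)
ss_data_text %>%
  mutate(
    eng_interest = case_when(
      eng_interest == "Not True"      ~ 0,
      eng_interest == "A Little True" ~ 1,
      eng_interest == "Mostly True"   ~ 2,
      eng_interest == "Very True"     ~ 3
    )
  ) %>%
  make_metric(metric = "engagement")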
(Optional) Censored scale use
There are times when items may be on the wrong scale in a way that is undetectable. For example, what if the student survey data was provided to you with each item on a scale of 1-4, but because students never responded “Very True”, the data only actually has values of 1-3? Values of 1-3 are all in scale for student surveys, so the preceding error will not occur. To account for this, make_metric automatically checks that each possible value on the scale is used and warns you if that is not the case, indicating the affected variables and which value(s) they did not use:
ss_data_initial %>%
  mutate(eng_interest = ifelse(eng_interest == 0, NA, eng_interest)) %>%
  make_metric(metric = "engagement") %>%
  head() %>%
  kable()
#> Warning: Not all the possible values for each variable were used in data The following variables did NOT use the following values:
#> eng_interest: 0
#> This is not an error, but you should confirm that all values are on the scale: 0, 1, 2, 3
#> [1] "199 Row(s) in data were NOT used because missing at least one value needed to create common measure."
| class_id | response_id | class_frl_cat | class_soc_cat | bel_fitin | bel_ideas | eng_interest | eng_like | eng_losttrack | eng_moreabout | rel_asmuch | rel_future | rel_outside | rel_rightnow | tch_interestedideas | tch_problem | cm_engagement | cm_binary_engagement |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 3 | Under 50% FRL | 0-25% SOC | 3 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 8 | TRUE |
| A | 39 | Under 50% FRL | 0-25% SOC | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 2 | 9 | TRUE |
| A | 73 | Under 50% FRL | 0-25% SOC | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 8 | TRUE |
| A | 84 | Under 50% FRL | 0-25% SOC | 2 | 3 | 2 | 2 | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 2 | 8 | TRUE |
| A | 85 | Under 50% FRL | 0-25% SOC | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 9 | TRUE |
| A | 94 | Under 50% FRL | 0-25% SOC | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 3 | 2 | 8 | TRUE |
Because this is not technically an error, you can turn off this default warning by setting scaleusewarning = F:
ss_data_initial %>%
  mutate(eng_interest = ifelse(eng_interest == 0, NA, eng_interest)) %>%
  make_metric(metric = "engagement", scaleusewarning = F) %>%
  head()
#> [1] "199 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> class_id response_id class_frl_cat class_soc_cat bel_fitin bel_ideas
#> 1 A 3 Under 50% FRL 0-25% SOC 3 3
#> 2 A 39 Under 50% FRL 0-25% SOC 2 2
#> 3 A 73 Under 50% FRL 0-25% SOC 2 2
#> 4 A 84 Under 50% FRL 0-25% SOC 2 3
#> 5 A 85 Under 50% FRL 0-25% SOC 2 2
#> 6 A 94 Under 50% FRL 0-25% SOC 2 2
#> eng_interest eng_like eng_losttrack eng_moreabout rel_asmuch rel_future
#> 1 2 2 2 2 2 2
#> 2 2 3 2 2 2 3
#> 3 2 2 2 2 2 2
#> 4 2 2 2 2 2 3
#> 5 2 3 2 2 2 2
#> 6 2 2 2 2 3 2
#> rel_outside rel_rightnow tch_interestedideas tch_problem cm_engagement
#> 1 2 2 2 2 8
#> 2 2 2 2 2 9
#> 3 2 2 2 3 8
#> 4 2 2 2 2 8
#> 5 2 2 2 2 9
#> 6 2 2 3 2 8
#> cm_binary_engagement
#> 1 TRUE
#> 2 TRUE
#> 3 TRUE
#> 4 TRUE
#> 5 TRUE
#> 6 TRUE
Goals Analysis
In most cases, making the common metric is just an intermediate step to scoring the metric for goals purposes. tntpmetrics has two functions that should make all the necessary goals calculations for you. In both cases, you do not need to create the metric ahead of time. Just provide the function your raw data and indicate the metric of interest and the type of analysis needed.
Calculating the average common metric score
If you want to calculate the average common metric score at a single point in time, you can use the function metric_mean. For example, to calculate the average Sense of Belonging score in the initial survey data, simply give it your data and indicate that it’s the “belonging” metric. (The by_class option will be discussed below.)
metric_mean(ss_data_initial, metric = "belonging", by_class = T)
#> [1] "4 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> $`Overall mean`
#> 1 emmean SE df lower.CL upper.CL
#> overall 4.9 0.473 25 3.93 5.87
#>
#> Degrees-of-freedom method: satterthwaite
#> Confidence level used: 0.95
#>
#> $`Number of data points`
#> [1] 996
#>
#> $`Number of included classes`
#> [1] 26
metric_mean estimates this mean using a multilevel model framework and takes advantage of the R package emmeans to print the output. The overall mean is displayed in the first element of the returned list under emmean. For a more robust result, you are also provided the appropriate standard error (SE) and the lower and upper bounds of the 95% confidence interval (lower.CL and upper.CL).
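For readers curious what this looks like under the hood, here is a minimal sketch; the exact internals belong to tntpmetrics, but a model of this form, assuming lme4 and emmeans, reproduces the shape of the output above:

# A random intercept per class accounts for unequal numbers of responses
# per class; emmeans then reports the model-based overall mean.
library(lme4)
library(emmeans)

scored <- make_metric(data = ss_data_initial, metric = "belonging")
fit <- lmer(cm_belonging ~ 1 + (1 | class_id), data = scored)
emmeans(fit, ~ 1, lmer.df = "satterthwaite")  # mean, SE, df, and 95% CI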
Using the binary version of the variable
The function metric_mean also works on the binary version of the construct. Simply set the option use_binary to TRUE:
metric_mean(ss_data_initial, metric = "engagement", use_binary = T, by_class = T)
#> [1] "6 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> $`Overall mean`
#> 1 emmean SE df lower.CL upper.CL
#> overall 0.238 0.0714 24.9 0.0906 0.385
#>
#> Degrees-of-freedom method: satterthwaite
#> Confidence level used: 0.95
#>
#> $`Number of data points`
#> [1] 994
#>
#> $`Number of included classes`
#> [1] 26
Because the outcome is a TRUE/FALSE binary, the mean will always be a proportion between 0 and 1. In the above example, the value 0.238 implies that 23.8% of responses in this data set were “engaged”.
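Note that this is a model-based proportion that accounts for clustering by class, so it can differ slightly from the raw share of responses. A quick sanity-check sketch:

# Raw, unweighted proportion of "engaged" responses, ignoring class structure.
scored <- make_metric(data = ss_data_initial, metric = "engagement")
mean(scored$cm_binary_engagement, na.rm = TRUE)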
Calculating the average common metric score for different groups
Many projects have equity-based goals that require looking at mean common metric scores for different types of classrooms. For example, the student survey data has a variable class_frl_cat indicating whether the response comes from a class with at least 50% of students receiving free or reduced-price lunch (FRL) or a class where fewer than 50% of students receive FRL. To look at the results for each group, simply include the column name as the equity_group:
metric_mean(ss_data_initial, metric = "belonging", equity_group = "class_frl_cat", by_class = T)
#> [1] "4 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> $`Group means`
#> equity_group emmean SE df lower.CL upper.CL
#> At least 50% FRL 2.87 0.351 24 2.15 3.60
#> Under 50% FRL 6.93 0.351 24 6.20 7.65
#>
#> Degrees-of-freedom method: satterthwaite
#> Confidence level used: 0.95
#>
#> $`Difference(s) between groups`
#> contrast estimate SE df t.ratio p.value
#> At least 50% FRL - Under 50% FRL -4.05 0.496 24 -8.164 <.0001
#>
#> Degrees-of-freedom method: satterthwaite
#>
#> $`Number of data points`
#> [1] 996
#>
#> $`Number of included classes`
#> [1] 26
Now the results show the mean for both types of classes and include another entry in the returned list, called “Difference(s) between groups”, that calculates the contrast, or the difference between these group means, and gives a standard error and p-value in case it’s of interest. Note that the contrast is always represented as the first group listed minus the second group listed. In this case, because the reported difference is negative, classes with under 50% FRL students tended to have a higher sense of belonging score.
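The group means and contrast line up with an emmeans workflow on the underlying multilevel model. A minimal sketch, again assuming lme4 and emmeans rather than the package’s exact internals:

# Group as a fixed effect, class as a random intercept (assumed internals).
library(lme4)
library(emmeans)

scored <- make_metric(data = ss_data_initial, metric = "belonging")
fit <- lmer(cm_belonging ~ class_frl_cat + (1 | class_id), data = scored)

grp <- emmeans(fit, ~ class_frl_cat, lmer.df = "satterthwaite")
grp         # the `Group means` element
pairs(grp)  # the `Difference(s) between groups` element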
Equity group comparisons work even when there are more than two group values, as with the variable class_soc_cat:
metric_mean(ss_data_initial, metric = "belonging", equity_group = "class_soc_cat", by_class = T)
#> [1] "4 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> $`Group means`
#> equity_group emmean SE df lower.CL upper.CL
#> 0-25% SOC 8.01 0.314 22.2 7.36 8.66
#> 26-50% SOC 6.18 0.312 21.8 5.54 6.83
#> 51-75% SOC 4.20 0.312 21.8 3.56 4.85
#> 76-100% SOC 2.12 0.271 22.0 1.56 2.69
#>
#> Degrees-of-freedom method: satterthwaite
#> Confidence level used: 0.95
#>
#> $`Difference(s) between groups`
#> contrast estimate SE df t.ratio p.value
#> (0-25% SOC) - (26-50% SOC) 1.82 0.443 22.0 4.120 0.0024
#> (0-25% SOC) - (51-75% SOC) 3.80 0.443 22.0 8.590 <.0001
#> (0-25% SOC) - (76-100% SOC) 5.88 0.415 22.1 14.183 <.0001
#> (26-50% SOC) - (51-75% SOC) 1.98 0.442 21.8 4.482 0.0010
#> (26-50% SOC) - (76-100% SOC) 4.06 0.414 21.9 9.814 <.0001
#> (51-75% SOC) - (76-100% SOC) 2.08 0.414 21.9 5.026 0.0003
#>
#> Degrees-of-freedom method: satterthwaite
#> P value adjustment: tukey method for comparing a family of 4 estimates
#>
#> $`Number of data points`
#> [1] 996
#>
#> $`Number of included classes`
#> [1] 26
Because it’s rare for projects to set equity goals for factors that have many different groups, metric_mean warns you if your equity_group variable has more than 5 categories; usually that means something is wrong with your variable.
The by_class option
Some metrics collect multiple data points from a single class. For example, student surveys will survey multiple students in the same class, and in many cases multiple times. Because different classes will almost surely have a different number of associated data points – some classes might get 10 surveys, while another might get 50 – we need an approach that doesn’t over- or under-represent some classes because of differences in sample sizes. Fortunately, the multilevel models undergirding the functions in tntpmetrics account for differences in sample sizes between classes automatically. But to make them work, you must have a variable in your data titled class_id representing each classroom’s unique identifier. You must also set by_class = T, as we did in the above examples.
If you do not set by_class = T and/or you do not have a class_id variable, metric_mean will not account for differences in sample sizes by class. In cases where you have multiple rows of data associated with the same class, not accounting for class IDs is statistically inappropriate, and the standard errors and confidence intervals will likely be too small. Because some projects will surely forget to collect a class ID, metric_mean will still give you the results even if you set by_class = F (or do not specify this option, as FALSE is the default), but will warn you about this statistical issue if you are using a metric that is expecting a class ID, like student surveys or assignments:
metric_mean(ss_data_initial, metric = "belonging")
#> Warning: To properly analyze the belonging metric, you should have a variable
#> called class_id in your data, and set by_class = TRUE. If you did not collect
#> a class ID your results might not be appropriate. Contact Cassie Coddington to
#> discuss.
#> [1] "4 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> $`Overall mean`
#> 1 emmean SE df lower.CL upper.CL
#> overall 4.85 0.0794 995 4.69 5.01
#>
#> Confidence level used: 0.95
#>
#> $`Number of data points`
#> [1] 996
You will not get this warning if you set by_class = F and you are analyzing a metric that is less likely to have multiple responses per class, like expectations or observations.
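The statistical difference between the two settings comes down to model structure. A minimal sketch of the two forms, under the same assumed internals as above:

library(lme4)
scored <- make_metric(data = ss_data_initial, metric = "belonging")

# by_class = TRUE: a random intercept per class keeps heavily sampled
# classes from dominating the estimate and widens the SE appropriately.
fit_clustered <- lmer(cm_belonging ~ 1 + (1 | class_id), data = scored)

# by_class = FALSE: a single-level model that treats every response as
# independent; with clustered data its SEs are typically too small.
fit_flat <- lm(cm_belonging ~ 1, data = scored)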
Calculating average growth over time
To examine how the average metric score has changed between two time points, use the function metric_growth. This function works the same as metric_mean but expects you to provide two data sets: one for the first time point (data1) and one for the later time point (data2). For example, to look at how engagement has changed over time, we can use:
metric_growth(
  data1 = ss_data_initial,
  data2 = ss_data_final,
  metric = "engagement",
  by_class = T
)
#> [1] "6 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> [1] "2 Row(s) in data were NOT used because missing at least one value needed to create common measure."
In this example, the mean engagement score was initially 4.93 and increased to 5.99 by the final data collection, a growth of 1.06 points.
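Growth estimates of this kind come from stacking the two data sets with a time indicator. A minimal sketch of the assumed model, not the package’s exact internals:

library(dplyr)
library(lme4)
library(emmeans)

# Stack the scored initial and final data with a time indicator.
scored <- bind_rows(
  make_metric(ss_data_initial, metric = "engagement") %>% mutate(time = "Initial"),
  make_metric(ss_data_final, metric = "engagement") %>% mutate(time = "Final")
)

# Time as a fixed effect, class as a random intercept.
fit <- lmer(cm_engagement ~ time + (1 | class_id), data = scored)

means <- emmeans(fit, ~ time, lmer.df = "satterthwaite")
means         # `Means at each timepoint`
pairs(means)  # `Differences between timepoints`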
Using the binary version of the variable
The function metric_growth also works on the binary version of the construct. Simply set the option use_binary to TRUE:
metric_growth(
  data1 = ss_data_initial,
  data2 = ss_data_final,
  metric = "engagement",
  use_binary = T,
  by_class = T
)
#> [1] "6 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> [1] "2 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> $`Means at each timepoint`
#> time emmean SE df lower.CL upper.CL
#> Final 0.326 0.0743 25.3 0.1731 0.479
#> Initial 0.234 0.0743 25.3 0.0809 0.387
#>
#> Degrees-of-freedom method: satterthwaite
#> Confidence level used: 0.95
#>
#> $`Differences between timepoints`
#> contrast estimate SE df t.ratio p.value
#> Final - Initial 0.0922 0.0115 1965 8.035 <.0001
#>
#> Degrees-of-freedom method: satterthwaite
#>
#> $`Number of data points`
#> [1] 1992
#>
#> $`Number of included classes`
#> [1] 26
As before, remember that the values represent proportions between 0 and 1. In the example above, 23% of responses in the initial data were “engaged”, compared to 33% in the final data. The difference (0.0922) represents about 9 percentage points.
Calculating differences in growth over time between equity groups
You can also examine how growth compared between different groups by specifying the equity group:
metric_growth(
  data1 = ss_data_initial,
  data2 = ss_data_final,
  metric = "engagement",
  equity_group = "class_frl_cat",
  by_class = T
)
#> [1] "6 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> [1] "2 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> $`Group means at each timepoint`
#> time = Initial:
#> equity_group emmean SE df lower.CL upper.CL
#> At least 50% FRL 0.00365 0.0714 24.6 -0.143 0.151
#> Under 50% FRL 0.46356 0.0714 24.6 0.316 0.611
#>
#> time = Final:
#> equity_group emmean SE df lower.CL upper.CL
#> At least 50% FRL 0.00645 0.0714 24.6 -0.141 0.154
#> Under 50% FRL 0.64546 0.0714 24.6 0.498 0.793
#>
#> Degrees-of-freedom method: satterthwaite
#> Confidence level used: 0.95
#>
#> $`Differences between groups at each timepoint`
#> time = Initial:
#> contrast estimate SE df t.ratio p.value
#> At least 50% FRL - Under 50% FRL -0.460 0.101 24.6 -4.557 0.0001
#>
#> time = Final:
#> contrast estimate SE df t.ratio p.value
#> At least 50% FRL - Under 50% FRL -0.639 0.101 24.6 -6.332 <.0001
#>
#> Degrees-of-freedom method: satterthwaite
#>
#> $`Change in differences between groups over time`
#> contrast = At least 50% FRL - Under 50% FRL:
#> contrast1 estimate SE df t.ratio p.value
#> Final - Initial -0.179 0.0226 1965 -7.923 <.0001
#>
#> Degrees-of-freedom method: satterthwaite
#>
#> $`Number of data points`
#> [1] 1992
#>
#> $`Number of included classes`
#> [1] 26
In this example, classes with at least 50% of students receiving FRL had an initial engagement score of 2.93 and grew to 4.02 at the final data collection. Classrooms with under 50% FRL students also grew, from 6.94 to 7.97. Adding the equity_group option directly shows how the difference between the two groups varied at each time point. In this case, classes with at least 50% FRL students had engagement scores that were 4.01 points lower than other classes initially, and 3.95 points lower at the final data collection. The difference of these differences (i.e., -3.95 - (-4.01) = 0.0659, before rounding) is shown in the list element “Change in differences between groups over time”. In this case, this difference is small and not significantly different from 0 (the p-value is 0.48), implying that the gap between these types of classrooms did not change meaningfully over time.
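This analysis corresponds to adding a time-by-group interaction to the kind of model sketched earlier. Again a minimal sketch under assumed internals, using lme4 and emmeans:

library(dplyr)
library(lme4)
library(emmeans)

# Stack the scored initial and final data with a time indicator.
scored <- bind_rows(
  make_metric(ss_data_initial, metric = "engagement") %>% mutate(time = "Initial"),
  make_metric(ss_data_final, metric = "engagement") %>% mutate(time = "Final")
)

# The time-by-group interaction lets the group gap differ across timepoints.
fit <- lmer(cm_engagement ~ time * class_frl_cat + (1 | class_id), data = scored)

emm <- emmeans(fit, ~ class_frl_cat * time, lmer.df = "satterthwaite")
contrast(emm, "pairwise", by = "time")   # group differences at each timepoint
contrast(emm, interaction = "pairwise")  # change in the difference over time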
You must have the same group definitions in both data sets, or you’ll get an error:
# Renaming FRL class variable so it doesn't match initial data
ss_data_final_error <- ss_data_final %>%
  mutate(
    class_frl_cat = ifelse(
      class_frl_cat == "At least 50% FRL",
      ">= 50% FRL",
      class_frl_cat
    )
  )

metric_growth(
  data1 = ss_data_initial,
  data2 = ss_data_final_error,
  metric = "engagement",
  equity_group = "class_frl_cat",
  by_class = T
)
#> [1] "6 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> [1] "2 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> Error: Some values of equity group are not present in BOTH data sets.