Overview of Creating Metrics
metric_overview.Rmd
Calculating metrics with make_metric
tntpmetrics contains a simple function called make_metric to attach a new column/variable to your data with the value of the scored common metric. The new column will always have the prefix cm_ followed by the name of the metric. For example, the engagement metric is simply the sum of the four engagement survey items. To use make_metric, simply provide the data set you want to use and the metric you want calculated, making sure to put the latter in quotes. The result is your data with a new variable cm_engagement. The function also tells you how many rows of data did not have a construct created because at least one of the survey items was missing.
make_metric(data = ss_data_initial, metric = "engagement") %>%
  select(response_id, starts_with("eng_"), starts_with("cm_")) %>%
  head() %>%
  kable()
#> [1] "6 Row(s) in data were NOT used because missing at least one value needed to create common measure."
| response_id | eng_interest | eng_like | eng_losttrack | eng_moreabout | cm_engagement | cm_binary_engagement |
|---|---|---|---|---|---|---|
| 3 | 2 | 2 | 2 | 2 | 8 | TRUE |
| 39 | 2 | 3 | 2 | 2 | 9 | TRUE |
| 73 | 2 | 2 | 2 | 2 | 8 | TRUE |
| 84 | 2 | 2 | 2 | 2 | 8 | TRUE |
| 85 | 2 | 3 | 2 | 2 | 9 | TRUE |
| 94 | 2 | 2 | 2 | 2 | 8 | TRUE |
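For intuition, the engagement score above is just the row sum of the four items. A quick manual check (a sketch for illustration, not how the package computes it internally):

# cm_engagement should equal the sum of the four engagement items.
ss_data_initial %>%
  make_metric(metric = "engagement") %>%
  mutate(manual_sum = eng_interest + eng_like + eng_losttrack + eng_moreabout) %>%
  summarise(all_match = all(cm_engagement == manual_sum, na.rm = TRUE))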
Binary version of common metric
Note that in the above, two new variables were created: cm_engagement and cm_binary_engagement. For many common metrics, there is a cut-point on the metric scale at or above which scores take on a special meaning. For engagement, for example, scores of 8 or above imply that the student in that particular response was “engaged”. The variable cm_binary_engagement will be TRUE if the engagement score is at or above this cut-point and FALSE if not. For most common metrics, the guidance is to set goals around the actual metric score, not the binary version, as the binary version reduces the nuance of the data. However, we know teams are interested in these binary classifications, so cm_binary_ variables are always created when you run make_metric, as long as the metric has a defined cut-point. (The metric tntpcore does not have a defined cut-point.)
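As a sanity check, the binary flag should line up with the cut-point. A minimal sketch, assuming the engagement cut-point of 8 described above:

# cm_binary_engagement should equal cm_engagement at or above the cut-point.
scored <- make_metric(data = ss_data_initial, metric = "engagement")
all(scored$cm_binary_engagement == (scored$cm_engagement >= 8), na.rm = TRUE)
# expect TRUE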
Checking for common errors
make_metric automatically checks for the most common data issues.
Misspelled Variables
First, it requires that the data have the variable names spelled exactly as above. There is nothing special about these variable names, and the function had to choose some as the default. If your data has the variable names spelled differently, then you’ll have to change them before using make_metric. Otherwise, you’ll get an error:
ss_data_initial %>%
  rename(eng_interest_wrongspelling = eng_interest) %>%
  make_metric(metric = "engagement")
#> Error: Data set data is missing the following variable(s): eng_interest
#> Make sure spelling is correct.
Which variable names are needed for each metric can always be found by typing ?make_metric; they are also detailed in the articles for each metric.
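If your columns use different names, a rename before scoring is usually all that is needed. A minimal sketch, where the misspelled column is a made-up stand-in for whatever your data actually uses:

# Simulate a data set whose column name doesn't match the expected one.
ss_data_misspelled <- ss_data_initial %>%
  rename(eng_interest_wrongspelling = eng_interest)

# Map your name to the expected one, then score as usual.
ss_data_misspelled %>%
  rename(eng_interest = eng_interest_wrongspelling) %>%
  make_metric(metric = "engagement")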
Items on the wrong scale
Second, make_metric will check each item to ensure it’s on the expected scale. For student survey items, it expects the scale of 0-3 outlined above. If any data value is outside of this scale, you’ll get an error telling you which variables are out of scale and the proper scale on which they should be:
ss_data_initial %>%
  mutate(
    eng_interest = eng_interest + 1,
    eng_like = eng_like - 1
  ) %>%
  make_metric(metric = "engagement")
#> Error: In data the following variable(s) have a value out of scale:
#> eng_like, eng_interest
#> They should only take values of 0, 1, 2, 3
You will also get an error if your scales are not numeric:
ss_data_initial %>%
  mutate(
    eng_interest = case_when(
      eng_interest == 0 ~ "Not True",
      eng_interest == 1 ~ "A Little True",
      eng_interest == 2 ~ "Mostly True",
      eng_interest == 3 ~ "Very True"
    )
  ) %>%
  make_metric(metric = "engagement")
#> Error: In data the following variable(s) have a value out of scale:
#> eng_interest
#> They should only take values of 0, 1, 2, 3
The scales needed for each metric are detailed in the metric articles.
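If your items arrive as text, recode them to the numeric scale before scoring. A minimal sketch, where ss_data_text is a hypothetical version of the data with text responses and the labels are assumed to match your survey export:

# Map text labels back to the expected 0-3 scale, then score as usual.
# (ss_data_text and the labels are illustrative; adjust to your data.)
ss_data_text %>%
  mutate(
    eng_interest = case_when(
      eng_interest == "Not True"      ~ 0,
      eng_interest == "A Little True" ~ 1,
      eng_interest == "Mostly True"   ~ 2,
      eng_interest == "Very True"     ~ 3
    )
  ) %>%
  make_metric(metric = "engagement")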
(Optional) Censored scale use
There are times when items may be on the wrong scale in a way that is undetectable. For example, what if the student survey data was provided to you with each item on a scale of 1-4, but because students never responded “Very True”, the data only actually has values of 1-3? Values of 1-3 are all in scale for student surveys, so the preceding error will not occur. To account for this, make_metric automatically checks that each possible value on the scale is used and warns you if that is not the case, indicating the affected variables and which value(s) they did not use:
ss_data_initial %>%
  mutate(eng_interest = ifelse(eng_interest == 0, NA, eng_interest)) %>%
  make_metric(metric = "engagement") %>%
  head() %>%
  kable()
#> Warning: Not all the possible values for each variable were used in data The following variables did NOT use the following values:
#> eng_interest: 0
#> This is not an error, but you should confirm that all values are on the scale: 0, 1, 2, 3
#> [1] "199 Row(s) in data were NOT used because missing at least one value needed to create common measure."
| class_id | response_id | class_frl_cat | class_soc_cat | bel_fitin | bel_ideas | eng_interest | eng_like | eng_losttrack | eng_moreabout | rel_asmuch | rel_future | rel_outside | rel_rightnow | tch_interestedideas | tch_problem | cm_engagement | cm_binary_engagement |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| A | 3 | Under 50% FRL | 0-25% SOC | 3 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 8 | TRUE |
| A | 39 | Under 50% FRL | 0-25% SOC | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 2 | 9 | TRUE |
| A | 73 | Under 50% FRL | 0-25% SOC | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 8 | TRUE |
| A | 84 | Under 50% FRL | 0-25% SOC | 2 | 3 | 2 | 2 | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 2 | 8 | TRUE |
| A | 85 | Under 50% FRL | 0-25% SOC | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 9 | TRUE |
| A | 94 | Under 50% FRL | 0-25% SOC | 2 | 2 | 2 | 2 | 2 | 2 | 3 | 2 | 2 | 2 | 3 | 2 | 8 | TRUE |
Because this is not technically an error, you can turn off this default warning by setting scaleusewarning = F:
ss_data_initial %>%
  mutate(eng_interest = ifelse(eng_interest == 0, NA, eng_interest)) %>%
  make_metric(metric = "engagement", scaleusewarning = F) %>%
  head()
#> [1] "199 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> class_id response_id class_frl_cat class_soc_cat bel_fitin bel_ideas
#> 1 A 3 Under 50% FRL 0-25% SOC 3 3
#> 2 A 39 Under 50% FRL 0-25% SOC 2 2
#> 3 A 73 Under 50% FRL 0-25% SOC 2 2
#> 4 A 84 Under 50% FRL 0-25% SOC 2 3
#> 5 A 85 Under 50% FRL 0-25% SOC 2 2
#> 6 A 94 Under 50% FRL 0-25% SOC 2 2
#> eng_interest eng_like eng_losttrack eng_moreabout rel_asmuch rel_future
#> 1 2 2 2 2 2 2
#> 2 2 3 2 2 2 3
#> 3 2 2 2 2 2 2
#> 4 2 2 2 2 2 3
#> 5 2 3 2 2 2 2
#> 6 2 2 2 2 3 2
#> rel_outside rel_rightnow tch_interestedideas tch_problem cm_engagement
#> 1 2 2 2 2 8
#> 2 2 2 2 2 9
#> 3 2 2 2 3 8
#> 4 2 2 2 2 8
#> 5 2 2 2 2 9
#> 6 2 2 3 2 8
#> cm_binary_engagement
#> 1 TRUE
#> 2 TRUE
#> 3 TRUE
#> 4 TRUE
#> 5 TRUE
#> 6 TRUE
Goals Analysis
In most cases, making the common metric is just an intermediate step to scoring the metric for goals purposes. tntpmetrics has two functions that should make all the necessary goals calculations for you. In both cases, you do not need to create the metric ahead of time. Just provide the function your raw data and indicate the metric of interest and the type of analysis needed.
Calculating the average common metric score
If you want to calculate the average common metric score at a single point in time, you can use the function metric_mean. For example, to calculate the average Sense of Belonging score in the initial survey data, simply give it your data and indicate that it’s the “belonging” metric. (The by_class option will be discussed below.)
metric_mean(ss_data_initial, metric = "belonging", by_class = T)
#> [1] "4 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> $`Overall mean`
#> 1 emmean SE df lower.CL upper.CL
#> overall 4.9 0.473 25 3.93 5.87
#>
#> Degrees-of-freedom method: satterthwaite
#> Confidence level used: 0.95
#>
#> $`Number of data points`
#> [1] 996
#>
#> $`Number of included classes`
#> [1] 26
metric_mean estimates this mean using a multilevel model framework and takes advantage of the R package emmeans to print the output. The overall mean is displayed in the first element of the returned list under emmean. For a more robust result, you are also provided the appropriate standard error (SE) and the lower and upper bounds of the 95% confidence interval (lower.CL and upper.CL).
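For readers curious what this looks like under the hood, here is a minimal sketch; the exact internals belong to tntpmetrics, but a model of this form, assuming lme4 and emmeans, reproduces the shape of the output above:

# A random intercept per class accounts for unequal numbers of responses
# per class; emmeans then reports the model-based overall mean.
library(lme4)
library(emmeans)

scored <- make_metric(data = ss_data_initial, metric = "belonging")
fit <- lmer(cm_belonging ~ 1 + (1 | class_id), data = scored)
emmeans(fit, ~ 1, lmer.df = "satterthwaite")  # mean, SE, df, and 95% CI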
Using the binary version of the variable
The function metric_mean also works on the binary version of the construct. Simply set the option use_binary to TRUE:
metric_mean(ss_data_initial, metric = "engagement", use_binary = T, by_class = T)
#> [1] "6 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> $`Overall mean`
#> 1 emmean SE df lower.CL upper.CL
#> overall 0.238 0.0714 24.9 0.0906 0.385
#>
#> Degrees-of-freedom method: satterthwaite
#> Confidence level used: 0.95
#>
#> $`Number of data points`
#> [1] 994
#>
#> $`Number of included classes`
#> [1] 26
Because the outcome is a TRUE/FALSE binary, the mean will always be a proportion between 0 and 1. In the above example, the value 0.238 implies that 23.8% of responses in this data set were “engaged”.
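Note that this is a model-based proportion that accounts for clustering by class, so it can differ slightly from the raw share of responses. A quick sanity-check sketch:

# Raw, unweighted proportion of "engaged" responses, ignoring class structure.
scored <- make_metric(data = ss_data_initial, metric = "engagement")
mean(scored$cm_binary_engagement, na.rm = TRUE)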
Calculating the average common metric score for different groups
Many projects have equity-based goals that require looking at mean common metric scores for different types of classrooms. For example, the student survey data has a variable class_frl_cat indicating whether the response comes from a class with at least 50% of students receiving free or reduced-price lunch (FRL) or a class where fewer than 50% of students receive FRL. To look at the results for each group, simply include the column name as the equity_group:
metric_mean(ss_data_initial, metric = "belonging", equity_group = "class_frl_cat", by_class = T)
#> [1] "4 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> $`Group means`
#> equity_group emmean SE df lower.CL upper.CL
#> At least 50% FRL 2.87 0.351 24 2.15 3.60
#> Under 50% FRL 6.93 0.351 24 6.20 7.65
#>
#> Degrees-of-freedom method: satterthwaite
#> Confidence level used: 0.95
#>
#> $`Difference(s) between groups`
#> contrast estimate SE df t.ratio p.value
#> At least 50% FRL - Under 50% FRL -4.05 0.496 24 -8.164 <.0001
#>
#> Degrees-of-freedom method: satterthwaite
#>
#> $`Number of data points`
#> [1] 996
#>
#> $`Number of included classes`
#> [1] 26
Now the results show the mean for both types of classes and include another entry in the returned list, called “Difference(s) between groups”, that calculates the contrast, or the difference between these group means, and gives a standard error and p-value in case it’s of interest. Note that the contrast is always represented as the first group listed minus the second group listed. In this case, because the reported difference is negative, classes with under 50% FRL students tended to have a higher sense of belonging score.
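The group means and contrast line up with an emmeans workflow on the underlying multilevel model. A minimal sketch, again assuming lme4 and emmeans rather than the package’s exact internals:

# Group as a fixed effect, class as a random intercept (assumed internals).
library(lme4)
library(emmeans)

scored <- make_metric(data = ss_data_initial, metric = "belonging")
fit <- lmer(cm_belonging ~ class_frl_cat + (1 | class_id), data = scored)

grp <- emmeans(fit, ~ class_frl_cat, lmer.df = "satterthwaite")
grp         # the `Group means` element
pairs(grp)  # the `Difference(s) between groups` element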
Equity group comparisons work even when there are more than two group values, as with the variable class_soc_cat:
metric_mean(ss_data_initial, metric = "belonging", equity_group = "class_soc_cat", by_class = T)
#> [1] "4 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> $`Group means`
#> equity_group emmean SE df lower.CL upper.CL
#> 0-25% SOC 8.01 0.314 22.2 7.36 8.66
#> 26-50% SOC 6.18 0.312 21.8 5.54 6.83
#> 51-75% SOC 4.20 0.312 21.8 3.56 4.85
#> 76-100% SOC 2.12 0.271 22.0 1.56 2.69
#>
#> Degrees-of-freedom method: satterthwaite
#> Confidence level used: 0.95
#>
#> $`Difference(s) between groups`
#> contrast estimate SE df t.ratio p.value
#> (0-25% SOC) - (26-50% SOC) 1.82 0.443 22.0 4.120 0.0024
#> (0-25% SOC) - (51-75% SOC) 3.80 0.443 22.0 8.590 <.0001
#> (0-25% SOC) - (76-100% SOC) 5.88 0.415 22.1 14.183 <.0001
#> (26-50% SOC) - (51-75% SOC) 1.98 0.442 21.8 4.482 0.0010
#> (26-50% SOC) - (76-100% SOC) 4.06 0.414 21.9 9.814 <.0001
#> (51-75% SOC) - (76-100% SOC) 2.08 0.414 21.9 5.026 0.0003
#>
#> Degrees-of-freedom method: satterthwaite
#> P value adjustment: tukey method for comparing a family of 4 estimates
#>
#> $`Number of data points`
#> [1] 996
#>
#> $`Number of included classes`
#> [1] 26
Because it’s rare for projects to set equity goals for factors that have many different groups, metric_mean warns you if your equity_group variable has more than 5 categories; usually that means something is wrong with your variable.
The by_class option
Some metrics collect multiple data points from a single class. For example, student surveys will survey multiple students in the same class, and in many cases multiple times. Because different classes will almost surely have a different number of associated data points – some classes might get 10 surveys, while another might get 50 – we need an approach that doesn’t over- or under-represent some classes because of differences in sample sizes. Fortunately, the multilevel models undergirding the functions in tntpmetrics account for differences in sample sizes between classes automatically. But to make them work, you must have a variable in your data titled class_id representing each classroom’s unique identifier. You must also set by_class = T, as we did in the above examples.
If you do not set by_class = T and/or you do not have a class_id variable, metric_mean will not account for differences in sample sizes by class. In cases where you have multiple rows of data associated with the same class, not accounting for class IDs is statistically inappropriate, and the standard errors and confidence intervals will likely be too small. Because some projects will surely forget to collect a class ID, metric_mean will still give you the results even if you set by_class = F (or do not specify this option, as FALSE is the default), but will warn you about this statistical issue if you are using a metric that is expecting a class ID, like student surveys or assignments:
metric_mean(ss_data_initial, metric = "belonging")
#> Warning: To properly analyze the belonging metric, you should have a variable
#> called class_id in your data, and set by_class = TRUE. If you did not collect
#> a class ID your results might not be appropriate. Contact Cassie Coddington to
#> discuss.
#> [1] "4 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> $`Overall mean`
#> 1 emmean SE df lower.CL upper.CL
#> overall 4.85 0.0794 995 4.69 5.01
#>
#> Confidence level used: 0.95
#>
#> $`Number of data points`
#> [1] 996
You will not get this warning if you set by_class = F and you are analyzing a metric that is less likely to have multiple responses per class, like expectations or observations.
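The statistical difference between the two settings comes down to model structure. A minimal sketch of the two forms, under the same assumed internals as above:

library(lme4)
scored <- make_metric(data = ss_data_initial, metric = "belonging")

# by_class = TRUE: a random intercept per class keeps heavily sampled
# classes from dominating the estimate and widens the SE appropriately.
fit_clustered <- lmer(cm_belonging ~ 1 + (1 | class_id), data = scored)

# by_class = FALSE: a single-level model that treats every response as
# independent; with clustered data its SEs are typically too small.
fit_flat <- lm(cm_belonging ~ 1, data = scored)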
Calculating average growth over time
To examine how the average metric score has changed between two time points, use the function metric_growth. This function works the same as metric_mean but expects you to provide two data sets: one for the first time point (data1) and one for the later time point (data2). For example, to look at how engagement has changed over time, we can use:
metric_growth(
  data1 = ss_data_initial,
  data2 = ss_data_final,
  metric = "engagement",
  by_class = T
)
#> [1] "6 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> [1] "2 Row(s) in data were NOT used because missing at least one value needed to create common measure."
In this example, the mean engagement score was initially 4.93 and increased to 5.99 by the final data collection, a growth of 1.06 points.
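Growth estimates of this kind come from stacking the two data sets with a time indicator. A minimal sketch of the assumed model, not the package’s exact internals:

library(dplyr)
library(lme4)
library(emmeans)

# Stack the scored initial and final data with a time indicator.
scored <- bind_rows(
  make_metric(ss_data_initial, metric = "engagement") %>% mutate(time = "Initial"),
  make_metric(ss_data_final, metric = "engagement") %>% mutate(time = "Final")
)

# Time as a fixed effect, class as a random intercept.
fit <- lmer(cm_engagement ~ time + (1 | class_id), data = scored)

means <- emmeans(fit, ~ time, lmer.df = "satterthwaite")
means         # `Means at each timepoint`
pairs(means)  # `Differences between timepoints`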
Using the binary version of the variable
The function metric_growth also works on the binary version of the construct. Simply set the option use_binary to TRUE:
metric_growth(
  data1 = ss_data_initial,
  data2 = ss_data_final,
  metric = "engagement",
  use_binary = T,
  by_class = T
)
#> [1] "6 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> [1] "2 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> $`Means at each timepoint`
#> time emmean SE df lower.CL upper.CL
#> Final 0.326 0.0743 25.3 0.1731 0.479
#> Initial 0.234 0.0743 25.3 0.0809 0.387
#>
#> Degrees-of-freedom method: satterthwaite
#> Confidence level used: 0.95
#>
#> $`Differences between timepoints`
#> contrast estimate SE df t.ratio p.value
#> Final - Initial 0.0922 0.0115 1965 8.035 <.0001
#>
#> Degrees-of-freedom method: satterthwaite
#>
#> $`Number of data points`
#> [1] 1992
#>
#> $`Number of included classes`
#> [1] 26
As before, remember that the values represent proportions between 0 and 1. In the example above, 23% of responses in the initial data were “engaged”, compared to 33% in the final data. The difference (0.0922) represents about 9 percentage points.
Calculating differences in growth over time between equity groups
You can also examine how growth compared between different groups by specifying the equity group:
metric_growth(
  data1 = ss_data_initial,
  data2 = ss_data_final,
  metric = "engagement",
  equity_group = "class_frl_cat",
  by_class = T
)
#> [1] "6 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> [1] "2 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> $`Group means at each timepoint`
#> time = Initial:
#> equity_group emmean SE df lower.CL upper.CL
#> At least 50% FRL 0.00365 0.0714 24.6 -0.143 0.151
#> Under 50% FRL 0.46356 0.0714 24.6 0.316 0.611
#>
#> time = Final:
#> equity_group emmean SE df lower.CL upper.CL
#> At least 50% FRL 0.00645 0.0714 24.6 -0.141 0.154
#> Under 50% FRL 0.64546 0.0714 24.6 0.498 0.793
#>
#> Degrees-of-freedom method: satterthwaite
#> Confidence level used: 0.95
#>
#> $`Differences between groups at each timepoint`
#> time = Initial:
#> contrast estimate SE df t.ratio p.value
#> At least 50% FRL - Under 50% FRL -0.460 0.101 24.6 -4.557 0.0001
#>
#> time = Final:
#> contrast estimate SE df t.ratio p.value
#> At least 50% FRL - Under 50% FRL -0.639 0.101 24.6 -6.332 <.0001
#>
#> Degrees-of-freedom method: satterthwaite
#>
#> $`Change in differences between groups over time`
#> contrast = At least 50% FRL - Under 50% FRL:
#> contrast1 estimate SE df t.ratio p.value
#> Final - Initial -0.179 0.0226 1965 -7.923 <.0001
#>
#> Degrees-of-freedom method: satterthwaite
#>
#> $`Number of data points`
#> [1] 1992
#>
#> $`Number of included classes`
#> [1] 26
In this example, classes with at least 50% of students receiving FRL had an initial engagement score of 2.93 and grew to 4.02 at the final data collection. Classrooms with under 50% FRL students also grew, from 6.94 to 7.97. Adding the equity_group option directly shows how the difference between the two groups varied at each time point. In this case, classes with at least 50% FRL students had engagement scores that were 4.01 points lower than other classes initially, and 3.95 points lower at the final data collection. The difference of these differences (i.e., -3.95 - (-4.01) = 0.0659, before rounding) is shown in the list element “Change in differences between groups over time”. In this case, this difference is small and not significantly different from 0 (the p-value is 0.48), implying that the gap between these types of classrooms did not change meaningfully over time.
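This analysis corresponds to adding a time-by-group interaction to the kind of model sketched earlier. Again a minimal sketch under assumed internals, using lme4 and emmeans:

library(dplyr)
library(lme4)
library(emmeans)

# Stack the scored initial and final data with a time indicator.
scored <- bind_rows(
  make_metric(ss_data_initial, metric = "engagement") %>% mutate(time = "Initial"),
  make_metric(ss_data_final, metric = "engagement") %>% mutate(time = "Final")
)

# The time-by-group interaction lets the group gap differ across timepoints.
fit <- lmer(cm_engagement ~ time * class_frl_cat + (1 | class_id), data = scored)

emm <- emmeans(fit, ~ class_frl_cat * time, lmer.df = "satterthwaite")
contrast(emm, "pairwise", by = "time")   # group differences at each timepoint
contrast(emm, interaction = "pairwise")  # change in the difference over time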
You must have the same group definitions in both data sets, or you’ll get an error:
# Renaming FRL class variable so it doesn't match initial data
ss_data_final_error <- ss_data_final %>%
  mutate(
    class_frl_cat = ifelse(
      class_frl_cat == "At least 50% FRL",
      ">= 50% FRL",
      class_frl_cat
    )
  )

metric_growth(
  data1 = ss_data_initial,
  data2 = ss_data_final_error,
  metric = "engagement",
  equity_group = "class_frl_cat",
  by_class = T
)
#> [1] "6 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> [1] "2 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> Error: Some values of equity group are not present in BOTH data sets.