
TNTP uses common metrics to learn from the work project teams are doing. By using similar metrics across different projects, TNTP teams are better able to track their progress reliably and coordinate work across contexts. Common metrics also serve as the core of organization-wide goals. Though nearly all projects use common metrics somewhere in their work, collecting common data does not guarantee that each project will score or aggregate the metrics the same way. And even with scoring guidance available, spending valuable analyst time walking through the steps to calculate metrics or score teams’ goals is not ideal.

The tntpmetrics package includes three handy functions that start with raw, project-collected data and calculate individual common metric scores, summarize average common metric scores for an entire project, compare common metric scores between (typically student sub-) groups, and analyze changes in metric scores over time. Most of the work of these functions is checking for potential errors or inconsistencies in the data – such as data not being on the proper scale, or missing but needed variables. These functions attempt to anticipate the potential issues that can occur between collecting raw data and calculating simple means from it. This document shows you how to use these three functions.

Current available metrics

Currently, tntpmetrics has functions to calculate and work with the following common metrics:

  • Student surveys: Engagement, Relevance, and Belonging
  • Observation tools: IPG and TNTP Core
  • Teacher and leader surveys: Expectations (both the current items and the older items)
  • Assignments: Grade-Appropriate Assignments

Practice Data Sets: ss_data_initial, ss_data_final, and ipg_data

To demonstrate how to apply the common metric functions in tntpmetrics, we will use two sets of fake student survey data. The data contains 1,000 student survey responses from 26 classes at the beginning of a project (ss_data_initial.rda) and another 1,000 student survey responses from the same 26 classes at the end of the project (ss_data_final.rda).

This data automatically comes with the tntpmetrics package. Both data sets have the same variable/column names, which include a value for each survey question from the Engagement, Relevance, and Belonging constructs. Specifically, these metrics are based on the following survey items:

  • Engagement
    • eng_interest (“What we were learning was interesting.”)
    • eng_like (“I liked what we did in class.”)
    • eng_losttrack (“I was so into what we were learning I lost track of time.”)
    • eng_moreabout (“I thought more about what we were learning than anything else.”)
  • Relevance (“We spend time in class on things that…”)
    • rel_asmuch (“Will help me learn just as much as kids in other schools.”)
    • rel_future (“Are important to my future goals.”)
    • rel_outside (“I can use outside of school.”)
    • rel_rightnow (“Are important to my life right now.”)
  • Belonging
    • bel_ideas (“In this class, my ideas really count.”)
    • tch_interestedideas (“In this class, my teacher is interested in my ideas.”)
    • bel_fitin (“In this class, I feel like I fit in.”)
    • tch_problem (“I could talk to my teacher for this class if I had a problem.”)

These survey items take on values of 0 (for Not True), 1 (for A Little True), 2 (for Mostly True), or 3 (for Very True). The data also include a class ID and two categorical (character) demographic variables associated with each class.

head(ss_data_initial)
#>   class_id response_id class_frl_cat class_soc_cat bel_fitin bel_ideas
#> 1        A           3 Under 50% FRL     0-25% SOC         3         3
#> 2        A          39 Under 50% FRL     0-25% SOC         2         2
#> 3        A          73 Under 50% FRL     0-25% SOC         2         2
#> 4        A          84 Under 50% FRL     0-25% SOC         2         3
#> 5        A          85 Under 50% FRL     0-25% SOC         2         2
#> 6        A          94 Under 50% FRL     0-25% SOC         2         2
#>   eng_interest eng_like eng_losttrack eng_moreabout rel_asmuch rel_future
#> 1            2        2             2             2          2          2
#> 2            2        3             2             2          2          3
#> 3            2        2             2             2          2          2
#> 4            2        2             2             2          2          3
#> 5            2        3             2             2          2          2
#> 6            2        2             2             2          3          2
#>   rel_outside rel_rightnow tch_interestedideas tch_problem
#> 1           2            2                   2           2
#> 2           2            2                   2           2
#> 3           2            2                   2           3
#> 4           2            2                   2           2
#> 5           2            2                   2           2
#> 6           2            2                   3           2

Because using these functions with IPG data requires additional variables, we will also practice with the fake observation data ipg_data.rda. This data also comes automatically with the tntpmetrics package and contains 100 observations using the IPG. All four subjects are represented.

Calculating Common Metrics

tntpmetrics contains a simple function called make_metric that attaches a new column/variable to your data with the value of the scored common metric. The new column always has the prefix cm_ followed by the name of the metric. For example, the engagement metric is simply the sum of the four engagement survey items. To use make_metric, simply provide the data set you want to use and the metric you want calculated, making sure to put the latter in quotes. The result is your data with a new variable, cm_engagement. The function also tells you how many rows of data did not have a construct created because at least one of the survey items was missing.

make_metric(data = ss_data_initial, metric = "engagement") %>%
  select(response_id, starts_with("eng_"), starts_with("cm_")) %>%
  head()
#> [1] "6 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#>   response_id eng_interest eng_like eng_losttrack eng_moreabout cm_engagement
#> 1           3            2        2             2             2             8
#> 2          39            2        3             2             2             9
#> 3          73            2        2             2             2             8
#> 4          84            2        2             2             2             8
#> 5          85            2        3             2             2             9
#> 6          94            2        2             2             2             8
#>   cm_binary_engagement
#> 1                 TRUE
#> 2                 TRUE
#> 3                 TRUE
#> 4                 TRUE
#> 5                 TRUE
#> 6                 TRUE

Binary version of common metric

Note that in the above, two new variables were created: cm_engagement and cm_binary_engagement. For many common metrics, there is a cut-point on the metric scale at or above which scores take on a special meaning. For engagement, for example, scores of 8 or above imply that the student in that particular response was “engaged”. The variable cm_binary_engagement will be TRUE if the engagement score is at or above this cut-point and FALSE if not. For most common metrics, the guidance is to set goals around the actual metric score, not the binary version, as the binary version reduces the nuance of the data. However, we know teams are interested in these binary classifications, so cm_binary_ variables are always created when you run make_metric, as long as the metric has a defined cut-point. (The metric tntpcore does not have a defined cut-point.)
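
For a quick descriptive check, you can summarize the binary flag directly; the mean of a logical vector is the proportion of TRUEs. (For formal goal estimates, the metric_mean function described later handles the binary version more rigorously.) A small sketch:

# Proportion of responses classified as "engaged"
make_metric(data = ss_data_initial, metric = "engagement") %>%
  summarize(prop_engaged = mean(cm_binary_engagement, na.rm = TRUE))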

Checking for common errors

make_metric automatically checks for the most common data issues.

Misspelled Variables

First, it requires that the data have the variable names spelled exactly as above. There is nothing special about these variable names, and the function had to choose some as the default. If your data has the variable names spelled differently, then you’ll have to change them before using make_metric. Otherwise, you’ll get an error:

ss_data_initial %>%
  rename(eng_interest_wrongspelling = eng_interest) %>%
  make_metric(metric = "engagement")
#> Error: Data set data is missing the following variable(s): eng_interest 
#>  Make sure spelling is correct.

The variable names needed for each metric can always be found by typing ?make_metric; they are also detailed later in this vignette.
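
If only the names differ, a rename before scoring is usually enough. A small sketch (your_data and interest_item are hypothetical names; substitute your own data frame and column names):

# Rename a misspelled column to the name make_metric expects, then score
your_data %>%
  rename(eng_interest = interest_item) %>%
  make_metric(metric = "engagement")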

Items on the wrong scale

Second, make_metric will check each item to ensure it’s on the expected scale. For student survey items, it expects the scale of 0-3 outlined above. If any data value is outside of this scale, you’ll get an error telling you which variables are out of scale and the proper scale on which they should be:

ss_data_initial %>%
  mutate(
    eng_interest = eng_interest + 1,
    eng_like = eng_like - 1
  ) %>%
  make_metric(metric = "engagement")
#> Error: In data the following variable(s) have a value out of scale: 
#>  eng_like, eng_interest 
#>  They should only take values of 0, 1, 2, 3

You will also get an error if your scales are not numeric:

ss_data_initial %>%
  mutate(
    eng_interest = case_when(
        eng_like == 0 ~ "Not True",
        eng_like == 1 ~ "A Little True",
        eng_like == 2 ~ "Mostly True",
        eng_like == 3 ~ "Very True"
    )
  ) %>%
  make_metric(metric = "engagement")
#> Error: In data the following variable(s) have a value out of scale: 
#>  eng_interest 
#>  They should only take values of 0, 1, 2, 3
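
If your raw data arrived with the text labels instead of numbers, recoding back to the expected 0-3 numeric scale resolves this error. A sketch, assuming the labels match the wording above exactly (your_data is a hypothetical data frame):

# Convert text responses back to the 0-3 numeric scale before scoring
your_data %>%
  mutate(
    eng_interest = case_when(
      eng_interest == "Not True"      ~ 0,
      eng_interest == "A Little True" ~ 1,
      eng_interest == "Mostly True"   ~ 2,
      eng_interest == "Very True"     ~ 3
    )
  ) %>%
  make_metric(metric = "engagement")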

The scales needed for each metric are detailed later in this vignette.

(Optional) Censored scale use

There are times when items may be on the wrong scale in a way that is undetectable. For example, suppose the student survey data were provided to you with each item on a scale of 1-4, but because no student ever responded “Very True”, the data only contain values of 1-3. Values of 1-3 are all in scale for student surveys, so the preceding error will not occur. To account for this, make_metric automatically checks that each possible value on the scale is used and warns you if that is not the case, indicating the affected variables and which value(s) they did not use:

ss_data_initial %>%
  mutate(eng_interest = ifelse(eng_interest == 0, NA, eng_interest)) %>%
  make_metric(metric = "engagement") %>%
  head()
#> Warning: Not all the possible values for each variable were used in data The following variables did NOT use the following values: 
#>  eng_interest: 0 
#>  This is not an error, but you should confirm that all values are on the scale: 0, 1, 2, 3
#> [1] "199 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#>   class_id response_id class_frl_cat class_soc_cat bel_fitin bel_ideas
#> 1        A           3 Under 50% FRL     0-25% SOC         3         3
#> 2        A          39 Under 50% FRL     0-25% SOC         2         2
#> 3        A          73 Under 50% FRL     0-25% SOC         2         2
#> 4        A          84 Under 50% FRL     0-25% SOC         2         3
#> 5        A          85 Under 50% FRL     0-25% SOC         2         2
#> 6        A          94 Under 50% FRL     0-25% SOC         2         2
#>   eng_interest eng_like eng_losttrack eng_moreabout rel_asmuch rel_future
#> 1            2        2             2             2          2          2
#> 2            2        3             2             2          2          3
#> 3            2        2             2             2          2          2
#> 4            2        2             2             2          2          3
#> 5            2        3             2             2          2          2
#> 6            2        2             2             2          3          2
#>   rel_outside rel_rightnow tch_interestedideas tch_problem cm_engagement
#> 1           2            2                   2           2             8
#> 2           2            2                   2           2             9
#> 3           2            2                   2           3             8
#> 4           2            2                   2           2             8
#> 5           2            2                   2           2             9
#> 6           2            2                   3           2             8
#>   cm_binary_engagement
#> 1                 TRUE
#> 2                 TRUE
#> 3                 TRUE
#> 4                 TRUE
#> 5                 TRUE
#> 6                 TRUE

Because this is not technically an error, you can turn off this default warning by setting scaleusewarning = F:

ss_data_initial %>%
  mutate(eng_interest = ifelse(eng_interest == 0, NA, eng_interest)) %>%
  make_metric(metric = "engagement", scaleusewarning = F) %>%
  head()
#> [1] "199 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#>   class_id response_id class_frl_cat class_soc_cat bel_fitin bel_ideas
#> 1        A           3 Under 50% FRL     0-25% SOC         3         3
#> 2        A          39 Under 50% FRL     0-25% SOC         2         2
#> 3        A          73 Under 50% FRL     0-25% SOC         2         2
#> 4        A          84 Under 50% FRL     0-25% SOC         2         3
#> 5        A          85 Under 50% FRL     0-25% SOC         2         2
#> 6        A          94 Under 50% FRL     0-25% SOC         2         2
#>   eng_interest eng_like eng_losttrack eng_moreabout rel_asmuch rel_future
#> 1            2        2             2             2          2          2
#> 2            2        3             2             2          2          3
#> 3            2        2             2             2          2          2
#> 4            2        2             2             2          2          3
#> 5            2        3             2             2          2          2
#> 6            2        2             2             2          3          2
#>   rel_outside rel_rightnow tch_interestedideas tch_problem cm_engagement
#> 1           2            2                   2           2             8
#> 2           2            2                   2           2             9
#> 3           2            2                   2           3             8
#> 4           2            2                   2           2             8
#> 5           2            2                   2           2             9
#> 6           2            2                   3           2             8
#>   cm_binary_engagement
#> 1                 TRUE
#> 2                 TRUE
#> 3                 TRUE
#> 4                 TRUE
#> 5                 TRUE
#> 6                 TRUE

Required column names and scales for each common metric

Below are the required column names and associated scales for each metric. See The Goals Guidance Hub for more details. Note that these are the names of the columns needed in your data. It doesn’t mean that every row must have a value on each of these variables; it can be okay for some of these variables to be NA for specific rows. For example, K-5 Literacy observations on the IPG require either all of the Core Actions (ca1_a, ca1_b, ca1_c, ca2_overall, ca3_overall) or rfs_overall (or both). If an observation has all the Core Actions it still needs a variable called rfs_overall, but the value can be NA (a brief sketch of this appears after the list below).

  • Engagement
    • Metric name to use in package: metric = engagement
    • Items:
      • eng_interest (“What we were learning was interesting.”)
      • eng_like (“I liked what we did in class.”)
      • eng_losttrack (“I was so into what we were learning I lost track of time.”)
      • eng_moreabout (“I thought more about what we were learning than anything else.”)
    • Scale: 0 (Not True), 1 (A Little True), 2 (Mostly True), or 3 (Very True).
  • Relevance
    • Metric name to use in package: metric = relevance
    • Items: (“We spend time in class on things that…”)
      • rel_asmuch (“Will help me learn just as much as kids in other schools.”)
      • rel_future (“Are important to my future goals.”)
      • rel_outside (“I can use outside of school.”)
      • rel_rightnow (“Are important to my life right now.”)
    • Scale: 0 (Not True), 1 (A Little True), 2 (Mostly True), or 3 (Very True).
  • Belonging
    • Metric name to use in package: metric = belonging
    • Items:
      • bel_ideas (“In this class, my ideas really count.”)
      • tch_interestedideas (“In this class, my teacher is interested in my ideas.”)
      • bel_fitin (“In this class, I feel like I fit in.”)
      • tch_problem (“I could talk to my teacher for this class if I had a problem.”)
    • Scale: 0 (Not True), 1 (A Little True), 2 (Mostly True), or 3 (Very True).
  • Teacher or Leader Expectations - CURRENT Version
    • Metric name to use in package: metric = expectations
    • Items:
      • exp_fairtomaster (“It’s fair to expect students in this class to master these standards by the end of the year.”)
      • exp_oneyearenough (“One year is enough time for students in this class to master these standards.”)
      • exp_allstudents (“All students in my class can master the grade-level standards by the end of the year.”)
      • exp_appropriate (“The standards are appropriate for the students in this class.”)
    • Scale: 0 (Strongly Disagree), 1 (Disagree), 2 (Somewhat Disagree), 3 (Somewhat Agree), 4 (Agree), and 5 (Strongly Agree).
  • Teacher or Leader Expectations - OLD Version
    • Metric name to use in package: metric = expectations_old
    • Items:
      • exp_allstudents (“All students in my class can master the grade-level standards by the end of the year.”)
      • exp_toochallenging (“The standards are too challenging for students in my class.”)
      • exp_oneyear (“One year is enough time for students in my class to master the standards.”)
      • exp_different (“Students in my class need something different than what is outlined in the standards.”)
      • exp_overburden (“Students in my class are overburdened by the demands of the standards.”)
      • exp_began (“Because of where students began the year, I spend nearly all of my time on standards from earlier grades.”)
  • TNTP Core
    • Metric name to use in package: metric = tntpcore
    • Items:
      • ec (“Essential Content”)
      • ao (“Academic Ownership”)
      • dl (“Demonstration of Learning”)
      • cl (“Culture of Learning”)
    • Scale: 1 (Ineffective), 2 (Minimally Effective), 3 (Developing), 4 (Proficient), 5 (Skillful)
  • IPG
    • Metric name to use in package: metric = ipg
    • Items: All observations need
      • ca1_a (“Core Action 1A”)
      • ca1_b (“Core Action 1B”)
      • ca1_c (“Core Action 1C”)
      • ca2_overall (“Core Action 2 Overall”)
      • ca3_overall (“Core Action 3 Overall”)
      • col
    • Items for K-5 Literacy observations also required are
      • rfs_overall (“Reading Foundation Skills Overall”)
    • Items for science observations also required are
      • ca1_d (“Core Action 1D”)
      • ca1_e (“Core Action 1E”)
      • ca1_f (“Core Action 1F”)
      • science_filter (“Text”, “Inquiry and Scientific Practice”, “Both”, or “Neither”.)
    • Other Needed Items are:
      • grade_level (“Numeric grade-level”)
      • form (“Math”, “Literacy”, “Science”, or “Social Studies”).
    • Scale: ca1_a, ca1_b, ca1_c, ca1_d, ca1_e, and ca1_f are 0 (No) and 1 (Yes). All other items are 1 (Not Yet), 2 (Somewhat), 3 (Mostly), and 4 (Yes)
    • Note: RFS Overall is only required if observation data contains K-5 Literacy observations. Core Action 1 D - F are only required if data contains science observations.
  • Grade-Appropriate Assignments
    • Metric name to use in package: metric = assignments
    • Items: content, practice, relevance
    • Scale: 0 (No Opportunity), 1 (Minimal Opportunity), 2 (Sufficient Opportunity)
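
As a brief illustration of the note above the list about columns that must exist even when their values can be NA: if your K-5 Literacy observations recorded all the Core Actions but never collected an overall RFS score, you can add the column as all-NA before scoring. This is only a sketch; your_ipg_data is a hypothetical data frame that otherwise has all the required IPG columns.

# rfs_overall must exist as a column for IPG scoring, but its values can be NA
# when all Core Actions are rated (your_ipg_data is hypothetical)
your_ipg_data %>%
  mutate(rfs_overall = NA_real_) %>%
  make_metric(metric = "ipg")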

A note about the expectations metric

The items used to measure expectations shifted from a collection of six, mostly reverse-coded items to four positively worded items. Both expectations metrics are available: the current 4-item metric is known as “expectations” and the older 6-item metric is known as “expectations_old”. That is, to use the older 6-item expectations metric in this package set metric = expectations_old, and to use the current 4-item expectations metric set metric = expectations.
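
In practice the only difference is the metric argument; the required exp_* columns differ between the two versions as listed above. A quick sketch with hypothetical data frames teacher_survey_current and teacher_survey_old:

# Current 4-item version
make_metric(data = teacher_survey_current, metric = "expectations")
# Older 6-item version
make_metric(data = teacher_survey_old, metric = "expectations_old")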

A note about the IPG

The IPG has the most complicated scoring approach of any of the common metrics. That is because it was originally intended as a diagnostic and development (rather than evaluation) tool, has different components based on the subject matter, has indicators on different scales, and can often have layers of skip logic in the online form used to capture the data. Nevertheless, make_metric works just as easily on the IPG as it does on other metrics, but users should be aware of the following:

  • The function expects the Core Action 1 indicators to be on a 0-1 scale, but the other Core Action scores (and RFS Overall and Culture of Learning) to be on a 1-4 scale. This is to make the function work more easily on data coming out of Academic Diagnostic forms, which tend to use these scales. make_metric will automatically place everything on the proper scale.
  • make_metric will not account for observations that should be excluded. For example, some Literacy observations were unrateable because they focused on narrative writing. make_metric does not expect any of these types of skip-logic filters that often accompany the online Academic Diagnostic form, so it’s up to the analyst to first exclude observations that should not be included based on the business rules. Similarly, because of the online skip logic, there are occasions where Core Actions 2 and 3 should be set to the lowest possible value because the observer was skipped past the questions. If these values are left as NA, make_metric will return NAs for the overall common metric score. The analyst must apply the appropriate business rules before using make_metric (see the sketch after this list).
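
Below is a hypothetical sketch of applying such a business rule before scoring. The ipg_raw data frame and the skipped_ca23 flag are made-up names standing in for whatever your form export actually provides; this is not an official scoring rule, just an illustration of recoding skipped ratings before calling make_metric.

# Illustrative business rule: when the observer was skipped past Core Actions
# 2 and 3, treat the missing ratings as the lowest value (1) before scoring
ipg_scored <- ipg_raw %>%
  mutate(
    ca2_overall = ifelse(skipped_ca23 & is.na(ca2_overall), 1, ca2_overall),
    ca3_overall = ifelse(skipped_ca23 & is.na(ca3_overall), 1, ca3_overall)
  ) %>%
  make_metric(metric = "ipg")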

Because of the somewhat increased complexity, there are additional IPG examples at the end of this vignette.

Goals Analysis

In most cases, making the common metric is just an intermediate step to scoring the metric for goals purposes. tntpmetrics has two functions that should make all the necessary goals calculations for you. In both cases, you do not need to create the metric ahead of time: just provide the function your raw data and indicate the metric of interest and the type of analysis needed.

Calculating the average common metric score

If you want to calculate the average common metric score at a single point in time, you can use the function metric_mean. For example, to calculate the average Sense of Belonging score in the initial survey data, simply give it your data and indicate that it’s the “belonging” metric. (The by_class option is discussed below.)

metric_mean(ss_data_initial, metric = "belonging", by_class = T)
#> [1] "4 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> $`Overall mean`
#>  1       emmean    SE df lower.CL upper.CL
#>  overall    4.9 0.473 25     3.93     5.87
#> 
#> Degrees-of-freedom method: satterthwaite 
#> Confidence level used: 0.95 
#> 
#> $`Number of data points`
#> [1] 996
#> 
#> $`Number of included classes`
#> [1] 26

metric_mean estimates this mean using a multilevel model framework and takes advantage of the R package emmeans to print the output. The overall mean is displayed in the first element of the returned list under emmean. For a more robust result, you are also provided the appropriate standard error (SE) and the lower and upper bounds of the 95% confidence interval (lower.CL and upper.CL).
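
The printed output above suggests the result is a named list, so you can also store it and pull out individual elements programmatically. A small sketch, assuming standard list extraction works on the returned object:

# Store the result and extract pieces by name
belonging_mean <- metric_mean(ss_data_initial, metric = "belonging", by_class = TRUE)
belonging_mean[["Overall mean"]]           # emmeans summary of the overall mean
belonging_mean[["Number of data points"]]  # rows that contributed to the estimate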

Using the binary version of the variable

The function metric_mean also works on the binary version of the construct. Simply set the option use_binary to TRUE:

metric_mean(ss_data_initial, metric = "engagement", use_binary = T, by_class = T)
#> [1] "6 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> $`Overall mean`
#>  1       emmean     SE   df lower.CL upper.CL
#>  overall  0.238 0.0714 24.9   0.0906    0.385
#> 
#> Degrees-of-freedom method: satterthwaite 
#> Confidence level used: 0.95 
#> 
#> $`Number of data points`
#> [1] 994
#> 
#> $`Number of included classes`
#> [1] 26

Because the outcome is now a TRUE/FALSE binary, the mean will always be a proportion between 0 and 1. In the above example, the value 0.238 implies that 23.8% of responses in this data set were “engaged”.

Calculating the average common metric score for different groups

Many projects have equity-based goals that require looking at mean common metric scores for different types of classrooms. For example, the student survey data has a variable class_frl_cat indicating whether the response comes from a class with at least 50% of students receiving free or reduced price lunch or a class where fewer than 50% of students receive FRL. To look at the results for each group, simply include the column name as the equity_group:

metric_mean(ss_data_initial, metric = "belonging", equity_group = "class_frl_cat", by_class = T)
#> [1] "4 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> $`Group means`
#>  equity_group     emmean    SE df lower.CL upper.CL
#>  At least 50% FRL   2.87 0.351 24     2.15     3.60
#>  Under 50% FRL      6.93 0.351 24     6.20     7.65
#> 
#> Degrees-of-freedom method: satterthwaite 
#> Confidence level used: 0.95 
#> 
#> $`Difference(s) between groups`
#>  contrast                         estimate    SE df t.ratio p.value
#>  At least 50% FRL - Under 50% FRL    -4.05 0.496 24  -8.164  <.0001
#> 
#> Degrees-of-freedom method: satterthwaite 
#> 
#> $`Number of data points`
#> [1] 996
#> 
#> $`Number of included classes`
#> [1] 26

Now, the results show the mean for both types of classes and include another entry in the returned list called “Difference(s) between groups” that calculates the contrast, or the difference between these group means, and gives a standard error and p-value in case they are of interest. Note that the contrast is always represented as the first group listed minus the second group listed. In this case, because the reported difference is negative, classes with under 50% FRL students tended to have a higher sense of belonging score.

Equity group comparisons work even when there are more than two group values, like in the variable class_soc_cat:

metric_mean(ss_data_initial, metric = "belonging", equity_group = "class_soc_cat", by_class = T)
#> [1] "4 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> $`Group means`
#>  equity_group emmean    SE   df lower.CL upper.CL
#>  0-25% SOC      8.01 0.314 22.2     7.36     8.66
#>  26-50% SOC     6.18 0.312 21.8     5.54     6.83
#>  51-75% SOC     4.20 0.312 21.8     3.56     4.85
#>  76-100% SOC    2.12 0.271 22.0     1.56     2.69
#> 
#> Degrees-of-freedom method: satterthwaite 
#> Confidence level used: 0.95 
#> 
#> $`Difference(s) between groups`
#>  contrast                     estimate    SE   df t.ratio p.value
#>  (0-25% SOC) - (26-50% SOC)       1.82 0.443 22.0   4.120  0.0024
#>  (0-25% SOC) - (51-75% SOC)       3.80 0.443 22.0   8.590  <.0001
#>  (0-25% SOC) - (76-100% SOC)      5.88 0.415 22.1  14.183  <.0001
#>  (26-50% SOC) - (51-75% SOC)      1.98 0.442 21.8   4.482  0.0010
#>  (26-50% SOC) - (76-100% SOC)     4.06 0.414 21.9   9.814  <.0001
#>  (51-75% SOC) - (76-100% SOC)     2.08 0.414 21.9   5.026  0.0003
#> 
#> Degrees-of-freedom method: satterthwaite 
#> P value adjustment: tukey method for comparing a family of 4 estimates 
#> 
#> $`Number of data points`
#> [1] 996
#> 
#> $`Number of included classes`
#> [1] 26

Because it’s rare for projects to set equity goals for factors that have many different groups, metric_mean warns you if your equity_group variable has more than 5 categories; usually that means something is wrong with your variable.

The by_class option

Some metrics collect multiple data points from a single class. For example, student surveys will survey multiple students in the same class, and in many cases multiple times. Because different classes will almost surely have a different number of associated data points – some classes might get 10 surveys, while another might get 50 – we need an approach that doesn’t over- or under-represent some classes because of differences in sample sizes. Fortunately, the multilevel models undergirding the functions in tntpmetrics account for differences in sample sizes between classes automatically. But to make them work, you must have a variable in your data titled class_id representing each classroom’s unique identifier. You must also set by_class = T as we did in the above examples.

If you do not set by_class = T and/or you do not have a class_id variable, metric_mean will not account for differences in sample sizes by class. In cases where you have multiple rows of data associated with the same class, not accounting for class IDs is statistically inappropriate and the standard errors and confidence intervals will likely be too small. Because some projects will surely forget to collect a class ID, metric_mean will still give you results even if you set by_class = F (or do not specify this option, as FALSE is the default), but it will warn you about this statistical issue if you are using a metric that is expecting a class ID, like student surveys or assignments:

metric_mean(ss_data_initial, metric = "belonging")
#> Warning: To properly analyze the belonging metric, you should have a variable
#> called class_id in your data, and set by_class = TRUE. If you did not collect
#> a class ID your results might not be appropriate. Contact Cassie Coddington to
#> discuss.
#> [1] "4 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> $`Overall mean`
#>  1       emmean     SE  df lower.CL upper.CL
#>  overall   4.85 0.0794 995     4.69     5.01
#> 
#> Confidence level used: 0.95 
#> 
#> $`Number of data points`
#> [1] 996

You will not get this warning if you set by_class = F and you are analyzing a metric that is less likely to have multiple responses per class, like expectations or observations.
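
For instance, a metric typically collected once per respondent, like teacher expectations, can reasonably be analyzed without class IDs. A sketch, where teacher_survey_current is a hypothetical data frame containing the exp_* items listed earlier:

# No class_id needed here; by_class is left at its default of FALSE
metric_mean(teacher_survey_current, metric = "expectations")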

Calculating average growth over time

To examine how the average metric score has changed between two time points, use the function metric_growth. This function works the same as metric_mean but expects you to provide two data sets: one for the first time point (data1) and one for the later time point (data2). For example, to look at how engagement has changed over time, we can use:

metric_growth(
  data1 = ss_data_initial, 
  data2 = ss_data_final, 
  metric = "engagement", 
  by_class = T
)
#> [1] "6 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> [1] "2 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> $`Means at each timepoint`
#>  time    emmean     SE   df lower.CL upper.CL
#>  Final    0.326 0.0743 25.3   0.1731    0.479
#>  Initial  0.234 0.0743 25.3   0.0809    0.387
#> 
#> Degrees-of-freedom method: satterthwaite 
#> Confidence level used: 0.95 
#> 
#> $`Differences between timepoints`
#>  contrast        estimate     SE   df t.ratio p.value
#>  Final - Initial   0.0922 0.0115 1965   8.035  <.0001
#> 
#> Degrees-of-freedom method: satterthwaite 
#> 
#> $`Number of data points`
#> [1] 1992
#> 
#> $`Number of included classes`
#> [1] 26

In this example, the mean engagement score initially was 4.93, but increased to 5.99 by the final data collection. This difference was a growth of 1.06 points.

Using the binary version of the variable

The function metric_growth also works on the binary version of the construct. Simply set the option use_binary to TRUE:

metric_growth(
  data1 = ss_data_initial, 
  data2 = ss_data_final, 
  metric = "engagement",
  use_binary = T,
  by_class = T
)
#> [1] "6 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> [1] "2 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> $`Means at each timepoint`
#>  time    emmean     SE   df lower.CL upper.CL
#>  Final    0.326 0.0743 25.3   0.1731    0.479
#>  Initial  0.234 0.0743 25.3   0.0809    0.387
#> 
#> Degrees-of-freedom method: satterthwaite 
#> Confidence level used: 0.95 
#> 
#> $`Differences between timepoints`
#>  contrast        estimate     SE   df t.ratio p.value
#>  Final - Initial   0.0922 0.0115 1965   8.035  <.0001
#> 
#> Degrees-of-freedom method: satterthwaite 
#> 
#> $`Number of data points`
#> [1] 1992
#> 
#> $`Number of included classes`
#> [1] 26

As before, remember that the values represent proportions between 0 and 1. In the example above, 23% of responses in the initial data were engaged and 33% were engaged in the final data. The difference (0.0922) represents about 9 percentage points.

Calculating differences in growth over time between equity groups

You can also examine how growth compared between different groups by specifying the equity group:

metric_growth(
  data1 = ss_data_initial, 
  data2 = ss_data_final, 
  metric = "engagement",
  equity_group = "class_frl_cat",
  by_class = T
)
#> [1] "6 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> [1] "2 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> $`Group means at each timepoint`
#> time = Initial:
#>  equity_group      emmean     SE   df lower.CL upper.CL
#>  At least 50% FRL 0.00365 0.0714 24.6   -0.143    0.151
#>  Under 50% FRL    0.46356 0.0714 24.6    0.316    0.611
#> 
#> time = Final:
#>  equity_group      emmean     SE   df lower.CL upper.CL
#>  At least 50% FRL 0.00645 0.0714 24.6   -0.141    0.154
#>  Under 50% FRL    0.64546 0.0714 24.6    0.498    0.793
#> 
#> Degrees-of-freedom method: satterthwaite 
#> Confidence level used: 0.95 
#> 
#> $`Differences between groups at each timepoint`
#> time = Initial:
#>  contrast                         estimate    SE   df t.ratio p.value
#>  At least 50% FRL - Under 50% FRL   -0.460 0.101 24.6  -4.557  0.0001
#> 
#> time = Final:
#>  contrast                         estimate    SE   df t.ratio p.value
#>  At least 50% FRL - Under 50% FRL   -0.639 0.101 24.6  -6.332  <.0001
#> 
#> Degrees-of-freedom method: satterthwaite 
#> 
#> $`Change in differences between groups over time`
#> contrast = At least 50% FRL - Under 50% FRL:
#>  contrast1       estimate     SE   df t.ratio p.value
#>  Final - Initial   -0.179 0.0226 1965  -7.923  <.0001
#> 
#> Degrees-of-freedom method: satterthwaite 
#> 
#> $`Number of data points`
#> [1] 1992
#> 
#> $`Number of included classes`
#> [1] 26

In this example, classes with at least 50% of students receiving FRL had an initial engagement score of 2.93, and then grew to 4.02 at the final data collection. Classrooms with under 50% FRL students also grew, from 6.94 to 7.97. Adding this equity_group option will directly show how the difference between the two groups varied at each time point. In this case, classes with at least 50% FRL students had engagement scores that were 4.01 points lower than other classes initially, and 3.95 points lower at the final data collection. The difference of these differences (i.e., -3.95 - -4.01 = 0.0659) is shown in the list element “Change in differences between groups over time”. In this case, this difference is small and not significantly different from 0 (the p-value is 0.48), implying that the gap between these types of classrooms did not change meaningfully over time.

You must have the same group definitions in both data sets, or you’ll get an error:

# Renaming FRL class variable so it doesn't match initial data
ss_data_final_error <- ss_data_final %>%
  mutate(
    class_frl_cat = ifelse(
      class_frl_cat == "At least 50% FRL",
      ">= 50% FRL",
      class_frl_cat
    )
  )
metric_growth(
  data1 = ss_data_initial, 
  data2 = ss_data_final_error, 
  metric = "engagement",
  equity_group = "class_frl_cat",
  by_class = T
)
#> [1] "6 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> [1] "2 Row(s) in data were NOT used because missing at least one value needed to create common measure."
#> Error: Some values of equity group are not present in BOTH data sets.

IPG Examples

Analyzing IPG data requires additional variables beyond those used directly in scoring the construct. This is because the IPG calculates the construct differently based on the subject and grade-level of the observation. Though some of these calculations are slightly more involved, you can use the functions in tntpmetrics in the same way as the examples above. You just need to make sure that each observation has all the necessary variables.

IPG data should be long and tidy

tntpmetrics expects your IPG data to be long and tidy: each row is a separate observation, and each column represents just one thing. That is, you should have only one column titled ca2_overall (representing the overall score on Core Action 2); you should not have a separate variable for each subject (i.e., you should not have math2_overall, rlc2_overall, science2_overall, etc.), as that would not be “tidy” – each of those columns would contain information on two things: the subject and the Core Action 2 score. Instead, you must have a separate column/variable called form in your data indicating the subject on which the IPG was applied. Depending on the project, this might require you to compile all of your IPG data from separate forms in WordPress/Formidable before using the tntpmetrics package.
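
For example, a minimal reshaping sketch, assuming a hypothetical raw export ipg_raw with made-up subject-specific columns math2_overall and rlc2_overall (only two subjects shown for brevity):

# Combine hypothetical subject-specific Core Action 2 columns into one tidy
# ca2_overall column and record the subject in a form column
ipg_tidy <- ipg_raw %>%
  mutate(
    form = if_else(!is.na(math2_overall), "Math", "Literacy"),
    ca2_overall = coalesce(math2_overall, rlc2_overall)
  ) %>%
  select(-math2_overall, -rlc2_overall)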

Additional needed variables

When working with the IPG, all observations will need a variable called form which must be “Literacy”, “Math”, “Science”, or “Social Studies”. All observations will also need a variable called grade_level, which is numeric and can range from -1 to 12, where -1 is Pre-K and 0 is Kindergarten.

All literacy observations that take place in a Pre-K to 5th grade classroom must also have a variable called rfs_overall which represents the overall score on the reading foundation skills.

Science observations must have the additional Core Action 1 domains ca1_d, ca1_e, and ca1_f. How these additional domains get scored depends on the value of a gatekeeper or filter question from the form: “Did the lesson focus on a text or inquiry and scientific practice (experimentation, application, modeling, analysis, etc.)? Note that ‘texts’ in this context also means other scientifically appropriate mediums of communication, like grade-appropriate videos, data sets, models, etc.” Thus, all science observations must also have a variable called science_filter, which must have values of “Text”, “Inquiry and Scientific Practice”, “Both”, or “Neither”.
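
If you want to confirm these variables take only allowed values before scoring, a quick check like the following can help. This is just a sketch, not part of the package; make_metric performs equivalent checks and will error on violations.

# Pre-checks on the variables that drive IPG scoring
stopifnot(all(ipg_data$form %in% c("Math", "Literacy", "Science", "Social Studies")))
stopifnot(all(ipg_data$grade_level %in% -1:12))
stopifnot(all(ipg_data$science_filter[ipg_data$form == "Science"] %in%
                c("Text", "Inquiry and Scientific Practice", "Both", "Neither")))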

Calculating observation scores

If you have all of the needed variables, you can create the overall observation score just as we did in the above examples:

ipg_data %>%
  make_metric(metric = "ipg") %>%
  select(ends_with("overall"), col, cm_ipg) %>%
  head()
#>   ca2_overall ca3_overall rfs_overall ca1_overall col cm_ipg
#> 1           4           1          NA           1   2   1.25
#> 2           1           1          NA           1   4   1.00
#> 3           4           1          NA           1   4   1.75
#> 4           1           4          NA           0   3   1.25
#> 5           3           3          NA           1   3   1.75
#> 6           1           3          NA           2   1   1.00

In the above, notice that Core Actions 2 and 3 and Culture of Learning were rescaled in order to calculate the IPG construct (cm_ipg) but are returned to you on their original 1-4 scale. Additionally, the overall Core Action 1 score is created as a new variable (ca1_overall), as only the subdomains are included in the original data.

But note what happens if you do not have the variable called form:

ipg_data %>%
  select(-form) %>%
  make_metric(metric = "ipg")
#> Error: Data is missing the following variable(s): form 
#>  Make sure spelling is correct.

… Or if you don’t have a variable called grade_level:

ipg_data %>%
  select(-grade_level) %>%
  make_metric(metric = "ipg")
#> Error: Data is missing the following variable(s): grade_level 
#>  Make sure spelling is correct.

… Or if either of these takes unexpected values:

ipg_data %>%
  mutate(form = ifelse(row_number() == 1, "Blarg", form)) %>% 
  make_metric(metric = "ipg")
#> Error: form variable has values other than Math, Literacy, Science, or Social Studies 
#>  Make sure subject options are spelled correctly with correct capitalization.
ipg_data %>%
  mutate(grade_level = ifelse(row_number() == 1, 15, grade_level)) %>% 
  make_metric(metric = "ipg")
#> Error: grade_level has values other than -1, 0, 1, 2, ...12. 
#>  Make sure grade_level variable is an integer between -1 and 13.

In fact, NAs are not allowed for grade_level or form because they determine how the construct is scored. You will get an error if any of your data rows have NAs in these columns:

ipg_data %>%
  mutate(form = ifelse(row_number() == 1, NA, form)) %>% 
  make_metric(metric = "ipg")
#> Error in data_scale_check_ipg(data): All observations must have a value for form. NAs are not allowed.

Missing ratings

tntpmetrics will attempt to create an overall IPG score for every observation in your data. However, it can only do so if all the needed domain ratings are included. In cases where an observation is missing a required domain rating – for example, an observation is missing an overall score on Core Action 2 or Culture of Learning – it will return an NA for the overall IPG construct value and tell you how many observations were missing key information.

In the example below, we can see this warning if we purposefully drop some key domain ratings in the data before applying the make_metric function:

ipg_data %>%
  mutate(
    col = ifelse(row_number() == 1, NA, col),
    ca3_overall = ifelse(row_number() == 2, NA, ca3_overall),
    rfs_overall = ifelse(row_number() == 14, NA, rfs_overall),
    ca2_overall = ifelse(row_number() == 14, NA, ca2_overall)
  ) %>% 
  make_metric(metric = "ipg") %>%
  select(observation_number, ends_with("overall"), col, cm_ipg) %>%
  slice(1:5, 14)
#> Warning: 1 K-5 Literacy observation(s) were missing an overall RFS score and
#> some Core Actions. These observations need an overall RFS score or all three
#> Core Actions, or both in order to have an overall IPG score calculated.
#> Warning: 1 Observation(s) have a missing score on at least one of the Core
#> Actions. Observations must be rated on all Core Actions in order to have an
#> overall IPG score calculated. In many cases, missing scores should be set to the
#> lowest possible value or, if it's ineligible, the entire observation should be
#> removed before scoring. Check the scoring guide for more details.
#> Warning: 1 Observation(s) have a missing score on Culture of Learning.
#> Observations must be rated on Culture of Learning in order to have an overall
#> IPG score calculated.
#>   observation_number ca2_overall ca3_overall rfs_overall ca1_overall col cm_ipg
#> 1                  1           4           1          NA           1  NA     NA
#> 2                  2           1          NA          NA           1   4     NA
#> 3                  3           4           1          NA           1   4   1.75
#> 4                  4           1           4          NA           0   3   1.25
#> 5                  5           3           3          NA           1   3   1.75
#> 6                 14          NA           2          NA           2   4     NA

Note that these messages are warnings, and the code will still run. But the affected observations now have NAs for their value on the overall construct (cm_ipg).

Science observation scores

The (at the time of this writing) new science IPG form has more Core Action 1 subdomain ratings to account for the variety of lessons science observers might see. tntpmetrics accounts for these scoring approaches, but you must include a variable called science_filter in your data to tell the function which Core Action 1 values to expect. The first example below scores science observations that include this variable; the second shows the error you will get without it:

ipg_data %>%
  filter(form == "Science") %>%
  make_metric(metric = "ipg") %>%
  select(science_filter, starts_with("ca"), col, cm_ipg) %>%
  head()
#>   science_filter ca1_a ca1_b ca1_c ca1_d ca1_e ca1_f ca2_overall ca3_overall
#> 1           Both     0     1     0     0     0     1           4           1
#> 2        Neither     0     0    NA    NA    NA    NA           1           4
#> 3           Text     1     0     1     1     0    NA           3           3
#> 4           Text     0     1     1     0     0    NA           1           1
#> 5        Neither     0     0    NA    NA    NA    NA           2           3
#> 6           Both     1     0     0     1     0     0           1           2
#>   ca1_overall col cm_ipg
#> 1           1   2   1.25
#> 2           0   3   1.25
#> 3           1   3   1.75
#> 4           1   2   0.50
#> 5           0   2   1.00
#> 6           1   1   0.50

ipg_data %>%
  filter(form == "Science") %>%
  select(-science_filter) %>%
  make_metric(metric = "ipg")
#> Error: Data contains science observation(s) but is missing the following variables: science_filter 
#>  Make sure they are spelled correctly.

IPG goals

If you have all of the needed variables with proper, allowable values, then the functions in tntpmetrics to calculate means (overall or by group) or changes over time work identically to the student survey examples above. For instance:

ipg_data %>%
  metric_mean(metric = "ipg")
#> $`Overall mean`
#>  1       emmean     SE df lower.CL upper.CL
#>  overall   1.44 0.0502 99     1.34     1.54
#> 
#> Confidence level used: 0.95 
#> 
#> $`Number of data points`
#> [1] 100

Questions?

Contact Adam Maier or Cassie Coddington with questions.