
serpentine sorts data in a serpentine fashion (alternating between ascending and descending orders) for all variables specified. mixed_serpentine sorts the data with ascending or descending sorts for every variable specified except the last, which is serpentine sorted.

Usage

serpentine(data = NULL, ..., random_num = 1)

mixed_serpentine(data = NULL, ...)

Arguments

data

is the data.frame to be sorted

...

are the variables to serpentine sort, in the given order. In serpentine, the first variable listed will be sorted in ascending order, the second variable will alternate between ascending and descending order by the value of the first variable, and so on. In mixed_serpentine, it is assumed all variables listed should be sorted in ascending order except the last, which is serpentine sorted. The user can choose a descending sort for any variable except the last by using the desc() wrapper.

random_num

is a random number used to break ties randomly. This is most helpful when all variables on which the data is sorted are categorical, as it is more likely that several rows of data have identical values on each category. Default is 1 so that results are reproducible.
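The alternating order described under the arguments above can be sketched in base R for the two-variable case. This is only an illustration under stated assumptions, not the package's implementation: serpentine_sketch and its seed argument are hypothetical names, the second variable is assumed to be numeric, and using a seeded random key for ties is just one plausible reading of how random_num might be applied.

```r
# Hypothetical helper, not part of the package: a two-variable serpentine sort.
# Assumes `second` names a numeric column; `seed` is a stand-in for random_num.
serpentine_sketch <- function(df, first, second, seed = 1) {
  set.seed(seed)                      # note: this resets the global RNG state
  tie <- runif(nrow(df))              # random key, used only to order exact ties
  # Number the groups of the first variable 1, 2, 3, ... in ascending order.
  group_id <- match(df[[first]], sort(unique(df[[first]])))
  # Odd-numbered groups sort the second variable ascending, even ones descending.
  direction <- ifelse(group_id %% 2 == 1, 1, -1)
  df[order(group_id, direction * df[[second]], tie), , drop = FALSE]
}

out <- serpentine_sketch(mtcars, "cyl", "mpg")
out[, c("cyl", "mpg")]
```

In this sketch mpg rises within cyl = 4, falls within cyl = 6, and rises again within cyl = 8, so adjacent rows at each group boundary have similar mpg values.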

Value

A data.frame with the same dimensions as the original data, but with the rows in sorted order.

Details

Serpentine sorting is helpful in complex sampling designs with implicit stratification, as it reduces the variation in the stratified outcome for adjacent sampled units and thus reduces the overall sampling error. Serpentine sorts are commonly used in NCES surveys.
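To make the idea of implicit stratification concrete, the following base R sketch draws a systematic sample (every k-th row from a random start) from a sorted frame. It is an illustration only: a plain ascending order() call stands in for the serpentine sort, and the sample size n, the interval k, and the seed are assumptions chosen for the example.

```r
# Sort the frame first; a plain ascending sort is used here as a stand-in
# for serpentine(), so that the sketch runs with base R alone.
frame <- mtcars[order(mtcars$cyl, mtcars$mpg), ]

n <- 8                        # assumed sample size
k <- nrow(frame) / n          # sampling interval: 32 / 8 = 4
set.seed(1)                   # assumed seed, for a repeatable random start
start <- runif(1, 0, k)       # random start in [0, k)

# Take one row from each consecutive block of k rows in the sorted frame.
picks <- floor(start + k * (0:(n - 1))) + 1
sample_rows <- frame[picks, ]

# Because the frame is sorted by cyl, the sample is spread across cyl groups
# roughly in proportion to their size, without declaring explicit strata.
table(sample_rows$cyl)
```

Because the selected rows are evenly spaced through the sorted frame, the sort order itself acts as the stratification.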

Examples

# All variables except first are serpentine sorted
serpentine(data = mtcars, cyl, mpg)
#> # A tibble: 32 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21.4     4 121     109  4.11  2.78  18.6     1     1     4     2
#>  2  21.5     4 120.     97  3.7   2.46  20.0     1     0     3     1
#>  3  22.8     4 108      93  3.85  2.32  18.6     1     1     4     1
#>  4  22.8     4 141.     95  3.92  3.15  22.9     1     0     4     2
#>  5  24.4     4 147.     62  3.69  3.19  20       1     0     4     2
#>  6  26       4 120.     91  4.43  2.14  16.7     0     1     5     2
#>  7  27.3     4  79      66  4.08  1.94  18.9     1     1     4     1
#>  8  30.4     4  75.7    52  4.93  1.62  18.5     1     1     4     2
#>  9  30.4     4  95.1   113  3.77  1.51  16.9     1     1     5     2
#> 10  32.4     4  78.7    66  4.08  2.2   19.5     1     1     4     1
#> # ... with 22 more rows
serpentine(data = mtcars, cyl, vs, mpg)
#> # A tibble: 32 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  26       4 120.     91  4.43  2.14  16.7     0     1     5     2
#>  2  33.9     4  71.1    65  4.22  1.84  19.9     1     1     4     1
#>  3  32.4     4  78.7    66  4.08  2.2   19.5     1     1     4     1
#>  4  30.4     4  95.1   113  3.77  1.51  16.9     1     1     5     2
#>  5  30.4     4  75.7    52  4.93  1.62  18.5     1     1     4     2
#>  6  27.3     4  79      66  4.08  1.94  18.9     1     1     4     1
#>  7  24.4     4 147.     62  3.69  3.19  20       1     0     4     2
#>  8  22.8     4 108      93  3.85  2.32  18.6     1     1     4     1
#>  9  22.8     4 141.     95  3.92  3.15  22.9     1     0     4     2
#> 10  21.5     4 120.     97  3.7   2.46  20.0     1     0     3     1
#> # ... with 22 more rows

# Same sort variables, but a different resulting order because the random number changes
serpentine(data = mtcars, vs, am)
#> # A tibble: 32 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  13.3     8  350    245  3.73  3.84  15.4     0     0     3     4
#>  2  16.4     8  276.   180  3.07  4.07  17.4     0     0     3     3
#>  3  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  4  15.5     8  318    150  2.76  3.52  16.9     0     0     3     2
#>  5  19.2     8  400    175  3.08  3.84  17.0     0     0     3     2
#>  6  15.2     8  276.   180  3.07  3.78  18       0     0     3     3
#>  7  10.4     8  460    215  3     5.42  17.8     0     0     3     4
#>  8  15.2     8  304    150  3.15  3.44  17.3     0     0     3     2
#>  9  17.3     8  276.   180  3.07  3.73  17.6     0     0     3     3
#> 10  14.7     8  440    230  3.23  5.34  17.4     0     0     3     4
#> # ... with 22 more rows
serpentine(data = mtcars, vs, am, random_num = 5)
#> # A tibble: 32 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  18.7     8  360    175  3.15  3.44  17.0     0     0     3     2
#>  2  19.2     8  400    175  3.08  3.84  17.0     0     0     3     2
#>  3  10.4     8  460    215  3     5.42  17.8     0     0     3     4
#>  4  15.2     8  304    150  3.15  3.44  17.3     0     0     3     2
#>  5  13.3     8  350    245  3.73  3.84  15.4     0     0     3     4
#>  6  10.4     8  472    205  2.93  5.25  18.0     0     0     3     4
#>  7  17.3     8  276.   180  3.07  3.73  17.6     0     0     3     3
#>  8  14.7     8  440    230  3.23  5.34  17.4     0     0     3     4
#>  9  16.4     8  276.   180  3.07  4.07  17.4     0     0     3     3
#> 10  14.3     8  360    245  3.21  3.57  15.8     0     0     3     4
#> # ... with 22 more rows

# Changing the random number has minimal effect when a non-categorical variable is included
serpentine(data = mtcars, cyl, vs, mpg)
#> # A tibble: 32 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  26       4 120.     91  4.43  2.14  16.7     0     1     5     2
#>  2  33.9     4  71.1    65  4.22  1.84  19.9     1     1     4     1
#>  3  32.4     4  78.7    66  4.08  2.2   19.5     1     1     4     1
#>  4  30.4     4  95.1   113  3.77  1.51  16.9     1     1     5     2
#>  5  30.4     4  75.7    52  4.93  1.62  18.5     1     1     4     2
#>  6  27.3     4  79      66  4.08  1.94  18.9     1     1     4     1
#>  7  24.4     4 147.     62  3.69  3.19  20       1     0     4     2
#>  8  22.8     4 108      93  3.85  2.32  18.6     1     1     4     1
#>  9  22.8     4 141.     95  3.92  3.15  22.9     1     0     4     2
#> 10  21.5     4 120.     97  3.7   2.46  20.0     1     0     3     1
#> # ... with 22 more rows
serpentine(data = mtcars, cyl, vs, mpg, random_num = 23)
#> # A tibble: 32 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  26       4 120.     91  4.43  2.14  16.7     0     1     5     2
#>  2  33.9     4  71.1    65  4.22  1.84  19.9     1     1     4     1
#>  3  32.4     4  78.7    66  4.08  2.2   19.5     1     1     4     1
#>  4  30.4     4  75.7    52  4.93  1.62  18.5     1     1     4     2
#>  5  30.4     4  95.1   113  3.77  1.51  16.9     1     1     5     2
#>  6  27.3     4  79      66  4.08  1.94  18.9     1     1     4     1
#>  7  24.4     4 147.     62  3.69  3.19  20       1     0     4     2
#>  8  22.8     4 108      93  3.85  2.32  18.6     1     1     4     1
#>  9  22.8     4 141.     95  3.92  3.15  22.9     1     0     4     2
#> 10  21.5     4 120.     97  3.7   2.46  20.0     1     0     3     1
#> # ... with 22 more rows

# cyl and vs are sorted in ascending order, while mpg is serpentine sorted
mixed_serpentine(mtcars, cyl, vs, mpg)
#> # A tibble: 32 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  26       4 120.     91  4.43  2.14  16.7     0     1     5     2
#>  2  33.9     4  71.1    65  4.22  1.84  19.9     1     1     4     1
#>  3  32.4     4  78.7    66  4.08  2.2   19.5     1     1     4     1
#>  4  30.4     4  75.7    52  4.93  1.62  18.5     1     1     4     2
#>  5  30.4     4  95.1   113  3.77  1.51  16.9     1     1     5     2
#>  6  27.3     4  79      66  4.08  1.94  18.9     1     1     4     1
#>  7  24.4     4 147.     62  3.69  3.19  20       1     0     4     2
#>  8  22.8     4 108      93  3.85  2.32  18.6     1     1     4     1
#>  9  22.8     4 141.     95  3.92  3.15  22.9     1     0     4     2
#> 10  21.5     4 120.     97  3.7   2.46  20.0     1     0     3     1
#> # ... with 22 more rows

# cyl is ascending, vs is descending, and mpg is serpentine sorted
mixed_serpentine(mtcars, cyl, dplyr::desc(vs), mpg)
#> # A tibble: 32 x 11
#>      mpg   cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#>  1  21.4     4 121     109  4.11  2.78  18.6     1     1     4     2
#>  2  21.5     4 120.     97  3.7   2.46  20.0     1     0     3     1
#>  3  22.8     4 108      93  3.85  2.32  18.6     1     1     4     1
#>  4  22.8     4 141.     95  3.92  3.15  22.9     1     0     4     2
#>  5  24.4     4 147.     62  3.69  3.19  20       1     0     4     2
#>  6  27.3     4  79      66  4.08  1.94  18.9     1     1     4     1
#>  7  30.4     4  95.1   113  3.77  1.51  16.9     1     1     5     2
#>  8  30.4     4  75.7    52  4.93  1.62  18.5     1     1     4     2
#>  9  32.4     4  78.7    66  4.08  2.2   19.5     1     1     4     1
#> 10  33.9     4  71.1    65  4.22  1.84  19.9     1     1     4     1
#> # ... with 22 more rows