Perform serpentine sorts on multiple variables.
serpentine sorts data in a serpentine fashion (alternating between ascending and descending orders) for all variables specified. mixed_serpentine sorts the data with ascending or descending sorts for every variable specified except the last, which is serpentine sorted.
Arguments
- data
is the data.frame to be sorted.
- ...
are the variables to serpentine sort, in the given order. In serpentine, the first variable listed will be sorted in ascending order, the second variable will alternate between ascending and descending order by the value of the first variable, and so on. In mixed_serpentine, it is assumed all variables listed should be sorted in ascending order except the last, which is serpentine sorted. The user can choose a descending sort for any variable except the last by using the desc() wrapper.
- random_num
is a random number used to break ties randomly. This is most helpful when all variables on which the data is sorted are categorical, as it is then more likely that several rows of data have identical values on every sort variable. Default is 1 so that results are reproducible.
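The alternating order described for serpentine can be illustrated with a small base R sketch for the two-variable case. This is not the package's implementation: serpentine_two is a hypothetical helper, it assumes the second variable is numeric, and it leaves ties in their original row order instead of breaking them with random_num.

```r
# Hypothetical helper, not part of the package: serpentine order on two
# columns using base R only. Assumes data[[second]] is numeric.
serpentine_two <- function(data, first, second) {
  # Rank the distinct values of the first variable: 1, 2, 3, ...
  group_rank <- match(data[[first]], sort(unique(data[[first]])))
  # Sort ascending within odd-ranked groups, descending within even-ranked ones
  direction <- ifelse(group_rank %% 2 == 1, 1, -1)
  data[order(group_rank, direction * data[[second]]), , drop = FALSE]
}

# cyl ascending; mpg ascending for cyl = 4, descending for 6, ascending for 8
sorted <- serpentine_two(mtcars, "cyl", "mpg")
```

Flipping the sign of the second variable in every other group is what makes the row at the end of one group sit next to the row with the closest value at the start of the following group.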
Details
This is helpful in complex sampling designs with implicit stratification, as it reduces the variation in the stratified outcome for adjacent sampled units and thus reduces the overall sampling error. Serpentine sorts are commonly used in NCES surveys.
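As a rough illustration of implicit stratification, the sketch below draws an equal-probability systematic sample from an already sorted data frame. systematic_sample and its seed argument are hypothetical names introduced here, and a plain order() call stands in for the sort; in practice the serpentine-sorted data would be passed instead.

```r
# Hypothetical helper: systematic sample of n rows from a sorted data frame.
# Assumes n <= nrow(sorted_data) so that the selected rows are distinct.
systematic_sample <- function(sorted_data, n, seed = 1) {
  N <- nrow(sorted_data)
  interval <- N / n                       # sampling interval
  set.seed(seed)
  start <- runif(1, min = 0, max = interval)  # random start in the first interval
  picks <- floor(start + interval * (seq_len(n) - 1)) + 1
  sorted_data[picks, , drop = FALSE]
}

# Stand-in sort for illustration only
sorted <- mtcars[order(mtcars$cyl, mtcars$mpg), ]
smp <- systematic_sample(sorted, n = 8)
```

Because the selected rows are spread evenly through the sorted file, every run of adjacent similar rows contributes to the sample, which is the sense in which the sort acts as a stratification.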
Examples
# All variables except first are serpentine sorted
serpentine(data = mtcars, cyl, mpg)
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21.4 4 121 109 4.11 2.78 18.6 1 1 4 2
#> 2 21.5 4 120. 97 3.7 2.46 20.0 1 0 3 1
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 5 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 6 26 4 120. 91 4.43 2.14 16.7 0 1 5 2
#> 7 27.3 4 79 66 4.08 1.94 18.9 1 1 4 1
#> 8 30.4 4 75.7 52 4.93 1.62 18.5 1 1 4 2
#> 9 30.4 4 95.1 113 3.77 1.51 16.9 1 1 5 2
#> 10 32.4 4 78.7 66 4.08 2.2 19.5 1 1 4 1
#> # ... with 22 more rows
serpentine(data = mtcars, cyl, vs, mpg)
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 26 4 120. 91 4.43 2.14 16.7 0 1 5 2
#> 2 33.9 4 71.1 65 4.22 1.84 19.9 1 1 4 1
#> 3 32.4 4 78.7 66 4.08 2.2 19.5 1 1 4 1
#> 4 30.4 4 95.1 113 3.77 1.51 16.9 1 1 5 2
#> 5 30.4 4 75.7 52 4.93 1.62 18.5 1 1 4 2
#> 6 27.3 4 79 66 4.08 1.94 18.9 1 1 4 1
#> 7 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 8 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 21.5 4 120. 97 3.7 2.46 20.0 1 0 3 1
#> # ... with 22 more rows
# Same sort variables, but a different resulting order because the random number changes
serpentine(data = mtcars, vs, am)
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 13.3 8 350 245 3.73 3.84 15.4 0 0 3 4
#> 2 16.4 8 276. 180 3.07 4.07 17.4 0 0 3 3
#> 3 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 4 15.5 8 318 150 2.76 3.52 16.9 0 0 3 2
#> 5 19.2 8 400 175 3.08 3.84 17.0 0 0 3 2
#> 6 15.2 8 276. 180 3.07 3.78 18 0 0 3 3
#> 7 10.4 8 460 215 3 5.42 17.8 0 0 3 4
#> 8 15.2 8 304 150 3.15 3.44 17.3 0 0 3 2
#> 9 17.3 8 276. 180 3.07 3.73 17.6 0 0 3 3
#> 10 14.7 8 440 230 3.23 5.34 17.4 0 0 3 4
#> # ... with 22 more rows
serpentine(data = mtcars, vs, am, random_num = 5)
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 18.7 8 360 175 3.15 3.44 17.0 0 0 3 2
#> 2 19.2 8 400 175 3.08 3.84 17.0 0 0 3 2
#> 3 10.4 8 460 215 3 5.42 17.8 0 0 3 4
#> 4 15.2 8 304 150 3.15 3.44 17.3 0 0 3 2
#> 5 13.3 8 350 245 3.73 3.84 15.4 0 0 3 4
#> 6 10.4 8 472 205 2.93 5.25 18.0 0 0 3 4
#> 7 17.3 8 276. 180 3.07 3.73 17.6 0 0 3 3
#> 8 14.7 8 440 230 3.23 5.34 17.4 0 0 3 4
#> 9 16.4 8 276. 180 3.07 4.07 17.4 0 0 3 3
#> 10 14.3 8 360 245 3.21 3.57 15.8 0 0 3 4
#> # ... with 22 more rows
# Changing the random number has minimal effect when a non-categorical variable is included
serpentine(data = mtcars, cyl, vs, mpg)
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 26 4 120. 91 4.43 2.14 16.7 0 1 5 2
#> 2 33.9 4 71.1 65 4.22 1.84 19.9 1 1 4 1
#> 3 32.4 4 78.7 66 4.08 2.2 19.5 1 1 4 1
#> 4 30.4 4 95.1 113 3.77 1.51 16.9 1 1 5 2
#> 5 30.4 4 75.7 52 4.93 1.62 18.5 1 1 4 2
#> 6 27.3 4 79 66 4.08 1.94 18.9 1 1 4 1
#> 7 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 8 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 21.5 4 120. 97 3.7 2.46 20.0 1 0 3 1
#> # ... with 22 more rows
serpentine(data = mtcars, cyl, vs, mpg, random_num = 23)
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 26 4 120. 91 4.43 2.14 16.7 0 1 5 2
#> 2 33.9 4 71.1 65 4.22 1.84 19.9 1 1 4 1
#> 3 32.4 4 78.7 66 4.08 2.2 19.5 1 1 4 1
#> 4 30.4 4 75.7 52 4.93 1.62 18.5 1 1 4 2
#> 5 30.4 4 95.1 113 3.77 1.51 16.9 1 1 5 2
#> 6 27.3 4 79 66 4.08 1.94 18.9 1 1 4 1
#> 7 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 8 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 21.5 4 120. 97 3.7 2.46 20.0 1 0 3 1
#> # ... with 22 more rows
# cyl and vs are sorted in ascending order while mpg is serpentine sorted
mixed_serpentine(mtcars, cyl, vs, mpg)
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 26 4 120. 91 4.43 2.14 16.7 0 1 5 2
#> 2 33.9 4 71.1 65 4.22 1.84 19.9 1 1 4 1
#> 3 32.4 4 78.7 66 4.08 2.2 19.5 1 1 4 1
#> 4 30.4 4 75.7 52 4.93 1.62 18.5 1 1 4 2
#> 5 30.4 4 95.1 113 3.77 1.51 16.9 1 1 5 2
#> 6 27.3 4 79 66 4.08 1.94 18.9 1 1 4 1
#> 7 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 8 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 9 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 10 21.5 4 120. 97 3.7 2.46 20.0 1 0 3 1
#> # ... with 22 more rows
# cyl is ascending, vs is descending, and mpg is serpentine sorted
mixed_serpentine(mtcars, cyl, dplyr::desc(vs), mpg)
#> # A tibble: 32 x 11
#> mpg cyl disp hp drat wt qsec vs am gear carb
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 21.4 4 121 109 4.11 2.78 18.6 1 1 4 2
#> 2 21.5 4 120. 97 3.7 2.46 20.0 1 0 3 1
#> 3 22.8 4 108 93 3.85 2.32 18.6 1 1 4 1
#> 4 22.8 4 141. 95 3.92 3.15 22.9 1 0 4 2
#> 5 24.4 4 147. 62 3.69 3.19 20 1 0 4 2
#> 6 27.3 4 79 66 4.08 1.94 18.9 1 1 4 1
#> 7 30.4 4 95.1 113 3.77 1.51 16.9 1 1 5 2
#> 8 30.4 4 75.7 52 4.93 1.62 18.5 1 1 4 2
#> 9 32.4 4 78.7 66 4.08 2.2 19.5 1 1 4 1
#> 10 33.9 4 71.1 65 4.22 1.84 19.9 1 1 4 1
#> # ... with 22 more rows