Code knitr :: opts_chunk $ set ( echo = TRUE , message= FALSE , warnings= FALSE , include = TRUE )
library ( here )
library ( glue )
library ( readr )
library ( bruceR )
library ( xtable )
library ( formattable ) # format styles of table
library ( reshape2 )
library ( tidyverse )
library ( ggtext )
library ( kableExtra )
options ( knitr.kable.NA = '' )
mycolors = c ( "#4682B4" , "#B4464B" , "#B4AF46" ,
"#1B9E77" , "#D95F02" , "#7570B3" ,
"#E7298A" , "#66A61E" , "#B4F60A" )
softcolors = c ( "#B4464B" , "#F3DCD4" , "#ECC9C7" ,
"#D9E3DA" , "#D1CFC0" , "#C2C2B4" )
mykbl <- function ( x , ... ) {
kbl ( x , digits = 2 , ... ) |> kable_styling ( bootstrap_options = c ( "striped" , "condensed" ) ) }
Code datList <- readRDS ( here :: here ( "posts/2023-07-04-data-visualization-for-survey-data/Example_Data.RDS" ) )
str ( datList )
List of 2
$ measurement: tibble [500 × 12] (S3: tbl_df/tbl/data.frame)
..$ EDEQS1 : num [1:500] 1 0 3 2 0 0 1 0 2 0 ...
..$ EDEQS2 : num [1:500] 0 0 0 1 0 0 0 0 0 0 ...
..$ EDEQS3 : num [1:500] 0 0 0 2 0 0 0 0 2 0 ...
..$ EDEQS4 : num [1:500] 0 0 1 2 0 0 0 0 1 0 ...
..$ EDEQS5 : num [1:500] 3 1 3 1 0 1 1 0 2 1 ...
..$ EDEQS6 : num [1:500] 1 1 3 2 0 1 1 0 2 1 ...
..$ EDEQS7 : num [1:500] 0 0 0 1 0 0 0 0 0 0 ...
..$ EDEQS8 : num [1:500] 0 0 2 1 0 0 1 0 1 0 ...
..$ EDEQS9 : num [1:500] 0 1 2 2 0 0 2 0 0 0 ...
..$ EDEQS10: num [1:500] 0 0 1 1 0 0 1 0 0 0 ...
..$ EDEQS11: num [1:500] 1 1 2 1 0 1 0 0 1 1 ...
..$ EDEQS12: num [1:500] 2 1 3 2 0 1 1 0 2 1 ...
$ description: tibble [500 × 6] (S3: tbl_df/tbl/data.frame)
..$ age : num [1:500] 18 19 18 19 20 19 18 19 19 20 ...
..$ gender : Factor w/ 2 levels "男","女": 2 1 2 2 2 2 2 1 1 2 ...
..$ race : Factor w/ 2 levels "汉","少数": 2 1 1 1 1 1 1 1 1 1 ...
..$ birthplace: Factor w/ 2 levels "城市","农村": 2 2 2 1 1 2 2 1 2 2 ...
..$ height : num [1:500] 160 160 160 160 163 158 170 176 183 155 ...
..$ weight : num [1:500] 110 100 140 140 90 NA 105 121 180 90 ...
After finishing data preprocessing (e.g., data cleaning, missing cases dropping), it’s always a great habit to save a list of measurement data and demographic information. The example I used here is a pseudo data exact from real data. The sample size is 500.
The measurement data contains 12 items, each ranging from 0 to 3. The demographic data contains 6 variables: age, gender, race, birthplace, height, weight. The very first thing is to visualize the characteristics of the samples to have a big picture of respondents.
Code description <- datList $ description
bruceR :: Freq ( dplyr :: select ( description , gender : birthplace ) ,
varname = "gender" )
Frequency Statistics:
────────────
N %
────────────
男 172 34.4
女 328 65.6
────────────
Total N = 500
Code
gender
男
172
34.4
gender
女
328
65.6
race
汉
449
89.8
race
少数
51
10.2
birthplace
城市
183
36.6
birthplace
农村
317
63.4
Code survey = datList $ measurement
survey <- survey |>
mutate ( ID = 1 : nrow ( survey ) ) |>
mutate ( across ( starts_with ( "EDEQS" ) , \( x ) factor ( x , levels = 0 : 3 ) ) ) |>
pivot_longer ( starts_with ( "EDEQS" ) , names_to = "items" , values_to = "values" ) |>
group_by ( items ) |>
dplyr :: count ( values ) |>
dplyr :: mutate ( perc = n / sum ( n ) * 100 )
p = ggplot ( survey ) +
geom_col ( aes ( y = factor ( items , levels = paste0 ( "EDEQS" , 1 : 12 ) ) ,
x = perc ,
fill = values ) ,
position = position_stack ( reverse = TRUE ) ) +
labs ( y = "" , x = "Proportion (%)" , title = "N and proportion of responses for items" )
p = p + geom_text ( aes ( y = factor ( items , levels = paste0 ( "EDEQS" , 1 : 12 ) ) ,
x = perc , group = items ,
label = ifelse ( n >= 50 , paste0 ( n , "(" , round ( perc , 1 ) , "%)" ) , "" ) ) ,
size = 3 , color = "white" ,
position = position_stack ( reverse = TRUE , vjust = 0.5 ) )
p = p + scale_fill_manual ( values = mycolors )
p
We can clearly identify item 7 has highest proportion of level 0, and needed to be theoretically justified.
Back to top