How to use Julia in Quarto

Julia
Quarto
Author

Jihong Zhang

Published

March 15, 2024

1 Previous posts

A previous post illustrated how to use Julia to implement a gradient descent algorithm. What it did not cover is how to perform data analysis with Julia inside Quarto. This post walks through that workflow step by step.

2 Initial Setup

First, refer to the guides on Quarto.org and JuliaHub, as well as Patrick Altmeyer’s post. The first step is to install the following components:

  1. IJulia
  2. Revise.jl
  3. Jupyter Cache
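Julia REPL
using Pkg
Pkg.add("IJulia")           # Jupyter kernel for Julia
Pkg.add("Revise")           # picks up code changes without restarting the session
Pkg.add("Conda")            # make Conda.jl available so `using Conda` works below
using Conda
Conda.add("jupyter-cache")  # lets Quarto cache executed Jupyter cells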

Second, when you create a new Quarto document, make sure its YAML header contains the jupyter option. For example, the YAML header of this post is:

title: 'How to use Julia in Quarto'
author: 'Jihong Zhang'
date: 'Mar 10 2024'
categories:
  - Julia
  - Quarto
format: 
  html: 
    code-summary: 'Code'
    code-fold: false
    code-line-numbers: false
jupyter: julia-1.6
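The jupyter value has to name a Jupyter kernel that is actually installed. In the header above, julia-1.6 refers to a kernel for Julia 1.6. As far as I know, IJulia names its default kernel after the Julia major and minor version. Under that assumption, the following minimal sketch prints the value to use for your own installation:

```julia
# Assumption: IJulia's default kernel spec is named "julia-<major>.<minor>"
kernel_name = "julia-$(VERSION.major).$(VERSION.minor)"

# Line to paste into the YAML header of the .qmd file
println("jupyter: ", kernel_name)
```

To confirm which kernels are really installed, run jupyter kernelspec list in a terminal.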

Once everything is installed, you should be able to run Julia code in Quarto, for example:

print("Hello World!")
Hello World!
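In the .qmd source, each Julia chunk goes inside a {julia} code chunk. Quarto reads the lines starting with #| at the top of the chunk as cell options. Because #| is also an ordinary Julia comment, the chunk body stays valid Julia. Here is a minimal sketch; the label hello-world is a hypothetical name:

```julia
#| label: hello-world
#| echo: true
# Quarto treats the two lines above as cell options; Julia ignores them as comments
greeting = "Hello World!"
print(greeting)
```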

3 Import dataset

# import packages
using DataFrames
using CSV
# load in the diamonds.csv
diamonds = DataFrame(CSV.File("diamonds.csv"))
first(diamonds, 7)
7×10 DataFrame
 Row │ carat    cut        color    clarity  depth    table    price  x        y        z
     │ Float64  String15   String1  String7  Float64  Float64  Int64  Float64  Float64  Float64
─────┼──────────────────────────────────────────────────────────────────────────────────────────
   1 │    0.23  Ideal      E        SI2         61.5     55.0    326     3.95     3.98     2.43
   2 │    0.21  Premium    E        SI1         59.8     61.0    326     3.89     3.84     2.31
   3 │    0.23  Good       E        VS1         56.9     65.0    327     4.05     4.07     2.31
   4 │    0.29  Premium    I        VS2         62.4     58.0    334     4.2      4.23     2.63
   5 │    0.31  Good       J        SI2         63.3     58.0    335     4.34     4.35     2.75
   6 │    0.24  Very Good  J        VVS2        62.8     57.0    336     3.94     3.96     2.48
   7 │    0.24  Very Good  I        VVS1        62.3     57.0    336     3.95     3.98     2.47
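CSV.jl also offers CSV.read(source, DataFrame), which does the same job as DataFrame(CSV.File(source)). To keep the sketch below self-contained, it reads an in-memory copy of the first three rows shown above instead of diamonds.csv. It then checks the inferred column types and summarises two columns with describe:

```julia
using CSV, DataFrames

# Stand-in for diamonds.csv: the first three rows printed above
csv_text = """
carat,cut,color,clarity,depth,table,price,x,y,z
0.23,Ideal,E,SI2,61.5,55.0,326,3.95,3.98,2.43
0.21,Premium,E,SI1,59.8,61.0,326,3.89,3.84,2.31
0.23,Good,E,VS1,56.9,65.0,327,4.05,4.07,2.31
"""

# Equivalent to DataFrame(CSV.File(...)), here reading from a buffer
df = CSV.read(IOBuffer(csv_text), DataFrame)

size(df)                           # (3, 10)
eltype(df.price)                   # Int64
describe(df[:, [:carat, :price]])  # summary statistics for two columns
```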

4 Basic Statistical Modeling

Following the previous post, we can fit a linear regression with the lm function from the GLM package (for a generalized linear model, use glm with a distribution and link function instead):

using GLM
lm_fit = lm(@formula(price ~ depth), diamonds)
StatsModels.TableRegressionModel{LinearModel{GLM.LmResp{Vector{Float64}}, GLM.DensePredChol{Float64, LinearAlgebra.CholeskyPivoted{Float64, Matrix{Float64}, Vector{Int64}}}}, Matrix{Float64}}

price ~ 1 + depth

Coefficients:
────────────────────────────────────────────────────────────────────────
               Coef.  Std. Error      t  Pr(>|t|)  Lower 95%   Upper 95%
────────────────────────────────────────────────────────────────────────
(Intercept)  5763.67    740.556    7.78    <1e-14  4312.17    7215.16
depth         -29.65     11.9897  -2.47    0.0134   -53.1499    -6.15005
────────────────────────────────────────────────────────────────────────
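The model above is an ordinary linear regression, which is the special case of a generalized linear model with a Normal distribution and an identity link. The minimal sketch below uses small hypothetical toy data, not the diamonds data, to show that lm and glm give the same coefficients in that case:

```julia
using GLM, DataFrames

# Hypothetical toy data (not from diamonds.csv)
df = DataFrame(x = [1.0, 2.0, 3.0, 4.0, 5.0],
               y = [2.1, 3.9, 6.2, 7.8, 10.1])

ols   = lm(@formula(y ~ x), df)                              # least squares
gauss = glm(@formula(y ~ x), df, Normal(), IdentityLink())   # Gaussian GLM

coef(ols)    # intercept ≈ 0.05, slope ≈ 1.99
coef(gauss)  # same estimates up to numerical tolerance
r2(ols)      # R² of the linear model
```

With the diamonds fit, the depth coefficient reads the same way: each one-unit increase in depth is associated with a price about 29.65 lower.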

Next, let’s try something more advanced: factor analysis.

using MultivariateStats
# use only the first 300 cases and three variables
Xtr = diamonds[1:300 , [:x, :y, :z]]
# MultivariateStats expects each observation in a column, i.e. a (d, n) matrix,
# which is the transpose of the usual (n, d) data layout in R
Xtr = Matrix(Xtr)'
# train a one-factor model
M = fit(FactorAnalysis, Xtr; maxoutdim=1, method=:em)
Factor Analysis(indim = 3, outdim = 1)
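The layout difference is easy to check on a small matrix. The sketch below uses the x, y and z values of the first four rows printed earlier, arranged the R way with one row per observation. It compares the materialized permutedims copy with the lazy adjoint ' used above:

```julia
# x, y, z of the first four diamonds, one row per observation (R layout)
X_rows = [3.95 3.98 2.43;
          3.89 3.84 2.31;
          4.05 4.07 2.31;
          4.20 4.23 2.63]

# MultivariateStats.jl layout: one column per observation (d × n)
X_cols = permutedims(X_rows)  # materialized 3 × 4 Matrix{Float64}
X_lazy = X_rows'              # lazy adjoint; same values for real numbers

size(X_cols)                  # (3, 4)
X_cols[:, 2] == X_rows[2, :]  # true: column 2 holds observation 2
X_cols == X_lazy              # true
```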

You can refer to the MultivariateStats.jl documentation for more details on how the factor analysis parameters are estimated.

loadings(M)
3×1 Matrix{Float64}:
 0.8294175777737991
 0.8157441937710099
 0.5052202721703213

Let’s quickly compare these results with those from lavaan in R:

library(ggplot2)
library(lavaan)
data('diamonds')
X = diamonds[1:300, c('x', 'y', 'z')]
fa_model = "
F1 =~ x + y + z
"
fit = cfa(fa_model, data = X, std.lv = TRUE)
coef(fit)[1:3] # factor loadings
    F1=~x     F1=~y     F1=~z 
0.7802245 0.7673664 0.4752576
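The two sets of loadings are not identical, but they appear to differ by a common factor. The quick element-wise check below uses the numbers copied from the two outputs above:

```julia
# Loadings copied from the Julia and lavaan outputs above
julia_fa = [0.8294175777737991, 0.8157441937710099, 0.5052202721703213]
lavaan   = [0.7802245, 0.7673664, 0.4752576]

# Ratio of lavaan to MultivariateStats loadings, per indicator
ratio = lavaan ./ julia_fa
round.(ratio; digits = 4)  # [0.9407, 0.9407, 0.9407]
```

A near-constant ratio suggests that the two solutions share the same loading pattern and differ mainly in scale. The check does not show where the scaling difference comes from (estimation method, variance scaling, or something else), so treat it as a starting point rather than an explanation.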