
This document tracks which R GenomicSEM parameters the Rust port implements, and which it accepts only as no-op stubs.

It applies to all three frontends:

  • gsemr — R package (library(gsemr))
  • genomicsem — Python package (import genomicsem)
  • gsem — CLI (gsem userGWAS --...)

Parameter names match R GenomicSEM exactly in the R binding. The Python binding uses snake_case equivalents (fix.measurement → fix_measurement, sample.prev → sample_prev, etc.). The CLI uses --kebab-case flags with the same meanings.
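For dotted lower-case names, the mapping between the three spellings is mechanical. The following minimal sketch assumes that Python names only replace dots with underscores and that CLI flags replace both dots and underscores with hyphens. The helpers r_to_python and r_to_cli are hypothetical and not part of any of the three frontends. Mixed-case names such as Q_SNP are not covered, because this page does not state how they are spelled.

```python
# Hypothetical helpers: not part of gsemr, genomicsem, or the gsem CLI.
# Assumes dotted, lower-case R names; mixed-case names (e.g. Q_SNP) are not handled.


def r_to_python(name: str) -> str:
    """R name -> Python snake_case, e.g. sample.prev -> sample_prev."""
    return name.replace(".", "_")


def r_to_cli(name: str) -> str:
    """R name -> CLI --kebab-case flag, e.g. sample.prev -> --sample-prev."""
    return "--" + name.replace(".", "-").replace("_", "-")


for r_name in ["sample.prev", "chisq.max", "info.filter", "n.blocks"]:
    print(f"{r_name:<14} {r_to_python(r_name):<14} {r_to_cli(r_name)}")
```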

For the algorithmic and numerical differences between the Rust port and R GenomicSEM — why commonfactorGWAS uses a different parameterization, why SEs differ, why optimizer behavior differs on Heywood cases — see ARCHITECTURE.md.

Behavioral notes (summary)

For the bulk of the pipeline (LDSC, S/V/I matrices, single-factor and 2-factor SEM, usermodel, userGWAS per-SNP point estimates) the Rust port matches R GenomicSEM to within numerical tolerance (~1e-5 for S, ~1e-8 for V, ~1e-4 for SEM point estimates).
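The following minimal parity check makes the quoted tolerances concrete. It assumes the tolerances are absolute, element-wise bounds on the largest difference between the Rust and R values. The project's own tests (for example r_validation_ldsc.rs) may compare differently, and the names TOLERANCES, max_abs_diff and matches_r are hypothetical.

```python
# Hypothetical parity check; assumes absolute, element-wise tolerances.
TOLERANCES = {"S": 1e-5, "V": 1e-8, "sem_estimates": 1e-4}


def max_abs_diff(rust_vals, r_vals):
    """Largest absolute element-wise difference between two equal-length sequences."""
    if len(rust_vals) != len(r_vals):
        raise ValueError("outputs have different lengths")
    return max((abs(a - b) for a, b in zip(rust_vals, r_vals)), default=0.0)


def matches_r(quantity, rust_vals, r_vals):
    """True if the Rust output is within the quoted tolerance of R's output."""
    return max_abs_diff(rust_vals, r_vals) <= TOLERANCES[quantity]


# S entries that differ by about 3e-6 pass; a V entry off by 1e-4 fails.
print(matches_r("S", [0.5, 0.2], [0.500003, 0.2]))
print(matches_r("V", [1.0e-3], [1.1e-3]))
```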

The known places where outputs differ from R:

  • commonfactorGWAS per-SNP signs and magnitudes do not match R’s commonfactorGWAS by default — they match R’s userGWAS. See ARCHITECTURE.md §3.3.
  • Standard errors are sandwich (robust) SEs, not lavaan’s information-matrix SEs. See ARCHITECTURE.md §3.2.
  • The SEM optimizer is L-BFGS, not nlminb. It converges to the same minimum on well-conditioned problems but can behave differently from R on Heywood cases. See ARCHITECTURE.md §3.1.
  • Heywood cases are allowed by default in both R GenomicSEM and the Rust port; see ARCHITECTURE.md §3.5.
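A one-parameter sketch of a weighted least-squares fit to vech(S) shows how the two kinds of SE differ. It assumes d is the derivative of the model-implied statistics with respect to the parameter, W is the symmetric fitting weight matrix, and V is the sampling covariance of the statistics. The model-based variance 1/(d'Wd), labelled "naive" here, is only correct when W = V⁻¹. The sandwich (robust) variance (d'WVWd)/(d'Wd)² remains valid for other choices of W, such as a diagonal weight matrix. This illustrates only the general formula. How the port assembles d, W and V is described in ARCHITECTURE.md §3.2, and the function names are hypothetical.

```python
import math


def matvec(m, v):
    """Matrix (list of rows) times vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def naive_se(d, w):
    """SE from (d' W d)^-1; only valid when W is the inverse of V."""
    return math.sqrt(1.0 / dot(d, matvec(w, d)))


def sandwich_se(d, w, v):
    """Robust SE from (d'Wd)^-1 (d'WVWd) (d'Wd)^-1, assuming symmetric W."""
    wd = matvec(w, d)              # W d
    bread = dot(d, wd)             # d' W d
    meat = dot(wd, matvec(v, wd))  # (W d)' V (W d) = d' W V W d
    return math.sqrt(meat) / bread


d = [1.0, 1.0]                               # derivative for two statistics
v_corr = [[0.04, 0.03], [0.03, 0.09]]        # correlated sampling errors
w_diag = [[1 / 0.04, 0.0], [0.0, 1 / 0.09]]  # diagonal (DWLS-style) weights
print(naive_se(d, w_diag), sandwich_se(d, w_diag, v_corr))
```

When W is exactly V⁻¹ the two SEs coincide. With diagonal weights and correlated statistics, only the sandwich SE accounts for the off-diagonal covariance.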

Fully implemented parameters

All core parameters for every function work identically to R GenomicSEM, including:

  • ldsc: traits, sample.prev, population.prev, ld, wld, trait.names, n.blocks, chr, stand, select, chisq.max, sep_weights, ldsc.log, parallel, cores
  • commonfactor: covstruc, estimation
  • usermodel: covstruc, estimation, model, std.lv, fix_resid, imp_cov, Q_Factor, toler, CFIcalc
  • munge: files, hm3, trait.names, N, info.filter, maf.filter, column.names, overwrite, log.name
  • sumstats: files, ref, trait.names, se.logit, OLS, linprob, N, info.filter, maf.filter, keep.indel, out, ambig, betas, direct.filter
  • userGWAS: covstruc, SNPs, model, estimation, GC, sub, SNPSE, smooth_check, std.lv, fix_measurement, Q_SNP, printwarn, TWAS, parallel, cores
  • commonfactorGWAS: covstruc, SNPs, estimation, GC, SNPSE, smooth_check, TWAS, identification, parallel, cores
  • paLDSC: covstruc, r, p, diag, save.pdf, fa, fm, nfactors, parallel, cores
  • write.model: Loadings, S_LD, cutoff, fix_resid, bifactor, mustload, common
  • rgmodel: LDSCoutput, model, std.lv, estimation, sub
  • hdl: traits, sample.prev, population.prev, LD.path, Nref, method
  • s_ldsc: traits, sample.prev, population.prev, ld, wld, frq, trait.names, n.blocks, exclude_cont, ldsc.log
  • enrich: s_baseline, s_annot, v_annot, model, params, fix, std.lv, toler, fixparam, tau, rm_flank
  • simLDSC: covmat, N, ld, rPheno, int, N_overlap
  • multiSNP: covstruc, model, beta, se, var_snp, ld_matrix, snp_names, SNPSE
  • multiGene: covstruc, model, beta, se, var_gene, ld_matrix, gene_names, GeneSE, Genelist
  • summaryGLS: covstruc, results
  • read_fusion: files, trait.names, binary, N, perm — reads raw FUSION TWAS .dat output into the merged TWAS format that multiGene/userGWAS(TWAS=TRUE) consume (CLI: gsem read-fusion)
  • subSV: subsets vech(S) and the matching block of V by 1-based vech positions, using with-diagonal numbering for TYPE S/S_Stand and off-diagonal numbering for TYPE R (gsem_matrix::vech::subset_sv). R’s matrix-input path has an undefined-RMATRIX bug; the Rust port matches the bug-free LDSC_OBJECT path.
  • summaryGLSbands (numeric core): GLS fit plus confidence-band data (predictor grid, fitted line, ±BAND_SIZE·SE envelope), with INTERCEPT/QUAD/CONTROLVARS/INTERVALS (gsem::stats::gls::summary_gls_bands). The ggplot rendering is not ported (Rust draws no plots). R also has a band-loop bug on the Y/V_Y input path that the port does not reproduce.
  • cores/parallel: per-call thread budget for ldsc, userGWAS, commonfactorGWAS, and paLDSC. Each call builds its own local rayon pool, so concurrent calls do not share thread state and parallel=FALSE is fully scoped. Currently a no-op for munge, sumstats, and simLDSC (their underlying implementations are serial).
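The numeric core of summaryGLSbands is a generalized least-squares fit followed by a pointwise confidence envelope. The following minimal sketch assumes one predictor plus an intercept and a known covariance matrix V of the outcome values. It computes beta = (X' V^-1 X)^-1 X' V^-1 y and Cov(beta) = (X' V^-1 X)^-1. At each grid point x0 it reports x0'beta ± band_size·sqrt(x0' Cov(beta) x0). gls_bands and its arguments are hypothetical. The sketch ignores QUAD, CONTROLVARS and INTERVALS and does not reproduce the signature of gsem::stats::gls::summary_gls_bands.

```python
import math


def transpose(a):
    return [list(col) for col in zip(*a)]


def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]


def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (adequate for small matrices)."""
    n = len(a)
    m = [[float(x) for x in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]  # augmented [a | I]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(m[r][c]))
        if abs(m[p][c]) < 1e-15:
            raise ValueError("matrix is singular")
        m[c], m[p] = m[p], m[c]
        pivot = m[c][c]
        m[c] = [x / pivot for x in m[c]]
        for r in range(n):
            if r != c:
                f = m[r][c]
                m[r] = [x - f * y for x, y in zip(m[r], m[c])]
    return [row[n:] for row in m]


def gls_bands(x, y, v, grid, band_size):
    X = [[1.0, xi] for xi in x]                 # intercept + one predictor
    xt_vinv = matmul(transpose(X), inverse(v))  # X' V^-1
    cov_beta = inverse(matmul(xt_vinv, X))      # (X' V^-1 X)^-1
    beta = [r[0] for r in matmul(cov_beta, matmul(xt_vinv, [[yi] for yi in y]))]
    bands = []
    for g in grid:
        x0 = [1.0, g]
        fit = beta[0] + beta[1] * g
        se = math.sqrt(sum(x0[i] * cov_beta[i][j] * x0[j]
                           for i in range(2) for j in range(2)))
        bands.append((g, fit - band_size * se, fit, fit + band_size * se))
    return beta, bands


x = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]
v = [[1.0, 0.5, 0.0, 0.0], [0.5, 1.0, 0.5, 0.0],
     [0.0, 0.5, 1.0, 0.5], [0.0, 0.0, 0.5, 1.0]]  # correlated outcome errors
beta, bands = gls_bands(x, y, v, grid=[0.0, 1.5, 3.0], band_size=1.96)
```

The data are exactly linear, so any valid V recovers intercept 1 and slope 2. With noisy data, the off-diagonal entries of V change both the estimates and the width of the band.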

Option-level R-equivalence coverage

Behaviour-changing options are validated against the real R package one option at a time (fixtures in tests/fixtures/, regenerated by tests/generate_*.R):

  • ldsc: stand=TRUE (S_Stand/V_Stand), select="ODD", chisq.max, and liability-scale (sample/population.prev) each have a dedicated fixture + parity test (ldsc_stand, ldsc_select_odd, ldsc_chisqmax, ldsc_liability). stand/select are binding-layer features, tested via the testthat parity suite; the rest are tested in r_validation_ldsc.rs.
  • sumstats: all four standardization modes (OLS/linprob/se.logit/none) plus ambig=TRUE are numerically validated against R (sumstats_synth, sumstats_ambig). keep.indel and direct.filter are implemented but cannot be exercised on the synthetic data (it contains no indels, and its allele frequencies are already in-range), so they are covered by the in-crate unit tests of their predicates rather than an end-to-end R fixture.
  • userGWAS: estimation="ML", GC={conserv,none}, Q_SNP, and std.lv each have a real-package fixture (gwas_options.json). fix_measurement=FALSE is intentionally untested — R’s free-measurement fit is computationally singular on the test subset (documented in r_validation_gwas_options.rs).
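The allele-level predicates that keep.indel and ambig depend on are simple to state. The following sketch assumes that an indel is any allele longer than one base, or the I/D coding that some summary-statistics files use, and that a strand-ambiguous SNP is an A/T or C/G pair. The function names are hypothetical and are not the crate's predicates. The filter at the end drops both kinds of rows purely for illustration; whether a given flag value keeps or drops them is defined by R GenomicSEM's documentation, not by this sketch.

```python
# Hypothetical predicates; not the crate's actual functions.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_indel(a1: str, a2: str) -> bool:
    """Multi-base alleles, or the I/D coding some summary-statistics files use."""
    return len(a1) != 1 or len(a2) != 1 or bool({a1.upper(), a2.upper()} & {"I", "D"})


def is_strand_ambiguous(a1: str, a2: str) -> bool:
    """A/T and C/G SNPs look identical after a strand flip."""
    return COMPLEMENT.get(a1.upper()) == a2.upper()


rows = [("rs1", "A", "G"), ("rs2", "A", "T"), ("rs3", "AT", "A"), ("rs4", "C", "G")]
# Illustrative filter that drops both indels and strand-ambiguous SNPs.
kept = [snp for snp, a1, a2 in rows
        if not is_indel(a1, a2) and not is_strand_ambiguous(a1, a2)]
print(kept)
```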

Functions not ported

These R GenomicSEM functions have no Rust implementation and are not exposed by the CLI or the R/Python bindings. They are outside the drop-in surface; call the original R GenomicSEM package if you need them.

  • addSNPs: Deprecated in R GenomicSEM (superseded by userGWAS/commonfactorGWAS).
  • addGenes: Deprecated in R GenomicSEM (superseded by the TWAS path in userGWAS/multiGene).
  • indexS: Tiny vech-index lookup helper; the same column-major lower-triangle indexing is available via gsem_matrix::vech. Not exposed standalone.
  • localSRMD: Local structural-residual diagnostic; not ported.
  • qtrait: Quantitative-trait simulation helper; not ported (also removed from current upstream GenomicSEM).

The deprecated and diagnostic helpers above are the only exceptions. The behaviour-bearing TWAS on-ramp read_fusion is ported (see Fully implemented parameters).
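indexS is not exposed standalone, but the column-major lower-triangle layout it looks up, which subSV's 1-based positions also refer to, is easy to reproduce. The following sketch is hypothetical and does not reproduce the gsem_matrix::vech API. vech_pairs enumerates the ordering, and vech_index gives the closed form; include_diag=False yields the off-diagonal numbering used for TYPE R. subset_sv takes with-diagonal positions and keeps the chosen entries of vech(S) together with the matching rows and columns of V.

```python
def vech_pairs(k, include_diag=True):
    """1-based (row, col) pairs of a k x k matrix in column-major lower-triangle order."""
    start = 0 if include_diag else 1
    return [(i, j) for j in range(1, k + 1) for i in range(j + start, k + 1)]


def vech_index(i, j, k, include_diag=True):
    """Closed-form 1-based position of element (i, j) in the ordering above."""
    if i < j:
        i, j = j, i  # symmetric matrix: (i, j) and (j, i) share a position
    if include_diag:
        return (j - 1) * k - (j - 1) * (j - 2) // 2 + (i - j + 1)
    if i == j:
        raise ValueError("diagonal elements have no off-diagonal position")
    return (j - 1) * k - (j - 1) * j // 2 + (i - j)


def subset_sv(s_vech, v, positions):
    """Keep the chosen 1-based vech positions of S and the matching rows/cols of V."""
    sub_s = [s_vech[p - 1] for p in positions]
    sub_v = [[v[p - 1][q - 1] for q in positions] for p in positions]
    return sub_s, sub_v


s_vech = [1.0, 0.3, 0.2, 0.8, 0.4, 0.6]  # vech of a 3 x 3 S
# Placeholder V whose entry at 1-based (p, q) is 10p + q, so kept cells are easy to read.
v = [[float(10 * (r + 1) + (c + 1)) for c in range(6)] for r in range(6)]
keep = [vech_index(2, 1, 3), vech_index(2, 2, 3)]  # covariance S21 and variance S22
sub_s, sub_v = subset_sv(s_vech, v, keep)
```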

Not yet implemented

These parameters are accepted but have no effect. An informational message is printed when they are used.

MPI

  • userGWAS(MPI=TRUE) and commonfactorGWAS(MPI=TRUE) — MPI distributed computing. Not applicable to the Rust backend, which uses shared-memory parallelism via rayon. For distributed workloads, split the sumstats file and run independent gsem CLI processes on each chunk, then concatenate the TSV outputs.
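The split-and-concatenate workaround can be sketched in a few lines of Python. The helpers are hypothetical and operate on in-memory lists of lines. They assume the sumstats file has a single header line, all per-chunk outputs share one header, and each SNP's result does not depend on which other SNPs share its chunk. The per-chunk gsem run is left as a comment because its flags are not listed on this page.

```python
def split_tsv(lines, n_chunks):
    """Split a header + rows file into at most n_chunks pieces, each with the header."""
    if n_chunks < 1:
        raise ValueError("n_chunks must be at least 1")
    header, rows = lines[0], lines[1:]
    if not rows:
        return [[header]]
    size = -(-len(rows) // n_chunks)  # ceiling division
    return [[header] + rows[i:i + size] for i in range(0, len(rows), size)]


def concat_tsv(chunks):
    """Join per-chunk outputs in order, keeping a single header line."""
    header = chunks[0][0]
    out = [header]
    for chunk in chunks:
        if chunk[0] != header:
            raise ValueError("chunk outputs have different headers")
        out.extend(chunk[1:])
    return out


lines = ["SNP\tA1\tA2", "rs1\tA\tG", "rs2\tC\tT", "rs3\tG\tA", "rs4\tT\tC", "rs5\tA\tC"]
chunks = split_tsv(lines, 2)
# ... write each chunk to disk and run one independent gsem userGWAS process per chunk ...
merged = concat_tsv(chunks)  # stand-in: concatenating the input chunks round-trips
```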

This page is generated from API_COMPAT.md in the repository — edit it there.