Merge GWAS Summary Statistics — sumstats • gsemr

Reads multiple GWAS files, applies QC filters, aligns alleles, and outputs a merged tab-delimited file for multivariate GWAS.

Usage

sumstats(
  files,
  ref,
  trait.names = NULL,
  se.logit,
  OLS = NULL,
  linprob = NULL,
  N = NULL,
  betas = NULL,
  info.filter = 0.6,
  maf.filter = 0.01,
  keep.indel = FALSE,
  parallel = TRUE,
  cores = NULL,
  ambig = FALSE,
  direct.filter = FALSE,
  out = "merged_sumstats.tsv"
)

Arguments

files: Character vector of GWAS file paths
ref: Path to reference panel file (e.g., w_hm3.snplist)
trait.names: Character vector of trait names
se.logit: Logical vector, one element per trait, indicating which traits report standard errors on the logistic scale
OLS: Logical vector indicating which traits come from ordinary least squares (OLS) regression
linprob: Logical vector indicating which traits come from a linear probability model
N: Numeric vector or list of sample size overrides
betas: Named list of beta column overrides per trait (default NULL = auto-detect)
info.filter: Minimum INFO score; SNPs with lower INFO are removed (default 0.6)
maf.filter: Minimum minor allele frequency (MAF); SNPs with lower MAF are removed (default 0.01)
keep.indel: Keep indels (default FALSE)
parallel: Use a rayon worker pool to read the reference and GWAS files in parallel (default TRUE). Each input file is decompressed and parsed on its own worker thread. Set to FALSE to force single-threaded execution.
cores: Integer cap on the rayon pool size. When NULL (the default), rayon honours RAYON_NUM_THREADS if it is set; otherwise it uses the number of logical cores reported by the OS. Because reads are parallelized across files, values above length(files) + 1 give no further speedup. On many-core machines (32+), or when the underlying BLAS is multithreaded, set this explicitly to avoid oversubscribing CPUs with nested BLAS threads.
ambig: Keep ambiguous SNPs (default FALSE)
direct.filter: Apply MAF filter directly to GWAS file frequencies (default FALSE)
out: Output file path for the merged sumstats TSV (default "merged_sumstats.tsv")
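
The guidance for cores can be turned into a small helper. The following is a minimal sketch in base R, not part of gsemr: pick_cores is a hypothetical name, and the "+ 1" is taken from the cores description above (presumably the extra worker reads the reference file). It caps the pool at length(files) + 1 and at the number of logical cores available.

```r
# Hypothetical helper, not part of gsemr: choose a value for `cores`.
pick_cores <- function(files, available = parallel::detectCores()) {
  # detectCores() can return NA on some platforms; fall back to 1.
  if (is.na(available)) available <- 1L
  # Reads are parallelized across files, so workers beyond
  # length(files) + 1 would have nothing to do.
  as.integer(max(1L, min(length(files) + 1L, available)))
}

files <- c("T1.sumstats.gz", "T2.sumstats.gz")
cores <- pick_cores(files, available = 8L)
```

The resulting value can then be passed as cores = cores. If the BLAS library is multithreaded, a smaller cap may still be preferable.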

Value

Path to the merged output file
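
Because the return value is a file path, the merged table has to be read back in before it can be inspected. The following is a minimal sketch using base R; the stand-in file and its column names are placeholders invented for illustration, not gsemr's documented output schema.

```r
# Stand-in for the path returned by sumstats(); the columns below are
# placeholders for illustration only, not gsemr's actual output schema.
merged_path <- tempfile(fileext = ".tsv")
writeLines(c("SNP\tbeta.V1\tse.V1",
             "rs1\t0.01\t0.002",
             "rs2\t-0.03\t0.004"), merged_path)

# Read the tab-delimited file, keeping column names as written.
merged <- utils::read.delim(merged_path, sep = "\t", header = TRUE,
                            check.names = FALSE, stringsAsFactors = FALSE)
str(merged)
```

For large merged files, a faster reader may be preferable to read.delim.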

Examples

if (FALSE) { # \dontrun{
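# Merge GWAS sumstats into a single SNP x trait TSV for multivariate GWAS.
sumstats(
  files = c("T1.sumstats.gz", "T2.sumstats.gz"),
  ref = "w_hm3.snplist",
  trait.names = c("V1", "V2"),
  # se.logit has no default; the values here assume that neither trait
  # reports logistic-scale SEs.
  se.logit = c(FALSE, FALSE),
  out = "merged_sumstats.tsv"
)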
} # }
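
Before a call like the one above, it can help to confirm that the per-trait arguments line up with files. The sketch below assumes, as the "which traits" wording in the argument descriptions suggests, that each per-trait vector carries one element per input file. check_per_trait is a hypothetical helper, not part of gsemr, and the argument values are illustrative.

```r
# Hypothetical helper, not part of gsemr: check that each per-trait
# argument has one entry per input file before calling sumstats().
check_per_trait <- function(files, ...) {
  args <- list(...)
  for (nm in names(args)) {
    x <- args[[nm]]
    # NULL means "not supplied", so it is skipped.
    if (!is.null(x) && length(x) != length(files)) {
      stop(sprintf("`%s` has length %d but there are %d files",
                   nm, length(x), length(files)))
    }
  }
  invisible(TRUE)
}

files <- c("T1.sumstats.gz", "T2.sumstats.gz")
ok <- check_per_trait(
  files,
  trait.names = c("V1", "V2"),
  se.logit    = c(FALSE, FALSE),  # illustrative values
  OLS         = NULL
)
```

A length mismatch stops with an error that names the offending argument.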