
Reads multiple GWAS files, applies QC filters, aligns alleles, and outputs a merged tab-delimited file for multivariate GWAS.

Usage

sumstats(
  files,
  ref,
  trait.names = NULL,
  se.logit,
  OLS = NULL,
  linprob = NULL,
  N = NULL,
  betas = NULL,
  info.filter = 0.6,
  maf.filter = 0.01,
  keep.indel = FALSE,
  parallel = TRUE,
  cores = NULL,
  ambig = FALSE,
  direct.filter = FALSE,
  out = "merged_sumstats.tsv"
)

Arguments

files

Character vector of GWAS file paths

ref

Path to reference panel file (e.g., w_hm3.snplist)

trait.names

Character vector of trait names

se.logit

Logical vector indicating which traits report standard errors (SEs) on the logistic scale

OLS

Logical vector indicating which traits are from OLS regression

linprob

Logical vector indicating which traits are linear probability

N

Numeric vector or list of sample size overrides

betas

Named list of beta column overrides per trait (default NULL = auto-detect)
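The per-trait arguments above are presumably matched to files by position, one entry per file. A minimal base R sketch of assembling and checking them before the call follows. The file names, trait names, sample sizes, and the column name "BETA_LIN" are invented for illustration. Two further assumptions are inferred from the argument descriptions rather than confirmed: that betas is keyed by trait name, and that NA in N means "no override".

```r
# Hypothetical inputs: one binary trait (T1) and one continuous trait (T2).
files       <- c("T1.txt.gz", "T2.txt.gz")
trait.names <- c("T1", "T2")

# One entry per file, in the same order as `files` (assumed).
se.logit <- c(TRUE, FALSE)    # T1: SEs on the logistic scale
OLS      <- c(FALSE, TRUE)    # T2: continuous trait analysed with OLS
linprob  <- c(FALSE, FALSE)   # no linear probability models here
N        <- c(NA, 50000)      # NA = no override for T1 (assumed convention)

# Assumed shape: trait name -> name of the effect-size column in that file.
betas <- list(T2 = "BETA_LIN")

# Cheap sanity checks before handing everything to sumstats().
per_trait <- list(trait.names = trait.names, se.logit = se.logit,
                  OLS = OLS, linprob = linprob, N = N)
lengths_ok <- vapply(per_trait, function(x) length(x) == length(files), logical(1))
stopifnot(all(lengths_ok), all(names(betas) %in% trait.names))
```

Checking lengths up front catches a misaligned vector before any file is read.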

info.filter

Minimum imputation INFO score; SNPs with a lower INFO score are removed (default 0.6)

maf.filter

Minimum minor allele frequency (MAF); SNPs with a lower MAF are removed (default 0.01)
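To make the two thresholds concrete, here is a small base R sketch of threshold filtering on a toy table. It assumes SNPs are kept when they meet both thresholds. The column names and values are illustrative only, and the MAF column stands in for whichever frequency source the function actually consults.

```r
# Toy summary statistics; column names and values are illustrative only.
gwas <- data.frame(
  SNP  = c("rs1", "rs2", "rs3", "rs4"),
  INFO = c(0.95, 0.55, 0.80, 0.99),
  MAF  = c(0.20, 0.30, 0.005, 0.02)
)

info.filter <- 0.6
maf.filter  <- 0.01

# Assumed rule: keep SNPs that meet both thresholds.
keep <- gwas$INFO >= info.filter & gwas$MAF >= maf.filter
kept <- gwas[keep, ]
kept$SNP  # rs2 fails the INFO threshold, rs3 fails the MAF threshold
```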

keep.indel

Keep indels (default FALSE)

parallel

Use a parallel rayon worker pool to read the reference and GWAS files (default TRUE). Each input file is decompressed and parsed on its own worker thread. Set to FALSE to force single-threaded execution.

cores

Integer cap on the rayon pool size. When NULL (the default), rayon honours RAYON_NUM_THREADS if it is set; otherwise it uses the number of logical cores reported by the OS. Because reads are parallelized across files, values above length(files) + 1 give no additional speedup. On many-core machines (32+) or when the underlying BLAS is multithreaded, set this explicitly to avoid oversubscribing CPUs with nested BLAS threads.
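One way to pick an explicit value for cores is to cap the machine's logical core count at length(files) + 1, following the guidance above. This is only a sketch; the file names are placeholders and nothing here is taken from the package itself.

```r
# Hypothetical input list; only its length matters here.
files <- c("T1.txt.gz", "T2.txt.gz", "T3.txt.gz")

# Logical cores reported by the OS (can be NA on some platforms).
n_logical <- parallel::detectCores(logical = TRUE)
if (is.na(n_logical)) n_logical <- 1L

# Reads are parallelised across files, so extra workers beyond
# length(files) + 1 would sit idle.
cores <- as.integer(min(n_logical, length(files) + 1L))

# Per the description above, this variable is consulted when cores = NULL.
rayon_env <- Sys.getenv("RAYON_NUM_THREADS", unset = NA)
```

The resulting cores value can then be passed as the cores argument.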

ambig

Keep strand-ambiguous SNPs, i.e. A/T and C/G allele pairs (default FALSE)
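Assuming "ambiguous" is meant in the usual GWAS QC sense of strand-ambiguous (the two alleles are complements of each other, so the strand cannot be inferred from the alleles alone), the check can be sketched in base R as below. The helper name is hypothetical and is not part of the package.

```r
# Hypothetical helper: TRUE when the two alleles are complementary (A/T or C/G).
is_strand_ambiguous <- function(a1, a2) {
  pair <- paste0(toupper(a1), toupper(a2))
  pair %in% c("AT", "TA", "CG", "GC")
}

flags <- is_strand_ambiguous(c("A", "C", "A", "g"), c("T", "G", "G", "c"))
flags  # TRUE TRUE FALSE TRUE
```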

direct.filter

Apply MAF filter directly to GWAS file frequencies (default FALSE)

out

Output file path for the merged sumstats TSV (default "merged_sumstats.tsv")

Value

Path to the merged output file
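Because the return value is a file path, it can be handed straight to a reader. The sketch below writes a small stand-in TSV to a temporary file and reads it back with base R. The column names are hypothetical and may not match what sumstats() actually writes.

```r
# Stand-in for the file sumstats() would write; columns are hypothetical.
path <- tempfile(fileext = ".tsv")
toy <- data.frame(SNP = c("rs1", "rs2"), A1 = c("A", "C"), A2 = c("G", "T"),
                  beta.V1 = c(0.01, -0.02), se.V1 = c(0.005, 0.004))
write.table(toy, path, sep = "\t", quote = FALSE, row.names = FALSE)

# With a real run, `path` would be the value returned by sumstats().
merged <- read.delim(path, stringsAsFactors = FALSE)
str(merged)
```

For large merged files a faster reader may be preferable, but read.delim is enough to inspect the layout.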

Examples

if (FALSE) { # \dontrun{
# Merge munged sumstats into a single SNP x trait TSV for GWAS.
sumstats(
  files = c("T1.sumstats.gz", "T2.sumstats.gz"),
  ref = "w_hm3.snplist",
  trait.names = c("V1", "V2"),
  out = "merged_sumstats.tsv"
)
} # }