Command line interface


Step 1: preprocess the local ancestry data

hamsta preprocess \
 --rfmixfb example.fb.tsv AFR \
 --global-ancestry example.rfmix.Q \
 --out example

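This command writes U and S from the singular value decomposition A' = USV', where A' has shape (marker, sample). With the output prefix example, the files are example.U.npy and example.S.npy, which are passed to --svd in Step 2.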
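As a rough illustration of the decomposition above, the sketch below runs NumPy's SVD on a small random matrix of shape (marker, sample) and saves U and S as .npy files. The random matrix, the helper name svd_sketch and the temporary output prefix are assumptions made for illustration; how HAMSTA itself builds A' from the local ancestry data, or applies --k, is not reproduced here.

```python
import os
import tempfile

import numpy as np


def svd_sketch(a, k=None):
    """Return U and S of a = U S V', optionally keeping the top k components."""
    # full_matrices=False gives the thin SVD: U is (marker, r), S is (r,)
    u, s, _vt = np.linalg.svd(a, full_matrices=False)
    if k is not None:
        u, s = u[:, :k], s[:k]
    return u, s


rng = np.random.default_rng(0)
a = rng.random((8, 5))  # stand-in for A' with shape (marker, sample)
u, s = svd_sketch(a, k=3)

# Hypothetical prefix, mirroring the <prefix>.U.npy / <prefix>.S.npy naming
prefix = os.path.join(tempfile.mkdtemp(), "example")
np.save(prefix + ".U.npy", u)
np.save(prefix + ".S.npy", s)

print(u.shape, s.shape)
```

Note that U has one row per marker, so the marker order in the summary statistics has to match the order used here.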


Step 2: run HAMSTA on the admixture mapping results

hamsta infer \
 --sumstat example.pheno.glm.linear \
 --svd example.U.npy example.S.npy \
 --out example.hamsta_result.txt

The result is written to example.hamsta_result.txt.

Singular value decomposition of local ancestry data

usage: hamsta preprocess [-h] [--rfmixfb RFMIXFB RFMIXFB] [--zarr ZARR ZARR]
                         [--nc NC NC] [--global-ancestry GLOBAL_ANCESTRY]
                         [--out OUT] [--keep KEEP] [--exclude EXCLUDE] [--k K]
                         [--version] [-v] [-vv]

Named Arguments

--rfmixfb

Path to input local ancestry in RFMix .fb.tsv format. Two arguments are required: (filepath, ancestry)

--zarr

Path to input local ancestry in zarr format storing an Xarray dataset. Two arguments are required: (filepath, ancestry)

--nc

Path to input local ancestry in netCDF format storing an Xarray dataset. Two arguments are required: (filepath, ancestry)

--global-ancestry

Path to global ancestry file in RFMix .rfmix.Q format

--out

Output prefix

--keep

Text file with a header #IID, followed by a list of individuals to keep

--exclude

BED file containing ranges to be excluded

--k

Number of singular values to compute

--version

Show program's version number and exit

-v, --verbose

set loglevel to INFO

Default: 30

-vv, --very-verbose

set loglevel to DEBUG


If an Xarray dataset is used, the following structure is expected:

Dimensions:           (marker: 800, sample: 3000, ploidy: 2, ancestry: 2)
  * marker            (marker) uint32 15309459 15343272 ... 50702360 50743879
  * sample            (sample) <U7 'msp1' 'msp2' 'msp3' ... 'msp2999' 'msp3000'
  * ploidy            (ploidy) int8 0 1
  * ancestry          (ancestry) <U3 'AFR' 'EUR'
Data variables:
    locanc            (marker, sample, ploidy, ancestry) float32 1.0 0.0 ... 0.0
    genetic_position  (marker) float64 0.0 0.009117 0.01411 ... 73.63 73.78 73.9
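A dataset with this layout could be assembled along the following lines. This is a minimal sketch that assumes the xarray package is installed; the sizes, the fourth marker position and the all-AFR ancestry calls are made up for illustration and are much smaller than the structure shown above.

```python
import numpy as np
import xarray as xr

n_marker, n_sample = 4, 3
samples = [f"msp{i + 1}" for i in range(n_sample)]

# Made-up hard calls: every haplotype is assigned entirely to AFR
locanc = np.zeros((n_marker, n_sample, 2, 2), dtype=np.float32)
locanc[..., 0] = 1.0

ds = xr.Dataset(
    data_vars={
        "locanc": (("marker", "sample", "ploidy", "ancestry"), locanc),
        "genetic_position": (("marker",), np.linspace(0.0, 0.03, n_marker)),
    },
    coords={
        # Physical positions; the last value is invented for this example
        "marker": np.array(
            [15309459, 15343272, 15349660, 15350000], dtype=np.uint32
        ),
        "sample": samples,
        "ploidy": np.array([0, 1], dtype=np.int8),
        "ancestry": ["AFR", "EUR"],
    },
)
print(ds)
```

Such a dataset could then be written with ds.to_zarr(...) or ds.to_netcdf(...), which need the zarr package or a netCDF backend respectively.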

RFMix .fb.tsv format example

#reference_panel_population:    AFR     EUR
chromosome      physical_position       genetic_position        genetic_marker_index    msp1:::hap1:::AFR       msp1:::hap1:::EUR       msp1:::hap2:::AFR       msp1:::hap2:::EUR       msp2:::hap1:::AFR       msp2:::hap1:::EUR       msp2:::hap2:::AFR       msp2:::hap2:::EUR msp3:::hap1:::AFR
1       15309459        .       0       1.0     0.0     1.0     0.0     1.0     0.0     0.0     1.0
1       15343272        .       1       1.0     0.0     1.0     0.0     1.0     0.0     0.0     1.0
1       15349660        .       2       1.0     0.0     1.0     0.0     1.0     0.0     0.0     1.0
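The per-haplotype column names in the example above follow a sample:::haplotype:::ancestry pattern. The sketch below shows one way to split such a header and locate the columns for a single ancestry. It assumes tab-separated fields and four leading fixed columns, as in the example; the function name parse_fb_header is hypothetical and not part of HAMSTA.

```python
def parse_fb_header(line):
    """Split an .fb.tsv column header into fixed columns and haplotype columns."""
    fields = line.rstrip("\n").split("\t")
    # The first four columns describe the marker; the rest are haplotype columns
    fixed, hap_cols = fields[:4], fields[4:]
    parsed = []
    for col in hap_cols:
        sample, hap, ancestry = col.split(":::")
        parsed.append((sample, hap, ancestry))
    return fixed, parsed


header = "\t".join([
    "chromosome", "physical_position", "genetic_position", "genetic_marker_index",
    "msp1:::hap1:::AFR", "msp1:::hap1:::EUR",
    "msp1:::hap2:::AFR", "msp1:::hap2:::EUR",
])
fixed, parsed = parse_fb_header(header)

# Column indices (within the full row) that hold the AFR probabilities
afr_idx = [4 + i for i, (_, _, anc) in enumerate(parsed) if anc == "AFR"]
print(parsed[0], afr_idx)
```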

RFMix .rfmix.Q format example

#rfmix diploid global ancestry .Q format output
#sample AFR     EUR
msp1    0.859375        0.140625
msp10   0.88125 0.11875
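To make the layout concrete, the sketch below reads text in this format using only the standard library. It assumes whitespace-separated fields and that the ancestry names appear on the line starting with #sample, as in the example; read_rfmix_q is a hypothetical helper, not a HAMSTA function.

```python
import io


def read_rfmix_q(handle):
    """Return (ancestry names, {sample: [proportions]}) from .rfmix.Q text."""
    ancestries, table = None, {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith("#sample"):
            # Header line: ancestry names follow the #sample column
            ancestries = line.split()[1:]
        elif line.startswith("#"):
            # Any other comment line is skipped
            continue
        else:
            sample, *values = line.split()
            table[sample] = [float(v) for v in values]
    return ancestries, table


text = """#rfmix diploid global ancestry .Q format output
#sample\tAFR\tEUR
msp1\t0.859375\t0.140625
msp10\t0.88125\t0.11875
"""
ancestries, q = read_rfmix_q(io.StringIO(text))
print(ancestries, q["msp1"])
```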

Running HAMSTA

usage: hamsta infer [-h] [--sumstat SUMSTAT] [--sumstat-chr SUMSTAT_CHR]
                    [--svd SVD SVD] [--svd-chr SVD_CHR] [--N N]
                    [--num-blocks NUM_BLOCKS] [--thres] [--out OUT]
                    [--version] [-v] [-vv]

Named Arguments

--sumstat

Input filename of admixture mapping results. The marker order is expected to match the SVD input. The default column storing the Z score is T_STAT

--sumstat-chr

File listing admixture mapping results. Each input is expected to follow the same marker order as the corresponding SVD input. The default column storing the Z score is T_STAT

--svd

SVD results. Two arguments are required: path to U and path to S

--svd-chr

File listing SVD results. Each line contains the paths to U and S corresponding to the same line in --sumstat-chr

--N

Number of individuals

--num-blocks

Number of jackknife blocks

Default: 10

--thres

Whether the significance threshold will be estimated

Default: False

--out

Output path

Default: standard output (stdout)

--version

Show program's version number and exit

-v, --verbose

set loglevel to INFO

Default: 30

-vv, --very-verbose

set loglevel to DEBUG
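For multi-chromosome runs, the two list files taken by --sumstat-chr and --svd-chr need to line up row by row. The sketch below writes such a pair of files. The per-chromosome file names, the list file names and the use of a single space between the U and S paths are assumptions made for illustration, not confirmed details of HAMSTA's input format.

```python
import os
import tempfile

chromosomes = [1, 2, 3]
outdir = tempfile.mkdtemp()

sumstat_list = os.path.join(outdir, "sumstat_list.txt")
svd_list = os.path.join(outdir, "svd_list.txt")

with open(sumstat_list, "w") as f_sum, open(svd_list, "w") as f_svd:
    for chrom in chromosomes:
        prefix = f"example.chr{chrom}"  # hypothetical per-chromosome prefix
        # One admixture mapping result per line
        f_sum.write(f"{prefix}.pheno.glm.linear\n")
        # Assumption: U and S paths on one line, separated by whitespace
        f_svd.write(f"{prefix}.U.npy {prefix}.S.npy\n")

with open(sumstat_list) as f:
    sumstat_lines = f.read().splitlines()
with open(svd_list) as f:
    svd_lines = f.read().splitlines()
print(svd_lines)
```

The resulting files would then be passed as --sumstat-chr sumstat_list.txt and --svd-chr svd_list.txt in place of --sumstat and --svd.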