Command line interface

Example

Step 1

hamsta preprocess \
 --rfmixfb example.fb.tsv AFR \
 --global-ancestry example.rfmix.Q \
 --out example

This command writes the U and S factors of the singular value decomposition A' = USV', where A' has shape (marker, sample), to the following files:

example.U.npy
example.S.npy
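
To make the decomposition concrete, the sketch below builds a small random matrix with shape (marker, sample), computes its SVD with NumPy, saves the leading components as .npy files, and loads them back. The file names demo.U.npy and demo.S.npy, the matrix sizes, and the truncation to k components are illustrative. That hamsta preprocess stores U with one row per marker is an assumption based on the shape stated above, not something this page confirms.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
n_marker, n_sample, k = 50, 20, 5

# Stand-in for A': one row per marker, one column per sample.
A = rng.random((n_marker, n_sample))

# Thin SVD: A = U @ diag(S) @ Vt, singular values in descending order.
U, S, Vt = np.linalg.svd(A, full_matrices=False)
U_k, S_k = U[:, :k], S[:k]  # keep only the top k components

with tempfile.TemporaryDirectory() as tmp:
    np.save(os.path.join(tmp, "demo.U.npy"), U_k)
    np.save(os.path.join(tmp, "demo.S.npy"), S_k)
    U_loaded = np.load(os.path.join(tmp, "demo.U.npy"))
    S_loaded = np.load(os.path.join(tmp, "demo.S.npy"))

print(U_loaded.shape, S_loaded.shape)  # (50, 5) (5,)
```

Loading example.U.npy and example.S.npy with np.load in the same way is a quick sanity check that the number of rows in U matches the number of markers in the summary statistics.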

Step 2

hamsta infer \
 --sumstat example.pheno.glm.linear \
 --svd example.U.npy example.S.npy \
 --out example.hamsta_result.txt

The result will be written to example.hamsta_result.txt
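
When the two steps are driven from a script, the commands can be assembled as argument lists. This is a minimal sketch that uses only the flags shown in the example above; the helper name run_pipeline is hypothetical, and nothing is executed unless it is called explicitly.

```python
import shlex
import subprocess

prefix = "example"

# Step 1: SVD of the local ancestry data.
preprocess_cmd = [
    "hamsta", "preprocess",
    "--rfmixfb", f"{prefix}.fb.tsv", "AFR",
    "--global-ancestry", f"{prefix}.rfmix.Q",
    "--out", prefix,
]

# Step 2: inference from admixture mapping summary statistics.
infer_cmd = [
    "hamsta", "infer",
    "--sumstat", f"{prefix}.pheno.glm.linear",
    "--svd", f"{prefix}.U.npy", f"{prefix}.S.npy",
    "--out", f"{prefix}.hamsta_result.txt",
]


def run_pipeline():
    """Run both steps in order, stopping at the first failure."""
    for cmd in (preprocess_cmd, infer_cmd):
        subprocess.run(cmd, check=True)


# Print the commands for inspection instead of running them.
for cmd in (preprocess_cmd, infer_cmd):
    print(shlex.join(cmd))
```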

Singular value decomposition of local ancestry data

usage: hamsta preprocess [-h] [--rfmixfb RFMIXFB RFMIXFB] [--zarr ZARR ZARR]
                         [--nc NC NC] [--global-ancestry GLOBAL_ANCESTRY]
                         [--out OUT] [--keep KEEP] [--exclude EXCLUDE] [--k K]
                         [--version] [-v] [-vv]

Named Arguments

--rfmixfb

Path to the input local ancestry file in RFMix .fb.tsv format. Two arguments are required: (filepath, ancestry)

--zarr

Path to the input local ancestry in zarr format, storing an Xarray dataset. Two arguments are required: (filepath, ancestry)

--nc

Path to the input local ancestry in netCDF format, storing an Xarray dataset. Two arguments are required: (filepath, ancestry)

--global-ancestry

Path to global ancestry file in rfmix.Q format

--out

Output prefix. With --out example, the SVD results are written to example.U.npy and example.S.npy

--keep

Text file with a header line #IID, followed by the list of individuals to keep
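
A keep file can be written with a few lines of Python. This sketch assumes one individual ID per line below the #IID header, which the description suggests but does not spell out; the file name keep.txt and the sample IDs are illustrative.

```python
import os
import tempfile

samples_to_keep = ["msp1", "msp2", "msp3"]  # illustrative IDs

keep_path = os.path.join(tempfile.mkdtemp(), "keep.txt")
with open(keep_path, "w") as fh:
    fh.write("#IID\n")  # header expected by --keep
    fh.writelines(f"{iid}\n" for iid in samples_to_keep)

# Read the file back to confirm its layout.
with open(keep_path) as fh:
    lines = fh.read().splitlines()
print(lines)  # ['#IID', 'msp1', 'msp2', 'msp3']
```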

--exclude

BED file containing the ranges to be excluded
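
The sketch below writes a minimal three-column BED file (chromosome, start, end, tab-separated, with 0-based half-open intervals as in the standard BED convention). The regions are hypothetical, and whether HAMSTA expects chromosome names with or without a "chr" prefix is not stated here; the bare "1" used below mirrors the chromosome column of the .fb.tsv example and is an assumption.

```python
import os
import tempfile

# Hypothetical regions to exclude: (chromosome, start, end).
regions = [
    ("1", 15300000, 15350000),
    ("1", 50000000, 50700000),
]

bed_path = os.path.join(tempfile.mkdtemp(), "exclude.bed")
with open(bed_path, "w") as fh:
    for chrom, start, end in regions:
        if start >= end:
            raise ValueError(f"empty interval: {chrom}:{start}-{end}")
        fh.write(f"{chrom}\t{start}\t{end}\n")

with open(bed_path) as fh:
    bed_lines = fh.read().splitlines()
print(bed_lines)
```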

--k

Number of singular values to compute

--version

show program’s version number and exit

-v, --verbose

set loglevel to INFO

Default: 30 (the WARNING level in Python's logging module)

-vv, --very-verbose

set loglevel to DEBUG

Note

If an Xarray dataset is used, the following structure is expected:

<xarray.Dataset>
Dimensions:           (marker: 800, sample: 3000, ploidy: 2, ancestry: 2)
Coordinates:
  * marker            (marker) uint32 15309459 15343272 ... 50702360 50743879
  * sample            (sample) <U7 'msp1' 'msp2' 'msp3' ... 'msp2999' 'msp3000'
  * ploidy            (ploidy) int8 0 1
  * ancestry          (ancestry) <U3 'AFR' 'EUR'
Data variables:
    locanc            (marker, sample, ploidy, ancestry) float32 1.0 0.0 ... 0.0
    genetic_position  (marker) float64 0.0 0.009117 0.01411 ... 73.63 73.78 73.9
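
A dataset with these dimensions, coordinates, and data variables can be assembled as in the sketch below. It assumes the xarray package is installed, uses a tiny randomly generated local ancestry array in place of real calls, and takes the marker positions and genetic positions from the values shown above.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
n_marker, n_sample = 3, 2

# One ancestry call per haplotype, one-hot encoded over (AFR, EUR).
calls = rng.integers(0, 2, size=(n_marker, n_sample, 2))
locanc = np.eye(2, dtype=np.float32)[calls]  # (marker, sample, ploidy, ancestry)

ds = xr.Dataset(
    data_vars={
        "locanc": (("marker", "sample", "ploidy", "ancestry"), locanc),
        "genetic_position": (("marker",), np.array([0.0, 0.009117, 0.01411])),
    },
    coords={
        "marker": np.array([15309459, 15343272, 15349660], dtype=np.uint32),
        "sample": np.array(["msp1", "msp2"]),
        "ploidy": np.array([0, 1], dtype=np.int8),
        "ancestry": np.array(["AFR", "EUR"]),
    },
)

# AFR dosage per marker and sample: sum over the two haplotypes.
afr_dosage = ds["locanc"].sel(ancestry="AFR").sum("ploidy")
print(ds)
```

Such a dataset can then be saved with ds.to_zarr(...) or ds.to_netcdf(...), which need the corresponding storage backends installed.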

RFMIX .fb.tsv format example

#reference_panel_population:    AFR     EUR
chromosome      physical_position       genetic_position        genetic_marker_index    msp1:::hap1:::AFR       msp1:::hap1:::EUR       msp1:::hap2:::AFR       msp1:::hap2:::EUR       msp2:::hap1:::AFR       msp2:::hap1:::EUR       msp2:::hap2:::AFR       msp2:::hap2:::EUR msp3:::hap1:::AFR
1       15309459        .       0       1.0     0.0     1.0     0.0     1.0     0.0     0.0     1.0
1       15343272        .       1       1.0     0.0     1.0     0.0     1.0     0.0     0.0     1.0
1       15349660        .       2       1.0     0.0     1.0     0.0     1.0     0.0     0.0     1.0
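
The column names encode sample, haplotype, and ancestry separated by ":::". The sketch below parses a two-sample version of the example into an array with shape (marker, sample, ploidy, ancestry). It assumes tab-separated fields and that columns are ordered by sample, then haplotype, then ancestry, as in the header above; it is not HAMSTA's own reader.

```python
import io

import numpy as np

header = ["chromosome", "physical_position", "genetic_position", "genetic_marker_index"]
header += [f"{s}:::{h}:::{a}" for s in ("msp1", "msp2")
           for h in ("hap1", "hap2") for a in ("AFR", "EUR")]
values = "1.0\t0.0\t1.0\t0.0\t1.0\t0.0\t0.0\t1.0"
fb_text = "\n".join([
    "#reference_panel_population:\tAFR\tEUR",
    "\t".join(header),
    f"1\t15309459\t.\t0\t{values}",
    f"1\t15343272\t.\t1\t{values}",
    f"1\t15349660\t.\t2\t{values}",
])

handle = io.StringIO(fb_text)
ancestries = handle.readline().rstrip("\n").split("\t")[1:]
columns = handle.readline().rstrip("\n").split("\t")[4:]
# Unique sample IDs in order of first appearance.
samples = list(dict.fromkeys(col.split(":::")[0] for col in columns))

positions, rows = [], []
for line in handle:
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 5:
        continue  # skip blank or malformed lines
    positions.append(int(fields[1]))
    rows.append([float(x) for x in fields[4:]])

locanc = np.array(rows, dtype=np.float32).reshape(
    len(rows), len(samples), 2, len(ancestries)
)
# AFR dosage per marker and sample: sum over the two haplotypes.
afr_dosage = locanc[..., ancestries.index("AFR")].sum(axis=2)
print(afr_dosage)
```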

RFMIX .rfmix.Q format example

#rfmix diploid global ancestry .Q format output
#sample AFR     EUR
msp1    0.859375        0.140625
msp10   0.88125 0.11875
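
The .rfmix.Q layout is simple enough to read with the standard library. This sketch parses the example above into a dictionary keyed by sample; it assumes whitespace-separated fields and a "#sample" header line naming the populations.

```python
q_text = """#rfmix diploid global ancestry .Q format output
#sample\tAFR\tEUR
msp1\t0.859375\t0.140625
msp10\t0.88125\t0.11875
"""

populations = None
global_ancestry = {}
for line in q_text.splitlines():
    if line.startswith("#sample"):
        populations = line.split()[1:]  # population names from the header
    elif line.startswith("#") or not line.strip():
        continue  # other comment lines and blank lines
    else:
        sample, *proportions = line.split()
        global_ancestry[sample] = dict(zip(populations, map(float, proportions)))

print(global_ancestry["msp1"])  # {'AFR': 0.859375, 'EUR': 0.140625}
```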

Running HAMSTA

usage: hamsta infer [-h] [--sumstat SUMSTAT] [--sumstat-chr SUMSTAT_CHR]
                    [--svd SVD SVD] [--svd-chr SVD_CHR] [--N N]
                    [--num-blocks NUM_BLOCKS] [--thres] [--out OUT]
                    [--version] [-v] [-vv]

Named Arguments

--sumstat

Input file of admixture mapping results. The markers are expected to be in the same order as in the SVD input. By default, the Z score is read from the T_STAT column
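
The sketch below shows one way to pull the T_STAT column out of a tab-separated summary statistics file and to check that the marker order matches the SVD input. The rows are made-up illustrative values, the reduced column set is an assumption (a real .glm.linear file has more columns), and the list svd_positions is a hypothetical stand-in for the marker positions used in preprocessing.

```python
import csv
import io

# Illustrative summary statistics; T_STAT = BETA / SE.
sumstat_text = (
    "#CHROM\tPOS\tID\tBETA\tSE\tT_STAT\n"
    "1\t15309459\tm0\t0.05\t0.025\t2.0\n"
    "1\t15343272\tm1\t-0.03\t0.02\t-1.5\n"
    "1\t15349660\tm2\t0.01\t0.04\t0.25\n"
)

# Hypothetical marker positions, in the order used for the SVD.
svd_positions = [15309459, 15343272, 15349660]

reader = csv.DictReader(io.StringIO(sumstat_text), delimiter="\t")
positions, z_scores = [], []
for row in reader:
    positions.append(int(row["POS"]))
    z_scores.append(float(row["T_STAT"]))

if positions != svd_positions:
    raise ValueError("marker order differs between summary statistics and SVD input")

print(z_scores)  # [2.0, -1.5, 0.25]
```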

--sumstat-chr

File listing the admixture mapping result files. Each listed file is expected to follow the same marker order as the corresponding SVD input. By default, the Z score is read from the T_STAT column

--svd

SVD results. Two arguments are required: the path to U and the path to S

--svd-chr

File listing the SVD results. Each line contains the paths to U and S that correspond to the same line in --sumstat-chr
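
For per-chromosome input, the two list files must stay aligned line by line. The sketch below generates both from one loop so that the order cannot drift. The per-chromosome file names and the list file names are hypothetical, and separating the U and S paths with a single space is an assumption; the description only says each line contains both paths.

```python
import os
import tempfile

chromosomes = [1, 2, 3]
out_dir = tempfile.mkdtemp()
sumstat_list = os.path.join(out_dir, "sumstat.list")
svd_list = os.path.join(out_dir, "svd.list")

with open(sumstat_list, "w") as f_sum, open(svd_list, "w") as f_svd:
    for c in chromosomes:
        # Line i of both files refers to the same chromosome.
        f_sum.write(f"example.chr{c}.pheno.glm.linear\n")
        f_svd.write(f"example.chr{c}.U.npy example.chr{c}.S.npy\n")

with open(sumstat_list) as fh:
    sumstat_lines = fh.read().splitlines()
with open(svd_list) as fh:
    svd_lines = fh.read().splitlines()
print(list(zip(sumstat_lines, svd_lines)))
```

The two files would then be passed as --sumstat-chr sumstat.list and --svd-chr svd.list.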

--N

Number of individuals

--num-blocks

Number of jackknife blocks

Default: 10
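
As background on what a block count controls, here is a generic delete-one-block jackknife for a standard error, written with NumPy. It is a textbook sketch, not HAMSTA's implementation: the function name block_jackknife, the use of contiguous blocks, and the toy estimator (the mean of squared Z scores) are all assumptions for illustration.

```python
import numpy as np


def block_jackknife(values, estimator, num_blocks=10):
    """Return (estimate, standard error) from a delete-one-block jackknife."""
    values = np.asarray(values, dtype=float)
    # Split the indices into contiguous blocks of near-equal size.
    blocks = np.array_split(np.arange(values.size), num_blocks)
    # Re-estimate with each block left out in turn.
    leave_one_out = np.array(
        [estimator(np.delete(values, idx)) for idx in blocks]
    )
    centered = leave_one_out - leave_one_out.mean()
    se = np.sqrt((num_blocks - 1) / num_blocks * np.sum(centered ** 2))
    return estimator(values), se


rng = np.random.default_rng(1)
z = rng.normal(size=1000)  # simulated Z scores
estimate, se = block_jackknife(z ** 2, np.mean, num_blocks=10)
print(estimate, se)
```

More blocks give more leave-one-out replicates but smaller blocks, so the choice trades replicate count against how well each block captures local correlation between neighbouring markers.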

--thres

Whether to estimate the significance threshold

Default: False

--out

output path

Default: standard output (stdout)

--version

show program’s version number and exit

-v, --verbose

set loglevel to INFO

Default: 30 (the WARNING level in Python's logging module)

-vv, --very-verbose

set loglevel to DEBUG