Command line interface
Example
Step 1
hamsta preprocess \
--rfmixfb example.fb.tsv AFR \
--global-ancestry example.rfmix.Q \
--out example
The command creates files storing U and S from the singular value decomposition $A' = USV'$, where $A'$ has shape (marker, sample):
example.U.npy
example.S.npy
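As a rough illustration of what these two files contain, the sketch below computes a thin SVD of a random (marker, sample) matrix with NumPy, keeps the top k singular values and round-trips U and S through the .npy format. It assumes example.U.npy holds the (marker, k) matrix U and example.S.npy the length-k vector S; the random matrix is only a placeholder for A', whose exact construction by hamsta preprocess is not described here.

```python
import io

import numpy as np

# Toy stand-in for A' with shape (marker, sample); HAMSTA builds the real
# matrix from local ancestry, and its standardization is not reproduced here.
rng = np.random.default_rng(0)
n_marker, n_sample, k = 50, 20, 5
A = rng.standard_normal((n_marker, n_sample))

# Thin SVD: A = U @ diag(S) @ Vt, singular values in descending order.
U, S, Vt = np.linalg.svd(A, full_matrices=False)

# Keep only the top-k components (the role the --k option describes).
U_k, S_k = U[:, :k], S[:k]

# Round-trip through the .npy format used for example.U.npy / example.S.npy.
buf_u, buf_s = io.BytesIO(), io.BytesIO()
np.save(buf_u, U_k)
np.save(buf_s, S_k)
buf_u.seek(0)
buf_s.seek(0)
U_loaded, S_loaded = np.load(buf_u), np.load(buf_s)

print(U_loaded.shape, S_loaded.shape)  # (50, 5) (5,)
```

In a real run, `np.load("example.U.npy")` and `np.load("example.S.npy")` would read the files written by Step 1.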
Step 2
hamsta infer \
--sumstat example.pheno.glm.linear \
--svd example.U.npy example.S.npy \
--out example.hamsta_result.txt
The result will be written to example.hamsta_result.txt
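hamsta infer reads Z scores from the T_STAT column of the summary statistics and expects markers in the same order as the rows of U (see --sumstat below). The input name example.pheno.glm.linear suggests PLINK 2 --glm output; assuming a tab-separated file with a #CHROM header line, a quick pre-flight check of the marker order might look like the sketch below. The toy rows, the svd_marker_order list and the helper read_sumstat are hypothetical.

```python
import csv
import io

# Hypothetical excerpt in the style of PLINK 2 --glm output (tab-separated).
GLM_TEXT = (
    "#CHROM\tPOS\tID\tTEST\tT_STAT\n"
    "1\t15309459\tm1\tADD\t1.5\n"
    "1\t15309459\tm1\tPC1\t0.3\n"
    "1\t15343272\tm2\tADD\t-0.8\n"
    "1\t15349660\tm3\tADD\t2.1\n"
)


def read_sumstat(handle, column="T_STAT"):
    """Return marker positions and Z scores, in file order, skipping covariate rows."""
    positions, z = [], []
    for row in csv.DictReader(handle, delimiter="\t"):
        # Keep only the ADD (tested variable) rows; covariate rows would
        # break the one-row-per-marker correspondence with U.
        if row.get("TEST", "ADD") != "ADD":
            continue
        positions.append(int(row["POS"]))
        z.append(float(row[column]))
    return positions, z


# Marker order of the rows of U (assumed known from the preprocessing input).
svd_marker_order = [15309459, 15343272, 15349660]
positions, z = read_sumstat(io.StringIO(GLM_TEXT))
if positions != svd_marker_order:
    raise ValueError("sumstat marker order differs from the SVD input")
print(z)  # [1.5, -0.8, 2.1]
```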
Singular value decomposition of local ancestry data
usage: hamsta preprocess [-h] [--rfmixfb RFMIXFB RFMIXFB] [--zarr ZARR ZARR]
[--nc NC NC] [--global-ancestry GLOBAL_ANCESTRY]
[--out OUT] [--keep KEEP] [--exclude EXCLUDE] [--k K]
[--version] [-v] [-vv]
Named Arguments
- --rfmixfb
Path to input local ancestry in rfmix .fb.tsv format; requires two arguments (filepath, ancestry), e.g. example.fb.tsv AFR
- --zarr
Path to input local ancestry in zarr format storing an Xarray dataset; requires two arguments (filepath, ancestry)
- --nc
Path to input local ancestry in netCDF format storing an Xarray dataset; requires two arguments (filepath, ancestry)
- --global-ancestry
Path to global ancestry file in rfmix.Q format
- --out
output prefix
- --keep
text file with a header #IID, followed by a list of individuals to keep
- --exclude
BED file containing ranges to be excluded
- --k
Number of singular values to compute
- --version
show program’s version number and exit
- -v, --verbose
set loglevel to INFO
Default: 30 (logging.WARNING)
- -vv, --very-verbose
set loglevel to DEBUG
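The --keep and --exclude inputs are plain text files. Below is a minimal sketch for writing them, assuming --keep takes the #IID header followed by one ID per line and --exclude takes a standard three-column BED file (0-based start, exclusive end) whose chromosome names match the local ancestry input. The helper names write_keep, write_exclude_bed and in_regions are hypothetical.

```python
import io


def write_keep(handle, iids):
    """Write a --keep file: a '#IID' header, then one individual ID per line."""
    handle.write("#IID\n")
    for iid in iids:
        handle.write(f"{iid}\n")


def write_exclude_bed(handle, regions):
    """Write a 3-column BED file (chrom, 0-based start, exclusive end)."""
    for chrom, start, end in regions:
        if not 0 <= start < end:
            raise ValueError(f"invalid interval {chrom}:{start}-{end}")
        handle.write(f"{chrom}\t{start}\t{end}\n")


def in_regions(chrom, pos, regions):
    """True if 1-based position `pos` lies in any BED interval (start < pos <= end)."""
    return any(c == chrom and start < pos <= end for c, start, end in regions)


regions = [("1", 15300000, 15345000)]
keep, bed = io.StringIO(), io.StringIO()
write_keep(keep, ["msp1", "msp2", "msp10"])
write_exclude_bed(bed, regions)
print(keep.getvalue(), bed.getvalue(), sep="")
print(in_regions("1", 15343272, regions), in_regions("1", 15349660, regions))  # True False
```

To use them, write to real files instead (for example `open("keep.txt", "w")`) and pass `--keep keep.txt --exclude exclude.bed`.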
Note
If an Xarray dataset is used (--zarr or --nc), the following structure is expected:
<xarray.Dataset>
Dimensions:           (marker: 800, sample: 3000, ploidy: 2, ancestry: 2)
Coordinates:
  * marker            (marker) uint32 15309459 15343272 ... 50702360 50743879
  * sample            (sample) <U7 'msp1' 'msp2' 'msp3' ... 'msp2999' 'msp3000'
  * ploidy            (ploidy) int8 0 1
  * ancestry          (ancestry) <U3 'AFR' 'EUR'
Data variables:
    locanc            (marker, sample, ploidy, ancestry) float32 1.0 0.0 ... 0.0
    genetic_position  (marker) float64 0.0 0.009117 0.01411 ... 73.63 73.78 73.9
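For users starting from their own arrays, the following sketch builds a small dataset with the layout shown above using xarray, filling locanc with random one-hot (hard-call) local ancestry. The marker and genetic positions are taken from the examples on this page; the ancestry calls are toy data, and whether HAMSTA also accepts soft probabilities in locanc is not stated here.

```python
import numpy as np
import xarray as xr

rng = np.random.default_rng(0)
markers = np.array([15309459, 15343272, 15349660], dtype=np.uint32)
samples = np.array(["msp1", "msp2", "msp3"])
ancestries = np.array(["AFR", "EUR"])

# Random hard calls per (marker, sample, haplotype): 0 = AFR, 1 = EUR.
calls = rng.integers(0, 2, size=(markers.size, samples.size, 2))
# One-hot encode over ancestries -> shape (marker, sample, ploidy, ancestry).
locanc = np.eye(ancestries.size, dtype=np.float32)[calls]

ds = xr.Dataset(
    data_vars={
        "locanc": (("marker", "sample", "ploidy", "ancestry"), locanc),
        "genetic_position": (("marker",), np.array([0.0, 0.009117, 0.01411])),
    },
    coords={
        "marker": markers,
        "sample": samples,
        "ploidy": np.array([0, 1], dtype=np.int8),
        "ancestry": ancestries,
    },
)
print(ds)
```

The dataset could then be saved with `ds.to_zarr("example.zarr")` (requires the zarr package) or `ds.to_netcdf("example.nc")` (requires a netCDF backend such as netCDF4) and passed as `--zarr example.zarr AFR` or `--nc example.nc AFR`.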
RFMIX .fb.tsv format example
#reference_panel_population: AFR EUR
chromosome physical_position genetic_position genetic_marker_index msp1:::hap1:::AFR msp1:::hap1:::EUR msp1:::hap2:::AFR msp1:::hap2:::EUR msp2:::hap1:::AFR msp2:::hap1:::EUR msp2:::hap2:::AFR msp2:::hap2:::EUR msp3:::hap1:::AFR
1 15309459 . 0 1.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0
1 15343272 . 1 1.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0
1 15349660 . 2 1.0 0.0 1.0 0.0 1.0 0.0 0.0 1.0
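Each sample column in the .fb.tsv file is named sample:::haplotype:::ancestry and holds a posterior probability. As a hedged illustration of how these columns map onto per-sample ancestry dosages, the sketch below sums the probability of one ancestry over the two haplotypes of each sample. Whether hamsta preprocess converts the file exactly this way is an assumption; the helper ancestry_dosage is hypothetical, and the input reuses the first two samples and rows of the example above.

```python
import io

FB_TEXT = (
    "#reference_panel_population: AFR EUR\n"
    "chromosome\tphysical_position\tgenetic_position\tgenetic_marker_index\t"
    "msp1:::hap1:::AFR\tmsp1:::hap1:::EUR\tmsp1:::hap2:::AFR\tmsp1:::hap2:::EUR\t"
    "msp2:::hap1:::AFR\tmsp2:::hap1:::EUR\tmsp2:::hap2:::AFR\tmsp2:::hap2:::EUR\n"
    "1\t15309459\t.\t0\t1.0\t0.0\t1.0\t0.0\t1.0\t0.0\t0.0\t1.0\n"
    "1\t15343272\t.\t1\t1.0\t0.0\t1.0\t0.0\t1.0\t0.0\t0.0\t1.0\n"
)


def ancestry_dosage(handle, ancestry):
    """Return samples, positions and per-marker dosages (0 to 2) of `ancestry`."""
    lines = [ln.split() for ln in handle if ln.strip() and not ln.startswith("#")]
    header, rows = lines[0], lines[1:]
    # Columns after the first four are named sample:::haplotype:::ancestry.
    cols = {}
    for i, name in enumerate(header[4:], start=4):
        sample, _hap, anc = name.split(":::")
        if anc == ancestry:
            cols[i] = sample
    samples = list(dict.fromkeys(cols.values()))  # unique, in file order
    positions, dosages = [], []
    for fields in rows:
        per_sample = dict.fromkeys(samples, 0.0)
        for i, sample in cols.items():
            per_sample[sample] += float(fields[i])  # add hap1 and hap2
        positions.append(int(fields[1]))
        dosages.append([per_sample[s] for s in samples])
    return samples, positions, dosages


samples, positions, dosages = ancestry_dosage(io.StringIO(FB_TEXT), "AFR")
print(samples, dosages)  # ['msp1', 'msp2'] [[2.0, 1.0], [2.0, 1.0]]
```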
RFMIX .rfmix.Q format example
#rfmix diploid global ancestry .Q format output
#sample AFR EUR
msp1 0.859375 0.140625
msp10 0.88125 0.11875
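The .Q file has one row per sample with its global ancestry proportions, which should sum to one. A minimal reader, assuming whitespace-separated columns and a #sample header line as in the example above (the function name read_rfmix_q is hypothetical):

```python
import io
import math

Q_TEXT = (
    "#rfmix diploid global ancestry .Q format output\n"
    "#sample\tAFR\tEUR\n"
    "msp1\t0.859375\t0.140625\n"
    "msp10\t0.88125\t0.11875\n"
)


def read_rfmix_q(handle):
    """Map each sample to a dict of ancestry -> global ancestry proportion."""
    ancestries, table = None, {}
    for line in handle:
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "#sample":
            ancestries = fields[1:]  # column names, e.g. AFR EUR
        elif fields[0].startswith("#"):
            continue  # other comment lines
        else:
            if ancestries is None:
                raise ValueError("data row before the #sample header")
            table[fields[0]] = dict(zip(ancestries, map(float, fields[1:])))
    return table


q = read_rfmix_q(io.StringIO(Q_TEXT))
for sample, props in q.items():
    # Proportions per sample should sum to one (up to rounding).
    assert math.isclose(sum(props.values()), 1.0), sample
print(q["msp1"])
```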
Running HAMSTA
usage: hamsta infer [-h] [--sumstat SUMSTAT] [--sumstat-chr SUMSTAT_CHR]
[--svd SVD SVD] [--svd-chr SVD_CHR] [--N N]
[--num-blocks NUM_BLOCKS] [--thres] [--out OUT]
[--version] [-v] [-vv]
Named Arguments
- --sumstat
Input file of admixture mapping results. Markers are expected to be in the same order as in the SVD input; Z scores are read from the T_STAT column by default
- --sumstat-chr
File listing admixture mapping results, one file per line; each file is expected to follow the same marker order as its SVD input, and Z scores are read from the T_STAT column by default
- --svd
SVD results; requires two arguments, the path to U and the path to S
- --svd-chr
File listing SVD results; each line contains the paths to U and S for the admixture mapping result on the same line of --sumstat-chr
- --N
Number of individuals
- --num-blocks
Number of jackknife blocks
Default: 10
- --thres
whether to estimate the significance threshold
Default: False
- --out
output path
Default: standard output (stdout)
- --version
show program’s version number and exit
- -v, --verbose
set loglevel to INFO
Default: 30 (logging.WARNING)
- -vv, --very-verbose
set loglevel to DEBUG
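--num-blocks sets the number of jackknife blocks (default 10). HAMSTA's own estimator is not described on this page, so the sketch below only illustrates the generic delete-one block jackknife: split the markers into contiguous blocks, recompute a statistic with each block left out, and combine the leave-one-out estimates into a standard error. The statistic used here (mean squared Z score) and the function names are hypothetical placeholders.

```python
import math


def block_jackknife(values, estimator, num_blocks=10):
    """Delete-one block jackknife over contiguous blocks; returns (estimate, SE)."""
    n = len(values)
    if not 2 <= num_blocks <= n:
        raise ValueError("need 2 <= num_blocks <= len(values)")
    # Block boundaries in marker order; no block is empty because n >= num_blocks.
    bounds = [i * n // num_blocks for i in range(num_blocks + 1)]
    full = estimator(values)
    # Re-estimate with each block removed.
    loo = [estimator(values[:lo] + values[hi:]) for lo, hi in zip(bounds, bounds[1:])]
    mean_loo = sum(loo) / num_blocks
    var = (num_blocks - 1) / num_blocks * sum((t - mean_loo) ** 2 for t in loo)
    return full, math.sqrt(var)


def mean_chi2(z):
    """Placeholder statistic: mean of squared Z scores."""
    return sum(x * x for x in z) / len(z)


z_scores = [1.5, -0.8, 2.1, 0.3, -1.2, 0.9, -0.4, 1.1, 0.2, -2.0]
est, se = block_jackknife(z_scores, mean_chi2, num_blocks=5)
print(f"{est:.3f} +/- {se:.3f}")
```

Blocks are contiguous rather than random so that markers in linkage (or, for local ancestry, in the same admixture tract) are removed together.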