Running HAMSTA#
SVD#
hamsta import core, io, preprocess, utils
First, you need to compute the singular value decomposition of the local ancestry data. You can provide either a local ancestry file or a local ancestry correlation matrix (LAD) that contains the markers in your admixture mapping summary statistics.
When optional argument k
is set, truncated SVD is performed for the specified number of components. Otherwise, full SVD will be performed.
If you proceed with local ancestry, you can use the following code to read local ancestry calls from an rfmix output. The code reads the ancestry “HCB” from local ancestry file
example.fb.tsv
. The arrayA
should be of dimension (#marker, #sample) and elementA[i,j]
are the number of copies of HCB ancestry present in the i-th marker of the j-th individual. The singular values will be scaled according to the sample size.A, A_sample = io.read_rfmixfb(fname="example.fb.tsv", ancestry="HCB") U, S = preprocess.SVD(A=A, Q=None, k=None)
Optional argument
Q
(# sample, # covariates) for regressing out covariates from local acnestry.If you choose to decompose an LAD matrix, you can store the matrix in a numpy ndarray and pass it to
LAD
inpreprocess.SVD()
.U, S = preprocess.SVD(LAD=LAD, k=None)
Inference#
In the inference step, we can create a HAMSTA
object and use HAMSTA.fit()
to fit the processed statistics and singular values. Arguments Z
and N
are the admixture mapping test statistics and sample size. intercept_design
is a design matrix (number of singular values, number of distinct intercept) for assign intercepts to each rotated Z. For single intecept, we can set intercept_design = np.ones((S.shape[0], 1))
. We also provide a function utils.make_intercept_design(S.shape[0], bin_size=500)
to specify a new intercept for every 500 rotated Z for example.
ham = core.HAMSTA() ham.fit(Z=Z, U=U, S=S, N=N, jackknife=True, intercept_design=intercept_design, est_thres=True)
To combine input from multiple chromosomes, apply np.concatenate
on the list of Z
or S
and scipy.linalg.block_diag
on U
.
You can also rotate Z for each chromosome and pass them to argument rotated_Z
.
rotated_Z_list = [core.rotate(U=U_list[i], S=S_list[i], Z=Z_list[i], N=N) for i in range(22)] rotated_Z = np.concatenate(rotated_Z_list) S = np.concatenate(S_list) ham = core.HAMSTA() ham.fit(rotated_Z=rotated_Z,S=S,M=M,jackknife=True,intercept_design=intercept_design,N=N)
This output the results in a dictionary ham.result
containing h2gamma and intercepts estimates and their standard errors, followed by the log-likelihood for h2gamma=0 (h0), single intercept (h1_sintcpt) and multiple intercepts (h1_mintcpt) and the p-values for testing for h2gamma and multiple intercepts. Based on the LAD and estimated intercept, a significance threshold corresponding to family-wise error rate of 0.05 is provided.
Example result:
h2gamma [0.03280452]
h2gamam_SE [2.05591325e-12]
intercepts [0.97167965]
intercepts_SE [1.84471781e-07]
mean_intercept 0.9716796462662657
h0 -27125.71904267748
h1_sintcpt -26519.959294116295
h1_mintcpt -26505.393609470648
p_h2gamma 0.0000e+00
p_intercept 8.7520e-01
thres 0.0001467017385257119