hamsta.io.read_zarr

hamsta.io.read_zarr(fname, ancestry, exclude=None)

Reader for an xarray.Dataset stored in .zarr format.

Reads an xarray.Dataset whose data variable locanc has dimensions (marker, sample, ploidy, ancestry). For example:

<xarray.Dataset>
Dimensions:   (marker: 8, sample: 39, ploidy: 2, ancestry: 2)
Coordinates:
  * ancestry  (ancestry) <U3 'HCB' 'JPT'
  * marker    (marker) uint32 1 6 12 20 25 31 36 43
  * ploidy    (ploidy) int8 0 1
  * sample    (sample) <U6 'HCB182' 'HCB190' 'HCB191' ... 'JPT266' 'JPT267'
Data variables:
    locanc    (marker, sample, ploidy, ancestry) float32 1.0 0.0 1.0 ... 0.0 1.0
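
To make the expected layout concrete, here is a minimal sketch that builds a small in-memory dataset with the same dimension order using xarray and NumPy. The sizes, marker positions, sample labels ("S1", "S2"), and the one-hot encoding of the calls are illustrative assumptions, not taken from the hamsta test data.

```python
import numpy as np
import xarray as xr

# Hypothetical toy sizes and labels, chosen only for illustration.
markers = np.array([1, 6, 12], dtype=np.uint32)
samples = ["S1", "S2"]
ancestries = ["HCB", "JPT"]

# Index of the ancestry carried by each haplotype, shape (marker, sample, ploidy).
calls = np.array([[[0, 0], [0, 1]],
                  [[0, 0], [1, 1]],
                  [[0, 1], [1, 1]]])

# One-hot encode along a new trailing ancestry axis: shape (3, 2, 2, 2).
locanc = np.eye(len(ancestries), dtype=np.float32)[calls]

ds = xr.Dataset(
    data_vars={"locanc": (("marker", "sample", "ploidy", "ancestry"), locanc)},
    coords={
        "marker": markers,
        "sample": samples,
        "ploidy": np.array([0, 1], dtype=np.int8),
        "ancestry": ancestries,
    },
)
print(ds)
```

Calling `ds.to_zarr(path)` on such a dataset (with the zarr package installed) would write a store with this dimension layout.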

Parameters

  • fname (str) – Path to zarr file storing local ancestry in xarray

  • ancestry (str) – Label of the ancestry to extract, i.e. a value of the ancestry coordinate such as "HCB"

Return type

Tuple[Array, DataFrame]

Returns

A local ancestry matrix with shape (marker, sample) and a DataFrame listing the samples.
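
The reduction from four dimensions to a (marker, sample) matrix can be sketched with plain xarray. This is not hamsta's implementation: it assumes the matrix is the selected ancestry summed over ploidy, which is consistent with the values between 0 and 2 in the example output, and it ignores the exclude argument and the conversion to a JAX array. The dataset and sample labels are hypothetical.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Hypothetical dataset: 2 markers, 2 samples, 2 haplotypes, 2 ancestries.
locanc = np.array(
    [[[[1, 0], [1, 0]], [[1, 0], [0, 1]]],
     [[[1, 0], [1, 0]], [[0, 1], [0, 1]]]],
    dtype=np.float32,
)
ds = xr.Dataset(
    {"locanc": (("marker", "sample", "ploidy", "ancestry"), locanc)},
    coords={
        "marker": [1, 6],
        "sample": ["S1", "S2"],
        "ploidy": [0, 1],
        "ancestry": ["HCB", "JPT"],
    },
)

# Assumed reduction: pick one ancestry, then add the two haplotypes
# to obtain a per-sample dosage at each marker.
dosage = ds["locanc"].sel(ancestry="HCB").sum(dim="ploidy")
A = dosage.transpose("marker", "sample").values

# Sample labels as a one-column DataFrame, mirroring the documented return type.
A_sample = pd.DataFrame({"sample": ds["sample"].values})
```

Here `A` has one row per marker and one column per sample, and `A_sample` has one row per column of `A`.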

Example

>>> from hamsta import io
>>> A, A_sample = io.read_zarr("tests/testdata/example.zarr", "HCB")
>>> A[:5, :5]
DeviceArray([[2.   , 2.   , 2.   , 1.969, 1.   ],
             [2.   , 2.   , 2.   , 1.969, 1.   ],
             [2.   , 2.   , 2.   , 1.969, 1.   ],
             [2.   , 2.   , 2.   , 1.969, 1.   ],
             [2.   , 2.   , 2.   , 1.969, 1.   ]], dtype=float32)
>>> A_sample.head(5)
   sample
0  HCB182
1  HCB190
2  HCB191
3  HCB193
4  HCB194