distance_histogram

RNAdist.sampling.ed_sampling.distance_histogram(fc: RNA.fold_compound, nr_samples: int = 1000, i: int = None, j: int = None, return_samples: bool = False)

Samples structures for a sequence and returns the histogram of (all) pairwise distances.

Uses a much faster implementation if i and j are specified. Otherwise, computes the histograms for all pairs.

Parameters:
  • fc (RNA.fold_compound) – ViennaRNA fold compound.

  • nr_samples (int) – Number of samples to draw.

  • i (int) – only use starting index i

  • j (int) – only use target index j

  • return_samples (bool) – If True, the samples are returned as a dictionary with bit-compressed structures as keys and their counts as values. Structures can be decompressed using bit_to_structure().

Returns:

N x N x N matrix if i and j are not specified, otherwise an array of length N.
Without i and j, the full matrix contains the histogram of distances from nucleotide i to nucleotide j at matrix[i][j].

Return type:

np.ndarray

dict: Dictionary containing the bytes representation of the structures, if return_samples is True
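The layout of the full N x N x N matrix can be illustrated with plain NumPy. The sketch below is not the RNAdist implementation. It assumes a hypothetical stack of per-sample distance matrices (the helper name and the toy values are made up for illustration) and counts, for every pair (i, j), how often each distance occurs, so that hist[i][j] is the histogram for that pair.

```python
import numpy as np


def histogram_from_distance_matrices(dists: np.ndarray) -> np.ndarray:
    """Count distances per nucleotide pair.

    dists is a hypothetical (S, N, N) integer array holding one N x N
    distance matrix per sample, with values in the range 0..N-1.
    Returns an (N, N, N) array where hist[i, j, d] is the number of
    samples in which the distance from i to j equals d.
    """
    dists = np.asarray(dists, dtype=int)
    _, n, _ = dists.shape
    hist = np.zeros((n, n, n), dtype=np.int32)
    rows, cols = np.indices((n, n))
    for d in dists:
        # Unbuffered in-place add: one count per (i, j) pair and sample.
        np.add.at(hist, (rows, cols, d), 1)
    return hist


# Toy input, invented for illustration: two samples for N = 3.
toy = np.array([
    [[0, 1, 2], [1, 0, 1], [2, 1, 0]],
    [[0, 1, 1], [1, 0, 1], [1, 1, 0]],
])
hist = histogram_from_distance_matrices(toy)
print(hist[0, 2])  # histogram of distances between positions 0 and 2
```

Each hist[i][j] sums to the number of samples, which mirrors how a histogram returned for nr_samples draws would be expected to behave.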

It is possible to sample expected distances using the ViennaRNA fold compound as follows. Please make sure to enable unique multiloop decomposition via uniq_ML=1.

>>> import RNA
>>> from RNAdist.sampling.ed_sampling import distance_histogram
>>> seq = "GGGCUAUUAGCUC"
>>> fc = RNA.fold_compound(seq, RNA.md(uniq_ML=1))
>>> x = distance_histogram(fc)
>>> x[0, -1]
array([  0, 867,   1, 109,   0,  14,   0,   0,   0,   0,   0,   0,   9], dtype=int32)
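A histogram such as the one printed above can be turned into an empirical distribution and an expected distance. The following is a minimal sketch using only NumPy. It assumes that index d of the histogram holds the number of samples in which the two nucleotides are at distance d, which the documentation above does not state explicitly.

```python
import numpy as np

# Histogram for one nucleotide pair, copied from the example output above.
counts = np.array([0, 867, 1, 109, 0, 14, 0, 0, 0, 0, 0, 0, 9], dtype=np.int32)

# Assumption: position d in the histogram corresponds to distance d.
distances = np.arange(counts.size)

# Normalize the counts to an empirical probability distribution.
probabilities = counts / counts.sum()

# Expected distance is the probability-weighted mean of the distances.
expected_distance = float(np.dot(distances, probabilities))
print(expected_distance)
```

Because the histogram comes from sampling, repeated runs will generally give slightly different counts and therefore slightly different expected distances.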