Aligner

class pyopal.Aligner

The Opal aligner.

The Aligner implements an accelerated pipeline that computes pairwise alignments between a query sequence and a database of target sequences in parallel, using the single instruction, multiple data (SIMD) capabilities of modern processors.

Note

The Opal algorithm requires scoring matrices to be integer matrices, because all computations are performed on integer vectors. Only matrices for which ScoringMatrix.is_integer returns True can be passed to the Aligner constructor.
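To make the integer requirement concrete, the sketch below checks whether a matrix given as nested Python lists contains only whole numbers. The helper is_integer_matrix is hypothetical and not part of pyopal; with an actual ScoringMatrix object, its own is_integer method is the check that matters.

```python
def is_integer_matrix(matrix):
    """Return True if every entry of a nested-list matrix is a whole number.

    Hypothetical helper for illustration only; not part of pyopal.
    """
    # float(...).is_integer() is False for fractions, NaN and infinities.
    return all(float(value).is_integer() for row in matrix for value in row)


print(is_integer_matrix([[5, -1], [-1, 5]]))          # True
print(is_integer_matrix([[5.0, -1.0], [-1.0, 5.0]]))  # True
print(is_integer_matrix([[0.5, -1], [-1, 5]]))        # False
```

A matrix stored as floats is still accepted by this check as long as every value is whole, such as 5.0.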

scoring_matrix

The scoring matrix to use for scoring the alignments.

Type:

ScoringMatrix

alphabet

The alphabet for encoding sequences before alignment.

Type:

Alphabet

gap_open

The gap opening penalty \(G\) for scoring the alignments.

Type:

int

gap_extend

The gap extension penalty \(E\) for scoring the alignments.

Type:

int

Added in version 0.5.0.

Changed in version 0.6.0: Use the external ScoringMatrix class to handle scoring matrices.

__init__(scoring_matrix=None, gap_open=3, gap_extend=1)

Create a new Aligner with the given parameters.

Parameters:
  • scoring_matrix (ScoringMatrix or str) – The scoring matrix to use for scoring the alignments, either as a ScoringMatrix object, or as the name of a matrix to load with the ScoringMatrix.from_name class method. The aligner will use the matrix columns to instantiate an Alphabet.

  • gap_open (int) – The gap opening penalty \(G\) for scoring the alignments.

  • gap_extend (int) – The gap extension penalty \(E\) for scoring the alignments.

Hint

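A gap of length \(N\) will receive a penalty of \(G + (N - 1)E\): the gap opening penalty for its first position, and the gap extension penalty for each additional position.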
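The affine gap model described in the hint can be sketched as a small function, assuming the first gap position is charged the opening penalty G and each further position the extension penalty E. The name gap_penalty is hypothetical; its defaults mirror the gap_open=3 and gap_extend=1 defaults of the constructor.

```python
def gap_penalty(length, gap_open=3, gap_extend=1):
    """Affine penalty G + (N - 1) * E for a gap of `length` positions.

    Hypothetical helper; defaults mirror the Aligner constructor defaults.
    """
    if length < 1:
        raise ValueError("a gap must span at least one position")
    return gap_open + (length - 1) * gap_extend


print(gap_penalty(1))  # 3: opening the gap only
print(gap_penalty(4))  # 6: 3 + 3 * 1
```

Because extending is cheaper than opening, a single gap of length 4 (penalty 6 with the defaults) costs less than two separate gaps of length 2 (penalty 4 each, 8 in total).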

Raises:
  • ValueError – When the given scoring matrix is not an integer matrix.

  • RuntimeError – When no supported SIMD backend could be detected on the host platform.

  • MemoryError – When some internal buffers could not be allocated properly.

align(query, database, *, mode='score', overflow='buckets', algorithm='sw', start=0, end=4294967295)

Align the query sequence to all targets of the database.
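Conceptually, align computes one alignment score per database target. The scalar sketch below shows affine-gap (Gotoh) recurrences for the sw and nw algorithms on a single pair of sequences, then applies them to every target in a list. Opal evaluates its recurrences with SIMD vectors across many targets at once, so this is only a readable reference under stated assumptions, not how pyopal runs. The functions align_score and align_all and the toy match_mismatch scoring are hypothetical; the hw and ov algorithms are omitted, and a gap of length N is assumed to cost G + (N - 1)E.

```python
NEG_INF = float("-inf")


def align_score(query, target, score, gap_open=3, gap_extend=1, algorithm="sw"):
    """Scalar affine-gap (Gotoh) alignment score for one query/target pair.

    A gap of length N costs gap_open + (N - 1) * gap_extend.
    Only algorithm="sw" (local) and algorithm="nw" (global) are sketched.
    """
    if algorithm not in ("sw", "nw"):
        raise ValueError(f"algorithm not covered by this sketch: {algorithm!r}")
    local = algorithm == "sw"
    m, n = len(query), len(target)
    # H: best score of an alignment ending at (i, j).
    # E: best score ending with a gap in the query (horizontal move).
    # F: best score ending with a gap in the target (vertical move).
    H = [[0] * (n + 1) for _ in range(m + 1)]
    E = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    F = [[NEG_INF] * (n + 1) for _ in range(m + 1)]
    if not local:
        # Global alignment: leading gaps are penalized like any other gap.
        for j in range(1, n + 1):
            H[0][j] = -(gap_open + (j - 1) * gap_extend)
        for i in range(1, m + 1):
            H[i][0] = -(gap_open + (i - 1) * gap_extend)
    best = 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            E[i][j] = max(E[i][j - 1] - gap_extend, H[i][j - 1] - gap_open)
            F[i][j] = max(F[i - 1][j] - gap_extend, H[i - 1][j] - gap_open)
            diagonal = H[i - 1][j - 1] + score(query[i - 1], target[j - 1])
            H[i][j] = max(diagonal, E[i][j], F[i][j])
            if local:
                # Local alignment: a negative prefix is never worth keeping.
                H[i][j] = max(H[i][j], 0)
                best = max(best, H[i][j])
    return best if local else H[m][n]


def align_all(query, database, score, gap_open=3, gap_extend=1, algorithm="sw"):
    """One score per target, in the spirit of align(..., mode="score")."""
    return [
        align_score(query, target, score, gap_open, gap_extend, algorithm)
        for target in database
    ]


def match_mismatch(a, b):
    # Hypothetical toy scoring: +1 for identical residues, -1 otherwise.
    return 1 if a == b else -1


print(align_all("ACGT", ["ACGT", "WWWW", "xxACGTyy"], match_mismatch))  # [4, 0, 4]
print(align_score("AAAA", "AA", match_mismatch, algorithm="nw"))  # -2
```

In the global example, the two unmatched query residues form one gap of length 2 (penalty 3 + 1 = 4) rather than two gaps of length 1 (penalty 6), which is why the result is 2 - 4 = -2.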

Parameters:
  • query (str or byte-like object) – The sequence to query the database with.

  • database (BaseDatabase) – The database sequences to align the query to.

Keyword Arguments:
  • mode (str) – The search mode to use for querying the database: score reports only the score of each hit (default); end reports the score and end coordinates of each hit (slower); full reports the score, coordinates, and alignment of each hit (slowest).

  • overflow (str) – The strategy to use when a score overflows in the comparison pipeline: simple first computes all scores with an 8-bit range, then recomputes the sequences that overflowed with a 16-bit range (and, if needed, a 32-bit range); buckets (default) divides the targets into buckets and, within a bucket, switches to a larger score range as soon as the first overflow is detected.

  • algorithm (str) – The alignment algorithm to use: nw for global Needleman-Wunsch alignment, hw for semi-global alignment without penalization of gaps on query edges, ov for semi-global alignment without penalization of gaps on query or target edges, and sw for local Smith-Waterman alignment.

  • start (int) – The offset of the first database sequence to process. Useful for processing only a chunk of the database without copying the sequences.

  • end (int) – The end offset until which to process the database. Useful for processing only a chunk of the database without copying the sequences.
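Because start and end select part of the database without copying it, a large database can be processed in fixed-size chunks, for instance to bound the size of each result list. The hypothetical helper below computes the (start, end) pairs; it assumes end is exclusive, like a Python slice bound, which the parameter description does not state explicitly.

```python
def chunk_bounds(n_targets, chunk_size):
    """Yield (start, end) pairs covering 0..n_targets in consecutive chunks.

    Hypothetical helper; `end` is treated as exclusive, like a slice bound.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be positive")
    for start in range(0, n_targets, chunk_size):
        yield start, min(start + chunk_size, n_targets)


print(list(chunk_bounds(10, 4)))  # [(0, 4), (4, 8), (8, 10)]

# Hypothetical usage, assuming `aligner` is an Aligner, `query` a sequence
# and `database` a BaseDatabase that supports len():
# for start, end in chunk_bounds(len(database), 1000):
#     hits = aligner.align(query, database, start=start, end=end)
```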

Returns:

list of pyopal.ScoreResult

A list containing one ScoreResult object for each target sequence in the database. The actual type depends on the requested mode: ScoreResult for mode score, EndResult for mode end, and FullResult for mode full.

Raises:
  • ValueError – When a sequence contains characters that are invalid with respect to the alphabet of the database scoring matrix.

  • OverflowError – When the score computed by Opal for a sequence overflows or underflows the limit values for the SIMD backend.
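The simple overflow strategy, and the point at which an overflow can no longer be recovered, can be simulated in plain Python. This is a hypothetical sketch, not pyopal's code: score_with_limit stands in for a SIMD pass whose scores saturate at the largest signed 8-, 16- or 32-bit value, and a score that reaches the limit is treated as overflowed, on the assumption that a saturated value cannot be distinguished from a true one.

```python
# Largest signed values for the 8-, 16- and 32-bit score ranges.
LIMITS = (2**7 - 1, 2**15 - 1, 2**31 - 1)


def simple_strategy(targets, score_with_limit):
    """Score all targets at 8 bits, then rescore overflowed ones at wider ranges.

    score_with_limit(target, limit) must return the alignment score of
    `target`, clamped (saturated) at `limit`.
    """
    scores = [None] * len(targets)
    pending = list(range(len(targets)))
    for limit in LIMITS:
        overflowed = []
        for index in pending:
            value = score_with_limit(targets[index], limit)
            if value >= limit:
                # Saturated: the true score may be larger, so retry wider.
                overflowed.append(index)
            else:
                scores[index] = value
        pending = overflowed
        if not pending:
            return scores
    raise OverflowError(f"{len(pending)} score(s) overflowed even the 32-bit range")


def clamp(true_score, limit):
    # Toy stand-in: each "target" is simply its true score, clamped at `limit`.
    return min(true_score, limit)


print(simple_strategy([10, 200, 40000], clamp))  # [10, 200, 40000]
```

Only the targets that saturate are recomputed at the next width, which keeps the cheap 8-bit pass for the common case. Underflow, which the OverflowError entry also mentions, is not modeled here.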