Aligner#
- class pyopal.Aligner#
The Opal aligner.
The Aligner implements an accelerated pipeline for computing pairwise alignments between a query sequence and a database of target sequences in parallel, using the Single Instruction, Multiple Data (SIMD) capabilities of modern processors.
Note
The Opal algorithm requires scoring matrices to be integer matrices, as all computations are handled with integer vectors. Only matrices for which is_integer returns True can be given to the Aligner constructor.
- scoring_matrix#
The scoring matrix to use for scoring the alignments.
- Type:
Added in version 0.5.0.
Changed in version 0.6.0: Use the external ScoringMatrix class to handle scoring matrices.
- __init__(scoring_matrix=None, gap_open=3, gap_extend=1)#
Create a new Aligner with the given parameters.
- Parameters:
scoring_matrix (ScoringMatrix or str) – The scoring matrix to use for scoring the alignments, either as a ScoringMatrix object, or as the name of a matrix to load with the ScoringMatrix.from_name class method. The aligner will use the matrix columns to instantiate an Alphabet.
gap_open (int) – The gap opening penalty \(G\) for scoring the alignments.
gap_extend (int) – The gap extension penalty \(E\) for scoring the alignments.
Hint
A gap of length \(N\) will receive a penalty of \(G + (N - 1)E\).
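As a minimal sketch of the affine gap cost, assuming the conventional model in which the first gap position is charged the opening penalty \(G\) and every further position the extension penalty \(E\): the helper name `gap_penalty` is illustrative only and is not part of pyopal.

```python
def gap_penalty(length, gap_open=3, gap_extend=1):
    """Affine penalty of a gap spanning `length` positions: G + (N - 1) * E.

    Hypothetical helper for illustration; the defaults mirror the
    defaults of the Aligner constructor documented above.
    """
    if length < 1:
        raise ValueError("gap length must be at least 1")
    return gap_open + (length - 1) * gap_extend


# Print the penalty for a few gap lengths with the default parameters.
for n in (1, 2, 5):
    print(n, gap_penalty(n))
```

With the default parameters, opening a gap costs 3 and each additional gapped position adds 1, so long gaps are penalized far less than many short ones.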
- Raises:
ValueError – When the given scoring matrix is not an integer matrix.
RuntimeError – When no supported SIMD backend could be detected on the host platform.
MemoryError – When some internal buffers could not be allocated properly.
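The integer-matrix requirement behind the ValueError above can be pictured without pyopal. The sketch below is only a stand-in under stated assumptions: `is_integer_matrix` is a hypothetical helper operating on nested lists, not the `is_integer` member of ScoringMatrix, and it treats values such as `2.0` as integral.

```python
def is_integer_matrix(rows):
    """Return True if every score in a nested-list matrix is integral.

    Hypothetical helper for illustration, not part of pyopal.
    """
    return all(float(value).is_integer() for row in rows for value in row)


integer_scores = [[2, -1], [-1, 2]]                # usable with integer vectors
fractional_scores = [[1.5, -0.5], [-0.5, 1.5]]     # would need rounding or scaling first

print(is_integer_matrix(integer_scores))
print(is_integer_matrix(fractional_scores))
```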
- align(query, database, *, mode='score', overflow='buckets', algorithm='sw', start=0, end=4294967295)#
Align the query sequence to all targets of the database.
- Parameters:
query (str or bytes-like object) – The sequence to query the database with.
database (BaseDatabase) – The database sequences to align the query to.
- Keyword Arguments:
mode (str) – The search mode to use for querying the database: score to only report scores for each hit (default), end to report scores and end coordinates for each hit (slower), full to report scores, coordinates and alignment for each hit (slowest).
overflow (str) – The strategy to use when a sequence score overflows in the comparison pipeline: simple computes scores with 8-bit range first, then recomputes the sequences that overflowed with 16-bit range (and then 32-bit); buckets divides the targets into buckets, and switches to larger score ranges within a bucket when the first overflow is detected.
algorithm (str) – The alignment algorithm to use: nw for global Needleman-Wunsch alignment, hw for semi-global alignment without penalization of gaps on query edges, ov for semi-global alignment without penalization of gaps on query or target edges, and sw for local Smith-Waterman alignment.
start (int) – The offset from which to start processing the database. Useful for processing only a chunk of the database without copying the sequences.
end (int) – The offset until which to process the database. Useful for processing only a chunk of the database without copying the sequences.
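The start and end offsets lend themselves to processing a database in contiguous chunks. Here is a minimal sketch of how such offsets could be computed, assuming that `end` is exclusive (this page does not state whether it is); `chunk_bounds` is a hypothetical helper and not part of pyopal.

```python
def chunk_bounds(total, chunks):
    """Split range(total) into `chunks` contiguous (start, end) pairs.

    `end` is treated as exclusive, which is an assumption made for this
    sketch. Any remainder is spread over the first chunks.
    """
    if chunks < 1:
        raise ValueError("chunks must be at least 1")
    size, extra = divmod(total, chunks)
    bounds = []
    start = 0
    for i in range(chunks):
        end = start + size + (1 if i < extra else 0)
        bounds.append((start, end))
        start = end
    return bounds


# Offsets for a database of 10 targets processed in 3 chunks.
print(chunk_bounds(10, 3))
```

Each pair could then be passed to the method as `align(query, database, start=start, end=end)`, so that every call only processes its own slice of the database.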
- Returns:
list of pyopal.ScoreResult – A list containing one ScoreResult object for each target sequence in the database. The actual type depends on the requested mode: it will be a ScoreResult for mode score, EndResult for mode end and FullResult for mode full.
- Raises:
ValueError – When sequence contains invalid characters with respect to the alphabet of the database scoring matrix.
OverflowError – When the score computed by Opal for a sequence overflows or underflows the limit values for the SIMD backend.
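The simple overflow strategy described under Keyword Arguments can be pictured with a small pure-Python simulation. This is a rough sketch under stated assumptions, not Opal's implementation: it uses signed 8-, 16- and 32-bit maxima as the limits, takes the already-known scores as input instead of computing alignments, and the name `simple_overflow` is hypothetical.

```python
# (bit width, largest representable score); signed limits are an assumption.
WIDTHS = ((8, 2**7 - 1), (16, 2**15 - 1), (32, 2**31 - 1))


def simple_overflow(true_scores):
    """Return {target index: bit width at which its score first fits}.

    Every target is first tried in the 8-bit pass; only the targets that
    overflow are retried in the 16-bit pass, and then in the 32-bit pass.
    """
    pending = list(range(len(true_scores)))
    resolved = {}
    for bits, limit in WIDTHS:
        overflowed = []
        for index in pending:
            if true_scores[index] <= limit:
                resolved[index] = bits
            else:
                overflowed.append(index)
        pending = overflowed
    if pending:
        # Mirrors the documented OverflowError when no range is wide enough.
        raise OverflowError(f"scores of targets {pending} exceed the 32-bit range")
    return resolved


print(simple_overflow([42, 300, 70000]))
```

In this picture, only the second and third targets are recomputed after the first pass, which is why most of the work stays in the narrowest and fastest score range.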