API Reference

Functions

pyopal.align(query, database, scoring_matrix=None, *, gap_open=3, gap_extend=1, mode='score', overflow='buckets', algorithm='sw', threads=0, pool=None, ordered=False)

Align the query sequence to every database sequence in parallel.

Parameters:
  • query (str or byte-like object) – The sequence to query the database with.

  • database (iterable of str or byte-like objects) – The database sequences to align the query to.

  • scoring_matrix (ScoringMatrix or str) – The scoring matrix to use for the alignment, either as a ScoringMatrix object, or as the name of a matrix to load with the ScoringMatrix.from_name class method.

Keyword Arguments:
  • gap_open (int) – The gap opening penalty \(G\) for scoring the alignments.

  • gap_extend (int) – The gap extension penalty \(E\) for scoring the alignments.

  • mode (str) – The search mode to use for querying the database: score to report only the score of each hit (default); end to report the score and end coordinates of each hit (slower); full to report the score, coordinates and alignment of each hit (slowest).

  • overflow (str) – The strategy to use when a sequence score overflows in the comparison pipeline: simple to compute scores with an 8-bit range first and then recompute the sequences that overflowed with a 16-bit range (and then a 32-bit range); buckets to divide the targets into buckets and switch to larger score ranges within a bucket when the first overflow is detected.

  • algorithm (str) – The alignment algorithm to use: nw for global Needleman-Wunsch alignment, hw for semi-global alignment without penalization of gaps on query edges, ov for semi-global alignment without penalization of gaps on query or target edges, and sw for local Smith-Waterman alignment.

  • threads (int) – The number of threads to use for aligning sequences in parallel. If zero is given, uses the number of CPUs reported by os.cpu_count. If one is given, uses the main thread for aligning; otherwise spawns a multiprocessing.pool.ThreadPool.

  • pool (multiprocessing.pool.ThreadPool) – A running pool instance to use for parallelization. Useful for reusing the same pool across several calls to align. If None is given, spawns a new pool based on the threads argument.

  • ordered (bool) – Whether the results should be returned in the same order as the database sequences. Internally switches the code to use ThreadPool.imap instead of ThreadPool.imap_unordered, which can have an impact on performance.
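
The simple overflow strategy described above can be sketched in plain Python. This is an illustration only, not pyopal or Opal code: score_in_range and simple_strategy are hypothetical helpers, the "true" scores are supplied directly instead of being computed by an alignment, and the use of signed limits for each range is an assumption.

```python
def score_in_range(true_score, bits):
    """Hypothetical scorer: return the score, or None if it does not fit in `bits` bits."""
    limit = (1 << (bits - 1)) - 1  # assumed signed limit, e.g. 127 for 8 bits
    return true_score if true_score <= limit else None


def simple_strategy(true_scores):
    """Score everything with 8 bits, then retry only the overflowed targets with 16, then 32."""
    results = [None] * len(true_scores)
    pending = list(range(len(true_scores)))
    for bits in (8, 16, 32):
        overflowed = []
        for i in pending:
            score = score_in_range(true_scores[i], bits)
            if score is None:
                overflowed.append(i)        # retry with the next, wider range
            else:
                results[i] = (score, bits)  # remember which range was sufficient
        pending = overflowed
        if not pending:
            break
    return results


strategy_results = simple_strategy([41, 300, 70000])
print(strategy_results)  # [(41, 8), (300, 16), (70000, 32)]
```

Only the targets that overflowed are recomputed, so a database dominated by low-scoring targets does most of its work in the narrowest range.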

Yields:

ScoreResult – Results for the alignment of the query to each target sequence in the database. The actual type depends on the requested mode: it will be ScoreResult for mode score, EndResult for mode end and FullResult for mode full.
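
The order in which results are yielded depends on the ordered argument, and the difference between ThreadPool.imap and ThreadPool.imap_unordered can be seen with the standard library alone. The sketch below is not pyopal code: fake_align is a hypothetical stand-in for aligning one target, with sleep times chosen so that later targets tend to finish first.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_align(job):
    """Hypothetical stand-in for aligning the query to one target."""
    index, delay = job
    time.sleep(delay)  # pretend that some targets take longer to align
    return index


# Target 0 is the slowest, target 2 the fastest.
jobs = [(0, 0.3), (1, 0.15), (2, 0.0)]

with ThreadPool(3) as pool:
    ordered = list(pool.imap(fake_align, jobs))
    unordered = list(pool.imap_unordered(fake_align, jobs))

print(ordered)            # [0, 1, 2]: input order is preserved
print(sorted(unordered))  # same results; arrival order is not guaranteed
```

When results arrive out of order, the target_index attribute used in the example below is what maps a result back to its target.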

Hint

Consider storing the database sequences in a Database object if you query the same sequences more than once, to avoid the overhead of encoding them on every call.
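
To see why reuse helps, here is a minimal sketch of ordinal encoding using only the standard library. It is an illustration, not pyopal's implementation: the LETTERS alphabet, the encode helper and the call counter are all hypothetical. Each sequence is turned into the indices of its letters; a plain list of strings goes through this step for every query, while a pre-encoded collection goes through it once.

```python
LETTERS = "ACGT"  # hypothetical alphabet, for illustration only
INDEX = {letter: i for i, letter in enumerate(LETTERS)}

encode_calls = 0


def encode(sequence):
    """Ordinal-encode a sequence: each letter becomes its index in LETTERS."""
    global encode_calls
    encode_calls += 1
    return bytes(INDEX[letter] for letter in sequence)


targets = ["AACCGCTG", "ATGCGCT", "TTATTACG"]
queries = ["ACCTG", "GCGC"]

# Without reuse: the targets are re-encoded for every query.
for _ in queries:
    encoded = [encode(t) for t in targets]
calls_without_reuse = encode_calls

# With reuse: the targets are encoded once, before any query is run.
encode_calls = 0
encoded_once = [encode(t) for t in targets]
calls_with_reuse = encode_calls

print(encoded_once[1])                        # b'\x00\x03\x02\x01\x02\x01\x03'
print(calls_without_reuse, calls_with_reuse)  # 6 3
```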

Example

>>> targets = ["AACCGCTG", "ATGCGCT", "TTATTACG"]
>>> for res in pyopal.align("ACCTG", targets, gap_open=2, ordered=True):
...     print(res.score, targets[res.target_index])
41 AACCGCTG
31 ATGCGCT
23 TTATTACG

Added in version 0.5.0.

Classes

pyopal.Alphabet

A class for ordinal encoding of sequences.

pyopal.Database

A database of target sequences.

pyopal.Aligner

The Opal aligner.

pyopal.ScoreResult

The results of a search in score mode.

pyopal.EndResult

The results of a search in end mode.

pyopal.FullResult

The results of a search in full mode.