API Reference
Functions
- pyopal.align(query, database, scoring_matrix=None, *, gap_open=3, gap_extend=1, mode='score', overflow='buckets', algorithm='sw', threads=0, pool=None, ordered=False)
Align the query sequence to every database sequence in parallel.
- Parameters:
  - query (str or byte-like object) – The sequence to query the database with.
  - database (iterable of str or byte-like objects) – The database sequences to align the query to.
  - scoring_matrix (ScoringMatrix or str) – The scoring matrix to use for the alignment, either as a ScoringMatrix object, or as the name of a matrix to load with the ScoringMatrix.from_name class method.
- Keyword Arguments:
  - gap_open (int) – The gap opening penalty \(G\) for scoring the alignments.
  - gap_extend (int) – The gap extension penalty \(E\) for scoring the alignments.
  - mode (str) – The search mode to use for querying the database: 'score' to only report scores for each hit (default), 'end' to report scores and end coordinates for each hit (slower), 'full' to report scores, coordinates and alignment for each hit (slowest).
  - overflow (str) – The strategy to use when a sequence score overflows in the comparison pipeline: 'simple' computes scores with 8-bit range first, then recomputes the sequences that overflowed with 16-bit range (and then 32-bit); 'buckets' divides the targets into buckets, and switches to larger score ranges within a bucket when the first overflow is detected.
  - algorithm (str) – The alignment algorithm to use: 'nw' for global Needleman-Wunsch alignment, 'hw' for semi-global alignment without penalization of gaps on query edges, 'ov' for semi-global alignment without penalization of gaps on query or target edges, and 'sw' for local Smith-Waterman alignment.
  - threads (int) – The number of threads to use for aligning sequences in parallel. If zero is given, uses the number of CPUs reported by os.cpu_count. If one is given, uses the main thread for aligning; otherwise spawns a multiprocessing.pool.ThreadPool.
  - pool (multiprocessing.pool.ThreadPool) – A running pool instance to use for parallelization. Useful for reusing the same pool across several calls of align. If None is given, spawns a new pool based on the threads argument.
  - ordered (bool) – Whether the results should be returned in the same order as the database sequences. Internally switches the code to use ThreadPool.imap instead of ThreadPool.imap_unordered, which can have an impact on performance.
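The effect of the ordered argument can be seen with the standard library alone. The sketch below does not call pyopal: fake_align is a hypothetical stand-in for aligning one target, and the sleep only simulates longer targets taking longer. It shows that ThreadPool.imap yields results in submission order, while ThreadPool.imap_unordered yields them as they complete.

```python
import time
from multiprocessing.pool import ThreadPool


def fake_align(item):
    """Hypothetical stand-in for aligning one target (not pyopal code)."""
    index, target = item
    # Simulate longer targets taking longer to align.
    time.sleep(0.01 * len(target))
    return index, len(target)


targets = ["AACCGCTGAACC", "ATG", "TTATTACG"]

with ThreadPool(3) as pool:
    # imap: results are yielded in submission order, even when a later
    # item finishes before an earlier one.
    ordered = list(pool.imap(fake_align, enumerate(targets)))
    # imap_unordered: results are yielded as soon as each one completes,
    # so their order depends on timing.
    unordered = list(pool.imap_unordered(fake_align, enumerate(targets)))

print("imap:          ", ordered)
print("imap_unordered:", unordered)
```

Because unordered results can arrive in any order, each one has to carry the index of its target, which is what the target_index attribute used in the example of this page is for.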
- Yields:
  - ScoreResult – Results for the alignment of the query to each target sequence in the database. The actual type depends on the requested mode: it will be ScoreResult for mode 'score', EndResult for mode 'end' and FullResult for mode 'full'.
Hint
Consider storing the database sequences into a Database object if you are querying the same sequences more than once to avoid the overhead added by sequence encoding.
Example
>>> targets = ["AACCGCTG", "ATGCGCT", "TTATTACG"]
>>> for res in pyopal.align("ACCTG", targets, gap_open=2, ordered=True):
...     print(res.score, targets[res.target_index])
41 AACCGCTG
31 ATGCGCT
23 TTATTACG
Added in version 0.5.0.
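The two overflow strategies described under Keyword Arguments can be illustrated with a toy model. This is a minimal sketch and not the library's implementation: the true scores are given up front, fits is a hypothetical helper that assumes signed score ranges, and the bucket version assumes every bucket restarts at the narrowest range. All scores are assumed to fit in 32 bits.

```python
WIDTHS = (8, 16, 32)  # score ranges tried, from narrowest to widest


def fits(score, bits):
    """Hypothetical check: does the score fit a signed range of `bits` bits?"""
    return score <= (1 << (bits - 1)) - 1


def simple_overflow(true_scores):
    """Run every target at 8 bits, then rerun only the overflowed ones wider."""
    results = [None] * len(true_scores)
    pending = list(range(len(true_scores)))
    for bits in WIDTHS:
        overflowed = []
        for i in pending:
            if fits(true_scores[i], bits):
                results[i] = (true_scores[i], bits)
            else:
                overflowed.append(i)  # retried at the next width
        pending = overflowed
    return results


def bucket_overflow(true_scores, bucket_size=2):
    """Widen the range on the first overflow and keep it for the rest of the bucket."""
    results = []
    for start in range(0, len(true_scores), bucket_size):
        level = 0  # assumption: each bucket starts at the narrowest range
        for score in true_scores[start:start + bucket_size]:
            while not fits(score, WIDTHS[level]):
                level += 1  # stays raised for the remaining targets
            results.append((score, WIDTHS[level]))
    return results


print(simple_overflow([41, 300, 70000]))
print(bucket_overflow([300, 41, 50, 60]))
```

In this model, the simple strategy always gives each target the narrowest range that fits, at the cost of extra passes. The bucket strategy may process a small score at a wider range than it needs (41 at 16 bits in the second call) but never revisits a target.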
Classes
- A class for ordinal encoding of sequences.
- A database of target sequences.
- The Opal aligner.
- The results of a search in
- The results of a search in
- The results of a search in