API Reference
Functions
- pyopal.align(query, database, scoring_matrix=None, *, gap_open=3, gap_extend=1, mode='score', overflow='buckets', algorithm='sw', threads=0, pool=None, ordered=False)
Align the query sequence to every database sequence in parallel.
- Parameters:
  - query (str or bytes-like object) – The sequence to query the database with.
  - database (iterable of str or bytes-like objects) – The database sequences to align the query to.
  - scoring_matrix (ScoringMatrix or str) – The scoring matrix to use for the alignment, either as a ScoringMatrix object or as the name of a matrix to load with the ScoringMatrix.from_name class method.
- Keyword Arguments:
  - gap_open (int) – The gap opening penalty \(G\) for scoring the alignments.
  - gap_extend (int) – The gap extension penalty \(E\) for scoring the alignments.
  - mode (str) – The search mode to use for querying the database: score to report only the score of each hit (default), end to report the score and end coordinates of each hit (slower), or full to report the score, coordinates and alignment of each hit (slowest).
  - overflow (str) – The strategy to use when a sequence score overflows in the comparison pipeline: simple computes all scores with an 8-bit range first, then recomputes the sequences that overflowed with a 16-bit range (and then a 32-bit range); buckets (default) divides the targets into buckets and switches to a larger score range within a bucket as soon as the first overflow is detected.
  - algorithm (str) – The alignment algorithm to use: nw for global Needleman-Wunsch alignment, hw for semi-global alignment without penalization of gaps on query edges, ov for semi-global alignment without penalization of gaps on query or target edges, and sw for local Smith-Waterman alignment (default).
  - threads (int) – The number of threads to use for aligning sequences in parallel. If zero is given, uses the number of CPUs reported by os.cpu_count. If one is given, aligns in the main thread; otherwise, spawns a multiprocessing.pool.ThreadPool.
  - pool (multiprocessing.pool.ThreadPool) – A running pool instance to use for parallelization, useful for reusing the same pool across several calls of align. If None is given, spawns a new pool based on the threads argument.
  - ordered (bool) – Whether the results should be returned in the same order as the database sequences. Internally switches the code to use ThreadPool.imap instead of ThreadPool.imap_unordered, which can have an impact on performance: results that finish early are held back until every preceding target has been aligned.
- Yields:
  ScoreResult – Results for the alignment of the query to each target sequence in the database. The actual type depends on the requested mode: ScoreResult for mode score, EndResult for mode end, and FullResult for mode full.
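The algorithm, gap_open/gap_extend and mode options above can be made concrete with a small pure-Python dynamic-programming sketch using Gotoh's affine-gap recurrences. It is not pyopal's or Opal's vectorized implementation, and several details are assumptions: the helper name align_score is hypothetical, a flat match/mismatch score stands in for a scoring matrix, a gap of length \(k\) is charged \(G + (k-1)E\), hw is read as "the query is aligned end to end while target letters before and after it are free", and ov additionally lets the query overhang the target. Like mode end, it returns the score together with 0-based end coordinates.

```python
from typing import Tuple

NEG = -(10**9)  # stands in for minus infinity while keeping every score an int


def align_score(query: str, target: str, algorithm: str = "sw", *,
                match: int = 5, mismatch: int = -4,
                gap_open: int = 3, gap_extend: int = 1) -> Tuple[int, int, int]:
    """Return (score, query_end, target_end); ends are 0-based, -1 if empty."""
    if algorithm not in ("nw", "hw", "ov", "sw"):
        raise ValueError(f"unknown algorithm: {algorithm!r}")
    m, n = len(query), len(target)
    local = algorithm == "sw"
    free_target_ends = algorithm in ("hw", "ov")  # target overhangs cost nothing
    free_query_ends = algorithm == "ov"  # query overhangs cost nothing as well

    def gap(k: int) -> int:
        # Assumed convention: a gap of length k costs G + (k - 1) * E.
        return -(gap_open + (k - 1) * gap_extend) if k else 0

    H = [[0] * (n + 1) for _ in range(m + 1)]     # best score ending at (i, j)
    GQ = [[NEG] * (n + 1) for _ in range(m + 1)]  # same, ending with a gap in the query
    GT = [[NEG] * (n + 1) for _ in range(m + 1)]  # same, ending with a gap in the target
    for j in range(1, n + 1):
        H[0][j] = 0 if local or free_target_ends else gap(j)
    for i in range(1, m + 1):
        H[i][0] = 0 if local or free_query_ends else gap(i)

    best, q_end, t_end = 0, -1, -1  # running maximum for local alignment
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            GQ[i][j] = max(H[i][j - 1] - gap_open, GQ[i][j - 1] - gap_extend)
            GT[i][j] = max(H[i - 1][j] - gap_open, GT[i - 1][j] - gap_extend)
            s = match if query[i - 1] == target[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s, GQ[i][j], GT[i][j])
            if local:
                H[i][j] = max(H[i][j], 0)  # Smith-Waterman: restart instead of going negative
                if H[i][j] > best:
                    best, q_end, t_end = H[i][j], i - 1, j - 1

    if local:
        return best, q_end, t_end
    if algorithm == "nw":
        return H[m][n], m - 1, n - 1
    # Semi-global: stop at the end of the query, skipping the target's tail for free ...
    candidates = [(H[m][j], m - 1, j - 1) for j in range(n + 1)]
    if free_query_ends:
        # ... and for ov, alternatively stop at the end of the target, skipping the query's tail.
        candidates += [(H[i][n], i - 1, n - 1) for i in range(m + 1)]
    return max(candidates, key=lambda c: c[0])


for algo in ("nw", "hw", "ov", "sw"):
    print(algo, align_score("ACG", "TTACGTT", algo))
```

Only the boundary rows and the cell the score is read from change between the four algorithms; the inner recurrence is shared, which is why a single option can switch between them.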
Hint
Consider storing the database sequences in a Database object if you are querying the same sequences more than once, to avoid the overhead added by sequence encoding.
Example
>>> import pyopal
>>> targets = ["AACCGCTG", "ATGCGCT", "TTATTACG"]
>>> for res in pyopal.align("ACCTG", targets, gap_open=2, ordered=True):
...     print(res.score, targets[res.target_index])
41 AACCGCTG
31 ATGCGCT
23 TTATTACG
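The threads, pool and ordered arguments describe a dispatch scheme that can be mirrored with the standard library alone. The sketch below is a hypothetical search helper, not part of pyopal, that applies a caller-supplied scoring function to every target: zero threads means os.cpu_count, one thread runs in the calling thread, anything else uses a ThreadPool, and ordered selects imap over imap_unordered. Letting an explicit pool take precedence over threads is an assumption, and identity_score is a toy scoring function for the demo only.

```python
import os
from multiprocessing.pool import ThreadPool
from typing import Callable, Iterator, Optional, Sequence, Tuple


def search(query: str, database: Sequence[str],
           score: Callable[[str, str], int], *,
           threads: int = 0, pool: Optional[ThreadPool] = None,
           ordered: bool = False) -> Iterator[Tuple[int, int]]:
    """Hypothetical helper yielding (target_index, score) for every target."""

    def job(index: int) -> Tuple[int, int]:
        return index, score(query, database[index])

    if threads == 0:
        threads = os.cpu_count() or 1  # cpu_count() may return None
    if pool is None and threads == 1:
        # One thread: align in the calling thread (results come out in order).
        yield from map(job, range(len(database)))
        return
    own_pool = pool is None
    if own_pool:
        pool = ThreadPool(threads)
    try:
        # imap preserves database order; imap_unordered yields results as they finish.
        mapper = pool.imap if ordered else pool.imap_unordered
        yield from mapper(job, range(len(database)))
    finally:
        if own_pool:
            pool.terminate()  # only shut down a pool this call created


def identity_score(query: str, target: str) -> int:
    """Toy score for the demo: number of equal letters at the same offset."""
    return sum(a == b for a, b in zip(query, target))


targets = ["AACCGCTG", "ATGCGCT", "TTATTACG"]
with ThreadPool(2) as shared:  # one pool reused across several searches
    for _ in range(2):
        print(list(search("ACCTG", targets, identity_score, pool=shared, ordered=True)))
```

Reusing one pool avoids starting and stopping threads on every call, the same motivation given for the pool argument. Note that a pure-Python scoring function like this one gains no speed from threads because of CPython's global interpreter lock; the sketch only illustrates the control flow.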
Added in version 0.5.0.
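The two overflow strategies of align can be compared with a toy model. In the sketch below, exact scores are made-up integers and a saturating min() stands in for fixed-width SIMD arithmetic, so a score that reaches the top of its range counts as an overflow. simple_strategy follows the description of simple directly; buckets_strategy is one literal reading of buckets, in which an overflow raises the range for the remaining targets of the same bucket. Both helpers are hypothetical and only trace which bit width each target is scored with; they do not model the real cost of each pass.

```python
from typing import List, Sequence, Tuple

# Score ranges tried in order: (bit width, largest storable signed score).
RANGES: List[Tuple[int, int]] = [(8, 2**7 - 1), (16, 2**15 - 1), (32, 2**31 - 1)]


def saturating(true_score: int, limit: int) -> int:
    """Toy stand-in for saturating arithmetic; reaching `limit` means overflow."""
    return min(true_score, limit)


def simple_strategy(true_scores: Sequence[int]):
    """All targets in 8 bits, then only the overflowed ones in 16, then 32 bits."""
    results = [0] * len(true_scores)
    log: List[Tuple[int, int]] = []  # (target index, bit width) of every attempt
    pending = list(range(len(true_scores)))
    for level, (bits, limit) in enumerate(RANGES):
        overflowed = []
        for i in pending:
            log.append((i, bits))
            score = saturating(true_scores[i], limit)
            if score >= limit and level < len(RANGES) - 1:
                overflowed.append(i)  # saturated: retry in the next, wider pass
            else:
                results[i] = score
        pending = overflowed
    return results, log


def buckets_strategy(true_scores: Sequence[int], bucket_size: int):
    """Per bucket, widen the range at the first overflow and keep it widened."""
    results = [0] * len(true_scores)
    log: List[Tuple[int, int]] = []
    for start in range(0, len(true_scores), bucket_size):
        level = 0  # each bucket starts again from the narrowest range
        for i in range(start, min(start + bucket_size, len(true_scores))):
            while True:
                bits, limit = RANGES[level]
                log.append((i, bits))
                score = saturating(true_scores[i], limit)
                if score < limit or level == len(RANGES) - 1:
                    results[i] = score
                    break
                level += 1  # overflow: the rest of this bucket uses the wider range
    return results, log


scores = [10, 200, 50, 40000]  # made-up exact scores
print(simple_strategy(scores))
print(buckets_strategy(scores, bucket_size=2))
```

Both strategies end with the same scores; they differ in which targets are recomputed and at which width, which is what the overflow argument trades off.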
Classes
- A class for ordinal encoding of sequences.
- A database of target sequences.
- The Opal aligner.
- The results of a search in
- The results of a search in
- The results of a search in
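Ordinal encoding, as performed by the first class listed above, replaces each letter with its index in an alphabet so that scoring-matrix lookups become integer indexing. The sketch below is a hypothetical OrdinalEncoder written for illustration, not pyopal's class or API; it also shows the idea behind the Hint for align: encode the database once and reuse the encoded sequences instead of re-encoding them on every query.

```python
from typing import List


class OrdinalEncoder:
    """Hypothetical sketch (not pyopal's class): letters <-> alphabet indices."""

    def __init__(self, letters: str) -> None:
        if len(set(letters)) != len(letters):
            raise ValueError("alphabet letters must be unique")
        if len(letters) > 256:
            raise ValueError("indices must fit in one byte")
        self.letters = letters
        self._index = {letter: i for i, letter in enumerate(letters)}

    def encode(self, sequence: str) -> bytes:
        """Replace each letter by its position in the alphabet."""
        try:
            return bytes(self._index[letter] for letter in sequence)
        except KeyError as err:
            raise ValueError(f"letter not in alphabet: {err.args[0]!r}") from None

    def decode(self, encoded: bytes) -> str:
        """Map alphabet indices back to letters."""
        return "".join(self.letters[i] for i in encoded)


alphabet = OrdinalEncoder("ACGT")
# Encode the database once and reuse it, instead of re-encoding on every query.
database: List[bytes] = [alphabet.encode(s) for s in ["AACCGCTG", "ATGCGCT", "TTATTACG"]]
print(list(database[0]))
```

Storing indices as bytes keeps each encoded sequence compact, at the cost of limiting the alphabet to 256 letters.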