Database#
BaseDatabase#
- class pyopal.BaseDatabase#
The base class for views of database sequences.
To allow reusing the rest of the code, this class can be subclassed in a Cython extension and used with Aligner.align. Child classes only need to implement methods to obtain the size of the database, the lengths of the sequences, and the sequence data. Use Database for a basic implementation using C++ shared pointers to store the sequences.
- alphabet#
The alphabet object used for encoding the sequences stored in the sequence database.
- Type:
- lock#
A read-write lock to synchronize the accesses to the database.
- Type:
SharedMutex
Added in version 0.5.0.
- size_t get_size(self)#
Return the number of elements in the database.
- digit_t **get_sequences(self)#
Return a pointer to an array of sequence pointers.
- int *get_lengths(self)#
Return a pointer to an array of lengths.
- __getitem__(key, /)#
Return self[key].
- __init__(sequences=(), alphabet=None)#
Create a new base database with the given sequences.
- __len__()#
Return len(self).
Database#
- class pyopal.Database(BaseDatabase)#
A database of target sequences.
Like many biological sequence analysis tools, Opal encodes sequences with an alphabet for faster indexing of matrices. Sequences inserted in a database are stored in encoded format using the alphabet given on instantiation.
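To make the idea of alphabet encoding concrete, here is a minimal pure-Python sketch. It does not use pyopal and is not how Opal stores sequences internally; the four-letter alphabet, the toy scoring matrix, and the names `LETTERS`, `encode`, `decode` and `score` are all assumptions made for illustration. It only shows why encoded sequences make matrix lookups cheap: each letter becomes a row or column index.

```python
# Illustrative only: a toy alphabet, not pyopal's Alphabet class.
LETTERS = "ACGT"
INDEX = {letter: i for i, letter in enumerate(LETTERS)}

def encode(sequence):
    """Map each letter to its position in the alphabet."""
    return [INDEX[letter] for letter in sequence]

def decode(digits):
    """Map encoded digits back to letters."""
    return "".join(LETTERS[d] for d in digits)

# Toy scoring matrix (assumed): +1 on a match, -1 on a mismatch.
MATRIX = [
    [1 if i == j else -1 for j in range(len(LETTERS))]
    for i in range(len(LETTERS))
]

def score(a, b):
    """Sum matrix entries for two encoded sequences of equal length."""
    return sum(MATRIX[x][y] for x, y in zip(a, b))

query = encode("ATGC")   # [0, 3, 2, 1]
target = encode("TTCA")  # [3, 3, 1, 0]
print(query, target, score(query, target))
```

With encoded input, scoring a pair of residues is a plain two-level list lookup, with no per-letter dictionary access inside the inner loop.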
- __delitem__(key, /)#
Delete self[key].
- __getitem__(key, /)#
Return self[key].
- __init__(sequences=(), alphabet=None)#
Create a new database with the given sequences.
- __setitem__(key, value, /)#
Set self[key] to value.
- append(sequence)#
Append a single sequence at the end of the database.
- Parameters:
sequence (str or byte-like object) – The new sequence.
Hint
When inserting several sequences in the database, consider using the Database.extend method instead, so that the internal buffers can reserve space once for all the new sequences.
Example
>>> db = pyopal.Database(["ATGC", "TTCA"])
>>> db.append("AAAA")
>>> list(db)
['ATGC', 'TTCA', 'AAAA']
- clear()#
Remove all sequences from the database.
- extend(sequences)#
Extend the database by adding sequences from an iterable.
Example
>>> db = pyopal.Database(["ATGC"])
>>> db.extend(["TTCA", "AAAA", "GGTG"])
>>> list(db)
['ATGC', 'TTCA', 'AAAA', 'GGTG']
- extract(indices)#
Extract a subset of the database using the given indices.
- Parameters:
indices (collections.abc.Sequence of int) – A sequence of int objects to use to index the database.
- Raises:
IndexError – When indices contains an invalid index.
Example
>>> db = pyopal.Database(['AAAA', 'CCCC', 'KKKK', 'FFFF'])
>>> list(db.extract([2, 0]))
['KKKK', 'AAAA']
Caution
Negative indexing is not supported.
Added in version 0.3.0.
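The selection rule of extract can be modelled on a plain list. The sketch below is a pure-Python stand-in, not pyopal's implementation, and `extract_model` is a hypothetical name. It assumes that a negative index is treated as invalid and raises IndexError; the documentation above only says that negative indexing is not supported, so the actual behaviour in that case should be checked against the library.

```python
def extract_model(sequences, indices):
    """Return the items at the given positions, in the order requested.

    Assumption: any index outside [0, len(sequences)) is invalid,
    including negative ones, and raises IndexError.
    """
    size = len(sequences)
    selected = []
    for index in indices:
        if index < 0 or index >= size:
            raise IndexError(index)
        selected.append(sequences[index])
    return selected

sequences = ["AAAA", "CCCC", "KKKK", "FFFF"]
print(extract_model(sequences, [2, 0]))  # ['KKKK', 'AAAA']
```

Note that the result follows the order of the indices, not the order of the database, and that the same index may appear more than once in this model.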
- insert(index, sequence)#
Insert a sequence in the database at a given position.
- Parameters:
Note
If the insertion index is out of bounds, the insertion will happen at either end of the database:
>>> db = pyopal.Database(["ATGC", "TTGC", "CTGC"])
>>> db.insert(-100, "TTTT")
>>> db.insert(100, "AAAA")
>>> list(db)
['TTTT', 'ATGC', 'TTGC', 'CTGC', 'AAAA']
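The clamping described in the note matches what Python's built-in list.insert does. The sketch below spells the rule out on a plain list; `clamped_insert` is a hypothetical helper, not part of pyopal. It also assumes that an in-range negative index counts from the end, as it does for lists, which the documentation above does not state explicitly.

```python
def clamped_insert(items, index, value):
    """Insert value into items, clamping index to the valid range."""
    size = len(items)
    if index < 0:
        # Assumed list-like convention: negative indices count from
        # the end, and anything before the start is clamped to 0.
        index = max(0, size + index)
    else:
        # Anything past the end is clamped to the end (an append).
        index = min(index, size)
    items[index:index] = [value]

model = ["ATGC", "TTGC", "CTGC"]
clamped_insert(model, -100, "TTTT")  # clamped to the front
clamped_insert(model, 100, "AAAA")   # clamped to the back
print(model)  # ['TTTT', 'ATGC', 'TTGC', 'CTGC', 'AAAA']
```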
- mask(bitmask)#
Extract a subset of the database where the bitmask is True.
- Parameters:
bitmask (iterable of bool) – A sequence of bool objects with the same length as the database.
- Raises:
IndexError – When the bitmask has a different dimension.
Example
>>> db = pyopal.Database(['AAAA', 'CCCC', 'KKKK', 'FFFF'])
>>> list(db.mask([True, False, False, True]))
['AAAA', 'FFFF']
Added in version 0.3.0.
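Masking keeps the sequences whose flag is true and preserves database order. A pure-Python model of that rule, using itertools.compress from the standard library, might look like the sketch below. `mask_model` is a hypothetical name and this is not pyopal's implementation; the only behaviour taken from the documentation is the IndexError on a length mismatch.

```python
from itertools import compress

def mask_model(sequences, bitmask):
    """Keep the items of sequences whose matching flag is true."""
    flags = list(bitmask)  # accept any iterable of booleans
    if len(flags) != len(sequences):
        raise IndexError("bitmask length does not match the database size")
    return list(compress(sequences, flags))

sequences = ["AAAA", "CCCC", "KKKK", "FFFF"]
print(mask_model(sequences, [True, False, False, True]))  # ['AAAA', 'FFFF']
```

Unlike extract, a mask cannot reorder or repeat entries: it can only drop them.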
- reverse()#
Reverse the database, in place.
Example
>>> db = pyopal.Database(['ATGC', 'TTGC', 'CTGC'])
>>> db.reverse()
>>> list(db)
['CTGC', 'TTGC', 'ATGC']