Database

BaseDatabase

class pyopal.BaseDatabase

The base class for views of database sequences.

This class can be subclassed in a Cython extension and passed to Aligner.align, so that custom sequence storage can reuse the rest of the code. Subclasses only need to implement methods that return the size of the database, the lengths of the sequences, and the sequence data. Use Database for a basic implementation that stores the sequences using C++ shared pointers.

alphabet

The alphabet object used for encoding the sequences stored in the sequence database.

Type:

Alphabet

lock

A read-write lock to synchronize the accesses to the database.

Type:

SharedMutex

Added in version 0.5.0.

size_t get_size(self)

Return the number of elements in the database.

digit_t **get_sequences(self)

Return a pointer to an array of sequence pointers.

int *get_lengths(self)

Return a pointer to an array of lengths.

__getitem__(key, /)

Return self[key].

__init__(*args, **kwargs)

__len__()

Return len(self).

lengths

The length of each sequence in the database.

Type:

list of int

total_length

The total length of the database.

Type:

int

Database

class pyopal.Database(BaseDatabase)

A database of target sequences.

Like many biological sequence analysis tools, Opal encodes sequences with an alphabet for faster indexing of matrices. Sequences inserted in a database are stored in encoded format using the alphabet given on instantiation.
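The encoding idea described above can be sketched in plain Python. This is only an illustration of why encoded sequences make matrix lookups cheap, not pyopal's implementation: the four-letter alphabet, the toy scoring matrix, and the helper names `encode`, `decode` and `score_ungapped` are all assumptions made for the example.

```python
# Hypothetical four-letter alphabet; pyopal's own Alphabet class is not used here.
LETTERS = "ACGT"
LETTER_TO_DIGIT = {letter: digit for digit, letter in enumerate(LETTERS)}


def encode(sequence):
    """Map each letter to its position in LETTERS."""
    return [LETTER_TO_DIGIT[letter] for letter in sequence]


def decode(digits):
    """Map encoded digits back to their letters."""
    return "".join(LETTERS[digit] for digit in digits)


# Toy scoring matrix (an assumption): +1 for a match, -1 for a mismatch.
MATRIX = [
    [1 if row == col else -1 for col in range(len(LETTERS))]
    for row in range(len(LETTERS))
]


def score_ungapped(a, b):
    """Sum matrix scores position by position, using digits as direct indices."""
    return sum(MATRIX[x][y] for x, y in zip(encode(a), encode(b)))


print(encode("ATGC"))
print(score_ungapped("ATGC", "TTCA"))
```

Once the letters are replaced by small integers, each score lookup is a direct array index rather than a dictionary or string comparison, which is the benefit the paragraph above refers to.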

__delitem__(key, /)

Delete self[key].

__getitem__(key, /)

Return self[key].

__init__(*args, **kwargs)

__setitem__(key, value, /)

Set self[key] to value.

append(sequence)

Append a single sequence at the end of the database.

Parameters:

sequence (str or bytes-like object) – The new sequence.

Hint

When inserting several sequences in the database, consider using the Database.extend method instead, so that the internal buffers can reserve space once for all the new sequences rather than once per sequence.

Example

>>> db = pyopal.Database(["ATGC", "TTCA"])
>>> db.append("AAAA")
>>> list(db)
['ATGC', 'TTCA', 'AAAA']

clear()

Remove all sequences from the database.

extend(sequences)

Extend the database by adding sequences from an iterable.

Example

>>> db = pyopal.Database(["ATGC"])
>>> db.extend(["TTCA", "AAAA", "GGTG"])
>>> list(db)
['ATGC', 'TTCA', 'AAAA', 'GGTG']

extract(indices)

Extract a subset of the database using the given indices.

Parameters:

indices (collections.abc.Sequence of int) – A sequence of int objects to use to index the database.

Raises:

IndexError – When indices contains an invalid index.

Example

>>> db = pyopal.Database(['AAAA', 'CCCC', 'KKKK', 'FFFF'])
>>> list(db.extract([2, 0]))
['KKKK', 'AAAA']

Caution

Negative indexing is not supported.

Added in version 0.3.0.
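Since negative indexing is not supported, callers who want list-style negative indices could translate them before calling extract. The helper below is a minimal sketch of that translation; `normalize_indices` is a hypothetical name and is not part of pyopal.

```python
def normalize_indices(indices, size):
    """Convert list-style negative indices to non-negative ones.

    Raises IndexError for any index outside [-size, size).
    """
    normalized = []
    for index in indices:
        if not -size <= index < size:
            raise IndexError(
                f"index {index} out of range for database of size {size}"
            )
        normalized.append(index + size if index < 0 else index)
    return normalized


# With a database of 4 sequences, -1 refers to the last one.
print(normalize_indices([-1, 0], 4))

# Intended use (not executed here, requires pyopal):
#   subset = db.extract(normalize_indices([-1, 0], len(db)))
```

Checking the bounds in the helper keeps the error close to the caller's own indices instead of relying on the error raised for the translated values.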

insert(index, sequence)

Insert a sequence in the database at a given position.

Parameters:
  • index (int) – The index at which to insert the new sequence.

  • sequence (str or bytes-like object) – The new sequence.

Note

If the insertion index is out of bounds, the sequence is inserted at the nearest end of the database:

>>> db = pyopal.Database(["ATGC", "TTGC", "CTGC"])
>>> db.insert(-100, "TTTT")
>>> db.insert(100, "AAAA")
>>> list(db)
['TTTT', 'ATGC', 'TTGC', 'CTGC', 'AAAA']

mask(bitmask)

Extract a subset of the database where the bitmask is True.

Parameters:

bitmask (iterable of bool) – A sequence of bool objects with the same length as the database.

Raises:

IndexError – When the length of the bitmask differs from the size of the database.

Example

>>> db = pyopal.Database(['AAAA', 'CCCC', 'KKKK', 'FFFF'])
>>> list(db.mask([True, False, False, True]))
['AAAA', 'FFFF']

Added in version 0.3.0.
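The selection rule can be modelled on plain Python lists to make the semantics concrete. This is a sketch of the documented behaviour only, not pyopal's implementation; `mask_list` and `mask_to_indices` are hypothetical helpers, and the assumption that a bitmask corresponds to the indices of its True entries is an inference from the two examples in this section.

```python
from itertools import compress


def mask_list(items, bitmask):
    """Keep the items whose bitmask entry is true, preserving order."""
    bitmask = list(bitmask)
    if len(bitmask) != len(items):
        raise IndexError("bitmask must have the same length as the items")
    return list(compress(items, bitmask))


def mask_to_indices(bitmask):
    """Return the positions of the true entries of a bitmask."""
    return [index for index, keep in enumerate(bitmask) if keep]


sequences = ["AAAA", "CCCC", "KKKK", "FFFF"]
bitmask = [True, False, False, True]

print(mask_list(sequences, bitmask))
print(mask_to_indices(bitmask))
```

Under that assumption, a bitmask selects the same sequences as an extraction with the indices of its True entries, in ascending order.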

reverse()

Reverse the database, in place.

Example

>>> db = pyopal.Database(['ATGC', 'TTGC', 'CTGC'])
>>> db.reverse()
>>> list(db)
['CTGC', 'TTGC', 'ATGC']