Pfam Access Functions
This module defines functions for interfacing with the Pfam database.
searchPfam(query, **kwargs)
Returns Pfam search results in a dictionary. Each matching Pfam accession is a key that maps to the e-value and the alignment start and end residue positions.
Parameters: query can also be a PDB identifier, e.g. '1mkp', or '1mkpA' with a chain identifier. The UniProt ID of the specified chain, or of the first protein chain, will be used for searching the Pfam database.
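A minimal sketch of how the search result might be filtered. Only the call searchPfam(query) comes from the description above. The exact layout of each value in the returned dictionary is not spelled out here, so the helper assumes each value is a mapping with an 'evalue' key; inspect a real result before relying on that. The names search_pdb_chain and significant_hits and the sample data are hypothetical.

```python
def search_pdb_chain(pdb_id):
    """Run a Pfam search for a PDB identifier such as '1mkp' or '1mkpA'.

    Needs ProDy and network access, so the import is done lazily and the
    rest of this sketch runs without either.
    """
    from prody import searchPfam
    return searchPfam(pdb_id)


def significant_hits(matches, max_evalue=1e-5):
    """Return accessions with an e-value of at most max_evalue, best first.

    ASSUMPTION: every value in `matches` is a mapping with an 'evalue' key
    whose value can be converted to float.
    """
    kept = [(float(hit['evalue']), acc)
            for acc, hit in matches.items()
            if float(hit['evalue']) <= max_evalue]
    return [acc for _, acc in sorted(kept)]


# Made-up sample in the assumed layout (not real Pfam output).
sample = {
    'PF_A': {'evalue': '1.2e-30', 'start': 10, 'end': 150},
    'PF_B': {'evalue': '0.5', 'start': 200, 'end': 240},
}
print(significant_hits(sample))
```

With a working installation, significant_hits(search_pdb_chain('1mkpA')) would apply the same filter to a live search, provided the assumed layout holds.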
fetchPfamMSA(acc, alignment='full', compressed=False, **kwargs)
Returns a path to the downloaded Pfam MSA file.
Parameters:
- acc (str) – Pfam ID or accession code
- alignment – alignment type, one of 'full' (default), 'seed', 'ncbi', 'metagenomics', 'rp15', 'rp35', 'rp55', 'rp75' or 'uniprot', where rp stands for representative proteomes
- compressed – gzip the downloaded MSA file, default is False
Alignment Options
Parameters:
- format – a Pfam supported MSA file format, one of 'selex' (default), 'stockholm' or 'fasta'
- order – ordering of sequences, 'tree' (default) or 'alphabetical'
- inserts – letter case for inserts, 'upper' (default) or 'lower'
- gaps – gap character, one of 'dashes' (default), 'dots', 'mixed' or None for unaligned
Other Options
Parameters:
- timeout – timeout for a blocking connection attempt in seconds, default is 60
- outname – output filename, default is 'acc_alignment.format', derived from the input
- folder – output folder, default is '.'
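The options above can be collected and checked before a download is attempted. The following is a sketch, not part of ProDy: msa_options and fetch_msa are hypothetical helpers, and the allowed values are copied from the parameter lists in this section. Only fetchPfamMSA and its keyword names come from the documentation.

```python
# Allowed values as listed in this section.
ALIGNMENTS = ('full', 'seed', 'ncbi', 'metagenomics',
              'rp15', 'rp35', 'rp55', 'rp75', 'uniprot')
FORMATS = ('selex', 'stockholm', 'fasta')


def msa_options(alignment='full', format='selex', **other):
    """Validate the documented choices and return keyword arguments.

    `format` shadows the builtin on purpose, because that is the keyword
    name documented for fetchPfamMSA.
    """
    if alignment not in ALIGNMENTS:
        raise ValueError('unknown alignment type: %r' % (alignment,))
    if format not in FORMATS:
        raise ValueError('unknown MSA format: %r' % (format,))
    return dict(alignment=alignment, format=format, **other)


def fetch_msa(acc, **options):
    """Download an MSA and return its path; needs ProDy and network access."""
    from prody import fetchPfamMSA
    return fetchPfamMSA(acc, **msa_options(**options))


# Build options for a gzipped seed alignment in FASTA format.
print(msa_options(alignment='seed', format='fasta', compressed=True))
```

Validating locally gives an immediate error for a mistyped option instead of a failed request.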
parsePfamPDBs(query, data=[], **kwargs)
Returns a list of AtomGroup objects containing sections of chains that correspond to a particular Pfam domain family. These sections are defined by alignment start and end residue numbers.
Parameters:
- query (str) – UniProt ID or PDB ID. If a PDB ID is provided, the corresponding UniProt ID is used. If this returns multiple matches, then start or end must also be provided. This query is also used for label refinement of the Pfam domain MSA.
- data (list) – if given, the data list from the Pfam mapping table will be output through this argument.
- start (int) – residue number defining the start of the domain. The Pfam domain that starts closest to this will be selected. Default is 1.
- end (int) – residue number defining the end of the domain. The Pfam domain that ends closest to this will be selected.
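The "closest start or end" rule described for start and end can be illustrated as follows. This is a sketch of the rule as stated here, not ProDy's implementation: closest_domain, the (name, start, end) tuple layout, and the sample domains are all hypothetical. Only parsePfamPDBs and its start and end keywords come from the documentation.

```python
def closest_domain(domains, start=None, end=None):
    """Pick one domain from a list of (name, start, end) tuples.

    If only `end` is given, the domain that ends closest to it is chosen.
    Otherwise the domain that starts closest to `start` is chosen, with
    `start` falling back to the documented default of 1.
    """
    if not domains:
        raise ValueError('no domains to choose from')
    if start is None and end is not None:
        return min(domains, key=lambda d: abs(d[2] - end))
    if start is None:
        start = 1
    return min(domains, key=lambda d: abs(d[1] - start))


def parse_domain_structures(query, **kwargs):
    """Return AtomGroup objects for a domain; needs ProDy and network access."""
    from prody import parsePfamPDBs
    return parsePfamPDBs(query, **kwargs)


# Made-up domain boundaries for illustration only.
domains = [('domain_1', 5, 120), ('domain_2', 150, 300)]
print(closest_domain(domains, start=140))
print(closest_domain(domains, end=130))
```

For a protein with several Pfam domains, passing start or end to parse_domain_structures in the same way would disambiguate which domain is wanted.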