Atom Selections

Release Notes

The v2.0 series comes with new and improved sequence, structure, and dynamics analysis features. See the release notes for details.

How to Cite

Bakan A, Meireles LM, Bahar I. ProDy: Protein Dynamics Inferred from Theory and Experiments.
Bioinformatics 2011 27(11):1575-1577.

Bakan A, Dutta A, Mao W, Liu Y, Chennubhotla C, Lezon TR, Bahar I. Evol and ProDy for Bridging Protein Sequence Evolution and Structural Dynamics.
Bioinformatics 2014 30(18):2681-2683.

Atom Selections

This module defines a class for selecting subsets of atoms. You can read this page in interactive sessions using help(select).

ProDy offers a fast and powerful atom selection class, Select. Its selection features, grammar, and keywords are similar to those of VMD. The small differences, which are described below, should not affect most practical uses of atom selections. With the added flexibility of Python, the ProDy selection engine can also be used to identify intermolecular contacts. You may see this and other usage examples in Intermolecular Contacts and Operations on Selections.

First, we import everything from ProDy and parse a protein-DNA-ligand complex structure:

In [1]: from prody import *

In [2]: p = parsePDB('3mht')

parsePDB() returns an AtomGroup instance, p in this case, that stores all atomic data in the file. We can count different types of atoms using Atom Flags and the numAtoms() method as follows:

In [3]: p.numAtoms('protein')
Out[3]: 2606

In [4]: p.numAtoms('nucleic')
Out[4]: 509

In [5]: p.numAtoms('hetero')
Out[5]: 96

In [6]: p.numAtoms('water')
Out[6]: 70

The last two counts suggest that the ligand has 26 atoms, i.e. the number of hetero atoms minus the number of water atoms (96 - 70 = 26).

Atom flags

We select a subset of atoms using the AtomGroup.select() method. All Atom Flags can be passed as arguments to this method as follows:

In [7]: p.select('protein')
Out[7]: <Selection: 'protein' from 3mht (2606 atoms)>

In [8]: p.select('water')
Out[8]: <Selection: 'water' from 3mht (70 atoms)>

This operation returns a Selection instance, which can be passed to any function that accepts an atoms argument.

Logical operators

Flags can be combined using 'and' and 'or' operators:

In [9]: p.select('protein and water')

'protein and water' did not select any atoms, so select() returned None and nothing was displayed. This is because no atom is flagged as both a protein atom and a water atom at the same time.

Note

Interpreting selection strings

You may think of a selection string, such as 'protein and water', as being evaluated on a per-atom basis: an atom is selected only if it satisfies the given criterion. To select both water and protein atoms, the 'or' logical operator should be used instead. A protein atom or a water atom would satisfy the 'protein or water' criterion.

In [10]: p.select('protein or water')
Out[10]: <Selection: 'protein or water' from 3mht (2676 atoms)>
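The per-atom reading of 'and' and 'or' can be illustrated with plain Python lists. This is only a sketch: the six atoms and their flags are made up, and ProDy evaluates selections on arrays internally rather than with list comprehensions.

```python
# Hypothetical per-atom flags for six atoms; in ProDy these come from Atom Flags.
is_protein = [True, True, True, False, False, False]
is_water = [False, False, False, True, True, False]

# Each atom is tested on its own, so 'and' requires both flags on the same atom.
protein_and_water = [i for i, (p, w) in enumerate(zip(is_protein, is_water)) if p and w]

# 'or' accepts any atom carrying either flag, i.e. the union of the two sets.
protein_or_water = [i for i, (p, w) in enumerate(zip(is_protein, is_water)) if p or w]

print(protein_and_water)  # []
print(protein_or_water)   # [0, 1, 2, 3, 4]
```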

We can also use the 'not' operator to negate an atom flag. For example, the following selection will only select ligand atoms:

In [11]: p.select('not water and hetero')
Out[11]: <Selection: 'not water and hetero' from 3mht (26 atoms)>

If you omit the 'and' operator, you will get the same result:

In [12]: p.select('not water hetero')
Out[12]: <Selection: 'not water hetero' from 3mht (26 atoms)>

Note

The default operator between two flags, or between other selection tokens that will be discussed later, is 'and'. For example, 'not water hetero' is equivalent to 'not water and hetero'.
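A simplified way to picture the default operator is a pass over the tokens that inserts 'and' wherever two operands sit next to each other. The function insert_default_and below is hypothetical and only handles flags with 'and', 'or', and 'not'; it ignores field values, parentheses, and keywords such as 'within', and it is not how ProDy's parser is implemented.

```python
OPERATORS = {"and", "or", "not"}
BINARY = {"and", "or"}


def insert_default_and(tokens):
    """Insert 'and' between adjacent operands (simplified, flags only)."""
    result = []
    for token in tokens:
        if result:
            prev = result[-1]
            # An operand followed by another operand, or by 'not', gets an implicit 'and'.
            if prev not in OPERATORS and token not in BINARY:
                result.append("and")
        result.append(token)
    return result


print(insert_default_and("not water hetero".split()))
# ['not', 'water', 'and', 'hetero']
print(insert_default_and("acidic calpha".split()))
# ['acidic', 'and', 'calpha']
```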

We can select Cα atoms of acidic residues by omitting the default logical operator as follows:

In [13]: sel = p.select('acidic calpha')

In [14]: sel
Out[14]: <Selection: 'acidic calpha' from 3mht (39 atoms)>

In [15]: set(sel.getResnames())
Out[15]: {'ASP', 'GLU'}

Quick selections

For simple selections, such as the one shown above, the following shorthand may be preferable to the select() method:

In [16]: p.acidic_calpha
Out[16]: <Selection: 'acidic calpha' from 3mht (39 atoms)>

The result is the same as using p.select('acidic calpha'). The underscore, _, is treated as whitespace. The limitation of this approach is that special characters cannot be used, since they are not valid in Python attribute names.
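The attribute-to-string translation can be sketched with Python's __getattr__ hook. QuickSelectDemo is a hypothetical stand-in that only returns the translated selection string; ProDy's own AtomGroup returns a Selection and its implementation may differ.

```python
class QuickSelectDemo:
    """Hypothetical stand-in mapping attribute names to selection strings."""

    def __getattr__(self, name):
        # __getattr__ runs only for attributes not found the normal way.
        # Underscores stand in for the spaces an attribute name cannot contain.
        return name.replace("_", " ")


demo = QuickSelectDemo()
print(demo.acidic_calpha)     # acidic calpha
print(demo.not_water_hetero)  # not water hetero
```

The same mechanism explains the limitation: something like demo.name_C' is not valid Python syntax, so selections that need special characters still require select().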

Atom data fields

In addition to Atom Flags, Atom Data Fields can be used in atom selections when followed by one or more values. For example, we can select Cα and Cβ atoms of alanine residues as follows:

In [17]: p.select('resname ALA name CA CB')
Out[17]: <Selection: 'resname ALA name CA CB' from 3mht (32 atoms)>

Note that we omitted the default 'and' operator.

Note

Whitespace or an empty string can be specified using '_'. Atoms with empty string data fields, such as those with no chain identifier or no alternate location identifier, can be selected using an underscore.

In [18]: p.select('chain _')  # chain identifiers of all atoms are specified in 3mht

In [19]: p.select('altloc _')  # altloc identifiers for all atoms are empty
Out[19]: <Selection: 'altloc _' from 3mht (3211 atoms)>

Numeric data fields can also be used to make selections:

In [20]: p.select('ca resnum 1 2 3 4')
Out[20]: <Selection: 'ca resnum 1 2 3 4' from 3mht (4 atoms)>

Insertion codes are a special case for residue numbers. Residue numbers and insertion codes can be specified together as follows:

'resnum 5' selects residue 5 (all insertion codes)

'resnum 5A' selects residue 5 with insertion code A

'resnum 5_' selects residue 5 with no insertion code
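The three rules above can be written as a small matching function. This is a sketch under the assumption that a residue without an insertion code has icode equal to the empty string; match_resnum is a hypothetical helper, not part of ProDy.

```python
import re


def match_resnum(token, resnum, icode):
    """Return True if (resnum, icode) matches a token such as '5', '5A' or '5_'."""
    m = re.fullmatch(r"(\d+)([A-Za-z_]?)", token)
    if m is None:
        raise ValueError(f"not a residue number token: {token!r}")
    number, code = int(m.group(1)), m.group(2)
    if number != resnum:
        return False
    if code == "":        # '5'  -> any insertion code
        return True
    if code == "_":       # '5_' -> no insertion code only
        return icode == ""
    return icode == code  # '5A' -> insertion code A only


residues = [(5, ""), (5, "A"), (5, "B"), (6, "")]
print([r for r in residues if match_resnum("5", *r)])   # [(5, ''), (5, 'A'), (5, 'B')]
print([r for r in residues if match_resnum("5A", *r)])  # [(5, 'A')]
print([r for r in residues if match_resnum("5_", *r)])  # [(5, '')]
```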

Number ranges

A range of numbers can be specified using 'to' or Python-style slicing with ':':

In [21]: p.select('ca resnum 1to4')
Out[21]: <Selection: 'ca resnum 1to4' from 3mht (4 atoms)>

In [22]: p.select('ca resnum 1:4')
Out[22]: <Selection: 'ca resnum 1:4' from 3mht (3 atoms)>

In [23]: p.select('ca resnum 1:4:2')
Out[23]: <Selection: 'ca resnum 1:4:2' from 3mht (2 atoms)>

Note

Number ranges specify continuous intervals:

'to' is all inclusive, e.g. 'resnum 1 to 4' means '1 <= resnum <= 4'

':' is left inclusive, e.g. 'resnum 1:4' means '1 <= resnum < 4'

Consecutive use of ':', however, specifies a discrete range of numbers, e.g. 'resnum 1:4:2' means 'resnum 1 3'
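The three range forms map onto familiar Python comparisons. The helper names below are hypothetical; the sketch only mirrors the semantics stated in the note, and its results (4, 3 and 2 values for residues 1 to 5) line up with the atom counts shown above.

```python
def select_to(values, lo, hi):
    """'lo to hi': closed interval, lo <= v <= hi."""
    return [v for v in values if lo <= v <= hi]


def select_colon(values, lo, hi):
    """'lo:hi': half-open interval, lo <= v < hi (works for floats too)."""
    return [v for v in values if lo <= v < hi]


def select_colon_step(values, lo, hi, step):
    """'lo:hi:step': a discrete set of numbers, like range(lo, hi, step)."""
    allowed = set(range(lo, hi, step))
    return [v for v in values if v in allowed]


resnums = [1, 2, 3, 4, 5]
print(select_to(resnums, 1, 4))             # [1, 2, 3, 4]
print(select_colon(resnums, 1, 4))          # [1, 2, 3]
print(select_colon_step(resnums, 1, 4, 2))  # [1, 3]
```

Because ':' without a step describes a continuous interval, a value such as 3.5 falls inside '1:4', while '1:4:2' only admits the discrete values 1 and 3.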

Special characters

The following characters can be used when specifying Atom Data Fields in atom selections:

abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789
~@#$.:;_',

For example, "name C' N` O~ C$ C#" is a valid selection string.

Note

Special characters (~!@#$%^&*()-_=+[{}]\|;:,<>./?()'") must be escaped by enclosing them in grave accent characters (`...`).

Negative numbers

Negative numbers and number ranges must also be escaped using grave accent characters, since the negative sign '-' is considered a special character unless it indicates a subtraction operation (see below).

In [24]: p.select('x `-25 to 25`')
Out[24]: <Selection: 'x `-25 to 25`' from 3mht (1941 atoms)>

In [25]: p.select('x `-22.542`')
Out[25]: <Selection: 'x `-22.542`' from 3mht (1 atoms)>

Omitting the grave accent character will cause a SelectionError.
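When selection strings are built programmatically, it is easy to forget the escaping. The hypothetical helper range_selection below wraps a 'to' range in grave accents whenever a bound is negative; it only builds the string, which could then be passed to select().

```python
def range_selection(field, lo, hi):
    """Build a 'field lo to hi' selection string (hypothetical helper).

    The range is wrapped in grave accents when either bound is negative,
    because an unescaped '-' would otherwise be treated as a special character.
    """
    body = f"{lo} to {hi}"
    if lo < 0 or hi < 0:
        body = f"`{body}`"
    return f"{field} {body}"


print(range_selection("x", -25, 25))    # x `-25 to 25`
print(range_selection("resnum", 1, 4))  # resnum 1 to 4
```

For example, p.select(range_selection('x', -25, 25)) would produce the same selection string as In [24] above.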

Regular expressions

Finally, you can use regular expressions to select atoms based on string-type data fields. The following will select residues whose names start with the capital letter A:

In [26]: sel = p.select('resname "A.*"')

In [27]: set(sel.getResnames())
Out[27]: {'ALA', 'ARG', 'ASN', 'ASP'}

Note

Regular expressions can be specified using double quotes, "...". For more information on regular expressions, see the Python re module.
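The same filtering can be reproduced with Python's re module on a made-up list of residue names. Whether ProDy anchors the pattern to the whole name is an assumption here; the sketch uses re.fullmatch, which gives the same result as a start-anchored match for the pattern "A.*".

```python
import re

# Hypothetical residue names, including a water and non-matching residues.
resnames = ["ALA", "GLY", "ARG", "ASN", "HOH", "ASP", "LYS"]

# Assumption: the pattern must match the entire residue name.
pattern = re.compile("A.*")
matching = sorted({name for name in resnames if pattern.fullmatch(name)})
print(matching)  # ['ALA', 'ARG', 'ASN', 'ASP']
```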

Numerical comparisons

Atom Data Fields with numeric types can be used as operands in numerical comparisons:

In [28]: p.select('x < 0')
Out[28]: <Selection: 'x < 0' from 3mht (3095 atoms)>

In [29]: p.select('occupancy = 1')
Out[29]: <Selection: 'occupancy = 1' from 3mht (3211 atoms)>

Comparison	Description
<	less than
>	greater than
<=	less than or equal
>=	greater than or equal
==	equal
=	equal
!=	not equal

It is also possible to chain comparison statements as follows:

In [30]: p.select('-10 <= x < 0')
Out[30]: <Selection: '-10 <= x < 0' from 3mht (557 atoms)>

This would be the same as the following selection:

In [31]: p.select('-10 <= x and x < 0') == p.select('-10 <= x < 0')
Out[31]: True
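Python itself gives chained comparisons the same meaning, so the equivalence above can be checked on a few made-up x coordinates. This only illustrates the semantics; it does not exercise ProDy's parser.

```python
xs = [-12.0, -10.0, -3.5, 0.0, 4.2]

# A chained comparison is shorthand for two comparisons joined by 'and'.
chained = [x for x in xs if -10 <= x < 0]
expanded = [x for x in xs if -10 <= x and x < 0]

print(chained)  # [-10.0, -3.5]
print(chained == expanded)  # True
```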

Furthermore, numerical comparisons may involve the following operations:

Operation	Description
x ** y	x to the power y
x ^ y	x to the power y
x * y	x times y
x / y	x divided by y
x // y	x divided by y (floor division)
x % y	x modulo y
x + y	x plus y
x - y	x minus y

These operations must be used within a numerical comparison, e.g.:

In [32]: p.select('x ** 2 < 10')
Out[32]: <Selection: 'x ** 2 < 10' from 3mht (238 atoms)>

In [33]: p.select('x ** 2 ** 2 < 10')
Out[33]: <Selection: 'x ** 2 ** 2 < 10' from 3mht (134 atoms)>
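The second expression selects fewer atoms because 'x ** 2 ** 2' is x to the fourth power. In Python, ** is right-associative, but for this particular expression both groupings agree, so the result does not depend on how the selection parser groups repeated powers. The worked numbers below use a made-up coordinate.

```python
x = 1.5

# Python groups x ** 2 ** 2 as x ** (2 ** 2), i.e. x ** 4.
right = x ** 2 ** 2
left = (x ** 2) ** 2
print(right, left)  # 5.0625 5.0625

# x ** 4 < 10 holds only for |x| below the fourth root of 10,
# while x ** 2 < 10 holds for |x| below the square root of 10.
print(round(10 ** 0.25, 3))  # 1.778
print(round(10 ** 0.5, 3))   # 3.162
```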

Finally, the following functions can be used in numerical comparisons:

Function	Description
abs(x)	absolute value of x
acos(x)	arccos of x
asin(x)	arcsin of x
atan(x)	arctan of x
ceil(x)	smallest integer not less than x
cos(x)	cosine of x
cosh(x)	hyperbolic cosine of x
floor(x)	largest integer not greater than x
exp(x)	e to the power x
log(x)	natural logarithm of x
log10(x)	base 10 logarithm of x
sin(x)	sine of x
sinh(x)	hyperbolic sine of x
sq(x)	square of x
sqrt(x)	square-root of x
tan(x)	tangent of x
tanh(x)	hyperbolic tangent of x

In [34]: p.select('sqrt(sq(x) + sq(y) + sq(z)) < 100')  # within 100 A of origin
Out[34]: <Selection: 'sqrt(sq(x) + sq(y) + sq(z)) < 100' from 3mht (1975 atoms)>
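The same distance-from-origin criterion can be evaluated per atom in plain Python. The coordinates are made up, and sq is a hypothetical helper mirroring the sq() selection function; math.hypot with three arguments (Python 3.8+) computes the same distance.

```python
import math


def sq(v):
    """Square of v, mirroring the sq() selection function."""
    return v * v


# Hypothetical atom coordinates (x, y, z).
coords = [(3.0, 4.0, 0.0), (60.0, 80.0, 0.0), (90.0, 50.0, 10.0)]

# Per-atom evaluation of 'sqrt(sq(x) + sq(y) + sq(z)) < 100'.
selected = [i for i, (x, y, z) in enumerate(coords)
            if math.sqrt(sq(x) + sq(y) + sq(z)) < 100]
print(selected)  # [0]
```

The second atom lies exactly 100 Å from the origin, so the strict '<' comparison excludes it.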

Distance based selections

Atoms within a user-specified distance (in Å) of a set of user-specified atoms can be selected using the 'within . of .' keyword, e.g. 'within 5 of water' selects atoms that are within 5 Å of water molecules. This setting selects the water atoms themselves as well.

The reference atoms can be left out of the result using the 'exwithin . of .' setting, e.g. 'exwithin 5 of water' will not select water molecules and is equivalent to 'within 5 of water and not water':

In [35]: p.select('exwithin 5 of water') == p.select('not water within 5 of water')
Out[35]: True
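The relationship between 'within' and 'exwithin' can be sketched with Euclidean distances on a few made-up atoms. Whether ProDy treats the cutoff as inclusive is not stated on this page; the sketch uses <= as an assumption, and no atom here sits exactly at the cutoff.

```python
import math

# Hypothetical coordinates keyed by atom index; atom 0 plays the water.
coords = {0: (0.0, 0.0, 0.0), 1: (3.0, 0.0, 0.0),
          2: (10.0, 0.0, 0.0), 3: (0.0, 4.0, 0.0)}
water = {0}


def within(cutoff, ref):
    """Atoms within cutoff of any reference atom; reference atoms are included."""
    return {i for i, xyz in coords.items()
            if any(math.dist(xyz, coords[j]) <= cutoff for j in ref)}


def exwithin(cutoff, ref):
    """Same as within(), with the reference atoms removed."""
    return within(cutoff, ref) - set(ref)


print(sorted(within(5, water)))    # [0, 1, 3]
print(sorted(exwithin(5, water)))  # [1, 3]
```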

Sequence selections

One-letter amino acid sequences can be used to make atom selections. 'sequence SAR' will select SER-ALA-ARG residues in a chain. Note that the selection does not consider connectivity within a chain. Regular expressions can also be used to make selections: 'sequence "MI.*DKQ"' will select the MET-ILE-(XXX)n-ASP-LYS-GLN pattern, if present.

In [36]: sel = p.select('ca sequence "MI.*DKQ"')

In [37]: sel
Out[37]: <Selection: 'ca sequence "MI.*DKQ"' from 3mht (8 atoms)>

In [38]: sel.getResnames()
Out[38]: 
array(['MET', 'ILE', 'GLU', 'ILE', 'LYS', 'ASP', 'LYS', 'GLN'],
      dtype='|S6')
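Conceptually, a sequence selection runs a regular expression over the one-letter sequence of a chain and maps the matched positions back to residues. The chain and the partial three-to-one mapping below are hypothetical, chosen so that the match reproduces the residue names shown above; this is an illustration, not ProDy's implementation.

```python
import re

# Hypothetical chain: three-letter residue names in chain order.
chain = ["SER", "MET", "ILE", "GLU", "ILE", "LYS", "ASP", "LYS", "GLN", "ALA"]
THREE_TO_ONE = {"ALA": "A", "ASP": "D", "GLN": "Q", "GLU": "E",
                "ILE": "I", "LYS": "K", "MET": "M", "SER": "S"}

sequence = "".join(THREE_TO_ONE[name] for name in chain)  # 'SMIEIKDKQA'
match = re.search("MI.*DKQ", sequence)

# Positions in the one-letter string map directly back to residues in the chain.
hit = chain[match.start():match.end()] if match else []
print(hit)  # ['MET', 'ILE', 'GLU', 'ILE', 'LYS', 'ASP', 'LYS', 'GLN']
```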

Expanding selections

A selection can be expanded to include all atoms in the same residue, chain, or segment using the 'same .. as ..' setting, e.g. 'same residue as exwithin 4 of water' will select residues that have at least one atom within 4 Å of any water molecule.

In [39]: p.select('same residue as exwithin 4 of water')
Out[39]: <Selection: 'same residue as...thin 4 of water' from 3mht (1554 atoms)>
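The 'same residue as' expansion amounts to collecting the residues of the selected atoms and then taking every atom in those residues. In this sketch, a residue is identified by a hypothetical (chain, resnum) pair; ProDy may use additional fields, such as insertion codes or segments, to tell residues apart.

```python
# Hypothetical per-atom residue identifiers (chain, resnum) for six atoms.
residue_of = [("A", 1), ("A", 1), ("A", 2), ("A", 2), ("A", 3), ("B", 1)]


def same_residue_as(selected):
    """Expand a set of atom indices to every atom in the same residues."""
    residues = {residue_of[i] for i in selected}
    return [i for i, res in enumerate(residue_of) if res in residues]


print(same_residue_as({1, 4}))  # [0, 1, 4]
```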

Additionally, a selection may be expanded to the immediately bonded atoms using the 'bonded [n] to ...' setting, e.g. 'bonded 1 to calpha' will select atoms bonded to Cα atoms. For this setting to work, bonds must first be set by the user with the AtomGroup.setBonds() or AtomGroup.inferBonds() method. It is also possible to select bonded atoms while excluding the originating atoms using the 'exbonded [n] to ...' setting. The number [n] indicates how many bonds away from the originating selection to consider.
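One way to picture 'bonded [n] to' is a breadth-first search over the bond graph that stops n bonds away from the originating atoms. This sketch assumes that n means "up to n bonds" and that the originating atoms are kept (and dropped for exbonded); the bond list and helper names are hypothetical.

```python
from collections import deque

# Hypothetical bonds for a linear chain of five atoms: 0-1-2-3-4.
bonds = [(0, 1), (1, 2), (2, 3), (3, 4)]
neighbors = {}
for a, b in bonds:
    neighbors.setdefault(a, set()).add(b)
    neighbors.setdefault(b, set()).add(a)


def bonded(n, origin):
    """Atoms reachable from origin in at most n bonds, origin included."""
    seen = set(origin)
    frontier = deque((atom, 0) for atom in origin)
    while frontier:
        atom, depth = frontier.popleft()
        if depth == n:
            continue  # do not walk further than n bonds
        for nb in neighbors.get(atom, ()):
            if nb not in seen:
                seen.add(nb)
                frontier.append((nb, depth + 1))
    return seen


def exbonded(n, origin):
    """Like bonded(), but with the originating atoms excluded."""
    return bonded(n, origin) - set(origin)


print(sorted(bonded(1, {2})))    # [1, 2, 3]
print(sorted(exbonded(1, {2})))  # [1, 3]
print(sorted(bonded(2, {0})))    # [0, 1, 2]
```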

Selection macros

ProDy allows you to define a macro for any valid selection string. The following functions manipulate selection macros:

defSelectionMacro()

delSelectionMacro()

getSelectionMacro()

isSelectionMacro()

In [40]: defSelectionMacro('alanine', 'resname ALA')

In [41]: p.select('alanine') == p.select('resname ALA')
Out[41]: True

You can also use this macro as follows:

In [42]: p.alanine
Out[42]: <Selection: 'alanine' from 3mht (80 atoms)>

Macros are stored permanently in the ProDy configuration file. You can delete them if you wish, as follows:

In [43]: delSelectionMacro('alanine')
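A simple mental model of a macro is whole-word textual substitution of the macro name by its parenthesized definition. The in-memory dictionary and expand_macros below are hypothetical; ProDy keeps macros in its configuration file and its parser may resolve them differently (for example, this sketch would also rewrite words inside quoted regular expressions).

```python
import re

# Hypothetical in-memory macro table.
macros = {"alanine": "resname ALA"}


def expand_macros(selstr, macros):
    """Replace whole-word macro names with their parenthesized definitions."""
    def substitute(match):
        word = match.group(0)
        return f"({macros[word]})" if word in macros else word
    return re.sub(r"\w+", substitute, selstr)


print(expand_macros("alanine and name CA", macros))
# (resname ALA) and name CA
```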

Keyword arguments

The select() method also accepts keyword arguments that can simplify some selections. Consider the following case, where you want to select protein atoms that are close to the protein's center:

In [44]: protein = p.protein

In [45]: calcCenter(protein).round(2)
Out[45]: array([-21.17,  35.86,  79.97])

In [46]: sel1 = protein.select('sqrt(sq(x--21.17) + sq(y-35.86) + sq(z-79.97)) < 5')

In [47]: sel1
Out[47]: <Selection: '(sqrt(sq(x--21....) and (protein)' from 3mht (20 atoms)>

Instead, you could pass a keyword argument and use the keyword in the selection string:

In [48]: sel2 = protein.select('within 5 of center', center=calcCenter(protein))

In [49]: sel2
Out[49]: <Selection: 'index 1452 to 1...33 2935 to 2944' from 3mht (20 atoms)>

In [50]: sel1 == sel2
Out[50]: True

Note that the selection string for sel2 lists atom indices. This substitution is performed automatically so that the selection can be reproduced without the center keyword argument.
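The index string shown for sel2 is truncated in the display, so its exact format cannot be read off here. As an illustration of the idea only, a hypothetical helper can compress a list of atom indices into 'a to b' runs; the indices used below are made up and ProDy's own formatting may differ.

```python
def indices_to_selstr(indices):
    """Compress atom indices into an 'index ...' selection string.

    Hypothetical helper: runs of consecutive indices become 'a to b'
    and isolated indices are written on their own.
    """
    runs = []
    for i in sorted(set(indices)):
        if runs and i == runs[-1][1] + 1:
            runs[-1][1] = i          # extend the current run
        else:
            runs.append([i, i])      # start a new run
    parts = [str(a) if a == b else f"{a} to {b}" for a, b in runs]
    return "index " + " ".join(parts)


print(indices_to_selstr([1452, 1453, 1454, 1460, 2935, 2936]))
# index 1452 to 1454 1460 2935 to 2936
```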

Keywords cannot be reserved words (see listReservedWords()) and must consist of alphanumeric characters only.

exception SelectionError(sel, loc=0, msg='', tkns=None): Exception raised when there are errors in the selection string.

exception SelectionWarning(sel='', loc=0, msg='', tkns=None): A class used for issuing warning messages when potential typos are detected in a selection string. Warnings are issued to sys.stderr via the ProDy package logger. Use confProDy() to turn selection warnings on or off, e.g. confProDy(selection_warning=False).

class Select

Select subsets of atoms based on a selection string. See the select module documentation for selection grammar and examples. This class makes use of the pyparsing module.

getBoolArray(atoms, selstr, **kwargs): Returns a boolean array with True values for atoms matching selstr. The length of the boolean numpy.ndarray equals the number of atoms in the atoms argument.

getIndices(atoms, selstr, **kwargs): Returns indices of atoms matching selstr. Indices correspond to the order in the atoms argument. If atoms is a subset of atoms, the returned indices should not be used for indexing the corresponding AtomGroup instance.
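The caution about subsets can be made concrete with plain lists. In this sketch, the atoms passed in are assumed to be atoms 10, 11 and 15 of a parent AtomGroup (made-up numbers); the positions of True entries in the boolean array are positions within the subset, so they must be mapped back before they refer to parent atoms.

```python
# Hypothetical subset: atoms 10, 11 and 15 of the parent AtomGroup, in that order.
subset_parent_indices = [10, 11, 15]

# A boolean array aligned with the atoms argument, one entry per atom.
bool_array = [False, True, True]

# Positions of True entries are indices within the subset, not the parent.
local_indices = [i for i, flag in enumerate(bool_array) if flag]
parent_indices = [subset_parent_indices[i] for i in local_indices]

print(local_indices)   # [1, 2]
print(parent_indices)  # [11, 15]
```

Using [1, 2] to index the parent AtomGroup would pick atoms 1 and 2 rather than the intended atoms 11 and 15.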

select(atoms, selstr, **kwargs)

Returns a Selection of atoms matching selstr, or None if the selection string does not match any atoms.

Parameters:
atoms (Atomic) – atoms to be evaluated
selstr (str) – selection string

Note that if atoms is an AtomMap instance, an AtomMap is returned instead of a Selection.

defSelectionMacro(name, selstr)

Define selection macro selstr with name name. Both name and selstr must be strings. An existing keyword cannot be used as a macro name. If a macro with the given name exists, it will be overwritten.

In [1]: defSelectionMacro('cbeta', 'name CB and protein')

delSelectionMacro(name)

Delete the macro name.

In [1]: delSelectionMacro('cbeta')

getSelectionMacro(name=None): Returns the definition of the macro name. If name is not given, returns a copy of the selection macros dictionary.

isSelectionMacro(word): Returns True if word is a user defined selection macro.