Atom Selections¶
This module defines a class for selecting subsets of atoms. You can read this page in interactive sessions using help(select).
ProDy offers a fast and powerful atom selection class, Select.
Selection features, grammar, and keywords are similar to those of VMD.
Small differences, which are described below, should not affect most practical
uses of atom selections. With the added flexibility of Python, the ProDy selection
engine can also be used to identify intermolecular contacts. You may see
this and other usage examples in Intermolecular Contacts and
Operations on Selections.
First, we import everything from ProDy and parse a protein-DNA-ligand complex structure:
In [1]: from prody import *
In [2]: p = parsePDB('3mht')
parsePDB() returns an AtomGroup instance, p in this case, that stores all atomic data in the file. We can count different types of atoms using Atom Flags and the numAtoms() method as follows:
In [3]: p.numAtoms('protein')
Out[3]: 2606
In [4]: p.numAtoms('nucleic')
Out[4]: 509
In [5]: p.numAtoms('hetero')
Out[5]: 96
In [6]: p.numAtoms('water')
Out[6]: 70
The last two counts suggest that the ligand has 26 atoms, i.e. the number of hetero atoms (96) less the number of water atoms (70).
Atom flags¶
We select a subset of atoms by using the AtomGroup.select() method.
All Atom Flags can be input arguments to this method as follows:
In [7]: p.select('protein')
Out[7]: <Selection: 'protein' from 3mht (2606 atoms)>
In [8]: p.select('water')
Out[8]: <Selection: 'water' from 3mht (70 atoms)>
This operation returns Selection instances, which can be passed to functions that accept an atoms argument.
Logical operators¶
Flags can be combined using the 'and' and 'or' operators:
In [9]: p.select('protein and water')
'protein and water' did not select any atoms (the call returns nothing, so no output is shown). This is because no atom is flagged as both a protein atom and a water atom at the same time.
Note
Interpreting selection strings
You may think of a selection string, such as 'protein and water', as being
evaluated on a per-atom basis: an atom is selected if it satisfies the
given criterion. To select both water and protein atoms, the 'or' logical
operator should be used instead. A protein atom or a water atom would satisfy the
'protein or water' criterion.
In [10]: p.select('protein or water')
Out[10]: <Selection: 'protein or water' from 3mht (2676 atoms)>
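The per-atom reading of 'and' and 'or' can be illustrated with plain Python. This is only a sketch: the atom records and the select() helper below are made up for illustration and do not reflect how ProDy evaluates selection strings internally.

```python
# Toy atoms, each carrying a set of flags (hypothetical data, not from 3mht).
atoms = [
    {"name": "CA", "flags": {"protein"}},
    {"name": "N", "flags": {"protein"}},
    {"name": "O", "flags": {"water", "hetero"}},
    {"name": "C1", "flags": {"hetero"}},
]


def select(atoms, predicate):
    """Return names of atoms whose flags satisfy the per-atom predicate."""
    return [a["name"] for a in atoms if predicate(a["flags"])]


# 'protein and water': no single atom carries both flags.
both = select(atoms, lambda f: "protein" in f and "water" in f)
# 'protein or water': an atom with either flag qualifies.
either = select(atoms, lambda f: "protein" in f or "water" in f)

print(both)    # []
print(either)  # ['CA', 'N', 'O']
```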
We can also use the 'not' operator to negate an atom flag. For example,
the following selection will only select ligand atoms:
In [11]: p.select('not water and hetero')
Out[11]: <Selection: 'not water and hetero' from 3mht (26 atoms)>
If you omit the 'and' operator, you will get the same result:
In [12]: p.select('not water hetero')
Out[12]: <Selection: 'not water hetero' from 3mht (26 atoms)>
Note
The default operator between two flags, or other selection tokens that will
be discussed later, is 'and'. For example, 'not water hetero'
is equivalent to 'not water and hetero'.
We can select Cα atoms of acidic residues by omitting the default logical operator as follows:
In [13]: sel = p.select('acidic calpha')
In [14]: sel
Out[14]: <Selection: 'acidic calpha' from 3mht (39 atoms)>
In [15]: set(sel.getResnames())
Out[15]: {'ASP', 'GLU'}
Quick selections¶
For simple selections, such as those shown above, the following may be preferable to
the select() method:
In [16]: p.acidic_calpha
Out[16]: <Selection: 'acidic calpha' from 3mht (39 atoms)>
The result is the same as using p.select('acidic calpha'). An underscore, _, is treated as whitespace. The limitation of this approach is that
special characters cannot be used.
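One way such attribute-style selections could be wired up is through Python's __getattr__ hook. The sketch below is an illustration under that assumption, not ProDy's actual code; the ToyAtoms class and its echoing select() method are hypothetical.

```python
class ToyAtoms:
    """Hypothetical stand-in for an atom container with a select() method."""

    def select(self, selstr):
        # A real implementation would parse selstr; here we only echo it.
        return "<Selection: %r>" % selstr

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails.
        if name.startswith("_"):
            raise AttributeError(name)
        # Underscores in the attribute name stand in for whitespace.
        return self.select(name.replace("_", " "))


toy = ToyAtoms()
print(toy.acidic_calpha)  # <Selection: 'acidic calpha'>
```

Because an attribute name must be a valid Python identifier, characters such as quotes, colons, or grave accents cannot appear in it, which is consistent with the limitation noted above.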
Atom data fields¶
In addition to Atom Flags, Atom Data Fields can be used in atom selections when combined with some values. For example, we can select Cα and Cβ atoms of alanine residues as follows:
In [17]: p.select('resname ALA name CA CB')
Out[17]: <Selection: 'resname ALA name CA CB' from 3mht (32 atoms)>
Note that we omitted the default 'and' operator.
Note
Whitespace or an empty string can be specified using an '_'.
Atoms with empty string data fields, such as those with no chain
identifiers or alternate location identifiers, can be selected using
an underscore.
In [18]: p.select('chain _') # chain identifiers of all atoms are specified in 3mht
In [19]: p.select('altloc _') # altloc identifiers for all atoms are empty
Out[19]: <Selection: 'altloc _' from 3mht (3211 atoms)>
Numeric data fields can also be used to make selections:
In [20]: p.select('ca resnum 1 2 3 4')
Out[20]: <Selection: 'ca resnum 1 2 3 4' from 3mht (4 atoms)>
Residues with insertion codes are a special case. Residue numbers and insertion codes can be specified together as follows:
'resnum 5' selects residue 5 (all insertion codes)
'resnum 5A' selects residue 5 with insertion code A
'resnum 5_' selects residue 5 with no insertion code
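The three forms can be summarized with a small parser. This is a sketch of the rules as listed here, with hypothetical helper names (parse_resnum_token, matches); it is not taken from ProDy.

```python
import re


def parse_resnum_token(token):
    """Split a token like '5', '5A' or '5_' into (number, insertion code).

    An insertion code of None means "any insertion code";
    an empty string means "no insertion code".
    """
    m = re.fullmatch(r"(-?\d+)([A-Za-z_]?)", token)
    if m is None:
        raise ValueError("not a residue number token: %r" % token)
    number, icode = int(m.group(1)), m.group(2)
    if icode == "":
        return number, None  # 'resnum 5'  -> all insertion codes
    if icode == "_":
        return number, ""    # 'resnum 5_' -> no insertion code
    return number, icode     # 'resnum 5A' -> insertion code A


def matches(token, resnum, icode):
    """True if a residue (resnum, icode) satisfies the token."""
    number, wanted = parse_resnum_token(token)
    return resnum == number and (wanted is None or wanted == icode)


print(matches("5", 5, "A"))   # True
print(matches("5_", 5, "A"))  # False
```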
Number ranges¶
A range of numbers can be specified using 'to' or Python-style slicing with ':':
In [21]: p.select('ca resnum 1to4')
Out[21]: <Selection: 'ca resnum 1to4' from 3mht (4 atoms)>
In [22]: p.select('ca resnum 1:4')
Out[22]: <Selection: 'ca resnum 1:4' from 3mht (3 atoms)>
In [23]: p.select('ca resnum 1:4:2')
Out[23]: <Selection: 'ca resnum 1:4:2' from 3mht (2 atoms)>
Note
Number ranges specify continuous intervals:
'to' is all inclusive, e.g. 'resnum 1 to 4' means '1 <= resnum <= 4'
':' is left inclusive, e.g. 'resnum 1:4' means '1 <= resnum < 4'
Consecutive use of ':', however, specifies a discrete range of numbers, e.g. 'resnum 1:4:2' means 'resnum 1 3'
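The three range forms correspond to the following plain-Python tests. The helper names are hypothetical and the snippet only mirrors the semantics stated in the note; it is not ProDy code.

```python
def in_to_range(value, start, stop):
    """'start to stop': both ends inclusive."""
    return start <= value <= stop


def in_colon_range(value, start, stop):
    """'start:stop': left inclusive, right exclusive."""
    return start <= value < stop


def in_stepped_range(value, start, stop, step):
    """'start:stop:step': discrete values, like Python's range()."""
    return value in range(start, stop, step)


resnums = [1, 2, 3, 4, 5]
print([r for r in resnums if in_to_range(r, 1, 4)])          # [1, 2, 3, 4]
print([r for r in resnums if in_colon_range(r, 1, 4)])       # [1, 2, 3]
print([r for r in resnums if in_stepped_range(r, 1, 4, 2)])  # [1, 3]
```

The difference between a continuous interval and a discrete range matters for non-integer data: a value of 3.5 lies inside the interval 1:4 but is not a member of the discrete range 1:4:2.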
Special characters¶
The following characters can be specified when using Atom Data Fields for atom selections:
abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
0123456789
~@#$.:;_',
For example, "name C' N` O~ C$ C#" is a valid selection string.
Note
Special characters (~!@#$%^&*()-_=+[{}]\|;:,<>./?()'") must be escaped using grave accent characters (``).
Negative numbers¶
Negative numbers and number ranges must also be escaped using grave accent
characters, since the negative sign '-' is considered a special character
unless it indicates a subtraction operation (see below).
In [24]: p.select('x `-25 to 25`')
Out[24]: <Selection: 'x `-25 to 25`' from 3mht (1941 atoms)>
In [25]: p.select('x `-22.542`')
Out[25]: <Selection: 'x `-22.542`' from 3mht (1 atoms)>
Omitting the grave accent characters will cause a SelectionError.
Regular expressions¶
Finally, you can specify regular expressions to select atoms based on data fields of string type. The following will select residues whose names start with the capital letter A:
In [26]: sel = p.select('resname "A.*"')
In [27]: set(sel.getResnames())
Out[27]: {'ALA', 'ARG', 'ASN', 'ASP'}
Note
Regular expressions can be specified using double quotes, "...".
For more information on regular expressions see re.
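The effect of the "A.*" pattern can be reproduced with Python's re module on a plain list of names. The residue names below are made-up sample data, and anchoring the match at the start of the name is an assumption that agrees with the "start with A" description above.

```python
import re

resnames = ["ALA", "ARG", "GLY", "ASP", "HOH", "SAH"]
pattern = re.compile("A.*")

# Assumption: a name is selected when the pattern matches from its start,
# so 'SAH' is rejected even though it contains an 'A'.
selected = sorted({name for name in resnames if pattern.match(name)})
print(selected)  # ['ALA', 'ARG', 'ASP']
```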
Numerical comparisons¶
Atom Data Fields with numeric types can be used as operands in numerical comparisons:
In [28]: p.select('x < 0')
Out[28]: <Selection: 'x < 0' from 3mht (3095 atoms)>
In [29]: p.select('occupancy = 1')
Out[29]: <Selection: 'occupancy = 1' from 3mht (3211 atoms)>
Comparison | Description |
---|---|
< | less than |
> | greater than |
<= | less than or equal |
>= | greater than or equal |
== | equal |
= | equal |
!= | not equal |
It is also possible to chain comparison statements as follows:
In [30]: p.select('-10 <= x < 0')
Out[30]: <Selection: '-10 <= x < 0' from 3mht (557 atoms)>
This would be the same as the following selection:
In [31]: p.select('-10 <= x and x < 0') == p.select('-10 <= x < 0')
Out[31]: True
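Python itself treats chained comparisons the same way, which may help when reading such selection strings. The coordinate values below are made up for illustration.

```python
xs = [-12.0, -10.0, -3.5, 0.0, 4.2]  # hypothetical x coordinates

# Chained form and its spelled-out equivalent.
chained = [x for x in xs if -10 <= x < 0]
spelled_out = [x for x in xs if -10 <= x and x < 0]

print(chained)                 # [-10.0, -3.5]
print(chained == spelled_out)  # True
```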
Furthermore, numerical comparisons may involve the following operations:
Operation | Description |
---|---|
x ** y | x to the power y |
x ^ y | x to the power y |
x * y | x times y |
x / y | x divided by y |
x // y | x divided by y (floor division) |
x % y | x modulo y |
x + y | x plus y |
x - y | x minus y |
These operations must be used with a numerical comparison, e.g.
In [32]: p.select('x ** 2 < 10')
Out[32]: <Selection: 'x ** 2 < 10' from 3mht (238 atoms)>
In [33]: p.select('x ** 2 ** 2 < 10')
Out[33]: <Selection: 'x ** 2 ** 2 < 10' from 3mht (134 atoms)>
Finally, the following functions can be used in numerical comparisons:
Function | Description |
---|---|
abs(x) | absolute value of x |
acos(x) | arccos of x |
asin(x) | arcsin of x |
atan(x) | arctan of x |
ceil(x) | smallest integer not less than x |
cos(x) | cosine of x |
cosh(x) | hyperbolic cosine of x |
floor(x) | largest integer not greater than x |
exp(x) | e to the power x |
log(x) | natural logarithm of x |
log10(x) | base 10 logarithm of x |
sin(x) | sine of x |
sinh(x) | hyperbolic sine of x |
sq(x) | square of x |
sqrt(x) | square-root of x |
tan(x) | tangent of x |
tanh(x) | hyperbolic tangent of x |
In [34]: p.select('sqrt(sq(x) + sq(y) + sq(z)) < 100') # within 100 A of origin
Out[34]: <Selection: 'sqrt(sq(x) + sq(y) + sq(z)) < 100' from 3mht (1975 atoms)>
Distance based selections¶
Atoms within a user-specified distance (Å) of a set of user-specified atoms
can be selected using the 'within . of .' keyword, e.g. 'within 5 of water'
selects atoms that are within 5 Å of water molecules. This selection
includes the water atoms themselves.
To avoid selecting the specified atoms, use the 'exwithin . of .' setting,
e.g. 'exwithin 5 of water' will not select water molecules and is
equivalent to 'within 5 of water and not water':
In [35]: p.select('exwithin 5 of water') == p.select('not water within 5 of water')
Out[35]: True
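The relationship between the two keywords can be sketched with a brute-force distance check. The function names and coordinates are hypothetical, the inclusive cutoff (<=) is an assumption, and ProDy's own implementation may work quite differently.

```python
import math


def within(radius, coords, targets):
    """Indices of atoms within radius of any atom whose index is in targets."""
    return {
        i
        for i, c in enumerate(coords)
        if any(math.dist(c, coords[t]) <= radius for t in targets)
    }


def exwithin(radius, coords, targets):
    """Same as within(), but excluding the target atoms themselves."""
    return within(radius, coords, targets) - set(targets)


# Four atoms on a line; atom 0 plays the role of a water atom.
coords = [(0, 0, 0), (3, 0, 0), (9, 0, 0), (20, 0, 0)]
water = [0]

print(sorted(within(5, coords, water)))    # [0, 1]
print(sorted(exwithin(5, coords, water)))  # [1]
```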
Sequence selections¶
One-letter amino acid sequences can be used to make atom selections.
'sequence SAR' will select SER-ALA-ARG residues in a chain. Note
that the selection does not consider connectivity within a chain. Regular
expressions can also be used to make selections: 'sequence "MI.*DKQ"' will
select the MET-ILE-(XXX)n-ASP-LYS-GLN pattern, if present.
In [36]: sel = p.select('ca sequence "MI.*DKQ"')
In [37]: sel
Out[37]: <Selection: 'ca sequence "MI.*DKQ"' from 3mht (8 atoms)>
In [38]: sel.getResnames()
Out[38]:
array(['MET', 'ILE', 'GLU', 'ILE', 'LYS', 'ASP', 'LYS', 'GLN'],
dtype='|S6')
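A rough picture of how a sequence pattern maps back to residues: build the one-letter string for a chain, search it, and collect the residue positions each match covers. The helper name, the partial three-to-one table, and the toy chain are assumptions made for illustration, not ProDy internals.

```python
import re

# Partial three-letter to one-letter map, enough for this example.
THREE_TO_ONE = {
    "MET": "M", "ILE": "I", "GLU": "E", "LYS": "K", "ASP": "D",
    "GLN": "Q", "SER": "S", "ALA": "A", "ARG": "R",
}


def select_by_sequence(resnames, pattern):
    """Return positions of residues covered by matches of pattern."""
    sequence = "".join(THREE_TO_ONE.get(name, "X") for name in resnames)
    hits = []
    for m in re.finditer(pattern, sequence):
        hits.extend(range(m.start(), m.end()))
    return hits


chain = ["SER", "MET", "ILE", "GLU", "ILE", "LYS", "ASP", "LYS", "GLN", "ALA"]
idx = select_by_sequence(chain, "MI.*DKQ")
print([chain[i] for i in idx])
# ['MET', 'ILE', 'GLU', 'ILE', 'LYS', 'ASP', 'LYS', 'GLN']
```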
Expanding selections¶
A selection can be expanded to include the atoms in the same residue,
chain, or segment using the 'same .. as ..' setting, e.g.
'same residue as exwithin 4 of water' will select residues that have
at least one atom within 4 Å of any water molecule.
In [39]: p.select('same residue as exwithin 4 of water')
Out[39]: <Selection: 'same residue as...thin 4 of water' from 3mht (1554 atoms)>
Additionally, a selection may be expanded to the immediately bonded atoms using the
'bonded [n] to ...' setting, e.g. 'bonded 1 to calpha' will select atoms
bonded to Cα atoms. For this setting to work, bonds must be set by the user
using the AtomGroup.setBonds() or AtomGroup.inferBonds() method.
It is also possible to select bonded atoms while excluding the originating atoms
using the 'exbonded [n] to ...' setting. The number '[n]' indicates the number of
bonds to consider from the originating selection.
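The role of '[n]' can be pictured as a breadth-first expansion over the bond graph. The sketch below uses hypothetical function names and a made-up bond list, and it assumes that the plain 'bonded' form keeps the originating atoms in the result, as the contrast with 'exbonded' suggests.

```python
def bonded(n, start, bonds):
    """Atoms reachable from start within n bonds, including start itself."""
    neighbors = {}
    for a, b in bonds:
        neighbors.setdefault(a, set()).add(b)
        neighbors.setdefault(b, set()).add(a)
    reached = set(start)
    frontier = set(start)
    for _ in range(n):
        # Step one bond outward from the current frontier.
        frontier = {
            nb for atom in frontier for nb in neighbors.get(atom, ())
        } - reached
        reached |= frontier
    return reached


def exbonded(n, start, bonds):
    """Same as bonded(), but without the originating atoms."""
    return bonded(n, start, bonds) - set(start)


bonds = [("N", "CA"), ("CA", "C"), ("CA", "CB"), ("C", "O"), ("CB", "CG")]
print(sorted(bonded(1, ["CA"], bonds)))    # ['C', 'CA', 'CB', 'N']
print(sorted(exbonded(1, ["CA"], bonds)))  # ['C', 'CB', 'N']
```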
Selection macros¶
ProDy allows you to define a macro for any valid selection string. The functions below manipulate selection macros:
In [40]: defSelectionMacro('alanine', 'resname ALA')
In [41]: p.select('alanine') == p.select('resname ALA')
Out[41]: True
You can also use this macro as follows:
In [42]: p.alanine
Out[42]: <Selection: 'alanine' from 3mht (80 atoms)>
Macros are stored permanently in the ProDy configuration file. You can delete them if you wish as follows:
In [43]: delSelectionMacro('alanine')
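Conceptually, a macro is a named shortcut for a longer selection string. The toy registry below illustrates that idea only: def_macro, expand, and the RESERVED set are hypothetical, nothing is written to a configuration file, and ProDy's real handling of macros is not shown here.

```python
RESERVED = {"protein", "water", "and", "or", "not"}  # illustrative subset
macros = {}


def def_macro(name, selstr):
    """Register a macro; an existing macro of the same name is overwritten."""
    if name in RESERVED:
        raise ValueError("%r is an existing keyword" % name)
    macros[name] = selstr


def expand(selstr):
    """Replace whole-word macro names with their parenthesized definitions."""
    return " ".join(
        "(%s)" % macros[word] if word in macros else word
        for word in selstr.split()
    )


def_macro("alanine", "resname ALA")
print(expand("alanine and name CA"))  # (resname ALA) and name CA
```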
Keyword arguments¶
The select() method also accepts keyword arguments that can simplify
some selections. Consider the following case, where you want to select
protein atoms that are close to the center of the protein:
In [44]: protein = p.protein
In [45]: calcCenter(protein).round(2)
Out[45]: array([-21.17, 35.86, 79.97])
In [46]: sel1 = protein.select('sqrt(sq(x--21.17) + sq(y-35.86) + sq(z-79.97)) < 5')
In [47]: sel1
Out[47]: <Selection: '(sqrt(sq(x--21....) and (protein)' from 3mht (20 atoms)>
Instead, you could pass a keyword argument and use the keyword in the selection string:
In [48]: sel2 = protein.select('within 5 of center', center=calcCenter(protein))
In [49]: sel2
Out[49]: <Selection: 'index 1452 to 1...33 2935 to 2944' from 3mht (20 atoms)>
In [50]: sel1 == sel2
Out[50]: True
Note that the selection string for sel2 lists indices of atoms. This substitution is performed automatically to ensure that the selection can be reproduced without the keyword center.
Keywords cannot be reserved words (see listReservedWords()) and must consist of
alphanumeric characters only.
exception SelectionError(sel, loc=0, msg='', tkns=None)¶
Exception raised when there are errors in the selection string.
exception SelectionWarning(sel='', loc=0, msg='', tkns=None)¶
A class used for issuing warning messages when potential typos are detected in a selection string. Warnings are issued to sys.stderr via the ProDy package logger. Use confProDy() to turn selection warnings on or off, e.g. confProDy(selection_warning=False).
class Select¶
Select subsets of atoms based on a selection string. See the select module documentation for selection grammar and examples. This class makes use of the pyparsing module.
    getBoolArray(atoms, selstr, **kwargs)¶
    Returns a boolean array with True values for atoms matching selstr. The length of the boolean numpy.ndarray will be equal to the length of the atoms argument.
    getIndices(atoms, selstr, **kwargs)¶
    Returns indices of atoms matching selstr. Indices correspond to the order in the atoms argument. If atoms is a subset of an AtomGroup, the indices should not be used for indexing the corresponding AtomGroup instance.
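The caveat about subsets can be illustrated with plain lists. The sketch assumes, following the description above, that returned indices are positions within the atoms argument; the names and data are stand-ins, not ProDy objects.

```python
all_names = ["N", "CA", "C", "O", "CB", "CA"]  # stand-in for an AtomGroup
subset_indices = [3, 4, 5]                     # stand-in for a Selection
subset_names = [all_names[i] for i in subset_indices]  # ['O', 'CB', 'CA']

# Positions of 'CA' atoms relative to the subset.
local = [i for i, name in enumerate(subset_names) if name == "CA"]
print(local)                # [2]

# Using a local position on the full list picks the wrong atom.
print(all_names[local[0]])  # C

# Mapping through the subset's own indices gives the intended atom.
global_idx = [subset_indices[i] for i in local]
print(global_idx)           # [5]
```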
defSelectionMacro(name, selstr)¶
Define selection macro selstr with name name. Both name and selstr must be strings. An existing keyword cannot be used as a macro name. If a macro with the given name exists, it will be overwritten.
In [1]: defSelectionMacro('cbeta', 'name CB and protein')