1 | \section{\module{sets} ---
|
---|
2 | Unordered collections of unique elements}
|
---|
3 |
|
---|
4 | \declaremodule{standard}{sets}
|
---|
5 | \modulesynopsis{Implementation of sets of unique elements.}
|
---|
6 | \moduleauthor{Greg V. Wilson}{gvwilson@nevex.com}
|
---|
7 | \moduleauthor{Alex Martelli}{aleax@aleax.it}
|
---|
8 | \moduleauthor{Guido van Rossum}{guido@python.org}
|
---|
9 | \sectionauthor{Raymond D. Hettinger}{python@rcn.com}
|
---|
10 |
|
---|
11 | \versionadded{2.3}
|
---|
12 |
|
---|
13 | The \module{sets} module provides classes for constructing and manipulating
|
---|
14 | unordered collections of unique elements. Common uses include membership
|
---|
15 | testing, removing duplicates from a sequence, and computing standard math
|
---|
16 | operations on sets such as intersection, union, difference, and symmetric
|
---|
17 | difference.
|
---|
18 |
|
---|
19 | Like other collections, sets support \code{\var{x} in \var{set}},
|
---|
20 | \code{len(\var{set})}, and \code{for \var{x} in \var{set}}. Being an
|
---|
21 | unordered collection, sets do not record element position or order of
|
---|
22 | insertion. Accordingly, sets do not support indexing, slicing, or
|
---|
23 | other sequence-like behavior.
|
---|
24 |
|
---|
25 | Most set applications use the \class{Set} class which provides every set
|
---|
26 | method except for \method{__hash__()}. For advanced applications requiring
|
---|
27 | a hash method, the \class{ImmutableSet} class adds a \method{__hash__()}
|
---|
28 | method but omits methods which alter the contents of the set. Both
|
---|
29 | \class{Set} and \class{ImmutableSet} derive from \class{BaseSet}, an
|
---|
30 | abstract class useful for determining whether something is a set:
|
---|
31 | \code{isinstance(\var{obj}, BaseSet)}.
|
---|
32 |
|
---|
33 | The set classes are implemented using dictionaries. Accordingly, the
|
---|
34 | requirements for set elements are the same as those for dictionary keys;
|
---|
35 | namely, that the element defines both \method{__eq__} and \method{__hash__}.
|
---|
36 | As a result, sets
|
---|
37 | cannot contain mutable elements such as lists or dictionaries.
|
---|
38 | However, they can contain immutable collections such as tuples or
|
---|
39 | instances of \class{ImmutableSet}. For convenience in implementing
|
---|
40 | sets of sets, inner sets are automatically converted to immutable
|
---|
41 | form, for example, \code{Set([Set(['dog'])])} is transformed to
|
---|
42 | \code{Set([ImmutableSet(['dog'])])}.
|
---|
43 |
|
---|
44 | \begin{classdesc}{Set}{\optional{iterable}}
|
---|
45 | Constructs a new empty \class{Set} object. If the optional \var{iterable}
|
---|
46 | parameter is supplied, updates the set with elements obtained from iteration.
|
---|
47 | All of the elements in \var{iterable} should be immutable or be transformable
|
---|
48 | to an immutable using the protocol described in
|
---|
49 | section~\ref{immutable-transforms}.
|
---|
50 | \end{classdesc}
|
---|
51 |
|
---|
52 | \begin{classdesc}{ImmutableSet}{\optional{iterable}}
|
---|
53 | Constructs a new empty \class{ImmutableSet} object. If the optional
|
---|
54 | \var{iterable} parameter is supplied, updates the set with elements obtained
|
---|
55 | from iteration. All of the elements in \var{iterable} should be immutable or
|
---|
56 | be transformable to an immutable using the protocol described in
|
---|
57 | section~\ref{immutable-transforms}.
|
---|
58 |
|
---|
59 | Because \class{ImmutableSet} objects provide a \method{__hash__()} method,
|
---|
60 | they can be used as set elements or as dictionary keys. \class{ImmutableSet}
|
---|
61 | objects do not have methods for adding or removing elements, so all of the
|
---|
62 | elements must be known when the constructor is called.
|
---|
63 | \end{classdesc}
|
---|
64 |
|
---|
65 |
|
---|
66 | \subsection{Set Objects \label{set-objects}}
|
---|
67 |
|
---|
68 | Instances of \class{Set} and \class{ImmutableSet} both provide
|
---|
69 | the following operations:
|
---|
70 |
|
---|
71 | \begin{tableiii}{c|c|l}{code}{Operation}{Equivalent}{Result}
|
---|
72 | \lineiii{len(\var{s})}{}{cardinality of set \var{s}}
|
---|
73 |
|
---|
74 | \hline
|
---|
75 | \lineiii{\var{x} in \var{s}}{}
|
---|
76 | {test \var{x} for membership in \var{s}}
|
---|
77 | \lineiii{\var{x} not in \var{s}}{}
|
---|
78 | {test \var{x} for non-membership in \var{s}}
|
---|
79 | \lineiii{\var{s}.issubset(\var{t})}{\code{\var{s} <= \var{t}}}
|
---|
80 | {test whether every element in \var{s} is in \var{t}}
|
---|
81 | \lineiii{\var{s}.issuperset(\var{t})}{\code{\var{s} >= \var{t}}}
|
---|
82 | {test whether every element in \var{t} is in \var{s}}
|
---|
83 |
|
---|
84 | \hline
|
---|
85 | \lineiii{\var{s}.union(\var{t})}{\var{s} \textbar{} \var{t}}
|
---|
86 | {new set with elements from both \var{s} and \var{t}}
|
---|
87 | \lineiii{\var{s}.intersection(\var{t})}{\var{s} \&\ \var{t}}
|
---|
88 | {new set with elements common to \var{s} and \var{t}}
|
---|
89 | \lineiii{\var{s}.difference(\var{t})}{\var{s} - \var{t}}
|
---|
90 | {new set with elements in \var{s} but not in \var{t}}
|
---|
91 | \lineiii{\var{s}.symmetric_difference(\var{t})}{\var{s} \^\ \var{t}}
|
---|
92 | {new set with elements in either \var{s} or \var{t} but not both}
|
---|
93 | \lineiii{\var{s}.copy()}{}
|
---|
94 | {new set with a shallow copy of \var{s}}
|
---|
95 | \end{tableiii}
|
---|
96 |
|
---|
97 | Note, the non-operator versions of \method{union()},
|
---|
98 | \method{intersection()}, \method{difference()}, and
|
---|
99 | \method{symmetric_difference()} will accept any iterable as an argument.
|
---|
100 | In contrast, their operator based counterparts require their arguments to
|
---|
101 | be sets. This precludes error-prone constructions like
|
---|
102 | \code{Set('abc') \&\ 'cbs'} in favor of the more readable
|
---|
103 | \code{Set('abc').intersection('cbs')}.
|
---|
104 | \versionchanged[Formerly all arguments were required to be sets]{2.3.1}
|
---|
105 |
|
---|
106 | In addition, both \class{Set} and \class{ImmutableSet}
|
---|
107 | support set to set comparisons. Two sets are equal if and only if
|
---|
108 | every element of each set is contained in the other (each is a subset
|
---|
109 | of the other).
|
---|
110 | A set is less than another set if and only if the first set is a proper
|
---|
111 | subset of the second set (is a subset, but is not equal).
|
---|
112 | A set is greater than another set if and only if the first set is a proper
|
---|
113 | superset of the second set (is a superset, but is not equal).
|
---|
114 |
|
---|
115 | The subset and equality comparisons do not generalize to a complete
|
---|
116 | ordering function. For example, any two disjoint sets are not equal and
|
---|
117 | are not subsets of each other, so \emph{all} of the following return
|
---|
118 | \code{False}: \code{\var{a}<\var{b}}, \code{\var{a}==\var{b}}, or
|
---|
119 | \code{\var{a}>\var{b}}.
|
---|
120 | Accordingly, sets do not implement the \method{__cmp__} method.
|
---|
121 |
|
---|
122 | Since sets only define partial ordering (subset relationships), the output
|
---|
123 | of the \method{list.sort()} method is undefined for lists of sets.
|
---|
124 |
|
---|
125 | The following table lists operations available in \class{ImmutableSet}
|
---|
126 | but not found in \class{Set}:
|
---|
127 |
|
---|
128 | \begin{tableii}{c|l}{code}{Operation}{Result}
|
---|
129 | \lineii{hash(\var{s})}{returns a hash value for \var{s}}
|
---|
130 | \end{tableii}
|
---|
131 |
|
---|
132 | The following table lists operations available in \class{Set}
|
---|
133 | but not found in \class{ImmutableSet}:
|
---|
134 |
|
---|
135 | \begin{tableiii}{c|c|l}{code}{Operation}{Equivalent}{Result}
|
---|
136 | \lineiii{\var{s}.update(\var{t})}
|
---|
137 | {\var{s} \textbar= \var{t}}
|
---|
138 | {return set \var{s} with elements added from \var{t}}
|
---|
139 | \lineiii{\var{s}.intersection_update(\var{t})}
|
---|
140 | {\var{s} \&= \var{t}}
|
---|
141 | {return set \var{s} keeping only elements also found in \var{t}}
|
---|
142 | \lineiii{\var{s}.difference_update(\var{t})}
|
---|
143 | {\var{s} -= \var{t}}
|
---|
144 | {return set \var{s} after removing elements found in \var{t}}
|
---|
145 | \lineiii{\var{s}.symmetric_difference_update(\var{t})}
|
---|
146 | {\var{s} \textasciicircum= \var{t}}
|
---|
147 | {return set \var{s} with elements from \var{s} or \var{t}
|
---|
148 | but not both}
|
---|
149 |
|
---|
150 | \hline
|
---|
151 | \lineiii{\var{s}.add(\var{x})}{}
|
---|
152 | {add element \var{x} to set \var{s}}
|
---|
153 | \lineiii{\var{s}.remove(\var{x})}{}
|
---|
154 | {remove \var{x} from set \var{s}; raises \exception{KeyError}
|
---|
155 | if not present}
|
---|
156 | \lineiii{\var{s}.discard(\var{x})}{}
|
---|
157 | {removes \var{x} from set \var{s} if present}
|
---|
158 | \lineiii{\var{s}.pop()}{}
|
---|
159 | {remove and return an arbitrary element from \var{s}; raises
|
---|
160 | \exception{KeyError} if empty}
|
---|
161 | \lineiii{\var{s}.clear()}{}
|
---|
162 | {remove all elements from set \var{s}}
|
---|
163 | \end{tableiii}
|
---|
164 |
|
---|
165 | Note, the non-operator versions of \method{update()},
|
---|
166 | \method{intersection_update()}, \method{difference_update()}, and
|
---|
167 | \method{symmetric_difference_update()} will accept any iterable as
|
---|
168 | an argument.
|
---|
169 | \versionchanged[Formerly all arguments were required to be sets]{2.3.1}
|
---|
170 |
|
---|
171 | Also note, the module also includes a \method{union_update()} method
|
---|
172 | which is an alias for \method{update()}. The method is included for
|
---|
173 | backwards compatibility. Programmers should prefer the
|
---|
174 | \method{update()} method because it is supported by the builtin
|
---|
175 | \class{set()} and \class{frozenset()} types.
|
---|
176 |
|
---|
177 | \subsection{Example \label{set-example}}
|
---|
178 |
|
---|
179 | \begin{verbatim}
|
---|
180 | >>> from sets import Set
|
---|
181 | >>> engineers = Set(['John', 'Jane', 'Jack', 'Janice'])
|
---|
182 | >>> programmers = Set(['Jack', 'Sam', 'Susan', 'Janice'])
|
---|
183 | >>> managers = Set(['Jane', 'Jack', 'Susan', 'Zack'])
|
---|
184 | >>> employees = engineers | programmers | managers # union
|
---|
185 | >>> engineering_management = engineers & managers # intersection
|
---|
186 | >>> fulltime_management = managers - engineers - programmers # difference
|
---|
187 | >>> engineers.add('Marvin') # add element
|
---|
188 | >>> print engineers
|
---|
189 | Set(['Jane', 'Marvin', 'Janice', 'John', 'Jack'])
|
---|
190 | >>> employees.issuperset(engineers) # superset test
|
---|
191 | False
|
---|
192 | >>> employees.union_update(engineers) # update from another set
|
---|
193 | >>> employees.issuperset(engineers)
|
---|
194 | True
|
---|
195 | >>> for group in [engineers, programmers, managers, employees]:
|
---|
196 | ... group.discard('Susan') # unconditionally remove element
|
---|
197 | ... print group
|
---|
198 | ...
|
---|
199 | Set(['Jane', 'Marvin', 'Janice', 'John', 'Jack'])
|
---|
200 | Set(['Janice', 'Jack', 'Sam'])
|
---|
201 | Set(['Jane', 'Zack', 'Jack'])
|
---|
202 | Set(['Jack', 'Sam', 'Jane', 'Marvin', 'Janice', 'John', 'Zack'])
|
---|
203 | \end{verbatim}
|
---|
204 |
|
---|
205 |
|
---|
206 | \subsection{Protocol for automatic conversion to immutable
|
---|
207 | \label{immutable-transforms}}
|
---|
208 |
|
---|
209 | Sets can only contain immutable elements. For convenience, mutable
|
---|
210 | \class{Set} objects are automatically copied to an \class{ImmutableSet}
|
---|
211 | before being added as a set element.
|
---|
212 |
|
---|
213 | The mechanism is to always add a hashable element, or if it is not
|
---|
214 | hashable, the element is checked to see if it has an
|
---|
215 | \method{__as_immutable__()} method which returns an immutable equivalent.
|
---|
216 |
|
---|
217 | Since \class{Set} objects have a \method{__as_immutable__()} method
|
---|
218 | returning an instance of \class{ImmutableSet}, it is possible to
|
---|
219 | construct sets of sets.
|
---|
220 |
|
---|
221 | A similar mechanism is needed by the \method{__contains__()} and
|
---|
222 | \method{remove()} methods which need to hash an element to check
|
---|
223 | for membership in a set. Those methods check an element for hashability
|
---|
224 | and, if not, check for a \method{__as_temporarily_immutable__()} method
|
---|
225 | which returns the element wrapped by a class that provides temporary
|
---|
226 | methods for \method{__hash__()}, \method{__eq__()}, and \method{__ne__()}.
|
---|
227 |
|
---|
228 | The alternate mechanism spares the need to build a separate copy of
|
---|
229 | the original mutable object.
|
---|
230 |
|
---|
231 | \class{Set} objects implement the \method{__as_temporarily_immutable__()}
|
---|
232 | method which returns the \class{Set} object wrapped by a new class
|
---|
233 | \class{_TemporarilyImmutableSet}.
|
---|
234 |
|
---|
235 | The two mechanisms for adding hashability are normally invisible to the
|
---|
236 | user; however, a conflict can arise in a multi-threaded environment
|
---|
237 | where one thread is updating a set while another has temporarily wrapped it
|
---|
238 | in \class{_TemporarilyImmutableSet}. In other words, sets of mutable sets
|
---|
239 | are not thread-safe.
|
---|
240 |
|
---|
241 |
|
---|
242 | \subsection{Comparison to the built-in \class{set} types
|
---|
243 | \label{comparison-to-builtin-set}}
|
---|
244 |
|
---|
245 | The built-in \class{set} and \class{frozenset} types were designed based
|
---|
246 | on lessons learned from the \module{sets} module. The key differences are:
|
---|
247 |
|
---|
248 | \begin{itemize}
|
---|
249 | \item \class{Set} and \class{ImmutableSet} were renamed to \class{set} and
|
---|
250 | \class{frozenset}.
|
---|
251 | \item There is no equivalent to \class{BaseSet}. Instead, use
|
---|
252 | \code{isinstance(x, (set, frozenset))}.
|
---|
253 | \item The hash algorithm for the built-ins performs significantly better
|
---|
254 | (fewer collisions) for most datasets.
|
---|
255 | \item The built-in versions have more space efficient pickles.
|
---|
256 | \item The built-in versions do not have a \method{union_update()} method.
|
---|
257 | Instead, use the \method{update()} method which is equivalent.
|
---|
258 | \item The built-in versions do not have a \method{_repr(sorted=True)} method.
|
---|
259 | Instead, use the built-in \function{repr()} and \function{sorted()}
|
---|
260 | functions: \code{repr(sorted(s))}.
|
---|
261 | \item The built-in version does not have a protocol for automatic conversion
|
---|
262 | to immutable. Many found this feature to be confusing and no one
|
---|
263 | in the community reported having found real uses for it.
|
---|
264 | \end{itemize}
|
---|