| 1 | <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
|
---|
| 2 | <!-- /home/espenr/tmp/qt-3.3.8-espenr-2499/qt-x11-free-3.3.8/src/tools/qregexp.cpp:77 -->
|
---|
| 3 | <html>
|
---|
| 4 | <head>
|
---|
| 5 | <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
|
---|
| 6 | <title>QRegExp Class</title>
|
---|
| 7 | <style type="text/css"><!--
|
---|
| 8 | fn { margin-left: 1cm; text-indent: -1cm; }
|
---|
| 9 | a:link { color: #004faf; text-decoration: none }
|
---|
| 10 | a:visited { color: #672967; text-decoration: none }
|
---|
| 11 | body { background: #ffffff; color: black; }
|
---|
| 12 | --></style>
|
---|
| 13 | </head>
|
---|
| 14 | <body>
|
---|
| 15 |
|
---|
| 16 | <table border="0" cellpadding="0" cellspacing="0" width="100%">
|
---|
| 17 | <tr bgcolor="#E5E5E5">
|
---|
| 18 | <td valign=center>
|
---|
| 19 | <a href="index.html">
|
---|
| 20 | <font color="#004faf">Home</font></a>
|
---|
| 21 | | <a href="classes.html">
|
---|
| 22 | <font color="#004faf">All Classes</font></a>
|
---|
| 23 | | <a href="mainclasses.html">
|
---|
| 24 | <font color="#004faf">Main Classes</font></a>
|
---|
| 25 | | <a href="annotated.html">
|
---|
| 26 | <font color="#004faf">Annotated</font></a>
|
---|
| 27 | | <a href="groups.html">
|
---|
| 28 | <font color="#004faf">Grouped Classes</font></a>
|
---|
| 29 | | <a href="functions.html">
|
---|
| 30 | <font color="#004faf">Functions</font></a>
|
---|
| 31 | </td>
|
---|
| 32 | <td align="right" valign="center"><img src="logo32.png" align="right" width="64" height="32" border="0"></td></tr></table><h1 align=center>QRegExp Class Reference</h1>
|
---|
| 33 |
|
---|
| 34 | <p>The QRegExp class provides pattern matching using regular expressions.
|
---|
| 35 | <a href="#details">More...</a>
|
---|
| 36 | <p>All the functions in this class are <a href="threads.html#reentrant">reentrant</a> when Qt is built with thread support.</p>
|
---|
| 37 | <p><tt>#include &lt;<a href="qregexp-h.html">qregexp.h</a>&gt;</tt>
|
---|
| 38 | <p><a href="qregexp-members.html">List of all member functions.</a>
|
---|
| 39 | <h2>Public Members</h2>
|
---|
| 40 | <ul>
|
---|
| 41 | <li class=fn>enum <a href="#CaretMode-enum"><b>CaretMode</b></a> { CaretAtZero, CaretAtOffset, CaretWontMatch }</li>
|
---|
| 42 | <li class=fn><a href="#QRegExp"><b>QRegExp</b></a> ()</li>
|
---|
| 43 | <li class=fn><a href="#QRegExp-2"><b>QRegExp</b></a> ( const QString & pattern, bool caseSensitive = TRUE, bool wildcard = FALSE )</li>
|
---|
| 44 | <li class=fn><a href="#QRegExp-3"><b>QRegExp</b></a> ( const QRegExp & rx )</li>
|
---|
| 45 | <li class=fn><a href="#~QRegExp"><b>~QRegExp</b></a> ()</li>
|
---|
| 46 | <li class=fn>QRegExp & <a href="#operator-eq"><b>operator=</b></a> ( const QRegExp & rx )</li>
|
---|
| 47 | <li class=fn>bool <a href="#operator-eq-eq"><b>operator==</b></a> ( const QRegExp & rx ) const</li>
|
---|
| 48 | <li class=fn>bool <a href="#operator!-eq"><b>operator!=</b></a> ( const QRegExp & rx ) const</li>
|
---|
| 49 | <li class=fn>bool <a href="#isEmpty"><b>isEmpty</b></a> () const</li>
|
---|
| 50 | <li class=fn>bool <a href="#isValid"><b>isValid</b></a> () const</li>
|
---|
| 51 | <li class=fn>QString <a href="#pattern"><b>pattern</b></a> () const</li>
|
---|
| 52 | <li class=fn>void <a href="#setPattern"><b>setPattern</b></a> ( const QString & pattern )</li>
|
---|
| 53 | <li class=fn>bool <a href="#caseSensitive"><b>caseSensitive</b></a> () const</li>
|
---|
| 54 | <li class=fn>void <a href="#setCaseSensitive"><b>setCaseSensitive</b></a> ( bool sensitive )</li>
|
---|
| 55 | <li class=fn>bool <a href="#wildcard"><b>wildcard</b></a> () const</li>
|
---|
| 56 | <li class=fn>void <a href="#setWildcard"><b>setWildcard</b></a> ( bool wildcard )</li>
|
---|
| 57 | <li class=fn>bool <a href="#minimal"><b>minimal</b></a> () const</li>
|
---|
| 58 | <li class=fn>void <a href="#setMinimal"><b>setMinimal</b></a> ( bool minimal )</li>
|
---|
| 59 | <li class=fn>bool <a href="#exactMatch"><b>exactMatch</b></a> ( const QString & str ) const</li>
|
---|
| 60 | <li class=fn>int match ( const QString & str, int index = 0, int * len = 0, bool indexIsStart = TRUE ) const <em>(obsolete)</em></li>
|
---|
| 61 | <li class=fn>int <a href="#search"><b>search</b></a> ( const QString & str, int offset = 0, CaretMode caretMode = CaretAtZero ) const</li>
|
---|
| 62 | <li class=fn>int <a href="#searchRev"><b>searchRev</b></a> ( const QString & str, int offset = -1, CaretMode caretMode = CaretAtZero ) const</li>
|
---|
| 63 | <li class=fn>int <a href="#matchedLength"><b>matchedLength</b></a> () const</li>
|
---|
| 64 | <li class=fn>int <a href="#numCaptures"><b>numCaptures</b></a> () const</li>
|
---|
| 65 | <li class=fn>QStringList <a href="#capturedTexts"><b>capturedTexts</b></a> ()</li>
|
---|
| 66 | <li class=fn>QString <a href="#cap"><b>cap</b></a> ( int nth = 0 )</li>
|
---|
| 67 | <li class=fn>int <a href="#pos"><b>pos</b></a> ( int nth = 0 )</li>
|
---|
| 68 | <li class=fn>QString <a href="#errorString"><b>errorString</b></a> ()</li>
|
---|
| 69 | </ul>
|
---|
| 70 | <h2>Static Public Members</h2>
|
---|
| 71 | <ul>
|
---|
| 72 | <li class=fn>QString <a href="#escape"><b>escape</b></a> ( const QString & str )</li>
|
---|
| 73 | </ul>
|
---|
| 74 | <hr><a name="details"></a><h2>Detailed Description</h2>
|
---|
| 75 |
|
---|
| 76 |
|
---|
| 77 |
|
---|
| 78 | The QRegExp class provides pattern matching using regular expressions.
|
---|
| 79 | <p>
|
---|
| 80 |
|
---|
| 81 |
|
---|
| 82 |
|
---|
| 83 | <!-- index regular expression --><a name="regular-expression"></a>
|
---|
| 84 | <p> Regular expressions, or "regexps", provide a way to find patterns
|
---|
| 85 | within text. This is useful in many contexts, for example:
|
---|
| 86 | <p> <center><table cellpadding="4" cellspacing="2" border="0">
|
---|
| 87 | <tr bgcolor="#f0f0f0"> <td valign="top">Validation
|
---|
| 88 | <td valign="top">A regexp can be used to check whether a piece of text
|
---|
| 89 | meets some criteria, e.g. is an integer or contains no
|
---|
| 90 | whitespace.
|
---|
| 91 | <tr bgcolor="#d0d0d0"> <td valign="top">Searching
|
---|
| 92 | <td valign="top">Regexps provide a much more powerful means of searching
|
---|
| 93 | text than simple string matching does. For example we can
|
---|
| 94 | create a regexp which says "find one of the words 'mail',
|
---|
| 95 | 'letter' or 'correspondence' but not any of the words
|
---|
| 96 | 'email', 'mailman', 'mailer', 'letterbox' etc."
|
---|
| 97 | <tr bgcolor="#f0f0f0"> <td valign="top">Search and Replace
|
---|
| 98 | <td valign="top">A regexp can be used to replace a pattern with a piece of
|
---|
| 99 | text, for example replace all occurrences of '&' with
|
---|
| 100 | '&amp;amp;' except where the '&' is already followed by 'amp;'.
|
---|
| 101 | <tr bgcolor="#d0d0d0"> <td valign="top">String Splitting
|
---|
| 102 | <td valign="top">A regexp can be used to identify where a string should be
|
---|
| 103 | split into its component fields, e.g. splitting tab-delimited
|
---|
| 104 | strings.
|
---|
| 105 | </table></center>
|
---|
| 106 | <p> We present a very brief introduction to regexps, a description of
|
---|
| 107 | Qt's regexp language, some code examples, and finally the function
|
---|
| 108 | documentation itself. QRegExp is modeled on Perl's regexp
|
---|
| 109 | language, and also fully supports Unicode. QRegExp can also be
|
---|
| 110 | used in the weaker 'wildcard' (globbing) mode which works in a
|
---|
| 111 | similar way to command shells. A good text on regexps is <em>Mastering Regular Expressions: Powerful Techniques for Perl and Other Tools</em> by Jeffrey E. Friedl, ISBN 1565922573.
|
---|
| 112 | <p> Experienced regexp users may prefer to skip the introduction and
|
---|
| 113 | go directly to the relevant information.
|
---|
| 114 | <p> In case of multi-threaded programming, note that QRegExp depends on
|
---|
| 115 | <a href="qthreadstorage.html">QThreadStorage</a> internally. For that reason, QRegExp should only be
|
---|
| 116 | used with threads started with <a href="qthread.html">QThread</a>, i.e. not with threads
|
---|
| 117 | started with platform-specific APIs.
|
---|
| 118 | <p> <!-- toc -->
|
---|
| 119 | <ul>
|
---|
| 120 | <li><a href="#1"> Introduction
|
---|
| 121 | </a>
|
---|
| 122 | <li><a href="#1-1"> Characters and Abbreviations for Sets of Characters
|
---|
| 123 | </a>
|
---|
| 124 | <li><a href="#1-2"> Sets of Characters
|
---|
| 125 | </a>
|
---|
| 126 | <li><a href="#1-3"> Quantifiers
|
---|
| 127 | </a>
|
---|
| 128 | <li><a href="#1-4"> Capturing Text
|
---|
| 129 | </a>
|
---|
| 130 | <li><a href="#1-5"> Assertions
|
---|
| 131 | </a>
|
---|
| 132 | <li><a href="#1-6"> Wildcard Matching (globbing)
|
---|
| 133 | </a>
|
---|
| 134 | <li><a href="#1-7"> Notes for Perl Users
|
---|
| 135 | </a>
|
---|
| 136 | <li><a href="#1-8"> Code Examples
|
---|
| 137 | </a>
|
---|
| 138 | </ul>
|
---|
| 139 | <!-- endtoc -->
|
---|
| 140 |
|
---|
| 141 | <p> <h3> Introduction
|
---|
| 142 | </h3>
|
---|
| 143 | <a name="1"></a><p> Regexps are built up from expressions, quantifiers, and assertions.
|
---|
| 144 | The simplest form of expression is simply a character, e.g.
|
---|
| 145 | <b>x</b> or <b>5</b>. An expression can also be a set of
|
---|
| 146 | characters. For example, <b>[ABCD]</b> will match an <b>A</b> or
|
---|
| 147 | a <b>B</b> or a <b>C</b> or a <b>D</b>. As a shorthand we could
|
---|
| 148 | write this as <b>[A-D]</b>. If we want to match any of the
|
---|
| 149 | capital letters in the English alphabet we can write
|
---|
| 150 | <b>[A-Z]</b>. A quantifier tells the regexp engine how many
|
---|
| 151 | occurrences of the expression we want, e.g. <b>x{1,1}</b> means
|
---|
| 152 | match an <b>x</b> which occurs at least once and at most once.
|
---|
| 153 | We'll look at assertions and more complex expressions later.
|
---|
| 154 | <p> Note that in general regexps cannot be used to check for balanced
|
---|
| 155 | brackets or tags. For example, if you want to match an opening HTML
|
---|
| 156 | <tt>&lt;b&gt;</tt> and its closing <tt>&lt;/b&gt;</tt> you can only use a regexp if you
|
---|
| 157 | know that these tags are not nested; the HTML fragment <tt>&lt;b&gt;bold &lt;b&gt;bolder&lt;/b&gt;&lt;/b&gt;</tt> will not match as expected. If you know the
|
---|
| 158 | maximum level of nesting it is possible to create a regexp that
|
---|
| 159 | will match correctly, but for an unknown level of nesting, regexps
|
---|
| 160 | will fail.
|
---|
| 161 | <p> We'll start by writing a regexp to match integers in the range 0
|
---|
| 162 | to 99. We will require at least one digit so we will start with
|
---|
| 163 | <b>[0-9]{1,1}</b> which means match a digit exactly once. This
|
---|
| 164 | regexp alone will match integers in the range 0 to 9. To match one
|
---|
| 165 | or two digits we can increase the maximum number of occurrences so
|
---|
| 166 | the regexp becomes <b>[0-9]{1,2}</b> meaning match a digit at
|
---|
| 167 | least once and at most twice. However, this regexp as it stands
|
---|
| 168 | will not match correctly. This regexp will match one or two digits
|
---|
| 169 | <em>within</em> a string. To ensure that we match against the whole
|
---|
| 170 | string we must use the anchor assertions. We need <b>^</b> (caret)
|
---|
| 171 | which when it is the first character in the regexp means that the
|
---|
| 172 | regexp must match from the beginning of the string. And we also
|
---|
| 173 | need <b>$</b> (dollar) which when it is the last character in the
|
---|
| 174 | regexp means that the regexp must match until the end of the
|
---|
| 175 | string. So now our regexp is <b>^[0-9]{1,2}$</b>. Note that
|
---|
| 176 | assertions, such as <b>^</b> and <b>$</b>, do not match any
|
---|
| 177 | characters.
|
---|
| 178 | <p> If you've seen regexps elsewhere they may have looked different from
|
---|
| 179 | the ones above. This is because some sets of characters and some
|
---|
| 180 | quantifiers are so common that they have special symbols to
|
---|
| 181 | represent them. <b>[0-9]</b> can be replaced with the symbol
|
---|
| 182 | <b>\d</b>. The quantifier to match exactly one occurrence,
|
---|
| 183 | <b>{1,1}</b>, can be replaced with the expression itself. This means
|
---|
| 184 | that <b>x{1,1}</b> is exactly the same as <b>x</b> alone. So our 0
|
---|
| 185 | to 99 matcher could be written <b>^\d{1,2}$</b>. Another way of
|
---|
| 186 | writing it would be <b>^\d\d{0,1}$</b>, i.e. from the start of the
|
---|
| 187 | string match a digit followed by zero or one digits. In practice
|
---|
| 188 | most people would write it <b>^\d\d?$</b>. The <b>?</b> is a
|
---|
| 189 | shorthand for the quantifier <b>{0,1}</b>, i.e. a minimum of no
|
---|
| 190 | occurrences and a maximum of one occurrence. This is used to make an
|
---|
| 191 | expression optional. The regexp <b>^\d\d?$</b> means "from the
|
---|
| 192 | beginning of the string match one digit followed by zero or one
|
---|
| 193 | digits and then the end of the string".
|
---|
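<p> The 0 to 99 validator developed above can be sketched as follows. This is only an illustration: it uses the standard C++ <tt>std::regex</tt> class (ECMAScript grammar) as a stand-in for QRegExp so that it compiles without Qt, and the function name <tt>isZeroTo99</tt> is an assumption made for the example. With QRegExp itself the same pattern would normally be passed to the constructor and tested with <a href="#exactMatch">exactMatch</a>() or <a href="#search">search</a>().

```cpp
#include <regex>
#include <string>

// Returns true if text is an integer in the range 0 to 99,
// i.e. one digit optionally followed by a second digit.
bool isZeroTo99(const std::string& text) {
    // Raw string literal, so the backslashes need not be doubled.
    static const std::regex rx(R"(^\d\d?$)");
    return std::regex_search(text, rx);
}
```

<p> Because the pattern is anchored with <b>^</b> and <b>$</b>, a search behaves like a whole-string match here; without the anchors "100" would be accepted because it contains one or two digits.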
| 194 | <p> Our second example is matching the words 'mail', 'letter' or
|
---|
| 195 | 'correspondence' but without matching 'email', 'mailman',
|
---|
| 196 | 'mailer', 'letterbox' etc. We'll start by just matching 'mail'. In
|
---|
| 197 | full the regexp is <b>m{1,1}a{1,1}i{1,1}l{1,1}</b>, but since
|
---|
| 198 | each expression itself is automatically quantified by <b>{1,1}</b>
|
---|
| 199 | we can simply write this as <b>mail</b>; an 'm' followed by an 'a'
|
---|
| 200 | followed by an 'i' followed by an 'l'. The symbol '|' (bar) is
|
---|
| 201 | used for <em>alternation</em>, so our regexp now becomes
|
---|
| 202 | <b>mail|letter|correspondence</b> which means match 'mail' <em>or</em>
|
---|
| 203 | 'letter' <em>or</em> 'correspondence'. Whilst this regexp will find the
|
---|
| 204 | words we want it will also find words we don't want such as
|
---|
| 205 | 'email'. We will start by putting our regexp in parentheses,
|
---|
| 206 | <b>(mail|letter|correspondence)</b>. Parentheses have two effects,
|
---|
| 207 | firstly they group expressions together and secondly they identify
|
---|
| 208 | parts of the regexp that we wish to <a href="#capturing-text">capture</a>. Our regexp still matches any of the three words but now
|
---|
| 209 | they are grouped together as a unit. This is useful for building
|
---|
| 210 | up more complex regexps. It is also useful because it allows us to
|
---|
| 211 | examine which of the words actually matched. We need to use
|
---|
| 212 | another assertion, this time <b>\b</b> "word boundary":
|
---|
| 213 | <b>\b(mail|letter|correspondence)\b</b>. This regexp means "match
|
---|
| 214 | a word boundary followed by the expression in parentheses followed
|
---|
| 215 | by another word boundary". The <b>\b</b> assertion matches at a <em>position</em> in the regexp, not a <em>character</em> in the regexp. A word
|
---|
| 216 | boundary is any non-word character such as a space, a newline, or
|
---|
| 217 | the beginning or end of the string.
|
---|
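<p> A minimal sketch of the word-boundary regexp built above, again using <tt>std::regex</tt> as a stand-in for QRegExp. The helper name <tt>findPostWord</tt> is hypothetical. The sub-match <tt>m[1]</tt> plays the role that <a href="#cap">cap</a>(1) plays in QRegExp: it holds the text captured by the parentheses.

```cpp
#include <regex>
#include <string>

// Returns the first whole word 'mail', 'letter' or 'correspondence'
// found in text, or an empty string if none of them occurs.
std::string findPostWord(const std::string& text) {
    static const std::regex rx(R"(\b(mail|letter|correspondence)\b)");
    std::smatch m;
    if (std::regex_search(text, m, rx))
        return m[1].str();   // text captured by the parentheses
    return std::string();
}
```

<p> The two <b>\b</b> assertions are what reject 'email', 'mailman' and 'letterbox': in each of those the candidate word is directly adjacent to another word character.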
| 218 | <p> For our third example we want to replace ampersands with the HTML
|
---|
| 219 | entity '&amp;amp;'. The regexp to match is simple: <b>&</b>, i.e.
|
---|
| 220 | match one ampersand. Unfortunately this will mess up our text if
|
---|
| 221 | some of the ampersands have already been turned into HTML
|
---|
| 222 | entities. So what we really want to say is replace an ampersand
|
---|
| 223 | providing it is not followed by 'amp;'. For this we need the
|
---|
| 224 | negative lookahead assertion and our regexp becomes:
|
---|
| 225 | <b>&(?!amp;)</b>. The negative lookahead assertion is introduced
|
---|
| 226 | with '(?!' and finishes at the ')'. It means that the text it
|
---|
| 227 | contains, 'amp;' in our example, must <em>not</em> follow the expression
|
---|
| 228 | that precedes it.
|
---|
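<p> The ampersand replacement described above can be sketched with <tt>std::regex_replace</tt>, which supports the same negative lookahead syntax. The function name <tt>escapeAmpersands</tt> is an assumption of this example, and <tt>std::regex</tt> is used only so that the sketch builds without Qt.

```cpp
#include <regex>
#include <string>

// Replaces every '&' that is not already followed by "amp;"
// with the five characters of the HTML entity for an ampersand.
std::string escapeAmpersands(const std::string& text) {
    static const std::regex rx("&(?!amp;)");
    // In the default (ECMAScript) replacement format only '$' is special,
    // so the replacement text below is inserted literally.
    return std::regex_replace(text, rx, "&amp;");
}
```

<p> Note that the lookahead only protects ampersands followed by 'amp;'. Other entities that are already present in the text would still have their ampersand escaped a second time.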
| 229 | <p> Regexps provide a rich language that can be used in a variety of
|
---|
| 230 | ways. For example suppose we want to count all the occurrences of
|
---|
| 231 | 'Eric' and 'Eirik' in a string. Two valid regexps to match these
|
---|
| 232 | are <b>\b(Eric|Eirik)\b</b> and <b>\bEi?ri[ck]\b</b>. We need
|
---|
| 233 | the word boundary '\b' so we don't get 'Ericsson' etc. The second
|
---|
| 234 | regexp actually matches more than we want, 'Eric', 'Erik', 'Eiric'
|
---|
| 235 | and 'Eirik'.
|
---|
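<p> Counting the occurrences can be sketched as below. The helper <tt>countMatches</tt> is hypothetical and relies on <tt>std::sregex_iterator</tt>, which visits successive non-overlapping matches; with QRegExp the usual approach is a loop that calls <a href="#search">search</a>() repeatedly and advances the offset by <a href="#matchedLength">matchedLength</a>().

```cpp
#include <iterator>
#include <regex>
#include <string>

// Counts the non-overlapping matches of pattern in text.
int countMatches(const std::string& pattern, const std::string& text) {
    const std::regex rx(pattern);
    return static_cast<int>(std::distance(
        std::sregex_iterator(text.begin(), text.end(), rx),
        std::sregex_iterator()));
}
```

<p> Run against the same text, the two regexps from the paragraph above can give different counts, because the second one also accepts 'Erik' and 'Eiric'.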
| 236 | <p> We will implement some of the examples above in the
|
---|
| 237 | <a href="#code-examples">code examples</a> section.
|
---|
| 238 | <p> <a name="characters-and-abbreviations-for-sets-of-characters"></a>
|
---|
| 239 | <h3> Characters and Abbreviations for Sets of Characters
|
---|
| 240 | </h3>
|
---|
| 241 | <a name="1-1"></a><p> <center><table cellpadding="4" cellspacing="2" border="0">
|
---|
| 242 | <tr bgcolor="#a2c511"> <th valign="top">Element <th valign="top">Meaning
|
---|
| 243 | <tr bgcolor="#f0f0f0"> <td valign="top"><b>c</b>
|
---|
| 244 | <td valign="top">Any character represents itself unless it has a special
|
---|
| 245 | regexp meaning. Thus <b>c</b> matches the character <em>c</em>.
|
---|
| 246 | <tr bgcolor="#d0d0d0"> <td valign="top"><b>\c</b>
|
---|
| 247 | <td valign="top">A character that follows a backslash matches the character
|
---|
| 248 | itself except where mentioned below. For example if you
|
---|
| 249 | wished to match a literal caret at the beginning of a string
|
---|
| 250 | you would write <b>\^</b>.
|
---|
| 251 | <tr bgcolor="#f0f0f0"> <td valign="top"><b>\a</b>
|
---|
| 252 | <td valign="top">This matches the ASCII bell character (BEL, 0x07).
|
---|
| 253 | <tr bgcolor="#d0d0d0"> <td valign="top"><b>\f</b>
|
---|
| 254 | <td valign="top">This matches the ASCII form feed character (FF, 0x0C).
|
---|
| 255 | <tr bgcolor="#f0f0f0"> <td valign="top"><b>\n</b>
|
---|
| 256 | <td valign="top">This matches the ASCII line feed character (LF, 0x0A, Unix newline).
|
---|
| 257 | <tr bgcolor="#d0d0d0"> <td valign="top"><b>\r</b>
|
---|
| 258 | <td valign="top">This matches the ASCII carriage return character (CR, 0x0D).
|
---|
| 259 | <tr bgcolor="#f0f0f0"> <td valign="top"><b>\t</b>
|
---|
| 260 | <td valign="top">This matches the ASCII horizontal tab character (HT, 0x09).
|
---|
| 261 | <tr bgcolor="#d0d0d0"> <td valign="top"><b>\v</b>
|
---|
| 262 | <td valign="top">This matches the ASCII vertical tab character (VT, 0x0B).
|
---|
| 263 | <tr bgcolor="#f0f0f0"> <td valign="top"><b>\xhhhh</b>
|
---|
| 264 | <td valign="top">This matches the Unicode character corresponding to the
|
---|
| 265 | hexadecimal number hhhh (between 0x0000 and 0xFFFF). \0ooo
|
---|
| 266 | (i.e., \zero ooo) matches the ASCII/Latin-1 character
|
---|
| 267 | corresponding to the octal number ooo (between 0 and 0377).
|
---|
| 268 | <tr bgcolor="#d0d0d0"> <td valign="top"><b>. (dot)</b>
|
---|
| 269 | <td valign="top">This matches any character (including newline).
|
---|
| 270 | <tr bgcolor="#f0f0f0"> <td valign="top"><b>\d</b>
|
---|
| 271 | <td valign="top">This matches a digit (<a href="qchar.html#isDigit">QChar::isDigit</a>()).
|
---|
| 272 | <tr bgcolor="#d0d0d0"> <td valign="top"><b>\D</b>
|
---|
| 273 | <td valign="top">This matches a non-digit.
|
---|
| 274 | <tr bgcolor="#f0f0f0"> <td valign="top"><b>\s</b>
|
---|
| 275 | <td valign="top">This matches a whitespace (<a href="qchar.html#isSpace">QChar::isSpace</a>()).
|
---|
| 276 | <tr bgcolor="#d0d0d0"> <td valign="top"><b>\S</b>
|
---|
| 277 | <td valign="top">This matches a non-whitespace.
|
---|
| 278 | <tr bgcolor="#f0f0f0"> <td valign="top"><b>\w</b>
|
---|
| 279 | <td valign="top">This matches a word character (<a href="qchar.html#isLetterOrNumber">QChar::isLetterOrNumber</a>() or '_').
|
---|
| 280 | <tr bgcolor="#d0d0d0"> <td valign="top"><b>\W</b>
|
---|
| 281 | <td valign="top">This matches a non-word character.
|
---|
| 282 | <tr bgcolor="#f0f0f0"> <td valign="top"><b>\<em>n</em></b>
|
---|
| 283 | <td valign="top">The n-th <a href="#capturing-text">backreference</a>,
|
---|
| 284 | e.g. \1, \2, etc.
|
---|
| 285 | </table></center>
|
---|
| 286 | <p> <em>Note that the C++ compiler transforms backslashes in strings so to include a <b>\</b> in a regexp you will need to enter it twice, i.e. <b>\\</b>.</em>
|
---|
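<p> The doubling of backslashes can be illustrated with a short sketch. It assumes a C++11 or later compiler, where a raw string literal offers an alternative spelling, and it uses <tt>std::regex</tt> rather than QRegExp; the names <tt>kEscaped</tt>, <tt>kRaw</tt> and <tt>containsOK</tt> are made up for the example.

```cpp
#include <regex>
#include <string>

// The same regexp text written twice: with doubled backslashes in an
// ordinary string literal, and undoubled in a raw string literal.
const std::string kEscaped = "\\bOK\\b";
const std::string kRaw = R"(\bOK\b)";

// True if text contains 'OK' as a whole word.
bool containsOK(const std::string& text) {
    static const std::regex rx(kEscaped);
    return std::regex_search(text, rx);
}
```

<p> Both literals produce the identical seven-character pattern. Writing a single backslash in an ordinary literal would instead make the compiler interpret <tt>\b</tt> as a backspace character before the regexp engine ever sees it.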
| 287 | <p> <a name="sets-of-characters"></a>
|
---|
| 288 | <h3> Sets of Characters
|
---|
| 289 | </h3>
|
---|
| 290 | <a name="1-2"></a><p> Square brackets are used to match any character in the set of
|
---|
| 291 | characters contained within the square brackets. All the character
|
---|
| 292 | set abbreviations described above can be used within square
|
---|
| 293 | brackets. Apart from the character set abbreviations and the
|
---|
| 294 | following two exceptions no characters have special meanings in
|
---|
| 295 | square brackets.
|
---|
| 296 | <p> <center><table cellpadding="4" cellspacing="2" border="0">
|
---|
| 297 | <tr bgcolor="#d0d0d0"> <td valign="top"><b>^</b>
|
---|
| 298 | <td valign="top">The caret negates the character set if it occurs as the
|
---|
| 299 | first character, i.e. immediately after the opening square
|
---|
| 300 | bracket. For example, <b>[abc]</b> matches 'a' or 'b' or 'c',
|
---|
| 301 | but <b>[^abc]</b> matches anything <em>except</em> 'a' or 'b' or
|
---|
| 302 | 'c'.
|
---|
| 303 | <tr bgcolor="#f0f0f0"> <td valign="top"><b>-</b>
|
---|
| 304 | <td valign="top">The dash is used to indicate a range of characters, for
|
---|
| 305 | example <b>[W-Z]</b> matches 'W' or 'X' or 'Y' or 'Z'.
|
---|
| 306 | </table></center>
|
---|
| 307 | <p> Using the predefined character set abbreviations is more portable
|
---|
| 308 | than using character ranges across platforms and languages. For
|
---|
| 309 | example, <b>[0-9]</b> matches a digit in Western alphabets but
|
---|
| 310 | <b>\d</b> matches a digit in <em>any</em> alphabet.
|
---|
| 311 | <p> Note that in most regexp literature sets of characters are called
|
---|
| 312 | "character classes".
|
---|
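<p> The two special characters inside square brackets can be tried out with a small sketch. The function names are hypothetical and <tt>std::regex</tt> stands in for QRegExp. Note that the remark above about <b>\d</b> matching digits in any alphabet concerns QRegExp; this sketch does not rely on it.

```cpp
#include <regex>
#include <string>

// True if text contains at least one character other than 'a', 'b' or 'c'.
bool hasNonAbc(const std::string& text) {
    static const std::regex rx("[^abc]");
    return std::regex_search(text, rx);
}

// True if text is exactly one letter in the range 'W' to 'Z'.
bool isWToZ(const std::string& text) {
    static const std::regex rx("[W-Z]");
    return std::regex_match(text, rx);
}
```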
| 313 | <p> <a name="quantifiers"></a>
|
---|
| 314 | <h3> Quantifiers
|
---|
| 315 | </h3>
|
---|
| 316 | <a name="1-3"></a><p> By default an expression is automatically quantified by
|
---|
| 317 | <b>{1,1}</b>, i.e. it should occur exactly once. In the following
|
---|
| 318 | list <b><em>E</em></b> stands for any expression. An expression is a
|
---|
| 319 | character or an abbreviation for a set of characters or a set of
|
---|
| 320 | characters in square brackets or any parenthesised expression.
|
---|
| 321 | <p> <center><table cellpadding="4" cellspacing="2" border="0">
|
---|
| 322 | <tr bgcolor="#d0d0d0"> <td valign="top"><b><em>E</em>?</b>
|
---|
| 323 | <td valign="top">Matches zero or one occurrence of <em>E</em>. This quantifier
|
---|
| 324 | means "the previous expression is optional" since it will
|
---|
| 325 | match whether or not the expression occurs in the string. It
|
---|
| 326 | is the same as <b><em>E</em>{0,1}</b>. For example <b>dents?</b>
|
---|
| 327 | will match 'dent' and 'dents'.
|
---|
| 328 | <tr bgcolor="#f0f0f0"> <td valign="top"><b><em>E</em>+</b>
|
---|
| 329 | <td valign="top">Matches one or more occurrences of <em>E</em>. This is the same
|
---|
| 330 | as <b><em>E</em>{1,MAXINT}</b>. For example, <b>0+</b> will match
|
---|
| 331 | '0', '00', '000', etc.
|
---|
| 332 | <tr bgcolor="#d0d0d0"> <td valign="top"><b><em>E</em>*</b>
|
---|
| 333 | <td valign="top">Matches zero or more occurrences of <em>E</em>. This is the same
|
---|
| 334 | as <b><em>E</em>{0,MAXINT}</b>. The <b>*</b> quantifier is often
|
---|
| 335 | used by mistake. Since it matches <em>zero</em> or more
|
---|
| 336 | occurrences it will match no occurrences at all. For example
|
---|
| 337 | if we want to match strings that end in whitespace and use
|
---|
| 338 | the regexp <b>\s*$</b> we would get a match on every string.
|
---|
| 339 | This is because we have said find zero or more whitespace
|
---|
| 340 | followed by the end of string, so even strings that don't end
|
---|
| 341 | in whitespace will match. The regexp we want in this case is
|
---|
| 342 | <b>\s+$</b> to match strings that have at least one
|
---|
| 343 | whitespace at the end.
|
---|
| 344 | <tr bgcolor="#f0f0f0"> <td valign="top"><b><em>E</em>{n}</b>
|
---|
| 345 | <td valign="top">Matches exactly <em>n</em> occurrences of the expression. This
|
---|
| 346 | is the same as repeating the expression <em>n</em> times. For
|
---|
| 347 | example, <b>x{5}</b> is the same as <b>xxxxx</b>. It is also
|
---|
| 348 | the same as <b><em>E</em>{n,n}</b>, e.g. <b>x{5,5}</b>.
|
---|
| 349 | <tr bgcolor="#d0d0d0"> <td valign="top"><b><em>E</em>{n,}</b>
|
---|
| 350 | <td valign="top">Matches at least <em>n</em> occurrences of the expression. This
|
---|
| 351 | is the same as <b><em>E</em>{n,MAXINT}</b>.
|
---|
| 352 | <tr bgcolor="#f0f0f0"> <td valign="top"><b><em>E</em>{,m}</b>
|
---|
| 353 | <td valign="top">Matches at most <em>m</em> occurrences of the expression. This
|
---|
| 354 | is the same as <b><em>E</em>{0,m}</b>.
|
---|
| 355 | <tr bgcolor="#d0d0d0"> <td valign="top"><b><em>E</em>{n,m}</b>
|
---|
| 356 | <td valign="top">Matches at least <em>n</em> occurrences of the expression and at
|
---|
| 357 | most <em>m</em> occurrences of the expression.
|
---|
| 358 | </table></center>
|
---|
| 359 | <p> (MAXINT is implementation dependent but will not be smaller than
|
---|
| 360 | 1024.)
|
---|
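<p> The <b>*</b> pitfall described in the table above is easy to reproduce. The sketch below is an illustration using <tt>std::regex</tt>; both function names are assumptions of the example.

```cpp
#include <regex>
#include <string>

// Intended to test for trailing whitespace, but "zero or more" whitespace
// followed by the end of the string is found in every string.
bool endsInWhitespaceWrong(const std::string& text) {
    static const std::regex rx(R"(\s*$)");
    return std::regex_search(text, rx);
}

// Requires at least one whitespace character before the end of the string.
bool endsInWhitespace(const std::string& text) {
    static const std::regex rx(R"(\s+$)");
    return std::regex_search(text, rx);
}
```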
| 361 | <p> If we wish to apply a quantifier to more than just the preceding
|
---|
| 362 | character we can use parentheses to group characters together in
|
---|
| 363 | an expression. For example, <b>tag+</b> matches a 't' followed by
|
---|
| 364 | an 'a' followed by at least one 'g', whereas <b>(tag)+</b> matches
|
---|
| 365 | at least one occurrence of 'tag'.
|
---|
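<p> The difference between <b>tag+</b> and <b>(tag)+</b> can be checked with a whole-string match. This is a sketch with <tt>std::regex</tt> standing in for QRegExp, and <tt>matchesWhole</tt> is a hypothetical helper comparable in spirit to <a href="#exactMatch">exactMatch</a>().

```cpp
#include <regex>
#include <string>

// True if the whole of text is matched by pattern.
bool matchesWhole(const std::string& pattern, const std::string& text) {
    return std::regex_match(text, std::regex(pattern));
}
```

<p> With this helper, <b>tag+</b> accepts 'taggg' but not 'tagtag', while <b>(tag)+</b> accepts 'tagtag' but not 'taggg'.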
| 366 | <p> Note that quantifiers are "greedy". They will match as much text
|
---|
| 367 | as they can. For example, <b>0+</b> will match as many zeros as it
|
---|
| 368 | can from the first zero it finds, e.g. '2.<u>000</u>5'.
|
---|
| 369 | Quantifiers can be made non-greedy; see <a href="#setMinimal">setMinimal</a>().
|
---|
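<p> Greedy matching can be observed with the sketch below, which returns the text of the first match. It uses <tt>std::regex</tt> as a stand-in, and <tt>firstMatch</tt> is a hypothetical helper. The <b>+?</b> form mentioned afterwards is ECMAScript syntax for a non-greedy quantifier; it is an assumption of this sketch and not QRegExp syntax, where setMinimal() is used instead.

```cpp
#include <regex>
#include <string>

// Returns the text of the first match of pattern in text,
// or an empty string if there is no match.
std::string firstMatch(const std::string& pattern, const std::string& text) {
    std::smatch m;
    const std::regex rx(pattern);
    return std::regex_search(text, m, rx) ? m.str() : std::string();
}
```

<p> Searching '2.0005' with <b>0+</b> yields all three zeros, whereas the ECMAScript non-greedy form <b>0+?</b> stops after the first zero.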
| 370 | <p> <a name="capturing-text"></a>
|
---|
| 371 | <h3> Capturing Text
|
---|
| 372 | </h3>
|
---|
| 373 | <a name="1-4"></a><p> Parentheses allow us to group elements together so that we can
|
---|
| 374 | quantify and capture them. For example if we have the expression
|
---|
| 375 | <b>mail|letter|correspondence</b> that matches a string we know
|
---|
| 376 | that <em>one</em> of the words matched but not which one. Using
|
---|
| 377 | parentheses allows us to "capture" whatever is matched within
|
---|
| 378 | their bounds, so if we used <b>(mail|letter|correspondence)</b>
|
---|
| 379 | and matched this regexp against the string "I sent you some email"
|
---|
| 380 | we can use the <a href="#cap">cap</a>() or <a href="#capturedTexts">capturedTexts</a>() functions to extract the
|
---|
| 381 | matched characters, in this case 'mail'.
|
---|
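<p> Extracting the captured word can be sketched as follows, with <tt>std::regex</tt> standing in for QRegExp. In <tt>std::smatch</tt>, index 0 is the whole match and index 1 the first parenthesised expression, which mirrors cap(0) and cap(1). The helper name <tt>firstCapture</tt> is an assumption of the example.

```cpp
#include <regex>
#include <string>

// Returns the text captured by the first pair of parentheses in pattern
// for the first match in text, or an empty string if there is none.
std::string firstCapture(const std::string& pattern, const std::string& text) {
    std::smatch m;
    const std::regex rx(pattern);
    if (std::regex_search(text, m, rx) && m.size() > 1)
        return m[1].str();
    return std::string();
}
```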
| 382 | <p> We can use captured text within the regexp itself. To refer to the
|
---|
| 383 | captured text we use <em>backreferences</em> which are indexed from 1,
|
---|
| 384 | the same as for cap(). For example we could search for duplicate
|
---|
| 385 | words in a string using <b>\b(\w+)\W+\1\b</b> which means match a
|
---|
| 386 | word boundary followed by one or more word characters followed by
|
---|
| 387 | one or more non-word characters followed by the same text as the
|
---|
| 388 | first parenthesised expression followed by a word boundary.
|
---|
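<p> The duplicate-word search can be sketched like this. <tt>std::regex</tt> accepts the same backreference syntax, and <tt>firstRepeatedWord</tt> is a hypothetical helper that returns the word captured by the first parenthesised expression.

```cpp
#include <regex>
#include <string>

// Returns the first word that is immediately repeated in text,
// or an empty string if no word is repeated.
std::string firstRepeatedWord(const std::string& text) {
    static const std::regex rx(R"(\b(\w+)\W+\1\b)");
    std::smatch m;
    return std::regex_search(text, m, rx) ? m[1].str() : std::string();
}
```

<p> The leading and trailing <b>\b</b> matter: without them the 'is' at the end of 'this' followed by the word 'is' would be reported as a duplicate.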
| 389 | <p> If we want to use parentheses purely for grouping and not for
|
---|
| 390 | capturing we can use the non-capturing syntax, e.g.
|
---|
| 391 | <b>(?:green|blue)</b>. Non-capturing parentheses begin with '(?:' and
|
---|
| 392 | end with ')'. In this example we match either 'green' or 'blue' but we
|
---|
| 393 | do not capture the match so we only know whether or not we matched
|
---|
| 394 | but not which color we actually found. Using non-capturing
|
---|
| 395 | parentheses is more efficient than using capturing parentheses
|
---|
| 396 | since the regexp engine has to do less book-keeping.
|
---|
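<p> Whether a pair of parentheses captures can be checked by asking the compiled regexp how many capturing groups it has. The sketch uses <tt>std::regex::mark_count()</tt>, which is comparable in purpose to <a href="#numCaptures">numCaptures</a>(); the helper name <tt>captureGroupCount</tt> is an assumption.

```cpp
#include <regex>
#include <string>

// Returns the number of capturing parentheses in pattern.
unsigned captureGroupCount(const std::string& pattern) {
    return std::regex(pattern).mark_count();
}
```

<p> Only the capturing form is counted, including when the two kinds of parentheses are nested.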
| 397 | <p> Both capturing and non-capturing parentheses may be nested.
|
---|
| 398 | <p> <a name="assertions"></a>
|
---|
| 399 | <h3> Assertions
|
---|
| 400 | </h3>
|
---|
| 401 | <a name="1-5"></a><p> Assertions make some statement about the text at the point where
|
---|
| 402 | they occur in the regexp but they do not match any characters. In
|
---|
| 403 | the following list <b><em>E</em></b> stands for any expression.
|
---|
| 404 | <p> <center><table cellpadding="4" cellspacing="2" border="0">
|
---|
| 405 | <tr bgcolor="#f0f0f0"> <td valign="top"><b>^</b>
|
---|
| 406 | <td valign="top">The caret signifies the beginning of the string. If you
|
---|
| 407 | wish to match a literal <tt>^</tt> you must escape it by
|
---|
| 408 | writing <b>\^</b>. For example, <b>^#include</b> will only
|
---|
| 409 | match strings which <em>begin</em> with the characters '#include'.
|
---|
| 410 | (When the caret is the first character of a character set it
|
---|
| 411 | has a special meaning, see <a href="#sets-of-characters">Sets of
|
---|
| 412 | Characters</a>.)
|
---|
| 413 | <tr bgcolor="#d0d0d0"> <td valign="top"><b>$</b>
|
---|
| 414 | <td valign="top">The dollar signifies the end of the string. For example
|
---|
| 415 | <b>\d\s*$</b> will match strings which end with a digit
|
---|
| 416 | optionally followed by whitespace. If you wish to match a
|
---|
| 417 | literal <tt>$</tt> you must escape it by writing
|
---|
| 418 | <b>\$</b>.
|
---|
| 419 | <tr bgcolor="#f0f0f0"> <td valign="top"><b>\b</b>
|
---|
| 420 | <td valign="top">A word boundary. For example the regexp
|
---|
| 421 | <b>\bOK\b</b> means match immediately after a word
|
---|
| 422 | boundary (e.g. start of string or whitespace) the letter 'O'
|
---|
| 423 | then the letter 'K' immediately before another word boundary
|
---|
| 424 | (e.g. end of string or whitespace). But note that the
|
---|
| 425 | assertion does not actually match any whitespace so if we
|
---|
| 426 | write <b>(\bOK\b)</b> and we have a match it will only
|
---|
| 427 | contain 'OK' even if the string is "It's <u>OK</u> now".
|
---|
| 428 | <tr bgcolor="#d0d0d0"> <td valign="top"><b>\B</b>
|
---|
| 429 | <td valign="top">A non-word boundary. This assertion is true wherever
|
---|
| 430 | <b>\b</b> is false. For example if we searched for
|
---|
| 431 | <b>\Bon\B</b> in "Left on" the match would fail (space
|
---|
| 432 | and end of string aren't non-word boundaries), but it would
|
---|
| 433 | match in "t<u>on</u>ne".
|
---|
| 434 | <tr bgcolor="#f0f0f0"> <td valign="top"><b>(?=<em>E</em>)</b>
|
---|
| 435 | <td valign="top">Positive lookahead. This assertion is true if the
|
---|
| 436 | expression matches at this point in the regexp. For example,
|
---|
| 437 | <b>const(?=\s+char)</b> matches 'const' whenever it is
|
---|
| 438 | followed by 'char', as in 'static <u>const</u> char *'.
|
---|
| 439 | (Compare with <b>const\s+char</b>, which matches 'static
|
---|
| 440 | <u>const char</u> *'.)
|
---|
| 441 | <tr bgcolor="#d0d0d0"> <td valign="top"><b>(?!<em>E</em>)</b>
|
---|
| 442 | <td valign="top">Negative lookahead. This assertion is true if the
|
---|
| 443 | expression does not match at this point in the regexp. For
|
---|
| 444 | example, <b>const(?!\s+char)</b> matches 'const' <em>except</em>
|
---|
| 445 | when it is followed by 'char'.
|
---|
| 446 | </table></center>
|
---|
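<p> The assertions in the table above consume no characters, which shows up in the length of the matched text. The following sketch uses <tt>std::regex</tt>, whose ECMAScript grammar supports the same assertions; <tt>firstMatchOf</tt> is a hypothetical helper that returns the matched text, or an empty string when nothing matches.

```cpp
#include <regex>
#include <string>

// Returns the text of the first match of pattern in text,
// or an empty string if the pattern does not match.
std::string firstMatchOf(const std::string& pattern, const std::string& text) {
    std::smatch m;
    const std::regex rx(pattern);
    return std::regex_search(text, m, rx) ? m.str() : std::string();
}
```

<p> For 'static const char *', the lookahead form <b>const(?=\s+char)</b> returns just 'const', while <b>const\s+char</b> returns 'const char'. The negative form <b>const(?!\s+char)</b> finds nothing in that string but does match in 'const int x'.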
| 447 | <p> <a name="wildcard-matching"></a>
|
---|
| 448 | <h3> Wildcard Matching (globbing)
|
---|
| 449 | </h3>
|
---|
| 450 | <a name="1-6"></a><p> Most command shells such as <em>bash</em> or <em>cmd.exe</em> support "file
|
---|
| 451 | globbing", the ability to identify a group of files by using
|
---|
| 452 | wildcards. The <a href="#setWildcard">setWildcard</a>() function is used to switch between
|
---|
| 453 | regexp and wildcard mode. Wildcard matching is much simpler than
|
---|
| 454 | full regexps and has only four features:
|
---|
| 455 | <p> <center><table cellpadding="4" cellspacing="2" border="0">
|
---|
| 456 | <tr bgcolor="#f0f0f0"> <td valign="top"><b>c</b>
|
---|
| 457 | <td valign="top">Any character represents itself apart from those mentioned
|
---|
| 458 | below. Thus <b>c</b> matches the character <em>c</em>.
|
---|
| 459 | <tr bgcolor="#d0d0d0"> <td valign="top"><b>?</b>
|
---|
| 460 | <td valign="top">This matches any single character. It is the same as
|
---|
| 461 | <b>.</b> in full regexps.
|
---|
| 462 | <tr bgcolor="#f0f0f0"> <td valign="top"><b>*</b>
|
---|
| 463 | <td valign="top">This matches zero or more of any characters. It is the
|
---|
| 464 | same as <b>.*</b> in full regexps.
|
---|
| 465 | <tr bgcolor="#d0d0d0"> <td valign="top"><b>[...]</b>
|
---|
| 466 | <td valign="top">Sets of characters can be represented in square brackets,
|
---|
| 467 | similar to full regexps. Within the character class, like
|
---|
| 468 | outside, backslash has no special meaning.
|
---|
| 469 | </table></center>
|
---|
| 470 | <p> For example, if we are in wildcard mode and have strings which
|
---|
| 471 | contain filenames, we could identify HTML files with <b>*.html</b>.
|
---|
| 472 | This will match zero or more characters followed by a dot followed
|
---|
| 473 | by 'h', 't', 'm' and 'l'.
|
---|
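<p> To make the four wildcard features concrete, here is a minimal sketch that rewrites a wildcard pattern as a full regexp and matches it with std::regex. This is not how QRegExp implements wildcard mode; the helpers wildcardToRegex and wildcardMatch are hypothetical, and an unclosed bracket is simply passed through (std::regex will then reject the pattern).

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrites a wildcard pattern as an ECMAScript regexp.
std::string wildcardToRegex(const std::string& glob)
{
    static const std::string special = "$()+.\\]^{|}";
    std::string rx;
    bool inClass = false;               // true while inside [...]
    for (char c : glob) {
        if (inClass) {
            if (c == '\\')
                rx += '\\';             // backslash is an ordinary character here
            rx += c;
            if (c == ']')
                inClass = false;
        } else if (c == '?') {
            rx += '.';                  // any single character
        } else if (c == '*') {
            rx += ".*";                 // zero or more characters
        } else if (c == '[') {
            rx += c;
            inClass = true;
        } else {
            if (special.find(c) != std::string::npos)
                rx += '\\';             // escape regexp metacharacters
            rx += c;
        }
    }
    return rx;
}

// True if the whole of `name` matches the wildcard pattern `glob`.
bool wildcardMatch(const std::string& glob, const std::string& name)
{
    return std::regex_match(name, std::regex(wildcardToRegex(glob)));
}
```

<p> With these helpers, wildcardMatch("*.html", "index.html") is true and wildcardMatch("*.html", "readme.txt") is false.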
| 474 | <p> <a name="perl-users"></a>
|
---|
| 475 | <h3> Notes for Perl Users
|
---|
| 476 | </h3>
|
---|
| 477 | <a name="1-7"></a><p> Most of the character class abbreviations supported by Perl are
|
---|
| 478 | supported by QRegExp, see <a href="#characters-and-abbreviations-for-sets-of-characters">characters
|
---|
| 479 | and abbreviations for sets of characters</a>.
|
---|
| 480 | <p> In QRegExp, apart from within character classes, <tt>^</tt> always
|
---|
| 481 | signifies the start of the string, so carets must always be
|
---|
| 482 | escaped unless used for that purpose. In Perl the meaning of caret
|
---|
| 483 | varies automagically depending on where it occurs so escaping it
|
---|
| 484 | is rarely necessary. The same applies to <tt>$</tt> which in
|
---|
| 485 | QRegExp always signifies the end of the string.
|
---|
| 486 | <p> QRegExp's quantifiers are the same as Perl's greedy quantifiers.
|
---|
| 487 | Non-greedy matching cannot be applied to individual quantifiers,
|
---|
| 488 | but can be applied to all the quantifiers in the pattern. For
|
---|
| 489 | example, the equivalent of the Perl regexp <b>ro+?m</b> is:
|
---|
| 490 | <pre>
|
---|
| 491 | QRegExp rx( "ro+m" );
|
---|
| 492 | rx.<a href="#setMinimal">setMinimal</a>( TRUE );
|
---|
| 493 | </pre>
|
---|
| 494 |
|
---|
| 495 | <p> The equivalent of Perl's <tt>/i</tt> option is
|
---|
| 496 | <a href="#setCaseSensitive">setCaseSensitive</a>(FALSE).
|
---|
| 497 | <p> Perl's <tt>/g</tt> option can be emulated using a <a href="#cap_in_a_loop">loop</a>.
|
---|
| 498 | <p> In QRegExp <b>.</b> matches any character, therefore all QRegExp
|
---|
| 499 | regexps have the equivalent of Perl's <tt>/s</tt> option. QRegExp
|
---|
| 500 | does not have an equivalent to Perl's <tt>/m</tt> option, but this
|
---|
| 501 | can be emulated in various ways, for example by splitting the input
|
---|
| 502 | into lines or by looping with a regexp that searches for newlines.
|
---|
| 503 | <p> Because QRegExp is string oriented there are no \A, \Z or \z
|
---|
| 504 | assertions. The \G assertion is not supported but can be emulated
|
---|
| 505 | in a loop.
|
---|
| 506 | <p> Perl's $& is <a href="#cap">cap</a>(0) or <a href="#capturedTexts">capturedTexts</a>()[0]. There are no QRegExp
|
---|
| 507 | equivalents for $`, $' or $+. Perl's capturing variables, $1, $2,
|
---|
| 508 | ... correspond to cap(1) or capturedTexts()[1], cap(2) or
|
---|
| 509 | capturedTexts()[2], etc.
|
---|
| 510 | <p> To substitute a pattern use <a href="qstring.html#replace">QString::replace</a>().
|
---|
| 511 | <p> Perl's extended <tt>/x</tt> syntax is not supported, nor are
|
---|
| 512 | directives, e.g. (?i), or regexp comments, e.g. (?#comment). On
|
---|
| 513 | the other hand, C++'s rules for literal strings can be used to
|
---|
| 514 | achieve the same:
|
---|
| 515 | <pre>
|
---|
| 516 | QRegExp mark( "\\b" // word boundary
|
---|
| 517 | "[Mm]ark" // the word we want to match
|
---|
| 518 | );
|
---|
| 519 | </pre>
|
---|
| 520 |
|
---|
| 521 | <p> Both zero-width positive and zero-width negative lookahead
|
---|
| 522 | assertions (?=pattern) and (?!pattern) are supported with the same
|
---|
| 523 | syntax as Perl. Perl's lookbehind assertions, "independent"
|
---|
| 524 | subexpressions and conditional expressions are not supported.
|
---|
| 525 | <p> Non-capturing parentheses are also supported, with the same
|
---|
| 526 | (?:pattern) syntax.
|
---|
| 527 | <p> See <a href="qstringlist.html#split">QStringList::split</a>() and <a href="qstringlist.html#join">QStringList::join</a>() for equivalents
|
---|
| 528 | to Perl's split and join functions.
|
---|
| 529 | <p> Note: because C++ transforms \'s they must be written <em>twice</em> in
|
---|
| 530 | code, e.g. <b>\b</b> must be written <b>\\b</b>.
|
---|
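<p> The doubling of backslashes can be checked directly. The sketch below assumes a C++11 (or later) compiler, where a raw string literal offers an alternative spelling, and uses std::regex rather than QRegExp so that it is self-contained; the names escapedPattern, rawPattern and containsMailWord are hypothetical.

```cpp
#include <regex>
#include <string>

// The same pattern written twice: with doubled backslashes in an ordinary
// literal, and unescaped in a C++11 raw string literal.
const std::string escapedPattern = "\\bmail\\b";
const std::string rawPattern = R"(\bmail\b)";

// Hypothetical helper: true if `text` contains 'mail' as a whole word.
bool containsMailWord(const std::string& text)
{
    static const std::regex rx(rawPattern);
    return std::regex_search(text, rx);
}
```

<p> Both strings hold the same eight characters, so either spelling hands the regexp engine a single backslash followed by 'b'.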
| 531 | <p> <a name="code-examples"></a>
|
---|
| 532 | <h3> Code Examples
|
---|
| 533 | </h3>
|
---|
| 534 | <a name="1-8"></a><p> <pre>
|
---|
| 535 | QRegExp rx( "^\\d\\d?$" ); // match integers 0 to 99
|
---|
| 536 | rx.<a href="#search">search</a>( "123" ); // returns -1 (no match)
|
---|
| 537 | rx.<a href="#search">search</a>( "-6" ); // returns -1 (no match)
|
---|
| 538 | rx.<a href="#search">search</a>( "6" ); // returns 0 (matched at position 0)
|
---|
| 539 | </pre>
|
---|
| 540 |
|
---|
| 541 | <p> The third string matches '<u>6</u>'. This is a simple validation
|
---|
| 542 | regexp for integers in the range 0 to 99.
|
---|
| 543 | <p> <pre>
|
---|
| 544 | QRegExp rx( "^\\S+$" ); // match strings without whitespace
|
---|
| 545 | rx.<a href="#search">search</a>( "Hello world" ); // returns -1 (no match)
|
---|
| 546 | rx.<a href="#search">search</a>( "This_is-OK" ); // returns 0 (matched at position 0)
|
---|
| 547 | </pre>
|
---|
| 548 |
|
---|
| 549 | <p> The second string matches '<u>This_is-OK</u>'. We've used the
|
---|
| 550 | character set abbreviation '\S' (non-whitespace) and the anchors
|
---|
| 551 | to match strings which contain no whitespace.
|
---|
| 552 | <p> In the following example we match strings containing 'mail' or
|
---|
| 553 | 'letter' or 'correspondence' but only match whole words, i.e. not
|
---|
| 554 | 'email'.
|
---|
| 555 | <p> <pre>
|
---|
| 556 | QRegExp rx( "\\b(mail|letter|correspondence)\\b" );
|
---|
| 557 | rx.<a href="#search">search</a>( "I sent you an email" ); // returns -1 (no match)
|
---|
| 558 | rx.<a href="#search">search</a>( "Please write the letter" ); // returns 17
|
---|
| 559 | </pre>
|
---|
| 560 |
|
---|
| 561 | <p> The second string matches "Please write the <u>letter</u>". The
|
---|
| 562 | word 'letter' is also captured (because of the parentheses). We
|
---|
| 563 | can see what text we've captured like this:
|
---|
| 564 | <p> <pre>
|
---|
| 565 | <a href="qstring.html">QString</a> captured = rx.cap( 1 ); // captured == "letter"
|
---|
| 566 | </pre>
|
---|
| 567 |
|
---|
| 568 | <p> This will capture the text from the first set of capturing
|
---|
| 569 | parentheses (counting capturing left parentheses from left to
|
---|
| 570 | right). The parentheses are counted from 1 since <a href="#cap">cap</a>( 0 ) is the
|
---|
| 571 | whole matched regexp (equivalent to '&' in most regexp engines).
|
---|
| 572 | <p> <pre>
|
---|
| 573 | QRegExp rx( "&(?!amp;)" ); // match ampersands but not &amp;
|
---|
| 574 | <a href="qstring.html">QString</a> line1 = "This & that";
|
---|
| 575 | line1.<a href="qstring.html#replace">replace</a>( rx, "&amp;" );
|
---|
| 576 | // line1 == "This &amp; that"
|
---|
| 577 | <a href="qstring.html">QString</a> line2 = "His &amp; hers & theirs";
|
---|
| 578 | line2.<a href="qstring.html#replace">replace</a>( rx, "&amp;" );
|
---|
| 579 | // line2 == "His &amp; hers &amp; theirs"
|
---|
| 580 | </pre>
|
---|
| 581 |
|
---|
| 582 | <p> Here we've passed the QRegExp to <a href="qstring.html">QString</a>'s replace() function to
|
---|
| 583 | replace the matched text with new text.
|
---|
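<p> The same negative-lookahead replacement can be sketched with the standard library, assuming std::regex_replace as a stand-in for QString::replace(); the function name escapeAmpersands is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: replaces every '&' that is not already the start
// of "&amp;" with "&amp;".
std::string escapeAmpersands(const std::string& line)
{
    static const std::regex rx("&(?!amp;)");   // '&' not followed by "amp;"
    return std::regex_replace(line, rx, "&amp;");
}
```

<p> Because the lookahead consumes no characters, an existing "&amp;" is left untouched rather than being escaped a second time.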
| 584 | <p> <pre>
|
---|
| 585 | <a href="qstring.html">QString</a> str = "One Eric another Eirik, and an Ericsson."
|
---|
| 586 | " How many Eiriks, Eric?";
|
---|
| 587 | QRegExp rx( "\\b(Eric|Eirik)\\b" ); // match Eric or Eirik
|
---|
| 588 | int pos = 0; // where we are in the string
|
---|
| 589 | int count = 0; // how many occurrences of Eric or Eirik we've counted
|
---|
| 590 | while ( pos >= 0 ) {
|
---|
| 591 | pos = rx.<a href="#search">search</a>( str, pos );
|
---|
| 592 | if ( pos >= 0 ) {
|
---|
| 593 | pos++; // move along in str
|
---|
| 594 | count++; // count our Eric or Eirik
|
---|
| 595 | }
|
---|
| 596 | }
|
---|
| 597 | </pre>
|
---|
| 598 |
|
---|
| 599 | <p> We've used the <a href="#search">search</a>() function to repeatedly match the regexp in
|
---|
| 600 | the string. Note that instead of moving forward by one character
|
---|
| 601 | at a time <tt>pos++</tt> we could have written <tt>pos += rx.matchedLength()</tt> to skip over the already matched string. The
|
---|
| 602 | count will equal 3, matching 'One <u>Eric</u> another
|
---|
| 603 | <u>Eirik</u>, and an Ericsson. How many Eiriks, <u>Eric</u>?'; it
|
---|
| 604 | doesn't match 'Ericsson' or 'Eiriks' because in those words the name
|
---|
| 605 | is not followed by a word boundary.
|
---|
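<p> As a cross-check of the count, the following sketch performs the same whole-word search with std::sregex_iterator, which advances past each match much like adding matchedLength() to the position. It uses std::regex instead of QRegExp, and the function name countNames is hypothetical.

```cpp
#include <iterator>
#include <regex>
#include <string>

// Hypothetical helper: counts whole-word occurrences of Eric or Eirik.
int countNames(const std::string& text)
{
    static const std::regex rx("\\b(Eric|Eirik)\\b");
    std::sregex_iterator first(text.begin(), text.end(), rx);
    std::sregex_iterator last;   // default-constructed end-of-sequence iterator
    return static_cast<int>(std::distance(first, last));
}
```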
| 606 | <p> One common use of regexps is to split lines of delimited data into
|
---|
| 607 | their component fields.
|
---|
| 608 | <p> <pre>
|
---|
| 609 | str = "Trolltech AS\twww.trolltech.com\tNorway";
|
---|
| 610 | <a href="qstring.html">QString</a> company, web, country;
|
---|
| 611 | rx.setPattern( "^([^\t]+)\t([^\t]+)\t([^\t]+)$" );
|
---|
| 612 | if ( rx.search( str ) != -1 ) {
|
---|
| 613 | company = rx.cap( 1 );
|
---|
| 614 | web = rx.cap( 2 );
|
---|
| 615 | country = rx.cap( 3 );
|
---|
| 616 | }
|
---|
| 617 | </pre>
|
---|
| 618 |
|
---|
| 619 | <p> In this example our input lines have the format company name, web
|
---|
| 620 | address and country. Unfortunately the regexp is rather long and
|
---|
| 621 | not very versatile -- the code will break if we add any more
|
---|
| 622 | fields. A simpler and better solution is to look for the
|
---|
| 623 | separator, '\t' in this case, and take the surrounding text. The
|
---|
| 624 | <a href="qstringlist.html">QStringList</a> split() function can take a separator string or regexp
|
---|
| 625 | as an argument and split a string accordingly.
|
---|
| 626 | <p> <pre>
|
---|
| 627 | <a href="qstringlist.html">QStringList</a> field = QStringList::<a href="qstringlist.html#split">split</a>( "\t", str );
|
---|
| 628 | </pre>
|
---|
| 629 |
|
---|
| 630 | <p> Here field[0] is the company, field[1] the web address and so on.
|
---|
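<p> The separator-based approach can also be sketched with the standard library. The example assumes std::sregex_token_iterator as a rough counterpart of the split() call above; the function name splitFields is hypothetical. Note that this sketch keeps empty fields between adjacent separators.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: splits `line` wherever `separatorPattern` matches.
std::vector<std::string> splitFields(const std::string& line,
                                     const std::string& separatorPattern)
{
    const std::regex sep(separatorPattern);
    // Submatch index -1 selects the text between the separator matches.
    std::sregex_token_iterator first(line.begin(), line.end(), sep, -1);
    std::sregex_token_iterator last;
    return std::vector<std::string>(first, last);
}
```

<p> Adding a fourth tab-separated field to the input simply produces a fourth element, so nothing in the splitting code needs to change.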
| 631 | <p> To imitate the matching of a shell we can use wildcard mode.
|
---|
| 632 | <p> <pre>
|
---|
| 633 | QRegExp rx( "*.html" ); // invalid regexp: * doesn't quantify anything
|
---|
| 634 | rx.<a href="#setWildcard">setWildcard</a>( TRUE ); // now it's a valid wildcard regexp
|
---|
| 635 | rx.<a href="#exactMatch">exactMatch</a>( "index.html" ); // returns TRUE
|
---|
| 636 | rx.<a href="#exactMatch">exactMatch</a>( "default.htm" ); // returns FALSE
|
---|
| 637 | rx.<a href="#exactMatch">exactMatch</a>( "readme.txt" ); // returns FALSE
|
---|
| 638 | </pre>
|
---|
| 639 |
|
---|
| 640 | <p> Wildcard matching can be convenient because of its simplicity, but
|
---|
| 641 | any wildcard regexp can be defined using full regexps, e.g.
|
---|
| 642 | <b>.*\.html$</b>. Notice that we can't match both <tt>.html</tt> and <tt>.htm</tt> files with a wildcard unless we use <b>*.htm*</b> which will
|
---|
| 643 | also match 'test.html.bak'. A full regexp gives us the precision
|
---|
| 644 | we need, <b>.*\.html?$</b>.
|
---|
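<p> The difference in precision is easy to verify. This sketch checks the full regexp from the paragraph above with std::regex (an assumption made for self-containment; QRegExp is not used); the function name isHtmlFile is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true for names ending in .htm or .html.
bool isHtmlFile(const std::string& name)
{
    static const std::regex rx(".*\\.html?$");   // optional 'l', then end of string
    return std::regex_search(name, rx);
}
```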
| 645 | <p> QRegExp can match case insensitively using <a href="#setCaseSensitive">setCaseSensitive</a>(), and
|
---|
| 646 | can use non-greedy matching, see <a href="#setMinimal">setMinimal</a>(). By default QRegExp
|
---|
| 647 | uses full regexps but this can be changed with <a href="#setWildcard">setWildcard</a>().
|
---|
| 648 | Searching can be forward with <a href="#search">search</a>() or backward with
|
---|
| 649 | <a href="#searchRev">searchRev</a>(). Captured text can be accessed using <a href="#capturedTexts">capturedTexts</a>()
|
---|
| 650 | which returns a string list of all captured strings, or using
|
---|
| 651 | <a href="#cap">cap</a>() which returns the captured string for the given index. The
|
---|
| 652 | <a href="#pos">pos</a>() function takes a match index and returns the position in the
|
---|
| 653 | string where the match was made (or -1 if there was no match).
|
---|
| 654 | <p> <p>See also <a href="qregexpvalidator.html">QRegExpValidator</a>, <a href="qstring.html">QString</a>, <a href="qstringlist.html">QStringList</a>, <a href="misc.html">Miscellaneous Classes</a>, <a href="shared.html">Implicitly and Explicitly Shared Classes</a>, and <a href="tools.html">Non-GUI Classes</a>.
|
---|
| 655 |
|
---|
| 656 | <p> <a name="member-function-documentation"></a>
|
---|
| 657 |
|
---|
| 658 | <hr><h2>Member Type Documentation</h2>
|
---|
| 659 | <h3 class=fn><a name="CaretMode-enum"></a>QRegExp::CaretMode</h3>
|
---|
| 660 |
|
---|
| 661 | <p> The CaretMode enum defines the different meanings of the caret
|
---|
| 662 | (<b>^</b>) in a <a href="qregexp.html#regular-expression">regular expression</a>. The possible values are:
|
---|
| 663 | <ul>
|
---|
| 664 | <li><tt>QRegExp::CaretAtZero</tt> -
|
---|
| 665 | The caret corresponds to index 0 in the searched string.
|
---|
| 666 | <li><tt>QRegExp::CaretAtOffset</tt> -
|
---|
| 667 | The caret corresponds to the start offset of the search.
|
---|
| 668 | <li><tt>QRegExp::CaretWontMatch</tt> -
|
---|
| 669 | The caret never matches.
|
---|
| 670 | </ul>
|
---|
| 671 | <hr><h2>Member Function Documentation</h2>
|
---|
| 672 | <h3 class=fn><a name="QRegExp"></a>QRegExp::QRegExp ()
|
---|
| 673 | </h3>
|
---|
| 674 | Constructs an empty regexp.
|
---|
| 675 | <p> <p>See also <a href="#isValid">isValid</a>() and <a href="#errorString">errorString</a>().
|
---|
| 676 |
|
---|
| 677 | <h3 class=fn><a name="QRegExp-2"></a>QRegExp::QRegExp ( const <a href="qstring.html">QString</a> & pattern, bool caseSensitive = TRUE, bool wildcard = FALSE )
|
---|
| 678 | </h3>
|
---|
| 679 | Constructs a <a href="qregexp.html#regular-expression">regular expression</a> object for the given <em>pattern</em>
|
---|
| 680 | string. The pattern must be given using wildcard notation if <em>wildcard</em> is TRUE (default is FALSE). The pattern is case
|
---|
| 681 | sensitive, unless <em>caseSensitive</em> is FALSE. Matching is greedy
|
---|
| 682 | (maximal), but can be changed by calling <a href="#setMinimal">setMinimal</a>().
|
---|
| 683 | <p> <p>See also <a href="#setPattern">setPattern</a>(), <a href="#setCaseSensitive">setCaseSensitive</a>(), <a href="#setWildcard">setWildcard</a>(), and <a href="#setMinimal">setMinimal</a>().
|
---|
| 684 |
|
---|
| 685 | <h3 class=fn><a name="QRegExp-3"></a>QRegExp::QRegExp ( const <a href="qregexp.html">QRegExp</a> & rx )
|
---|
| 686 | </h3>
|
---|
| 687 | Constructs a <a href="qregexp.html#regular-expression">regular expression</a> as a copy of <em>rx</em>.
|
---|
| 688 | <p> <p>See also <a href="#operator-eq">operator=</a>().
|
---|
| 689 |
|
---|
| 690 | <h3 class=fn><a name="~QRegExp"></a>QRegExp::~QRegExp ()
|
---|
| 691 | </h3>
|
---|
| 692 | Destroys the <a href="qregexp.html#regular-expression">regular expression</a> and cleans up its internal data.
|
---|
| 693 |
|
---|
| 694 | <h3 class=fn><a href="qstring.html">QString</a> <a name="cap"></a>QRegExp::cap ( int nth = 0 )
|
---|
| 695 | </h3>
|
---|
| 696 | Returns the text captured by the <em>nth</em> subexpression. The entire
|
---|
| 697 | match has index 0 and the parenthesized subexpressions have
|
---|
| 698 | indices starting from 1 (excluding non-capturing parentheses).
|
---|
| 699 | <p> <pre>
|
---|
| 700 | QRegExp rxlen( "(\\d+)(?:\\s*)(cm|inch)" );
|
---|
| 701 | int pos = rxlen.<a href="#search">search</a>( "Length: 189cm" );
|
---|
| 702 | if ( pos > -1 ) {
|
---|
| 703 | <a href="qstring.html">QString</a> value = rxlen.<a href="#cap">cap</a>( 1 ); // "189"
|
---|
| 704 | <a href="qstring.html">QString</a> unit = rxlen.<a href="#cap">cap</a>( 2 ); // "cm"
|
---|
| 705 | // ...
|
---|
| 706 | }
|
---|
| 707 | </pre>
|
---|
| 708 |
|
---|
| 709 | <p> The order of elements matched by <a href="#cap">cap</a>() is as follows. The first
|
---|
| 710 | element, cap(0), is the entire matching string. Each subsequent
|
---|
| 711 | element corresponds to the next capturing left parenthesis.
|
---|
| 712 | Thus cap(1) is the text of the first capturing parentheses, cap(2)
|
---|
| 713 | is the text of the second, and so on.
|
---|
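<p> The numbering of captures, including the effect of non-capturing parentheses, can be illustrated with std::smatch, whose indices follow the same convention (0 for the whole match, then one index per capturing left parenthesis). This is a sketch with std::regex, not QRegExp; the function name parseLength is hypothetical.

```cpp
#include <regex>
#include <string>
#include <utility>

// Hypothetical helper: returns {value, unit}, or two empty strings
// when the text contains no length.
std::pair<std::string, std::string> parseLength(const std::string& text)
{
    static const std::regex rx("(\\d+)(?:\\s*)(cm|inch)");
    std::smatch m;
    if (!std::regex_search(text, m, rx))
        return {};
    // (?:\\s*) does not capture, so the unit is capture 2, not capture 3.
    return { m.str(1), m.str(2) };
}
```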
| 714 | <p> <a name="cap_in_a_loop"></a>
|
---|
| 715 | Some patterns may lead to a number of matches which cannot be
|
---|
| 716 | determined in advance, for example:
|
---|
| 717 | <p> <pre>
|
---|
| 718 | QRegExp rx( "(\\d+)" );
|
---|
| 719 | str = "Offsets: 12 14 99 231 7";
|
---|
| 720 | <a href="qstringlist.html">QStringList</a> list;
|
---|
| 721 | pos = 0;
|
---|
| 722 | while ( pos >= 0 ) {
|
---|
| 723 | pos = rx.<a href="#search">search</a>( str, pos );
|
---|
| 724 | if ( pos > -1 ) {
|
---|
| 725 | list += rx.<a href="#cap">cap</a>( 1 );
|
---|
| 726 | pos += rx.<a href="#matchedLength">matchedLength</a>();
|
---|
| 727 | }
|
---|
| 728 | }
|
---|
| 729 | // list contains "12", "14", "99", "231", "7"
|
---|
| 730 | </pre>
|
---|
| 731 |
|
---|
| 732 | <p> <p>See also <a href="#capturedTexts">capturedTexts</a>(), <a href="#pos">pos</a>(), <a href="#exactMatch">exactMatch</a>(), <a href="#search">search</a>(), and <a href="#searchRev">searchRev</a>().
|
---|
| 733 |
|
---|
| 734 | <p>Examples: <a href="archivesearch-example.html#x479">network/archivesearch/archivedialog.ui.h</a> and <a href="regexptester-example.html#x2485">regexptester/regexptester.cpp</a>.
|
---|
| 735 | <h3 class=fn><a href="qstringlist.html">QStringList</a> <a name="capturedTexts"></a>QRegExp::capturedTexts ()
|
---|
| 736 | </h3>
|
---|
| 737 | Returns a list of the captured text strings.
|
---|
| 738 | <p> The first string in the list is the entire matched string. Each
|
---|
| 739 | subsequent list element contains a string that matched a
|
---|
| 740 | (capturing) subexpression of the regexp.
|
---|
| 741 | <p> For example:
|
---|
| 742 | <pre>
|
---|
| 743 | QRegExp rx( "(\\d+)(\\s*)(cm|inch(es)?)" );
|
---|
| 744 | int pos = rx.<a href="#search">search</a>( "Length: 36 inches" );
|
---|
| 745 | <a href="qstringlist.html">QStringList</a> list = rx.<a href="#capturedTexts">capturedTexts</a>();
|
---|
| 746 | // list is now ( "36 inches", "36", " ", "inches", "es" )
|
---|
| 747 | </pre>
|
---|
| 748 |
|
---|
| 749 | <p> The above example also captures elements that may be present but
|
---|
| 750 | which we have no interest in. This problem can be solved by using
|
---|
| 751 | non-capturing parentheses:
|
---|
| 752 | <p> <pre>
|
---|
| 753 | QRegExp rx( "(\\d+)(?:\\s*)(cm|inch(?:es)?)" );
|
---|
| 754 | int pos = rx.<a href="#search">search</a>( "Length: 36 inches" );
|
---|
| 755 | <a href="qstringlist.html">QStringList</a> list = rx.<a href="#capturedTexts">capturedTexts</a>();
|
---|
| 756 | // list is now ( "36 inches", "36", "inches" )
|
---|
| 757 | </pre>
|
---|
| 758 |
|
---|
| 759 | <p> Note that if you want to iterate over the list, you should iterate
|
---|
| 760 | over a copy, e.g.
|
---|
| 761 | <pre>
|
---|
| 762 | <a href="qstringlist.html">QStringList</a> list = rx.capturedTexts();
|
---|
| 763 | QStringList::Iterator it = list.<a href="qvaluelist.html#begin">begin</a>();
|
---|
| 764 | while( it != list.<a href="qvaluelist.html#end">end</a>() ) {
|
---|
| 765 | myProcessing( *it );
|
---|
| 766 | ++it;
|
---|
| 767 | }
|
---|
| 768 | </pre>
|
---|
| 769 |
|
---|
| 770 | <p> Some regexps can match an indeterminate number of times. For
|
---|
| 771 | example if the input string is "Offsets: 12 14 99 231 7" and the
|
---|
| 772 | regexp, <tt>rx</tt>, is <b>(\d+)+</b>, we would hope to get a list of
|
---|
| 773 | all the numbers matched. However, after calling
|
---|
| 774 | <tt>rx.search(str)</tt>, <a href="#capturedTexts">capturedTexts</a>() will return the list ( "12",
|
---|
| 775 | "12" ), i.e. the entire match was "12" and the first subexpression
|
---|
| 776 | matched was "12". The correct approach is to use <a href="#cap">cap</a>() in a <a href="#cap_in_a_loop">loop</a>.
|
---|
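<p> The loop-based approach looks like this when sketched with the standard library. It assumes std::sregex_iterator in place of repeated search() calls, and the function name allNumbers is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: collects capture 1 of every match, one match at a time.
std::vector<std::string> allNumbers(const std::string& text)
{
    static const std::regex rx("(\\d+)");
    std::vector<std::string> out;
    for (std::sregex_iterator it(text.begin(), text.end(), rx), end; it != end; ++it)
        out.push_back(it->str(1));
    return out;
}
```

<p> Each iteration yields one complete match, so the number of results does not have to be known when the pattern is written.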
| 777 | <p> The order of elements in the string list is as follows. The first
|
---|
| 778 | element is the entire matching string. Each subsequent element
|
---|
| 779 | corresponds to the next capturing left parenthesis. Thus
|
---|
| 780 | capturedTexts()[1] is the text of the first capturing parentheses,
|
---|
| 781 | capturedTexts()[2] is the text of the second and so on
|
---|
| 782 | (corresponding to $1, $2, etc., in some other regexp languages).
|
---|
| 783 | <p> <p>See also <a href="#cap">cap</a>(), <a href="#pos">pos</a>(), <a href="#exactMatch">exactMatch</a>(), <a href="#search">search</a>(), and <a href="#searchRev">searchRev</a>().
|
---|
| 784 |
|
---|
| 785 | <h3 class=fn>bool <a name="caseSensitive"></a>QRegExp::caseSensitive () const
|
---|
| 786 | </h3>
|
---|
| 787 | Returns TRUE if case sensitivity is enabled; otherwise returns
|
---|
| 788 | FALSE. The default is TRUE.
|
---|
| 789 | <p> <p>See also <a href="#setCaseSensitive">setCaseSensitive</a>().
|
---|
| 790 |
|
---|
| 791 | <h3 class=fn><a href="qstring.html">QString</a> <a name="errorString"></a>QRegExp::errorString ()
|
---|
| 792 | </h3>
|
---|
| 793 | Returns a text string that explains why a regexp pattern is
|
---|
| 794 | invalid, if that is the case; otherwise returns "no error occurred".
|
---|
| 795 | <p> <p>See also <a href="#isValid">isValid</a>().
|
---|
| 796 |
|
---|
| 797 | <p>Example: <a href="regexptester-example.html#x2486">regexptester/regexptester.cpp</a>.
|
---|
| 798 | <h3 class=fn><a href="qstring.html">QString</a> <a name="escape"></a>QRegExp::escape ( const <a href="qstring.html">QString</a> & str )<tt> [static]</tt>
|
---|
| 799 | </h3>
|
---|
| 800 | Returns the string <em>str</em> with every regexp special character
|
---|
| 801 | escaped with a backslash. The special characters are $, (, ), *, +,
|
---|
| 802 | ., ?, [, \, ], ^, {, | and }.
|
---|
| 803 | <p> Example:
|
---|
| 804 | <pre>
|
---|
| 805 | s1 = QRegExp::<a href="#escape">escape</a>( "bingo" ); // s1 == "bingo"
|
---|
| 806 | s2 = QRegExp::<a href="#escape">escape</a>( "f(x)" ); // s2 == "f\\(x\\)"
|
---|
| 807 | </pre>
|
---|
| 808 |
|
---|
| 809 | <p> This function is useful to construct regexp patterns dynamically:
|
---|
| 810 | <p> <pre>
|
---|
| 811 | QRegExp rx( "(" + QRegExp::escape(name) +
|
---|
| 812 | "|" + QRegExp::escape(alias) + ")" );
|
---|
| 813 | </pre>
|
---|
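<p> The escaping rule described above is simple enough to sketch by hand. The function below is not QRegExp::escape() itself but a minimal re-implementation of the documented behaviour for std::string; the name escapeRegex is hypothetical.

```cpp
#include <string>

// Hypothetical helper: prefixes each of the special characters
// $ ( ) * + . ? [ \ ] ^ { | } with a backslash.
std::string escapeRegex(const std::string& str)
{
    static const std::string special = "$()*+.?[\\]^{|}";
    std::string out;
    out.reserve(str.size() * 2);   // worst case: every character is escaped
    for (char c : str) {
        if (special.find(c) != std::string::npos)
            out += '\\';
        out += c;
    }
    return out;
}
```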
| 814 |
|
---|
| 815 |
|
---|
| 816 | <h3 class=fn>bool <a name="exactMatch"></a>QRegExp::exactMatch ( const <a href="qstring.html">QString</a> & str ) const
|
---|
| 817 | </h3>
|
---|
| 818 | Returns TRUE if <em>str</em> is matched exactly by this <a href="qregexp.html#regular-expression">regular expression</a>; otherwise returns FALSE. You can determine how much of
|
---|
| 819 | the string was matched by calling <a href="#matchedLength">matchedLength</a>().
|
---|
| 820 | <p> For a given regexp string, R, <a href="#exactMatch">exactMatch</a>("R") is the equivalent of
|
---|
| 821 | <a href="#search">search</a>("^R$") since exactMatch() effectively encloses the regexp
|
---|
| 822 | in the start of string and end of string anchors, except that it
|
---|
| 823 | sets matchedLength() differently.
|
---|
| 824 | <p> For example, if the regular expression is <b>blue</b>, then
|
---|
| 825 | exactMatch() returns TRUE only for input <tt>blue</tt>. For inputs <tt>bluebell</tt>, <tt>blutak</tt> and <tt>lightblue</tt>, exactMatch() returns FALSE
|
---|
| 826 | and matchedLength() will return 4, 3 and 0 respectively.
|
---|
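<p> The relationship between an exact match and an anchored search can be sketched with std::regex, where std::regex_match plays the role of exactMatch(). The partial-match lengths reported by matchedLength() have no counterpart in this sketch, and both function names are hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true only if the whole of `str` matches `pattern`.
bool matchesExactly(const std::string& pattern, const std::string& str)
{
    return std::regex_match(str, std::regex(pattern));
}

// Hypothetical helper: the same test written as a search with both anchors.
bool matchesAnchored(const std::string& pattern, const std::string& str)
{
    return std::regex_search(str, std::regex("^(?:" + pattern + ")$"));
}
```

<p> The non-capturing group around the pattern keeps the anchors attached to the whole expression even when the pattern contains alternatives.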
| 827 | <p> Although const, this function sets matchedLength(),
|
---|
| 828 | <a href="#capturedTexts">capturedTexts</a>() and <a href="#pos">pos</a>().
|
---|
| 829 | <p> <p>See also <a href="#search">search</a>(), <a href="#searchRev">searchRev</a>(), and <a href="qregexpvalidator.html">QRegExpValidator</a>.
|
---|
| 830 |
|
---|
| 831 | <h3 class=fn>bool <a name="isEmpty"></a>QRegExp::isEmpty () const
|
---|
| 832 | </h3>
|
---|
| 833 | Returns TRUE if the pattern string is empty; otherwise returns
|
---|
| 834 | FALSE.
|
---|
| 835 | <p> If you call <a href="#exactMatch">exactMatch</a>() with an empty pattern on an empty string
|
---|
| 836 | it will return TRUE; otherwise it returns FALSE since it operates
|
---|
| 837 | over the whole string. If you call <a href="#search">search</a>() with an empty pattern
|
---|
| 838 | on <em>any</em> string it will return the start offset (0 by default)
|
---|
| 839 | because the empty pattern matches the 'emptiness' at the start of
|
---|
| 840 | the string. In this case the length of the match returned by
|
---|
| 841 | <a href="#matchedLength">matchedLength</a>() will be 0.
|
---|
| 842 | <p> See <a href="qstring.html#isEmpty">QString::isEmpty</a>().
|
---|
| 843 |
|
---|
| 844 | <h3 class=fn>bool <a name="isValid"></a>QRegExp::isValid () const
|
---|
| 845 | </h3>
|
---|
| 846 | Returns TRUE if the <a href="qregexp.html#regular-expression">regular expression</a> is valid; otherwise returns
|
---|
| 847 | FALSE. An invalid regular expression never matches.
|
---|
| 848 | <p> The pattern <b>[a-z</b> is an example of an invalid pattern, since
|
---|
| 849 | it lacks a closing square bracket.
|
---|
| 850 | <p> Note that the validity of a regexp may also depend on the setting
|
---|
| 851 | of the wildcard flag, for example <b>*.html</b> is a valid
|
---|
| 852 | wildcard regexp but an invalid full regexp.
|
---|
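<p> A comparable validity check can be sketched with std::regex, which reports an invalid pattern by throwing std::regex_error instead of setting a flag. This is an analogy rather than QRegExp's mechanism, and the function name isValidPattern is hypothetical.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `pattern` compiles as a full regexp.
bool isValidPattern(const std::string& pattern)
{
    try {
        std::regex rx(pattern);   // throws std::regex_error on a bad pattern
        (void)rx;
        return true;
    } catch (const std::regex_error&) {
        return false;
    }
}
```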
| 853 | <p> <p>See also <a href="#errorString">errorString</a>().
|
---|
| 854 |
|
---|
| 855 | <p>Example: <a href="regexptester-example.html#x2487">regexptester/regexptester.cpp</a>.
|
---|
| 856 | <h3 class=fn>int <a name="match"></a>QRegExp::match ( const <a href="qstring.html">QString</a> & str, int index = 0, int * len = 0, bool indexIsStart = TRUE ) const
|
---|
| 857 | </h3> <b>This function is obsolete.</b> It is provided to keep old source working. We strongly advise against using it in new code.
|
---|
| 858 | <p> Attempts to match in <em>str</em>, starting from position <em>index</em>.
|
---|
| 859 | Returns the position of the match, or -1 if there was no match.
|
---|
| 860 | <p> The length of the match is stored in <em>*len</em>, unless <em>len</em> is a
|
---|
| 861 | null pointer.
|
---|
| 862 | <p> If <em>indexIsStart</em> is TRUE (the default), the position <em>index</em> in
|
---|
| 863 | the string will match the start of string anchor, <b>^</b>, in the
|
---|
| 864 | regexp, if present. Otherwise, position 0 in <em>str</em> will match.
|
---|
| 865 | <p> Use <a href="#search">search</a>() and <a href="#matchedLength">matchedLength</a>() instead of this function.
|
---|
| 866 | <p> <p>See also <a href="qstring.html#mid">QString::mid</a>() and <a href="qconststring.html">QConstString</a>.
|
---|
| 867 |
|
---|
| 868 | <p>Example: <a href="qmag-example.html#x1791">qmag/qmag.cpp</a>.
|
---|
| 869 | <h3 class=fn>int <a name="matchedLength"></a>QRegExp::matchedLength () const
|
---|
| 870 | </h3>
|
---|
| 871 | Returns the length of the last matched string, or -1 if there was
|
---|
| 872 | no match.
|
---|
| 873 | <p> <p>See also <a href="#exactMatch">exactMatch</a>(), <a href="#search">search</a>(), and <a href="#searchRev">searchRev</a>().
|
---|
| 874 |
|
---|
| 875 | <p>Examples: <a href="archivesearch-example.html#x480">network/archivesearch/archivedialog.ui.h</a> and <a href="regexptester-example.html#x2488">regexptester/regexptester.cpp</a>.
|
---|
| 876 | <h3 class=fn>bool <a name="minimal"></a>QRegExp::minimal () const
|
---|
| 877 | </h3>
|
---|
| 878 | Returns TRUE if minimal (non-greedy) matching is enabled;
|
---|
| 879 | otherwise returns FALSE.
|
---|
| 880 | <p> <p>See also <a href="#setMinimal">setMinimal</a>().
|
---|
| 881 |
|
---|
| 882 | <h3 class=fn>int <a name="numCaptures"></a>QRegExp::numCaptures () const
|
---|
| 883 | </h3>
|
---|
| 884 | Returns the number of captures contained in the <a href="qregexp.html#regular-expression">regular expression</a>.
|
---|
| 885 |
|
---|
| 886 | <p>Example: <a href="regexptester-example.html#x2489">regexptester/regexptester.cpp</a>.
|
---|
| 887 | <h3 class=fn>bool <a name="operator!-eq"></a>QRegExp::operator!= ( const <a href="qregexp.html">QRegExp</a> & rx ) const
|
---|
| 888 | </h3>
|
---|
| 889 |
|
---|
| 890 | <p> Returns TRUE if this <a href="qregexp.html#regular-expression">regular expression</a> is not equal to <em>rx</em>;
|
---|
| 891 | otherwise returns FALSE.
|
---|
| 892 | <p> <p>See also <a href="#operator-eq-eq">operator==</a>().
|
---|
| 893 |
|
---|
| 894 | <h3 class=fn><a href="qregexp.html">QRegExp</a> & <a name="operator-eq"></a>QRegExp::operator= ( const <a href="qregexp.html">QRegExp</a> & rx )
|
---|
| 895 | </h3>
|
---|
| 896 | Copies the <a href="qregexp.html#regular-expression">regular expression</a> <em>rx</em> and returns a reference to the
|
---|
| 897 | copy. The case sensitivity, wildcard and minimal matching options
|
---|
| 898 | are also copied.
|
---|
| 899 |
|
---|
| 900 | <h3 class=fn>bool <a name="operator-eq-eq"></a>QRegExp::operator== ( const <a href="qregexp.html">QRegExp</a> & rx ) const
|
---|
| 901 | </h3>
|
---|
| 902 | Returns TRUE if this <a href="qregexp.html#regular-expression">regular expression</a> is equal to <em>rx</em>;
|
---|
| 903 | otherwise returns FALSE.
|
---|
| 904 | <p> Two QRegExp objects are equal if they have the same pattern
|
---|
| 905 | strings and the same settings for case sensitivity, wildcard and
|
---|
| 906 | minimal matching.
|
---|
| 907 |
|
---|
| 908 | <h3 class=fn><a href="qstring.html">QString</a> <a name="pattern"></a>QRegExp::pattern () const
|
---|
| 909 | </h3>
|
---|
| 910 | Returns the pattern string of the <a href="qregexp.html#regular-expression">regular expression</a>. The pattern
|
---|
| 911 | has either regular expression syntax or wildcard syntax, depending
|
---|
| 912 | on <a href="#wildcard">wildcard</a>().
|
---|
| 913 | <p> <p>See also <a href="#setPattern">setPattern</a>().
|
---|
| 914 |
|
---|
| 915 | <h3 class=fn>int <a name="pos"></a>QRegExp::pos ( int nth = 0 )
|
---|
| 916 | </h3>
|
---|
| 917 | Returns the position of the <em>nth</em> captured text in the searched
|
---|
| 918 | string. If <em>nth</em> is 0 (the default), <a href="#pos">pos</a>() returns the position
|
---|
| 919 | of the whole match.
|
---|
| 920 | <p> Example:
|
---|
| 921 | <pre>
|
---|
| 922 | QRegExp rx( "/([a-z]+)/([a-z]+)" );
|
---|
| 923 | rx.<a href="#search">search</a>( "Output /dev/null" ); // returns 7 (position of /dev/null)
|
---|
| 924 | rx.<a href="#pos">pos</a>( 0 ); // returns 7 (position of /dev/null)
|
---|
| 925 | rx.<a href="#pos">pos</a>( 1 ); // returns 8 (position of dev)
|
---|
| 926 | rx.<a href="#pos">pos</a>( 2 ); // returns 12 (position of null)
|
---|
| 927 | </pre>
|
---|
| 928 |
|
---|
| 929 | <p> For zero-length matches, pos() always returns -1. (For example, if
|
---|
| 930 | <a href="#cap">cap</a>(4) would return an empty string, pos(4) returns -1.) This is
|
---|
| 931 | due to an implementation tradeoff.
|
---|
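<p> Capture positions can be sketched with std::smatch::position(). One difference is worth noting: in this sketch only captures that did not take part in the match are reported as -1, whereas QRegExp also returns -1 for zero-length captures, as described above. The function name capturePositions is hypothetical.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the start position of the whole match and of
// each capture, using -1 for captures that did not participate in the match.
// The result is empty when there is no match at all.
std::vector<int> capturePositions(const std::string& pattern, const std::string& text)
{
    std::vector<int> out;
    std::smatch m;
    if (std::regex_search(text, m, std::regex(pattern))) {
        for (std::size_t i = 0; i < m.size(); ++i)
            out.push_back(m[i].matched ? static_cast<int>(m.position(i)) : -1);
    }
    return out;
}
```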
| 932 | <p> <p>See also <a href="#capturedTexts">capturedTexts</a>(), <a href="#exactMatch">exactMatch</a>(), <a href="#search">search</a>(), and <a href="#searchRev">searchRev</a>().
|
---|
| 933 |
|
---|
| 934 | <h3 class=fn>int <a name="search"></a>QRegExp::search ( const <a href="qstring.html">QString</a> & str, int offset = 0, <a href="qregexp.html#CaretMode-enum">CaretMode</a> caretMode = CaretAtZero ) const
|
---|
| 935 | </h3>
|
---|
| 936 | Attempts to find a match in <em>str</em> from position <em>offset</em> (0 by
|
---|
| 937 | default). If <em>offset</em> is -1, the search starts at the last
|
---|
| 938 | character; if -2, at the next to last character; etc.
|
---|
| 939 | <p> Returns the position of the first match, or -1 if there was no
|
---|
| 940 | match.
|
---|
| 941 | <p> The <em>caretMode</em> parameter determines whether <b>^</b>
|
---|
| 942 | should match at index 0 or at <em>offset</em>.
|
---|
| 943 | <p> You might prefer to use <a href="qstring.html#find">QString::find</a>(), <a href="qstring.html#contains">QString::contains</a>() or
|
---|
| 944 | even <a href="qstringlist.html#grep">QStringList::grep</a>(). To replace matches use
|
---|
| 945 | <a href="qstring.html#replace">QString::replace</a>().
|
---|
| 946 | <p> Example:
|
---|
| 947 | <pre>
|
---|
| 948 | <a href="qstring.html">QString</a> str = "offsets: 1.23 .50 71.00 6.00";
|
---|
| 949 | QRegExp rx( "\\d*\\.\\d+" ); // primitive floating point matching
|
---|
| 950 | int count = 0;
|
---|
| 951 | int pos = 0;
|
---|
| 952 | while ( (pos = rx.<a href="#search">search</a>(str, pos)) != -1 ) {
|
---|
| 953 | count++;
|
---|
| 954 | pos += rx.<a href="#matchedLength">matchedLength</a>();
|
---|
| 955 | }
|
---|
| 956 | // search() returns 9, 14, 18 and 24, then -1; count ends up as 4
|
---|
| 957 | </pre>
|
---|
| 958 |
|
---|
| 959 | <p> Although const, this function sets <a href="#matchedLength">matchedLength</a>(),
|
---|
| 960 | <a href="#capturedTexts">capturedTexts</a>() and <a href="#pos">pos</a>().
|
---|
| 961 | <p> <p>See also <a href="#searchRev">searchRev</a>() and <a href="#exactMatch">exactMatch</a>().
|
---|
| 962 |
|
---|
| 963 | <p>Examples: <a href="archivesearch-example.html#x481">network/archivesearch/archivedialog.ui.h</a> and <a href="regexptester-example.html#x2490">regexptester/regexptester.cpp</a>.
|
---|
| 964 | <h3 class=fn>int <a name="searchRev"></a>QRegExp::searchRev ( const <a href="qstring.html">QString</a> & str, int offset = -1, <a href="qregexp.html#CaretMode-enum">CaretMode</a> caretMode = CaretAtZero ) const
|
---|
| 965 | </h3>
|
---|
| 966 | Attempts to find a match backwards in <em>str</em> from position <em>offset</em>. If <em>offset</em> is -1 (the default), the search starts at the
|
---|
| 967 | last character; if -2, at the next to last character; etc.
|
---|
| 968 | <p> Returns the position of the first match, or -1 if there was no
|
---|
| 969 | match.
|
---|
| 970 | <p> The <em>caretMode</em> parameter determines whether <b>^</b>
|
---|
| 971 | should match at index 0 or at <em>offset</em>.
|
---|
| 972 | <p> Although const, this function sets <a href="#matchedLength">matchedLength</a>(),
|
---|
| 973 | <a href="#capturedTexts">capturedTexts</a>() and <a href="#pos">pos</a>().
|
---|
| 974 | <p> <b>Warning:</b> Searching backwards is much slower than searching
|
---|
| 975 | forwards.
|
---|
| 976 | <p> <p>See also <a href="#search">search</a>() and <a href="#exactMatch">exactMatch</a>().
|
---|
| 977 |
|
---|
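<p> The following is a minimal sketch of one way a backwards search can be emulated. It uses the standard C++ <tt>std::regex</tt> as a stand-in for QRegExp and is not Qt's implementation; the function name <tt>searchBackwards</tt> is hypothetical. Each candidate start position gets its own anchored match attempt, which may help explain why searching backwards tends to cost more than searching forwards.

```cpp
#include <regex>
#include <string>

// Returns the start of the last match that begins at or before `offset`,
// or -1 if there is none. A negative offset counts from the end of the
// string (-1 is the last character), mirroring the convention above.
// Hypothetical helper: std::regex stands in for QRegExp here.
int searchBackwards(const std::string& str, const std::regex& rx, int offset = -1)
{
    const int len = static_cast<int>(str.size());
    if (offset < 0)
        offset += len;
    if (offset < 0 || offset > len)
        return -1;
    for (int start = offset; start >= 0; --start) {
        // match_continuous only accepts a match that begins exactly at `start`.
        if (std::regex_search(str.begin() + start, str.end(), rx,
                              std::regex_constants::match_continuous))
            return start;
    }
    return -1;
}
```

<p> With the string <tt>"Output /dev/null"</tt> and the pattern <b>/[a-z]+</b>, the default offset finds the match that starts at the second slash, while an offset of 10 finds the one that starts at the first slash.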
| 978 | <h3 class=fn>void <a name="setCaseSensitive"></a>QRegExp::setCaseSensitive ( bool sensitive )
|
---|
| 979 | </h3>
|
---|
| 980 | Sets case sensitive matching to <em>sensitive</em>.
|
---|
| 981 | <p> If <em>sensitive</em> is TRUE, <b>\.txt$</b> matches <tt>readme.txt</tt> but
|
---|
| 982 | not <tt>README.TXT</tt>.
|
---|
| 983 | <p> <p>See also <a href="#caseSensitive">caseSensitive</a>().
|
---|
| 984 |
|
---|
| 985 | <p>Example: <a href="regexptester-example.html#x2491">regexptester/regexptester.cpp</a>.
|
---|
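<p> The effect of the case sensitivity flag can be illustrated with a small sketch. It uses the standard C++ <tt>std::regex</tt> as a stand-in for QRegExp; the helper <tt>hasTxtSuffix</tt> and its <tt>caseSensitive</tt> parameter are hypothetical and only mirror the role of the <em>sensitive</em> argument.

```cpp
#include <regex>
#include <string>

// Returns true if `name` ends in ".txt".
// `caseSensitive` plays the role of the flag passed to setCaseSensitive();
// std::regex is only a stand-in for QRegExp in this sketch.
bool hasTxtSuffix(const std::string& name, bool caseSensitive)
{
    std::regex_constants::syntax_option_type flags = std::regex_constants::ECMAScript;
    if (!caseSensitive)
        flags |= std::regex_constants::icase;   // ignore letter case
    const std::regex rx("\\.txt$", flags);
    return std::regex_search(name, rx);
}
```

<p> Called with <tt>caseSensitive</tt> set to true, the helper accepts <tt>readme.txt</tt> and rejects <tt>README.TXT</tt>; with it set to false, both names are accepted.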
| 986 | <h3 class=fn>void <a name="setMinimal"></a>QRegExp::setMinimal ( bool minimal )
|
---|
| 987 | </h3>
|
---|
| 988 | Enables or disables minimal matching. If <em>minimal</em> is FALSE,
|
---|
| 989 | matching is greedy (maximal) which is the default.
|
---|
| 990 | <p> For example, suppose we have the input string "We must be
|
---|
| 991 | &lt;b&gt;bold&lt;/b&gt;, very &lt;b&gt;bold&lt;/b&gt;!" and the pattern
|
---|
| 992 | <b>&lt;b&gt;.*&lt;/b&gt;</b>. With the default greedy (maximal) matching,
|
---|
| 993 | the match is "We must be <u>&lt;b&gt;bold&lt;/b&gt;, very
|
---|
| 994 | &lt;b&gt;bold&lt;/b&gt;</u>!". But with minimal (non-greedy) matching the
|
---|
| 995 | first match is: "We must be <u>&lt;b&gt;bold&lt;/b&gt;</u>, very
|
---|
| 996 | &lt;b&gt;bold&lt;/b&gt;!" and the second match is "We must be &lt;b&gt;bold&lt;/b&gt;,
|
---|
| 997 | very <u>&lt;b&gt;bold&lt;/b&gt;</u>!". In practice we might use the pattern
|
---|
| 998 | <b>&lt;b&gt;[^&lt;]+&lt;/b&gt;</b> instead, although this will still fail for
|
---|
| 999 | nested tags.
|
---|
| 1000 | <p> <p>See also <a href="#minimal">minimal</a>().
|
---|
| 1001 |
|
---|
| 1002 | <p>Examples: <a href="archivesearch-example.html#x482">network/archivesearch/archivedialog.ui.h</a> and <a href="regexptester-example.html#x2492">regexptester/regexptester.cpp</a>.
|
---|
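<p> The difference between greedy and minimal matching can be reproduced with a short sketch. It uses the standard C++ <tt>std::regex</tt> as a stand-in for QRegExp, which has no equivalent of setMinimal(); the lazy quantifier <tt>*?</tt> is used there to request minimal matching for a single quantifier. The helper name <tt>allMatches</tt> is hypothetical.

```cpp
#include <regex>
#include <string>
#include <vector>

// Collects every non-overlapping match of `pattern` in `text`, left to right.
// std::regex (ECMAScript grammar) is only a stand-in for QRegExp in this sketch.
std::vector<std::string> allMatches(const std::string& text, const std::string& pattern)
{
    std::vector<std::string> found;
    const std::regex rx(pattern);
    for (std::sregex_iterator it(text.begin(), text.end(), rx), end; it != end; ++it)
        found.push_back(it->str());
    return found;
}
```

<p> Applied to the input string from the example above, the greedy quantifier <tt>.*</tt> between the two tags yields a single match spanning both bold elements, while the lazy form <tt>.*?</tt> yields two separate matches, one per element.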
| 1003 | <h3 class=fn>void <a name="setPattern"></a>QRegExp::setPattern ( const <a href="qstring.html">QString</a> & pattern )
|
---|
| 1004 | </h3>
|
---|
| 1005 | Sets the pattern string to <em>pattern</em>. The case sensitivity,
|
---|
| 1006 | wildcard and minimal matching options are not changed.
|
---|
| 1007 | <p> <p>See also <a href="#pattern">pattern</a>().
|
---|
| 1008 |
|
---|
| 1009 | <h3 class=fn>void <a name="setWildcard"></a>QRegExp::setWildcard ( bool wildcard )
|
---|
| 1010 | </h3>
|
---|
| 1011 | Sets the wildcard mode for the <a href="qregexp.html#regular-expression">regular expression</a>. The default is
|
---|
| 1012 | FALSE.
|
---|
| 1013 | <p> Setting <em>wildcard</em> to TRUE enables simple shell-like wildcard
|
---|
| 1014 | matching. (See <a href="#wildcard-matching">wildcard matching
|
---|
| 1015 | (globbing)</a>.)
|
---|
| 1016 | <p> For example, <b>r*.txt</b> matches the string <tt>readme.txt</tt> in
|
---|
| 1017 | wildcard mode, but does not match <tt>readme</tt>.
|
---|
| 1018 | <p> <p>See also <a href="#wildcard">wildcard</a>().
|
---|
| 1019 |
|
---|
| 1020 | <p>Example: <a href="regexptester-example.html#x2493">regexptester/regexptester.cpp</a>.
|
---|
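<p> Wildcard matching can be thought of as a simplified pattern language that maps onto regular expressions. The sketch below shows that idea with the standard C++ <tt>std::regex</tt>; it is not how QRegExp is implemented, the names <tt>wildcardToRegex</tt> and <tt>wildcardMatch</tt> are hypothetical, and only <b>*</b> and <b>?</b> are handled (character sets in square brackets are left out).

```cpp
#include <regex>
#include <string>

// Translates a shell-style wildcard into an ECMAScript regular expression.
// Only '*' and '?' are treated as wildcards; every other character is
// matched literally. Character sets such as [abc] are not handled here.
std::string wildcardToRegex(const std::string& wildcard)
{
    static const std::string special = "\\^$.|+()[]{}";
    std::string rx;
    for (char c : wildcard) {
        if (c == '*') {
            rx += ".*";              // zero or more of any character
        } else if (c == '?') {
            rx += '.';               // exactly one character
        } else {
            if (special.find(c) != std::string::npos)
                rx += '\\';          // escape regex metacharacters
            rx += c;
        }
    }
    return rx;
}

// Returns true if the whole of `text` matches `wildcard`.
bool wildcardMatch(const std::string& text, const std::string& wildcard)
{
    return std::regex_match(text, std::regex(wildcardToRegex(wildcard)));
}
```

<p> Under these assumptions, <b>r*.txt</b> is translated to <b>r.*\.txt</b>, which matches <tt>readme.txt</tt> as a whole but does not match <tt>readme</tt>.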
| 1021 | <h3 class=fn>bool <a name="wildcard"></a>QRegExp::wildcard () const
|
---|
| 1022 | </h3>
|
---|
| 1023 | Returns TRUE if wildcard mode is enabled; otherwise returns FALSE.
|
---|
| 1024 | The default is FALSE.
|
---|
| 1025 | <p> <p>See also <a href="#setWildcard">setWildcard</a>().
|
---|
| 1026 |
|
---|
| 1027 | <!-- eof -->
|
---|
| 1028 | <hr><p>
|
---|
| 1029 | This file is part of the <a href="index.html">Qt toolkit</a>.
|
---|
| 1030 | Copyright © 1995-2007
|
---|
| 1031 | <a href="http://www.trolltech.com/">Trolltech</a>. All Rights Reserved.<p><address><hr><div align=center>
|
---|
| 1032 | <table width=100% cellspacing=0 border=0><tr>
|
---|
| 1033 | <td>Copyright © 2007
|
---|
| 1034 | <a href="troll.html">Trolltech</a><td align=center><a href="trademarks.html">Trademarks</a>
|
---|
| 1035 | <td align=right><div align=right>Qt 3.3.8</div>
|
---|
| 1036 | </table></div></address></body>
|
---|
| 1037 | </html>
|
---|