String Patterns
A string pattern is a combination of characters that can be used to find very specific pieces — often called substrings — that exist inside a longer string. String patterns are used with several string functions provided by Lua.
Direct Matches
Direct matches can be done for any non-magic characters by simply using them literally in a Lua function like string.match(). For example, suppose string.match() is called twice to look for the word Roblox, first on a string that contains it and then on one that does not.
A match is found in the first string, so Roblox is output to the console. However, a match is not found in the second string, so the output is nil.
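A minimal sketch of the two direct matches described above. The two input strings are illustrative assumptions, not taken from the original sample:

```lua
-- Both input strings are hypothetical examples
local match1 = string.match("Welcome to Roblox!", "Roblox")
local match2 = string.match("Welcome to my awesome game!", "Roblox")

print(match1)  --> Roblox
print(match2)  --> nil
```

Because the pattern contains no magic characters, string.match() simply returns the literal text when it is present and nil when it is not.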
Character Classes
Character classes are essential for more advanced string searches. They’re a way to search for something that isn’t one specific character but that fits within a known category (class). In Lua, you can search a string for letters, digits, spaces, punctuation, and more.
The following table shows the official character classes for Lua string patterns:
Class | Represents | Example Match |
---|---|---|
. | Any character | 32kasGJ1%fTlk?@94 |
%a | An uppercase or lowercase letter | aBcDeFgHiJkLmNoPqRsTuVwXyZ |
%l | A lowercase letter | abcdefghijklmnopqrstuvwxyz |
%u | An uppercase letter | ABCDEFGHIJKLMNOPQRSTUVWXYZ |
%d | Any digit (number) | 0123456789 |
%p | Any punctuation character | !@#;,. |
%w | An alphanumeric character (either a letter or a number) | aBcDeFgHiJkLmNoPqRsTuVwXyZ0123456789 |
%s | A space or whitespace character | a space, \n, and \r |
%c | A special control character | |
%x | A hexadecimal character | 0123456789ABCDEF |
%z | The NULL character (\0) | |
For single-letter character classes such as %a and %s, the corresponding uppercase letter represents the “opposite” of the class. For instance, %p represents a punctuation character while %P represents all characters except punctuation.
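The sketch below shows a few of the classes above, along with their uppercase “opposites”, picking out single characters. The sample string and variable names are assumptions chosen for illustration:

```lua
-- Hypothetical sample string
local sample = "Level 42: Boss!"

local firstLetter = string.match(sample, "%a")     -- first letter of either case
local firstLower = string.match(sample, "%l")      -- first lowercase letter
local firstDigit = string.match(sample, "%d")      -- first digit
local firstPunct = string.match(sample, "%p")      -- first punctuation character
local firstNonLetter = string.match(sample, "%A")  -- first character that is NOT a letter
local firstNonPunct = string.match(sample, "%P")   -- first character that is NOT punctuation

print(firstLetter, firstLower, firstDigit, firstPunct)  --> L  e  4  :
print(firstNonPunct)                                    --> L
```

Here firstNonLetter is the space that follows Level, since a space is the first character outside the %a class.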
Magic Characters
There are 12 “magic characters” which are reserved for special purposes in patterns:
$ % ^ * ( ) . [ ] + - ?
To search for a magic character literally instead of invoking its special meaning, precede it with a % symbol. This is called character escaping. For example, to search for roblox.com, you’ll need to escape the . (period) symbol by preceding it with a %, giving the pattern roblox%.com.
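To see why escaping matters, the following sketch compares an unescaped and an escaped period. The input text is a made-up example, and the last call uses the optional plain argument of string.find(), which turns off pattern matching altogether:

```lua
-- Hypothetical text containing a near-miss and the real domain
local text = "Visit robloxXcom or roblox.com"

-- Unescaped: "." matches ANY character, so the near-miss is found first
local unescaped = string.match(text, "roblox.com")

-- Escaped: "%." matches only a literal period
local escaped = string.match(text, "roblox%.com")

-- Alternative: plain search (4th argument true) treats the pattern as literal text
local startPos, endPos = string.find(text, "roblox.com", 1, true)

print(unescaped)         --> robloxXcom
print(escaped)           --> roblox.com
print(startPos, endPos)  --> 21  30
```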
Anchors
To ensure a pattern occurs at the beginning of a string, you can use the ^ symbol to represent the “head” of the string. Conversely, the $ symbol ensures a pattern occurs at the end of a string.
You can also use both ^ and $ together to ensure a pattern matches only the full string and not just some portion of it.
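A short sketch of the three anchoring cases described above, using illustrative strings that are assumptions rather than part of the original page:

```lua
-- "^" anchors the pattern to the start of the string
local startHit = string.match("Roblox is fun", "^Roblox")
local startMiss = string.match("I love Roblox", "^Roblox")

-- "$" anchors the pattern to the end of the string
local endHit = string.match("I love Roblox", "Roblox$")
local endMiss = string.match("Roblox is fun", "Roblox$")

-- Both together: the pattern must account for the entire string
local fullHit = string.match("Roblox", "^Roblox$")
local fullMiss = string.match("Roblox is fun", "^Roblox$")

print(startHit, startMiss)  --> Roblox  nil
print(endHit, endMiss)      --> Roblox  nil
print(fullHit, fullMiss)    --> Roblox  nil
```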
Class Modifiers
By itself, a character class will only match one character in a string. For instance, given a string that contains the number 25, the pattern "%d" starts reading the string from left to right, finds the first digit (2), and stops.
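As a minimal illustration of that single-character behavior, assuming a hypothetical input string that contains the number 25:

```lua
-- Hypothetical input string
local gemText = "The Cloud Kingdom has 25 power gems"

-- "%d" on its own matches exactly one digit
local singleDigit = string.match(gemText, "%d")

print(singleDigit)  --> 2
```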
Fortunately, you can use modifiers with any character class to control the result:
Quantifier | Meaning |
---|---|
+ | Match 1 or more of the preceding character class |
- | Match 0 or more of the preceding character class, as few as possible |
* | Match 0 or more of the preceding character class, as many as possible |
? | Match 0 or 1 of the preceding character class |
%n | For n between 1 and 9, matches a substring equal to the n-th captured string |
%bxy | The balanced capture matching x, y, and everything between (for example, %b() matches a pair of parentheses and everything between them) |
Adding the + modifier to the same pattern ("%d+" instead of "%d") outputs 25 instead of 2.
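The sketch below runs each entry in the quantifier table once. All input strings and variable names are assumptions made for illustration:

```lua
-- "+" : one or more digits
local digits = string.match("The Cloud Kingdom has 25 power gems", "%d+")

-- "*" : zero or more, so it can succeed with an empty match
local maybeDigits = string.match("abc", "%d*")

-- "?" : the "u" is optional
local american = string.match("color", "colou?r")
local british = string.match("colour", "colou?r")

-- "-" versus "*" : shortest versus longest match
local shortest = string.match("<b>bold</b>", "<.->")
local longest = string.match("<b>bold</b>", "<.*>")

-- "%1" : matches whatever the first capture matched (here, a doubled letter)
local startPos, endPos, doubled = string.find("hello", "(%a)%1")

-- "%b()" : a balanced pair of parentheses, including nested ones
local balanced = string.match("sum(a, (b + c)) * 2", "%b()")

print(digits)                     --> 25
print(american, british)          --> color  colour
print(shortest, longest)          --> <b>  <b>bold</b>
print(startPos, endPos, doubled)  --> 3  4  l
print(balanced)                   --> (a, (b + c))
```

Note that maybeDigits is an empty string rather than nil: with *, matching zero digits at the start of the string still counts as a successful match.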
Class Sets
Sets should be used when a single character class can’t do the whole job. For instance, you might want to match both lowercase letters (%l) and punctuation characters (%p) using a single pattern.
Sets are defined by brackets [] around them. Consider the difference between using a set ("[%l%p]+") and not using a set ("%l%p+") on a string such as Hello!!! followed by a space and more text.
The first pattern (set) tells Lua to find both lowercase characters and punctuation. With the + quantifier added after the entire set, it finds all of those characters (ello!!!), stopping when it reaches the space.
In the second pattern (non-set), the + quantifier only applies to the %p class before it, so Lua grabs only the single lowercase character (o) immediately before the series of punctuation (!!!).
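The set and non-set patterns can be compared side by side. The input string here is an assumption chosen to reproduce the ello!!! and o!!! results described above:

```lua
-- Hypothetical input string
local greeting = "Hello!!! I am another string"

-- Set: "+" applies to the whole set of lowercase letters and punctuation
local withSet = string.match(greeting, "[%l%p]+")

-- No set: one lowercase letter followed by one or more punctuation characters
local withoutSet = string.match(greeting, "%l%p+")

print(withSet)     --> ello!!!
print(withoutSet)  --> o!!!
```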
To match the “opposite” of a set, add a ^ character at the beginning of the set, directly after the opening [. For instance, "[%p%s]+" represents both punctuation and spaces, while "[^%p%s]+" represents all characters except punctuation and spaces.
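A brief sketch of a set and its negated form, again using an assumed input string:

```lua
-- Hypothetical input string
local phrase = "Hello!!! I am another string"

-- First run of punctuation and/or whitespace
local punctOrSpace = string.match(phrase, "[%p%s]+")

-- First run of characters that are neither punctuation nor whitespace
local neither = string.match(phrase, "[^%p%s]+")

print("[" .. punctOrSpace .. "]")  --> [!!! ]
print(neither)                     --> Hello
```

The brackets in the first print call are only there to make the trailing space in the match visible.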
String Captures
String captures are sub-patterns within a pattern. These are enclosed in parentheses () and are used to get (capture) matching substrings and save them to variables. For example, a pattern such as "(%a+)%s?=%s?(%d+)" contains two captures, (%a+) and (%d+), which return two substrings upon a successful match.
In that pattern, the ? quantifier that follows both of the %s classes is a safe addition because it makes the space on either side of the = sign optional. That means the match will succeed if one (or both) spaces are missing around the equal sign.
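A minimal sketch of two captures with optional spaces around the = sign. The pattern is assembled from the pieces described above, and the input strings are hypothetical:

```lua
-- Two captures: a run of letters and a run of digits, with optional
-- whitespace on either side of "="
local pattern = "(%a+)%s?=%s?(%d+)"

-- Hypothetical inputs covering each spacing variation
local key1, val1 = string.match("TwentyOne = 21", pattern)
local key2, val2 = string.match("TwoThousand=2000", pattern)
local key3, val3 = string.match("OneMillion =1000000", pattern)

print(key1, val1)  --> TwentyOne  21
print(key2, val2)  --> TwoThousand  2000
print(key3, val3)  --> OneMillion  1000000
```

Captures are always returned as strings, so a value like val1 needs tonumber() before it can be used in arithmetic.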
String captures can also be nested, with one pair of parentheses inside another, as in the pattern "(The%s(%a+%sKingdom)[%w%s]+)".
This pattern search works as follows:
- The string.gmatch() iterator looks for a match on the entire “description” pattern defined by the outer pair of parentheses. This stops at the first comma and captures the following:
# | Pattern | Capture |
---|---|---|
1 | (The%s(%a+%sKingdom)[%w%s]+) | The Cloud Kingdom is heavenly |
- Using its successful first capture, the iterator then looks for a match on the “kingdom” pattern defined by the inner pair of parentheses. This nested pattern simply captures the following:
# | Pattern | Capture |
---|---|---|
2 | (%a+%sKingdom) | Cloud Kingdom |
- The iterator then backs out and continues searching the full string, capturing the following:
# | Pattern | Capture |
---|---|---|
3 | (The%s(%a+%sKingdom)[%w%s]+) | The Forest Kingdom is peaceful |
4 | (%a+%sKingdom) | Forest Kingdom |
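The walkthrough above can be sketched as a loop over string.gmatch(). The input string is an assumption chosen to be consistent with the four captures listed, and the results table exists only so the captures can be inspected afterwards:

```lua
-- Hypothetical input string consistent with the captures listed above
local places = "The Cloud Kingdom is heavenly, The Forest Kingdom is peaceful"
local pattern = "(The%s(%a+%sKingdom)[%w%s]+)"

local results = {}
for description, kingdom in string.gmatch(places, pattern) do
	print(description)
	print(kingdom)
	table.insert(results, { description = description, kingdom = kingdom })
end
--> The Cloud Kingdom is heavenly
--> Cloud Kingdom
--> The Forest Kingdom is peaceful
--> Forest Kingdom
```

Captures are numbered by the position of their opening parenthesis, which is why the outer “description” capture is returned before the inner “kingdom” capture on each iteration.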