String
Latest revision as of 02:05, 9 November 2024
Supported platforms:
Available since: Gideros 2011.6
Description
This library provides generic functions to manipulate strings, such as finding and extracting substrings, and pattern matching.
When indexing a string in Lua, the first character is at position 1 (not at 0, as in C). Indices are allowed to be negative and are interpreted as indexing backwards, from the end of the string. Thus, the last character is at position -1, and so on.
The string library provides all its functions inside the table string. It also sets a metatable for strings where the __index field points to the string table. Therefore, you can use the string functions in object-oriented style. For instance, string.byte(s, i) can be written as s:byte(i).
The string library assumes one-byte character encodings.
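As a minimal sketch of the indexing and calling conventions described above (the sample string and variable names are illustrative only):

```lua
local s = "Gideros"

-- Positions start at 1; negative positions count back from the end.
local first = string.sub(s, 1, 1)  -- "G"
local last = string.sub(s, -1)     -- "s"
local tail = string.sub(s, -3)     -- "ros"

-- Object-oriented style: s:byte(i) is the same call as string.byte(s, i).
local code = s:byte(1)             -- 71, the byte value of "G"

-- One-byte assumption: "é" encoded as UTF-8 takes two bytes, so the
-- length operator reports 2, not 1.
local accented = "\195\169"

print(first, last, tail, code, #accented)  -- G s ros 71 2
```

Because lengths and positions are counted in bytes, multi-byte UTF-8 text may need extra care when slicing.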
String Patterns
A string pattern is a combination of characters that you can use with string.match(), string.gmatch(), and other functions to find a piece, or substring, of a longer string.
Direct Matches
You can match literal text directly in a Lua function like string.match(), as long as the text contains no magic characters (those must be escaped first). For example, these commands look for the word Gideros within a string:
local match1 = string.match("Welcome to Gideros!", "Gideros")
local match2 = string.match("Welcome to my awesome game!", "Gideros")
print(match1) -- Gideros
print(match2) -- nil
Notice that the first string contains a match, so the matched text (Gideros) is printed to the Output window. The second string doesn't contain a match, so nil is printed instead.
Character Classes
Character classes are essential for more advanced string searches. You can use them to search for something that isn't necessarily character-specific but fits within a known category (class), including letters, digits, spaces, punctuation, and more.
The following table shows the official character classes for Lua string patterns:
Class | Represents | Example Match |
---|---|---|
. | Any character | 32kasGJ1%fTlk?@94 |
%a | An uppercase or lowercase letter | aBcDeFgHiJkLmNoPqRsTuVwXyZ |
%l | A lowercase letter | abcdefghijklmnopqrstuvwxyz |
%u | An uppercase letter | ABCDEFGHIJKLMNOPQRSTUVWXYZ |
%d | Any digit (number) | 0123456789 |
%p | Any punctuation character | !@#;,. |
%w | An alphanumeric character (either a letter or a number) | aBcDeFgHiJkLmNoPqRsTuVwXyZ0123456789 |
%s | A space or whitespace character | space, \n, and \r |
%c | A special control character | |
%x | A hexadecimal digit | 0123456789ABCDEFabcdef |
%z | The NULL character (\0) | |
For single-letter character classes such as %a and %s, the corresponding uppercase letter represents the "opposite" of the class. For instance, %p represents a punctuation character while %P represents all characters except punctuation.
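The following sketch shows a few uppercase ("opposite") classes in use. The sample text and variable names are assumptions made for illustration:

```lua
local text = "Hello, world! 42"

-- %P+ matches a run of non-punctuation characters, stopping at the comma.
local beforePunct = string.match(text, "%P+")  -- "Hello"

-- %D is the opposite of %d, so removing every %D leaves only the digits.
local digits = string.gsub(text, "%D", "")     -- "42"

-- %S+ matches a run of non-whitespace characters.
local firstWord = string.match(text, "%S+")    -- "Hello,"

print(beforePunct, digits, firstWord)          -- Hello 42 Hello,
```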
Magic Characters
There are 12 "magic characters" which are reserved for special purposes in patterns:
- $ % ^ * ( ) . [ ] + - ?
You can escape and search for magic characters using the % symbol. For example, to search for gideros.rocks, escape the . (period) symbol by preceding it with a %, as in %.
-- Incorrect: "gideros.rocks" matches "gideros#rocks" because the period is interpreted as "any character"
local match1 = string.match("What is gideros#rocks?", "gideros.rocks")
-- Correct: Escape the period with % so it is interpreted as a literal period character
local match2 = string.match("I love gideros.rocks!", "gideros%.rocks")
print(match1) -- gideros#rocks
print(match2) -- gideros.rocks
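When the search text comes from a variable, escaping by hand is not practical. One possible approach, sketched here with a hypothetical helper named escapePattern (not part of the string library), is to prefix every punctuation character with %. Lua patterns allow any punctuation character to be escaped this way, even non-magic ones.

```lua
-- Hypothetical helper: put % in front of every punctuation character so the
-- text can be used as a literal pattern.
local function escapePattern(text)
    -- In the replacement string, %% is a literal % and %0 is the whole match.
    return (string.gsub(text, "%p", "%%%0"))
end

local literal = escapePattern("gideros.rocks")
print(literal)                                          -- gideros%.rocks
print(string.match("What is gideros#rocks?", literal))  -- nil
print(string.match("I love gideros.rocks!", literal))   -- gideros.rocks

-- string.find can also skip pattern matching entirely: passing true as the
-- fourth argument requests a plain substring search.
local startPos, endPos = string.find("I love gideros.rocks!", "gideros.rocks", 1, true)
print(startPos, endPos)                                 -- 8 20
```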
Anchors
You can search for a pattern at the beginning or end of a string by using the ^ and $ symbols.
-- Using ^ to match the beginning of a string
local start1 = string.match("first second third", "^first") -- Matches because "first" is at the beginning
local start2 = string.match("third second first", "^first") -- Doesn't match because "first" isn't at the beginning
print(start1) -- first
print(start2) -- nil
-- Using $ to match the end of a string
local end1 = string.match("first second third", "third$") -- Matches because "third" is at the end
local end2 = string.match("third second first", "third$") -- Doesn't match because "third" isn't at the end
print(end1) -- third
print(end2) -- nil
You can also use both ^ and $ together to ensure a pattern matches only the full string and not just some portion of it.
-- Using both ^ and $ to match across a full string
local match1 = string.match("Gideros", "^Gideros$") -- Matches because "Gideros" is the entire string (equality)
local match2 = string.match("I play Gideros", "^Gideros$") -- Doesn't match because "Gideros" isn't at the beginning AND end
local match3 = string.match("I play Gideros", "Gideros") -- Matches because "Gideros" is contained within "I play Gideros"
print(match1) -- Gideros
print(match2) -- nil
print(match3) -- Gideros
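Anchors are often used for prefix and suffix checks. The helpers below are a sketch only; the function names and file paths are made up for illustration:

```lua
-- Hypothetical helper: true when the file name ends in ".png".
local function isPng(filename)
    -- %. is a literal period and $ pins the match to the end of the string.
    return string.match(filename, "%.png$") ~= nil
end

-- Hypothetical helper: true when the path starts with "gfx/".
local function inGfxFolder(path)
    -- ^ pins the match to the beginning of the string.
    return string.match(path, "^gfx/") ~= nil
end

print(isPng("gfx/hero.png"))            -- true
print(isPng("gfx/hero.png.bak"))        -- false
print(inGfxFolder("gfx/hero.png"))      -- true
print(inGfxFolder("old/gfx/hero.png"))  -- false
```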
Class Modifiers
By itself, a character class only matches one character in a string. For instance, the following pattern ("%d") starts reading the string from left to right, finds the first digit (2), and stops.
local match = string.match("The Cloud Kingdom has 25 power gems", "%d")
print(match) -- 2
You can use modifiers with any character class to control the result:
Quantifier | Meaning |
---|---|
+ | Match 1 or more of the preceding character class |
- | Match 0 or more of the preceding character class, taking as few as possible |
* | Match 0 or more of the preceding character class, taking as many as possible |
? | Match 0 or 1 of the preceding character class |
%n | For n between 1 and 9, match a substring equal to the n-th captured string |
%bxy | Match a balanced pair: x, y, and everything between them (for example, %b() matches a pair of parentheses and everything between them) |
Adding a modifier to the same pattern ("%d+" instead of "%d") outputs 25 instead of 2:
local match1 = string.match("The Cloud Kingdom has 25 power gems", "%d")
print(match1) -- 2
local match2 = string.match("The Cloud Kingdom has 25 power gems", "%d+")
print(match2) -- 25
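The remaining modifiers can be sketched the same way. The sample strings below are illustrative assumptions. The %1 example relies on captures (sub-patterns in parentheses), which are covered under String Captures.

```lua
local html = "<b>bold</b> and <i>italic</i>"

-- * is greedy: it runs on to the last ">" in the string.
local greedy = string.match(html, "<.*>")  -- the whole string
-- - is lazy: it stops at the first ">" that completes the match.
local lazy = string.match(html, "<.->")    -- "<b>"

-- ? makes the preceding character optional.
local us = string.match("color", "colou?r")   -- "color"
local uk = string.match("colour", "colou?r")  -- "colour"

-- %b() matches a balanced pair of parentheses, including nested pairs.
local args = string.match("sum(a, (b + c)) * 2", "%b()")  -- "(a, (b + c))"

-- %1 must repeat whatever the first capture matched (here, the opening quote).
local quote, content = string.match([[say "hello" and 'bye']], "([\"'])(.-)%1")

print(greedy)
print(lazy, us, uk, args)
print(quote, content)  -- " hello
```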
Class Sets
Sets should be used when a single character class can't do the whole job. For instance, you might want to match both lowercase letters (%l) and punctuation characters (%p) using a single pattern.
Sets are defined by brackets [] around them. In the following example, notice the difference between using a set ("[%l%p]+") and not using a set ("%l%p+").
local match1 = string.match("Hello!!! I am another string.", "[%l%p]+") -- Set
local match2 = string.match("Hello!!! I am another string.", "%l%p+") -- Non-set
print(match1) -- ello!!!
print(match2) -- o!!!
The first command (set) tells Lua to find both lowercase characters and punctuation. With the + quantifier added after the entire set, it finds all of those characters (ello!!!), stopping when it reaches the space.
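In the second command (non-set), the + quantifier only applies to the %p class before it, so the pattern means "one lowercase character followed by one or more punctuation characters". Lua therefore grabs only the single lowercase character (o) that sits directly before the series of punctuation (!!!).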
Like character classes, sets can be "opposites" of themselves. This is done by adding a ^ character at the beginning of the set, directly after the opening [. For instance, "[%p%s]+" represents both punctuation and spaces, while "[^%p%s]+" represents all characters except punctuation and spaces.
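As a small sketch of a negated set, the loop below uses string.gmatch() to collect every run of characters that is neither punctuation nor whitespace. The words table is an illustrative name:

```lua
local words = {}

-- [^%p%s]+ matches runs of characters that are neither punctuation nor spaces.
for word in string.gmatch("Hello!!! I am another string.", "[^%p%s]+") do
    words[#words + 1] = word
end

print(table.concat(words, ","))  -- Hello,I,am,another,string
```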
Sets also support ranges, which let you match any character between a starting and an ending character. This is an advanced feature which is outlined in more detail in the Lua 5.1 Manual.
String Captures
String captures are sub-patterns within a pattern. These are enclosed in parentheses () and are used to get (capture) matching substrings and save them to variables. For example, the following pattern contains two captures, (%a+) and (%d+), which return two substrings upon a successful match.
local pattern = "(%a+)%s?=%s?(%d+)"
local key1, val1 = string.match("TwentyOne = 21", pattern)
print(key1, val1) -- TwentyOne 21
local key2, val2 = string.match("TwoThousand= 2000", pattern)
print(key2, val2) -- TwoThousand 2000
local key3, val3 = string.match("OneMillion=1000000", pattern)
print(key3, val3) -- OneMillion 1000000
In the previous pattern, the ? quantifier that follows both of the %s classes is a safe addition because it makes the space on either side of the = sign optional. That means the match succeeds if one (or both) spaces are missing around the equal sign.
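Captures combine naturally with string.gmatch() to read several key/value pairs from a single string. This is a sketch under assumed input (the settings string is invented). It also shows the limit of %s?, which allows at most one space on each side of the = sign.

```lua
local pattern = "(%a+)%s?=%s?(%d+)"
local settings = {}

for key, value in string.gmatch("width=100, height = 50, lives= 3", pattern) do
    settings[key] = tonumber(value)  -- captures are strings, so convert them
end
print(settings.width, settings.height, settings.lives)  -- 100 50 3

-- %s? accepts at most one space, so two spaces around = make the match fail.
local strictKey = string.match("score  =  10", pattern)
-- %s* accepts any amount of whitespace, including none.
local looseKey, looseVal = string.match("score  =  10", "(%a+)%s*=%s*(%d+)")
print(strictKey, looseKey, looseVal)  -- nil score 10
```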
String captures can also be nested, as in the following example:
local places = "The Cloud Kingdom is heavenly, The Forest Kingdom is peaceful"
local pattern = "(The%s(%a+%sKingdom)[%w%s]+)"
for description, kingdom in string.gmatch(places, pattern) do
    print(description)
    print(kingdom)
end
--[[Expected Output:
The Cloud Kingdom is heavenly
Cloud Kingdom
The Forest Kingdom is peaceful
Forest Kingdom
]]
This pattern search works as follows:
The string.gmatch() iterator looks for a match on the entire "description" pattern defined by the outer pair of parentheses. This stops at the first comma and captures the following:
# | Pattern | Capture |
---|---|---|
1 | (The%s(%a+%sKingdom)[%w%s]+) | The Cloud Kingdom is heavenly |
Using its successful first capture, the iterator then looks for a match on the "kingdom" pattern defined by the inner pair of parentheses. This nested pattern simply captures the following:
# | Pattern | Capture |
---|---|---|
2 | (%a+%sKingdom) | Cloud Kingdom |
The iterator then backs out and continues searching the full string, capturing the following:
# | Pattern | Capture |
---|---|---|
3 | (The%s(%a+%sKingdom)[%w%s]+) | The Forest Kingdom is peaceful |
4 | (%a+%sKingdom) | Forest Kingdom |
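Nested captures are returned in the order of their opening parentheses, outermost first. A minimal sketch with an invented date string:

```lua
-- The outer pair opens first, so it is capture 1; the inner pairs follow
-- from left to right.
local whole, year, month, day =
    string.match("Released 2024-11-09", "((%d+)%-(%d+)%-(%d+))")

print(whole, year, month, day)  -- 2024-11-09 2024 11 09
```

All four values are strings. Use tonumber() if numeric values are needed.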
String literals
Luau implements support for hexadecimal (\x), Unicode (\u) and \z escapes for string literals.
These escapes follow Lua 5.3 syntax:
- \xAB inserts a character with the code 0xAB into the string
- \u{ABC} inserts the UTF-8 byte sequence that encodes the character U+0ABC into the string (note that the braces are mandatory)
- \z at the end of the line inside a string literal ignores all following whitespace including newlines, which can be helpful for breaking long literals into multiple lines
local str = "My icon \u{2590} !!!"
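A short sketch of all three escapes follows. The variable names are illustrative, and the code assumes a runtime that accepts Lua 5.3 style escapes, as Luau does:

```lua
local ab = "\x41\x42"    -- bytes 0x41 and 0x42, which spell "AB"
local euro = "\u{20AC}"  -- U+20AC, stored as three UTF-8 bytes

-- \z skips the line break and the indentation that follows it.
local long = "first part, \z
              second part"

print(ab, #euro, long)           -- AB 3 first part, second part
print(string.byte(euro, 1, -1))  -- 226 130 172
```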
Methods
string.byte returns numerical code
Events
Constants