Difference between revisions of "String"

From GiderosMobile
(Undo revision 19238 by MoKaLux (talk))
Tag: Undo
Revision as of 22:45, 11 June 2023

Supported platforms: Android, iOS, Mac, PC, HTML5, WinRT, Win32
Available since: Gideros 2011.6

Description

This library provides generic functions to manipulate strings, such as finding and extracting substrings, and pattern matching.

When indexing a string in Lua, the first character is at position 1 (not at 0, as in C). Indices are allowed to be negative and are interpreted as indexing backwards, from the end of the string. Thus, the last character is at position -1, and so on.
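
As a minimal sketch of these indexing rules, the following uses string.sub() with positive and negative positions. The sample string and variable names are only illustrative.

```lua
local s = "Gideros"
local first = string.sub(s, 1, 1)    -- position 1 is the first character
local last = string.sub(s, -1)       -- position -1 is the last character
local lastThree = string.sub(s, -3)  -- starts three characters from the end
print(first, last, lastThree) -- G s ros
```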

The string library provides all its functions inside the table string. It also sets a metatable for strings where the __index field points to the string table. Therefore, you can use the string functions in object-oriented style. For instance, string.byte(s, i) can be written as s:byte(i).
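
The two calling styles can be compared side by side. This is a small sketch with illustrative variable names; note that a string literal has to be wrapped in parentheses before a method can be called on it.

```lua
local s = "Gideros"
-- Both calls return the numeric code of the first character
local viaLibrary = string.byte(s, 1)
local viaMethod = s:byte(1)
print(viaLibrary, viaMethod) -- 71 71

-- A string literal needs parentheses before a method call
local shout = ("gideros"):upper()
print(shout) -- GIDEROS
```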

The string library assumes one-byte character encodings.
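
In practice this means that lengths and positions count bytes, not characters. The sketch below spells the two UTF-8 bytes of é (U+00E9) with decimal escapes so that it does not depend on the encoding of the source file; the sample word is only illustrative.

```lua
-- "\195\169" is the two-byte UTF-8 encoding of é (U+00E9)
local word = "caf\195\169"
print(string.len(word))      -- 5 (bytes), although the text has 4 characters
print(string.byte(word, -1)) -- 169, the last byte rather than a whole character
```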

String Patterns

A string pattern is a combination of characters that you can use with string.match(), string.gmatch(), and other functions to find a piece, or substring, of a longer string.

Direct Matches

You can search for literal text with a Lua function like string.match(), as long as the text contains no magic characters (described below). For example, these commands look for the word Gideros within a string:

local match1 = string.match("Welcome to Gideros!", "Gideros")
local match2 = string.match("Welcome to my awesome game!", "Gideros")
print(match1) -- Gideros
print(match2) -- nil

The first string contains a match, so print() outputs Gideros. The second string doesn't contain a match, so print() outputs nil.

Character Classes

Character classes are essential for more advanced string searches. You can use them to search for something that isn't necessarily character-specific but fits within a known category (class), including letters, digits, spaces, punctuation, and more.

The following table shows the official character classes for Lua string patterns:

Class  Represents                                                Example Match
.      Any character                                             32kasGJ1%fTlk?@94
%a     An uppercase or lowercase letter                          aBcDeFgHiJkLmNoPqRsTuVwXyZ
%l     A lowercase letter                                        abcdefghijklmnopqrstuvwxyz
%u     An uppercase letter                                       ABCDEFGHIJKLMNOPQRSTUVWXYZ
%d     Any digit (number)                                        0123456789
%p     Any punctuation character                                 !@#;,.
%w     An alphanumeric character (either a letter or a number)   aBcDeFgHiJkLmNoPqRsTuVwXyZ0123456789
%s     A space or whitespace character                           " ", \n, and \r
%c     A special control character
%x     A hexadecimal character                                   0123456789ABCDEF
%z     The NULL character (\0)

For single-letter character classes such as %a and %s, the corresponding uppercase letter represents the "opposite" of the class. For instance, %p represents a punctuation character while %P represents all characters except punctuation.
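
A short sketch of a class and its uppercase opposite, using %d and %D on an illustrative string:

```lua
local text = "Room 101, Floor 7"
-- %d+ matches the first run of digits
local digits = string.match(text, "%d+")
-- %D+ matches the first run of characters that are not digits
local nonDigits = string.match(text, "%D+")
print(digits)    -- 101
print(nonDigits) -- "Room " (including the trailing space)

-- Removing every non-digit leaves only the digits
local onlyDigits = string.gsub(text, "%D", "")
print(onlyDigits) -- 1017
```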

Magic Characters

There are 12 "magic characters" which are reserved for special purposes in patterns:

  • $ % ^ * ( ) . [ ] + - ?

You can escape and search for magic characters using the % symbol. For example, to search for giderosmobile.com, escape the . (period) symbol by preceding it with a %, as in "giderosmobile%.com".

-- Incorrect: "giderosmobile.com" matches "giderosmobile#com" because the period is interpreted as "any character"
local match1 = string.match("What is giderosmobile#com?", "giderosmobile.com")
-- Correct: Escape the period with % so it is interpreted as a literal period character
local match2 = string.match("I love giderosmobile.com!", "giderosmobile%.com")
print(match1) -- giderosmobile#com
print(match2) -- giderosmobile.com
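
When the text to search for comes from a variable, escaping by hand is not practical. The sketch below shows two possible approaches: a helper that escapes the 12 magic characters, and the plain flag of string.find(). The name escapePattern is hypothetical and not part of the string library.

```lua
-- Hypothetical helper: prefix each of the 12 magic characters with %
local function escapePattern(text)
  -- The outer parentheses drop gsub's second return value (the substitution count)
  return (string.gsub(text, "[%$%%%^%*%(%)%.%[%]%+%-%?]", "%%%0"))
end

local escaped = escapePattern("giderosmobile.com")
print(escaped) -- giderosmobile%.com
print(string.match("What is giderosmobile#com?", escaped)) -- nil

-- Alternative: passing true as the fourth argument of string.find
-- turns off pattern matching and searches for the text as-is
local startPos, endPos = string.find("I love giderosmobile.com!", "giderosmobile.com", 1, true)
print(startPos, endPos) -- 8 24
```

The plain search is the simpler choice when only the position is needed; the escaped string is useful when the literal text has to be combined with other pattern items.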

Anchors

You can search for a pattern at the beginning or end of a string by using the ^ and $ symbols.

-- Using ^ to match the beginning of a string
local start1 = string.match("first second third", "^first")  -- Matches because "first" is at the beginning
local start2 = string.match("third second first", "^first")  -- Doesn't match because "first" isn't at the beginning
print(start1) -- first
print(start2) -- nil

-- Using $ to match the end of a string
local end1 = string.match("first second third", "third$")  -- Matches because "third" is at the end
local end2 = string.match("third second first", "third$")  -- Doesn't match because "third" isn't at the end
print(end1) -- third
print(end2) -- nil

You can also use both ^ and $ together to ensure a pattern matches only the full string and not just some portion of it.

-- Using both ^ and $ to match across a full string
local match1 = string.match("Gideros", "^Gideros$")  -- Matches because "Gideros" is the entire string (equality)
local match2 = string.match("I play Gideros", "^Gideros$")  -- Doesn't match because "Gideros" isn't at the beginning AND end
local match3 = string.match("I play Gideros", "Gideros")  -- Matches because "Gideros" is contained within "I play Gideros"
print(match1) -- Gideros
print(match2) -- nil
print(match3) -- Gideros
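
One common use of anchoring both ends is input validation. The following is a minimal sketch; isAllDigits is a hypothetical helper name chosen for illustration.

```lua
-- Hypothetical helper: true only when the whole string consists of digits
local function isAllDigits(text)
  return string.match(text, "^%d+$") ~= nil
end

print(isAllDigits("2023"))    -- true
print(isAllDigits("2023-06")) -- false, "-" is not a digit
print(isAllDigits(""))        -- false, + requires at least one digit
```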

Class Modifiers

By itself, a character class only matches one character in a string. For instance, the following pattern ("%d") starts reading the string from left to right, finds the first digit (2), and stops.

local match = string.match("The Cloud Kingdom has 25 power gems", "%d")
print(match) -- 2

You can use modifiers with any character class to control the result:

Quantifier  Meaning
+           Match 1 or more of the preceding character class, as many as possible
-           Match 0 or more of the preceding character class, as few as possible
*           Match 0 or more of the preceding character class, as many as possible
?           Match 0 or 1 of the preceding character class
%n          For n between 1 and 9, match a substring equal to the n-th captured string
%bxy        Match a balanced substring that starts with x and ends with y (for example, %b() matches a pair of parentheses and everything between them)

Adding the + modifier to the same pattern ("%d+" instead of "%d") outputs 25 instead of 2:

local match1 = string.match("The Cloud Kingdom has 25 power gems", "%d")
print(match1) -- 2
local match2 = string.match("The Cloud Kingdom has 25 power gems", "%d+")
print(match2) -- 25
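
The remaining modifiers can be sketched the same way. The sample strings below are only illustrative.

```lua
local html = "<b>bold</b>"
print(string.match(html, "<.*>")) -- <b>bold</b>  (* takes as many characters as possible)
print(string.match(html, "<.->")) -- <b>          (- takes as few characters as possible)

-- ? makes the preceding class optional
print(string.match("colour", "colou?r")) -- colour
print(string.match("color", "colou?r"))  -- color

-- %b() matches from an opening parenthesis to its balancing close
local balanced = string.match("sum(a, max(b, c)) + 1", "%b()")
print(balanced) -- (a, max(b, c))

-- %1 must repeat whatever the first capture matched
local letter = string.match("hello", "(%a)%1")
print(letter) -- l
```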

Class Sets

Sets should be used when a single character class can't do the whole job. For instance, you might want to match both lowercase letters (%l) and punctuation characters (%p) using a single pattern.

Sets are defined by brackets [] around them. In the following example, notice the difference between using a set ("[%l%p]+") and not using a set ("%l%p+").

local match1 = string.match("Hello!!! I am another string.", "[%l%p]+")  -- Set
local match2 = string.match("Hello!!! I am another string.", "%l%p+")    -- Non-set
print(match1) -- ello!!!
print(match2) -- o!!!

The first command (set) tells Lua to find both lowercase characters and punctuation. With the + quantifier added after the entire set, it finds all of those characters (ello!!!), stopping when it reaches the space.

In the second command (non-set), the + quantifier only applies to the %p class before it, so the pattern matches exactly one lowercase character followed by one or more punctuation characters: the o directly before the series of punctuation (!!!).

Like character classes, sets can be "opposites" of themselves. This is done by adding a ^ character at the beginning of the set, directly after the opening [. For instance, "[%p%s]+" represents both punctuation and spaces, while "[^%p%s]+" represents all characters except punctuation and spaces.
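
A small sketch of a set and its opposite on an illustrative string, including a string.gmatch() loop that uses the negated set to collect words:

```lua
local text = "Hello, world!"
local word = string.match(text, "[^%p%s]+") -- run of non-punctuation, non-space characters
local gap = string.match(text, "[%p%s]+")   -- run of punctuation and spaces
print(word) -- Hello
print(gap)  -- ", " (a comma followed by a space)

-- Collect every run that contains no punctuation or spaces
local words = {}
for w in string.gmatch(text, "[^%p%s]+") do
  words[#words + 1] = w
end
print(table.concat(words, "|")) -- Hello|world
```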

Sets also support ranges, which let you match any character between a starting and an ending character. This is an advanced feature that is described in more detail in the Lua 5.1 manual.

String Captures

String captures are sub-patterns within a pattern. These are enclosed in parentheses () and are used to get (capture) matching substrings and save them to variables. For example, the following pattern contains two captures, (%a+) and (%d+), which return two substrings upon a successful match.

local pattern = "(%a+)%s?=%s?(%d+)"
local key1, val1 = string.match("TwentyOne = 21", pattern)
print(key1, val1) -- TwentyOne 21
local key2, val2 = string.match("TwoThousand= 2000", pattern)
print(key2, val2) -- TwoThousand 2000
local key3, val3 = string.match("OneMillion=1000000", pattern)
print(key3, val3) -- OneMillion 1000000

In the previous pattern, the ? quantifier that follows both of the %s classes is a safe addition because it makes the space on either side of the = sign optional. That means the match succeeds if one (or both) spaces are missing around the equal sign.
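
The same technique works with more than two captures. The sketch below pulls three fields out of an illustrative date string; note that the hyphen is a magic character and is escaped as %-.

```lua
local year, month, day = string.match("2023-06-11", "(%d+)%-(%d+)%-(%d+)")
print(year, month, day) -- 2023 06 11

-- Captures are strings; convert them when numbers are needed
local nextMonth = tonumber(month) + 1
print(nextMonth) -- 7

-- When the pattern does not match, string.match returns nil
local noYear = string.match("June 11", "(%d+)%-(%d+)%-(%d+)")
print(noYear) -- nil
```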

String captures can also be nested, as in the following example:

local places = "The Cloud Kingdom is heavenly, The Forest Kingdom is peaceful"
local pattern = "(The%s(%a+%sKingdom)[%w%s]+)"

for description, kingdom in string.gmatch(places, pattern) do
  print(description)
  print(kingdom)
end
--[[Expected Output:
The Cloud Kingdom is heavenly
Cloud Kingdom
The Forest Kingdom is peaceful
Forest Kingdom
]]

This pattern search works as follows:

The string.gmatch() iterator looks for a match on the entire "description" pattern defined by the outer pair of parentheses. This stops at the first comma and captures the following:

#  Pattern                        Capture
1  (The%s(%a+%sKingdom)[%w%s]+)   The Cloud Kingdom is heavenly

Using its successful first capture, the iterator then looks for a match on the "kingdom" pattern defined by the inner pair of parentheses. This nested pattern simply captures the following:

#  Pattern                        Capture
2  (%a+%sKingdom)                 Cloud Kingdom

The iterator then backs out and continues searching the full string, capturing the following:

#  Pattern                        Capture
3  (The%s(%a+%sKingdom)[%w%s]+)   The Forest Kingdom is peaceful
4  (%a+%sKingdom)                 Forest Kingdom

String literals

Luau implements support for hexadecimal (\x), Unicode (\u) and \z escapes for string literals.

These escapes follow Lua 5.3 syntax:

  • \xAB inserts a character with the code 0xAB into the string
  • \u{ABC} inserts the UTF-8 byte sequence that encodes the character U+0ABC into the string (the braces are mandatory)
  • \z at the end of a line inside a string literal skips all following whitespace, including newlines, which helps when breaking long literals across multiple lines

local str = "My icon \u{2590} !!!"
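
The three escapes can be tried together in a short sketch. It assumes a Luau or Lua 5.3+ interpreter, since Lua 5.1 does not support these escapes; the variable names are only illustrative.

```lua
local hex = "\x41"      -- byte 0x41, the letter A
local euro = "\u{20AC}" -- U+20AC (euro sign), stored as UTF-8
local joined = "first part, \z
                second part" -- \z skips the newline and the indentation
print(hex)    -- A
print(#euro)  -- 3, because U+20AC takes three bytes in UTF-8
print(joined) -- first part, second part
```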

Methods

string.byte returns numerical code
string.char returns a string built from 0 or more integers
string.dump returns binary representation of function, used with loadstring
string.find matches pattern in s, returns start,end indices, else nil
string.format returns formatted string, printf-style
string.gmatch returns iterator function that returns next captures from pattern pat on s
string.gsub returns copy of s with pat replaced by repl, and substitutions made
string.len returns string length
string.lower returns string with letters in lower case
string.match searches a string for a pattern
string.pack returns a binary string containing the provided arguments
string.packsize returns the size in bytes of any string packed with a given description
string.rep returns string with n copies of string s
string.reverse returns a string that is the string s reversed
string.split splits a string into parts based on the defined separator character
string.sub returns substring from index i to j of s
string.unpack extracts the values packed in the provided binary string based on the description in the first argument
string.upper returns string with letters in upper case
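
A quick tour of several of these functions, as a sketch on an illustrative string. It only uses functions that behave the same way in standard Lua, so string.pack, string.unpack and string.split are left out.

```lua
local s = "Gideros"
print(string.len(s))         -- 7
print(string.upper(s))       -- GIDEROS
print(string.lower(s))       -- gideros
print(string.reverse(s))     -- sorediG
print(string.rep("ab", 3))   -- ababab
print(string.sub(s, 2, 4))   -- ide
print(string.byte(s, 1))     -- 71
print(string.char(71, 105))  -- Gi
print(string.find(s, "der")) -- 3 5
print(string.format("%s has %d letters", s, #s)) -- Gideros has 7 letters
print(string.gsub(s, "e", "E")) -- GidEros 1
```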

Events

Constants