Strings
- String Literals
- String Interpolation
- Equality
- Comparisons
- Concatenating
- Iterating
- Indexing
- Containment
- Methods
A string, str
, is an immutable array of bytes.
-
$str(arg: any) -> str
-
Stringifies the argument — i.e. returns its default string representation. If the argument has a
$str()
method, the output of this method will be returned.Note that calling
$str()
on anf64
prints its value to 6 decimal digits of precision, stripping trailing zeros after the decimal point.
Pyro strings have methods that let you manipulate them as ASCII or as UTF-8 but the string type itself is agnostic about its encoding — a string can contain any sequence of byte values including null bytes or invalid UTF-8.
String Literals
String literals come in two flavours — regular (double-quoted) and raw (backticked). Both can contain arbitrary byte sequences, including literal newlines.
Regular string literals use double quotes:
var foo = "a string"; var bar = "a string with multiple lines";
Regular string literals process the following backslashed escapes:
\\
|
backslash |
\0
|
null byte |
\"
|
double quote |
\'
|
single quote |
\$
|
dollar symbol |
\b
|
backspace |
\e
|
escape |
\n
|
newline |
\r
|
carriage return |
\t
|
tab |
\x##
|
8-bit hex-encoded byte value |
\u####
|
16-bit hex-encoded unicode code point (output as UTF-8) |
\U########
|
32-bit hex-encoded unicode code point (output as UTF-8) |
Raw string literals use backticks:
var foo = `a raw string`; var bar = `a raw string with multiple lines`;
Raw string literals ignore backslashed escapes. The only character a raw string literal can't contain is a backtick as this would end the string.
String Interpolation
You can interpolate the value of an expression into a double-quoted string using ${}
, e.g.
var value = "xyz"; assert "abc ${value} def" == `abc xyz def`; assert "abc ${value:to_upper()} def" == `abc XYZ def`;
You can interpolate the value of any expression into a string using ${}
.
If the value of the expression isn't a string, it will be automatically stringified — this is equivalent to calling $str()
on the value, e.g.
var value = 123; assert "abc ${value} def" == `abc 123 def`; assert "abc ${value + 1} def" == `abc 124 def`;
You can backslash-escape a $
symbol in a double-quoted string to prevent it being treated as the opening of an interpolated expression, e.g.
var value = 123; assert "abc \${value} def" == `abc ${value} def`;
Interpolated expressions can be nested arbitrarily — i.e. an interpolated expression can contain a double-quoted string containing an interpolated expression containing a double-quoted string containing an interpolated expression, etc.
You can format the value of an interpolated expression by supplying a format-specifier after a semicolon, e.g.
var value = 123; assert "${value;05d}" == `00123`;
See the string formatting documentation for the syntax of format-specifiers.
Equality
Strings compare as equal using the ==
operator if they have the same content, e.g.
var foo = "foobar"; var bar = "foobar"; assert foo == bar;
Comparisons
You can compare strings using the comparison operators, <
, <=
, >
, >=
, e.g.
assert "abc" < "def";
Strings are compared lexicographically by byte value, e.g.
assert "a" < "aa"; assert "aa" < "aaa";
Concatenating
You can concatenate two strings using the +
operator, e.g.
assert "abc" + "def" == "abcdef";
You can multiply a string by an integer n
to produce a new string containing n
copies of the original, e.g.
assert "foo" * 3 == "foofoofoo"
Iterating
A string is an immutable sequence of bytes. You can iterate over this sequence in three different ways.
-
You can iterate over a string directly. This iterates over the individual byte values, returning each value as a single-byte string, e.g.
>>> for byte in "foo" { ... echo $debug(byte); ... } "f" "o" "o"
-
You can iterate over the string's byte values as integers using the
:bytes()
method, e.g.
>>> for byte in "foo":bytes() { ... echo $debug(byte); ... } 102 111 111
-
You can iterate over the string's
char
values, i.e. UTF-8 encoded Unicode code points, using the:chars()
method, e.g.
>>> for char in "foo":chars() { ... echo $debug(char); ... } 'f' 'o' 'o'
Indexing
You can index into a string to get (but not set) individual byte values. Each byte value is returned as a single-byte string, e.g.
assert "foo"[0] == "f"; assert "foo"[1] == "o"; assert "foo"[2] == "o";
Use the :byte()
method to access individual byte values as integers, e.g.
assert "foo":byte(0) == 102;
Use the :char()
method to access individual UTF-8 encoded code points, e.g.
assert "foo":char(0) == 'f';
Containment
You can check if a string contains a substring using the in
operator:
assert "foo" in "foobar";
You can also use the in
operator to check if a string contains a char
:
assert 'b' in "foobar";
This is equivalent to calling the string's :contains()
method.
Methods
-
:byte(index: i64) -> i64
-
Returns the byte value at
index
as an integer in the range[0, 255]
. -
:byte_count() -> i64
-
Returns the number of bytes in the string.
-
:bytes() -> iter[i64]
-
Returns an iterator over the string's individual byte values, returning each value as an integer.
-
:char(index: i64) -> char
-
Returns the
char
atindex
, whereindex
is a zero-based offset into the sequence of UTF-8 encoded Unicode code points in the string.This is a potentially expensive method as it needs to seek forward from the beginning of the string.
Note that it will panic if it encounters invalid UTF-8 along the way.
-
:char_count() -> i64
-
Returns the number of UTF-8 encoded Unicode code points in the string.
This is a potentially expensive method as it needs to traverse the string.
Note that it will panic if it encounters invalid UTF-8.
-
:chars() -> iter[char]
-
Returns an iterator over the string's
char
values, i.e. UTF-8 encoded Unicode code points.This method will panic if the string contains invalid UTF-8.
-
:contains(target: str|char) -> bool
-
Returns
true
if the string contains the substring or charactertarget
.(Note that every string contains the empty string
""
as the empty string is a valid substring of every string.) -
:count() -> i64
-
Returns the number of bytes in the string. This method is an alias for
:byte_count()
. -
:ends_with(suffix: str) -> bool
-
Returns
true
if the string ends with the stringsuffix
, otherwisefalse
. -
:index_of(target: str) -> i64|err
:index_of(target: str, start_index: i64) -> i64|err
-
Returns the byte index of the next matching instance of the string
target
. Starts searching atstart_index
, which defaults to 0 if not specified. Returns anerr
iftarget
is not found. -
:is_ascii() -> bool
-
Returns
true
if the string contains only byte values in the range[0, 127]
. (This is a potentially expensive method as it needs to traverse the string.)Returns
false
if the string is empty. -
:is_ascii_decimal() -> bool
-
Returns
true
if the string contains only ASCII decimal digits, i.e."0"
to"9"
.Returns
false
if the string is empty. -
:is_ascii_hex() -> bool
-
Returns
true
if the string contains only ASCII hexadecimal digits, i.e."0"
to"9"
,"a"
to"f"
, or"A"
to"F"
.Returns
false
if the string is empty. -
:is_ascii_octal() -> bool
-
Returns
true
if the string contains only ASCII octal digits, i.e."0"
to"7"
.Returns
false
if the string is empty. -
:is_ascii_ws() -> bool
-
Returns
true
if the string contains only ASCII whitespace characters.Returns
false
if the string is empty. -
:is_empty() -> bool
-
Returns
true
if the string is empty, i.e. if its length is zero. -
:is_utf8() -> bool
-
Returns
true
if the string contains only valid UTF-8. (This is a potentially expensive method as it needs to traverse the string.)Returns
false
if the string is empty. -
:is_utf8_ws() -> bool
-
Returns
true
if the string contains only UTF-8 encoded whitespace characters, as defined by the Unicode standard.Returns
false
if the string is empty. -
:iter() -> iter[str]
-
Returns an iterator over the string's individual byte values, returning each value as a single-byte string.
-
:join(items: iterable) -> str
-
Creates a new string by joining the stringified elements of the iterable argument using the receiver string as the separator. Elements are automatically stringified — this is equivalent to calling
$str()
on each element.Returns an empty string if the iterator is empty or exhausted.
-
:lines() -> iter[str]
-
Returns an iterator over the string's lines. Recognised line breaks are
\n
and\r\n
. Strips the line breaks from the returned strings.If this method is called on an empty string, the iterator will be empty, i.e. will return zero elements.
-
:match(target: str, index: i64) -> bool
-
Returns
true
if the stringtarget
matches at byte indexindex
. -
:replace(old: str, new: str) -> str
-
Returns a new string with all instances of the string
old
replaced by the stringnew
. -
:slice(start_index: i64) -> str
:slice(start_index: i64, length: i64) -> str
-
Copies a slice of the source string and returns it as a new string. The source string is left unchanged. The slice starts at byte index
start_index
and containslength
bytes.If
start_index
is negative, counts backwards from the end of the string — i.e. astart_index
of-1
refers to to the last byte in the string.If
length
is omitted, copies to the end of the source string.Panics if either argument is out of range.
-
:split() -> vec[str]
:split(sep: str) -> vec[str]
-
Splits the string on instances of the delimiter string
sep
. Returns a vector of strings.If no argument is specified, this method acts as an alias for
:split_on_ascii_ws()
.If this method is called on an empty string, it will return a vector containing a single empty string.
-
:split_on_ascii_ws() -> vec[str]
-
This method splits the string on contiguous sequences of ASCII whitespace characters. Leading and trailing whitespace is ignored. Returns a vector of strings.
If this method is called on an empty string, it will return a vector containing a single empty string.
-
:starts_with(prefix: str) -> bool
-
Returns
true
if the string starts with the stringprefix
, otherwisefalse
. -
:strip() -> str
:strip(bytes: str) -> str
-
When called with no arguments, this method returns the new string formed by stripping leading and trailing ASCII whitespace characters from the string. (In this case it functions as an alias for
:strip_ascii_ws()
.)When called with a single string argument, this method returns the new string formed by stripping any leading or trailing bytes that occur in
bytes
. (In this case it functions as an alias for:strip_bytes()
.) -
:strip_ascii_ws() -> str
-
Returns the new string formed by stripping leading and trailing ASCII whitespace characters.
-
:strip_bytes(bytes: str) -> str
-
Returns the new string formed by stripping any leading or trailing bytes that occur in
bytes
. -
:strip_chars(chars: str) -> str
-
Returns the new string formed by stripping any leading or trailing UTF-8 encoded codepoints that occur in
chars
. -
:strip_prefix(prefix: str) -> str
-
Returns a new string with the leading string
prefix
stripped if present. (Only a single instance ofprefix
will be stripped.) -
:strip_prefix_bytes(bytes: str) -> str
-
Returns the new string formed by stripping any leading bytes that occur in
bytes
. -
:strip_prefix_chars(chars: str) -> str
-
Returns the new string formed by stripping any leading UTF-8 encoded codepoints that occur in
chars
. -
:strip_suffix(suffix: str) -> str
-
Returns a new string with the trailing string
suffix
stripped if present. (Only a single instance ofsuffix
will be stripped.) -
:strip_suffix_bytes(bytes: str) -> str
-
Returns the new string formed by stripping any trailing bytes that occur in
bytes
. -
:strip_suffix_chars(chars: str) -> str
-
Returns the new string formed by stripping any trailing UTF-8 encoded codepoints that occur in
chars
. -
:strip_utf8_ws() -> str
-
Returns the new string formed by stripping leading and trailing UTF-8 encoded whitespace characters, as defined by the Unicode standard.
-
:to_ascii_lower() -> str
-
Returns a new string with all ASCII uppercase characters converted to lowercase.
-
:to_ascii_upper() -> str
-
Returns a new string with all ASCII lowercase characters converted to uppercase.
-
:to_hex() -> str
-
Returns a new string containing the hex-escaped byte values from the orginal string — e.g.
`foo`
becomes`\x66\x6F\x6F`
. Useful for inspecting and debugging Unicode. -
:to_lower() -> str
-
This method is an alias for
:to_ascii_lower()
. -
:to_upper() -> str
-
This method is an alias for
:to_ascii_upper()
.