struct
UTF8
Inheritance |
UnicodeCodec
View Protocol Hierarchy →
|
---|---|
Associated Types |
A type that can hold code unit values for this encoding. |
Import |
|
Initializers
Creates an instance of the UTF-8 codec.
Declaration
init
()
Static Methods
Encodes a Unicode scalar as a series of code units by calling the given closure on each code unit.
For example, the musical fermata symbol ("𝄐") is a single Unicode scalar
value (\u{1D110}
) but requires four code units for its UTF-8
representation. The following code encodes a fermata in UTF-8:
var
bytes
: [
UTF8.CodeUnit
] = []
UTF8.encode
(
"𝄐"
,
into
: {
bytes
.
append
($
0
) })
(
bytes
)
// Prints "[240, 157, 132, 144]"
Parameters: input: The Unicode scalar value to encode. processCodeUnit: A closure that processes one code unit argument at a time.
Declaration
static
func
encode
(
_
input
:
UnicodeScalar
,
into
processCodeUnit
: (
UTF8.CodeUnit
) -
>
Swift
.
Void
)
Returns a Boolean value indicating whether the specified code unit is a UTF-8 continuation byte.
Continuation bytes take the form 0b10xxxxxx
. For example, a lowercase
"e" with an acute accent above it ("é"
) uses 2 bytes for its UTF-8
representation: 0b11000011
(195) and 0b10101001
(169). The second
byte is a continuation byte.
let
eAcute
=
"é"
for
codePoint
in
eAcute
.
utf8
{
(
codePoint
,
UTF8.isContinuation
(
codePoint
))
}
// Prints "195 false"
// Prints "169 true"
byte
: A UTF-8 code unit.
Returns: true
if byte
is a continuation byte; otherwise, false
.
Declaration
static
func
isContinuation
(
_
byte
:
UTF8.CodeUnit
) -
>
Bool
Instance Methods
Starts or continues decoding a UTF-8 sequence.
To decode a code unit sequence completely, call this method repeatedly
until it returns UnicodeDecodingResult.emptyInput
. Checking that the
iterator was exhausted is not sufficient, because the decoder can store
buffered data from the input iterator.
Because of buffering, it is impossible to find the corresponding position
in the iterator for a given returned UnicodeScalar
or an error.
The following example decodes the UTF-8 encoded bytes of a string into an
array of UnicodeScalar
instances. This is a demonstration only---if
you need the Unicode scalar representation of a string, use its
unicodeScalars
view.
let
str
=
"✨Unicode✨"
(
Array
(
str
.
utf8
))
// Prints "[226, 156, 168, 85, 110, 105, 99, 111, 100, 101, 226, 156, 168]"
var
bytesIterator
=
str
.
utf8
.
makeIterator
()
var
scalars
: [
UnicodeScalar
] = []
var
utf8Decoder
=
UTF8
()
Decode
:
while
true
{
switch
utf8Decoder
.
decode
(
&
bytesIterator
) {
case
.
scalarValue
(
let
v
):
scalars
.
append
(
v
)
case
.
emptyInput
:
break
Decode
case
.
error
:
(
"Decoding error"
)
break
Decode
}
}
(
scalars
)
// Prints "["\u{2728}", "U", "n", "i", "c", "o", "d", "e", "\u{2728}"]"
input
: An iterator of code units to be decoded. input
must be
the same iterator instance in repeated calls to this method. Do not
advance the iterator or any copies of the iterator outside this
method.
Returns: A UnicodeDecodingResult
instance, representing the next
Unicode scalar, an indication of an error, or an indication that the
UTF sequence has been fully decoded.
Declaration
mutating
func
decode
<
I
:
IteratorProtocol
where
I
.
Element
==
CodeUnit
>
(
_
input
:
inout
I
) -
>
UnicodeDecodingResult
A codec for translating between Unicode scalar values and UTF-8 code units.