struct UTF16

Inheritance | UnicodeCodec |
---|---|
Associated Types | A type that can hold code unit values for this encoding. |
Import | |
Initializers
Creates an instance of the UTF-16 codec.
Declaration
    init()
Static Methods
Encodes a Unicode scalar as a series of code units by calling the given closure on each code unit.
For example, the musical fermata symbol ("𝄐") is a single Unicode scalar value (\u{1D110}) but requires two code units for its UTF-16 representation. The following code encodes a fermata in UTF-16:
    var codeUnits: [UTF16.CodeUnit] = []
    UTF16.encode("𝄐", into: { codeUnits.append($0) })
    print(codeUnits)
    // Prints "[55348, 56592]"
Parameters:
input: The Unicode scalar value to encode.
processCodeUnit: A closure that processes one code unit argument at a time.
Declaration
    static func encode(_ input: UnicodeScalar, into processCodeUnit: (UTF16.CodeUnit) -> Swift.Void)
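As a rough illustration of how encode(_:into:) composes over a whole string, the sketch below encodes each Unicode scalar in turn and compares the collected code units with the string's own utf16 view. The names text and encoded are illustrative only, and the fermata is written as an escape so the example does not depend on file encoding.

```swift
// Encode every Unicode scalar of a string with UTF16.encode(_:into:)
// and compare the result with the string's own utf16 view.
let text = "A\u{1D110}"   // "A" followed by the musical fermata symbol

var encoded: [UTF16.CodeUnit] = []
for scalar in text.unicodeScalars {
    // The closure is called once per code unit: once for "A",
    // twice for the fermata (a surrogate pair).
    UTF16.encode(scalar, into: { encoded.append($0) })
}

print(encoded)
// Prints "[65, 55348, 56592]"
print(encoded == Array(text.utf16))
// Prints "true"
```

In ordinary code the utf16 view is the simpler route; calling the codec directly is mainly useful when scalars arrive one at a time.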
Returns a Boolean value indicating whether the specified code unit is a high-surrogate code unit.
Here's an example of checking whether each code unit in a string's utf16 view is a lead surrogate. The apple string contains a single emoji character made up of a surrogate pair when encoded in UTF-16.
    let apple = "🍎"
    for unit in apple.utf16 {
        print(UTF16.isLeadSurrogate(unit))
    }
    // Prints "true"
    // Prints "false"
This method does not validate the encoding of a UTF-16 sequence beyond the specified code unit. Specifically, it does not validate that a low-surrogate code unit follows x.
x: A UTF-16 code unit.
Returns: true if x is a high-surrogate code unit; otherwise, false.
See Also: UTF16.width(_:), UTF16.leadSurrogate(_:)
Declaration
    static func isLeadSurrogate(_ x: UTF16.CodeUnit) -> Bool
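To make "high-surrogate code unit" concrete: in UTF-16, high (lead) surrogates occupy the range 0xD800 through 0xDBFF. The sketch below compares isLeadSurrogate(_:) with a plain range check on a few boundary values. The names leadRange and samples are illustrative.

```swift
// High (lead) surrogates occupy the range 0xD800...0xDBFF in UTF-16.
let leadRange: ClosedRange<UTF16.CodeUnit> = 0xD800...0xDBFF

// Boundary values on either side of the lead-surrogate range.
let samples: [UTF16.CodeUnit] = [0x0041, 0xD7FF, 0xD800, 0xDBFF, 0xDC00, 0xFFFF]
for unit in samples {
    // The library check and the range check agree for every unit.
    print(UTF16.isLeadSurrogate(unit), leadRange.contains(unit))
}
// Prints "false false"
// Prints "false false"
// Prints "true true"
// Prints "true true"
// Prints "false false"
// Prints "false false"
```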
Returns a Boolean value indicating whether the specified code unit is a low-surrogate code unit.
Here's an example of checking whether each code unit in a string's utf16 view is a trailing surrogate. The apple string contains a single emoji character made up of a surrogate pair when encoded in UTF-16.
    let apple = "🍎"
    for unit in apple.utf16 {
        print(UTF16.isTrailSurrogate(unit))
    }
    // Prints "false"
    // Prints "true"
This method does not validate the encoding of a UTF-16 sequence beyond the specified code unit. Specifically, it does not validate that a high-surrogate code unit precedes x.
x: A UTF-16 code unit.
Returns: true if x is a low-surrogate code unit; otherwise, false.
See Also: UTF16.width(_:), UTF16.leadSurrogate(_:)
Declaration
    static func isTrailSurrogate(_ x: UTF16.CodeUnit) -> Bool
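Because neither surrogate check looks at neighboring code units, pairing has to be validated by the caller. The following is a minimal sketch of one way to do that. isWellFormedUTF16 is a hypothetical helper written for this example, not part of the standard library.

```swift
// Hypothetical helper (not part of the standard library): returns true when
// every lead surrogate is immediately followed by a trail surrogate and no
// trail surrogate appears on its own.
func isWellFormedUTF16(_ units: [UTF16.CodeUnit]) -> Bool {
    var index = 0
    while index < units.count {
        let unit = units[index]
        if UTF16.isLeadSurrogate(unit) {
            // A lead surrogate needs a trail surrogate right after it.
            guard index + 1 < units.count,
                  UTF16.isTrailSurrogate(units[index + 1]) else { return false }
            index += 2
        } else if UTF16.isTrailSurrogate(unit) {
            // A trail surrogate without a preceding lead is unpaired.
            return false
        } else {
            index += 1
        }
    }
    return true
}

print(isWellFormedUTF16(Array("\u{1F34E}".utf16)))  // red apple emoji
// Prints "true"
print(isWellFormedUTF16([0xD83C]))                  // lone lead surrogate
// Prints "false"
print(isWellFormedUTF16([0xDF4E, 0xD83C]))          // pair in the wrong order
// Prints "false"
```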
Returns the high-surrogate code unit of the surrogate pair representing the specified Unicode scalar.
Because a Unicode scalar value can require up to 21 bits to store its value, some Unicode scalars are represented in UTF-16 by a pair of 16-bit code units. The first and second code units of the pair, designated leading and trailing surrogates, make up a surrogate pair.
    let apple: UnicodeScalar = "🍎"
    print(UTF16.leadSurrogate(apple))
    // Prints "55356"
x: A Unicode scalar value. x must be represented by a surrogate pair when encoded in UTF-16. To check whether x is represented by a surrogate pair, use UTF16.width(x) == 2.
Returns: The leading surrogate code unit of x when encoded in UTF-16.
See Also: UTF16.width(_:), UTF16.trailSurrogate(_:)
Declaration
    static func leadSurrogate(_ x: UnicodeScalar) -> UTF16.CodeUnit
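Since x must need a surrogate pair, callers handling arbitrary scalars may want to guard the call. The sketch below shows one possible shape for that guard. leadSurrogateIfPaired is a hypothetical wrapper introduced for illustration only.

```swift
// Hypothetical wrapper (not part of the standard library): returns nil
// instead of violating the precondition when the scalar fits in a single
// UTF-16 code unit.
func leadSurrogateIfPaired(_ scalar: UnicodeScalar) -> UTF16.CodeUnit? {
    guard UTF16.width(scalar) == 2 else { return nil }
    return UTF16.leadSurrogate(scalar)
}

print(String(describing: leadSurrogateIfPaired("A")))
// Prints "nil"
print(String(describing: leadSurrogateIfPaired("\u{1F34E}")))  // red apple emoji
// Prints "Optional(55356)"
```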
Returns the low-surrogate code unit of the surrogate pair representing the specified Unicode scalar.
Because a Unicode scalar value can require up to 21 bits to store its value, some Unicode scalars are represented in UTF-16 by a pair of 16-bit code units. The first and second code units of the pair, designated leading and trailing surrogates, make up a surrogate pair.
    let apple: UnicodeScalar = "🍎"
    print(UTF16.trailSurrogate(apple))
    // Prints "57166"
x: A Unicode scalar value. x must be represented by a surrogate pair when encoded in UTF-16. To check whether x is represented by a surrogate pair, use UTF16.width(x) == 2.
Returns: The trailing surrogate code unit of x when encoded in UTF-16.
See Also: UTF16.width(_:), UTF16.leadSurrogate(_:)
Declaration
    static func trailSurrogate(_ x: UnicodeScalar) -> UTF16.CodeUnit
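The surrogate values follow from the UTF-16 encoding form: subtract 0x10000 from the scalar value, then place the upper 10 bits of the result in the lead surrogate and the lower 10 bits in the trail surrogate. The sketch below works through that arithmetic by hand and compares it with the library results. The variable names are illustrative.

```swift
let apple: UnicodeScalar = "\u{1F34E}"   // red apple emoji, value 127822
precondition(UTF16.width(apple) == 2, "scalar must need a surrogate pair")

// Subtracting 0x10000 leaves a 20-bit value.
let offset = apple.value - 0x10000
// Upper 10 bits go into the lead surrogate, lower 10 bits into the trail.
let lead = UTF16.CodeUnit(0xD800 + (offset >> 10))
let trail = UTF16.CodeUnit(0xDC00 + (offset & 0x3FF))

print(lead, trail)
// Prints "55356 57166"
print(lead == UTF16.leadSurrogate(apple), trail == UTF16.trailSurrogate(apple))
// Prints "true true"

// Reversing the steps recovers the original scalar value.
let recombined = 0x10000 + ((UInt32(lead) - 0xD800) << 10) + (UInt32(trail) - 0xDC00)
print(recombined == apple.value)
// Prints "true"
```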
Returns the number of UTF-16 code units required for the given code unit sequence when transcoded to UTF-16, and a Boolean value indicating whether the sequence was found to contain only ASCII characters.
The following example finds the length of the UTF-16 encoding of the string "Fermata 𝄐", starting with its UTF-8 representation.
    let fermata = "Fermata 𝄐"
    let bytes = fermata.utf8
    print(Array(bytes))
    // Prints "[70, 101, 114, 109, 97, 116, 97, 32, 240, 157, 132, 144]"

    // transcodedLength is a static method, so it is called on UTF16.
    let result = UTF16.transcodedLength(of: bytes.makeIterator(), decodedAs: UTF8.self, repairingIllFormedSequences: false)
    print(result)
    // Prints "Optional((count: 10, isASCII: false))"
Parameters:
input: An iterator of code units to be translated, encoded as sourceEncoding. If repairingIllFormedSequences is true, the entire iterator will be exhausted. Otherwise, iteration will stop if an ill-formed sequence is detected.
sourceEncoding: The Unicode encoding of input.
repairingIllFormedSequences: Pass true to measure the length of input even when input contains ill-formed sequences. Each ill-formed sequence is replaced with a Unicode replacement character ("\u{FFFD}") and is measured as such. Pass false to immediately stop measuring input when an ill-formed sequence is encountered.
Returns: A tuple containing the number of UTF-16 code units required to encode input and a Boolean value that indicates whether the input contained only ASCII characters. If repairingIllFormedSequences is false and an ill-formed sequence is detected, this method returns nil.
Declaration
    static func transcodedLength<Input, Encoding where Input: IteratorProtocol, Encoding: UnicodeCodec, Encoding.CodeUnit == Input.Element>(of input: Input, decodedAs sourceEncoding: Encoding.Type, repairingIllFormedSequences: Bool) -> (count: Int, isASCII: Bool)?
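The effect of repairingIllFormedSequences is easiest to see on deliberately broken input. The sketch below, under the assumption that a lone 0xFF byte is treated as a single ill-formed sequence, measures the same bytes in both modes. The names illFormed, strict, and repaired are illustrative.

```swift
// 0xFF can never appear in well-formed UTF-8.
let illFormed: [UInt8] = [72, 105, 0xFF]   // "Hi" followed by an invalid byte

// Without repair, measuring stops at the ill-formed byte and returns nil.
let strict = UTF16.transcodedLength(
    of: illFormed.makeIterator(),
    decodedAs: UTF8.self,
    repairingIllFormedSequences: false)
print(strict == nil)
// Prints "true"

// With repair, the invalid byte is measured as U+FFFD: one UTF-16 code unit
// that is not ASCII.
let repaired = UTF16.transcodedLength(
    of: illFormed.makeIterator(),
    decodedAs: UTF8.self,
    repairingIllFormedSequences: true)
if let repaired = repaired {
    print(repaired.count, repaired.isASCII)
    // Prints "3 false"
}
```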
Returns the number of code units required to encode the given Unicode scalar.
Because a Unicode scalar value can require up to 21 bits to store its value, some Unicode scalars are represented in UTF-16 by a pair of 16-bit code units. The first and second code units of the pair, designated leading and trailing surrogates, make up a surrogate pair.
    let anA: UnicodeScalar = "A"
    print(anA.value)
    // Prints "65"
    print(UTF16.width(anA))
    // Prints "1"

    let anApple: UnicodeScalar = "🍎"
    print(anApple.value)
    // Prints "127822"
    print(UTF16.width(anApple))
    // Prints "2"
x: A Unicode scalar value.
Returns: The width of x when encoded in UTF-16, either 1 or 2.
Declaration
    static func width(_ x: UnicodeScalar) -> Int
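One way to see what width(_:) measures: summing it over a string's Unicode scalars gives the length of the string's utf16 view. The sketch below also checks the boundary between one and two code units, which falls between U+FFFF and U+10000. The names text and total are illustrative.

```swift
let text = "Fermata \u{1D110}"   // eight ASCII scalars plus the fermata symbol

// Add up the UTF-16 width of every scalar in the string.
let total = text.unicodeScalars.reduce(0) { $0 + UTF16.width($1) }
print(total)
// Prints "10"
print(text.utf16.count)
// Prints "10"

// Scalars up to U+FFFF need one code unit; U+10000 and above need two.
print(UTF16.width("\u{FFFF}"), UTF16.width("\u{10000}"))
// Prints "1 2"
```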
Instance Methods
Starts or continues decoding a UTF-16 sequence.
To decode a code unit sequence completely, call this method repeatedly until it returns UnicodeDecodingResult.emptyInput. Checking that the iterator was exhausted is not sufficient, because the decoder can store buffered data from the input iterator.
Because of buffering, it is impossible to find the corresponding position in the iterator for a given returned UnicodeScalar or an error.
The following example decodes the UTF-16 code units of a string into an array of UnicodeScalar instances. This is a demonstration only; if you need the Unicode scalar representation of a string, use its unicodeScalars view.
    let str = "✨Unicode✨"
    print(Array(str.utf16))
    // Prints "[10024, 85, 110, 105, 99, 111, 100, 101, 10024]"

    var codeUnitIterator = str.utf16.makeIterator()
    var scalars: [UnicodeScalar] = []
    var utf16Decoder = UTF16()
    Decode: while true {
        switch utf16Decoder.decode(&codeUnitIterator) {
        case .scalarValue(let v):
            scalars.append(v)
        case .emptyInput:
            break Decode
        case .error:
            print("Decoding error")
            break Decode
        }
    }
    print(scalars)
    // Prints "["\u{2728}", "U", "n", "i", "c", "o", "d", "e", "\u{2728}"]"
input: An iterator of code units to be decoded. input must be the same iterator instance in repeated calls to this method. Do not advance the iterator or any copies of the iterator outside this method.
Returns: A UnicodeDecodingResult instance, representing the next Unicode scalar, an indication of an error, or an indication that the UTF sequence has been fully decoded.
Declaration
    mutating func decode<I: IteratorProtocol where I.Element == CodeUnit>(_ input: inout I) -> UnicodeDecodingResult
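A decoding loop does not have to stop at the first error. As a rough sketch, the hypothetical helper below (decodeUTF16 is not part of the standard library) keeps calling decode(_:) until .emptyInput, substituting U+FFFD for each reported error. The expected output assumes that the decoder reports an unpaired lead surrogate as one error and then resumes with the code unit that follows it.

```swift
// Hypothetical helper (not part of the standard library): decodes UTF-16
// code units, substituting U+FFFD for each error the decoder reports.
func decodeUTF16(_ units: [UTF16.CodeUnit]) -> (scalars: [UnicodeScalar], errorCount: Int) {
    var iterator = units.makeIterator()
    var decoder = UTF16()
    var scalars: [UnicodeScalar] = []
    var errorCount = 0
    while true {
        // Always pass the same iterator instance; the decoder may buffer input.
        switch decoder.decode(&iterator) {
        case .scalarValue(let scalar):
            scalars.append(scalar)
        case .emptyInput:
            // Only .emptyInput signals that decoding is complete.
            return (scalars, errorCount)
        case .error:
            errorCount += 1
            scalars.append("\u{FFFD}")
        }
    }
}

// "H", an unpaired lead surrogate, then "i".
let result = decodeUTF16([0x48, 0xD83C, 0x69])
print(result.scalars.map { $0.value }, result.errorCount)
// Prints "[72, 65533, 105] 1"
```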
A codec for translating between Unicode scalar values and UTF-16 code units.