UnicodeCodec

protocol UnicodeCodec

A Unicode encoding form that translates between Unicode scalar values and form-specific code units.

The UnicodeCodec protocol declares methods that decode code unit sequences into Unicode scalar values and encode Unicode scalar values into code unit sequences. The standard library implements codecs for the UTF-8, UTF-16, and UTF-32 encoding schemes as the UTF8, UTF16, and UTF32 types, respectively. Use the UnicodeScalar type to work with decoded Unicode scalar values.

See Also: UTF8, UTF16, UTF32, UnicodeScalar

Inheritance View Protocol Hierarchy →
Associated Types
CodeUnit

A type that can hold code unit values for this encoding.

Import import Swift

Initializers

init() Required

Creates an instance of the codec.

Declaration

init()

Static Methods

static func encode(_:into:) Required

Encodes a Unicode scalar as a series of code units by calling the given closure on each code unit.

For example, the musical fermata symbol ("𝄐") is a single Unicode scalar value (\u{1D110}) but requires four code units for its UTF-8 representation. The following code uses the UTF8 codec to encode a fermata in UTF-8:

var bytes: [UTF8.CodeUnit] = []
UTF8.encode("𝄐", into: { bytes.append($0) })
print(bytes)
// Prints "[240, 157, 132, 144]"

Parameters: input: The Unicode scalar value to encode. processCodeUnit: A closure that processes one code unit argument at a time.

Declaration

static func encode(_ input: UnicodeScalar, into processCodeUnit: (Self.CodeUnit) -> Swift.Void)

Instance Methods

mutating func decode(_:) Required

Starts or continues decoding a code unit sequence into Unicode scalar values.

To decode a code unit sequence completely, call this method repeatedly until it returns UnicodeDecodingResult.emptyInput. Checking that the iterator was exhausted is not sufficient, because the decoder can store buffered data from the input iterator.

Because of buffering, it is impossible to find the corresponding position in the iterator for a given returned UnicodeScalar or an error.

The following example decodes the UTF-8 encoded bytes of a string into an array of UnicodeScalar instances:

let str = "✨Unicode✨"
print(Array(str.utf8))
// Prints "[226, 156, 168, 85, 110, 105, 99, 111, 100, 101, 226, 156, 168]"

var bytesIterator = str.utf8.makeIterator()
var scalars: [UnicodeScalar] = []
var utf8Decoder = UTF8()
Decode: while true {
    switch utf8Decoder.decode(&bytesIterator) {
    case .scalarValue(let v): scalars.append(v)
    case .emptyInput: break Decode
    case .error:
        print("Decoding error")
        break Decode
    }
}
print(scalars)
// Prints "["\u{2728}", "U", "n", "i", "c", "o", "d", "e", "\u{2728}"]"

input: An iterator of code units to be decoded. input must be the same iterator instance in repeated calls to this method. Do not advance the iterator or any copies of the iterator outside this method. Returns: A UnicodeDecodingResult instance, representing the next Unicode scalar, an indication of an error, or an indication that the UTF sequence has been fully decoded.

Declaration

mutating func decode<I where I : IteratorProtocol, Self.CodeUnit == I.Element>(_ input: inout I) -> UnicodeDecodingResult