Zig Language Specification

Living Document

This version:
https://nektro.github.io/zigspec/
Issue Tracking:
GitHub
Editor:
Meghan Denny <hello@nektro.net>

1. Introduction

This standard defines the Zig 2022-09-12 Language. It is a Living Document, kept up to date on a best-effort basis with the latest master releases of github.com/ziglang/zig, and is provided as-is. As Zig is a pre-1.0 language, any information here may change as new proposals are accepted and implemented. The core parts of the language are fairly stable and likely to remain unchanged going forward, but all decisions are at the discretion of the Zig Core Team.

TODO more history information

https://about.sourcegraph.com/podcast/andrew-kelley has a lot of early days details

0.1.1 released on 2017-10-17

0.2.0 released on 2018-03-15

0.3.0 released on 2018-09-28

0.4.0 released on 2019-04-08

0.5.0 released on 2019-09-30

0.6.0 released on 2020-04-13

0.7.0 released on 2020-11-08

0.8.0 released on 2021-06-04

0.9.0 released on 2021-12-20

2. Zen

3. Scope

This Standard defines the Zig 2022-09-12 general-purpose programming language.

4. Conformance

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

A conforming implementation of Zig must provide and support all the types, values, objects, properties, functions, program syntax, and semantics described in this specification.

A conforming implementation of Zig must interpret source text input in conformance with the latest version of the [Unicode] Standard and [ISO10646].

5. Overview

Zig is a general-purpose programming language and toolchain for maintaining robust, optimal and reusable software.

Robust

Behavior is correct even for edge cases such as out of memory.

Optimal

Write programs the best way they can behave and perform.

Reusable

The same code works in many environments which have different constraints.

Maintainable

Precisely communicate intent to the compiler and other programmers. The language imposes a low overhead to reading code and is resilient to changing requirements and environments.

6. Source Text

SourceCharacter
any Unicode code point

Zig source text is a sequence of Unicode code points. All Unicode code point values from U+0000 to U+10FFFF, including surrogate code points, may occur in Zig source text where permitted by the grammar. Files storing Zig source text are [UTF-8] encoded text files. The files storing Zig source code are usually named with the .zig extension.

The components of a combining character sequence are treated as individual Unicode code points even though a user might think of the whole sequence as a single character.
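As a rough illustration of this rule, the sketch below uses the contents of a string literal rather than raw source text, and assumes `std.unicode.utf8CountCodepoints` from the upstream standard library. The letter e followed by U+0301 COMBINING ACUTE ACCENT is displayed as one character but counts as two code points, stored as three UTF-8 bytes.

```zig
const std = @import("std");

test "a combining character sequence is several code points" {
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT.
    const text = "e\u{301}";

    // Three UTF-8 bytes: one for 'e', two for U+0301.
    try std.testing.expectEqual(@as(usize, 3), text.len);

    // Two code points, even though a reader sees a single character.
    const count = try std.unicode.utf8CountCodepoints(text);
    try std.testing.expectEqual(@as(usize, 2), count);
}
```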

NL
SourceCharacter 'LINE FEED (LF)' (U+000A)
CR
SourceCharacter 'CARRIAGE RETURN (CR)' (U+000D)
TAB
SourceCharacter 'CHARACTER TABULATION' (U+0009)

Further discussion: <https://github.com/ziglang/zig-spec/issues/38>

7. Lexical Grammar

The source text of a Zig file is first converted into a sequence of input elements, which are tokens, line terminators, comments, or white space. The source text is scanned from left to right, repeatedly taking the longest possible sequence of code points as the next input element. A Zig file may contain zero or more @import calls, whose parameter must be a String Literal. The content of this string may be either a relative path to another Zig file or a Package name. The corresponding Zig file may be lexically processed in parallel.
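A minimal sketch of the two kinds of @import argument follows. The file name `helper.zig` is hypothetical and only appears in comments, so the snippet stays self-contained.

```zig
// Package names: resolved by the compiler rather than as file paths.
const std = @import("std");
const builtin = @import("builtin");

// A relative path would look like the line below. `helper.zig` is a
// hypothetical sibling file, so the line is left commented out.
// const helper = @import("helper.zig");

// The argument must be a string literal, so a computed path is rejected:
// const bad = @import("hel" ++ "per.zig"); // compile error

test "imported packages are used like any other declaration" {
    // `is_test` is true while this file is compiled as a test.
    try std.testing.expect(builtin.is_test);
}
```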

7.1. White Space

WhiteSpace

7.2. Line Terminators

Like white space code points, line terminator code points are used to improve source text readability and to separate tokens (indivisible lexical units) from each other. However, unlike white space code points, line terminators have some influence over the behaviour of the syntactic grammar. In general, line terminators may occur between any two tokens, but there are a few places where they are forbidden by the syntactic grammar.

LineTerminator
CR NL

Note: CR anywhere outside of a LineTerminator is rejected.

7.3. Comments

Comments can only be single-line. There are no multiline comments in Zig (e.g. like /* */ comments in C). This allows Zig source code to have the property that each line of code can be tokenized out of context.

Because a single-line comment can contain any Unicode code point except a LineTerminator code point, and because of the general rule that a token is always as long as possible, a single-line comment always consists of all code points from the // marker to the end of the line.

Comments behave like white space and are discarded.

Note: In a Zig program, a Unicode escape sequence occurring within a comment is never interpreted and therefore cannot contribute to termination of the comment. Similarly, a Unicode escape sequence occurring within a string literal in a Zig program always contributes to the literal and is never interpreted as a line terminator or as a code point that might terminate the string literal.

Note: Any TAB is rejected in a Comment since it is ambiguous how it should be rendered.

Comment
SingleLineComment
DocComment
/ / / CommentChars
ContainerDocComment
/ / ! CommentChars
CommentChars
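The three comment forms named in this section can be sketched in a single file. The function `double` is a hypothetical example declaration, not part of any library.

```zig
//! Container doc comment: describes the file (container) it appears in.

const std = @import("std");

/// Doc comment: describes the declaration that follows it.
pub fn double(x: u32) u32 {
    // Single-line comment: discarded like white space.
    return x * 2; // A comment may also trail code on the same line.
}

test "comments do not affect behaviour" {
    try std.testing.expectEqual(@as(u32, 8), double(4));
}
```

Each comment ends at the end of its line, so a longer remark is written as several consecutive comment lines.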

7.4. Tokens

Note: Defined by std.zig.Token.Tag in /lib/std/zig/tokenizer.zig. Could be interpreted by other code. Railroad diagrams for complex code in § 9 Expressions.

Note: TODO how should this section be expanded?

7.5. Identifiers

TODO digit definitions in types.md literal sections

IDENTIFIER
    <- !keyword [A-Za-z_] [A-Za-z0-9_]* skip
     / "@\"" string_char* "\""                            skip
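A small sketch of both alternatives in the rule above; all declaration names here are made up for illustration. The first alternative covers ordinary names, while the @"..." form admits names the first alternative rejects, such as keywords or names containing spaces.

```zig
const std = @import("std");

// Ordinary identifiers: a letter or underscore first, then letters,
// digits, or underscores.
const max_len: u32 = 16;
const _count2: u32 = 1;

// Quoted identifiers: `while` is a keyword and "two words" contains a
// space, so neither could be written without the @"..." form.
const @"while": u32 = 2;
const @"two words": u32 = 3;

test "quoted identifiers name ordinary declarations" {
    const sum = max_len + _count2 + @"while" + @"two words";
    try std.testing.expectEqual(@as(u32, 22), sum);
}
```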

7.6. Keywords

align
allowzero
and
anyframe
anytype
asm
async
await
break
catch
comptime
const
continue
defer
else
enum
errdefer
error
export
extern
fn
for
if
inline
noalias
nosuspend
or
orelse
packed
pub
resume
return
linksection
struct
suspend
switch
test
threadlocal
try
union
unreachable
usingnamespace
var
volatile
while

7.7. Operators

filter out of std.zig.Token.Tag

use table like https://ziglang.org/documentation/master/#Operators

8. Data Types and Values

Algorithms within this specification manipulate values each of which has an associated type. The possible value types are exactly those defined in this clause.

Within this specification, the notation "Type(x)" is used as shorthand for "the type of x" where "type" refers to the Zig language and specification types defined in this clause.

TODO include literal definitions in this section

8.1. The AnyFrame Type

8.2. The Array Type

8.3. The Bool Type

8.4. The ComptimeFloat Type

8.5. The ComptimeInt Type

8.6. The Enum Type

8.7. The EnumLiteral Type

8.8. The ErrorSet Type

8.9. The ErrorUnion Type

8.10. The Float Type

8.11. The Fn Type

8.12. The Frame Type

8.13. The Int Type

8.14. The NoReturn Type

8.15. The Null Type

8.16. The Opaque Type

8.17. The Optional Type

8.18. The Pointer Type

8.19. The Struct Type

8.20. The Type Type

8.21. The Undefined Type

8.22. The Union Type

8.23. The Vector Type

8.24. The Void Type

9. Expressions

See types.md for the following
    struct
        auto
        extern
        packed
            infer/explicit int
        field
            name/type/default
        init
    enum
        auto
        backing int
        (non-)exhaustive
        literal
    union
        (un)tagged
        infer/explicit tagged
        init
        mention @unionInit
    function
        auto
        extern
        type
        body
        parameter
            auto/comptime
        error union
    pointer
        modifiers
        single
        multi
        c
        slice
    array
    vector
    error set

See grammar.md for the following
    operators
        math
        math+assign
        address of
        pointer dereference
        optional unwrap

TODO const decl
TODO var decl
TODO test decl
TODO comptime block
TODO usingnamespace decl
TODO while
TODO for
TODO switch

10. Statements and Declarations

TODO add references to grammar/types/expressions.md and define semantics

11. Compiler Builtins

Builtin functions are provided by the compiler and are prefixed with @. The comptime keyword on a parameter means that the parameter must be known at compile time.
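The sketch below shows a few builtin calls together with a comptime parameter. The helper `zeroes` is hypothetical and is a user-defined function rather than a builtin; it is included only because the comptime keyword has the same meaning on its parameter.

```zig
const std = @import("std");

/// Hypothetical helper: `n` is marked comptime, so every call site must
/// pass a value known at compile time. It can then appear in the return type.
fn zeroes(comptime n: usize) [n]u8 {
    return [_]u8{0} ** n;
}

test "builtins and comptime parameters" {
    const buf = zeroes(4);

    // @TypeOf yields the type of an expression; @sizeOf takes a type.
    try std.testing.expect(@TypeOf(buf) == [4]u8);
    try std.testing.expectEqual(@as(usize, 4), @sizeOf(@TypeOf(buf)));

    // @as coerces a value to the given type.
    try std.testing.expectEqual(@as(u8, 0), buf[0]);
}
```

Passing a runtime value for `n` would be a compile error, since the array length in the return type could not be determined.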

TODO figure out how to workaround ID conflict between @frame and @Frame <https://github.com/tabatkins/bikeshed/issues/861>

@addWithOverflow
@alignCast
@alignOf
@as
@asyncCall
@atomicLoad
@atomicRmw
@atomicStore
@bitCast
@bitOffsetOf
@bitReverse
@bitSizeOf
@boolToInt
@breakpoint
@byteSwap
@call
@cDefine
@ceil
@cImport
@cInclude
@clz
@cmpxchgStrong
@cmpxchgWeak
@compileError
@compileLog
@cos
@ctz
@cUndef
@divExact
@divFloor
@divTrunc
@embedFile
@enumToInt
@errorName
@errorReturnTrace
@errorToInt
@errSetCast
@exp
@exp2
@export
@extern
@fabs
@fence
@field
@fieldParentPtr
@floatCast
@floatToInt
@floor

@Frame
@frameAddress
@frameSize
@hasDecl
@hasField
@import
@intCast
@intToEnum
@intToError
@intToFloat
@intToPtr
@log
@log10
@log2
@maximum
@memcpy
@memset
@minimum
@mod
@mulAdd
@mulWithOverflow
@offsetOf
@panic
@popCount
@prefetch
@ptrCast
@ptrToInt
@reduce
@rem
@returnAddress
@round
@select
@setAlignStack
@setCold
@setEvalBranchQuota
@setFloatMode
@setRuntimeSafety
@shlExact
@shlWithOverflow
@shrExact
@shuffle
@sin
@sizeOf
@splat
@sqrt
@src
@subWithOverflow
@tagName
@tan
@This
@trunc
@truncate
@Type
@typeInfo
@typeName
@TypeOf
@unionInit
@Vector
@wasmMemoryGrow
@wasmMemorySize

12. The std Package

A special package made available to all source files. Its contents can be found in data files for the upstream compiler implementation at github.com/ziglang/zig/lib/std/.

13. The builtin Package

A special package made available to all source files. Declarations in this package fall under two categories: those identifying meta information about the currently executing compiler implementation and those identifying information about the compilation unit currently being processed.

13.1. Compiler information

zig_version

A std.SemanticVersion representing the version of the currently executing compiler implementation.

zig_backend

A std.builtin.CompilerBackend identifying the currently executing compiler implementation.

have_error_return_tracing

= true;

valgrind_support

A Boolean value representing whether the currently executing compiler implementation supports Valgrind’s Memcheck tool for detecting improper access of Undefined memory.

13.2. Compilation information

output_mode
link_mode
is_test
single_threaded
abi
cpu
os
target
object_format
mode
link_libc
link_libcpp
sanitize_thread
position_independent_code
position_independent_executable
strip_debug_info
code_model
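As a hedged sketch of how the two categories might be consumed, the snippet below reads one declaration of each kind. The helper `buildKind` is hypothetical, and the checked types reflect the upstream implementation rather than anything this section guarantees.

```zig
const std = @import("std");
const builtin = @import("builtin");

/// Hypothetical helper: picks a label from compilation information.
pub fn buildKind() []const u8 {
    return if (builtin.is_test) "test" else "regular";
}

test "builtin exposes compiler and compilation information" {
    // Compiler information: zig_version is a std.SemanticVersion.
    try std.testing.expect(@TypeOf(builtin.zig_version) == std.SemanticVersion);

    // Compilation information: single_threaded and is_test are booleans.
    try std.testing.expect(@TypeOf(builtin.single_threaded) == bool);
    try std.testing.expectEqualStrings("test", buildKind());
}
```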

14. The root Package

15. Undefined Behavior

tracking issue for the exhaustiveness of this section <https://github.com/ziglang/zig/issues/1966>

16. Comptime

17. C Interoperability

References

Normative References

[INFRA]
Anne van Kesteren; Domenic Denicola. Infra Standard. Living Standard. URL: https://infra.spec.whatwg.org/

Informative References

[ISO10646]
Information Technology - Universal Multiple-Octet Coded Character Set (UCS) - Part 1: Architecture and Basic Multilingual Plane. ISO/IEC 10646-1:1993. The current specification also takes into consideration the first five amendments to ISO/IEC 10646-1:1993. Useful roadmaps show which scripts sit at which numeric ranges. URL: http://www.egt.ie/standards/iso10646/ucs-roadmap.html
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://tools.ietf.org/html/rfc2119
[Unicode]
The Unicode Standard. URL: https://www.unicode.org/versions/latest/
[UTF-8]
F. Yergeau. UTF-8, a transformation format of ISO 10646. November 2003. Internet Standard. URL: https://tools.ietf.org/html/rfc3629