Class Lexer

All Implemented Interfaces:
StringPool

public final class Lexer extends Scanner implements StringPool
Responsible for converting source content into a stream of tokens.
  • Nested Class Summary

    Nested Classes
    Modifier and Type
    Class
    Description
    static class 
    Helper class for Lexer tokens, e.g. XML or RegExp tokens.
    protected static interface 
    Lexer.LineInfoReceiver
    Interface to receive line information for multi-line literals.
    static class 
    Lexer.RegexToken
    Temporary container for regular expressions.
  • Field Summary

    Fields inherited from class Scanner

    ch0, ch1, ch2, ch3, content, limit, line, position
  • Constructor Summary

    Constructors
    Constructor
    Description
    Lexer(Source source, int start, int len, TokenStream stream, boolean scripting, int ecmaScriptVersion, boolean shebang, boolean isModule, boolean pauseOnFunctionBody, boolean allowBigInt, boolean annexB)
    Constructor
    Lexer(Source source, TokenStream stream, boolean scripting, int ecmaScriptVersion, boolean shebang, boolean isModule, boolean allowBigInt, boolean annexB)
    Constructor
  • Method Summary

    Modifier and Type
    Method
    Description
    protected void
    add(TokenType type, int start)
    Add a new token to the stream.
    protected void
    add(TokenType type, int start, int end)
    Add a new token to the stream.
    boolean
    canStartLiteral(TokenType token)
    Return true if the given token can be the beginning of a literal.
    boolean
    checkIdentForKeyword(long token, String keyword)
     
    protected static int
    convertDigit(char ch, int base)
    Convert a digit to an integer.
    protected void
    error(String message, TokenType type, int start, int length)
    Generate a runtime exception
    static boolean
    isEOL(char ch)
    Test whether a char is valid JavaScript end of line
    protected static boolean
    isEscapeCharacter(char ch)
    Is the given character a valid escape char after "\" ?
    protected static boolean
    isStringDelimiter(char ch)
    Test if char is a string delimiter, e.g. ' or ".
    static boolean
    isStringLineTerminator(char ch)
    Test whether a char is valid JavaScript end of string.
    static boolean
    isWhitespace(char ch)
    Test whether a char is valid JavaScript whitespace
    void
    lexify()
    Breaks source content down into lex units, adding tokens to the token stream.
    protected static String
    message(String msgId, String... args)
    Get the correctly localized error message for a given message id and format arguments.
    protected boolean
    scanLiteral(long token, TokenType startTokenType, Lexer.LineInfoReceiver lir)
    Check whether the given token represents the beginning of a literal.
    protected void
    scanNumber()
    Scan a number.
    protected void
    scanString(boolean add)
    Scan over a string literal.
    protected final void
    scanTemplateSpan()
    Continue scanning a template literal after an expression.
    com.oracle.truffle.api.strings.TruffleString
    stringIntern(com.oracle.truffle.api.strings.TruffleString candidate)
     
    com.oracle.truffle.api.strings.TruffleString
    stringIntern(String candidate)
     
    Lexer.RegexToken
    valueOfPattern(int start, int length)
    Convert a regex token to a token object.
    com.oracle.truffle.api.strings.TruffleString
    valueOfRawString(long token)
    Get the raw string value of a template literal string part.

    Methods inherited from class Scanner

    atEOF, charAt, reset, skip

    Methods inherited from class Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • Lexer

      public Lexer(Source source, TokenStream stream, boolean scripting, int ecmaScriptVersion, boolean shebang, boolean isModule, boolean allowBigInt, boolean annexB)
      Constructor
      Parameters:
      source - the source
      stream - the token stream to lex
      scripting - are we in scripting mode
      ecmaScriptVersion - ECMAScript language version
      shebang - do we support shebang
      isModule - are we in module
    • Lexer

      public Lexer(Source source, int start, int len, TokenStream stream, boolean scripting, int ecmaScriptVersion, boolean shebang, boolean isModule, boolean pauseOnFunctionBody, boolean allowBigInt, boolean annexB)
      Constructor
      Parameters:
      source - the source
      start - start position in source from which to start lexing
      len - length of source segment to lex
      stream - token stream to lex
      scripting - are we in scripting mode
      ecmaScriptVersion - ECMAScript language version
      shebang - do we support shebang
      isModule - are we in module
      pauseOnFunctionBody - if true, the lexer returns from lexify() when it encounters a function body. This supports the parser's skipping of nested function bodies: the lexer avoids reading ahead unnecessarily into a body that will be skipped.
  • Method Details

    • add

      protected void add(TokenType type, int start, int end)
      Add a new token to the stream.
      Parameters:
      type - Token type.
      start - Start position.
      end - End position.
    • add

      protected void add(TokenType type, int start)
      Add a new token to the stream.
      Parameters:
      type - Token type.
      start - Start position.
    • isWhitespace

      public static boolean isWhitespace(char ch)
      Test whether a char is valid JavaScript whitespace
      Parameters:
      ch - a char
      Returns:
      true if valid JavaScript whitespace
    • isEOL

      public static boolean isEOL(char ch)
      Test whether a char is valid JavaScript end of line
      Parameters:
      ch - a char
      Returns:
      true if valid JavaScript end of line
    • isStringLineTerminator

      public static boolean isStringLineTerminator(char ch)
      Test whether a char is valid JavaScript end of string. Line separators and paragraph separators can appear in JavaScript string literals.
      Parameters:
      ch - a char
      Returns:
      true if valid JavaScript end of string
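
      The difference between isEOL and isStringLineTerminator can be illustrated with a small standalone sketch. It assumes that isEOL follows the ECMAScript line terminator set (LF, CR, U+2028, U+2029) and that isStringLineTerminator accepts only LF and CR, which is what the description above suggests. The class LineTerminatorSketch is hypothetical and is not the Lexer source.

```java
// Hypothetical standalone sketch; the real Lexer methods may differ in detail.
final class LineTerminatorSketch {

    /** Assumed model of isEOL: the four ECMAScript line terminators. */
    static boolean isEOL(char ch) {
        return ch == '\n' || ch == '\r' || ch == '\u2028' || ch == '\u2029';
    }

    /**
     * Assumed model of isStringLineTerminator: only LF and CR end a string,
     * because line and paragraph separators may appear inside string literals.
     */
    static boolean isStringLineTerminator(char ch) {
        return ch == '\n' || ch == '\r';
    }

    public static void main(String[] args) {
        char lineSeparator = '\u2028';
        System.out.println(isEOL(lineSeparator));                  // true
        System.out.println(isStringLineTerminator(lineSeparator)); // false
        System.out.println(isStringLineTerminator('\n'));          // true
    }
}
```

      Under these assumptions, a line separator ends a source line but does not end a string literal.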
    • isStringDelimiter

      protected static boolean isStringDelimiter(char ch)
      Test if char is a string delimiter, e.g. ' (single quote) or " (double quote).
      Parameters:
      ch - a char
      Returns:
      true if string delimiter
    • valueOfPattern

      public Lexer.RegexToken valueOfPattern(int start, int length)
      Convert a regex token to a token object.
      Parameters:
      start - Position in source content.
      length - Length of regex token.
      Returns:
      Regex token object.
    • canStartLiteral

      public boolean canStartLiteral(TokenType token)
      Return true if the given token can be the beginning of a literal.
      Parameters:
      token - a token
      Returns:
      true if token can start a literal.
    • scanLiteral

      protected boolean scanLiteral(long token, TokenType startTokenType, Lexer.LineInfoReceiver lir)
      Check whether the given token represents the beginning of a literal. If so scan the literal and return true, otherwise return false.
      Parameters:
      token - the token.
      startTokenType - the token type.
      lir - LineInfoReceiver that receives line info for multi-line string literals.
      Returns:
      true if a literal beginning with startTokenType was found and scanned.
    • convertDigit

      protected static int convertDigit(char ch, int base)
      Convert a digit to an integer. Can't use Character.digit since we are restricted to ASCII by the spec.
      Parameters:
      ch - Character to convert.
      base - Numeric base.
      Returns:
      The converted digit or -1 if invalid.
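
      The ASCII restriction mentioned above can be sketched as follows. This is a minimal standalone illustration of the documented contract (digit value, or -1 if invalid), not the Lexer implementation; the class name AsciiDigitSketch is hypothetical. It also shows why Character.digit is unsuitable: that method accepts non-ASCII Unicode digits.

```java
// Hypothetical standalone sketch of an ASCII-only digit conversion.
final class AsciiDigitSketch {

    /** Returns the value of ch in the given base, or -1 if ch is not a valid ASCII digit for it. */
    static int convertDigit(char ch, int base) {
        int digit;
        if (ch >= '0' && ch <= '9') {
            digit = ch - '0';
        } else if (ch >= 'A' && ch <= 'Z') {
            digit = ch - 'A' + 10;
        } else if (ch >= 'a' && ch <= 'z') {
            digit = ch - 'a' + 10;
        } else {
            return -1; // anything outside ASCII letters and digits is rejected
        }
        return digit < base ? digit : -1;
    }

    public static void main(String[] args) {
        char arabicIndicThree = '\u0663';
        System.out.println(convertDigit('7', 10));                 // 7
        System.out.println(convertDigit('f', 16));                 // 15
        System.out.println(convertDigit('8', 8));                  // -1
        System.out.println(convertDigit(arabicIndicThree, 10));    // -1
        System.out.println(Character.digit(arabicIndicThree, 10)); // 3
    }
}
```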
    • checkIdentForKeyword

      public boolean checkIdentForKeyword(long token, String keyword)
    • scanString

      protected void scanString(boolean add)
      Scan over a string literal.
      Parameters:
      add - true if we are not just scanning but should actually modify the token stream
    • scanTemplateSpan

      protected final void scanTemplateSpan()
      Continue scanning a template literal after an expression.
    • isEscapeCharacter

      protected static boolean isEscapeCharacter(char ch)
      Is the given character a valid escape char after "\" ?
      Parameters:
      ch - character to be checked
      Returns:
      true if the given character is valid after "\"
    • scanNumber

      protected void scanNumber()
      Scan a number.
    • lexify

      public void lexify()
      Breaks source content down into lex units, adding tokens to the token stream. The routine scans until the stream buffer is full. Can be called repeatedly until EOF is detected.
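
      The calling pattern described above (fill the buffer, consume, call again until EOF) can be sketched with a toy model. Everything in this block is hypothetical: ToyStream, ToyLexer and lexAll are stand-ins invented for illustration, tokens are plain whitespace-separated words, and none of it reflects the actual TokenStream or Lexer internals.

```java
// Toy model of a lexer that fills a bounded buffer and is called repeatedly.
final class LexifyLoopSketch {

    /** Hypothetical bounded token buffer standing in for a token stream. */
    static final class ToyStream {
        final java.util.ArrayDeque<String> tokens = new java.util.ArrayDeque<>();
        final int capacity;

        ToyStream(int capacity) {
            if (capacity < 1) {
                throw new IllegalArgumentException("capacity must be at least 1");
            }
            this.capacity = capacity;
        }

        boolean isFull() {
            return tokens.size() >= capacity;
        }
    }

    /** Hypothetical lexer that splits content into whitespace-separated words. */
    static final class ToyLexer {
        private final String content;
        private final ToyStream stream;
        private int position;
        boolean eofEmitted;

        ToyLexer(String content, ToyStream stream) {
            this.content = content;
            this.stream = stream;
        }

        /** Scans until the buffer is full or the EOF token has been added. */
        void lexify() {
            while (!stream.isFull() && !eofEmitted) {
                while (position < content.length() && Character.isWhitespace(content.charAt(position))) {
                    position++;
                }
                if (position >= content.length()) {
                    stream.tokens.add("<EOF>");
                    eofEmitted = true;
                    return;
                }
                int start = position;
                while (position < content.length() && !Character.isWhitespace(content.charAt(position))) {
                    position++;
                }
                stream.tokens.add(content.substring(start, position));
            }
        }
    }

    /** Drives the lexer: refill the buffer, drain it, repeat until EOF was emitted. */
    static java.util.List<String> lexAll(String source, int capacity) {
        ToyStream stream = new ToyStream(capacity);
        ToyLexer lexer = new ToyLexer(source, stream);
        java.util.List<String> out = new java.util.ArrayList<>();
        do {
            lexer.lexify();
            while (!stream.tokens.isEmpty()) {
                out.add(stream.tokens.poll());
            }
        } while (!lexer.eofEmitted);
        return out;
    }

    public static void main(String[] args) {
        // A capacity of 2 forces several lexify() calls for this input.
        System.out.println(lexAll("var x = 1 ;", 2)); // [var, x, =, 1, ;, <EOF>]
    }
}
```

      In this model the consumer never needs the whole source tokenized at once; it only asks for more tokens when the buffer has been drained.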
    • valueOfRawString

      public com.oracle.truffle.api.strings.TruffleString valueOfRawString(long token)
      Get the raw string value of a template literal string part.
      Parameters:
      token - template string token
      Returns:
      raw string
    • stringIntern

      public com.oracle.truffle.api.strings.TruffleString stringIntern(com.oracle.truffle.api.strings.TruffleString candidate)
      Specified by:
      stringIntern in interface StringPool
    • stringIntern

      public com.oracle.truffle.api.strings.TruffleString stringIntern(String candidate)
    • message

      protected static String message(String msgId, String... args)
      Get the correctly localized error message for a given message id and format arguments.
      Parameters:
      msgId - message id
      args - format arguments
      Returns:
      message
    • error

      protected void error(String message, TokenType type, int start, int length) throws ParserException
      Generate a runtime exception
      Parameters:
      message - error message
      type - token type
      start - start position of lexed error
      length - length of lexed error
      Throws:
      ParserException - unconditionally