public class LexBuffer extends Object
Generated lexical analysers extend this class to inherit a buffer with markers for token positions and final states, the methods used by the generated automata, and methods that can be used in semantic actions.
Modifier and Type | Class and Description |
---|---|
static class | LexBuffer.LexicalError: Exception which can be raised by generated lexers which extend LexBuffer, and which is also raised by getNextChar() in place of potential IOExceptions. |
static class | LexBuffer.Position: Instances of this class describe a position in some input (most frequently a file, but could be a string or any char sequence). |
Modifier and Type | Field and Description |
---|---|
protected int | absPos: Absolute position of the start of the buffer |
protected LexBuffer.Position | curLoc: Current token position |
protected int | curPos: Current buffer input position |
protected static LexBuffer.Position | DUMMY_POS |
protected String | filename: The name of the input, for locations (for error reports only, need not be an actual filename) |
protected int[] | memory: Memory cells |
protected LexBuffer.Position | startLoc: Position of the last token start |
protected int | startPos: Buffer input position of the token start |
Modifier | Constructor and Description |
---|---|
protected | LexBuffer(String version, @Nullable String filename, Reader reader): Constructs a new lexer buffer based on the given character stream. |
Modifier and Type | Method and Description |
---|---|
protected void | appendLexeme(StringBuilder buf): Appends the last matched lexeme to the given buffer buf. |
protected void | changeInput(String filename, Reader reader): This changes the input stream used by this lexer buffer to the new source described by filename and reader. |
void | disablePositions(): Disables position tracking in this lexer. |
protected void | endToken(): Ends the matching of the current token. |
protected LexBuffer.LexicalError | error(String msg): Convenience helper which returns a LexBuffer.LexicalError located at the current lexeme start. |
protected String | getLexeme() |
protected char | getLexemeChar(int idx): When successful, this is equivalent to getLexeme().charAt(idx), but will be more efficient in general since it does not require allocating the lexeme string as getLexeme() does. |
protected CharSequence | getLexemeChars(): This method returns the same sequence of characters that getLexeme() would, but it is only a view based on the current state of the LexBuffer. |
LexBuffer.Position | getLexemeEnd() |
int | getLexemeLength() |
LexBuffer.Position | getLexemeStart() |
protected char | getNextChar() |
protected String | getSubLexeme(int start, int end) |
protected char | getSubLexemeChar(int pos) |
protected Optional<String> | getSubLexemeOpt(int start, int end) |
protected Optional<Character> | getSubLexemeOptChar(int pos) |
protected boolean | hasMoreInput() |
boolean | hasPositions() |
protected void | mark(int action): Marks the current position as the last terminal state encountered. |
protected void | newline(): Updates the current position to account for a line change. |
protected char | peekNextChar(): This function is useful in cases when a lexer's semantic action must act depending on what kind of input follows the current token in the stream (although arguably at that point it's not lexing anymore but parsing!). |
protected int | peekNextChars(char[] chars): This function fetches the next characters in the stream and returns them into the given character buffer chars. |
protected void | popInput(): This fetches the input stream which is at the top of the input stack. |
protected void | pushInput(String filename, Reader reader): This pushes the current input stream to the internal input stack and resets the lexer to read from the given reader. |
protected int | rewind(): Resets the current position to the last terminal state encountered. |
protected void | savePosition(Runnable runnable, LexBuffer.Position saved): This method runs the given runnable but takes care to restore the current token start position to the position given as the saved parameter. |
protected <T> T | savePosition(Supplier<T> supplier, LexBuffer.Position saved): This method runs the given supplier routine but takes care to restore the current token start position to the position given as the saved parameter. |
protected void | saveStart(Runnable runnable): Same as savePosition(runnable, getLexemeStart()). |
protected <T> T | saveStart(Supplier<T> supplier): Same as savePosition(supplier, getLexemeStart()). |
protected void | startToken(): Starts the matching of a new token. |
protected String filename
@DolmenInternal protected int absPos
@DolmenInternal protected int startPos
@DolmenInternal protected int curPos
@DolmenInternal protected int[] memory
protected LexBuffer.Position startLoc
protected LexBuffer.Position curLoc
protected static final LexBuffer.Position DUMMY_POS
protected LexBuffer(String version, @Nullable String filename, Reader reader)
Constructs a new lexer buffer based on the given character stream.
Parameters:
version - the version of Dolmen which generated the subclass
filename -
reader -
Throws:
Exceptions.DolmenVersionException - if version is not equal to the version of this LexBuffer
public boolean hasPositions()
public void disablePositions()
Disables position tracking in this lexer. This should be called before the lexer is used; position tracking cannot be re-enabled later.
@DolmenInternal protected final char getNextChar()
@DolmenInternal protected final void startToken()
@DolmenInternal protected final void mark(int action)
Marks the current position as the last terminal state encountered.
Parameters:
action - associated semantic action index
@DolmenInternal protected final int rewind()
@DolmenInternal protected final void endToken()
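
The methods above are the primitives which a generated automaton drives while matching a token. As a rough illustration of how such primitives can cooperate to implement longest-match tokenization, here is a minimal stand-alone model. The class MiniBuffer, its EOF sentinel, its hand-written nextToken() automaton and the action indices 0, 1 and 2 are all hypothetical: they only mimic the method names listed here and are not Dolmen's implementation.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-alone model of a lexer buffer; NOT Dolmen's LexBuffer. */
class MiniBuffer {
    private static final char EOF = '\uFFFF'; // end-of-input sentinel (assumption of this sketch)
    private final String input;
    private int startPos = 0;    // input position of the token start
    private int curPos = 0;      // current input position
    private int lastPos = 0;     // position of the last terminal state encountered
    private int lastAction = -1; // action attached to that state, -1 if none

    MiniBuffer(String input) { this.input = input; }

    void startToken() { startPos = curPos; lastPos = curPos; lastAction = -1; }
    char getNextChar() { return curPos < input.length() ? input.charAt(curPos++) : EOF; }
    void mark(int action) { lastPos = curPos; lastAction = action; }
    int rewind() { curPos = lastPos; return lastAction; }
    void endToken() { /* nothing to finalize in this sketch */ }
    String getLexeme() { return input.substring(startPos, curPos); }

    /** Hand-written automaton. Actions: 0 = integer, 1 = decimal, 2 = any other character. */
    String nextToken() {
        startToken();
        char c = getNextChar();
        if (Character.isDigit(c)) {
            // Every digit read so far forms a valid integer: record a terminal state.
            do { mark(0); c = getNextChar(); } while (Character.isDigit(c));
            if (c == '.') {
                // A dot alone is not terminal; only digits after it make a decimal.
                c = getNextChar();
                while (Character.isDigit(c)) { mark(1); c = getNextChar(); }
            }
        } else if (c != EOF) {
            mark(2);
        }
        int action = rewind(); // give back whatever was read past the last terminal state
        endToken();
        switch (action) {
            case 0: return "INT(" + getLexeme() + ")";
            case 1: return "DEC(" + getLexeme() + ")";
            case 2: return "OTHER(" + getLexeme() + ")";
            default: return null; // end of input
        }
    }

    /** Runs the automaton until the input is exhausted. */
    static List<String> tokenize(String input) {
        MiniBuffer buf = new MiniBuffer(input);
        List<String> tokens = new ArrayList<>();
        for (String t = buf.nextToken(); t != null; t = buf.nextToken()) tokens.add(t);
        return tokens;
    }
}
```

In this model, MiniBuffer.tokenize("12.5+7.") yields DEC(12.5), OTHER(+), INT(7), OTHER(.): when matching the trailing "7.", the automaton reads the dot speculatively, finds no digit after it, and rewind() returns to the position marked after "7".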
protected final char peekNextChar()
protected final int peekNextChars(char[] chars)
This function fetches the next characters in the stream and returns them into the given character buffer chars. It does so without advancing the state of the lexing engine, so that on return the lexer can resume from the original position.
This function is useful in cases when a lexer's semantic action must act depending on what kind of input follows the current token in the stream (although arguably at that point it's not lexing anymore but parsing!).
Parameters:
chars
See Also:
peekNextChar()
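
To make the "peek without advancing" contract concrete, the following stand-alone sketch reproduces the idea on top of a plain java.io.Reader using mark and reset. The class PeekSketch is hypothetical and is not how LexBuffer is implemented; in particular, that the returned int is the number of characters actually copied is an assumption of this sketch, since the description above does not state what the return value is.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;

/** Hypothetical helper illustrating peeking on a Reader; NOT part of Dolmen. */
class PeekSketch {
    /**
     * Copies up to chars.length upcoming characters into chars without
     * consuming them. Returns how many were copied (assumed semantics).
     */
    static int peekNextChars(Reader in, char[] chars) throws IOException {
        if (!in.markSupported()) {
            throw new IllegalArgumentException("reader must support mark/reset");
        }
        in.mark(chars.length);           // remember the current position
        int n = 0;
        while (n < chars.length) {
            int r = in.read(chars, n, chars.length - n);
            if (r < 0) break;            // end of input: fewer characters available
            n += r;
        }
        in.reset();                      // go back: nothing has been consumed
        return n;
    }

    /** Consumes "x", peeks at the next two characters, then reads one more. */
    static String demo() {
        try {
            Reader in = new StringReader("x=>y");
            in.read();                   // consume 'x', as if it were the current token
            char[] ahead = new char[2];
            int n = peekNextChars(in, ahead);
            char next = (char) in.read(); // still '=': peeking consumed nothing
            return new String(ahead, 0, n) + "|" + next;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Here PeekSketch.demo() returns "=>|=": the two peeked characters are "=>", and the next regular read still starts at "=".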
protected final void changeInput(String filename, Reader reader)
This changes the input stream used by this lexer buffer to the new source described by filename and reader. In contrast to pushInput(String, java.io.Reader), this does not allow resuming the analysis of the former stream once the new one is complete. This takes care of closing the input stream which was in use until that point.
Parameters:
filename - name of the new input source
reader - new input character stream
protected final void pushInput(String filename, Reader reader)
This pushes the current input stream to the internal input stack and resets the lexer to read from the given reader. Subsequent tokens will consume characters from the new stream. The syntactic analysis of the former input stream can be resumed in the exact same position by a subsequent call to popInput().
Parameters:
filename - name of the new input source
reader - new input character stream
protected final boolean hasMoreInput()
Returns:
true if and only if there is more input to fall back to once the current input stream is over, i.e. if popInput() will succeed
protected final void popInput()
This fetches the input stream which is at the top of the input stack, i.e. the stream most recently saved by pushInput(String, java.io.Reader). This takes care of closing the input stream which was in use until that point.
Throws:
IllegalArgumentException - when the input stack is empty
See Also:
hasMoreInput()
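
The interplay between pushInput, hasMoreInput and popInput is typically what an "include"-like directive needs. The stand-alone sketch below models that behaviour with a stack of readers. InputStackSketch, its read() helper and the "@" include marker are hypothetical and only follow the contracts described above (closing the finished stream, IllegalArgumentException on an empty stack); file names and positions are left out, and this is not Dolmen's implementation.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayDeque;
import java.util.Deque;

/** Hypothetical model of an input stack; NOT Dolmen's implementation. */
class InputStackSketch {
    private final Deque<Reader> stack = new ArrayDeque<>();
    private Reader current;

    InputStackSketch(Reader reader) { this.current = reader; }

    /** Suspends the current stream and starts reading from reader. */
    void pushInput(Reader reader) { stack.push(current); current = reader; }

    /** True if a suspended stream is available to fall back to. */
    boolean hasMoreInput() { return !stack.isEmpty(); }

    /** Closes the current stream and resumes the most recently suspended one. */
    void popInput() throws IOException {
        if (stack.isEmpty()) throw new IllegalArgumentException("empty input stack");
        current.close();
        current = stack.pop();
    }

    /** Closes the current stream and replaces it; the former stream cannot be resumed. */
    void changeInput(Reader reader) throws IOException { current.close(); current = reader; }

    /** Reads one character, falling back to suspended streams; -1 when all are exhausted. */
    int read() throws IOException {
        int c = current.read();
        while (c < 0 && hasMoreInput()) { popInput(); c = current.read(); }
        return c;
    }

    /** Expands the made-up "@" directive of the main input with an included input. */
    static String demo() {
        try {
            InputStackSketch in = new InputStackSketch(new StringReader("ab@cd"));
            StringBuilder out = new StringBuilder();
            for (int c = in.read(); c >= 0; c = in.read()) {
                if (c == '@') in.pushInput(new StringReader("XY")); // "include" directive
                else out.append((char) c);
            }
            return out.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

InputStackSketch.demo() returns "abXYcd": reading of the main input resumes right after the "@" once the included input is over. In an actual lexer the check on hasMoreInput() would more likely live in the semantic action for end-of-input.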
protected final String getLexeme()
public final LexBuffer.Position getLexemeStart()
public final LexBuffer.Position getLexemeEnd()
public final int getLexemeLength()
protected final char getLexemeChar(int idx)
When successful, this is equivalent to getLexeme().charAt(idx), but will be more efficient in general since it does not require allocating the lexeme string as getLexeme() does.
Beware that this method must only be called in the associated semantic action, and before any call to a nested rule, as leaving the action or entering another rule can change the underlying buffer.
Parameters:
idx -
Returns:
the character at index idx in the last matched lexeme
Throws:
LexBuffer.LexicalError - when idx is negative or not less than the length of the lexeme
protected final CharSequence getLexemeChars()
This method returns the same sequence of characters that getLexeme() would, but it is only a view based on the current state of the LexBuffer. Therefore it can be more efficient than getLexeme() when the only requirement is iterating on the characters in sequence, for instance appending the contents to a StringBuilder.
The downside is that the result of getLexemeChars() must be used with more care, as it depends on the current state of the buffer. It only makes sense during the associated semantic action, and before any call to a nested entry rule as well. Otherwise, the contents or size of the underlying buffer may have changed and the contents of the returned CharSequence are undefined.
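
The difference between a copy and a view can be illustrated outside of Dolmen with a plain char array. In the sketch below, LexemeViewSketch is hypothetical, the array merely stands in for the token buffer, and java.nio.CharBuffer plays the role of the view; how LexBuffer actually builds its CharSequence is not specified here.

```java
import java.nio.CharBuffer;

/** Hypothetical illustration of "copy vs. view"; NOT Dolmen's implementation. */
class LexemeViewSketch {
    static String demo() {
        char[] buffer = "foo bar".toCharArray();            // stands in for the token buffer
        String copy = new String(buffer, 0, 3);             // like getLexeme(): allocates a copy
        CharSequence view = CharBuffer.wrap(buffer, 0, 3);  // like getLexemeChars(): shares the array

        StringBuilder sb = new StringBuilder();
        sb.append(view);                                    // fine: used while the buffer is unchanged

        buffer[0] = 'z';                                    // later on, the buffer contents change
        return copy + "," + sb + "," + view;                // the view now reflects the new contents
    }
}
```

LexemeViewSketch.demo() returns "foo,foo,zoo": the copy and the characters appended in time are unaffected, whereas the view read after the buffer changed no longer shows the original lexeme.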
protected void appendLexeme(StringBuilder buf)
Appends the last matched lexeme to the given buffer buf. This is equivalent to buf.append(getLexeme()) or buf.append(getLexemeChars()) but will be faster than both.
Parameters:
buf - the buffer to append the last lexeme to
@DolmenInternal protected final String getSubLexeme(int start, int end)
Parameters:
start -
end -
Returns:
the contents between start and end (exclusive) in the token buffer
@DolmenInternal protected final Optional<String> getSubLexemeOpt(int start, int end)
Parameters:
start -
end -
Returns:
the contents between start and end (exclusive) in the token buffer
@DolmenInternal protected final char getSubLexemeChar(int pos)
Parameters:
pos -
Returns:
the character at pos in the token buffer
@DolmenInternal protected final Optional<Character> getSubLexemeOptChar(int pos)
Parameters:
pos -
Returns:
the character at pos in the token buffer
protected LexBuffer.LexicalError error(String msg)
Convenience helper which returns a LexBuffer.LexicalError located at the current lexeme start.
It is also used by the lexer generator to report empty tokens, i.e. input which does not match any of the lexer rules. It can be overridden in generated lexers to allow for customized message and position reports.
Parameters:
msg -
protected final void newline()
protected final <T> T saveStart(Supplier<T> supplier)
Same as savePosition(supplier, getLexemeStart()).
This is exactly equivalent to the following code:
    Position saved = getLexemeStart();
    T res = supplier.get();   // Calling the routine
    startLoc = saved;         // Restoring the start position
    return res;               // Returning the routine's result
Parameters:
supplier - a routine to run
Returns:
the result of supplier
See Also:
savePosition(Supplier, Position), getLexemeStart()
protected final <T> T savePosition(Supplier<T> supplier, LexBuffer.Position saved)
This method runs the given supplier routine but takes care to restore the current token start position to the position given as the saved parameter.
This is exactly equivalent to the following code:
    T res = supplier.get();   // Calling the routine
    startLoc = saved;         // Restoring the start position
    return res;               // Returning the routine's result
This is useful in semantic actions which have only recognized part of a syntactic construct (typically the opening delimiter of a complex literal, such as the opening double-quote of a literal string) and which call other rules of the lexer recursively in order to finish analyzing the current construct. In such cases, one may want to return the final resulting token as if its span covered the whole range from the opening construct. Calling nested rules inside a lambda passed to this method is a way to achieve this.
Parameters:
supplier - a routine to run
saved - the starting position to save
Returns:
the result of supplier
See Also:
saveStart(Supplier)
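
The string-literal scenario described above can be modelled in a few lines. In the following stand-alone sketch, SaveStartSketch is hypothetical: positions are plain character offsets rather than LexBuffer.Position instances, and the two "rules" are hand-written methods rather than generated ones. Only the bodies of savePosition and saveStart follow the equivalent code given in this section.

```java
import java.util.function.Supplier;

/** Hypothetical model of start-position saving; NOT Dolmen's LexBuffer. */
class SaveStartSketch {
    private final String input;
    private int startLoc = 0; // start of the last matched token (a plain offset here)
    private int curLoc = 0;   // current position

    SaveStartSketch(String input) { this.input = input; }

    <T> T savePosition(Supplier<T> supplier, int saved) {
        T res = supplier.get(); // calling the routine
        startLoc = saved;       // restoring the start position
        return res;
    }

    <T> T saveStart(Supplier<T> supplier) { return savePosition(supplier, startLoc); }

    /** Nested rule: matches the body of a string literal up to the closing quote. */
    private String stringBody() {
        startLoc = curLoc; // entering a rule starts a new token
        int close = input.indexOf('"', curLoc);
        if (close < 0) throw new IllegalStateException("unterminated string");
        String body = input.substring(curLoc, close);
        curLoc = close + 1; // consume the body and the closing quote
        return body;
    }

    /** Main rule: matches the opening quote, then delegates to the nested rule. */
    String stringLiteral(boolean save) {
        startLoc = curLoc;
        if (curLoc >= input.length() || input.charAt(curLoc) != '"') {
            throw new IllegalStateException("expected an opening quote");
        }
        curLoc++; // consume the opening quote
        String body = save ? saveStart(this::stringBody) : stringBody();
        return body + "@" + startLoc + "-" + curLoc; // token contents and its reported span
    }

    static String demo(boolean save) {
        return new SaveStartSketch("\"hi\"").stringLiteral(save);
    }
}
```

On the four-character input "hi" (quotes included), SaveStartSketch.demo(true) returns "hi@0-4", a span starting at the opening quote, whereas demo(false) returns "hi@1-4" because the nested rule has overwritten the start position.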
protected final void saveStart(Runnable runnable)
Same as savePosition(runnable, getLexemeStart()).
This is exactly equivalent to the following code:
    Position saved = getLexemeStart();
    runnable.run();           // Calling the routine
    startLoc = saved;         // Restoring the start position
Parameters:
runnable - a routine to run
See Also:
savePosition(Runnable, Position), getLexemeStart()
protected final void savePosition(Runnable runnable, LexBuffer.Position saved)
This method runs the given runnable but takes care to restore the current token start position to the position given as the saved parameter.
This is exactly equivalent to the following code:
    runnable.run();           // Calling the routine
    startLoc = saved;         // Restoring the start position
This is useful in semantic actions which have only recognized part of a syntactic construct (typically the opening delimiter of a complex literal, such as the opening double-quote of a literal string) and which call other rules of the lexer recursively in order to finish analyzing the current construct. In such cases, one may want to return the final resulting token as if its span covered the whole range from the opening construct. Calling nested rules inside a lambda passed to this method is a way to achieve this.
Parameters:
runnable - a routine to run
saved - the starting position to save
See Also:
saveStart(Runnable)