public final class Tokenizer extends Object
These methods are useful both for quick-and-dirty testing of a lexical analyzer, e.g. manually typing test input sequences and checking that the analyzer behaves as expected, and for tokenizing complete files and recording the precise results of the lexical analysis, which can serve as a non-regression tool in the continuous integration of a project.
In particular, the input positions associated with tokens can be retrieved as well; they are usually important for error reporting and are also used by parsers built on top of the lexical analyzers. Positions are typically hard to check and debug thoroughly, in particular because subtle changes to a lexer description can break locations without otherwise affecting the behaviour of the analyzer.
tokenize(LexerInterface, String, Reader, Writer, boolean), prompt(LexerInterface, boolean), file(LexerInterface, File, File, boolean)
| Modifier and Type | Class and Description |
|---|---|
| `static interface` | `Tokenizer.LexerInterface<L extends LexBuffer,T>` This interface acts as a generic proxy to using a Dolmen-generated lexer in the static debugging functions provided in `Tokenizer`. |
| Modifier and Type | Method and Description |
|---|---|
| `static <L extends LexBuffer,T> void` | `file(Tokenizer.LexerInterface<L,T> lexer, File input, File output, boolean positions)` Uses the given lexer interface to tokenize the contents of the file `input`, and stores the result in the `output` file. |
| `static <L extends LexBuffer,T> void` | `prompt(Tokenizer.LexerInterface<L,T> lexer, boolean positions)` This method can be used to conveniently test a lexical analyzer against various one-line sentences entered manually or fed from a test file. |
| `static <L extends LexBuffer,T> void` | `tokenize(Tokenizer.LexerInterface<L,T> lexer, String inputName, Reader reader, Writer writer, boolean positions)` Initializes a lexical analyzer with the given input stream, based on the lexer interface, and repeatedly consumes tokens from the input until the halting condition in `lexer` is met. |
public static <L extends LexBuffer,T> void tokenize(Tokenizer.LexerInterface<L,T> lexer, String inputName, Reader reader, Writer writer, boolean positions)
Initializes a lexical analyzer with the given input stream, based on the `lexer` interface, and repeatedly consumes tokens from the input until the halting condition in `lexer` is met. The tokens are displayed, one per line, using the given `writer`. Optionally, the start and end positions of each token can be displayed along with the token.
Potential lexical and IO errors are caught and displayed, and abort the tokenization process. This method does not attempt to close the given reader/writer streams; closing them should be handled by the caller as necessary.
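The loop this method performs can be sketched as follows. The real method is driven by a Dolmen-generated lexer through `Tokenizer.LexerInterface`, whose exact operations are not shown in this documentation; the `SimpleLexer` stand-in below is therefore hypothetical, and only the shape of the loop (consume until the halting condition, one token per line, errors abort, streams left open) reflects the documented behaviour.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

// Sketch of the documented tokenization loop, with a hypothetical
// SimpleLexer stand-in for Tokenizer.LexerInterface.
class TokenizeSketch {

    interface SimpleLexer {
        String nextToken() throws IOException; // hypothetical entry point
        boolean isHalting(String token);       // hypothetical halting test
    }

    // Consume tokens until the halting condition is met, writing one
    // token per line. Errors abort the loop; the writer is flushed but,
    // as documented, never closed here.
    static void tokenize(SimpleLexer lexer, Writer writer) {
        try {
            while (true) {
                String tok = lexer.nextToken();
                if (lexer.isHalting(tok)) break;
                writer.write(tok);
                writer.write('\n');
            }
            writer.flush();
        } catch (IOException e) {
            System.err.println("Tokenization aborted: " + e.getMessage());
        }
    }

    // Toy whitespace lexer over a fixed input, for illustration only
    static String demo() {
        String[] words = "let x = 1".split("\\s+");
        int[] next = {0};
        SimpleLexer lexer = new SimpleLexer() {
            public String nextToken() {
                return next[0] < words.length ? words[next[0]++] : "EOF";
            }
            public boolean isHalting(String token) {
                return token.equals("EOF");
            }
        };
        StringWriter out = new StringWriter();
        tokenize(lexer, out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.print(demo()); // let, x, =, 1 on separate lines
    }
}
```

Because the loop never closes `writer`, callers can keep writing to the same stream after tokenization, which is why the file-based variant below manages its own streams.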
Parameters:
`lexer` - an interface to the lexical analyzer to use
`inputName` - a user-friendly name describing the input
`reader` - character stream to feed the lexer with
`writer` - character stream to write the tokens to
`positions` - whether token locations are displayed as well

public static <L extends LexBuffer,T> void prompt(Tokenizer.LexerInterface<L,T> lexer, boolean positions)
This method can be used to conveniently test a lexical analyzer against various one-line sentences entered manually or fed from a test file. Each line read from standard input is tokenized using the given `lexer`, as described by tokenize(LexerInterface, LexBuffer, Writer, boolean).
In response, the tokens are displayed on standard output, one per line. Optionally, the start and end positions of each token can be displayed along with the token.
Potential lexical and IO errors are caught and displayed, and handling of the subsequent lines on standard input resumes normally. The method stops when encountering end-of-input or a completely empty line. Of course, this method is not suitable for testing sentences which themselves contain line breaks.
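The prompt loop described above can be sketched as follows. The real method delegates each line to a Dolmen-generated lexer; the whitespace tokenizer below is purely illustrative, and the `run`/`demo` names are this sketch's own. What the sketch does take from the documentation is the control flow: stop at end-of-input or a completely empty line, and recover from an error on one line so subsequent lines are still handled.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

// Sketch of the prompt loop: read one line at a time, stop at
// end-of-input or a completely empty line, tokenize each line
// independently so an error on one line does not end the session.
class PromptSketch {

    // Illustrative stand-in for tokenizing a single line
    static List<String> tokenizeLine(String line) {
        List<String> tokens = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (!word.isEmpty()) tokens.add(word);
        }
        return tokens;
    }

    // Prompt loop over an arbitrary reader (stand-in for standard input)
    static List<String> run(BufferedReader in) throws IOException {
        List<String> displayed = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null && !line.isEmpty()) {
            try {
                displayed.addAll(tokenizeLine(line));
            } catch (RuntimeException e) {
                // Errors are displayed; subsequent lines are still handled
                System.err.println("error: " + e.getMessage());
            }
        }
        return displayed;
    }

    static String demo() {
        try {
            BufferedReader in = new BufferedReader(new StringReader(
                "let x = 1\ny + 2\n\nignored: after the empty line"));
            return run(in).toString();
        } catch (IOException e) {
            return "";
        }
    }

    public static void main(String[] args) {
        System.out.println(demo()); // the empty line ends the session
    }
}
```

Note how everything after the empty line is never read, which matches the documented stopping condition.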
Parameters:
`lexer` - an interface to the lexical analyzer to use
`positions` - whether token locations are displayed as well

public static <L extends LexBuffer,T> void file(Tokenizer.LexerInterface<L,T> lexer, File input, File output, boolean positions)
Uses the given `lexer` interface to tokenize the contents of the file `input`, and stores the result in the `output` file. The tokenization process repeatedly consumes tokens from the input until the halting condition in `lexer` is met. The tokens are displayed, one per line, in the output. Optionally, the start and end positions of each token can be displayed along with the token.
Potential lexical and IO errors are caught and displayed on standard output, and abort the tokenization process.
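A file-based wrapper of this kind can be sketched as below. Since the underlying tokenization loop does not close its streams, it is the wrapper's job to open and close them, which try-with-resources handles naturally. The whitespace tokenizer and the `Path`-based helper names are this sketch's own assumptions; the real method drives a Dolmen-generated lexer and takes `java.io.File` arguments.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.Writer;
import java.nio.file.Files;
import java.nio.file.Path;

// Sketch of the file-based variant: the wrapper owns the streams and
// closes them; errors are reported on standard output and abort the run.
class FileSketch {

    // Stand-in for the tokenization loop: one token per output line
    static void tokenize(Reader reader, Writer writer) throws IOException {
        BufferedReader in = new BufferedReader(reader);
        String line;
        while ((line = in.readLine()) != null) {
            for (String tok : line.trim().split("\\s+")) {
                if (!tok.isEmpty()) writer.write(tok + "\n");
            }
        }
    }

    static void file(Path input, Path output) {
        try (Reader r = Files.newBufferedReader(input);
             Writer w = Files.newBufferedWriter(output)) {
            tokenize(r, w);
        } catch (IOException e) {
            // Errors are displayed on standard output and abort tokenization
            System.out.println("tokenization aborted: " + e.getMessage());
        }
    }

    static String demo() {
        try {
            Path in = Files.createTempFile("tok", ".txt");
            Path out = Files.createTempFile("tok", ".out");
            Files.write(in, "let x = 1".getBytes());
            file(in, out);
            return new String(Files.readAllBytes(out));
        } catch (IOException e) {
            return "";
        }
    }

    public static void main(String[] args) {
        System.out.print(demo());
    }
}
```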
Parameters:
`lexer` - an interface to the lexical analyzer to use
`input` - the input file to tokenize
`output` - the file where the resulting tokens are stored
`positions` - whether token locations are displayed as well