Eclipse Xtext - Customising Lexer/Parser

parser flow

Xtext allows the lexer and/or the parser to be customised in certain ways. In order to do this we need to understand some of the internal workings of Xtext.

The above diagram is intended to give a top level picture of the process, there are 3 big external components involved here:

Xtext is providing the glue linking together these components. There is a sequence of events going on here:

The output of the Xtext parser is two separate tree structures:

The character stream is not altered by this sequence so, when text is referenced by tokens or nodes in the node model it can be referenced by indexes to the character stream.

Each token can refer to two separate text values:

Node Model

So each token can have two values. So if we take the macro example, the index for the macro will point to the macro name and the text value will contain the expansion of the macro.

PhantomToken

There is not an explicit mechanism for tokens which need to be used in the parser and will affect the EMF, but do not exist anywhere in the editor, such as the inserted curly brackets in the Python-like example above.

However we can cheat by making the start and stop indexes the same, this means that the token has little effect on the NodeModel. It is still important that the index values are contiguous with the tokens before and after it.

The best way to understand the indexes into the text stream is to think of the indexes as representing the spaces between the characters, not the characters, like this:

Index: 0   1   2   3   4   5   6   7
text stream:   {   {   {   a   }   }   }  

So the first character has index 0:1

The second 1:2 and so on.

This makes it easier to work out the indexes for composite nodes as well as leaf nodes. So, for example, the composite node holding the outer brackets is 0:7. The inner brackets are 2:5.

Customising Lexer

There are some more technical notes on page here and there are examples, for instance, Python-like whitespace block delineation and macroes here.

Diagram

Here is a diagrammatic representation of just part of what is going on here.

lexer

 


metadata block
see also:
Correspondence about this page

This site may have errors. Don't use for critical systems.

Copyright (c) 1998-2023 Martin John Baker - All rights reserved - privacy policy.