Axiom/FriCAS Interpreter Tokeniser

This is part of some experimental code that I am writing to implement the FriCAS interpreter in SPAD. For an overview of this experiment, see the page here. For information about how this is done in the current boot/lisp code, see the page here.

Here we describe a scanner or tokeniser for our interpreter. It takes the string holding the input line and converts it to a list of tokens.

How It Works

Each token generated by this code consists of a token type and a string with its actual value. For instance, if the token type is 'key' then the string will hold the particular keyword, such as "macro".

Token Type   Meaning
id           An identifier, such as the name of a variable.
key          A keyword.
integer      A numeric integer literal. A negative value is not held in this token; a '-' token precedes it instead.
rinteger
float        A numeric value that may also contain '.', 'e', 'E' and '-' characters, which makes it difficult to scan as a single terminal value.
string       Any characters wrapped in double quotes.
comment
negcomment
error
spaces
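To make the token layout concrete, here is a minimal sketch in Python rather than SPAD. The names `Token`, `TOKEN_TYPES` and `show` are hypothetical and are not taken from the actual code; only the token type names and the `type="value"` display format come from this page.

```python
from dataclasses import dataclass

# Token type names copied from the table above.
TOKEN_TYPES = ("id", "key", "integer", "rinteger", "float", "string",
               "comment", "negcomment", "error", "spaces")


@dataclass(frozen=True)
class Token:
    """Hypothetical token record: a type tag plus the text it holds."""
    kind: str   # one of TOKEN_TYPES
    text: str   # the actual value, for example "macro" or "3"

    def __post_init__(self):
        if self.kind not in TOKEN_TYPES:
            raise ValueError(f"unknown token type: {self.kind}")

    def __str__(self):
        # Same type="value" layout as the interpreter output on this page.
        return f'{self.kind}="{self.text}"'


def show(tokens):
    """Format a token list the way the session output displays it."""
    return "[" + ",".join(str(t) for t in tokens) + "]"


# A negative integer is two tokens: the sign, then the magnitude.
neg_three = [Token("key", "MINUS"), Token("integer", "3")]
print(show(neg_three))  # [key="MINUS",integer="3"]
```

Keeping the sign out of the integer token means the scanner does not have to decide whether '-' is a unary or a binary minus; that decision is left to a later stage.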

This tokeniser is driven by a state table. As we scan across the input line, the table determines the next state from the current state and the character just read.

                 Character Just Read
Current State    space     double quote   alphabetic                    numeric   other
init             space     string         sym                           integ     op
space            space     string         sym                           integ     op
string           string    init           string                        string    string
sym              space     string         sym                           sym (*)   op
integ            space     string         sym, or float if 'e' or 'E'   integ     op, or float if '.'
float            space     string         sym, or float if 'e' or 'E'   float     op
op               space     string         sym                           integ     op
comment          comment   comment        comment                       comment   comment

(*) Symbol names can contain numeric values.

The 'comment' state is entered when the text scanned in the 'op' state contains '--' or '++'.

Each time the state changes a new token is added to the list being generated.
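The state table can be sketched as a lookup keyed by state and character class. This is an illustration in Python, not the SPAD implementation: the names `char_class`, `TABLE`, `next_state` and `split_runs` are hypothetical. Two details are assumptions of the sketch rather than statements from this page: a run that moves from 'integ' to 'float' is retagged instead of split, and the closing double quote is kept in the string run.

```python
def char_class(ch):
    """Map a character to one of the five column headings of the state table."""
    if ch == " ":
        return "space"
    if ch == '"':
        return "quote"
    if ch.isalpha():
        return "alpha"
    if ch.isdigit():
        return "digit"
    return "other"


def row(space, quote, alpha, digit, other):
    """Build one table row, in the column order used on this page."""
    return {"space": space, "quote": quote, "alpha": alpha,
            "digit": digit, "other": other}


TABLE = {
    "init":    row("space", "string", "sym", "integ", "op"),
    "space":   row("space", "string", "sym", "integ", "op"),
    "string":  row("string", "init", "string", "string", "string"),
    "sym":     row("space", "string", "sym", "sym", "op"),
    "integ":   row("space", "string", "sym", "integ", "op"),
    "float":   row("space", "string", "sym", "float", "op"),
    "op":      row("space", "string", "sym", "integ", "op"),
    "comment": row(*(["comment"] * 5)),
}


def next_state(state, ch):
    """Return the next state, applying the exceptions noted in the table."""
    if state in ("integ", "float") and ch in ("e", "E"):
        return "float"
    if state == "integ" and ch == ".":
        return "float"
    return TABLE[state][char_class(ch)]


def split_runs(line):
    """Split a line into (state, text) runs, one run per state change."""
    runs = []        # each entry is [state, text]
    state = "init"
    for ch in line:
        new = next_state(state, ch)
        # Assumption: integ -> float and the closing quote extend the current run.
        extends = (state, new) in (("integ", "float"), ("string", "init"))
        if runs and (new == state or extends):
            runs[-1][1] += ch
            if new == "float":
                runs[-1][0] = "float"   # retag an integer run as a float
        else:
            runs.append([new, ch])      # state changed: start a new run
        state = new
        # An operator run containing '--' or '++' turns into a comment.
        if state == "op" and ("--" in runs[-1][1] or "++" in runs[-1][1]):
            state = runs[-1][0] = "comment"
    return [tuple(r) for r in runs]


print(split_runs("1.0 + a3"))
```

In this sketch the runs are labelled with state names ('sym', 'integ', 'op'), not with the final token types; mapping a run to 'id', 'key' or 'error' would be a separate step.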

When an error is detected, an error token is put in the token list. There is a function that scans the list for error tokens. If it finds one, the following stages of parsing need not be carried out and the error string can be displayed.
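The error check amounts to a linear scan of the token list before parsing starts. A minimal sketch in Python, with tokens modelled as (type, value) pairs; the names `first_error` and `parse_line` and the wording of the message are hypothetical, not taken from the actual code.

```python
def first_error(tokens):
    """Return the text of the first error token, or None if there is none."""
    for kind, text in tokens:
        if kind == "error":
            return text
    return None


def parse_line(tokens, parse):
    """Run the (caller-supplied) parser only when the scan found no errors."""
    err = first_error(tokens)
    if err is not None:
        return f"error: unexpected {err!r}"
    return parse(tokens)


bad = [("id", "b2"), ("error", "=-"), ("integer", "3")]
good = [("integer", "1"), ("key", "PLUS"), ("integer", "2")]
print(parse_line(bad, len))   # the parser is never called
print(parse_line(good, len))  # stand-in parser: just counts the tokens
```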

Testing It

We can try out the tokeniser in isolation by calling 'spadTokenise' from the existing interpreter. For information about downloading and compiling the code see this page.

(1) -> spadTokenise("1+2")

   (1)  [integer="1",key="PLUS",integer="2"]
                                                              Type: Tokeniser
(2) -> spadTokenise("1.0 + a3")

   (2)  [float="1.0",spaces=" ",key="PLUS",spaces=" ",id="a3"]
                                                              Type: Tokeniser
(3) -> spadTokenise("b2= -3")  

   (3)  [id="b2",key="EQUAL",spaces=" ",key="MINUS",integer="3"]
                                                              Type: Tokeniser

To Do

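There are still some things to be fixed. In the examples below, '=-' written without a space is reported as an error, and a float with a negative exponent is split into three tokens: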

(4) -> spadTokenise("b2=-3") 

   (4)  [id="b2",error="=-",integer="3"]
                                                              Type: Tokeniser
(5) -> spadTokenise("2e-6") 

   (5)  [float="2e",key="MINUS",integer="6"]
                                                              Type: Tokeniser
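The first of these outputs is consistent with the whole operator run being looked up as a single unit. The sketch below, in Python, shows that idea only; it is a guess at the mechanism, not a description of the actual code. The name `classify_op` is hypothetical, and the table holds just the three operator names that appear in the sessions on this page.

```python
# Operator names observed in the sessions above; a real table would be longer.
KEY_NAMES = {"+": "PLUS", "-": "MINUS", "=": "EQUAL"}


def classify_op(text):
    """Turn the text of an operator run into a key token or an error token."""
    name = KEY_NAMES.get(text)
    if name is None:
        return ("error", text)   # unknown operator text, such as "=-"
    return ("key", name)


print(classify_op("="))   # ('key', 'EQUAL')
print(classify_op("=-"))  # ('error', '=-')
```

Under this reading, "b2= -3" works because the space ends the '=' run before the '-' is scanned.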

Next step

The output of this tokeniser is passed on to the parser as described on the page here.




Copyright (c) 1998-2023 Martin John Baker - All rights reserved - privacy policy.