OSDev.org

The Place to Start for Operating System Developers
It is currently Fri Mar 29, 2024 7:04 am

All times are UTC - 6 hours




Post new topic Reply to topic  [ 49 posts ]  Go to page Previous  1, 2, 3, 4  Next
Author Message
 Post subject: Dont worry......
PostPosted: Sun May 20, 2007 7:50 pm 
Offline
Member
Member

Joined: Tue Jun 20, 2006 9:17 am
Posts: 566
Parsing is nt that hard....,you can make use of paser generators like
yacc or bison....,May be you can make you generic code more readable
by allowing higher level constructs(This requires a Parser)......this is what Ryndall Hyde has done with hla....But it'is written in hla itself and very diffcult to port....... , if you make something like that in C or pascal ... it will be far
easier to port .... and also compiler's can target your translator as a target
than the bare machine......This makes making portable compilers quite
easier...... I presented this as a paper ...but they did'nt like it very much............. 8) :P :D :idea: :idea:


Top
 Profile  
 
 Post subject:
PostPosted: Mon May 21, 2007 1:21 am 
Offline
Member
Member
User avatar

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

hckr83 wrote:
ok so to code for optimizations on some archs, you use 100s of registers, so on those few archs that do support, it will be fast...but what do you do when the target arch only has 16?
My idea is to use memory in order to store missing registers..


A better idea would be for the language to have no registers at all (where all variables are in memory), and for the compiler/optimizer to automatically decide to use CPU registers instead of the variables in memory where appropriate.

For e.g. you'd code something like:

Code:
    mov [variable],0xFOODBABE
    add [variable],3


And the compiler might generate:

Code:
    mov eax,0xFOODBABE
    add eax,3


hckr83 wrote:
lets say you wanted to program on 32bit for the generic assembly, but wanted to compile to use a 16bit arch. For example sake, we'll use 8086


The generic code:

Code:
;generic
mov dword r0,0xFFFF FFFE
add dword r0,1
sub dword r0,2
div dword r0,3
mul dword r0,4


Would produce something like:

Code:
;produced
mov ax,0xFFFF
mov bx,0xFFFF

add ax,1
adc bx,0

sub ax,2
sbb bx,0

push cx
push dx
push ax
mov ax,bx    ;ax = high word
mov dx,0     ;dx:ax = high word
mov cx,3
div cx           ;ax = high word / 3, dx = remainder
pop ax         ;ax = low word
add ax,dx
mov dx,0
adc dx,0      ;dx:ax = low word + remainder
div cx          ;ax = (low word + remainder) / 3
pop dx
pop cx

push dx
push cx
push si
push di
mov cx,4
mul cx        ;dx:ax = low word * 4
mov si,ax
mov di,dx  ;di:si = low word * 4
mov ax,bx
mul cx        ;dx:ax = high word * 4
mov bx,di
add bx,ax
mov ax,si   ;bx:ax = ((high word * 4) << 16) + (low word * 4)
pop di
pop si
pop cx
pop dx


Of course this is extremely unoptimized (and not tested)...

If the language had no registers at all, then you'd produce something like this:

Code:
;produced
mov word [variable],0xFFFF
mov word [variable+2],0xFFFF

add word [variable],1
adc word [variable+2],0

sub word [variable],2
sbb word [variable+2],0

mov ax,[variable+2]
mov dx,0
mov cx,3
div cx                         ;ax = high word / 3, dx = remainder
mov [variable+2],ax
mov ax,[variable]
add ax,dx
mov dx,0
adc dx,0
div cx                         ;ax = (low word + remainder) / 3
mov [variable],ax

mov ax,[variable]
mov cx,4
mul cx                       ;dx:ax = low dword * 4
mov [variable],ax
mov ax,[variable+2]
mov [variable+2],dx
mul cx
add [variable+2],ax


This is also extremely unoptimized, but it'd be easier for the compiler to see that SI, DI and BP are unused and decide to use SI:DI instead of "[variable]" and "[variable+2]".


Cheers,

Brendan

_________________
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.


Top
 Profile  
 
 Post subject:
PostPosted: Fri Jun 01, 2007 9:52 pm 
Offline
Member
Member

Joined: Thu Jul 07, 2005 11:00 pm
Posts: 1546
hmmm...that'd be smart...

though optimizers are hard to make..lol

really though, what if you knew you wanted something stored in a register for speed, and though your not using it right hten, you will use it later..something no optimizer can detect...

hmm...what about a "register" flag or something you could put for a certain address of memory where "if possible put this in a register"

_________________
My new NEW blag


Top
 Profile  
 
 Post subject:
PostPosted: Sat Jun 02, 2007 1:15 am 
Offline
Member
Member
User avatar

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

hckr83 wrote:
really though, what if you knew you wanted something stored in a register for speed, and though your not using it right hten, you will use it later..something no optimizer can detect...

Using registers is faster, and not using registers is slower. Putting something in a register now that you intend to use much later on means that the register can't be used to speed up all the code between "now" and "later on". In this case you could improve performance by pushing the register on the stack, using the register to speed things up, then popping it off the stack much later on.

Think of it like an excercise in caching, where CPU registers are used to cache the most frequently used variables. As soon as you're caching something that isn't the most frequently used, then you start to lose efficiency.

BTW you'd be surprised what an optimizer could detect. It'd recommend doing some research into common optimizations (e.g. see this wiki page) before designing your language, so you can decide which features of your language will make it easier or harder to optimize later.

One thing I've been considering is replacing complete functions with table lookups. For example, consider a function like this:
Code:
int myFunction(char a, char b) {
    int result;

    result = ((a * b) / (a + b)) * b;
    result &= (a * b) / MY_CONSTANT;
    return result;
}

This could be optimized to:
Code:
    movzx eax,byte [a]
    mov ah,[b]
    mov eax,[__myFunction_table + eax * 4]
    ret

Of course to do this type of thing you need to implement an interpretter to generate the table, and it'd be nice if you could tell the compiler the exact range of a variable (so that if "b" only ever contains a value from 0 to 3, the optimizer could generate a 1024 byte table instead of a full 65536 byte table).

hckr83 wrote:
hmm...what about a "register" flag or something you could put for a certain address of memory where "if possible put this in a register"

If your optimizer/s are any good, the "register" flag should be considered a suggestion to the optimizer (and mostly ignored), just like it is in C. It's more important to have it's opposite ("volatile") which tells the optimizer that the variable can't be cached in a register (which the optimizer must not ignore).

Optimizing is hard, and optimizing well is a lot harder. My suggestion is to forget about optimizing while you're implementing the compiler, but leave places where you could insert optimizers later (e.g. between "tokenising" and "compiling", and between "compiling" and "assembling").


Cheers,

Brendan

_________________
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.


Top
 Profile  
 
 Post subject:
PostPosted: Sat Jun 02, 2007 7:47 am 
Offline
Member
Member

Joined: Thu Jul 07, 2005 11:00 pm
Posts: 1546
Making the tokenizer is what I can't do...I have tried like 10 times to make something convert a plain text file in a language convert to tokens...it just doesn't work for me...:-(

hmm...
what about a registers only language?
I think it simplifies things a LOT if you only have to worry about registers or memory...kinda one or the other..

course, I don't think registers only could have many uses practically.. So probably memory only would be best..what about storing addresses in registers though? would that be best done by the optimizer?

another bit, what about having portable bits? like be able to choose between one amount of bits to make the code in, and then that can be compiled to any other amount of bits? course...this also brings up the problem of endianess..and also the way the computer stores memory(most significant byte first or least significant byte first)

I just don't see anyway to combat those problems without major speed penalties..

_________________
My new NEW blag


Top
 Profile  
 
 Post subject:
PostPosted: Sat Jun 02, 2007 8:36 am 
Offline
Member
Member
User avatar

Joined: Thu Jan 04, 2007 3:29 pm
Posts: 1466
Location: Noricum and Pannonia
hckr83 wrote:
Making the tokenizer is what I can't do...I have tried like 10 times to make something convert a plain text file in a language convert to tokens...it just doesn't work for me...:-(

Tokenizers aren't really that difficult. You could even use flex to do it for you.

If you write it yourself, you might want to look into finite state automata's as they do help.

_________________
C8H10N4O2 | #446691 | Trust the nodes.


Top
 Profile  
 
 Post subject:
PostPosted: Sat Jun 02, 2007 9:53 am 
Offline
Member
Member
User avatar

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

hckr83 wrote:
Making the tokenizer is what I can't do...I have tried like 10 times to make something convert a plain text file in a language convert to tokens...it just doesn't work for me...:-(


A quick crash course....

First you need some output routines which would send the tokenized data somewhere. For now just send the tokens to the screen and worry about the details later:

Code:
void sendSimpleToken(unsigned short token) {
    printf("%d\n", token);
}

void sendStringToken(unsigned short token, char *string) {
    printf("%d '%s'\n", token, string);
}

void sendNumberToken(unsigned short token, long int number) {
    printf("%d %d\n", token, number);
}


Next you want some framework for your tokenizer. I'm going to assume it's a line-oriented language (like assembly) and your tokenizer is sent lines of ASCII by the code that reads the source file/s from disk:

Code:
void tokenize(char *input) {
    while(*input != 0) {
        printf("Syntax error!\n");
    }
}


Now you probably want to ignore any white space between things. That's easy enough:

Code:
void tokenize(char *input) {
    while(*input != 0) {
        if( (*input = ' ')  || (*input = '\t') || (*input = '\r') || (*input = '\n') ) {
            input++;
        }
        printf("Syntax error!\n");
    }
}


Every language has numerical constants, so let's add them next:

Code:
enum {
    TOKEN_NUMBER,
}

void tokenize(char *input) {
    long int number;
    char *temp;

    while(*input != 0) {
        if( (*input = ' ')  || (*input = '\t') || (*input = '\r') || (*input = '\n') ) {
            input++;
        } else {
            number = strtol(input, &temp, 0);
            if(*temp != input) {
                sendNumberToken(TOKEN_NUMBER, number);
            }
        }
        printf("Syntax error!\n");
    }
}


Then there's identifiers (labels, keywords, etc):

Code:
enum {
    TOKEN_NUMBER,
    TOKEN_IDENTIFIER,
}

void tokenize(char *input) {
    long int number;
    char *temp;
    char identifier[256];
    int i;

    while(*input != 0) {
        if( (*input = ' ')  || (*input = '\t') || (*input = '\r') || (*input = '\n') ) {
            input++;
        } else {
            number = strtol(input, &temp, 0);
            if(temp != input) {
                sendNumberToken(TOKEN_NUMBER, number);
                input = temp;
            } else {
                i = 0;
                while( (input[i] > 'A') && (input[i] < 'Z') ) {
                    identifier[i] = input[i];
                    i++;
                }
                if(i != 0) {
                    identifier[i] = 0;
                    sendStringToken(TOKEN_IDENTIFIER, identifier);
                    input += i;
                }
            }
        }
        printf("Syntax error!\n");
    }
}


You'd probably also want to handle operators, punctuation, etc. I'll only do a few:

Code:
enum {
    TOKEN_NUMBER,
    TOKEN_IDENTIFIER,
    TOKEN_LEFTBRACKET,
    TOKEN_RIGHTBRACKET,
    TOKEN_MULTIPLY,
    TOKEN_ADD,
}

void tokenize(char *input) {
    long int number;
    char *temp;
    char identifier[256];
    int i;

    while(*input != 0) {
        if( (*input = ' ')  || (*input = '\t') || (*input = '\r') || (*input = '\n') ) {
            input++;
        } else if(*input == '*') {
            sendSimpleToken(TOKEN_MULTIPLY);
        } else if(*input == '+') {
            sendSimpleToken(TOKEN_ADD);
        } else if(*input == '(') {
            sendSimpleToken(TOKEN_LEFTBRACKET);
        } else if(*input == ')') {
            sendSimpleToken(TOKEN_RIGHTBRACKET);
        } else {
            number = strtol(input, &temp, 0);
            if(temp != input) {
                sendNumberToken(TOKEN_NUMBER, number);
                input = temp;
            } else {
                i = 0;
                while( (input[i] > 'A') && (input[i] < 'Z') ) {
                    identifier[i] = input[i];
                    i++;
                }
                if(i != 0) {
                    identifier[i] = 0;
                    sendStringToken(TOKEN_IDENTIFIER, identifier);
                    input += i;
                }
            }
        }
        printf("Syntax error!\n");
    }
}


Handling string constants would be a good idea. Also if your language has comments like assembly you could probably stop tokenizing when you get a comment character:

Code:
enum {
    TOKEN_NUMBER,
    TOKEN_IDENTIFIER,
    TOKEN_STRING,
    TOKEN_LEFTBRACKET,
    TOKEN_RIGHTBRACKET,
    TOKEN_MULTIPLY,
    TOKEN_ADD,
}

void tokenize(char *input) {
    long int number;
    char *temp;
    char identifier[256];
    int i;

    while(*input != 0) {
        if( (*input = ' ')  || (*input = '\t') || (*input = '\r') || (*input = '\n') ) {
            input++;
        } else if(*input == ';') {
            return;
        } else if(*input == '*') {
            sendSimpleToken(TOKEN_MULTIPLY);
        } else if(*input == '+') {
            sendSimpleToken(TOKEN_ADD);
        } else if(*input == '(') {
            sendSimpleToken(TOKEN_LEFTBRACKET);
        } else if(*input == ')') {
            sendSimpleToken(TOKEN_RIGHTBRACKET);
        } else if(*input == '"') {
            i = 1;
            while( (input[i] != '"') && (input[i] != 0) {
                i++;
            }
            if(input[i] == 0) {
                printf("Hey - there's no right quote character on that string!\n");
                exit(EXIT_FAILURE);
            }
            input[i] = 0;
            sendStringToken(TOKEN_STRING, input);
            input += i + 1;
        } else {
            number = strtol(input, &temp, 0);
            if(temp != input) {
                sendNumberToken(TOKEN_NUMBER, number);
                input = temp;
            } else {
                i = 0;
                while( (input[i] > 'A') && (input[i] < 'Z') ) {
                    identifier[i] = input[i];
                    i++;
                }
                if(i != 0) {
                    identifier[i] = 0;
                    sendStringToken(TOKEN_IDENTIFIER, identifier);
                    input += i;
                }
            }
        }
        printf("Syntax error!\n");
        exit(EXIT_FAILURE);
    }
}


Lastly, there's keywords. At the moment a keyword would end up being tokenized as an identifier. If your compiler handles several languages (e.g. C and inline assembly, or NASM and AT&T) it's probably better to leave them for the parser to sort out. Otherwise, you could start adding them:

Code:
enum {
    TOKEN_NUMBER,
    TOKEN_IDENTIFIER,
    TOKEN_STRING,
    TOKEN_LEFTBRACKET,
    TOKEN_RIGHTBRACKET,
    TOKEN_MULTIPLY,
    TOKEN_ADD,
    TOKEN_FOO_KEYWORD,
    TOKEN_BAR_KEYWORD,
}

void tokenize(char *input) {
    long int number;
    char *temp;
    char identifier[256];
    int i;

    while(*input != 0) {
        if( (*input = ' ')  || (*input = '\t') || (*input = '\r') || (*input = '\n') ) {
            input++;
        } else if(*input == ';') {
            return;
        } else if(*input == '*') {
            sendSimpleToken(TOKEN_MULTIPLY);
        } else if(*input == '+') {
            sendSimpleToken(TOKEN_ADD);
        } else if(*input == '(') {
            sendSimpleToken(TOKEN_LEFTBRACKET);
        } else if(*input == ')') {
            sendSimpleToken(TOKEN_RIGHTBRACKET);
        } else if(*input == '"') {
            i = 1;
            while( (input[i] != '"') && (input[i] != 0) {
                i++;
            }
            if(input[i] == 0) {
                printf("Hey - there's no right quote character on that string!\n");
                exit(EXIT_FAILURE);
            }
            input[i] = 0;
            sendStringToken(TOKEN_STRING, input);
            input += i + 1;
        } else {
            number = strtol(input, &temp, 0);
            if(temp != input) {
                sendNumberToken(TOKEN_NUMBER, number);
                input = temp;
            } else {
                i = 0;
                while( (input[i] > 'A') && (input[i] < 'Z') ) {
                    identifier[i] = input[i];
                    i++;
                }
                if(i != 0) {
                    identifier[i] = 0;
                    if(strcasecmp(identifier, "FOO") == 0) {
                        sendSimpleToken(TOKEN_FOO_KEYWORD);
                    } else if(strcasecmp(identifier, "BAR") == 0) {
                        sendSimpleToken(TOKEN_BAR_KEYWORD);
                    } else {
                        sendStringToken(TOKEN_IDENTIFIER, identifier);
                    }
                    input += i;
                }
            }
        }
        printf("Syntax error!\n");
        exit(EXIT_FAILURE);
    }
}


Most of the code above is fairly dodgy (untested, and intended as an example only), but it should be able to tokenize something like this:

Code:
; A comment
    FOO 1 * (3 + 4)
    BAR "Hello World"


Of course it'd also accept some gibberish like this:

Code:
   (FOO BAR) FOO + BAR "Hello" * 3


That's OK - the next stage of your compiler can work out what is gibberish and what isn't.

There's also a lot missing from the example code. For e.g. you'd want to be able to handle larger integers and floating point constants (I store all numerical constants as arbitrary precision floating point). It's also a good idea to use an array of keywords and write a function that checks each entry in the array to see if a string is a keyword or not, instead of using a "strcasecmp" for each possible keyword (it's easier to maintain that way, and means you can use hash tables or something to improve performance later on). You'd also want to add a generic error handling function, so that the compiler will tell you which line in which file it didn't like.

After writing your tokenizer you'd want to replace those output routines. For this I tend to add the output to a buffer, and send the buffer to another thread when it gets full or when there's nothing left to tokenize (but that's just how I do things - most people would just collect it all in memory).

I'd also be tempted to write the reverse - some code to convert the tokens back into ASCII. This just makes it easier to debug your compiler later.



Cheers,

Brendan

_________________
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.


Top
 Profile  
 
 Post subject:
PostPosted: Thu Jun 07, 2007 4:07 pm 
Offline
Member
Member
User avatar

Joined: Wed Feb 07, 2007 1:45 pm
Posts: 1401
Location: Eugene, OR, US
Brendan wrote:
Optimizing is hard, and optimizing well is a lot harder. My suggestion is to forget about optimizing while you're implementing the compiler, but leave places where you could insert optimizers later


I completely agree with the premise, but I say that the conclusion logically contradicts the premise. :wink:

If you have something like the old Whitesmith C compiler that will generate 100 lines of asm from a particular source code -- and GCC that will emit 500 lines of asm for the same code -- and if you ignore optimization completely and get 3000 lines of asm for the same code out of yours -- where an asm hacker can do it all in 40 lines of asm ...

you will NEVER significantly compress the code down later with an optimizer (because optimizing really is damned hard!), for any of the inefficient compilers. The one that will optimize best, later, is the one that was the best optimized to begin with.

And every piece of code that you compile with your inefficient compiler will be inefficient -- bogging down any system you ever run that code on.

Developer time is NOT always the most important factor. There are two pieces of software that are the fulcrum of the entire system -- the kernel, and the compiler. Spending the extra time to polish both of those foundation stones into slick shiny jewels will produce endless rewards, later.


Top
 Profile  
 
 Post subject:
PostPosted: Thu Jun 07, 2007 7:21 pm 
Offline
Member
Member
User avatar

Joined: Tue Nov 09, 2004 12:00 am
Posts: 843
Location: United States
bewing wrote:
Brendan wrote:
Optimizing is hard, and optimizing well is a lot harder. My suggestion is to forget about optimizing while you're implementing the compiler, but leave places where you could insert optimizers later


I completely agree with the premise, but I say that the conclusion logically contradicts the premise. :wink:

If you have something like the old Whitesmith C compiler that will generate 100 lines of asm from a particular source code -- and GCC that will emit 500 lines of asm for the same code -- and if you ignore optimization completely and get 3000 lines of asm for the same code out of yours -- where an asm hacker can do it all in 40 lines of asm ...

you will NEVER significantly compress the code down later with an optimizer (because optimizing really is damned hard!), for any of the inefficient compilers. The one that will optimize best, later, is the one that was the best optimized to begin with.

And every piece of code that you compile with your inefficient compiler will be inefficient -- bogging down any system you ever run that code on.

Developer time is NOT always the most important factor. There are two pieces of software that are the fulcrum of the entire system -- the kernel, and the compiler. Spending the extra time to polish both of those foundation stones into slick shiny jewels will produce endless rewards, later.


umm... ok. I agree with what you are saying, but not in the context of this thread and the thread's original post's author. You have to slow down and realize that the guy is actually trying to learn how to write a compiler not revolutionize the world with a multi-million dollar compiler, which I think is the point Brendan had in mind. Notice he said he was having trouble learning to parse syntax of some language I think that is grounds for him to think and start small. Since as you know every prototype gets rebuilt. :P

...but I say that the conclusion logically contradicts the premise... if we were writing a production compiler instead of for a educational factor.

_________________
Projects


Top
 Profile  
 
 Post subject:
PostPosted: Fri Jun 08, 2007 8:48 pm 
Offline
Member
Member
User avatar

Joined: Wed Feb 07, 2007 1:45 pm
Posts: 1401
Location: Eugene, OR, US
Kevin McGuire wrote:
umm... ok. I agree with what you are saying, but not in the context of this thread and the thread's original post's author. You have to slow down and realize that the guy is actually trying to learn how to write a compiler not revolutionize the world with a multi-million dollar compiler, which I think is the point Brendan had in mind. Notice he said he was having trouble learning to parse syntax of some language I think that is grounds for him to think and start small. Since as you know every prototype gets rebuilt. :P


You are correct, and my post was not as well phrased as it could have been. *My* OS is dead-serious, and sometimes I have trouble remembering that many/most people here are doing this for fun or as a hobby. My post was also in reply to several intermediate posts in this thread that I did not bother going back to quote.


Top
 Profile  
 
 Post subject:
PostPosted: Sat Jun 09, 2007 12:42 am 
Offline
Member
Member
User avatar

Joined: Sat Jan 15, 2005 12:00 am
Posts: 8561
Location: At his keyboard!
Hi,

bewing wrote:
Kevin McGuire wrote:
umm... ok. I agree with what you are saying, but not in the context of this thread and the thread's original post's author. You have to slow down and realize that the guy is actually trying to learn how to write a compiler not revolutionize the world with a multi-million dollar compiler, which I think is the point Brendan had in mind. Notice he said he was having trouble learning to parse syntax of some language I think that is grounds for him to think and start small. Since as you know every prototype gets rebuilt. :P


You are correct, and my post was not as well phrased as it could have been. *My* OS is dead-serious, and sometimes I have trouble remembering that many/most people here are doing this for fun or as a hobby. My post was also in reply to several intermediate posts in this thread that I did not bother going back to quote.


Even in a more general context I'm not sure I agree...

For most high level languages you'd preprocess and tokenize, convert to some sort of intermediate language, optimize (dead code elimination, loop optimizations, function inlining, etc), compile from intermediate language to intermediate assembly, optimize again (peephole, cache management, instruction scheduling), then assemble, and output a file.

Ignoring preprocessing and all the optimizing stages, this only leaves optimizing register allocation and (depending on how good your peephole optimizer is) some instruction selection in the "compile from intermediate language to intermediate assembly" stage.

Of course you'd want to add the preprocessing and optimizing stages later, but until the other stages are working right they just get in the way and make things harder IMHO....


Cheers,

Brendan

_________________
For all things; perfection is, and will always remain, impossible to achieve in practice. However; by striving for perfection we create things that are as perfect as practically possible. Let the pursuit of perfection be our guide.


Top
 Profile  
 
 Post subject: Breif guide on tokeninzing
PostPosted: Sat Jun 23, 2007 8:53 am 
Offline
Member
Member

Joined: Tue Jun 20, 2006 9:17 am
Posts: 566
We have system software lab......., I have done a fair bit of reaseach on tokenizing ....... , Here are the few methods that will help you in tokenzing

1) Simplest and the dirtiest :
make use of the function str_tok function available in the C - libraty
(This is not recomended ..check the man pages.....)

2) Make an adhoc lexer : Something which i initially did.......
like this (pseudo code ..not entirely correct)
char token[80];
char *exp_ptr=token;
while(!end_of_file)
{
ch=fgetc(inp_file);
if(ch==' ')ch=fgetc(inp_file); // skip white space.....
*exp_ptr=ch;
while(ch!=' ')
{
exp_ptr++;
*exp_ptr=ch;
}
*exp_ptr='\0';
return token;
}

3) Much better method is to hard code a DFA (deterministic finite automata) table into your lexer.....This is cleaner and better than the
earler methods.there are many ways of doing this
one way is to use a array for implementing the transistion function,
(see adrew s appel )or else you can again hardcode the transistion function also...

4) Easier way is to use lex ... which will automatically generate and implement a DFA (NFA) for you .......It s used like this
{%
/*put declarations here*/
%}
%%
/*
eg
a(a|b)* { printf("ab string");}

/* put rules here*/
%%

/* other functions*/



I recommend method three coz ... i am wriing a compiler to lean not to
outperform gcc or microsoft.... My compiler is not yet compleate...In fact
it's not even begun well.....:)Gee ..it works sporadically......It uses method 2... Yea i am pragmatic guy...


Top
 Profile  
 
 Post subject: Slight correction....
PostPosted: Mon Jun 25, 2007 1:18 am 
Offline
Member
Member

Joined: Tue Jun 20, 2006 9:17 am
Posts: 566
It's actually strtok not str_tok() ....also plse forgive my spell errors and
other grammar mistakes(I apologize for my ambigous grammar !!! :P ,
which makes parsing difficult :shock: ..... :D)....


Top
 Profile  
 
 Post subject: Hi Guys..................
PostPosted: Sun Jul 08, 2007 1:51 am 
Offline
Member
Member

Joined: Tue Jun 20, 2006 9:17 am
Posts: 566
<Stupidity Deleted >


Last edited by DeletedAccount on Sat Apr 11, 2009 7:13 am, edited 1 time in total.

Top
 Profile  
 
 Post subject:
PostPosted: Sun Jul 08, 2007 4:45 pm 
Offline
Member
Member

Joined: Thu Jul 07, 2005 11:00 pm
Posts: 1546
ok?
what do you mean "Placed in Microsoft" ?

_________________
My new NEW blag


Top
 Profile  
 
Display posts from previous:  Sort by  
Post new topic Reply to topic  [ 49 posts ]  Go to page Previous  1, 2, 3, 4  Next

All times are UTC - 6 hours


Who is online

Users browsing this forum: Google [Bot] and 35 guests


You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot post attachments in this forum

Search for:
Jump to:  
Powered by phpBB © 2000, 2002, 2005, 2007 phpBB Group