Friday, April 27, 2012

MCQ on Computer Organization




Q.1   In Reverse Polish notation, expression A*B+C*D is written as
      (A) AB*CD*+                                    (B) A*BCD*+
      (C) AB*CD+*                                    (D) A*B*CD+
      Ans: A

Q.2   SIMD represents an organization that ______________.
      (A) refers to a computer system capable of processing  
          several programs at the same time.
      (B) represents organization of single computer containing
          a control unit, processor unit and a memory unit.
      (C) includes many processing units under the supervision
          of a common control unit
      (D) none of the above.
      Ans: C
Note:
Single instruction, multiple data (SIMD), is a class of parallel computers in Flynn's taxonomy. It describes computers with multiple processing elements that perform the same operation on multiple data simultaneously. Thus, such machines exploit data level parallelism.
Ref. :  wiki

Q.3   Floating point representation is used to store
      (A) Boolean values      (B) whole numbers
      (C) real integers       (D) integers
      Ans: C

Q.4   Suppose that a bus has 16 data lines and requires 4 cycles of 250 nsecs each to transfer data. The bandwidth of this bus would be 2 Megabytes/sec. If the cycle time of the bus was reduced to 125 nsecs and the number of cycles required for transfer stayed the same what would the bandwidth of the bus?
      (A) 1 Megabyte/sec        (B) 4 Megabytes/sec
      (C) 8 Megabytes/sec       (D) 2 Megabytes/sec
      Ans: D

Q.5   Assembly language
      (A) uses alphabetic codes in place of binary numbers used
          in machine language
      (B) is the easiest language to write programs
      (C) need not be translated into machine language
      (D) None of these
      Ans: A

Q.6   In computers, subtraction is generally carried out by
      (A) 9's complement    (B) 10's complement
      (C) 1's complement    (D) 2's complement
      Ans: D

Q.7   The amount of time required to read a block of data from a disk into memory is composed of seek time, rotational latency, and transfer time. Rotational latency refers to
  (A) the time its takes for the platter to make a full rotation
  (B) the time it takes for the read-write head to move into position over the appropriate track
  (C) the time it takes for the platter to rotate the correct  sector under the head
  (D) none of the above
      Ans: A

Q.8   What characteristic of RAM memory makes it not suitable  
      for permanent storage?
      (A) too slow       (B) unreliable
      (C) it is volatile (D) too bulky
      Ans: C

Q.9   Computers use addressing mode techniques for _____________________.
      (A) giving programming versatility to the user by providing facilities as pointers to memory counters for loop control
      (B) to reduce no. of bits in the field of instruction
      (C) specifying rules for modifying or interpreting address field of the instruction
      (D) All the above
      Ans: D

Q.10 The circuit used to store one bit of data is known as
      (A) Register       (B) Encoder
      (C) Decoder        (D) Flip Flop
      Ans: D

Q. 11 (2FAOC) of base 16 is equivalent to
      (A) (195 084)10        (B) (001011111010 0000 1100)2
      (C) Both (A) and (B)   (D) None of these
      Ans: B

Q.12 The average time required to reach a storage location in memory and obtain its contents is called the
      (A) seek time          (B) turnaround time
      (C) access time        (D) transfer time
      Ans: C

Q.13 Which of the following is not a weighted code?
      (A) Decimal Number system     (B) Excess 3-cod
      (C) Binary number System      (D) None of these
      Ans: B

Q.14 The idea of cache memory is based
      (A) on the property of locality of reference
      (B) on the heuristic 90-10 rule
      (C) on the fact that references generally tend to cluster
      (D) all of the above
      Ans: A

Q.15 _________ register keeps track of the instructions stored in program stored in
      memory.
      (A) AR (Address Register)    (B) XR (Index Register)
      (C) PC (Program Counter)     (D) AC (Accumulator)
      Ans: C

Q.16 The addressing mode used in an instruction of the form ADD X Y, is
      (A) Absolute            (B) indirect
      (C) index               (D) none of these
      Ans: C

Q.17 If memory access takes 20 ns with cache and 110 ns with out it, then the ratio ( cache uses a 10 ns memory) is
      (A) 93%                                    (B) 90%
      (C) 88%                                    (D) 87%
      Ans: B

Q.18 In a memory-mapped I/O system, which of the following will not be there?
      (A) LDA                                    (B) IN
      (C) ADD                                    (D) OUT
      Ans: A

Q.19 In a vectored interrupt.
      (A) the branch address is assigned to a fixed location in   memory.
      (B) the interrupting source supplies the branch information to the processor through an interrupt vector.
      (C) the branch address is obtained from a register in the  processor
      (D) none of the above
      Ans: B

Q.20 Von Neumann architecture is
      (A) SISD                                    (B) SIMD
      (C) MIMD                                    (D) MISD
      Ans: A

Q. 21 The circuit used to store one bit of data is known as
      (A) Encoder                                 (B) OR gate
      (C) Flip Flop                               (D) Decoder
      Ans: C

Q.22 Cache memory acts between
      (A) CPU and RAM                             (B) RAM and ROM
      (C) CPU and Hard Disk                       (D) None of these
      Ans: A

Q.23 Write Through technique is used in which memory for updating the
     data
      (A) Virtual memory                          (B) Main memory
      (C) Auxiliary memory                        (D) Cache memory
      Ans: D

Q.24 Generally Dynamic RAM is used as main memory in a computer
      system as it
      (A) Consumes less power      (B) has higher speed
      (C) has lower cell density   (D) needs refreshing circuitary
      Ans: B

Q.25 In signed-magnitude binary division, if the dividend is (11100)2 and divisor is
      (10011)2 then the result is
      (A) (00100)2                             (B) (10100)2
      (C) (11001)2                             (D) (01100)2
      Ans: B

Q.26 Virtual memory consists of
      (A) Static RAM                           (B) Dynamic RAM
      (C) Magnetic memory                      (D) None of these
      Ans: A

Q.27 In a program using subroutine call instruction, it is necessary
      (A) initialise program counter
      (B) Clear the accumulator
      (C) Reset the microprocessor
      (D) Clear the instruction register
      Ans: D

Q.28 A Stack-organised Computer uses instruction of
      (A) Indirect addressing                  (B) Two-addressing
      (C) Zero addressing                      (D) Index addressing
      Ans: C

Q.29 If the main memory is of 8K bytes and the cache memory is of 2K words. It uses associative mapping. Then each word of cache memory shall be
      (A) 11 bits                              (B) 21 bits
      (C) 16 bits                              (D) 20 bits
      Ans: C

Q.30 A-Flip Flop can be converted into T-Flip Flop by using additional logic circuit
      (A) D = T * Qn                          (B) D = T
      (C) D = T . Qn                           (D) D = T•Qn
      Ans: D

Q.31 Logic X-OR operation of (4ACO)H & (B53F)H results
      (A) AACB                                 (B) 0000
      (C) FFFF                                 (D) ABCD
      Ans: C

Q.32 When CPU is executing a Program that is part of the Operating System, it is said to be
      in
      (A) Interrupt mode                           (B) System mode
      (C) Half mode                                (D) Simplex mode
      Ans: B

Q.33 An n-bit microprocessor has
      (A) n-bit program counter     (B) n-bit address register
      (C) n-bit ALU             (D) n-bit instruction register
      Ans: D

Q.34 Cache memory works on the principle of
      (A) Locality of data        (B) Locality of memory
      (C) Locality of reference   (D) Locality of reference & memory
      Ans: C

Q.35 The main memory in a Personal Computer (PC) is made of
      (A) cache memory.           (B) static RAM
      (C) Dynamic Ram             (D) both (A) and (B).
      Ans: D

Q.36 In computers, subtraction is carried out generally by
      (A) 1's complement method
      (B) 2's complement method
      (C) signed magnitude method
      (D) BCD subtraction method
      Ans: B

Q.37 PSW is saved in stack when there is a
      (A) interrupt recognised
      (B) execution of RST instruction
      (C) Execution of CALL instruction
      (D) All of these
      Ans: A

Q.38 The multiplicand register & multiplier register of a hardware circuit implementing
      booth's algorithm have (11101) & (1100). The result shall be
      (A) (812)10                                  (B) (-12)10
      (C) (12)10                                   (D) (-812)10
      Ans: A

Q.39 The circuit converting binary data in to decimal is
      (A) Encoder         (B) Multiplexer
      (C) Decoder         (D) Code converter
      Ans: D

Q.40 A three input NOR gate gives logic high output only when
      (A) one input is high     (B) one input is low
      (C) two input are low     (D) all input are high
      Ans: D

Q.41 n bits in operation code imply that there are ___________ possible distinct operators
        (A) 2n                                       (B) 2n
        (C) n/2                                      (D) n2
        Ans: B

Q.42    _________ register keeps tracks of the instructions stored in program stored in memory.
        (A) AR (Address Register)      (B) XR (Index Register)
        (C) PC (Program Counter)       (D) AC (Accumulator)
        Ans: C

Q.43    Memory unit accessed by content is called
        (A) Read only memory     (B) Programmable Memory
        (C) Virtual Memory       (D) Associative Memory
        Ans: D

Q.44    "Aging registers" are
        (A) Counters which indicate how long ago their associated pages have been referenced.
        (B) Registers which keep track of when the program was last accessed.
        (C) Counters to keep track of last accessed instruction.
        (D) Counters to keep track of the latest data structures referred.
        Ans: A

Q.45    The instruction "ORG O" is a
        (A) Machine Instruction.        (B) Pseudo instruction.
        (C) High level instruction.     (D) Memory instruction.
        Ans: B

Q.46    Translation from symbolic program into Binary is done in
        (A) Two passes.                              (B) Directly
        (C) Three passes.                            (D) Four passes.
      Ans: A

Q.47  A floating point number that has a O in the MSB of mantissa is said to have
      (A) Overflow                                  (B) Underflow
      (C) Important number                          (D) Undefined
      Ans: B

Q.48  The BSA instruction is
      (A) Branch and store accumulator
      (B) Branch and save return address
      (C) Branch and shift address
      (D) Branch and show accumulator
      Ans: B

Q.49  State whether True or False.
      (i)     Arithmetic operations with fixed point numbers take longer time for execution as compared to with floating point numbers.
      Ans: True.
      (ii)    An arithmetic shift left multiplies a signed binary number by 2.
      Ans: False.

Q.50  Logic gates with a set of input and outputs is arrangement of
      (A) Combinational circuit                     (B) Logic circuit
      (C) Design circuits                           (D) Register
      Ans: A

Q.51  MIMD stands for
      (A) Multiple instruction multiple data
      (B) Multiple instruction memory data
      (C) Memory instruction multiple data
      (D) Multiple information memory data
      Ans: A

Q.52  A k-bit field can specify any one of
      (A) 3k registers                              (B) 2k registers
      (C) K2 registers                              (D) K3 registers
      Ans: B

Q.53  The time interval between adjacent bits is called the
      (A) Word-time                                 (B) Bit-time
      (C) Turn around time                          (D) Slice time
        Ans: B

Q.54    A group of bits that tell the computer to perform a specific operation is known as
        (A) Instruction code      (B) Micro-operation
        (C) Accumulator           (D) Register
        Ans: A

Q.55    The load instruction is mostly used to designate a transfer from memory to a
        processor register known as
        (A) Accumulator        (B) Instruction Register
        (C) Program counter    (D) Memory address Register
        Ans: A

Q.56    The communication between the components in a microcomputer takes place via the
        address and
        (A) I/O bus              (B) Data bus
        (C) Address bus          (D) Control lines
        Ans: B

Q.57    An instruction pipeline can be implemented by means of
        (A) LIFO buffer        (B) FIFO buffer
        (C) Stack              (D) None of the above
        Ans: B

Q.58    Data input command is just the opposite of a
        (A) Test command          (B) Control command
        (C) Data output           (D) Data channel
        Ans: C

Q.59   A microprogram sequencer
      (A) generates the address of next micro instruction to be executed.
      (B) generates the control signals to execute a microinstruction.
      (C) sequentially averages all microinstructions in the control memory.
      (D) enables the efficient handling of a micro program subroutine.
        Ans: A

Q.60    A binary digit is called a
        (A) Bit                                      (B) Byte
        (C) Number                                   (D) Character
        Ans: A

Q.61   A flip-flop is a binary cell capable of storing information of
       (A) One bit                                   (B) Byte
       (C) Zero bit                                  (D) Eight bit
       Ans: A

Q.62   The operation executed on data stored in registers is called
       (A) Macro-operation        (B) Micro-operation
       (C) Bit-operation          (D) Byte-operation
       Ans: B

Q.63   MRI indicates
       (A) Memory Reference Information.
       (B) Memory Reference Instruction.
       (C) Memory Registers Instruction.
       (D) Memory Register information
       Ans: B

Q.64   Self-contained sequence of instructions that performs a given computational task is called
       (A) Function                                  (B) Procedure
       (C) Subroutine                                (D) Routine
       Ans: A

Q.65   Microinstructions are stored in control memory groups, with each group specifying a
       (A) Routine                                   (B) Subroutine
       (C) Vector                                    (D) Address
       Ans: A

Q.66 An interface that provides a method for transferring binary information between
       internal storage and external devices is called
       (A) I/O interface          (B) Input interface
       (C) Output interface       (D) I/O bus
      Ans: A

Q.67 Status bit is also called
       (A) Binary bit                                (B) Flag bit
       (C) Signed bit                                (D) Unsigned bit
      Ans: B

Q.68 An address in main memory is called
        (A) Physical address      (B) Logical address
        (C) Memory address        (D) Word address
      Ans: A

Q.69    If the value V(x) of the target operand is contained in the address field itself, the
        addressing mode is
        (A) immediate.                                (B) direct.
        (C) indirect.                                 (D) implied.
      Ans: B

Q.70(-27)10 can be represented in a signed magnitude format and in a 1's complement format as
      (A) 111011 & 100100      (B) 100100 & 111011
      (C) 011011 & 100100      (D) 100100 & 011011
      Ans: A

Q.71 The instructions which copy information from one location to another either in the processor's internal register set or in the external main memory are called
 (A) Data transfer instructions.(B) Program control instructions.
 (C) Input-output instructions.   (D) Logical instructions.
      Ans: A

Q.72 A device/circuit that goes through a predefined sequence of states upon the application
      of input pulses is called
      (A) register                                    (B) flip-flop
      (C) transistor.                                 (D) counter.
      Ans: D

Q.73 The performance of cache memory is frequently measured in terms of a quantity called
      (A) Miss ratio.                                 (B) Hit ratio.
      (C) Latency ratio.                              (D) Read ratio.
      Ans: C

Q.74 The information available in a state table may be represented graphically in a
      (A) simple diagram.    (B) state diagram.
      (C) complex diagram.   (D) data flow diagram.
      Ans: B

Q.75 Content of the program counter is added to the address part of the instruction in order to obtain the effective address is called.
      (A) relative address mode.     (B) index addressing mode.
      (C) register mode.             (D) implied mode.
      Ans: A

Q.76 An interface that provides I/O transfer of data directly to and form the memory unit and peripheral is termed as
      (A) DDA.       (B) Serial interface.
      (C) BR.        (D) DMA.
      Ans: D

Q.77 The 2s compliment form (Use 6 bit word) of the number 1010 is
      (A) 111100.                                  (B) 110110.
      (C) 110111.                                  (D) 1011.
      Ans: B

Q.78 A register capable of shifting its binary information either to the right or the left is called a
      (A) parallel register.   (B) serial register.
      (C) shift register.      (D) storage register.
      Ans: C

Q.79 What is the content of Stack Pointer (SP)?
      (A) Address of the current instruction
      (B) Address of the next instruction
      (C) Address of the top element of the stack
      (D) Size of the stack.
      Ans: C

Q.80 Which of the following interrupt is non maskable
      (A) INTR.                                    (B) RST 7.5.
      (C) RST 6.5.                                 (D) TRAP.
      Ans: D

Q.81 Which of the following is a main memory
      (A) Secondary memory.   (B) Auxiliary memory.
      (C) Cache memory.       (D) Virtual memory.
      Ans: C

Q.82 Which of the following are not a machine instructions
      (A) MOV.                                     (B) ORG.
      (C) END.                                     (D) (B) & (C).
      Ans: D

Q.83 In Assembly language programming, minimum number of operands required for an instruction is/are
       (A) Zero.            (B) One.
       (C) Two.             (D) Both (B) & (C).
       Ans: A

Q.84 The maximum addressing capacity of a micro processor which uses 16 bit database is
       32 bit address base is
       (A) 64 K.                                   (B) 4 GB.
       (C) both (A) & (B).                         (D) None of these.
       Ans: B

Q.85 The memory unit that communicates directly with the CPU is called the
       (A) main memory    (B) Secondary memory
       (C) shared memory  (D) auxiliary memory.
       Ans: A

Q.86 The average time required to reach a storage location in memory and obtain its contents
       is called
       (A) Latency time.                           (B) Access time.
       (C) Turnaround time.                        (D) Response time.
       Ans: B

State True or False
Q.87   A byte is a group of 16 bits.
       Ans: False

Q.88 A nibble is a group of 16 bits.
       Ans: False

Q.89   When a word is to be written in an associative memory, address has got to be given.
       Ans: False

Q.90    When two equal numbers are subtracted, the result would be ______and not_________.
        Ans: +ZERO, -ZERO.

Q.91    A ___________development system and an ______are essential tools for writing large assembly language programs.
        Ans: Microprocessor, assembler

Q.92    In an operation performed by the ALU, carry bit is set to 1 if the end carry C8 is ________. It is cleared to 0 (zero) if the carry is ______ _______.
        Ans: One, zero

Choose the correct alternative
Q.93 A successive A/D converter is
    (A) a high-speed converter.  (B) a low speed converter.
    (C) a medium speed converter.  (D) none of these.
        Ans: C

Q.94    When necessary, the results are transferred from the CPU to main memory by
         (A) I/O devices.                          (B) CPU.
         (C) shift registers.                      (D) none of these.
        Ans: B

Q.95 The gray code equivalent of (1011)2 is
        (A) 1101.                                  (B) 1010.
        (C) 1110.                                  (D) 1111.
        Ans: C

Q.96 A combinational logic circuit which sends data coming from a single source to two or more separate destinations is
       (A) Decoder.      (B) Encoder.
      (C) Multiplexer.   (D) Demultiplexer.
        Ans: D

Q.97 In which addressing mode the operand is given explicitly in the instruction
      (A) Absolute.                                (B) Immediate.
      (C) Indirect.                                (D) Direct.
        Ans: B

Q.98 A stack organized computer has
   (A) Three-address Instruction.   (B) Two-address Instruction.
   (C) One-address Instruction.     (D) Zero-address Instruction.
       Ans: D

Q.99 A Program Counter contains a number 825 and address part of the instruction contains the number 24. The effective address in the relative address mode, when an instruction is read from the memory is
      (A) 849.                                 (B) 850.
      (C) 801.                                 (D) 802.
       Ans: B

Q.100 A system program that translates and executes an instruction simultaneously is
      (A) Compiler.                            (B) Interpreter.
      (C) Assembler.                           (D) Operating system.
       Ans: C

Q.101 The cache memory of 1K words uses direct mapping with a block size of 4 words. How many blocks can the cache accommodate.
      (A) 256 words.                           (B) 512 words.
      (C) 1024 words.                          (D) 128 words.
       Ans: A

Q.102 A page fault
 (A) Occurs when there is an error in a specific page.
 (B) Occurs when a program accesses a page of main memory.
 (C) Occurs when a program accesses a page not currently in main  memory.
 (D) Occurs when a program accesses a page belonging to another program.
       Ans: C

Wednesday, April 25, 2012

DFA(Deterministic Finite Automaton) of Binary Number Divisible by 2


Q: Draw a DFA of binary number divisible by 2


Solution:

A binary number is divisible by 2 if its last digit is "0". So the DFA of binary number divisible by 2 is as:



DFA(Deterministic Finite Automaton) of Binary Number Divisible by 3




Q: Draw a DFA of binary number divisible by 3 over alphabet {0,1}

Solution: First we try to look how a binary number changes when 0 or 1 is inserted at right side. (Mod_0, Mod_1, Mod_2 are remainder of a number when divisible by 3)

Inserted Digit             Mod_0           Mod_1             Mod_2
---------------------------------------------------------------------------------------------
0                                  Mod_0          Mod_2            Mod_1
--------------------------------------------------------------------------------------------
1                                  Mod_1         Mod_0             Mod_2
-------------------------------------------------------------------------------------------


So the DFA is as:











Mod_0 is the final state.

Tuesday, April 24, 2012

DFA(Deterministic Finite Automaton) of Binary Number Divisible By4

Q: Draw a DFA of the strings over the set {0,1}, so that the strings are divisible by 4.

Solution:
A binary number is divisible by 4 if its last two digits are "0". As example : 100, 1000,10100, 11100,.....
So the DFA of binary number divisible by 4 is as:








Where Q0 starting state and Qf is final state.

Saturday, April 21, 2012

Parsing in Brief


Part I. Intoduction to Parsing
A. Introduction
This is the first of a series of articles on language translation. This is commonly called “parsing”, and I will no doubt lapse from time to time into this usage, but it is better to reserve the word for one phase of translation.
The techniques I will discuss are used in compilers and interpreters, of course, but also in many more-common circumstances. For example, when you add a formula to a speadsheet, the application must translate the formula into a form it can use. Many database programs let you specify formats for input fields -- these must translate your format, and then use the result to translate the input. Consider the expression search in MPW. I guarantee you there’s a translator operating there. And, of course, my own program Idealiner uses a translator when you specify topic numbers.
All the code examples in this series of articles will be MPW tools, to minimize interface considerations. I doubt if you’ll see a single Macintosh trap before the tenth article. I will focus instead on the internals of language translation.
Language translation can be divided into three stages: first, the LEXICAL ANALYZER (also called the “scanner”) divides the input text into individual symbols, pieces like identifiers, operators and constants; then the PARSER interprets the sequence of symbols according to the language’s grammar; and lastly the CODE GENERATOR responds in an appropriate fashion to the statements the parser has processed. These three phases operate simultaneously, the parser calling the lexical analyzer for input, and then calling the code generator for output.
A(1). Parsing
The first three parts in the series will focus on parsing.
The first part of the article (this very one) will introduce parsing and the parser-generator YACC (Yet Another Compiler Compiler). YACC converts a set of GRAMMAR RULES into C source for a parser that will accept that grammar. You still have to write the lexical analyzer and the code generator (although YACC provides a framework for code generation) but YACC takes care of the most tedious part of the job.
The second part will go beneath the surface and explore the inner workings of YACC. This will be somewhat technical, and may seem unnecessary, but it will provide an essential foundation for later topics. The article will cover deriving parse tables from YACC output, parsing expressions by hand (using the parse tables), and YACC’s debug output.
The next article will discuss ambiguities, association and error detection. An understanding of the first two topics are essential to writing correct and efficient grammars. The third covers detecting and reporting errors by type and location.
A(2). Lexical Analysis
With the third article, I will return to the first stage of language translation, lexical analysis. The article will begin with a discussion of state machines, then present a simple but effective table-driven analyzer.
The fourth article will be an excursion into a seemingly unrelated field: hash tables and binary trees. The idea is to develop some tools to increase the power of the lexical analyzer in the following article.
The fifth article will extend the analyzer by adding a symbol table. The routines developed in the fifth article will give us a way to save symbols; the example program will be an improved version of the MPW Canon tool.
A(3). And a Useful Example
Now, I’m pretty sure of the preceding, because I’ve already written the articles! What follows is a forecast of what I’ll do next.
The plan is to build an MPW tool that will preprocess either Pascal or C source files and convert inline assembly code into Pascal inline routines or C direct functions, as appropriate. That is, you will be able to write assembly language instructions, and the preprocessor will convert your assembly into machine language. Your assembler routines will be declared with high-level headers, a la Lightspeed C, and you will be able to refer to routine arguments and local variables by name, rather than indexing off the stack, a real convenience. I’m going to write this tool because, damn it, I want to use it!
This will be a major project: I’ll need to parse Pascal and C well enough to find routines, and I’ll need to parse assembler completely. I’ll then need to write a more-or-less complete assembler. It could take six or eight columns.
B. Grammar Descriptions
Here’s a reference for this topic: Compilers: Principles, Techniques and Tools, by Aho, Sethi and Ullman (Addison-Wesley 1986). You may have heard of some of these guys. I will refer to it as “ASU”.
A computer language is composed of a set of symbols (the “words” of the language), and a set of grammar rules that determine how these words are formed into statements (the “sentences” of a computer language). As I said earlier, the lexical analyzer is in charge of extracting the “words” from the input; it is the job of the parser to make meaningful “sentences” from these “words”.
A bit of terminology: classes of symbols (such as “integer” or “identifier”) are called TOKENS, and specific instances of tokens (such as “252” or “sysbeep”) are called LEXEMES. The lexical analyzer will typically determine both the type and value of a symbol that it reads; the former is the token classification and the latter is the lexical value. The parser cares about tokens; the code generator cares about values.
I guarantee that I will use the word “token” to mean both token and lexeme. The meaning should be clear from the context.
B(1). Grammar Rules
A grammar rule is called a PRODUCTION. It is a substitution rule; the single symbol of the left-hand side (often, but not by me, abbreviated LHS) produces the expansion on the right-hand side (RHS). Here’s an example:
stmt -> ID ‘=’ expr ‘;’
A ‘stmt’ can be expanded to (that’s what the ‘->’ means) a series of symbols consisting of an ‘ID’, an ‘=’ sign, and ‘expr’ and a semicolon. Symbols within quotation marks are explicit, and must appear as written; other symbols are token classes.
Recursive definition is permitted. Here’s an example:
expr -> expr ‘+’ NUM
Applying this rule to the first, we discover that:
stmt -> ID ‘=’ expr ‘+’ NUM ‘;’
stmt -> ID ‘=’ expr ‘+’ NUM ‘+’ NUM ‘;’
And so on. It is possible to determine a complete language such as Pascal or C with relatively few such productions, even though there are infinitely-many legal statements in the language.
Now, everyone can make up his own conventions, of course, but I will distinguish two kinds of non-explicit symbols by showing one in all-caps and one in all-lower-case. All-caps symbols are defined by the lexical analyzer, not by the parser. Thus, they will not appear on the left-hand side of any production. These symbols are call TERMINALS, because they terminate expansion. When you expand to a terminal, you can expand no further. The lower-case symbols are, surprise!, NON-TERMINALS. They are defined by the parser rather than the lexical analyzer, appear somewhere as the left-hand side of one or more productions, and do not terminate expansion. These distinctions are important!
As the above examples suggest, several productions can share the same left-hand side:
expr -> expr ‘+’ NUM
expr -> NUM
This pair of productions expands to arbitrary sums. Just start with the first production and substitute the first production into it to add another term, or the second to terminate the expansion.
B(2). An Example Grammar
This will be fun. If you don’t want to mess with abstract symbols, just skip this whole section; the result is all you need. My development of the grammar won’t be very theoretical, though.
A parser reads the input symbols from left to right, until it has read the right-hand side of a production. Then it REDUCES the input, running the production backwards and replacing the right-hand side with the left-hand side. This is the point at which code is generated, and expressions evaluated: when a reduction occurs.
So to see if a grammar does what we want, we can start with a test statement that should be accepted by the grammar, and see what happens when we play parser.
The grammar we are shooting for will accept simple algebraic expressions, involving addition, subtraction, multiplication and division of numbers.
B(2)(a). First Try
Earlier, we saw productions that can expand to arbitrary sums:
expr -> expr ‘+’ NUM
expr -> NUM
We can add a few more to get the other three usual operators:
/* 1 */

(1) expr -> expr ‘+’ NUM
(2) expr -> expr ‘-’ NUM
(3) expr -> expr ‘*’ NUM
(4) expr -> expr ‘/’ NUM
(5) expr -> NUM
Let’s try a simple test:
NUM ‘+’ NUM ‘*’ NUM
-> expr ‘+’ NUM ‘*’ NUM
(rule 5; remember that we read from the left)
-> expr ‘*’ NUM (rule 1)
-> expr (rule 3)
The addition was the first thing to go; therefore, it was performed first. This is contrary to our expectations, I hope (my past as a math teacher shows itself!). The grammar won’t work. We need to make sure that multiplication and division are performed before addition and subtraction.
B(2)(b). Second Try
So let’s introduce another non-terminal. We want to cluster products together, to ensure that they are evaluated first, so let’s make them a separate non-terminal:
/* 2 */

(1) expr -> expr ‘+’ term
(2) expr -> expr ‘-’ term
(3) expr -> term
(4) term -> term ‘*’ NUM
(5) term -> term ‘/’ NUM
(6) term -> NUM
Now try the test string again:
NUM ‘+’ NUM ‘*’ NUM
-> term ‘+’ NUM ‘*’ NUM
(rule 6)
-> expr ‘+’ NUM ‘*’ NUM
(rule 3)
-> expr ‘+’ term ‘*’ NUM
(rule 6)
Oops, it looks like we have a choice here: rule 1 or rule 4. We really don’t, though; reducing by rule 1 would leave us with “expr ‘*’ NUM”, and we don’t have any rule with “expr ‘*’” in it.
-> expr ‘+’ term
(rule 4)
-> expr
(rule 1)
So the multiplication happened first, just like we wanted.
B(2)(c). Third Try
Suppose, however, that we wanted the addition to occur first. Then we need to add some parentheses:
/* 3 */

(1) expr -> expr ‘+’ term
(2) expr -> expr ‘-’ term
(3) expr -> term
(4) term -> term ‘*’ NUM
(5) term -> term ‘/’ NUM
(6) term -> ‘(‘ expr ‘)’
(7) term -> NUM
Now we treat an addition or subtraction that occurs within parentheses as if the sum or difference was a ‘term’ (rule 6). And that should do it.
C. YACC: Yet Another Compiler Compiler
YACC is a UNIX tool that builds parsers from grammars. We can take the grammar just developed, supplement it with a minimal bit of C code, and YACC will write the complete parser for us. Or, if the parser is only a small part of a big program, YACC will write that small part, and we can then link it to the main program. In either case, YACC saves us from the unbearable tedium of computing parse tables by hand.
A YACC input file is divided into three parts: the declaration part, the grammar part, and the program part. (Remind you of anything? My past as a COBOL programmer coming out!) The three sections are separated by lines beginning:
%%

YACC writes a C source file as its output; it will also write a file of parser information if you desire it. We’ll look at that next time.
C(1). The Declaration Section
The declarations include a set of C-style declarations that are used by YACC to build the parser. For now, there are only two sorts of declarations that concern us.
The first is the “%token” declaration. YACC will automatically recognize one-character tokens like ‘+’ and ‘-’. All other terminal tokens must be declared with the %token statement, e.g.,
%token NUM
Non-terminals do not need to be declared; they are implicitly declared in the grammar rules.
The second declaration type lets us pass regular C declarations through YACC to the C compiler. These look like this:
/* 4 */

%{
#define blip 12
%}
Anything between the %{ and the %} is passed through unchanged, and written by YACC to the C source file.
It is customary to include a #define for YYSTYPE. What is YYSTYPE? It is the type of the parser’s stack. For now, there’s no need to worry about it. It is “int” by default, and int will work fine for what we’re doing this month. Later, after I’ve discussed how the parser operates, and we know what sort of things go on the stack, we’ll come back to it.
C(2). The Grammar Section
The grammar section includes all the grammar productions, like those we discussed earlier. They are written in a somewhat non-standard format (taking ASU’s notation as the standard, which I think is reasonable). They also provide a framework for code generation.
C(2)(a). Production Rule Format
The arrow in productions is replaced with a colon, and productions with the same left-hand side are combined into one, with the right-hand sides separated with vertical bars, |. The last right-hand side is terminated with a semicolon. The usual practice is to format the productions like this:
/* 5 */

expr : expr ‘+’ term
| expr ‘-’ term
| term
;

term : term ‘*’ NUM
| term ‘/’ NUM
| ‘(‘ expr ‘)’
| NUM
;
in place of:
expr -> expr ‘+’ term
expr -> expr ‘-’ term
expr -> term
term -> term ‘*’ NUM
term -> term ‘/’ NUM
term -> ‘(‘ expr ‘)’
term -> NUM
C(2)(b). Code Generation
You can follow each right-hand side with some pseudo-C code that the parser will then call when the production rule is executed (i.e., if you read that stuff about reductions, when the input is reduced by that production).
Here’s an example. Suppose you have the production rules:
expr : expr ‘+’ term
| expr ‘-’ term
| term
;
The question is, what is the value of the “expr”? In the first case, it’s the value of the first token of the right-hand side plus the value of the third; in the second, it’s the first minus the third; and in the third, it’s just the value of the first token. So we write:
/* 6 */

expr : expr ‘+’ term
{
$$ = $1 + $3;
}

| expr ‘-’ term
{
$$ = $1 - $3;
}

| term
{
$$ = $1;
}

;
This isn’t hard to figure out; $$ is the value of the left-hand side, $1 is the value of the first token of the right-hand side, $3 the value of the third token. Using those symbols, you just write straight C code. You are not limited to a single line, and you can call routines that you have written elsewhere. Don’t forget the braces or the semicolons.
C(3). The Program Section
The program section is made up of straight C code, and is copied unaltered into the C source file. While YACC requires that you supply some routines to it, you can put them in another file, and this section can be completely empty. In simpler cases, however, the program section allows you to write your entire program in the YACC source file.
Here’s what YACC requires: a lexical analyzer, called yylex(), and an error routine, yyerror(). Common sense requires a main() routine as well.
The prototype for yylex() is:
/* 7 */

int yylex();
The parser routine, called yyparse(), will call yylex() whenever it needs input. yylex() reads the next token, sets the global variable “yylval” to the value of the token (optional; this is the “$” value used in the production code), and returns the token type. You might wonder where it reads the token from, since it hasn’t any arguments. The answer is, it uses global variables of some sort, such as a global string or a global file reference, that is set up by the main() routine.
The error routine is:
/* 8 */

void yyerror(char *message);
The message is generated by the parser when it detects an error; yyerror()’s job is to notify the user that something has gone wrong. Of course, you don’t have do live with YACC’s default error messages, which are rather unilluminating; you can call yyerror() yourself. More on this in a future article.
D. An MPW Hex Calculator
Now for some real code. Our example program is an MPW tool, a hex calculator. It will evaluate expressions using +, -, * and /, and will also properly evaluate expressions with parentheses in them. The name of the tool is “Hex”; you can invoke it with an expression, e.g.:
Hex ‘2 - ( 3 - 4 )’
in which case it will evaluate, print the result, and exit. Notice that in this case, the expression must be in quotation marks, or MPW will treat each token separately. Note also that the tokens must be separated by spaces. This is to simplify the lexical analyzer; we will relax this requirement in a future version.
The tool may also be invoked without an expression to evaluate. It will then go into a loop, reading in expressions and evaluating them, e.g.:
Hex
? 2 - ( 3 - 4 )
= 3
? 64 * 8
= 320
?
The loop ends on a blank line.
Here are the input globals and the code for the main() routine:
/* 9 */

char *input;
char *token;

void main(int argc, char *argv[])
{
char thestring[256];
if (argc < 1)
printf(“\tImpossible error!\n”);
else if (argc > 2)
printf(“\tHey! One at a time!\n”);
else if (argc == 2)
{
input = argv[1];
yyparse();
}
else
{
printf(“? “);
while (strlen(gets(thestring)) > 2)
{
input = &thestring[2];
yyparse();
printf(“? “);
}
}
}
“input” is the input buffer, and holds the expression to evaluate. “token” is filled by yylex() with the current token. It’s useful for debugging and error reporting. Finally, in the read loop, notice that the gets() routine will read the “? ” prompt as well as the expression, this being MPW, which is why “input” points to thestring[2].
D(1). The Lexical Analyzer and Error Routines
This is about as simple as a lexical analyzer can be. The strtok() routine will return the tokens in the input string, so long as they’re separated by spaces, or newline at the end of input. Then if sscanf() can read a hex number, that’s what the token must be; if it isn’t a number, it must be an operator or a parenthesis, so return the first character.
This routine is so simple it is vulnerable to pathological input -- please don’t TRY to break it! Where’s the challenge? This is just a stopgap, good enough to serve until we get a real lexical analyzer.
/* 10 */

int yylex()
{
if (input == 0)
token = strtok(0, “ “);
else
{
token = strtok(input, “ “);
input = 0;
}
if (token == 0)
return(‘\n’);
else if (sscanf(token, “%x”, &yylval) == 1)
return(NUM);
else
return(token[0]);
}
The error routine is even simpler. It just prints out the parser’s default error message, which is the blindingly helpful “syntax error”. We do add the current token, which may be helpful.
/* 11 */

#define yyerror(x)
{
printf(“\t%s [%s]\n”, x, token);
return(0);
}
(This is divided into separate lines to fit in Mac Toot’s narrow columns; that’s not the way we’d write it in C, of course!) Note the return statement. yyerror is called from within the parser routine yyparse(), so that’s what we’re returning from. The effect is to abort the translation.
D(2). The Grammar
The grammar we will use is the same as that developed earlier. There’s one additional production: we have to give YACC a START SYMBOL, which is the left-hand side of the first production. In this case, “prob” is short for “problem”.
prob -> expr ‘\n’
The newline is a single-character token returned by yylex() to signal the end of input. So if the entire input string is an expression, we’ve got a complete problem. Any other occurrence of a newline is an error.
/* 12 */

prob -> expr ‘\n’
expr -> expr + term
expr -> expr - term
expr -> term
term -> term * NUM
term -> term / NUM
term -> ( expr )
term -> NUM
D(3). The Value Calculation and Output
Here’s the actual YACC input file, declaration and grammar sections. The declaration section consists of a single %token declaration, making NUM a terminal symbol. The grammar section includes all the productions listed above, each with some associated code.
/* 13 */

%token NUM

%%

prob : expr
{
printf(“\t= %X\n”, $1);
return(0);
}
;
expr : expr ‘+’ term
{
$$ = $1 + $3;
}
| expr ‘-’ term
{
$$ = $1 - $3;
}
| term
{
$$ = $1;
}
;

term : term ‘*’ NUM
{
$$ = $1 * $3;
}
| term ‘/’ NUM
{
$$ = $1 / $3;
}
| ‘(‘ expr ‘)’
{
$$ = $2;
}
| NUM
{
$$ = $1;
}
;
D(4). The Make File
YACC goes right into your make file, just like any other MPW tool. Hmm. I thought I said that YACC was a UNIX tool... The truth is that there is an MPW version of YACC available, called MACYACC (I have renamed the tool on my system to make typing easier).
#14

Hex.c ƒ Hex.make Hex.y
Yacc -VHex.out Hex.y
Hex.c.o ƒ Hex.make Hex.c
C -r Hex.c
Hex ƒƒ Hex.make Hex.c.o
Link -w -t MPST -c ‘MPS ‘ �
Hex.c.o �
{Libraries}”Interface.o �
{CLibraries}”CRuntime.o �
{CLibraries}”StdCLib.o �
{CLibraries}”CSANELib.o �
{CLibraries}”Math.o �
{CLibraries}”CInterface.o �
{Libraries}”ToolLibs.o �
-o Hex