Introduction to Tokenisation

Parsing actually starts to get interesting with tokenisation. This is where input is finally converted into a form that the players couldn't have typed in directly.

Here's the gist of it. You take a pre-processed input line and chunk it into units called symbols. A symbol is one of:

1. Anything between matching quotation marks, or between an unmatched quotation mark and the end of the line. Example: "this is a string".

2. Any other piece of punctuation. Examples: ? ; , ! .

3. Any series of digits not bounded by alphabetical characters. Examples: 7, 26. Minus signs can count as part of integers too.

4. Any series of alphanumerical characters. Examples: WALK, DRAGON, COIN22.

5. Whatever special characters you use to talk to the tokenisation process directly. I'll discuss these later.
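As a rough illustration of rules 1 to 4, here is a minimal sketch in Python using a regular expression. The language, the pattern and the names SYMBOL_RE and split_symbols are my own assumptions for the example, not part of the original design; rule 5 is left out because the special characters haven't been defined yet.

```python
import re

# Hypothetical pattern; alternatives are tried in order at each position.
SYMBOL_RE = re.compile(r'''
    "(?P<string>[^"]*)(?:"|$)             # rule 1: quoted text, or quote to end of line
  | (?P<integer>-?\d+)(?![A-Za-z0-9])     # rule 3: digits, optional minus, no letters attached
  | (?P<word>[A-Za-z0-9]+)                # rule 4: any alphanumeric run, e.g. COIN22
  | (?P<punct>[^\sA-Za-z0-9])             # rule 2: any other single non-space character
''', re.VERBOSE)


def split_symbols(line):
    """Chunk a pre-processed input line into (kind, text) symbols."""
    symbols = []
    for match in SYMBOL_RE.finditer(line):
        kind = match.lastgroup            # name of the alternative that matched
        symbols.append((kind, match.group(kind)))
    return symbols


if __name__ == "__main__":
    print(split_symbols('say "hello there" 7 COIN22!'))
```

Because the word rule is tried only after the integer rule fails, digits glued to letters (COIN22, 22COIN) come out as a single word rather than as an integer plus a word.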

OK, now what you need from the tokeniser is a list of tokens. These are nodes that represent multiple parts of speech (nouns, verbs etc.), of which the main part of the parser can then attempt to make sense. They usually consist of three elements:

1. A type identifier, so you know what kind of token it is.

2. Data (for freestyle tokens).

3. A set of parts of speech that the token can take.

For strings, the type will be some predefined constant, such as T_STRING, StringType or whatever your naming convention decrees. The data will be the body of the string, e.g. WHAT?!!. The set of parts of speech will contain some representation for nouns, and maybe also for verbs. I'll write this as [noun, verb]. Don't panic, I shall explain parts of speech in detail when I reach the main parsing process in a later article.

For integers, the type will be T_INTEGER or IntegerType or whatever, and the data will be a number such as 142857. The set of parts of speech will be at least [noun, adjective], with maybe some others thrown in too.

Punctuation marks will have their own predefined nodes. You can look them up in a table; it's simple enough. If you like, you can point some of them to the same record, e.g. question marks and exclamation marks could be mapped to the same node as a full stop (my apologies to American readers, I know you call these latter two "exclamation points" and "periods").
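To make the three-element token concrete, here is one possible shape for it in Python. Everything named here (Token, T_PUNCT, PUNCTUATION, string_token, integer_token) is hypothetical, and the empty part-of-speech set for punctuation is my own placeholder, since the text above doesn't say what punctuation nodes carry.

```python
from dataclasses import dataclass

# Type identifiers; the names follow the T_STRING / T_INTEGER convention above.
T_STRING = "T_STRING"
T_INTEGER = "T_INTEGER"
T_PUNCT = "T_PUNCT"      # assumed name, not from the article


@dataclass(frozen=True)
class Token:
    type: str                    # 1. what kind of token this is
    data: object = None          # 2. data, for freestyle tokens
    pos: frozenset = frozenset() # 3. parts of speech the token can take


def string_token(body):
    return Token(T_STRING, body, frozenset({"noun", "verb"}))


def integer_token(text):
    return Token(T_INTEGER, int(text), frozenset({"noun", "adjective"}))


# Predefined punctuation nodes; ? and ! share the full stop's record.
FULL_STOP = Token(T_PUNCT, ".")
PUNCTUATION = {
    ".": FULL_STOP,
    "?": FULL_STOP,
    "!": FULL_STOP,
    ",": Token(T_PUNCT, ","),
    ";": Token(T_PUNCT, ";"),
}
```

Looking up PUNCTUATION["?"] hands back the very same node as PUNCTUATION["."], which is all that "pointing to the same record" needs to mean here.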

This brings us to words...

The Vocabulary

Words must be translated into atoms (from the inheritance hierarchy, as I described earlier in this set of articles). The data structure linking the two is the vocabulary. This consists of a symbol table that connects words, parts of speech (PoS) and atoms. Here's an extract showing what a vocabulary might contain:


        Word      PoS          Atom             Comment
        <eat      verb         eat>
        <egg      noun         egg>
        <hit      verb         hit>
        <orange   adjective    orange_colour>   the colour
        <orange   noun         orange>          the fruit
        <box      verb         hit>             as in the sport of boxing
        <box      noun         box>             the container

If a player typed HIT ORANGE BOX then the tokeniser would need to look up all definitions of each word and the appropriate possible meanings, i.e.:


						HIT 	<verb hit>
						ORANGE 	<adjective orange_colour><noun orange>
						BOX 	<verb hit><noun box>

This is done by means of a dictionary mechanism. I'm not going to go into the details of writing one of these, as dictionaries are fairly common data structures. If you're not using one from a library, a hash table with binary tree overflow usually does the business. So long as you have a reasonably efficient mechanism by which a word can be used to retrieve its matching record, that's enough.
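For what it's worth, the lookup can be sketched in a few lines of Python using the built-in dict, which is a library hash table rather than the hash-table-with-binary-tree-overflow mentioned above. The names VOCABULARY_ROWS, build_vocabulary and look_up are made up for this example, and folding words to lower case is my assumption.

```python
from collections import defaultdict

# (word, part of speech, atom) tuples from the vocabulary extract.
VOCABULARY_ROWS = [
    ("eat", "verb", "eat"),
    ("egg", "noun", "egg"),
    ("hit", "verb", "hit"),
    ("orange", "adjective", "orange_colour"),
    ("orange", "noun", "orange"),
    ("box", "verb", "hit"),
    ("box", "noun", "box"),
]


def build_vocabulary(rows):
    """Index the rows by word so every definition of a word is found at once."""
    vocab = defaultdict(list)
    for word, pos, atom in rows:
        vocab[word.lower()].append((pos, atom))
    return dict(vocab)


def look_up(vocab, word):
    """Return all (part of speech, atom) pairs for a word, or [] if unknown."""
    return vocab.get(word.lower(), [])


vocab = build_vocabulary(VOCABULARY_ROWS)

if __name__ == "__main__":
    for word in "HIT ORANGE BOX".split():
        print(word, look_up(vocab, word))
```

Run on HIT ORANGE BOX, this yields the same three sets of meanings listed above, leaving the main parser to decide which combination makes sense.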

There are two further points to consider about vocabularies. Firstly, you might want to add a fourth component to each tuple to represent privilege level. If there are some commands that only admins should have, then there's no reason these should be in the vocabulary for non-admins; leaving them out adds an extra level of security.

Secondly, some links need only be unidirectional. In the above example, the verb for BOX is just a synonym that points to the same atom as HIT. If during execution of [hit]() you wished to refer to the issuing command by looking backwards through the vocabulary, you wouldn't want it to come up with BOX. Therefore, some kind of flag to note that a command is a synonym or an abbreviation is also in order.
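Both points can be sketched by widening each vocabulary tuple. In the Python below, the Entry record, its privilege and synonym fields, the integer privilege levels and the SHUTDOWN command are all hypothetical, invented purely to show the idea.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    word: str
    pos: str
    atom: str
    privilege: int = 0      # assumed scale: 0 = any player, 1 = admin
    synonym: bool = False   # True for synonyms and abbreviations


ENTRIES = [
    Entry("hit", "verb", "hit"),
    Entry("box", "verb", "hit", synonym=True),
    Entry("box", "noun", "box"),
    Entry("shutdown", "verb", "shutdown", privilege=1),  # made-up admin command
]


def visible_entries(entries, privilege):
    """Keep only the entries a user of this privilege level should have."""
    return [e for e in entries if e.privilege <= privilege]


def primary_word(entries, pos, atom):
    """Look backwards from an atom to its word, skipping synonyms."""
    for e in entries:
        if e.pos == pos and e.atom == atom and not e.synonym:
            return e.word
    return None
```

With these entries, primary_word(ENTRIES, "verb", "hit") comes back with "hit" and never "box", and an ordinary player's vocabulary simply has no record of the admin-only word.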

Aside: if you did want [hit]() to refer to BOX then you would give the verb BOX a vocabulary entry pointing to an atom of its own, box_hit, which when invoked would be [box_hit](). If box_hit were declared as a subclass of hit, then the code which was invoked would be the same as for [hit]() but when the action/verb atom was referred to it would come up as box_hit.
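The subclass trick can be mimicked with ordinary class inheritance. This is only an analogy in Python: the classes Hit and BoxHit, the atom_name attribute and the execute method are assumptions of mine, standing in for atoms in the inheritance hierarchy.

```python
class Hit:
    atom_name = "hit"

    def execute(self, target):
        # The same code runs for every subclass; only the reported atom differs.
        return f"[{self.atom_name}]({target})"


class BoxHit(Hit):
    # Inherits execute() unchanged, but identifies itself as box_hit.
    atom_name = "box_hit"


if __name__ == "__main__":
    print(Hit().execute("egg"))
    print(BoxHit().execute("egg"))
```

BoxHit defines no behaviour of its own, so the inherited code is what runs, yet anything that asks which atom was invoked sees box_hit.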
