\input texinfo
@setfilename ../../info/semantic
@set TITLE Semantic Manual
@set AUTHOR Eric M. Ludlam and David Ponce
@settitle @value{TITLE}

@c *************************************************************************
@c @ Header
@c *************************************************************************

@c Merge all indexes into a single index for now.
@c We can always separate them later into two or more as needed.
@syncodeindex vr cp
@syncodeindex fn cp
@syncodeindex ky cp
@syncodeindex pg cp
@syncodeindex tp cp

@c @footnotestyle separate
@c @paragraphindent 2
@c @@smallbook
@c %**end of header

@copying
This manual documents the Semantic library and utilities.

Copyright @copyright{} 1999, 2000, 2001, 2002, 2003, 2004, 2005, 2007,
2009 Free Software Foundation, Inc.

@quotation
Permission is granted to copy, distribute and/or modify this document
under the terms of the GNU Free Documentation License, Version 1.3 or
any later version published by the Free Software Foundation; with no
Invariant Sections, with the Front-Cover texts being ``A GNU Manual,''
and with the Back-Cover Texts as in (a) below. A copy of the license
is included in the section entitled ``GNU Free Documentation License.''

(a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
modify this GNU manual. Buying copies from the FSF supports it in
developing GNU and promoting software freedom.''
@end quotation
@end copying

@ifinfo
@format
START-INFO-DIR-ENTRY
* Semantic: (semantic). Source code parser library and utilities.
END-INFO-DIR-ENTRY
@end format
@end ifinfo

@titlepage
@center @titlefont{Semantic}
@sp 4
@center by @value{AUTHOR}
@end titlepage
@page

@macro semantic{}
@i{Semantic}
@end macro

@macro keyword{kw}
@anchor{\kw\}
@b{\kw\}
@end macro

@macro obsolete{old,new}
@sp 1
@strong{Compatibility}:
@code{\new\}, introduced in @semantic{} version 2.0, supersedes
@code{\old\}, which is now obsolete.
@end macro

@c *************************************************************************
@c @ Document
@c *************************************************************************
@contents

@node top
@top @value{TITLE}

@semantic{} is a suite of Emacs libraries and utilities for parsing
source code. At its core are a lexical analyzer and two parser
generators (@code{bovinator} and @code{wisent}) written in Emacs Lisp.
@semantic{} provides a variety of tools for making use of the parser
output, including user commands for code navigation and completion, as
well as enhancements for imenu, speedbar, whichfunc, eldoc,
hippie-expand, and several other parts of Emacs.

To send bug reports, or to participate in discussions about
@semantic{}, use the mailing list cedet-semantic@@sourceforge.net via
the URL:
@url{http://lists.sourceforge.net/lists/listinfo/cedet-semantic}

@ifnottex
@insertcopying
@end ifnottex

@menu
* Introduction::
* Using Semantic::
* Semantic Internals::
* Glossary::
* GNU Free Documentation License::
* Index::
@end menu

@node Introduction
@chapter Introduction

This chapter gives an overview of @semantic{} and its goals.

Ordinarily, Emacs uses regular expressions (and syntax tables) to
analyze source code for purposes such as syntax highlighting. This
approach, though simple and efficient, has its limitations: roughly
speaking, it only ``guesses'' the meaning of each piece of source code
in the context of the programming language, instead of rigorously
``understanding'' it.

@semantic{} provides a new infrastructure to analyze source code using
@dfn{parsers} instead of regular expressions. It contains two
built-in parser generators (an @acronym{LL} generator named
@code{Bovine} and an @acronym{LALR} generator named @code{Wisent},
both written in Emacs Lisp), and parsers for several common
programming languages. It can also make use of @dfn{external
parsers}, such as the programs GNU Global and GNU IDUtils.

@semantic{} provides a uniform, language-independent @acronym{API} for
accessing the parser output. This output can be used by other Emacs
Lisp programs to implement ``syntax-aware'' behavior. @semantic{}
itself includes several such utilities, including user-level Emacs
commands for navigating, searching, and completing source code.

The following diagram illustrates the structure of the @semantic{}
package:

@table @strong
@item Please Note:
The words in all capitals are those that @semantic{} itself provides.
Others are current or future languages or applications that are not
distributed along with @semantic{}.
@end table

@example
                                          Applications
                                              and
                                           Utilities
                                            -------
                                           /       \
               +---------------+    +--------+    +--------+
         C --->| C      PARSER |--->|        |    |        |
               +---------------+    |        |    |        |
               +---------------+    | COMMON |    | COMMON |<--- SPEEDBAR
      Java --->| JAVA   PARSER |--->| PARSE  |    |        |
               +---------------+    | TREE   |    | PARSE  |<--- SEMANTICDB
               +---------------+    | FORMAT |    | API    |<--- ecb
    Scheme --->| SCHEME PARSER |--->|        |    |        |
               +---------------+    |        |    |        |
               +---------------+    |        |    |        |
   Texinfo --->| TEXI.  PARSER |--->|        |    |        |
               +---------------+    |        |    |        |

                      ...              ...           ...        ...

               +---------------+    |        |    |        |<--- app. 1
   Lang. A --->| A      Parser |--->|        |    |        |
               +---------------+    |        |    |        |<--- app. 2
               +---------------+    |        |    |        |
   Lang. B --->| B      Parser |--->|        |    |        |<--- app. 3
               +---------------+    |        |    |        |

     ...              ...              ...           ...        ...

               +---------------+    |        |    |        |
   Lang. Y --->| Y      Parser |--->|        |    |        |<--- app. ?
               +---------------+    |        |    |        |
               +---------------+    |        |    |        |<--- app. ?
   Lang. Z --->| Z      Parser |--->|        |    |        |
               +---------------+    +--------+    +--------+
@end example

@menu
* Semantic Components::
@end menu

@node Semantic Components
@section Semantic Components

In this section, we provide a more detailed description of the major
components of @semantic{}, and how they interact with one another.

The first step in parsing a source code file is to break it up into
its fundamental components. This step is called lexical analysis:

@example
     syntax table, keywords list, and options
                         |
                         |
                         v
    input file  ---->  Lexer  ----> token stream
@end example
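
The following Python sketch illustrates this step in a language-neutral way. It is not @semantic{}'s Emacs Lisp lexer API; see @file{semantic-lex.el} for that. The sketch reads text and a keywords list and produces a token stream. The token classes, the hypothetical @code{KEYWORDS} set, and the regular expressions are all assumptions made for the example. They stand in for the syntax table and keyword table that a real @semantic{} lexer uses.

@example
import re

# Hypothetical keyword list; a real Semantic lexer gets its keywords
# from the language's grammar, not from a hard-coded set.
KEYWORDS = frozenset(["int", "char", "return", "if", "else"])

# Ordered (token class, regexp) pairs: the first pattern that matches
# at the current position wins.  None of them can match empty text.
TOKEN_SPECS = [
    ("whitespace", re.compile(r"\s+")),
    ("comment", re.compile(r"//[^\n]*")),
    ("number", re.compile(r"\d+")),
    ("symbol", re.compile(r"[A-Za-z_]\w*")),
    ("string", re.compile(r'"(?:\\.|[^"\\])*"')),
    ("punctuation", re.compile(r"[^\w\s]")),
]


def lex(text):
    """Return a list of (class, text, start, end) tokens for TEXT."""
    tokens = []
    pos = 0
    while pos < len(text):
        for token_class, pattern in TOKEN_SPECS:
            match = pattern.match(text, pos)
            if match:
                break
        else:
            raise ValueError("cannot lex text at position %d" % pos)
        value = match.group()
        if token_class == "symbol" and value in KEYWORDS:
            token_class = "keyword"
        # Like most lexers, drop whitespace and comments from the stream.
        if token_class not in ("whitespace", "comment"):
            tokens.append((token_class, value, pos, match.end()))
        pos = match.end()
    return tokens
@end example

For instance, lexing @code{int x = 42; // answer} yields a keyword token for @code{int}, a symbol token for @code{x}, punctuation tokens for @code{=} and @code{;}, and a number token for @code{42}. The comment is discarded.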

@noindent
The output of the lexical analyzer is a list of tokens that make up
the file. The next step is the actual parsing, shown below:

@example
                   parser tables
                         |
                         v
    token stream --->  Parser  ----> parse tree
@end example

@noindent
The end result, the parse tree, is @semantic{}'s internal
representation of the source code, structured according to the
language grammar. @semantic{} provides an @acronym{API} for Emacs
Lisp programs to access the parse tree.
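
To make the step from token stream to parse tree concrete, here is a toy LL(1), or recursive-descent, parser in Python. It consumes tokens of the form @code{(class, text, start, end)} and returns a flat list of tags. It recognizes only simple C-like declarations such as @code{int x;} and @code{int f(int a, int b);}. This is a hypothetical sketch. @semantic{}'s real parsers are generated by Bovine or Wisent from grammar files. The tag layout used here, @code{(name, class, attributes, start, end)}, is only loosely modeled on @semantic{} tags.

@example
def parse(tokens):
    """Toy LL(1) parser turning a token stream into a flat list of tags.

    TOKENS is a list of (class, text, start, end) tuples.  Only two
    declaration forms are recognized:
        TYPE NAME ;
        TYPE NAME ( TYPE NAME , ... ) ;
    Each tag is (name, tag_class, attributes, start, end).
    """
    tags = []
    i = 0

    def peek():
        # Text of the next token, or None at end of input (one lookahead).
        return tokens[i][1] if i < len(tokens) else None

    def expect(token_class, text=None):
        nonlocal i
        if i >= len(tokens):
            raise SyntaxError("unexpected end of input")
        tok = tokens[i]
        if tok[0] != token_class or (text is not None and tok[1] != text):
            raise SyntaxError("unexpected token %r" % (tok[1],))
        i += 1
        return tok

    while i < len(tokens):
        type_tok = expect("keyword")
        name_tok = expect("symbol")
        if peek() == "(":
            expect("punctuation", "(")
            args = []
            while peek() != ")":
                arg_type = expect("keyword")[1]
                arg_name = expect("symbol")[1]
                args.append((arg_name, "variable", dict(type=arg_type)))
                if peek() == ",":
                    expect("punctuation", ",")
                elif peek() != ")":
                    raise SyntaxError("expected ',' or ')'")
            expect("punctuation", ")")
            end_tok = expect("punctuation", ";")
            tags.append((name_tok[1], "function",
                         dict(type=type_tok[1], arguments=args),
                         type_tok[2], end_tok[3]))
        else:
            end_tok = expect("punctuation", ";")
            tags.append((name_tok[1], "variable", dict(type=type_tok[1]),
                         type_tok[2], end_tok[3]))
    return tags
@end example

Each @code{if} on the next token corresponds to choosing a grammar rule from one token of lookahead, which is what makes this an LL(1) parser.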

Parsing large files can take several seconds or more. By default,
@semantic{} automatically caches parse trees by saving them in your
@file{.emacs.d} directory. When you revisit a previously parsed file,
the parse tree is automatically reloaded from this cache, to save
time. @xref{SemanticDB}.
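
The caching idea can be sketched as follows. This hypothetical Python class stores each file's tags on disk and reuses them only while the file's modification time is unchanged. It is not how @file{semanticdb-file.el} is implemented. The cache-file naming, the use of @code{pickle}, and the modification-time check are all assumptions made for the illustration.

@example
import hashlib
import os
import pickle


class ParseCache:
    """Cache parse results on disk, one cache file per source file.

    A cached entry is reused only while the source file's modification
    time is unchanged; otherwise the file is reparsed and the entry is
    rewritten.
    """

    def __init__(self, cache_dir, parse_file):
        self.cache_dir = cache_dir
        self.parse_file = parse_file  # callable: path -> list of tags
        os.makedirs(cache_dir, exist_ok=True)

    def _entry_path(self, path):
        # Name the cache file after a hash of the absolute source path.
        digest = hashlib.sha1(os.path.abspath(path).encode()).hexdigest()
        return os.path.join(self.cache_dir, digest + ".cache")

    def tags(self, path):
        mtime = os.path.getmtime(path)
        entry = self._entry_path(path)
        if os.path.exists(entry):
            with open(entry, "rb") as f:
                cached_mtime, tags = pickle.load(f)
            if cached_mtime == mtime:
                return tags  # cache hit: no reparse needed
        tags = self.parse_file(path)  # cache miss or stale entry
        with open(entry, "wb") as f:
            pickle.dump((mtime, tags), f)
        return tags
@end example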

@node Using Semantic
@chapter Using Semantic

@include sem-user.texi

@node Semantic Internals
@chapter Semantic Internals

This chapter provides an overview of the internals of @semantic{}.
Application developers and grammar developers should not need this
information.

It is mostly useful for hackers who would like to learn more about how
@semantic{} works.

@menu
* Parser code::            Code used for the parsers
* Tag handling::           Code used for manipulating tags
* Semanticdb Internals::   Code used in the semantic database
* Analyzer Internals::     Code used in the code analyzer
* Tools::                  Code used in user tools
* Tests::                  Code used for testing
@end menu

@node Parser code
@section Parser code

@semantic{} parsing code is spread across a range of files.

@table @file
@item semantic.el
The core infrastructure sets up buffers for parsing, and has all the
core parsing routines. Most parsing routines are overloadable, so the
actual implementation may be somewhere else.

@item semantic-edit.el
Incremental reparse based on user edits.

@item semantic-grammar.el
@itemx semantic-grammar.wy
Parser for the different grammar languages, and a major mode for
editing grammars in Emacs.

@item semantic-lex.el
Infrastructure for implementing lexical analyzers. Provides macros
for creating individual analyzers for specific features, and a way to
combine them together.

@item semantic-lex-spp.el
Infrastructure for a lexical symbolic preprocessor. This was written
to implement the C preprocessor, but could be used for other lexical
preprocessors.

@item bovine/bovine-grammar.el
@itemx bovine/bovine-grammar-macros.el
@itemx bovine/semantic-bovine.el
The ``bovine'' grammar. This is the first grammar mode written for
@semantic{} and is useful for creating simple parsers.

@item wisent/wisent.el
@itemx wisent/bison-wisent.el
@itemx wisent/semantic-wisent.el
@itemx wisent/semantic-debug-grammar.el
A port of Bison to Emacs Lisp. This infrastructure lets you create
LALR-based parsers for @semantic{}.

@item semantic-ast.el
Manage Abstract Syntax Trees for parsers.

@item semantic-debug.el
Infrastructure for debugging grammars.

@item semantic-util.el
Various utilities for manipulating tags, such as describing the tag
under point, adding labels, and the all-important
@code{semantic-something-to-tag-table}.

@end table
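
The Python sketch below gives a rough idea of what a lexical symbolic preprocessor such as @file{semantic-lex-spp.el} does. It expands object-like macros directly in a token stream. It is a simplified stand-in, not @semantic{}'s implementation. Tokens are @code{(class, text)} pairs, the @code{macros} table is a plain dictionary, and function-like macros are not handled. As in the C preprocessor, a macro is not re-expanded inside its own expansion, which stops self-referential definitions from looping.

@example
def expand_macros(tokens, macros):
    """Expand object-like macros in a token stream.

    TOKENS is a list of (class, text) pairs; MACROS maps a macro name to
    the list of tokens it stands for.  Expansions are rescanned, so a
    macro may use other macros, but a name is never expanded inside its
    own expansion (so "#define A A" does not recurse forever).
    """

    def expand(toks, active):
        out = []
        for tok in toks:
            token_class, text = tok
            if (token_class == "symbol" and text in macros
                    and text not in active):
                # Rescan the replacement with this macro marked as active.
                out.extend(expand(macros[text], active | frozenset([text])))
            else:
                out.append(tok)
        return out

    return expand(tokens, frozenset())
@end example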

@node Tag handling
@section Tag handling

A tag represents an individual item found in a buffer, such as a
function or variable. The code for handling tags is spread across
several source files.
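
Conceptually, a tag bundles a name, a class (such as @code{function} or @code{variable}), a set of attributes, and a location. The hypothetical Python helpers below sketch creation, attribute queries, and cloning, three of the basic operations listed for @file{semantic-tag.el}. They are not that file's API. In Emacs Lisp you would use functions such as @code{semantic-tag-name}, @code{semantic-tag-class}, and @code{semantic-tag-get-attribute} instead. The dictionary layout here is an assumption.

@example
import copy


def make_tag(name, tag_class, start=None, end=None, **attributes):
    """Create a tag as a plain dictionary (a hypothetical layout)."""
    return dict(name=name, tag_class=tag_class,
                attributes=dict(attributes), start=start, end=end)


def tag_attribute(tag, key, default=None):
    """Return attribute KEY of TAG, or DEFAULT when it is not set."""
    return tag["attributes"].get(key, default)


def clone_tag(tag, **changes):
    """Deep-copy TAG, then apply CHANGES to the copy only."""
    new_tag = copy.deepcopy(tag)
    new_tag.update(changes)
    return new_tag
@end example

The deep copy matters: modifying a clone's attributes must never alter the original tag, which may still be referenced from a tag table.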

@table @file
@item semantic-tag.el
Basic tag creation, queries, cloning, binding, and unbinding.

@item semantic-tag-write.el
Write a tag or tag list to a stream. These routines are used by
@file{semanticdb-file.el} when saving a list of tags.

@item semantic-tag-file.el
Files associated with tags: going to a tag, finding the file for an
include, and finding the file for a prototype.

@item semantic-tag-ls.el
Language-dependent features of a tag, such as parent calculation, slot
protection, and other states like abstract, virtual, static, and leaf.

@item semantic-dep.el
Include file handling. Contains the include path concepts, and
routines for looking up file names in the include path.

@item semantic-format.el
Convert a tag into a nicely formatted and colored string. Use
@code{semantic-test-all-format-tag-functions} to test different output
options.

@item semantic-find.el
Find tags matching different conditions in a tag table.
These routines are used by @file{semanticdb-find.el} once the database
has been converted into a simpler tag table.

@item semantic-sort.el
Sort lists of tags in different ways, including sorting a plain list
of tags forward or backward, binning tags based on attributes
(bucketize), and tag adoption for multiple references to the same
thing.

@item semantic-doc.el
Capture documentation comments from near a tag.

@end table
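
The hypothetical Python functions below illustrate the searching and sorting operations described for @file{semantic-find.el} and @file{semantic-sort.el}. They work on tags represented as @code{(name, class)} tuples. They only mirror the ideas behind functions such as @code{semantic-find-tags-by-class} and @code{semantic-bucketize}; they are not ports of them.

@example
def find_tags_by_class(tag_class, tags):
    """Return the tags in TAGS whose class is TAG_CLASS, in order."""
    return [tag for tag in tags if tag[1] == tag_class]


def find_tags_by_name_prefix(prefix, tags):
    """Return the tags whose name starts with PREFIX (e.g. completion)."""
    return [tag for tag in tags if tag[0].startswith(prefix)]


def bucketize(tags):
    """Bin TAGS by class; classes keep the order of first appearance."""
    buckets = dict()  # dicts preserve insertion order in Python 3.7+
    for tag in tags:
        buckets.setdefault(tag[1], []).append(tag)
    return buckets


def sort_by_name(tags, reverse=False):
    """Sort TAGS by name, case-insensitively, forward or backward."""
    return sorted(tags, key=lambda tag: tag[0].lower(), reverse=reverse)
@end example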

@node Semanticdb Internals
@section Semanticdb Internals

@acronym{Semanticdb} is certainly complex, because the problem it
tries to solve is a rather hairy one.

@table @file
@item semanticdb.el
Defines a @dfn{database} and a @dfn{table} base class. You can
instantiate these classes, and use them, but they are not persistent.

This file also provides support for @code{semanticdb-minor-mode},
which automatically associates files with tables in databases so that
tags are @emph{saved} while a buffer is not in memory.

The database and tables both also provide applicable cache
information, and a cache flushing system. The semanticdb search
routines use caches to save data structures that are complex to
calculate.

Lastly, it provides the concept of a @dfn{project root}. This is a
system by which a file can be associated with the root of a project,
so if you have a tree of directories and source files, it can find the
root and allow a tag search to span all available databases in that
directory hierarchy.

@item semanticdb-file.el
Provides a subclass of the basic table so that it can be saved to
disk. Implements all the code needed to unbind and rebind tags to a
buffer and to write them to a file.

@item semanticdb-el.el
Implements a special kind of @dfn{system} database that uses Emacs
internals to perform queries.

@item semanticdb-ebrowse.el
Implements a system database that uses Ebrowse to parse files into a
table that can be queried for tag names. Successful tag hits during a
find cause @semantic{} to pick up and parse the referenced files to
get the full details.

@item semanticdb-find.el
Infrastructure for searching groups of @semantic{} databases, and
dealing with the search results format.

@item semanticdb-ref.el
Tracks cross-references. Cross-references are needed because, when a
buffer is reparsed, other tables must be alerted that any dependent
caches may need to be flushed. References are in the form of include
files.

@end table
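
The project-root idea described for @file{semanticdb.el} can be sketched in Python as a walk up the directory tree. The marker names in the hypothetical @code{ROOT_MARKERS} tuple are assumptions. @semantic{} itself lets users configure how roots are found, for example through @code{semanticdb-project-roots}, and this sketch is not its implementation. Once a root is known, every source file below it can be gathered so that a single search spans the whole hierarchy.

@example
import os

# Hypothetical marker names; a real setup would make these configurable.
ROOT_MARKERS = (".git", "configure.ac")


def find_project_root(path, markers=ROOT_MARKERS):
    """Return the nearest ancestor directory of the file PATH that
    contains one of MARKERS, or None if the filesystem root is reached."""
    directory = os.path.dirname(os.path.abspath(path))
    while True:
        if any(os.path.exists(os.path.join(directory, m)) for m in markers):
            return directory
        parent = os.path.dirname(directory)
        if parent == directory:  # reached "/" (or a drive root)
            return None
        directory = parent


def files_in_project(root, extensions=(".c", ".h")):
    """Collect every source file below ROOT, so one search spans them."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(extensions):
                found.append(os.path.join(dirpath, name))
    return sorted(found)
@end example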

@node Analyzer Internals
@section Analyzer Internals

The @semantic{} analyzer is a complex engine which has been broken
down across several modules. When the @semantic{} analyzer fails,
start with @code{semantic-analyze-debug-assist}, then dive into some
of these files.

@table @file
@item semantic-analyze.el
The core analyzer for defining the @dfn{current context}. The
current context is an object that contains references to aspects of
the local context, including the current prefix and a tag list
defining what the prefix means.

@item semantic-analyze-complete.el
Provides @code{semantic-analyze-possible-completions}.

@item semantic-analyze-debug.el
The analyzer debugger. Useful when attempting to get everything
configured.

@item semantic-analyze-fcn.el
Various support functions needed by the analyzer.

@item semantic-ctxt.el
Local context parser. Contains overloadable functions used to move
around through different scopes, get local variables, and collect the
current prefix used when doing completion.

@item semantic-scope.el
Calculate @dfn{scope} for a location in a buffer. The scope includes
local variables, and tag lists in scope for various reasons, such as
C++ @code{using} statements.

@item semanticdb-typecache.el
The typecache is part of @code{semanticdb}, but is used primarily by
the analyzer to look up datatypes and complex names. The typecache is
bound across source files and builds a master lookup table for data
type names.

@item semantic-ia.el
Interactive Analyzer functions. Simple routines that do completion or
lookups based on the results from the Analyzer. These routines are
meant as examples for application writers, but are quite useful as
they are.

@item semantic-ia-sb.el
Speedbar support for the analyzer, displaying context info, and
completion lists.

@end table
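
The following hypothetical Python sketch shows, in a greatly simplified form, how these pieces fit together during completion:

@itemize @bullet
@item
A prefix such as @code{point.ori} is collected before point. This is the job of @file{semantic-ctxt.el}.
@item
The first name is looked up among the local variables, that is, the scope.
@item
Each further name is resolved through a table of type members, which plays the role of the typecache.
@item
The members matching the last fragment become the candidates.
@end itemize

The dictionaries standing in for the scope and the typecache are assumptions. The real analyzer works on tags and is far more thorough.

@example
import re


def prefix_at_point(text, point):
    """Split the dotted expression that ends at POINT into its parts.

    "q = point.ori" with point at the end gives ["point", "ori"]; a
    trailing "." gives an empty last part, meaning "any member".
    """
    match = re.search(r"[A-Za-z_]\w*(?:\.\w*)*$", text[:point])
    return match.group().split(".") if match else [""]


def complete(prefix_parts, local_vars, typecache):
    """Return sorted completion candidates for PREFIX_PARTS.

    LOCAL_VARS maps variable names to type names (the local scope);
    TYPECACHE maps type names to dicts of member name -> member type.
    """
    *path, stem = prefix_parts
    if not path:
        candidates = local_vars
    else:
        # Resolve each name in the path to the type of its value.
        type_name = local_vars.get(path[0])
        for member in path[1:]:
            type_name = typecache.get(type_name, dict()).get(member)
        candidates = typecache.get(type_name, dict())
    return sorted(name for name in candidates if name.startswith(stem))
@end example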

@node Tools
@section Tools

These files implement various tools for users.

@table @file
@item semantic-idle.el
Idle scheduler for @semantic{}. Manages reparsing buffers after
edits, and runs large work tasks in idle time. Includes modes for
showing summary help and pop-up completion.

@item senator.el
The @semantic{} navigator. Provides many ways to move through a
buffer based on the active tag table.

@item semantic-decorate.el
A minor mode for decorating tags based on details from the parser,
such as overlines for functions, or coloring class fields based on
their protection.

@item semantic-decorate-include.el
A decoration mode for include files, which assists users in setting up
parsing for their includes.

@item semantic-complete.el
Advanced completion prompts for reading tag names in the minibuffer, or
inline in a buffer.

@item semantic-imenu.el
Imenu support for using @semantic{} tags in imenu.

@item semantic-mru-bookmark.el
Automatic bookmarking based on tags. Jump to locations you have been
to before based on tag name.

@item semantic-sb.el
Support for @semantic{} tag usage in Speedbar.

@item semantic-util-modes.el
A collection of small minor modes that expose aspects of the
@semantic{} parser state. Includes @code{semantic-stickyfunc-mode}.

@item document.el
@itemx document-vars.el
Create and update comments for tags.

@item semantic-adebug.el
Extensions of @file{data-debug.el} for @semantic{}.

@item semantic-chart.el
Draw some charts from stats generated from parsing.

@item semantic-elp.el
Profiler for helping to optimize the @semantic{} analyzer.

@end table
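
A small most-recently-used list illustrates the bookmarking idea behind @file{semantic-mru-bookmark.el}. The Python class below is a hypothetical sketch, not that file's implementation. It remembers a position for each visited tag name, keeps the newest entries first, and forgets the oldest once a size limit is exceeded.

@example
from collections import OrderedDict


class MRUBookmarks:
    """Remember where recently visited tags are, newest first."""

    def __init__(self, limit=10):
        self.limit = limit
        self._entries = OrderedDict()  # tag name -> (file name, position)

    def visit(self, tag_name, filename, position):
        """Record a visit, moving TAG_NAME to the most recent slot."""
        self._entries.pop(tag_name, None)
        self._entries[tag_name] = (filename, position)
        while len(self._entries) > self.limit:
            self._entries.popitem(last=False)  # forget the oldest entry

    def names(self):
        """Return remembered tag names, most recent first."""
        return list(reversed(self._entries))

    def jump(self, tag_name):
        """Return the stored location of TAG_NAME and mark it as fresh."""
        filename, position = self._entries[tag_name]
        self.visit(tag_name, filename, position)
        return filename, position
@end example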

@node Tests
@section Tests

@table @file

@item semantic-utest.el
Basic testing of parsing and incremental parsing for most supported
languages.

@item semantic-ia-utest.el
Test the @semantic{} analyzer's ability to provide smart completions.

@item semantic-utest-c.el
Tests for the C parser's lexical preprocessor.

@item semantic-regtest.el
Regression tests from the older Semantic 1.x API.

@end table

@node Glossary
@appendix Glossary

@table @keyword
@item BNF
In @semantic{} 1.4, a BNF file represented ``Bovine Normal Form'', the
grammar file used for the 1.4 parser generator. The name was a play on
Backus-Naur Form, and it proved too confusing.

@item bovinate
A verb representing what happens when a bovine parser parses a file.

@item bovine lambda
In a bovine, or LL, parser, the bovine lambda is a function to execute
when a specific set of match rules has succeeded in matching text from
the buffer.

@item bovine parser
A parser using the bovine parser generator. It is an LL parser
suitable for small, simple languages.

@item context
The local environment at a location in a buffer, such as the enclosing
function, the local variables in scope, and the prefix being
completed. @xref{Analyzer Internals}.

@item LALR
Look-Ahead LR parsing: a bottom-up technique that reads the input left
to right, builds a rightmost derivation in reverse, and uses one token
of lookahead. Wisent, the port of Bison to Emacs Lisp, generates LALR
parsers.

@item lexer
A program which converts text into a stream of tokens by analyzing it
lexically. Lexers will commonly create strings, symbols, keywords and
punctuation, and strip whitespace and comments.

@item LL
A top-down parsing technique that reads the input left to right and
builds a leftmost derivation. The bovine parser is an LL parser.

@item nonterminal
A nonterminal symbol or simply a nonterminal stands for a class of
syntactically equivalent groupings. A nonterminal symbol name is used
in writing grammar rules.

@item overloadable
Some functions are defined via @code{define-overload}.
These can be overloaded via ....

@item parser
A program that converts @b{tokens} to @b{tags}.

@item tag
A tag is a representation of some entity in a language file, such as a
function, variable, or include statement. In @semantic{}, the word tag
is used the same way it is used for the etags or ctags tools.

A tag is usually bound to a buffer region via an overlay, or it simply
specifies character locations in a file.

@item token
A single atomic item returned from a lexer. It represents some set
of characters found in a buffer.

@item token stream
The output of the lexer as well as the input to the parser.

@item wisent parser
A parser using the wisent parser generator. It is a port of Bison to
Emacs Lisp. It is an LALR parser suitable for complex languages.
@end table


@node GNU Free Documentation License
@appendix GNU Free Documentation License
@include doclicense.texi

@node Index
@unnumbered Index
@printindex cp

@iftex
@contents
@summarycontents
@end iftex

@bye

@c Following comments are for the benefit of ispell.

@c LocalWords: alist API APIs arg argc args argv asis assoc autoload Wisent
@c LocalWords: backquote bnf bovinate bovinates LALR
@c LocalWords: bovinating bovination bovinator bucketize
@c LocalWords: cb cdr charquote checkcache cindex CLOS
@c LocalWords: concat concocting const constantness ctxt Decl defcustom
@c LocalWords: deffn deffnx defun defvar destructor's dfn diff dir
@c LocalWords: doc docstring EDE EIEIO elisp emacsman emph enum
@c LocalWords: eq Exp EXPANDFULL expresssion fn foo func funcall
@c LocalWords: ia ids iff ifinfo imenu imenus init int isearch itemx java kbd
@c LocalWords: keymap keywordtable lang languagemode lexer lexing Ludlam
@c LocalWords: menubar metaparent metaparents min minibuffer Misc mode's
@c LocalWords: multitable NAvigaTOR noindent nomedian nonterm noselect
@c LocalWords: nosnarf obarray OLE OO outputfile paren parsetable POINT's
@c LocalWords: popup positionalonly positiononly positionormarker pre
@c LocalWords: printf printindex Programmatically pt punctuations quotemode
@c LocalWords: ref regex regexp Regexps reparse resetfile samp sb
@c LocalWords: scopestart SEmantic semanticdb setfilename setq
@c LocalWords: settitle setupfunction sexp sp SPC speedbar speedbar's
@c LocalWords: streamorbuffer struct subalist submenu submenus
@c LocalWords: subsubsection sw sym texi texinfo titlefont titlepage
@c LocalWords: tok TOKEN's toplevel typemodifiers uml unset untar
@c LocalWords: uref usedb var vskip xref yak