1 \input texinfo
2 @setfilename ../../info/semantic
3 @set TITLE Semantic Manual
4 @set AUTHOR Eric M. Ludlam, David Ponce, and Richard Y. Kim
5 @settitle @value{TITLE}
6
7 @c *************************************************************************
8 @c @ Header
9 @c *************************************************************************
10
11 @c Merge all indexes into a single index for now.
12 @c We can always separate them later into two or more as needed.
13 @syncodeindex vr cp
14 @syncodeindex fn cp
15 @syncodeindex ky cp
16 @syncodeindex pg cp
17 @syncodeindex tp cp
18
19 @c @footnotestyle separate
20 @c @paragraphindent 2
21 @c @@smallbook
22 @c %**end of header
23
24 @copying
25 This manual documents the Semantic library and utilities.
26
27 Copyright @copyright{} 1999-2005, 2007, 2009-2012 Free Software Foundation, Inc.
28
29 @quotation
30 Permission is granted to copy, distribute and/or modify this document
31 under the terms of the GNU Free Documentation License, Version 1.3 or
32 any later version published by the Free Software Foundation; with no
33 Invariant Sections, with the Front-Cover texts being ``A GNU Manual,''
34 and with the Back-Cover Texts as in (a) below. A copy of the license
35 is included in the section entitled ``GNU Free Documentation License.''
36
37 (a) The FSF's Back-Cover Text is: ``You have the freedom to copy and
38 modify this GNU manual.''
39 @end quotation
40 @end copying
41
42 @dircategory Emacs misc features
43 @direntry
44 * Semantic: (semantic). Source code parser library and utilities.
45 @end direntry
46
47 @titlepage
48 @center @titlefont{Semantic}
49 @sp 4
50 @center by @value{AUTHOR}
51 @end titlepage
52 @page
53
54 @macro semantic{}
55 @i{Semantic}
56 @end macro
57
58 @macro keyword{kw}
59 @anchor{\kw\}
60 @b{\kw\}
61 @end macro
62
63 @macro obsolete{old,new}
64 @sp 1
65 @strong{Compatibility}:
66 @code{\new\} introduced in @semantic{} version 2.0 supersedes
67 @code{\old\} which is now obsolete.
68 @end macro
69
70 @c *************************************************************************
71 @c @ Document
72 @c *************************************************************************
73 @contents
74
75 @node top
76 @top @value{TITLE}
77
78 @semantic{} is a suite of Emacs libraries and utilities for parsing
79 source code. At its core is a lexical analyzer and two parser
80 generators (@code{bovinator} and @code{wisent}) written in Emacs Lisp.
81 @semantic{} provides a variety of tools for making use of the parser
82 output, including user commands for code navigation and completion, as
83 well as enhancements for imenu, speedbar, whichfunc, eldoc,
84 hippie-expand, and several other parts of Emacs.
85
86 To send bug reports, or participate in discussions about semantic,
87 use the mailing list cedet-semantic@@sourceforge.net via the URL:
88 @url{http://lists.sourceforge.net/lists/listinfo/cedet-semantic}
89
90 @ifnottex
91 @insertcopying
92 @end ifnottex
93
94 @menu
95 * Introduction::
96 * Using Semantic::
97 * Semantic Internals::
98 * Glossary::
99 * GNU Free Documentation License::
100 * Index::
101 @end menu
102
103 @node Introduction
104 @chapter Introduction
105
106 This chapter gives an overview of @semantic{} and its goals.
107
108 Ordinarily, Emacs uses regular expressions (and syntax tables) to
109 analyze source code for purposes such as syntax highlighting. This
110 approach, though simple and efficient, has its limitations: roughly
111 speaking, it only ``guesses'' the meaning of each piece of source code
112 in the context of the programming language, instead of rigorously
113 ``understanding'' it.
114
115 @semantic{} provides a new infrastructure to analyze source code using
116 @dfn{parsers} instead of regular expressions. It contains two
117 built-in parser generators (an @acronym{LL} generator named
118 @code{Bovine} and an @acronym{LALR} generator named @code{Wisent},
119 both written in Emacs Lisp), and parsers for several common
120 programming languages. It can also make use of @dfn{external
121 parsers}---programs such as GNU Global and GNU IDUtils.
122
123 @semantic{} provides a uniform, language-independent @acronym{API} for
124 accessing the parser output. This output can be used by other Emacs
125 Lisp programs to implement ``syntax-aware'' behavior. @semantic{}
126 itself includes several such utilities, including user-level Emacs
127 commands for navigating, searching, and completing source code.
128
129 The following diagram illustrates the structure of the @semantic{}
130 package:
131
132 @table @strong
133 @item Please Note:
134 Words in all capitals are those that @semantic{} itself provides.
135 Others are current or future languages or applications that are not
136 distributed along with @semantic{}.
137 @end table
138
139 @example
140                                                              Applications
141                                                                  and
142                                                               Utilities
143                                                                 -------
144                                                                /       \
145                +---------------+    +--------+    +--------+
146          C --->| C      PARSER |--->|        |    |        |
147                +---------------+    |        |    |        |
148                +---------------+    | COMMON |    | COMMON |<--- SPEEDBAR
149       Java --->| JAVA   PARSER |--->| PARSE  |    |        |
150                +---------------+    | TREE   |    | PARSE  |<--- SEMANTICDB
151                +---------------+    | FORMAT |    | API    |
152     Scheme --->| SCHEME PARSER |--->|        |    |        |<--- ecb
153                +---------------+    |        |    |        |
154                +---------------+    |        |    |        |
155    Texinfo --->| TEXI.  PARSER |--->|        |    |        |
156                +---------------+    |        |    |        |
157
158                     ...                ...           ...         ...
159
160                +---------------+    |        |    |        |
161    Lang. Y --->| Y      Parser |--->|        |    |        |<--- app. ?
162                +---------------+    |        |    |        |
163                +---------------+    |        |    |        |<--- app. ?
164    Lang. Z --->| Z      Parser |--->|        |    |        |
165                +---------------+    +--------+    +--------+
166 @end example
167
168 @menu
169 * Semantic Components::
170 @end menu
171
172 @node Semantic Components
173 @section Semantic Components
174
175 In this section, we provide a more detailed description of the major
176 components of @semantic{}, and how they interact with one another.
177
178 The first step in parsing a source code file is to break it up into
179 its fundamental components. This step is called lexical analysis:
180
181 @example
182         syntax table, keywords list, and options
183                          |
184                          |
185                          v
186     input file  ---->  Lexer   ----> token stream
187 @end example
188
189 @noindent
190 The output of the lexical analyzer is a list of tokens that make up
191 the file. The next step is the actual parsing, shown below:
192
193 @example
194                     parser tables
195                          |
196                          v
197     token stream --->  Parser  ----> parse tree
198 @end example
199
200 @noindent
201 The end result, the parse tree, is @semantic{}'s internal
202 representation of the parsed source code. @semantic{} provides an
203 @acronym{API} for Emacs Lisp programs to access the parse tree.
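
As a rough illustration of the two stages described above, the
following Emacs Lisp sketch runs the lexer and then the parser on the
current buffer. It assumes that @semantic{} is active in the buffer
and that a parser exists for its major mode. The two helper names are
hypothetical and are not part of @semantic{}.

@example
(require 'semantic)
(require 'semantic/lex)

;; Hypothetical helper: run only the lexical analysis step.
(defun my-semantic-token-classes ()
  "Return the class of each lexical token in the current buffer."
  (mapcar (lambda (token) (semantic-lex-token-class token))
          (semantic-lex (point-min) (point-max))))

;; Hypothetical helper: run the parser and inspect the parse tree.
(defun my-semantic-tag-summary ()
  "Return a (NAME . CLASS) pair for each top-level tag in the buffer."
  (mapcar (lambda (tag)
            (cons (semantic-tag-name tag) (semantic-tag-class tag)))
          (semantic-fetch-tags)))
@end example

@noindent
Evaluating @code{(my-semantic-tag-summary)} in a source buffer should
return pairs of tag names and classes such as @code{function} or
@code{variable}; the exact result depends on the language parser.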
204
205 Parsing large files can take several seconds or more. By default,
206 @semantic{} automatically caches parse trees by saving them in your
207 @file{.emacs.d} directory. When you revisit a previously parsed file,
208 the parse tree is automatically reloaded from this cache, to save
209 time. @xref{SemanticDB}.
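
The cache location can be set from an init file. The following is a
minimal sketch, assuming the Emacs-bundled version of @semantic{}; the
directory name @file{semanticdb} is only an illustrative choice.

@example
;; Assumption: keep cached parse trees in a "semanticdb" directory
;; under the user's Emacs directory.
(setq semanticdb-default-save-directory
      (expand-file-name "semanticdb" user-emacs-directory))

;; Enable Semantic.  With the default submodes this also turns on
;; the database minor mode that saves and reloads parse trees.
(semantic-mode 1)
@end example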
210
211 @node Using Semantic
212 @chapter Using Semantic
213
214 @include sem-user.texi
215
216 @node Semantic Internals
217 @chapter Semantic Internals
218
219 This chapter provides an overview of the internals of @semantic{}.
220 This information is usually not needed by application developers or
221 grammar developers; it is useful mostly for the hackers who would like
222 to learn more about how @semantic{} works.
223
224 @menu
225 * Parser code :: Code used for the parsers
226 * Tag handling :: Code used for manipulating tags
227 * Semanticdb Internals :: Code used in the semantic database
228 * Analyzer Internals :: Code used in the code analyzer
229 * Tools :: Code used in user tools
230 * Tests :: Code used for testing
231 @end menu
232
233 @node Parser code
234 @section Parser code
235
236 @semantic{} parsing code is spread across a range of files.
237
238 @table @file
239 @item semantic.el
240 The core infrastructure sets up buffers for parsing, and has all the
241 core parsing routines. Most parsing routines are overloadable, so the
242 actual implementation may be somewhere else.
243
244 @item semantic-edit.el
245 Incremental reparse based on user edits.
246
247 @item semantic-grammar.el
248 @itemx semantic-grammar.wy
249 Parser for the different grammar languages, and a major mode for
250 editing grammars in Emacs.
251
252 @item semantic-lex.el
253 Infrastructure for implementing lexical analyzers. Provides macros
254 for creating individual analyzers for specific features, and a way to
255 combine them together.
256
257 @item semantic-lex-spp.el
258 Infrastructure for a lexical symbolic preprocessor. This was written
259 to implement the C preprocessor, but could be used for other lexical
260 preprocessors.
261
262 @item bovine/bovine-grammar.el
263 @itemx bovine/bovine-grammar-macros.el
264 @itemx bovine/semantic-bovine.el
265 The ``bovine'' grammar. This is the first grammar mode written for
266 @semantic{} and is useful for creating simple parsers.
267
268 @item wisent/wisent.el
269 @itemx wisent/bison-wisent.el
270 @itemx wisent/semantic-wisent.el
271 @itemx wisent/semantic-debug-grammar.el
272 A port of Bison to Emacs. This infrastructure lets you create
273 LALR-based parsers for @semantic{}.
274
275 @item semantic-ast.el
276 Manage Abstract Syntax Trees for parsers.
277
278 @item semantic-debug.el
279 Infrastructure for debugging grammars.
280
281 @item semantic-util.el
282 Various utilities for manipulating tags, such as describing the tag
283 under point, adding labels, and the all-important
284 @code{semantic-something-to-tag-table}.
285
286 @end table
287
288 @node Tag handling
289 @section Tag handling
290
291 A tag represents an individual item found in a buffer, such as a
292 function or variable. Tag handling is implemented across several
293 source files.
294
295 @table @file
296 @item semantic-tag.el
297 Basic tag creation, queries, cloning, binding, and unbinding.
298
299 @item semantic-tag-write.el
300 Write a tag or tag list to a stream. These routines are used by
301 @file{semanticdb-file.el} when saving a list of tags.
302
303 @item semantic-tag-file.el
304 Files associated with tags. Goto-tag, file for include, and file for
305 a prototype.
306
307 @item semantic-tag-ls.el
308 Language-dependent features of a tag, such as parent calculation, slot
309 protection, and other states like abstract, virtual, static, and leaf.
310
311 @item semantic-dep.el
312 Include file handling. Contains the include path concepts, and
313 routines for looking up file names in the include path.
314
315 @item semantic-format.el
316 Convert a tag into a nicely formatted and colored string. Use
317 @code{semantic-test-all-format-tag-functions} to test different output
318 options.
319
320 @item semantic-find.el
321 Find tags matching different conditions in a tag table.
322 These routines are used by @file{semanticdb-find.el} once the database
323 has been converted into a simpler tag table.
324
325 @item semantic-sort.el
326 Sorting lists of tags in different ways. Includes sorting a plain
327 list of tags forward or backward. Includes binning tags based on
328 attributes (bucketize), and tag adoption for multiple references to
329 the same thing.
330
331 @item semantic-doc.el
332 Capture documentation comments from near a tag.
333
334 @end table
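
To show how these pieces fit together, here is a minimal sketch that
finds every function tag in the current buffer and formats each one as
a summary string. It assumes the Emacs-bundled feature names
(@code{semantic/find} and @code{semantic/format}) and a buffer in
which @semantic{} is active; @code{my-function-summaries} is a
hypothetical name.

@example
(require 'semantic)
(require 'semantic/find)
(require 'semantic/format)

;; Hypothetical helper combining the find and format routines.
(defun my-function-summaries ()
  "Return a summary string for each function tag in the current buffer."
  (mapcar (lambda (tag) (semantic-format-tag-summarize tag))
          (semantic-find-tags-by-class 'function (semantic-fetch-tags))))
@end example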
335
336 @node Semanticdb Internals
337 @section Semanticdb Internals
338
339 @acronym{Semanticdb} complexity is certainly an issue. It is a rather
340 hairy problem to try and solve.
341
342 @table @file
343 @item semanticdb.el
344 Defines a @dfn{database} and a @dfn{table} base class. You can
345 instantiate these classes, and use them, but they are not persistent.
346
347 This file also provides support for @code{semanticdb-minor-mode},
348 which automatically associates files with tables in databases so that
349 tags are @emph{saved} while a buffer is not in memory.
350
351 The database and tables also provide cache information and a
352 cache-flushing system. The semanticdb search routines use caches
353 to save data structures that are expensive to calculate.
354
355 Lastly, it provides the concept of @dfn{project root}. It is a system
356 by which a file can be associated with the root of a project, so if
357 you have a tree of directories and source files, it can find the root,
358 and allow a tag-search to span all available databases in that
359 directory hierarchy.
360
361 @item semanticdb-file.el
362 Provides a subclass of the basic table so that it can be saved to
363 disk. Implements all the code needed to unbind/rebind tags to a
364 buffer and to write them to a file.
365
366 @item semanticdb-el.el
367 Implements a special kind of @dfn{system} database that uses Emacs
368 internals to perform queries.
369
370 @item semanticdb-ebrowse.el
371 Implements a system database that uses Ebrowse to parse files into a
372 table that can be queried for tag names. Successful tag hits during a
373 find cause @semantic{} to pick up and parse the referenced files to
374 get the full details.
375
376 @item semanticdb-find.el
377 Infrastructure for searching groups of @semantic{} databases, and dealing
378 with the search results format.
379
380 @item semanticdb-ref.el
381 Tracks cross-references. Cross-references are needed when a buffer
382 is reparsed, so that other tables can be alerted that any dependent
383 caches may need to be flushed. References are in the form of include files.
384
385 @end table
386
387 @node Analyzer Internals
388 @section Analyzer Internals
389
390 The @semantic{} analyzer is a complex engine which has been broken
391 down across several modules. When the @semantic{} analyzer fails,
392 start with @code{semantic-analyze-debug-assist}, then dive into some
393 of these files.
394
395 @table @file
396 @item semantic-analyze.el
397 The core analyzer for defining the @dfn{current context}. The
398 current context is an object that contains references to aspects of
399 the local context including the current prefix, and a tag list
400 defining what the prefix means.
401
402 @item semantic-analyze-complete.el
403 Provides @code{semantic-analyze-possible-completions}.
404
405 @item semantic-analyze-debug.el
406 The analyzer debugger. Useful when attempting to get everything
407 configured.
408
409 @item semantic-analyze-fcn.el
410 Various support functions needed by the analyzer.
411
412 @item semantic-ctxt.el
413 Local context parser. Contains overloadable functions used to move
414 around through different scopes, get local variables, and collect the
415 current prefix used when doing completion.
416
417 @item semantic-scope.el
418 Calculate @dfn{scope} for a location in a buffer. The scope includes
419 local variables, and tag lists in scope for various reasons, such as
420 C++ using statements.
421
422 @item semanticdb-typecache.el
423 The typecache is part of @code{semanticdb}, but is used primarily by
424 the analyzer to look up datatypes and complex names. The typecache is
425 bound across source files and builds a master lookup table for data
426 type names.
427
428 @item semantic-ia.el
429 Interactive Analyzer functions. Simple routines that do completion or
430 lookups based on the results from the Analyzer. These routines are
431 meant as examples for application writers, but are quite useful as
432 they are.
433
434 @item semantic-ia-sb.el
435 Speedbar support for the analyzer, displaying context info, and
436 completion lists.
437
438 @end table
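
The way an application might drive the analyzer can be sketched as
follows. This is an illustrative example rather than code taken from
@semantic{}: it assumes the Emacs-bundled feature names and a buffer
in which @semantic{} is active, and @code{my-completion-names} is a
hypothetical name.

@example
(require 'semantic)
(require 'semantic/analyze)
(require 'semantic/analyze/complete)

;; Hypothetical helper: compute the current context at point, then
;; ask the analyzer which tags could complete the prefix there.
(defun my-completion-names ()
  "Return the names of tags that could complete the symbol at point."
  (let ((context (semantic-analyze-current-context (point))))
    (when context
      (mapcar (lambda (tag) (semantic-tag-name tag))
              (semantic-analyze-possible-completions context)))))
@end example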
439
440 @node Tools
441 @section Tools
442
443 These files contain various tools a user can use.
444
445 @table @file
446 @item semantic-idle.el
447 Idle scheduler for @semantic{}. Manages reparsing buffers after
448 edits, and large work tasks in idle time. Includes modes for showing
449 summary help and pop-up completion.
450
451 @item senator.el
452 The @semantic{} navigator. Provides many ways to move through a
453 buffer based on the active tag table.
454
455 @item semantic-decorate.el
456 A minor mode for decorating tags based on details from the parser.
457 Includes overlines for functions, or coloring class fields based on
458 protection.
459
460 @item semantic-decorate-include.el
461 A decoration mode for include files, which assists users in setting up
462 parsing for their includes.
463
464 @item semantic-complete.el
465 Advanced completion prompts for reading tag names in the minibuffer, or
466 inline in a buffer.
467
468 @item semantic-imenu.el
469 Imenu support for using @semantic{} tags in imenu.
470
471 @item semantic-mru-bookmark.el
472 Automatic bookmarking based on tags. Jump to locations you've been
473 before based on tag name.
474
475 @item semantic-sb.el
476 Support for @semantic{} tag usage in Speedbar.
477
478 @item semantic-util-modes.el
479 A collection of small minor modes that expose aspects of the semantic
480 parser state. Includes @code{semantic-stickyfunc-mode}.
481
482 @item document.el
483 @itemx document-vars.el
484 Create and update comments for tags.
485
486 @item semantic-adebug.el
487 Extensions of @file{data-debug.el} for @semantic{}.
488
489 @item semantic-chart.el
490 Draw some charts from stats generated from parsing.
491
492
493 @item semantic-elp.el
494 Profiler for helping to optimize the @semantic{} analyzer.
495
496
497 @end table
498
499 @node Tests
500 @section Tests
501
502 @table @file
503
504 @item semantic-utest.el
505 Basic testing of parsing and incremental parsing for most supported
506 languages.
507
508 @item semantic-ia-utest.el
509 Test the semantic analyzer's ability to provide smart completions.
510
511 @item semantic-utest-c.el
512 Tests for the C parser's lexical pre-processor.
513
514 @item semantic-regtest.el
515 Regression tests from the older Semantic 1.x API.
516
517 @end table
518
519 @node Glossary
520 @appendix Glossary
521
522 @table @keyword
523 @item BNF
524 In semantic 1.4, a BNF file represented ``Bovine Normal Form'', the
525 grammar file used for the 1.4 parser generator. This was a play on
526 Backus-Naur Form which proved too confusing.
527
528 @item bovinate
529 A verb representing what happens when a bovine parser parses a file.
530
531 @item bovine lambda
532 In a bovine, or LL parser, the bovine lambda is a function to execute
533 when a specific set of match rules has succeeded in matching text from
534 the buffer.
535
536 @item bovine parser
537 A parser using the bovine parser generator. It is an LL parser
538 suitable for small simple languages.
539
540 @item context
541
542 @item LALR
543
544 @item lexer
545 A program that converts text into a stream of tokens by analyzing
546 it lexically. Lexers commonly create string, symbol, keyword, and
547 punctuation tokens, and strip whitespace and comments.
548
549 @item LL
550
551 @item nonterminal
552 A nonterminal symbol or simply a nonterminal stands for a class of
553 syntactically equivalent groupings. A nonterminal symbol name is used
554 in writing grammar rules.
555
556 @item overloadable
557 Some functions are defined via @code{define-overload}.
558 These can be overloaded via ....
559
560 @item parser
561 A program that converts @b{tokens} to @b{tags}.
562
563 @item tag
564 A tag is a representation of some entity in a language file, such as a
565 function, variable, or include statement. In semantic, the word tag is
566 used the same way it is used for the etags or ctags tools.
567
568 A tag is usually bound to a buffer region via an overlay, or it just
569 specifies character locations in a file.
570
571 @item token
572 A single atomic item returned from a lexer. It represents some set
573 of characters found in a buffer.
574
575 @item token stream
576 The output of the lexer as well as the input to the parser.
577
578 @item wisent parser
579 A parser using the wisent parser generator. It is a port of Bison to
580 Emacs Lisp. It is an LALR parser suitable for complex languages.
581 @end table
582
583
584 @node GNU Free Documentation License
585 @appendix GNU Free Documentation License
586 @include doclicense.texi
587
588 @node Index
589 @unnumbered Index
590 @printindex cp
591
592 @iftex
593 @contents
594 @summarycontents
595 @end iftex
596
597 @bye
598
599 @c Following comments are for the benefit of ispell.
600
601 @c LocalWords: alist API APIs arg argc args argv asis assoc autoload Wisent
602 @c LocalWords: backquote bnf bovinate bovinates LALR
603 @c LocalWords: bovinating bovination bovinator bucketize
604 @c LocalWords: cb cdr charquote checkcache cindex CLOS
605 @c LocalWords: concat concocting const constantness ctxt Decl defcustom
606 @c LocalWords: deffn deffnx defun defvar destructor's dfn diff dir
607 @c LocalWords: doc docstring EDE EIEIO elisp emacsman emph enum
608 @c LocalWords: eq Exp EXPANDFULL expression fn foo func funcall
609 @c LocalWords: ia ids iff ifinfo imenu imenus init int isearch itemx java kbd
610 @c LocalWords: keymap keywordtable lang languagemode lexer lexing Ludlam
611 @c LocalWords: menubar metaparent metaparents min minibuffer Misc mode's
612 @c LocalWords: multitable NAvigaTOR noindent nomedian nonterm noselect
613 @c LocalWords: nosnarf obarray OLE OO outputfile paren parsetable POINT's
614 @c LocalWords: popup positionalonly positiononly positionormarker pre
615 @c LocalWords: printf printindex Programmatically pt quotemode
616 @c LocalWords: ref regex regexp Regexps reparse resetfile samp sb
617 @c LocalWords: scopestart SEmantic semanticdb setfilename setq
618 @c LocalWords: settitle setupfunction sexp sp SPC speedbar speedbar's
619 @c LocalWords: streamorbuffer struct subalist submenu submenus
620 @c LocalWords: subsubsection sw sym texi texinfo titlefont titlepage
621 @c LocalWords: tok TOKEN's toplevel typemodifiers uml unset untar
622 @c LocalWords: uref usedb var vskip xref yak