1 <!doctype debiandoc PUBLIC "-//DebianDoc//DTD DebianDoc//EN">
2 <book>
3 <title>dpkg technical manual</title>
4
5 <author>Tom Lees <email>tom@lpsg.demon.co.uk</email></author>
6 <version>$Id: dpkg-tech.sgml,v 1.3 2003/02/12 15:05:45 doogie Exp $</version>
7
8 <abstract>
9 This document describes the minimum necessary workings for the APT dselect
10 replacement. It gives an overall specification of what its external interface
11 must look like for compatibility, and also gives details of some internal
12 quirks.
13 </abstract>
14
15 <copyright>
16 Copyright &copy; Tom Lees, 1997.
17 <p>
18 APT and this document are free software; you can redistribute them and/or
19 modify them under the terms of the GNU General Public License as published
20 by the Free Software Foundation; either version 2 of the License, or (at your
21 option) any later version.
22
23 <p>
24 For more details, on Debian systems, see the file
25 /usr/share/common-licenses/GPL for the full license.
26 </copyright>
27
28 <toc sect>
29
30 <chapt>Quick summary of dpkg's external interface
31 <sect id="control">Control files
32
33 <p>
34 The basic dpkg package control file supports the following major features:-
35
36 <list>
37 <item>6 types of dependencies:-
38 <list>
39 <item>Pre-Depends, which must be satisfied before a package may be
40 unpacked
41 <item>Depends, which must be satisfied before a package may be
42 configured
43 <item>Recommends, to specify a package which if not installed may
44 severely limit the usefulness of the package
45 <item>Suggests, to specify a package which may increase the
46 productivity of the package
47 <item>Conflicts, to specify a package which must NOT be installed
48 in order for the package to be configured
49 <item>Breaks, to specify a package which is broken by the
50 package and which should therefore not be configured while broken
51 </list>
52 Each of these dependencies can specify a version and a dependency on that
53 version, for example "<= 0.5-1", "= 2.7.2-1", etc. The comparators available
54 are:-
55 <list>
56 <item>"&lt;&lt;" - less than
57 <item>"&lt;=" - less than or equal to
58 <item>"&gt;&gt;" - greater than
59 <item>"&gt;=" - greater than or equal to
60 <item>"=" - equal to
61 </list>
62 <item>The concept of "virtual packages", which many other packages may provide,
63 using the Provides mechanism. An example of this is the "httpd" virtual package,
64 which all web servers should provide. Virtual package names may be used in
65 dependency headers. However, current policy is that virtual packages do not
66 support version numbers, so dependencies on virtual packages with versions
67 will always fail.
68 <item>Several other control fields, such as Package, Version, Description,
69 Section, Priority, etc., which are mainly for classification purposes. The
70 package name must consist entirely of lowercase characters, plus the characters
71 '+', '-', and '.'. Fields can extend across multiple lines - on the second
72 and subsequent lines, there is a space at the beginning instead of a field
73 name and a ':'. Empty lines must consist of the text " .", which will be
74 ignored, as will the initial space for other continuation lines. This feature
75 is usually only used in the Description field.
76 </list>
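<p>
The continuation-line rule above can be sketched in a few lines of Python.
This is only an illustration of the rule as described here, not dpkg's own
parser; the function name <tt>parse_stanza</tt> and the sample package are
made up for the example, and it handles a single stanza only.

```python
def parse_stanza(text):
    """Parse one control stanza into a dict of field name -> value.

    Hypothetical helper: follows the rules in the text above, nothing more.
    """
    fields = {}
    current = None
    for line in text.splitlines():
        if line.startswith(" "):
            # Continuation line: the leading space is dropped, and a line
            # consisting of " ." stands for an empty line.
            if current is None:
                raise ValueError("continuation line before any field")
            body = line[1:]
            fields[current] += "\n" + ("" if body == "." else body)
        else:
            name, sep, value = line.partition(":")
            if not sep:
                raise ValueError("missing ':' in %r" % line)
            current = name.strip()
            fields[current] = value.strip()
    return fields


SAMPLE = (
    "Package: hello\n"
    "Version: 1.0-1\n"
    "Description: example greeter\n"
    " Prints a greeting.\n"
    " .\n"
    " Second paragraph.\n"
)
```

Calling <tt>parse_stanza(SAMPLE)</tt> gives a Description value of four
lines, the third of which is empty.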
77
78 <sect>The dpkg status area
79
80 <p>
81 The "dpkg status area" is the term used to refer to the directory where dpkg
82 keeps its various status files (GNU would have you call it the dpkg shared
83 state directory). This is always, on Debian systems, /var/lib/dpkg. However,
84 the default directory name should not be hard-coded, but #define'd, so that
85 alteration is possible (it is available via configure in dpkg 1.4.0.9 and
86 above). Of course, in a library, code should be allowed to override the
87 default directory, but the default should be part of the library (so that
88 the user may change the dpkg admin dir simply by replacing the library).
89
90 <p>
91 Dpkg keeps a variety of files in its status area. These are discussed later
92 on in this document, but a quick summary of the files is here:-
93
94 <list>
95 <item>available - this file contains a concatenation of control information
96 from all the packages which dpkg knows about. This is updated using the dpkg
97 commands "--update-avail &lt;file&gt;", "--merge-avail &lt;file&gt;", and
98 "--clear-avail".
99 <item>status - this file contains information on the following things for
100 every package:-
101 <list>
102 <item>Whether it is installed, not installed, unpacked, removed,
103 failed configuration, or half-installed (deconfigured in
104 favour of another package).
105 <item>Whether it is selected as install, hold, remove, or purge.
106 <item>If it is "ok" (no installation problems), or "not-ok".
107 <item>It usually also contains the section and priority (so that
108 dselect may classify packages not in available)
109 <item>For packages which did not initially appear in the "available"
110 file when they were installed, the other control information
111 for them.
112 </list>
113 <p>
114 The exact format for the "Status:" field is:
115 <example>
116 Status: Want Flag Status
117 </example>
118 Where <var>Want</> may be one of <em>unknown</>, <em>install</>,
119 <em>hold</>, <em>deinstall</>, <em>purge</>. <var>Flag</>
120 may be one of <em>ok</>, <em>reinstreq</>, <em>hold</>,
121 <em>hold-reinstreq</>.
122 <var>Status</> may be one of <em>not-installed</>, <em>unpacked</>,
123 <em>half-configured</>, <em>installed</>, <em>half-installed</>,
124 <em>config-files</>, <em>post-inst-failed</>, <em>removal-failed</>.
125 The states are as follows:-
126 <taglist>
127 <tag>not-installed
128 <item>No files are installed from the package, it has no config files
129 left, it uninstalled cleanly if it ever was installed.
130 <tag>unpacked
131 <item>The basic files have been unpacked (and are listed in
132 /var/lib/dpkg/info/[package].list). There are config files present,
133 but the postinst script has _NOT_ been run.
134 <tag>half-configured
135 <item>The package was installed and unpacked, but the postinst script
136 failed in some way.
137 <tag>installed
138 <item>All files for the package are installed, and the configuration
139 was also successful.
140 <tag>half-installed
141 <item>An attempt was made to remove the package, but there was a failure
142 in the prerm script.
143 <tag>config-files
144 <item>The package was "removed", not "purged". The config files are left,
145 but nothing else.
146 <tag>post-inst-failed
147 <item>Old name for half-configured. Do not use.
148 <tag>removal-failed
149 <item>Old name for half-installed. Do not use.
150 </taglist>
151 The last two items are only left in dpkg for compatibility - they are
152 understood by it, but never written out in this form.
153
154 <p>
155 Please see the dpkg source code, <tt>lib/parsehelp.c</tt>,
156 <em>statusinfos</>, <em>eflaginfos</> and <em>wantinfos</> for more
157 details.
158
159 <item>info - this directory contains files from the control archive of every
160 package currently installed. They are installed with a prefix of "&lt;packagename&gt;.".
161 In addition to this, it also contains a file called &lt;package&gt;.list for every
162 package, which contains a list of files. Note also that the control file is
163 not copied into here; it is instead found as part of status or available.
164 <item>methods - this directory is reserved for "method"-specific files - each
165 "method" has a subdirectory underneath this directory (or at least, it can
166 have). In addition, there is another subdirectory "mnt", where misc.
167 filesystems (floppies, CD-ROMs, etc.) are mounted.
168 <item>alternatives - directory used by the "update-alternatives" program. It
169 contains one file for each "alternatives" interface, which contains information
170 about all the needed symlinked files for each alternative.
171 <item>diversions - file used by the "dpkg-divert" program. Each diversion takes
172 three lines. The first is the original filename, the second the diverted
173 filename, and the third the package name (or ":" for user diversion).
174 <item>updates - directory used internally by dpkg. This is discussed later,
175 in the section <ref id="updates">.
176 <item>parts - temporary directory used by dpkg-split
177 </list>
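<p>
As an illustration of the "Status:" field format described above, here is a
minimal Python sketch which splits the field into its three words and maps the
two old state names onto their current equivalents. The function name
<tt>parse_status</tt> is an assumption of this example; the word lists are
copied from the text above.

```python
WANT = {"unknown", "install", "hold", "deinstall", "purge"}
FLAG = {"ok", "reinstreq", "hold", "hold-reinstreq"}
STATUS = {
    "not-installed", "unpacked", "half-configured", "installed",
    "half-installed", "config-files", "post-inst-failed", "removal-failed",
}
# Old names are understood on input but never written out again.
OLD_NAMES = {
    "post-inst-failed": "half-configured",
    "removal-failed": "half-installed",
}


def parse_status(line):
    """Return (want, flag, status) from a 'Status: Want Flag Status' line."""
    name, sep, value = line.partition(":")
    if not sep or name.strip().lower() != "status":
        raise ValueError("not a Status field: %r" % line)
    words = value.split()
    if len(words) != 3:
        raise ValueError("expected three words, got %r" % words)
    want, flag, status = words
    if want not in WANT or flag not in FLAG or status not in STATUS:
        raise ValueError("unknown value in %r" % line)
    return want, flag, OLD_NAMES.get(status, status)
```

For example, <tt>parse_status("Status: purge ok removal-failed")</tt> reports
the state as half-installed.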
178
179 <sect>The dpkg library files
180
181 <p>
182 These files are installed under /usr/lib/dpkg (usually), but
183 /usr/local/lib/dpkg is also a possibility (as Debian policy dictates). Under
184 this directory, there is a "methods" subdirectory. The methods subdirectory
185 in turn contains any number of subdirectories for each general method
186 processor (note that one set of method scripts can be, and is, used for more than
187 one of the methods listed under dselect).
188
189 <p>
190 The following files may be found in each of these subdirectories:-
191
192 <list>
193 <item>names - One line per method, beginning with the two-digit priority
194 used to order the menu, followed by a space, the name, and then another
195 space and the short description.
196 <item>desc.&lt;name&gt; - Contains the long description displayed by dselect
197 when the cursor is put over the &lt;name&gt; method.
198 <item>setup - Script or program which sets up the initial values to be used
199 by this method. Called with first argument as the status area directory
200 (/var/lib/dpkg), second argument as the name of the method (as in the directory
201 name), and the third argument as the option (as in the names file).
202 <item>install - Script/program called when the "install" option of dselect is
203 run with this method. Same arguments as for setup.
204 <item>update - Script/program called when the "update" option of dselect is
205 run. Same arguments as for setup/install.
206 </list>
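<p>
A line of the "names" file, as described above, could be split up roughly as
follows. This is a sketch only; the function name and the sample line used
below are invented for illustration and are not taken from any real method.

```python
def parse_names_line(line):
    """Split 'NN name short description' into (priority, name, description)."""
    line = line.rstrip("\n")
    # Split on the first two spaces only, so that the description may
    # itself contain spaces.
    priority, name, description = line.split(" ", 2)
    if len(priority) != 2 or not (priority.isascii() and priority.isdigit()):
        raise ValueError("priority must be two digits: %r" % priority)
    return int(priority), name, description
```

With a hypothetical line such as "35 cdrom Install from a CD-ROM", this
returns the priority 35, the name "cdrom" and the rest of the line as the
short description.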
207
208 <sect>The "dpkg" command-line utility
209
210 <sect1>"Documented" command-line interfaces
211
212 <p>
213 As yet unwritten. You can refer to the other manuals for now. See
214 <manref name="dpkg" section="8">.
215
216 <sect1>Environment variables which dpkg responds to
217
218 <p>
219 <list>
220 <item>DPKG_NO_TSTP - if set to a non-null value, this variable causes dpkg to
221 run a child shell process instead of sending itself a SIGTSTP, when the user
222 selects to background the dpkg process when it asks about conffiles.
223 <item>SHELL - used to determine which shell to run in the case when
224 DPKG_NO_TSTP is set.
225 <item>CC - used as the C compiler to call to determine the target architecture.
226 The default is "gcc".
227 <item>PATH - dpkg checks that it can find at least the following files in the
228 path when it wants to run package installation scripts, and gives an error if
229 it cannot find all of them:-
230 <list>
231 <item>ldconfig
232 <item>start-stop-daemon
233 <item>install-info
234 <item>update-rc.d
235 </list>
236 </list>
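<p>
The PATH check above amounts to looking each program up in the search path.
A rough Python equivalent, assuming nothing about how dpkg itself implements
it, might be the following; <tt>missing_programs</tt> is a name made up for
this example.

```python
import shutil

# The programs listed in the text above.
REQUIRED = ("ldconfig", "start-stop-daemon", "install-info", "update-rc.d")


def missing_programs(path=None, required=REQUIRED):
    """Return the required programs which cannot be found in the path.

    path=None means "use the PATH environment variable".
    """
    return [name for name in required
            if shutil.which(name, path=path) is None]
```

A caller would give an error and refuse to run the installation scripts if
the returned list is not empty.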
237
238 <sect1>Assertions
239
240 <p>
241 The dpkg utility itself is required for quite a number of packages, even if
242 they have been installed with a tool totally separate from dpkg. The reason for
243 this is that some packages, in their pre-installation scripts, check that your
244 version of dpkg supports certain features. This was broken from the start, and
245 it should have actually been a control file header "Dpkg-requires", or similar.
246 What happens is that the configuration scripts will abort or continue according
247 to the exit code of a call to dpkg, which will stop them from being wrongly
248 configured.
249
250 <p>
251 These special command-line options, which simply return true or false, are
252 all prefixed with "--assert-". Here is a list of them (without the prefix):-
253
254 <list>
255 <item>support-predepends - Returns success or failure according to whether
256 a version of dpkg which supports predepends properly (1.1.0 or above) is
257 installed, according to the database.
258 <item>working-epoch - Returns success or failure according to whether a version
259 of dpkg which supports epochs in versions properly (1.4.0.7 or above) is
260 installed, according to the database.
261 </list>
262
263 <p>
264 Both these options check the status database to see what version of the "dpkg"
265 package is installed, and check it against a known working version.
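<p>
From a program's point of view, an assertion is just a call to dpkg whose
exit code is examined. A minimal sketch in Python follows; the function name
<tt>dpkg_assert</tt> and its <tt>dpkg</tt> parameter (which exists so that a
different command can be substituted) are assumptions of this example.

```python
import subprocess


def dpkg_assert(feature, dpkg=("dpkg",)):
    """Run 'dpkg --assert-<feature>' and report whether it succeeded.

    'dpkg' is the command prefix to run; it is a parameter only so that
    the sketch can be tried without dpkg being installed.
    """
    result = subprocess.run(
        list(dpkg) + ["--assert-" + feature],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

A pre-installation script would then abort if, say,
<tt>dpkg_assert("working-epoch")</tt> returned false.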
266
267 <sect1>--predep-package
268
269 <p>
270 This strange option is described as follows in the source code:
271
272 <example>
273 /* Print a single package which:
274 * (a) is the target of one or more relevant predependencies.
275 * (b) has itself no unsatisfied pre-dependencies.
276 * If such a package is present output is the Packages file entry,
277 * which can be massaged as appropriate.
278 * Exit status:
279 * 0 = a package printed, OK
280 * 1 = no suitable package available
281 * 2 = error
282 */
283 </example>
284
285 <p>
286 On further inspection of the source code, it appears that what it does is
287 this:-
288
289 <list>
290 <item>Looks at the packages in the database which are selected as "install",
291 and are installed.
292 <item>It then looks at the Pre-Depends information for each of these packages
293 from the available file. When it finds a package for which any of the
294 pre-dependencies are not satisfied, it breaks from the loop through the packages.
295 <item>It then looks through the unsatisfied pre-dependencies, and looks for
296 packages which would satisfy this pre-dependency, stopping on the first it
297 finds. If it finds none, it bombs out with an error.
298 <item>It then continues this for every dependency of the initial package.
299 </list>
300
301 Eventually, it writes out the record of all the packages to satisfy the
302 pre-dependencies. This is used by the disk method to make sure that its
303 dependency ordering is correct. What happens is that all pre-depending
304 packages are first installed, then it runs dpkg -iGROEB on the directory,
305 which installs in the order package files are found. Since pre-dependencies
306 mean that a package may not even be unpacked unless they are satisfied, it is
307 necessary to do this (usually, since all the package files are unpacked in one
308 phase, then configured in another, this is not needed).
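<p>
A front-end which wants to use --predep-package only needs to act on the
three exit statuses listed in the source comment above. The following Python
sketch shows one way of doing so; the function name and the
<tt>command</tt> parameter are assumptions made for the example, and treating
any status other than 0 or 1 as an error is my own choice.

```python
import subprocess


def next_predep_entry(command=("dpkg", "--predep-package")):
    """Return the printed Packages file entry, or None if no package is suitable."""
    proc = subprocess.run(list(command), stdout=subprocess.PIPE, text=True)
    if proc.returncode == 0:
        return proc.stdout      # 0 = a package printed, OK
    if proc.returncode == 1:
        return None             # 1 = no suitable package available
    raise RuntimeError("command failed with status %d" % proc.returncode)
```

An installation method could call this in a loop, installing the package
named in each entry, until it returns None.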
309
310 <chapt>dpkg-deb and .deb file internals
311
312 <p>
313 This chapter describes the internals of the "dpkg-deb" tool, which is used
314 by "dpkg" as a back-end. dpkg-deb has its own tar extraction functions, which
315 are the source of many problems, as they do not support long filenames that
316 use extension blocks.
317
318 <sect>The .deb archive format
319
320 <p>
321 The main principle of the new-format Debian archive (I won't describe the old
322 format - for that have a look at deb-old.5), is that the archive really is
323 an archive - as used by "ar" and friends. However, dpkg-deb uses this format
324 internally, rather than calling "ar". Inside this archive, there are usually
325 the following members:-
326
327 <list>
328 <item>debian-binary
329 <item>control.tar.gz
330 <item>data.tar.gz
331 </list>
332
333 <p>
334 The debian-binary member consists simply of the string "2.0", indicating the
335 format version. control.tar.gz contains the control files (and scripts), and
336 the data.tar.gz contains the actual files to populate the filesystem with.
337 Both tarfiles extract straight into the current directory. Information on the
338 tar formats can be found in the GNU tar info page. Since dpkg-deb calls
339 "tar -cf" to build packages, the Debian packages use the GNU extensions.
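<p>
Because the outer container is an ordinary "ar" archive, its members can be
listed without any help from dpkg-deb. The sketch below reads the common ar
layout (an 8-byte global magic, then a 60-byte header before each member,
with member data padded to an even length). It is an illustration only, does
not handle long-name extensions, and the function name
<tt>list_ar_members</tt> is made up.

```python
AR_MAGIC = b"!<arch>\n"


def list_ar_members(stream):
    """Return a list of (name, data) pairs from a binary ar archive stream."""
    if stream.read(len(AR_MAGIC)) != AR_MAGIC:
        raise ValueError("not an ar archive")
    members = []
    while True:
        header = stream.read(60)
        if not header:
            break
        if len(header) != 60 or header[58:60] != b"`\n":
            raise ValueError("corrupt member header")
        # Header layout: name(16) mtime(12) uid(6) gid(6) mode(8) size(10).
        # Some ar implementations end the name with '/', so strip that too.
        name = header[0:16].decode("ascii").rstrip(" /")
        size = int(header[48:58].decode("ascii").strip())
        data = stream.read(size)
        if len(data) != size:
            raise ValueError("truncated member %r" % name)
        if size % 2:
            stream.read(1)  # skip the padding byte after odd-sized data
        members.append((name, data))
    return members
```

Opening a .deb file in binary mode and passing it to this function should
yield the three members named above, the first of which holds the format
version.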
340
341 <sect>The dpkg-deb command-line
342
343 <p>
344 dpkg-deb documents itself thoroughly with its '--help' command-line option.
345 However, I am including a reference to these for completeness. dpkg-deb
346 supports the following options:-
347
348 <list>
349 <item>--build (-b) &lt;dir&gt; - builds a .deb archive, takes a directory which
350 contains all the files as an argument. Note that the directory
351 &lt;dir&gt;/DEBIAN will be packed separately into the control archive.
352 <item>--contents (-c) &lt;debfile&gt; - Lists the contents of the "data.tar.gz"
353 member.
354 <item>--control (-e) &lt;debfile&gt; - Extracts the control archive into a
355 directory called DEBIAN. Alternatively, with another argument, it will extract
356 it into a different directory.
357 <item>--info (-I) &lt;debfile&gt; - Prints the contents of the "control" file
358 in the control archive to stdout. Alternatively, giving it other arguments will
359 cause it to print the contents of those files instead.
360 <item>--field (-f) &lt;debfile&gt; &lt;field&gt; ... - Prints any number of
361 fields from the "control" file. Giving it extra arguments limits the fields it
362 prints to only those specified. With no command-line arguments other than a
363 filename, it is equivalent to -I and just the .deb filename.
364 <item>--extract (-x) &lt;debfile&gt; &lt;dir&gt; - Extracts the data archive
365 of a debian package under the directory &lt;dir&gt;.
366 <item>--vextract (-X) &lt;debfile&gt; &lt;dir&gt; - Same as --extract, except
367 it is the equivalent of giving tar the '-v' option - it prints the filenames as
368 it extracts them.
369 <item>--fsys-tarfile &lt;debfile&gt; - This option outputs a gunzip'd version
370 of data.tar.gz to stdout.
371 <item>--new - sets the archive format to be used to the new Debian format
372 <item>--old - sets the archive format to be used to the old Debian format
373 <item>--debug - Tells dpkg-deb to produce debugging output
374 <item>--nocheck - Tells dpkg-deb not to check the sanity of the control file
375 <item>--help (-h) - Gives a help message
376 <item>--version - Shows the version number
377 <item>--licence/--license (UK/US spellings) - Shows a brief outline of the GPL
378 </list>
379
380 <sect1>Internal checks used by dpkg-deb when building packages
381
382 <p>
383 Here is a list of the internal checks used by dpkg-deb when building packages.
384 It is in the order they are done.
385
386 <list>
387 <item>First, the output Debian archive argument, if it is given, is checked
388 using stat. If it is a directory, an internal flag is set. This check is only
389 made if the archive name is specified explicitly on the command-line. If the
390 argument was not given, the default is the directory name, with ".deb"
391 appended.
392 <item>Next, the control file is checked, unless the --nocheck flag was
393 specified on the command-line. dpkg-deb will bomb out if the second argument
394 to --build was a directory, and --nocheck was specified. Note that dpkg-deb
395 will not be able to determine the name of the package in this case. In the
396 control file, the following things are checked:-
397 <list>
398 <item>The package name is checked to see if it contains any invalid
399 characters (see <ref id="control"> for this).
400 <item>The priority field is checked to see if it uses standard values,
401 and user-defined values are warned against. However, note that this
402 check is now redundant, since the control file no longer contains
403 the priority - the changes file now does this.
404 <item>The control file fields are then checked against the standard
405 list of fields which appear in control files, and any "user-defined"
406 fields are reported as warnings.
407 <item>dpkg-deb then checks that the control file contains a valid
408 version number.
409 </list>
410 <item>After this, in the case where a directory was specified to build the
411 .deb file in, the filename is created as "directory/pkg_ver.deb" or
412 "directory/pkg_ver_arch.deb", depending on whether the control file contains
413 an architecture field.
414 <item>Next, dpkg-deb checks for the &lt;dir&gt;/DEBIAN directory. It complains
415 if it doesn't exist, or if it has permissions &lt; 0755, or &gt; 0775.
416 <item>It then checks that all the files in this subdir are either symlinks
417 or plain files, and have permissions between 0555 and 0775.
418 <item>The conffiles file is then checked to see if the filenames are too
419 long. Warnings are produced for each that is. After this, it checks that
420 the package provides initial copies of each of these conffiles, and that
421 they are all plain files.
422 </list>
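<p>
Two of the checks above are easy to show in code: the construction of the
output filename, and the permission test on the DEBIAN directory. The Python
sketch below takes the permission ranges literally as numeric ranges, which
is my reading of the text above and may not be how dpkg-deb tests the bits
internally. Both function names are invented for the example.

```python
import os
import stat


def output_name(directory, package, version, architecture=None):
    """Build 'directory/pkg_ver.deb' or 'directory/pkg_ver_arch.deb'."""
    parts = [package, version]
    if architecture:
        parts.append(architecture)
    return os.path.join(directory, "_".join(parts) + ".deb")


def mode_in_range(mode, low, high):
    """Compare only the permission bits of a st_mode value with a range."""
    return low <= stat.S_IMODE(mode) <= high


def control_dir_mode_ok(mode):
    # Literal reading of "complains if < 0755 or > 0775" (assumption).
    return mode_in_range(mode, 0o755, 0o775)
```

The same <tt>mode_in_range</tt> helper, called with 0o555 and 0o775, covers
the check on the files inside the DEBIAN directory.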
423
424 <chapt>dpkg internals
425
426 <p>
427 This chapter describes the internals of dpkg itself. Although the low-level
428 formats are quite simple, what dpkg does in certain cases often does not
429 make sense.
430
431 <sect id="updates">Updates
432
433 <p>
434 This describes the /var/lib/dpkg/updates directory. The function of this
435 directory is somewhat strange, and seems only to be used internally. A function
436 called cleanupdates is called whenever the database is scanned. This function
437 in turn uses <manref name="scandir" section="3">, to sort the files in this
438 directory. Files whose names do not consist entirely of digits are discarded.
439 dpkg also causes a fatal error if any of the filenames are different lengths.
440
441 <p>
442 After having scanned the directory, dpkg in turn parses each file the same way
443 it parses the status file (they are sorted by the scandir to be in numerical
444 order). After having done this, it then writes the status information back
445 to the "status" file, and removes all the "updates" files.
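<p>
The directory scan described above can be imitated in Python as follows.
This is a sketch of the filtering and ordering rules only; it is not a
translation of cleanupdates, and the function name is an assumption.

```python
import os


def list_update_files(updates_dir):
    """Return the all-digit filenames in updates_dir, in numerical order."""
    names = [name for name in os.listdir(updates_dir)
             if name.isascii() and name.isdigit()]
    # dpkg treats names of differing lengths as a fatal error.
    if len({len(name) for name in names}) > 1:
        raise RuntimeError("update files have names of different lengths")
    # With equal lengths, plain string order is also numerical order.
    return sorted(names)
```

Each file in the returned list would then be parsed like the status file,
in that order.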
446
447 <p>
448 These files are created internally by dpkg's "checkpoint" function, and are
449 cleaned up when dpkg exits cleanly.
450
451 <p>
452 Judging by the use of the updates directory I would call it a Journal. In order
453 to efficiently ensure the complete integrity of the status file dpkg will
454 "checkpoint" or journal all of its activities in the updates directory. By
455 merging the contents of the updates directory (in order!!) against the
456 original status file it can get the precise current state of the system,
457 even in the event of a system failure while dpkg is running.
458
459 <p>
460 The other option would be to sync-rewrite the status file after each
461 operation, which would kill performance.
462
463 <p>
464 It is very important that any program that uses the status file abort if
465 the updates directory is not empty! The user should be informed to run dpkg
466 manually (what options though??) to correct the situation.
467
468 <sect>What happens when dpkg reads the database
469
470 <p>
471 First, the status file is read. This gives dpkg an initial idea of the packages
472 that are there. Next, the updates files are read in, overriding the status
473 file, and if necessary, the status file is re-written, and updates files are
474 removed. Finally, the available file is read. The available file is read
475 with flags which preclude dpkg from updating any status information
476 (installed version, etc.) from it, and which tell it to record that the
477 packages it reads this time are available, not installed.
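<p>
The way the updates files override the status file can be pictured with
plain dictionaries. In this sketch each file has already been parsed into a
mapping from package name to record; that representation, and the function
name <tt>merge_updates</tt>, are assumptions made purely for illustration.

```python
def merge_updates(status, updates):
    """Apply parsed updates files, in order, on top of the parsed status file.

    status:  dict of package name -> record
    updates: list of such dicts, one per updates file, in numerical order
    """
    merged = dict(status)       # leave the caller's status data untouched
    for update in updates:
        merged.update(update)   # a later record replaces an earlier one
    return merged
```

The result is what would be written back to the status file before the
updates files are removed.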
478
479 <p>
480 More information on updates is given above.
481
482 <sect>How dpkg compares version numbers
483
484 <p>
485 Version numbers consist of three parts: the epoch, the upstream version, and
486 the Debian revision. Dpkg compares these parts in that order. If the epochs
487 are different, it returns immediately, and so on.
488
489 <p>
490 However, the important part is how it compares the versions which are
491 essentially stored as just strings. These are compared in two distinct parts:
492 those consisting of numerical characters (which are evaluated, and then
493 compared), and those consisting of other characters. When comparing
494 non-numerical parts, they are compared as the character values (ASCII), but
495 non-alphabetical characters are considered "greater than" alphabetical ones.
496 Also note that longer strings (after excluding differences where numerical
497 values are equal) are considered "greater than" shorter ones.
498
499 <p>
500 Here are a few examples of how these rules apply:-
501
502 <example>
503 15 > 10
504 0010 == 10
505
506 d.r > dsr
507 32.d.r == 0032.d.r
508 d.rnr < d.rnrn
509 </example>
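<p>
The comparison rules in this section can be written down compactly. The
Python sketch below follows only the rules as described here (letters before
other characters, digit runs compared by value, longer string greater); it
makes no claim to match every detail of any particular dpkg release, and all
the function names are made up.

```python
def _is_digit(ch):
    return "0" <= ch <= "9"


def _weight(ch):
    # The end of a non-digit run sorts lowest, ASCII letters sort by their
    # character value, and every other character sorts after all letters.
    if ch == "":
        return 0
    if ch.isascii() and ch.isalpha():
        return ord(ch)
    return ord(ch) + 256


def compare_part(a, b):
    """Compare two version strings; return -1, 0 or 1."""
    i = j = 0
    while True:
        # Non-digit runs are compared character by character.
        while ((i < len(a) and not _is_digit(a[i])) or
               (j < len(b) and not _is_digit(b[j]))):
            ca = a[i] if i < len(a) and not _is_digit(a[i]) else ""
            cb = b[j] if j < len(b) and not _is_digit(b[j]) else ""
            if _weight(ca) != _weight(cb):
                return -1 if _weight(ca) < _weight(cb) else 1
            i += 1
            j += 1
        # Digit runs are evaluated and compared as numbers.
        start_i, start_j = i, j
        while i < len(a) and _is_digit(a[i]):
            i += 1
        while j < len(b) and _is_digit(b[j]):
            j += 1
        na = int(a[start_i:i] or "0")
        nb = int(b[start_j:j] or "0")
        if na != nb:
            return -1 if na < nb else 1
        if i == len(a) and j == len(b):
            return 0
        if i == len(a):
            return -1
        if j == len(b):
            return 1


def compare_versions(a, b):
    """Compare epoch, then upstream version, then Debian revision."""
    def split(version):
        epoch, sep, rest = version.partition(":")
        if not sep:
            epoch, rest = "0", version
        upstream, sep, revision = rest.rpartition("-")
        if not sep:
            upstream, revision = rest, ""
        return int(epoch), upstream, revision

    ea, ua, ra = split(a)
    eb, ub, rb = split(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return compare_part(ua, ub) or compare_part(ra, rb)
```

Running <tt>compare_part</tt> on the pairs in the example above gives the
results listed there.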
510
511 </book>