Copyright 2000, International Business Machines Corporation and others.
All Rights Reserved.

This software has been released under the terms of the IBM Public
License. For details, see the LICENSE file in the top-level source
directory or online at http://www.openafs.org/dl/license10.html

Here's a quick guide to understanding the AFS 3 VM integration. This
will help you do AFS 3 ports, since one of the trickiest parts of an
AFS 3 port is the integration of the virtual memory system with the
file system.

The issues arise because in AFS, as in any network file system,
changes may be made from any machine while references are being made
to a file on your own machine. Data may be cached in your local
machine's VM system, and when the data changes remotely, the cache
manager must invalidate the old information in the VM system.

Furthermore, in some systems, there are pages of virtual memory
containing changes to the files that need to be written back to the
server at some time. In these systems, it is important not to
invalidate those pages before the data has made it to the file system.
In addition, such systems often provide mapped file support, in which
read and write system calls affect the same shared virtual memory that
backs the file when it is mapped.

As you may have guessed from the above, there are two general styles
of VM integration done in AFS 3: one for systems with limited VM
system caching, and one for more modern systems where mapped files
coexist with read and write system calls.

For the older systems, the function osi_FlushText exists. Its goal is
to invalidate, or try to invalidate, caches where VM pages might cache
file information that's now obsolete. Even if the invalidation is
impossible at the time the call is made, things should be set up so
that the invalidation happens afterwards.

I'm not going to say more about this type of system, since fewer and
fewer exist, and since I'm low on time. If I get back to this paper
later, I'll remove this paragraph. The rest of this note talks about
the more modern mapped file systems.

For mapped file systems, the function osi_FlushPages is called from
various parts of the AFS cache manager. We assume that this function
must be called without holding any vnode locks, since it may call back
to the file system to do part of its work.

The function osi_FlushPages has a relatively complex specification.
If the file is open for writing, or if the data version of the pages
that could be in memory (vp->mapDV) is the current data version number
of the file, then this function has no work to do. The rationale is
that if the file is open for writing, calling this function could
destroy data written to the file but not flushed from the VM system to
the cache file. If mapDV >= DataVersion, then flushing the VM
system's pages won't change the fact that we can still only have pages
from data version == mapDV in memory. That's because flushing all
pages from the VM system results in a postcondition that the only
pages that might be in memory are from the current data version.

If neither of the two conditions above occurs, then we actually
invalidate the pages, on a Sun by calling pvn_vptrunc. This discards
the pages without writing any dirty pages to the cache file. We then
set the mapDV field to the highest data version seen before we started
the call to flush the pages. On systems that release the vnode lock
while doing the page flush, the file's data version at the end of this
procedure may be larger than the value we set mapDV to, but that's
only conservative, since a new version could have been created from
the earlier version of the file.

There are a few times that we must call osi_FlushPages. We should
call it at the start of a read or open call, so that we raise mapDV to
the current value, and get rid of any old data that might interfere
with later reads. Raising mapDV to the current value is also
important, since if we wrote data with mapDV < DataVersion, then a
call to osi_FlushPages would discard this data if the pages were
modified without having the file open for writing (e.g. using a mapped
file). This is why we also call it in afs_map. We call it in
afs_getattr, since afs_getattr is the only function guaranteed to be
called between the time another client updates an executable, and the
time that our own local client tries to exec this executable; if we
fail to call osi_FlushPages here, we might use some pages from the
previous version of the executable file.

Also, note that we update mapDV after a store back to the server
completes, if we're sure that no other versions were created during
the file's storeback. The mapDV invariant (that no pages from earlier
data versions exist in memory) remains true, since the only versions
that existed between the old and new mapDV values all contained the
same data.

Finally, note a serious incompleteness in this system: we aren't
really prepared to deal with mapped files correctly. In particular,
there is no code to ensure that data stored in dirty VM pages ends up
in a cache file, except as a side effect of the segmap_release call
(on Sun 4s) that unmaps the data from the kernel map, and which,
because of the SM_WRITE flag, also calls putpage synchronously to get
rid of the data.

This problem needs to be fixed for any system that uses mapped files
seriously. Note that the NeXT port's generic write call uses mapped
files, but that we've set a flag (close_flush) that ensures that all
dirty pages get flushed after every write call. This is also something
of a performance hit, since it would be better to write those pages to
the cache asynchronously rather than after every write, as happens
now.