Copyright 2000, International Business Machines Corporation and others.
All Rights Reserved.

This software has been released under the terms of the IBM Public
License.  For details, see the LICENSE file in the top-level source
directory or online at http://www.openafs.org/dl/license10.html

Here's a quick guide to understanding the AFS 3 VM integration.  This
will help you do AFS 3 ports, since one of the trickiest parts of an
AFS 3 port is the integration of the virtual memory system with the
file system.

The issues arise because in AFS, as in any network file system,
changes may be made from any machine while references are being made
to a file on your own machine.  Data may be cached in your local
machine's VM system, and when the data changes remotely, the cache
manager must invalidate the old information in the VM system.

Furthermore, on some systems there are pages of virtual memory
containing changes to files that need to be written back to the
server at some point.  On these systems, it is important not to
invalidate those pages before their data has made it to the file
system.  In addition, such systems often provide mapped file support,
where read and write system calls affect the same shared virtual
memory that backs the file when it is mapped.

As you may have guessed from the above, there are two general styles
of VM integration done in AFS 3: one for systems with limited VM
system caching, and one for more modern systems where mapped files
coexist with read and write system calls.

For the older systems, the function osi_FlushText exists.  Its goal is
to invalidate, or try to invalidate, caches where VM pages might cache
file information that's now obsolete.  Even if the invalidation is
impossible at the time the call is made, things should be set up so
that the invalidation happens afterwards.

I'm not going to say more about this type of system, since fewer and
fewer exist, and since I'm low on time.  If I get back to this paper
later, I'll remove this paragraph.  The rest of this note talks about
the more modern mapped file systems.

For mapped file systems, the function osi_FlushPages is called from
various parts of the AFS cache manager.  We assume that this function
must be called without holding any vnode locks, since it may call back
to the file system to do part of its work.

The function osi_FlushPages has a relatively complex specification.
If the file is open for writing, or if the data version of the pages
that could be in memory (vp->mapDV) is the current data version number
of the file, then this function has no work to do.  The rationale is
that if the file is open for writing, calling this function could
destroy data written to the file but not yet flushed from the VM
system to the cache file.  If mapDV >= DataVersion, then flushing the
VM system's pages won't change the fact that the only pages we can
have in memory are from data version == mapDV.  That's because
flushing all pages from the VM system results in a postcondition that
the only pages that might be in memory are from the current data
version.

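The two early-out conditions can be sketched as a small predicate.
This is a user-space model, not the kernel code: the struct fields
(mapDV, DataVersion, a writer count) are simplified stand-ins for the
real struct vcache.

```c
#include <assert.h>

/* Simplified stand-in for the real struct vcache; field names follow
 * the description above, not the actual kernel headers. */
struct vcache {
    long mapDV;        /* data version of any pages cached by the VM system */
    long DataVersion;  /* current data version of the file */
    int  writers;      /* number of opens for writing */
};

/* Return 1 if osi_FlushPages would have no work to do. */
static int flush_is_noop(const struct vcache *vp)
{
    /* Open for writing: flushing could destroy data written to the
     * file but not yet pushed from the VM system to the cache file. */
    if (vp->writers > 0)
        return 1;

    /* mapDV >= DataVersion: the only pages that can be in memory are
     * already from the current version, so a flush changes nothing. */
    if (vp->mapDV >= vp->DataVersion)
        return 1;

    return 0;
}
```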
If neither of the two conditions above holds, then we actually
invalidate the pages, on a Sun by calling pvn_vptrunc.  This discards
the pages without writing any dirty pages to the cache file.  We then
set the mapDV field to the highest data version seen before we started
the call to flush the pages.  On systems that release the vnode lock
while doing the page flush, the file's data version at the end of this
procedure may be larger than the value we set mapDV to, but that's
only conservative, since a new data version could have been created
from the earlier version of the file.

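The snapshot-before-flush ordering is the important part of that step.
A sketch under the same simplified model: vm_discard_pages is an
illustrative stand-in for pvn_vptrunc, not the real interface.

```c
#include <assert.h>

struct vcache {
    long mapDV;        /* data version of any pages cached in VM */
    long DataVersion;  /* current data version of the file */
};

/* Illustrative stand-in for pvn_vptrunc: discard all pages without
 * writing dirty ones back to the cache file. */
static void vm_discard_pages(struct vcache *vp)
{
    (void)vp;
}

static void do_flush(struct vcache *vp)
{
    /* Snapshot the data version BEFORE the flush.  If the vnode lock
     * is released during the flush and DataVersion advances, setting
     * mapDV to the pre-flush snapshot is merely conservative. */
    long dv_before = vp->DataVersion;
    vm_discard_pages(vp);
    vp->mapDV = dv_before;
}
```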
There are a few places where we must call osi_FlushPages.  We should
call it at the start of a read or open call, so that we raise mapDV to
the current value and get rid of any old data that might interfere
with later reads.  Raising mapDV to the current value is also
important, since if we wrote data with mapDV < DataVersion, then a
call to osi_FlushPages would discard this data if the pages were
modified without having the file open for writing (e.g. via a mapped
file).  This is why we also call it in afs_map.  We call it in
afs_getattr, since afs_getattr is the only function guaranteed to be
called between the time another client updates an executable and the
time that our own local client tries to exec that executable; if we
failed to call osi_FlushPages here, we might use some pages from the
previous version of the executable file.

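Putting the two rules together shows why those call sites work:
flushing at open or read time raises mapDV, so a later flush (say,
from afs_getattr) becomes a no-op and cannot discard pages dirtied
through a mapping.  A sketch under the same simplified model; the
names are illustrative, not the real cache manager API.

```c
#include <assert.h>

struct vcache {
    long mapDV;        /* data version of pages cached in VM */
    long DataVersion;  /* current data version of the file */
    int  writers;      /* number of opens for writing */
};

/* Illustrative stand-in for pvn_vptrunc. */
static void vm_discard_pages(struct vcache *vp)
{
    (void)vp;
}

/* Sketch of osi_FlushPages following the rules described above. */
static void osi_FlushPages_sketch(struct vcache *vp)
{
    if (vp->writers > 0 || vp->mapDV >= vp->DataVersion)
        return;                     /* nothing to do */
    long dv_before = vp->DataVersion;   /* snapshot before flushing */
    vm_discard_pages(vp);
    vp->mapDV = dv_before;
}
```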
Also, note that we update mapDV after a store back to the server
completes, if we're sure that no other versions were created during
the file's storeback.  The mapDV invariant (that no pages from earlier
data versions exist in memory) remains true, since the only versions
that existed between the old and new mapDV values all contained the
same data.

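The post-storeback update follows directly from that invariant.  A
sketch with illustrative names: the mapDV bump is safe only when no
intervening versions were created by other clients, since then every
version between the old mapDV and the new DataVersion held the same
bytes.

```c
#include <assert.h>

struct vcache {
    long mapDV;        /* data version of pages cached in VM */
    long DataVersion;  /* current data version of the file */
};

/* Called when our store to the server completes and the server has
 * assigned new_dv.  other_versions_seen is nonzero if some other
 * client created a version during our storeback. */
static void store_done(struct vcache *vp, long new_dv, int other_versions_seen)
{
    vp->DataVersion = new_dv;
    if (!other_versions_seen)
        vp->mapDV = new_dv;  /* all intervening versions held the same data */
}
```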
Finally, note a serious incompleteness in this system: we aren't
really prepared to deal with mapped files correctly.  In particular,
there is no code to ensure that data stored in dirty VM pages ends up
in a cache file, except as a side effect of the segmap_release call
(on Sun 4s) that unmaps the data from the kernel map, and which,
because of the SM_WRITE flag, also calls putpage synchronously to get
rid of the data.

This problem needs to be fixed for any system that uses mapped files
seriously.  Note that the NeXT port's generic write call uses mapped
files, but that we've set a flag (close_flush) that ensures that all
dirty pages get flushed after every write call.  This is also
something of a performance hit, since it would be better to write
those pages to the cache asynchronously rather than after every
write, as happens now.