'\" t
.\" Automatically generated by Pandoc 2.17.1.1
.\"
.\" Define V font for inline verbatim, using C font in formats
.\" that render this, and otherwise B font.
.ie "\f[CB]x\f[]"x" \{\
. ftr V B
. ftr VI BI
. ftr VB B
. ftr VBI BI
.\}
.el \{\
. ftr V CR
. ftr VI CI
. ftr VB CB
. ftr VBI CBI
.\}
.TH "ZIPsFS" "1" "" "" ""
.hy
.SH NAME
.PP
\f[B]ZIPsFS\f[R] \- FUSE-based overlay union file system that
expands ZIP files
.SH SYNOPSIS
.PP
ZIPsFS [\f[I]ZIPsFS-options\f[R]] \f[I]path-of-root1\f[R]
\f[I]path-of-root2\f[R] \f[I]path-of-root3\f[R] :
[\f[I]fuse-options\f[R]] \f[I]mount-point\f[R]
.SS Example 1
.PP
ZIPsFS \[ti]/tmp/ZIPsFS/writable \[ti]/local/file/tree //computer1/pub
//computer2/pub : -f -o allow_other \[ti]/mnt/ZIPsFS
.SH DESCRIPTION
.SS Summary
.PP
ZIPsFS acts as a union or overlay file system.
It combines multiple file structures into one, resulting in a single
directory structure that contains underlying files and sub-directories
from the given sources.
Created or modified files are stored in the first file location.
The other file sources are read-only and files will never be modified.
ZIPsFS expands ZIP files as folders.
Normally, the folder name is formed from the ZIP file name by appending
\[lq].Content\[rq].
This can be changed with rules based on file name patterns.
Extensive configuration is possible without interrupting the file
system.
Specific features and performance tweaks meet our needs for storing
large data from mass spectrometry experiments.
.SS Configuration
.PP
The default behavior can be modified with rules based on file names in
\f[I]ZIPsFS_configuration.h\f[R] and \f[I]ZIPsFS_configuration.c\f[R].
For changes to take effect, recompilation and a restart are necessary.
With the \f[I]-s symlink\f[R] option, a reconfigured instance can replace
the running one without interrupting the file system.
Ongoing computations that access files in ZIPsFS are not affected.
.SS Union / overlay file system
.PP
ZIPsFS is a union or overlay file system.
Several file locations are combined into one.
All files in the source file trees (three sources in the example command
above) can be accessed via the mount point (in the example
\f[I]~/mnt/ZIPsFS\f[R]).
When files are created or modified, they are stored in the first
file tree (in the example \f[I]~/tmp/ZIPsFS/writable\f[R]).
If a file exists in two locations, the leftmost source file system takes
precedence.
.PP
New files are created in the first file tree, while the following file
trees are not modified.
If an empty string \[lq]\[rq] or \[cq]\[cq] is given for the first
source, no writable source is used.
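.PP
As a hedged illustration of the last point, the command from Example 1
could be started without a writable source by passing an empty string as
the first root; the remaining paths are simply those of Example 1:
.IP
.nf
\f[C]
ZIPsFS \[aq]\[aq] \[ti]/local/file/tree //computer1/pub : -f \[ti]/mnt/ZIPsFS
\f[R]
.fi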
.SS ZIP files
.PP
Let \f[I]file.zip\f[R] be a ZIP file in any of the source file systems.
It will appear in the virtual file system together with a folder
\f[I]file.zip.Content\f[R].
Normally, the folder name is formed by appending
\[lq]\f[I].Content\f[R]\[rq] to the zip file name.
This can be changed in \f[I]ZIPsFS_configuration.c\f[R].
.PP
For example, Sciex mass spectrometry software requires that the
contained files are shown directly in the file listing rather than in a
sub-folder.
.SS Cache
.PP
Optionally, ZIPsFS can read certain ZIP entries entirely into RAM and
provide the data from the RAM copy at higher speed.
This may improve performance for compressed ZIP entries that are read
from varying positions in the file (so-called file seeks).
The option \f[B]-l\f[R] specifies an upper limit of memory consumption
for the ZIP RAM cache.
.PP
Further caches aim at faster file listing of large directories.
.SS Logs
.PP
It is recommended to run ZIPsFS in the foreground with option
\f[I]-f\f[R] within a persistent terminal multiplexer like
\f[I]tmux\f[R].
.PP
Log files are found in \f[B]\[ti]/.ZIPsFS/\f[R].
.PP
An HTML file with status information is generated dynamically in the
auto-generated folder \f[B]ZIPsFS\f[R] in the virtual ZIPsFS file system.
.SS Plugins - Auto-generation of virtual files
.PP
ZIPsFS can display virtual files which do not exist, but which are
generated automatically when used.
This feature must be activated in ZIPsFS_configuration.h.
The first file root is used to store the generated files.
.PP
A typical use case is file conversion.
Auto-generated files are displayed in the virtual file tree in
\f[B]ZIPsFS/a/\f[R].
If they have not been used before, an estimated file size is reported
because the real file size is not yet known.
.PP
The currently included examples demonstrate this feature and can serve
as a template for your own settings.
.PP
To try them, copy image or PDF files into one of the roots and visit
the respective folder in the virtual file system.
Prepend \f[B]ZIPsFS/a/\f[R] to the path of this folder and you will see
the generated files:
.IP
.nf
\f[C]
mnt=<path of mountpoint>
mkdir $mnt/test
cp file.png $mnt/test/
ls -l $mnt/ZIPsFS/a/test/
\f[R]
.fi
.IP \[bu] 2
For image files (jpg, jpeg, png and gif): smaller versions scaled to 25 % and
50 %
.IP \[bu] 2
For image files: text extracted using Optical Character Recognition
.IP \[bu] 2
For PDF files: extracted ASCII text
.IP \[bu] 2
For ZIP files: the report of the consistency check, including checksums
Decompression of .tsv.bz2 and .tsv.gz files
.IP \[bu] 2
Mass spectrometry files: They are converted to Mascot and msML.
For wiff files, the contained ASCII text is extracted.
.PP
When opening these files for the first time there will be some delay.
This is because the files need to be generated.
When accessed a second time, the data comes without delay, because the
file is already there.
Furthermore, the file size will be correct.
When the upstream file changes or the last-modified attribute is
updated, derived files will be generated again.
.SS Limitations - unknown file size
.PP
The system does not know the file size of not-yet-generated files.
This seems to be a common problem, see
https://fuse-devel.narkive.com/tkGi5trJ/trouble-with-samba-fuse-for-files-of-unknown-size.
Any help is appreciated.
.PP
Currently, ZIPsFS reports an upper limit of the expected file size, which
is not a satisfactory solution.
.PP
Why can this not be handled as for /proc files (e.g.\ /proc/$$/environ)?
Calling stat /proc/$$/environ reports a file size of zero.
If ZIPsFS returns zero, then the content of the files is not readable.
.SS Limitations - nested, recursive
.PP
Currently, nesting (recursion) is not yet supported.
A virtual file cannot be the basis for another virtual file.
.SS ZIPsFS_autogen_queue.sh
.PP
Some exotic Wine-dependent Windows executables do not work well within
ZIPsFS.
As a workaround, we developed the shell script
\f[B]ZIPsFS_autogen_queue.sh\f[R].
With each pass of an infinite loop, one task is taken from a queue and
processed.
Each script instance converts one file at a time.
Several instances of this shell script can run in parallel.
In the settings, the symbol \f[B]PLACEHOLDER_EXTERNAL_QUEUE\f[R] is
given instead of an executable program.
.SS ZIPsFS Options
.PP
-h
.PP
Prints brief usage information.
.PP
-l \f[I]Maximum memory for caching ZIP-entries in the RAM\f[R]
.PP
Specifies a limit for the cache.
For example, \f[I]-l 8G\f[R] limits the size of the cache to 8
gigabytes.
.PP
-c [NEVER,SEEK,RULE,COMPRESSED,ALWAYS]
.PP
Policy for ZIP entries cached in RAM.
.PP
.TS
tab(@);
cw(8.3n) lw(61.7n).
T{
NEVER
T}@T{
ZIP entries are never cached, not even in case of a backward seek.
T}
T{
T}@T{
T}
T{
SEEK
T}@T{
ZIP entries are cached if the file position jumps backward.
This is the default.
T}
T{
T}@T{
T}
T{
RULE
T}@T{
ZIP entries are cached according to rules in \f[B]ZIPsFS_configuration.c\f[R].
T}
T{
T}@T{
T}
T{
COMPRESSED
T}@T{
All compressed ZIP entries are cached.
T}
T{
T}@T{
T}
T{
ALWAYS
T}@T{
All ZIP entries are cached.
T}
T{
T}@T{
T}
.TE
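.PP
A sketch of how the two cache options might be combined, assuming the
roots and mount point of Example 1; the values 8G and COMPRESSED are only
illustrative choices:
.IP
.nf
\f[C]
ZIPsFS -l 8G -c COMPRESSED \[ti]/tmp/ZIPsFS/writable \[ti]/local/file/tree : -f \[ti]/mnt/ZIPsFS
\f[R]
.fi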
.PP
-s \f[I]path-of-symbolic-link\f[R]
.PP
After initialization, the specified symlink is created and points to the
mount point.
Previously existing links are overwritten.
This allows ZIPsFS to be restarted without affecting running programs
that access files in the virtual ZIPsFS file system.
Programs should therefore address files through the symlink rather
than through the real mount point.
Consider a running ZIPsFS instance which needs to be replaced by a newer
one.
The new ZIPsFS instance is started with a different mount point.
Both instances work simultaneously.
The symlink which used to point to the mount point of the old instance
now points to that of the new one.
The old instance should be left running for an hour or so until no file
handle is open any more.
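.PP
The replacement procedure above might look as follows.
This is only a sketch: the link name \[ti]/mnt/ZIPsFS_link and the mount
points ending in _old and _new are hypothetical, and each command is
assumed to run in its own terminal.
.IP
.nf
\f[C]
# Running instance; programs access files through the symlink.
ZIPsFS -s \[ti]/mnt/ZIPsFS_link \[ti]/tmp/ZIPsFS/writable \[ti]/local/file/tree : -f \[ti]/mnt/ZIPsFS_old
# Replacement instance on a different mount point; the symlink is overwritten.
ZIPsFS -s \[ti]/mnt/ZIPsFS_link \[ti]/tmp/ZIPsFS/writable \[ti]/local/file/tree : -f \[ti]/mnt/ZIPsFS_new
\f[R]
.fi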
.PP
If the symlink is within an exported SAMBA or NFS path, it should be
relative.
This is best achieved by changing into the parent path where the symlink
will be created.
Then give just the name and not the entire path of the symlink.
In /etc/samba/smb.conf, set:
.PP
follow symlinks = yes
.SS Debug Options
.PP
See ZIPsFS.compile.sh for activation of sanitizers.
.PP
-T
.PP
Checks the capability to print a backtrace.
This requires addr2line, which is usually in /usr/bin/ on Linux and
FreeBSD.
On MacOSX, the tool atos is used.
.SS FUSE Options
.PP
-f
.PP
Run in the foreground and display some logs on stdout.
This mode is useful inside tmux.
.PP
-s
.PP
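Disable multi-threaded operation to rescue ZIPsFS in case of
threading-related bugs.
As a FUSE option, this -s is given after the colon and is distinct from
the ZIPsFS option -s \f[I]path-of-symbolic-link\f[R].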
.PP
-o \f[I]comma separated Options\f[R]
.PP
-o allow_other
.PP
Other users can read the files.
.SS Fault management
.PP
When source file structures are stored remotely, there is a risk that
they may be temporarily unavailable.
Overlay file systems typically freeze when calls to the file API block.
In contrast, ZIPsFS should continue to operate with the remaining file
roots.
This is implemented as follows: Paths starting with a double slash (in the
example \f[I]//computer1/pub\f[R]) are regarded as remote paths and
treated specially.
ZIPsFS periodically checks file systems whose paths start with a double
slash.
If the last response was too long ago, the respective file system is
skipped.
Furthermore, calls to stat(), which obtain the attributes of a file, are
queued and performed in extra threads.
.PP
For files which are located in ZIP files and which are first loaded
entirely into RAM, the system is also robust against interruptions and
blocking during loading.
The system will not freeze.
After a longer time, it will try to load the same file from another
root or return ENOENT.
.PP
If loading a ZIP file fails, loading is repeated after 1 s.
.PP
For ZIP entries loaded entirely into the RAM, the CRC sum is validated
and possible errors are logged.
.SH FILES
.IP \[bu] 2
ZIPsFS_configuration.h and ZIPsFS_configuration.c and
ZIPsFS_configuration_autogen.c: Customizable rules.
Modification requires recompilation.
.IP \[bu] 2
\[ti]/.ZIPsFS: Contains the log file, the cache and the folder a.
The latter holds auto-generated files.
.SH LIMITATIONS
.SS Hard-links
.PP
Hard-links are not implemented, while symlinks work.
.SS Deleting files
.PP
Files can only be deleted when their physical location is in the first
source.
In contrast, in the FUSE file systems unionfs-fuse and fuse-overlayfs,
files can always be deleted irrespective of their physical location.
They are masked out without actually being deleted from their physical
location.
If you need the same behaviour, please file a feature request.
.SH BUGS
.PP
Current status: testing and bug fixing.
If ZIPsFS crashes, please send the stack trace together with the version
number.
.SH AUTHOR
.PP
Christoph Gille
.SH SEE ALSO
.IP \[bu] 2
https://github.com/openscopeproject/ZipROFS
.IP \[bu] 2
https://github.com/google/fuse-archive
.IP \[bu] 2
https://bitbucket.org/agalanin/fuse-zip/src
.IP \[bu] 2
https://github.com/google/mount-zip
.IP \[bu] 2
https://github.com/cybernoid/archivemount
.IP \[bu] 2
https://github.com/mxmlnkn/ratarmount