123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368 |
- .TH FILE __CSECTION__ "Copyright but distributable"
- .\" $Id: file.man,v 1.31 1998/02/15 23:18:53 christos Exp $
- .SH NAME
- file
- \- determine file type
- .SH SYNOPSIS
- .B file
- [
- .B \-vbczL
- ]
- [
- .B \-f
- namefile ]
- [
- .B \-m
- magicfiles ]
- file ...
- .SH DESCRIPTION
- This manual page documents version __VERSION__ of the
- .B file
- command.
- .B File
- tests each argument in an attempt to classify it.
- There are three sets of tests, performed in this order:
- filesystem tests, magic number tests, and language tests.
- The
- .I first
- test that succeeds causes the file type to be printed.
- .PP
- The type printed will usually contain one of the words
- .B text
- (the file contains only
- .SM ASCII
- characters and is probably safe to read on an
- .SM ASCII
- terminal),
- .B executable
- (the file contains the result of compiling a program
- in a form understandable to some \s-1UNIX\s0 kernel or another),
- or
- .B data
- meaning anything else (data is usually `binary' or non-printable).
- Exceptions are well-known file formats (core files, tar archives)
- that are known to contain binary data.
- When modifying the file
- .I __MAGIC__
- or the program itself,
- .B "preserve these keywords" .
- People depend on knowing that all the readable files in a directory
- have the word ``text'' printed.
- Don't do as Berkeley did \- change ``shell commands text''
- to ``shell script''.
- .PP
- The filesystem tests are based on examining the return from a
- .BR stat (2)
- system call.
- The program checks to see if the file is empty,
- or if it's some sort of special file.
- Any known file types appropriate to the system you are running on
- (sockets, symbolic links, or named pipes (FIFOs) on those systems that
- implement them)
- are intuited if they are defined in
- the system header file
- .IR sys/stat.h .
- .PP
- The magic number tests are used to check for files with data in
- particular fixed formats.
- The canonical example of this is a binary executable (compiled program)
- .I a.out
- file, whose format is defined in
- .I a.out.h
- and possibly
- .I exec.h
- in the standard include directory.
- These files have a `magic number' stored in a particular place
- near the beginning of the file that tells the \s-1UNIX\s0 operating system
- that the file is a binary executable, and which of several types thereof.
- The concept of `magic number' has been applied by extension to data files.
- Any file with some invariant identifier at a small fixed
- offset into the file can usually be described in this way.
- The information in these files is read from the magic file
- .I __MAGIC__.
- .PP
- If an argument appears to be an
- .SM ASCII
- file,
- .B file
- attempts to guess its language.
- The language tests look for particular strings (cf
- .IR names.h )
- that can appear anywhere in the first few blocks of a file.
- For example, the keyword
- .B .br
- indicates that the file is most likely a
- .BR troff (1)
- input file, just as the keyword
- .B struct
- indicates a C program.
- These tests are less reliable than the previous
- two groups, so they are performed last.
- The language test routines also test for some miscellany
- (such as
- .BR tar (1)
- archives) and determine whether an unknown file should be
- labelled as `ascii text' or `data'.
- .SH OPTIONS
- .TP 8
- .B \-v
- Print the version of the program and exit.
- .TP 8
- .B \-m list
- Specify an alternate list of files containing magic numbers.
- This can be a single file, or a colon-separated list of files.
- .TP 8
- .B \-z
- Try to look inside compressed files.
- .TP 8
- .B \-b
- Do not prepend filenames to output lines (brief mode).
- .TP 8
- .B \-c
- Cause a checking printout of the parsed form of the magic file.
- This is usually used in conjunction with
- .B \-m
- to debug a new magic file before installing it.
- .TP 8
- .B \-f namefile
- Read the names of the files to be examined from
- .I namefile
- (one per line)
- before the argument list.
- Either
- .I namefile
- or at least one filename argument must be present;
- to test the standard input, use ``-'' as a filename argument.
- .TP 8
- .B \-L
- option causes symlinks to be followed, as the like-named option in
- .BR ls (1).
- (on systems that support symbolic links).
- .SH FILES
- .I __MAGIC__
- \- default list of magic numbers
- .SH ENVIRONMENT
- The environment variable
- .B MAGIC
- can be used to set the default magic number files.
- .SH SEE ALSO
- .BR magic (__FSECTION__)
- \- description of magic file format.
- .br
- .BR strings (1), " od" (1), " hexdump(1)"
- \- tools for examining non-textfiles.
- .SH STANDARDS CONFORMANCE
- This program is believed to exceed the System V Interface Definition
- of FILE(CMD), as near as one can determine from the vague language
- contained therein.
- Its behaviour is mostly compatible with the System V program of the same name.
- This version knows more magic, however, so it will produce
- different (albeit more accurate) output in many cases.
- .PP
- The one significant difference
- between this version and System V
- is that this version treats any white space
- as a delimiter, so that spaces in pattern strings must be escaped.
- For example,
- .br
- >10 string language impress\ (imPRESS data)
- .br
- in an existing magic file would have to be changed to
- .br
- >10 string language\e impress (imPRESS data)
- .br
- In addition, in this version, if a pattern string contains a backslash,
- it must be escaped. For example
- .br
- 0 string \ebegindata Andrew Toolkit document
- .br
- in an existing magic file would have to be changed to
- .br
- 0 string \e\ebegindata Andrew Toolkit document
- .br
- .PP
- SunOS releases 3.2 and later from Sun Microsystems include a
- .BR file (1)
- command derived from the System V one, but with some extensions.
- My version differs from Sun's only in minor ways.
- It includes the extension of the `&' operator, used as,
- for example,
- .br
- >16 long&0x7fffffff >0 not stripped
- .SH MAGIC DIRECTORY
- The magic file entries have been collected from various sources,
- mainly USENET, and contributed by various authors.
- Christos Zoulas (address below) will collect additional
- or corrected magic file entries.
- A consolidation of magic file entries
- will be distributed periodically.
- .PP
- The order of entries in the magic file is significant.
- Depending on what system you are using, the order that
- they are put together may be incorrect.
- If your old
- .B file
- command uses a magic file,
- keep the old magic file around for comparison purposes
- (rename it to
- .IR __MAGIC__.orig ).
- .SH HISTORY
- There has been a
- .B file
- command in every \s-1UNIX\s0 since at least Research Version 6
- (man page dated January, 1975).
- The System V version introduced one significant major change:
- the external list of magic number types.
- This slowed the program down slightly but made it a lot more flexible.
- .PP
- This program, based on the System V version,
- was written by Ian Darwin without looking at anybody else's source code.
- .PP
- John Gilmore revised the code extensively, making it better than
- the first version.
- Geoff Collyer found several inadequacies
- and provided some magic file entries.
- The program has undergone continued evolution since.
- .SH AUTHOR
- Written by Ian F. Darwin, UUCP address {utzoo | ihnp4}!darwin!ian,
- Internet address ian@sq.com,
- postal address: P.O. Box 603, Station F, Toronto, Ontario, CANADA M4Y 2L8.
- .PP
- Altered by Rob McMahon, cudcv@warwick.ac.uk, 1989, to extend the `&' operator
- from simple `x&y != 0' to `x&y op z'.
- .PP
- Altered by Guy Harris, guy@netapp.com, 1993, to:
- .RS
- .PP
- put the ``old-style'' `&'
- operator back the way it was, because 1) Rob McMahon's change broke the
- previous style of usage, 2) the SunOS ``new-style'' `&' operator,
- which this version of
- .B file
- supports, also handles `x&y op z', and 3) Rob's change wasn't documented
- in any case;
- .PP
- put in multiple levels of `>';
- .PP
- put in ``beshort'', ``leshort'', etc. keywords to look at numbers in the
- file in a specific byte order, rather than in the native byte order of
- the process running
- .BR file .
- .RE
- .PP
- Changes by Ian Darwin and various authors including
- Christos Zoulas (christos@astron.com), 1990-1997.
- .SH LEGAL NOTICE
- Copyright (c) Ian F. Darwin, Toronto, Canada,
- 1986, 1987, 1988, 1989, 1990, 1991, 1992, 1993.
- .PP
- This software is not subject to and may not be made subject to any
- license of the American Telephone and Telegraph Company, Sun
- Microsystems Inc., Digital Equipment Inc., Lotus Development Inc., the
- Regents of the University of California, The X Consortium or MIT, or
- The Free Software Foundation.
- .PP
- This software is not subject to any export provision of the United States
- Department of Commerce, and may be exported to any country or planet.
- .PP
- Permission is granted to anyone to use this software for any purpose on
- any computer system, and to alter it and redistribute it freely, subject
- to the following restrictions:
- .PP
- 1. The author is not responsible for the consequences of use of this
- software, no matter how awful, even if they arise from flaws in it.
- .PP
- 2. The origin of this software must not be misrepresented, either by
- explicit claim or by omission. Since few users ever read sources,
- credits must appear in the documentation.
- .PP
- 3. Altered versions must be plainly marked as such, and must not be
- misrepresented as being the original software. Since few users
- ever read sources, credits must appear in the documentation.
- .PP
- 4. This notice may not be removed or altered.
- .PP
- A few support files (\fIgetopt\fP, \fIstrtok\fP)
- distributed with this package
- are by Henry Spencer and are subject to the same terms as above.
- .PP
- A few simple support files (\fIstrtol\fP, \fIstrchr\fP)
- distributed with this package
- are in the public domain; they are so marked.
- .PP
- The files
- .I tar.h
- and
- .I is_tar.c
- were written by John Gilmore from his public-domain
- .B tar
- program, and are not covered by the above restrictions.
- .SH BUGS
- There must be a better way to automate the construction of the Magic
- file from all the glop in Magdir. What is it?
- Better yet, the magic file should be compiled into binary (say,
- .BR ndbm (3)
- or, better yet, fixed-length
- .SM ASCII
- strings for use in heterogenous network environments) for faster startup.
- Then the program would run as fast as the Version 7 program of the same name,
- with the flexibility of the System V version.
- .PP
- .B File
- uses several algorithms that favor speed over accuracy,
- thus it can be misled about the contents of
- .SM ASCII
- files.
- .PP
- The support for
- .SM ASCII
- files (primarily for programming languages)
- is simplistic, inefficient and requires recompilation to update.
- .PP
- There should be an ``else'' clause to follow a series of continuation lines.
- .PP
- The magic file and keywords should have regular expression support.
- Their use of
- .SM "ASCII TAB"
- as a field delimiter is ugly and makes
- it hard to edit the files, but is entrenched.
- .PP
- It might be advisable to allow upper-case letters in keywords
- for e.g.,
- .BR troff (1)
- commands vs man page macros.
- Regular expression support would make this easy.
- .PP
- The program doesn't grok \s-2FORTRAN\s0.
- It should be able to figure \s-2FORTRAN\s0 by seeing some keywords which
- appear indented at the start of line.
- Regular expression support would make this easy.
- .PP
- The list of keywords in
- .I ascmagic
- probably belongs in the Magic file.
- This could be done by using some keyword like `*' for the offset value.
- .PP
- Another optimisation would be to sort
- the magic file so that we can just run down all the
- tests for the first byte, first word, first long, etc, once we
- have fetched it. Complain about conflicts in the magic file entries.
- Make a rule that the magic entries sort based on file offset rather
- than position within the magic file?
- .PP
- The program should provide a way to give an estimate
- of ``how good'' a guess is.
- We end up removing guesses (e.g. ``From '' as first 5 chars of file) because
- they are not as good as other guesses (e.g. ``Newsgroups:'' versus
- "Return-Path:"). Still, if the others don't pan out, it should be
- possible to use the first guess.
- .PP
- This program is slower than some vendors' file commands.
- .PP
- This manual page, and particularly this section, is too long.
- .SH AVAILABILITY
- You can obtain the original author's latest version by anonymous FTP
- on
- .B ftp.astron.com
- in the directory
- .I /pub/file/file-X.YY.tar.gz
|