- .\" $File: file.man,v 1.79 2008/11/06 22:49:08 rrt Exp $
- .Dd October 9, 2008
- .Dt FILE __CSECTION__
- .Os
- .Sh NAME
- .Nm file
- .Nd determine file type
- .Sh SYNOPSIS
- .Nm
- .Op Fl bchikLnNprsvz0
- .Op Fl e Ar testname
- .Op Fl -mime-type
- .Op Fl -mime-encoding
- .Op Fl f Ar namefile
- .Op Fl F Ar separator
- .Op Fl m Ar magicfiles
- .Ar file
- .Nm
- .Fl C
- .Op Fl m Ar magicfile
- .Nm
- .Op Fl -help
- .Sh DESCRIPTION
- This manual page documents version __VERSION__ of the
- .Nm
- command.
- .Pp
- .Nm
- tests each argument in an attempt to classify it.
- There are three sets of tests, performed in this order:
- filesystem tests, magic tests, and language tests.
- The
- .Em first
- test that succeeds causes the file type to be printed.
- .Pp
- The type printed will usually contain one of the words
- .Em text
- (the file contains only
- printing characters and a few common control
- characters and is probably safe to read on an
- .Dv ASCII
- terminal),
- .Em executable
- (the file contains the result of compiling a program
- in a form understandable to some
- .Dv UNIX
- kernel or another),
- or
- .Em data
- meaning anything else (data is usually
- .Sq binary
- or non-printable).
- Exceptions are well-known file formats (core files, tar archives)
- that are known to contain binary data.
- When modifying magic files or the program itself, make sure to
- .Em "preserve these keywords" .
- Users depend on knowing that all the readable files in a directory
- have the word
- .Sq text
- printed.
- Don't do as Berkeley did and change
- .Sq shell commands text
- to
- .Sq shell script .
- .Pp
- The filesystem tests are based on examining the return from a
- .Xr stat 2
- system call.
- The program checks to see if the file is empty,
- or if it's some sort of special file.
- Any known file types appropriate to the system you are running on
- (sockets, symbolic links, or named pipes (FIFOs) on those systems that
- implement them)
- are intuited if they are defined in
- the system header file
- .In sys/stat.h .
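As a rough illustration of the filesystem tests, the Python sketch below classifies a path using only the data returned by `os.lstat` (or `os.stat` when following symlinks) and the standard `stat` module predicates. It is not taken from the C source of file. The function name `fs_type` and its labels are assumptions, chosen to resemble the output shown in EXAMPLES.

```python
import os
import stat


def fs_type(path, follow_symlinks=False):
    """Classify a path from stat(2) data alone (hypothetical helper).

    Returns a short label, or None for a non-empty regular file, which is
    where the magic tests would take over. The labels are illustrative.
    """
    # lstat() describes the link itself; stat() describes what it points to.
    st = os.stat(path) if follow_symlinks else os.lstat(path)
    mode = st.st_mode
    if stat.S_ISLNK(mode):
        return "symbolic link to " + os.readlink(path)
    if stat.S_ISDIR(mode):
        return "directory"
    if stat.S_ISFIFO(mode):
        return "fifo (named pipe)"
    if stat.S_ISSOCK(mode):
        return "socket"
    if stat.S_ISBLK(mode) or stat.S_ISCHR(mode):
        kind = "block special" if stat.S_ISBLK(mode) else "character special"
        # Device major/minor numbers, as in "block special (3/0)".
        return "%s (%d/%d)" % (kind, os.major(st.st_rdev), os.minor(st.st_rdev))
    if st.st_size == 0:
        return "empty"
    return None
```

Returning None rather than a label keeps the stages separate: only files this step cannot describe go on to the magic tests.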
- .Pp
- The magic tests are used to check for files with data in
- particular fixed formats.
- The canonical example of this is a binary executable (compiled program)
- .Dv a.out
- file, whose format is defined in
- .In elf.h ,
- .In a.out.h
- and possibly
- .In exec.h
- in the standard include directory.
- These files have a
- .Sq "magic number"
- stored in a particular place
- near the beginning of the file that tells the
- .Dv UNIX operating system
- that the file is a binary executable, and which of several types thereof.
- The concept of a
- .Sq "magic number"
- has been applied by extension to data files.
- Any file with some invariant identifier at a small fixed
- offset into the file can usually be described in this way.
- The information identifying these files is read from the compiled
- magic file
- .Pa __MAGIC__.mgc ,
- or the files in the directory
- .Pa __MAGIC__
- if the compiled file does not exist.
- In addition, if
- .Pa $HOME/.magic.mgc
- or
- .Pa $HOME/.magic
- exists, it will be used in preference to the system magic files.
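The core of a magic test can be sketched as a table of (offset, byte string, description) entries checked against the start of a file, with the first match winning. The Python sketch below is a toy, not the magic file format: the table name and its three entries are assumptions, although the byte sequences are the well-known signatures of ELF, gzip and POSIX tar.

```python
# Hypothetical miniature magic table: (offset, expected bytes, description).
# Real magic files hold thousands of far richer entries.
MAGIC_TABLE = [
    (0, b"\x7fELF", "ELF"),
    (0, b"\x1f\x8b", "gzip compressed data"),
    (257, b"ustar", "POSIX tar archive"),
]


def magic_match(data, table=MAGIC_TABLE):
    """Return the description of the first entry whose bytes sit at its offset."""
    for offset, magic, desc in table:
        # Slicing past the end yields a shorter value, so short files just fail.
        if data[offset:offset + len(magic)] == magic:
            return desc
    return None
```

In use, one would pass the first block or so of a file, for example the result of `f.read(1024)` on a file opened in binary mode.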
- .Pp
- If a file does not match any of the entries in the magic file,
- it is examined to see if it seems to be a text file.
- ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
- (such as those used on Macintosh and IBM PC systems),
- UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
- character sets can be distinguished by the different
- ranges and sequences of bytes that constitute printable text
- in each set.
- If a file passes any of these tests, its character set is reported.
- ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
- as
- .Sq text
- because they will be mostly readable on nearly any terminal;
- UTF-16 and EBCDIC are only
- .Sq character data
- because, while
- they contain text, it is text that will require translation
- before it can be read.
- In addition,
- .Nm
- will attempt to determine other characteristics of text-type files.
- If the lines of a file are terminated by CR, CRLF, or NEL, instead
- of the Unix-standard LF, this will be reported.
- Files that contain embedded escape sequences or overstriking
- will also be identified.
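The following Python sketch shows the idea for just two of the character sets named above, ASCII and UTF-8, plus the CR/CRLF line-terminator check. The set of allowed control characters and the label strings are assumptions. The real program also handles ISO-8859-x, extended ASCII, UTF-16 and EBCDIC, and its rules are more detailed.

```python
def text_kind(data):
    """Very rough text classification of a byte buffer (a sketch)."""
    # Assumed "harmless" controls: BS, TAB, LF, FF, CR, ESC.
    allowed_ctrl = {0x08, 0x09, 0x0A, 0x0C, 0x0D, 0x1B}
    if all(0x20 <= b <= 0x7E or b in allowed_ctrl for b in data):
        kind = "ASCII text"
    else:
        try:
            text = data.decode("utf-8")
        except UnicodeDecodeError:
            return "data"
        # Valid UTF-8 can still contain control bytes such as NUL.
        if any(ord(c) < 0x20 and ord(c) not in allowed_ctrl for c in text):
            return "data"
        kind = "UTF-8 Unicode text"
    # Report non-LF line terminators, as described above.
    if b"\r\n" in data:
        kind += ", with CRLF line terminators"
    elif b"\r" in data:
        kind += ", with CR line terminators"
    return kind
```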
- .Pp
- Once
- .Nm
- has determined the character set used in a text-type file,
- it will
- attempt to determine in what language the file is written.
- The language tests look for particular strings (cf.\&
- .In names.h )
- that can appear anywhere in the first few blocks of a file.
- For example, the keyword
- .Em .br
- indicates that the file is most likely a
- .Xr troff 1
- input file, just as the keyword
- .Em struct
- indicates a C program.
- These tests are less reliable than the previous
- two groups, so they are performed last.
- The language test routines also test for some miscellany
- (such as
- .Xr tar 1
- archives).
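A language test can be sketched as a keyword search over the first few blocks of a buffer. In this Python sketch, the keyword table, the four-block window and the 512-byte block size are assumptions for illustration, not the contents of names.h. A plain substring search like this one is deliberately crude: it would also fire on a word such as "destructor".

```python
# Hypothetical keyword table; the real list lives in names.h and is larger.
KEYWORDS = [
    (b".br", "troff input text"),
    (b"struct", "C program text"),
]


def language_guess(data, blocks=4, block_size=512):
    """Return a guess for the first table keyword found in the first few blocks."""
    head = data[:blocks * block_size]
    for word, desc in KEYWORDS:
        if word in head:
            return desc
    return None
```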
- .Pp
- Any file that cannot be identified as having been written
- in any of the character sets listed above is simply said to be
- .Sq data .
- .Sh OPTIONS
- .Bl -tag -width indent
- .It Fl b , -brief
- Do not prepend filenames to output lines (brief mode).
- .It Fl c , -checking-printout
- Cause a checking printout of the parsed form of the magic file.
- This is usually used in conjunction with the
- .Fl m
- flag to debug a new magic file before installing it.
- .It Fl C , -compile
- Write a
- .Pa magic.mgc
- output file that contains a pre-parsed version of the magic file or directory.
- .It Fl e , -exclude Ar testname
- Exclude the test named in
- .Ar testname
- from the list of tests made to determine the file type.
- Valid test names are:
- .Bl -tag -width compress
- .It apptype
- .Dv EMX
- application type (only on EMX).
- .It text
- Various types of text files (this test will try to guess the text encoding, irrespective of the setting of the
- .Sq encoding
- option).
- .It encoding
- Different text encodings for soft magic tests.
- .It tokens
- Looks for known tokens inside text files.
- .It cdf
- Prints details of Compound Document Files.
- .It compress
- Checks for, and looks inside, compressed files.
- .It elf
- Prints ELF file details.
- .It soft
- Consults magic files.
- .It tar
- Examines tar files.
- .El
- .It Fl f , -files-from Ar namefile
- Read the names of the files to be examined from
- .Ar namefile
- (one per line)
- before the argument list.
- Either
- .Ar namefile
- or at least one filename argument must be present;
- to test the standard input, use
- .Sq -
- as a filename argument.
- .It Fl F , -separator Ar separator
- Use the specified string as the separator between the filename and the
- file result returned.
- Defaults to
- .Sq \&: .
- .It Fl h , -no-dereference
- Cause symlinks not to be followed
- (on systems that support symbolic links).
- This is the default if the
- environment variable
- .Dv POSIXLY_CORRECT
- is not defined.
- .It Fl i , -mime
- Cause
- .Nm
- to output MIME type strings rather than the more
- traditional human-readable ones.
- Thus it may say
- .Sq text/plain; charset=us-ascii
- rather than
- .Sq ASCII text .
- In order for this option to work,
- .Nm
- changes the way
- it handles files recognized by the command itself (such as many of the
- text file types, directories, etc.), and makes use of an alternative
- .Sq magic
- file
- (see the FILES section below).
- .It Fl -mime-type , -mime-encoding
- Like
- .Fl i ,
- but print only the specified element(s).
- .It Fl k , -keep-going
- Don't stop at the first match; keep going.
- Subsequent matches will have the string
- .Sq "\[rs]012\- "
- prepended.
- (If you want a newline, see the
- .Sq "\-r"
- option.)
- .It Fl L , -dereference
- Cause symlinks to be followed, as with the option of the same name in
- .Xr ls 1
- (on systems that support symbolic links).
- This is the default if the environment variable
- .Dv POSIXLY_CORRECT
- is defined.
- .It Fl m , -magic-file Ar list
- Specify an alternate list of files and directories containing magic.
- This can be a single item, or a colon-separated list.
- If a compiled magic file is found alongside a file or directory, it will be used instead.
- .It Fl n , -no-buffer
- Force stdout to be flushed after checking each file.
- This is only useful if checking a list of files.
- It is intended to be used by programs that want file type output from a pipe.
- .It Fl N , -no-pad
- Don't pad filenames so that they align in the output.
- .It Fl p , -preserve-date
- On systems that support
- .Xr utime 2
- or
- .Xr utimes 2 ,
- attempt to preserve the access time of files analyzed, to pretend that
- .Nm
- never read them.
- .It Fl r , -raw
- Don't translate unprintable characters to \eooo.
- Normally
- .Nm
- translates unprintable characters to their octal representation.
- .It Fl s , -special-files
- Normally,
- .Nm
- only attempts to read and determine the type of argument files which
- .Xr stat 2
- reports are ordinary files.
- This prevents problems, because reading special files may have peculiar
- consequences.
- Specifying the
- .Fl s
- option causes
- .Nm
- to also read argument files which are block or character special files.
- This is useful for determining the filesystem types of the data in raw
- disk partitions, which are block special files.
- This option also causes
- .Nm
- to disregard the file size as reported by
- .Xr stat 2
- since on some systems it reports a zero size for raw disk partitions.
- .It Fl v , -version
- Print the version of the program and exit.
- .It Fl z , -uncompress
- Try to look inside compressed files.
- .It Fl 0 , -print0
- Output a null character
- .Sq \e0
- after the end of the filename.
- This makes the output easier to post-process with
- .Xr cut 1 .
- It does not affect the separator, which is still printed.
- .It Fl -help
- Print a help message and exit.
- .El
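To make the octal form concrete (the translation that -r turns off, and the \012 in the -k prefix), here is a small Python sketch of that translation. It assumes "printable" means the ASCII range 0x20 to 0x7e; the real program may instead rely on the C library's notion of printability in the current locale. The function name is hypothetical.

```python
def escape_unprintable(data):
    """Replace each byte outside 0x20-0x7e with a backslash and 3 octal digits."""
    out = []
    for b in data:
        if 0x20 <= b <= 0x7E:
            out.append(chr(b))
        else:
            # e.g. newline (0x0a) becomes \012, as in the -k prefix.
            out.append("\\%03o" % b)
    return "".join(out)
```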
- .Sh FILES
- .Bl -tag -width __MAGIC__.mgc -compact
- .It Pa __MAGIC__.mgc
- Default compiled list of magic.
- .It Pa __MAGIC__
- Directory containing default magic files.
- .El
- .Sh ENVIRONMENT
- The environment variable
- .Dv MAGIC
- can be used to set the default magic file name.
- If that variable is set, then
- .Nm
- will not attempt to open
- .Pa $HOME/.magic .
- .Nm
- adds
- .Sq .mgc
- to the value of this variable as appropriate.
- The environment variable
- .Dv POSIXLY_CORRECT
- controls (on systems that support symbolic links) whether
- .Nm
- will attempt to follow symlinks or not.
- If set, then
- .Nm
- follows symlinks; otherwise it does not.
- This is also controlled
- by the
- .Fl L
- and
- .Fl h
- options.
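The symlink rules above can be summarized as a small decision function. This Python sketch is hypothetical. In particular, it assumes that an explicit -L or -h takes precedence over POSIXLY_CORRECT, and that merely defining the variable (even as an empty string) is enough; neither point is spelled out here.

```python
import os


def should_follow_symlinks(opt_L=False, opt_h=False, environ=None):
    """Decide whether to dereference symlinks (precedence is an assumption)."""
    env = os.environ if environ is None else environ
    # Assumption: explicit command-line flags beat the environment default.
    if opt_L:
        return True
    if opt_h:
        return False
    # Only whether the variable is defined matters, not its value.
    return "POSIXLY_CORRECT" in env
```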
- .Sh SEE ALSO
- .Xr magic __FSECTION__ ,
- .Xr strings 1 ,
- .Xr od 1 ,
- .Xr hexdump 1 ,
- .Xr file 1posix
- .Sh STANDARDS CONFORMANCE
- This program is believed to exceed the System V Interface Definition
- of FILE(CMD), as near as one can determine from the vague language
- contained therein.
- Its behavior is mostly compatible with the System V program of the same name.
- This version knows more magic, however, so it will produce
- different (albeit more accurate) output in many cases.
- .\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
- .Pp
- The one significant difference
- between this version and System V
- is that this version treats any white space
- as a delimiter, so that spaces in pattern strings must be escaped.
- For example,
- .Bd -literal -offset indent
- >10 string language impress\ (imPRESS data)
- .Ed
- .Pp
- in an existing magic file would have to be changed to
- .Bd -literal -offset indent
- >10 string language\e impress (imPRESS data)
- .Ed
- .Pp
- In addition, in this version, if a pattern string contains a backslash,
- it must be escaped.
- For example
- .Bd -literal -offset indent
- 0 string \ebegindata Andrew Toolkit document
- .Ed
- .Pp
- in an existing magic file would have to be changed to
- .Bd -literal -offset indent
- 0 string \e\ebegindata Andrew Toolkit document
- .Ed
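Converting an old System V pattern string to this version's syntax applies the two rules above in a fixed order. A minimal Python sketch (the function name is hypothetical): existing backslashes are doubled before spaces are escaped, since doing it the other way round would also double the backslashes just added for the spaces.

```python
def escape_magic_string(literal):
    """Escape a literal pattern string for this version's magic syntax (sketch)."""
    # Order matters: double existing backslashes first, then escape spaces.
    return literal.replace("\\", "\\\\").replace(" ", "\\ ")
```

For example, `language impress` becomes `language\ impress` and `\begindata` becomes `\\begindata`, matching the rewritten entries above.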
- .Pp
- SunOS releases 3.2 and later from Sun Microsystems include a
- .Nm
- command derived from the System V one, but with some extensions.
- My version differs from Sun's only in minor ways.
- It includes the extension of the
- .Sq &
- operator, used, for example, as
- .Bd -literal -offset indent
- >16 long&0x7fffffff >0 not stripped
- .Ed
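The entry above reads a long at offset 16, ANDs it with 0x7fffffff, and succeeds if the result is greater than zero. This Python sketch evaluates just that kind of comparison. In magic files a plain long is in the byte order of the machine running file, so the little-endian default here is an assumption, and the function name is hypothetical.

```python
import struct


def masked_long_gt(data, offset, mask, value, byteorder="<"):
    """Evaluate 'OFFSET long&MASK >VALUE' against a byte buffer (sketch).

    byteorder is a struct prefix: "<" little-endian (assumed default), ">" big.
    """
    if len(data) < offset + 4:
        return False  # too short to contain the 4-byte field: the test fails
    (n,) = struct.unpack_from(byteorder + "l", data, offset)
    return (n & mask) > value
```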
- .Sh MAGIC DIRECTORY
- The magic file entries have been collected from various sources,
- mainly USENET, and contributed by various authors.
- Christos Zoulas (address below) will collect additional
- or corrected magic file entries.
- A consolidation of magic file entries
- will be distributed periodically.
- .Pp
- The order of entries in the magic file is significant.
- Depending on what system you are using, the order that
- they are put together may be incorrect.
- If your old
- .Nm
- command uses a magic file,
- keep the old magic file around for comparison purposes
- (rename it to
- .Pa __MAGIC__.orig ).
- .Sh EXAMPLES
- .Bd -literal -offset indent
- $ file file.c file /dev/{wd0a,hda}
- file.c: C program text
- file: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
- dynamically linked (uses shared libs), stripped
- /dev/wd0a: block special (0/0)
- /dev/hda: block special (3/0)
- $ file -s /dev/wd0{b,d}
- /dev/wd0b: data
- /dev/wd0d: x86 boot sector
- $ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
- /dev/hda: x86 boot sector
- /dev/hda1: Linux/i386 ext2 filesystem
- /dev/hda2: x86 boot sector
- /dev/hda3: x86 boot sector, extended partition table
- /dev/hda4: Linux/i386 ext2 filesystem
- /dev/hda5: Linux/i386 swap file
- /dev/hda6: Linux/i386 swap file
- /dev/hda7: Linux/i386 swap file
- /dev/hda8: Linux/i386 swap file
- /dev/hda9: empty
- /dev/hda10: empty
- $ file -i file.c file /dev/{wd0a,hda}
- file.c: text/x-c
- file: application/x-executable
- /dev/wd0a: application/x-not-regular-file
- /dev/hda: application/x-not-regular-file
- .Ed
- .Sh HISTORY
- There has been a
- .Nm
- command in every
- .Dv UNIX
- since at least Research Version 4
- (man page dated November, 1973).
- The System V version introduced one significant change:
- the external list of magic types.
- This slowed the program down slightly but made it a lot more flexible.
- .Pp
- This program, based on the System V version,
- was written by Ian Darwin <ian@darwinsys.com>
- without looking at anybody else's source code.
- .Pp
- John Gilmore revised the code extensively, making it better than
- the first version.
- Geoff Collyer found several inadequacies
- and provided some magic file entries.
- Contributions to the `&' operator by Rob McMahon, cudcv@warwick.ac.uk, 1989.
- .Pp
- Guy Harris, guy@netapp.com, made many changes from 1993 to the present.
- .Pp
- Primary development and maintenance from 1990 to the present by
- Christos Zoulas (christos@astron.com).
- .Pp
- Altered by Chris Lowth, chris@lowth.com, 2000:
- Handle the
- .Fl i
- option to output mime type strings, using an alternative
- magic file and internal logic.
- .Pp
- Altered by Eric Fischer (enf@pobox.com), July, 2000,
- to identify character codes and attempt to identify the languages
- of non-ASCII files.
- .Pp
- Altered by Reuben Thomas (rrt@sc3d.org), 2007 to 2008, to improve MIME
- support and merge MIME and non-MIME magic, support directories as well
- as files of magic, apply many bug fixes and improve the build system.
- .Pp
- The list of contributors to the
- .Sq magic
- directory (magic files)
- is too long to include here.
- You know who you are; thank you.
- Many contributors are listed in the source files.
- .Sh LEGAL NOTICE
- Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
- Covered by the standard Berkeley Software Distribution copyright; see the file
- LEGAL.NOTICE in the source distribution.
- .Pp
- The files
- .Dv tar.h
- and
- .Dv is_tar.c
- were written by John Gilmore from his public-domain
- .Xr tar 1
- program, and are not covered by the above license.
- .Sh BUGS
- There must be a better way to automate the construction of the Magic
- file from all the glop in Magdir.
- What is it?
- .Pp
- .Nm
- uses several algorithms that favor speed over accuracy,
- so it can be misled about the contents of
- text
- files.
- .Pp
- The support for text files (primarily for programming languages)
- is simplistic, inefficient and requires recompilation to update.
- .Pp
- The list of keywords in
- .Dv ascmagic
- probably belongs in the Magic file.
- This could be done by using some keyword like
- .Sq *
- for the offset value.
- .Pp
- Complain about conflicts in the magic file entries.
- Make a rule that the magic entries sort based on file offset rather
- than position within the magic file?
- .Pp
- The program should provide a way to give an estimate
- of
- .Sq how good
- a guess is.
- We end up removing guesses (e.g.\&
- .Sq "From "
- as the first five characters of a file) because
- they are not as good as other guesses (e.g.\&
- .Sq Newsgroups:
- versus
- .Sq Return-Path: ) .
- Still, if the others don't pan out, it should be possible to use the
- first guess.
- .Pp
- This manual page, and particularly this section, is too long.
- .Sh RETURN CODE
- .Nm
- returns 0 on success, and non-zero on error.
- .Sh AVAILABILITY
- You can obtain the original author's latest version by anonymous FTP
- from
- .Dv ftp.astron.com
- as
- .Dv /pub/file/file-X.YZ.tar.gz .