.\" $File: file.man,v 1.65 2007/01/25 21:05:46 christos Exp $
.Dd January 8, 2007
.Dt FILE __CSECTION__
.Os
.Sh NAME
.Nm file
.Nd determine file type
.Sh SYNOPSIS
.Nm
.Op Fl 0bchikLnNprsvz
.Op Fl e Ar testname
.Op Fl f Ar namefile
.Op Fl F Ar separator
.Op Fl m Ar magicfiles
.Ar file
.Nm
.Fl C
.Op Fl m Ar magicfile
.Sh DESCRIPTION
This manual page documents version __VERSION__ of the
.Nm
command.
.Pp
.Nm
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic number tests, and language tests.
The
.Em first
test that succeeds causes the file type to be printed.
.Pp
The type printed will usually contain one of the words
.Em text
(the file contains only
printing characters and a few common control
characters and is probably safe to read on an
.Dv ASCII
terminal),
.Em executable
(the file contains the result of compiling a program
in a form understandable to some
.Dv UNIX
kernel or another),
or
.Em data
meaning anything else (data is usually
.Sq binary
or non-printable).
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When adding local definitions to
.Pa /etc/magic ,
.Em "preserve these keywords" .
People depend on knowing that all the readable files in a directory
have the word
.Dq text
printed.
Don't do as Berkeley did and change
.Dq shell commands text
to
.Dq shell script .
Note that the file
.Pa __MAGIC__
is built mechanically from a large number of small files in
the subdirectory
.Pa Magdir
in the source distribution of this program.
.Pp
The filesystem tests are based on examining the return from a
.Xr stat 2
system call.
The program checks to see if the file is empty,
or if it's some sort of special file.
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are intuited if they are defined in
the system header file
.In sys/stat.h .
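.Pp
As an illustration only, the following C sketch performs this kind of
classification with
.Xr lstat 2
and the
.Dv S_IS*
macros from
.In sys/stat.h .
It is not the code used by
.Nm ;
the function name
.Fn fs_type
and the returned strings are hypothetical.
A
.Dv NULL
result stands for a non-empty regular file, which would go on to the
magic number tests.
.Bd -literal -offset indent
#define _XOPEN_SOURCE 700       /* expose lstat() and S_ISSOCK() */
#include <stddef.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: classify a path from lstat(2) alone, in the
 * spirit of the filesystem tests.  lstat() does not follow symbolic
 * links, which matches the -h behavior.  NULL means "regular, non-empty
 * file": the magic and language tests would run next.
 */
const char *
fs_type(const char *path)
{
    struct stat st;

    if (lstat(path, &st) == -1)
        return "cannot open";
    if (S_ISDIR(st.st_mode))
        return "directory";
    if (S_ISCHR(st.st_mode))
        return "character special";
    if (S_ISBLK(st.st_mode))
        return "block special";
    if (S_ISFIFO(st.st_mode))
        return "fifo (named pipe)";
    if (S_ISLNK(st.st_mode))
        return "symbolic link";
    if (S_ISSOCK(st.st_mode))
        return "socket";
    if (st.st_size == 0)
        return "empty";
    return NULL;
}
.Ed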
.Pp
The magic number tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.Dv a.out
file, whose format is defined in
.In elf.h ,
.In a.out.h
and possibly
.In exec.h
in the standard include directory.
These files have a
.Sq "magic number"
stored in a particular place
near the beginning of the file that tells the
.Dv UNIX
operating system
that the file is a binary executable, and which of several types thereof.
The concept of a
.Sq "magic number"
has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information identifying these files is read from
.Pa /etc/magic
and the compiled
magic file
.Pa __MAGIC__.mgc ,
or
.Pa __MAGIC__
if the compiled file does not exist.
In addition,
.Nm
will look in
.Pa $HOME/.magic.mgc
or
.Pa $HOME/.magic
for magic entries.
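.Pp
The following C sketch illustrates the idea of an invariant identifier
at a fixed offset.
It is not how
.Nm
reads or evaluates
.Pa /etc/magic ;
the small built-in table, the structure
.Vt struct magic_entry
and the function
.Fn magic_match
are hypothetical.
The ELF, PDF and gzip identifiers it uses are the well-known leading
bytes of those formats.
.Bd -literal -offset indent
#include <stddef.h>
#include <string.h>

/* One hypothetical magic entry: bytes expected at a fixed offset. */
struct magic_entry {
    size_t offset;              /* where the identifier starts */
    const unsigned char *bytes; /* the invariant identifier */
    size_t len;                 /* its length in bytes */
    const char *desc;           /* text printed on a match */
};

static const unsigned char elf_magic[] = { 0x7f, 'E', 'L', 'F' };
static const unsigned char pdf_magic[] = { '%', 'P', 'D', 'F', '-' };
static const unsigned char gzip_magic[] = { 0x1f, 0x8b };

static const struct magic_entry entries[] = {
    { 0, elf_magic, sizeof elf_magic, "ELF" },
    { 0, pdf_magic, sizeof pdf_magic, "PDF document" },
    { 0, gzip_magic, sizeof gzip_magic, "gzip compressed data" },
};

/*
 * Return the description of the first entry whose identifier appears
 * at its offset in buf, or NULL so that the text tests can run next.
 */
const char *
magic_match(const unsigned char *buf, size_t nbytes)
{
    size_t i;

    for (i = 0; i < sizeof entries / sizeof entries[0]; i++) {
        const struct magic_entry *m = &entries[i];

        if (m->offset + m->len <= nbytes &&
            memcmp(buf + m->offset, m->bytes, m->len) == 0)
            return m->desc;
    }
    return NULL;
}
.Ed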
.Pp
If a file does not match any of the entries in the magic file,
it is examined to see if it seems to be a text file.
ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
(such as those used on Macintosh and IBM PC systems),
UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
character sets can be distinguished by the different
ranges and sequences of bytes that constitute printable text
in each set.
If a file passes any of these tests, its character set is reported.
ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
as
.Dq text
because they will be mostly readable on nearly any terminal;
UTF-16 and EBCDIC are only
.Dq character data
because, while
they contain text, it is text that will require translation
before it can be read.
In addition,
.Nm
will attempt to determine other characteristics of text-type files.
If the lines of a file are terminated by CR, CRLF, or NEL, instead
of the Unix-standard LF, this will be reported.
Files that contain embedded escape sequences or overstriking
will also be identified.
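.Pp
A simplified C sketch of the ASCII and UTF-8 checks and of CRLF
detection follows.
The set of bytes accepted as text, the function names and the result
strings are assumptions made for illustration; the tables in
.Nm
differ, and lone CR, NEL, ISO-8859-x, extended-ASCII, UTF-16 and EBCDIC
are not handled here.
.Bd -literal -offset indent
#include <stddef.h>

/*
 * Assumed set of "text" bytes for this sketch: BEL, BS, TAB, LF, FF,
 * CR, ESC and the printable range 32..126.  The tables in file differ.
 */
static int
is_text_byte(unsigned char c)
{
    return (c >= 7 && c <= 10) || c == 12 || c == 13 || c == 27 ||
        (c >= 32 && c <= 126);
}

static int
looks_ascii(const unsigned char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        if (!is_text_byte(buf[i]))
            return 0;
    return 1;
}

/*
 * Structural UTF-8 check: each lead byte must be followed by the right
 * number of 10xxxxxx continuation bytes.  Overlong encodings and
 * surrogates are not rejected in this simplified version.
 */
static int
looks_utf8(const unsigned char *buf, size_t n)
{
    size_t i = 0, k, follow;

    while (i < n) {
        unsigned char c = buf[i++];

        if (c < 0x80) {
            if (!is_text_byte(c))
                return 0;
            continue;
        }
        if ((c & 0xe0) == 0xc0)
            follow = 1;
        else if ((c & 0xf0) == 0xe0)
            follow = 2;
        else if ((c & 0xf8) == 0xf0)
            follow = 3;
        else
            return 0;           /* stray continuation or invalid byte */
        for (k = 0; k < follow; k++, i++)
            if (i >= n || (buf[i] & 0xc0) != 0x80)
                return 0;
    }
    return 1;
}

/* Classify a buffer; only CRLF line terminators are detected here. */
const char *
text_type(const unsigned char *buf, size_t n)
{
    size_t i;
    int utf8, crlf = 0;

    if (looks_ascii(buf, n))
        utf8 = 0;
    else if (looks_utf8(buf, n))
        utf8 = 1;
    else
        return "data";
    for (i = 0; i + 1 < n; i++)
        if (buf[i] == 13 && buf[i + 1] == 10)   /* CR followed by LF */
            crlf = 1;
    if (utf8)
        return crlf ? "UTF-8 Unicode text, with CRLF line terminators" :
            "UTF-8 Unicode text";
    return crlf ? "ASCII text, with CRLF line terminators" : "ASCII text";
}
.Ed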
.Pp
Once
.Nm
has determined the character set used in a text-type file,
it will
attempt to determine in what language the file is written.
The language tests look for particular strings (see
.In names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.Em .br
indicates that the file is most likely a
.Xr troff 1
input file, just as the keyword
.Em struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.Xr tar 1
archives).
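.Pp
The keyword search can be sketched in C as a substring scan over the
beginning of the file.
The table below is a hypothetical excerpt, not the contents of
.In names.h ,
and
.Fn language_guess
is an illustrative name.
The sketch also shows why such tests are unreliable, since a word like
.Dq struct
can occur in ordinary prose.
.Bd -literal -offset indent
#include <stddef.h>
#include <string.h>

/* Hypothetical keyword table; the real, longer list is in names.h. */
static const struct {
    const char *word;
    const char *type;
} keywords[] = {
    { ".br", "troff input text" },
    { "struct", "C program text" },
    { "#include", "C program text" },
};

/*
 * Return the type for the first keyword found anywhere in the first
 * few blocks of buf, or NULL.  Plain substring matching is what makes
 * these tests less reliable: "struct" may also occur in ordinary prose.
 */
const char *
language_guess(const char *buf, size_t n)
{
    char window[4096];              /* roughly "the first few blocks" */
    size_t i, len = n < sizeof window ? n : sizeof window - 1;

    memcpy(window, buf, len);
    window[len] = 0;                /* terminate so strstr() can be used */
    for (i = 0; i < sizeof keywords / sizeof keywords[0]; i++)
        if (strstr(window, keywords[i].word) != NULL)
            return keywords[i].type;
    return NULL;
}
.Ed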
.Pp
Any file that cannot be identified as having been written
in any of the character sets listed above is simply said to be
.Dq data .
.Sh OPTIONS
.Bl -tag -width indent
.It Fl b , -brief
Do not prepend filenames to output lines (brief mode).
.It Fl c , -checking-printout
Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with the
.Fl m
flag to debug a new magic file before installing it.
.It Fl C , -compile
Write a
.Pa magic.mgc
output file that contains a pre-parsed version of the magic file.
.It Fl e , -exclude Ar testname
Exclude the test named in
.Ar testname
from the list of tests made to determine the file type.
Valid test names are:
.Bl -tag -width compress
.It apptype
Check for
.Dv EMX
application type (only on EMX).
.It ascii
Check for various types of ASCII files.
.It compress
Look for, and inside, compressed files.
.It elf
Print ELF file details.
.It fortran
Look for FORTRAN sequences inside ASCII files.
.It soft
Consult magic files.
.It tar
Examine tar files.
.It token
Look for known tokens inside ASCII files.
.It troff
Look for troff sequences inside ASCII files.
.El
.It Fl f , -files-from Ar namefile
Read the names of the files to be examined from
.Ar namefile
(one per line)
before the argument list.
Either
.Ar namefile
or at least one filename argument must be present;
to test the standard input, use
.Sq -
as a filename argument.
.It Fl F , -separator Ar separator
Use the specified string as the separator between the filename and the
file result returned.
The default is
.Sq \&: .
.It Fl h , -no-dereference
Do not follow symlinks
(on systems that support symbolic links).
This is the default if the environment variable
.Dv POSIXLY_CORRECT
is not defined.
.It Fl i , -mime
Cause
.Nm
to output MIME type strings rather than the more
traditional human-readable ones.
Thus it may say
.Dq text/plain; charset=us-ascii
rather than
.Dq ASCII text .
In order for this option to work,
.Nm
changes the way it handles files recognized by the command itself
(such as many of the text file types, directories, etc.)
and makes use of an alternative
.Dq magic
file
(see the
.Sx FILES
section below).
.It Fl k , -keep-going
Don't stop at the first match; keep going.
Subsequent matches will be prepended by
.Dq "\[rs]012\- " .
(If you want a newline, see the
.Fl r
option.)
.It Fl L , -dereference
Follow symlinks, as the like-named option in
.Xr ls 1
does
(on systems that support symbolic links).
This is the default if the environment variable
.Dv POSIXLY_CORRECT
is defined.
.It Fl m , -magic-file Ar list
Specify an alternate list of files containing magic numbers.
This can be a single file, or a colon-separated list of files.
If a compiled magic file is found alongside, it will be used instead.
With the
.Fl i
or
.Fl -mime
option, the program adds
.Dq .mime
to each file name.
.It Fl n , -no-buffer
Force stdout to be flushed after checking each file.
This is only useful if checking a list of files.
It is intended to be used by programs that want filetype output from a pipe.
.It Fl N , -no-pad
Don't pad filenames so that they align in the output.
.It Fl p , -preserve-date
On systems that support
.Xr utime 2
or
.Xr utimes 2 ,
attempt to preserve the access time of files analyzed, to pretend that
.Nm
never read them.
.It Fl r , -raw
Don't translate unprintable characters to \eooo.
Normally
.Nm
translates unprintable characters to their octal representation.
.It Fl s , -special-files
Normally,
.Nm
only attempts to read and determine the type of argument files which
.Xr stat 2
reports are ordinary files.
This prevents problems, because reading special files may have peculiar
consequences.
Specifying the
.Fl s
option causes
.Nm
to also read argument files which are block or character special files.
This is useful for determining the filesystem types of the data in raw
disk partitions, which are block special files.
This option also causes
.Nm
to disregard the file size as reported by
.Xr stat 2 ,
since on some systems it reports a zero size for raw disk partitions.
.It Fl v , -version
Print the version of the program and exit.
.It Fl z , -uncompress
Try to look inside compressed files.
.It Fl 0 , -print0
Output a null character
.Sq \e0
after the end of the filename.
This makes it easy to
.Xr cut 1
the output.
This does not affect the separator, which is still printed.
.It Fl -help
Print a help message and exit.
.El
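.Pp
The translation that
.Fl r
turns off can be sketched in C as follows.
This is an illustration, not the code used by
.Nm :
the function name
.Fn escape_unprintable
is hypothetical, and treating only bytes 32 to 126 as printable is an
assumption (a real implementation may consult the locale instead).
.Bd -literal -offset indent
#include <stddef.h>
#include <stdio.h>

/*
 * Copy n bytes from in to out, replacing each byte outside the assumed
 * printable range 32..126 by a backslash and three octal digits.
 * out must have room for 4 * n + 1 bytes.
 */
void
escape_unprintable(char *out, const unsigned char *in, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (in[i] >= 32 && in[i] <= 126) {
            *out++ = (char)in[i];
        } else {
            *out++ = 92;                    /* ASCII code of backslash */
            sprintf(out, "%03o", (unsigned)in[i]);
            out += 3;
        }
    }
    *out = 0;
}
.Ed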
.Sh FILES
.Bl -tag -width __MAGIC__.mime.mgc -compact
.It Pa __MAGIC__.mgc
Default compiled list of magic numbers
.It Pa __MAGIC__
Default list of magic numbers
.It Pa __MAGIC__.mime.mgc
Default compiled list of magic numbers, used to output mime types when
the
.Fl i
option is specified.
.It Pa __MAGIC__.mime
Default list of magic numbers, used to output mime types when the
.Fl i
option is specified.
.El
.Sh ENVIRONMENT
The environment variable
.Dv MAGIC
can be used to set the default magic number file name.
If that variable is set, then
.Nm
will not attempt to open
.Pa $HOME/.magic .
.Nm
adds
.Dq .mime
and/or
.Dq .mgc
to the value of this variable as appropriate.
However,
.Pa file
has to exist in order for
.Pa file.mime
to be considered.
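.Pp
The name handling described above and under
.Fl m
can be sketched in C.
The helpers
.Fn next_magic_file
and
.Fn magic_candidates
are hypothetical.
They only build the names to try (the compiled
.Dq .mgc
file first, then the source file) and leave out the existence checks,
including the rule that the plain file must exist before its
.Dq .mime
variant is considered.
.Bd -literal -offset indent
#include <stdio.h>
#include <string.h>

/*
 * Copy the next element of a colon-separated magic list (as given to -m
 * or in MAGIC) into elem and advance *list.  Returns 0 at the end.
 */
int
next_magic_file(const char **list, char *elem, size_t size)
{
    const char *p = *list, *colon;
    size_t full, len;

    if (*p == 0)
        return 0;
    colon = strchr(p, ':');
    full = colon != NULL ? (size_t)(colon - p) : strlen(p);
    len = full < size ? full : size - 1;    /* truncate long names */
    memcpy(elem, p, len);
    elem[len] = 0;
    *list = colon != NULL ? colon + 1 : p + full;
    return 1;
}

/*
 * Build the two names to try for one base name: the compiled file
 * first, then the source file, both with ".mime" added for -i.
 */
void
magic_candidates(const char *base, int mime, char *compiled, char *source,
    size_t size)
{
    const char *suffix = mime ? ".mime" : "";

    snprintf(compiled, size, "%s%s.mgc", base, suffix);
    snprintf(source, size, "%s%s", base, suffix);
}
.Ed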
.Pp
The environment variable
.Dv POSIXLY_CORRECT
controls (on systems that support symbolic links) whether
.Nm
will attempt to follow symlinks.
If it is set,
.Nm
follows symlinks; otherwise it does not.
This behavior can also be selected with the
.Fl L
and
.Fl h
options.
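.Pp
A minimal C sketch of this decision follows.
The function
.Fn follow_symlinks
and its flag parameters are hypothetical.
This manual does not say how conflicting
.Fl L
and
.Fl h
options are resolved, so the sketch simply checks
.Fl L
first.
.Bd -literal -offset indent
#define _POSIX_C_SOURCE 200112L
#include <stdlib.h>

/*
 * Decide whether symbolic links are followed.  opt_L and opt_h stand
 * for the -L and -h flags (hypothetical parameters); without either,
 * the presence of POSIXLY_CORRECT in the environment decides.
 */
int
follow_symlinks(int opt_L, int opt_h)
{
    if (opt_L)
        return 1;
    if (opt_h)
        return 0;
    return getenv("POSIXLY_CORRECT") != NULL;
}
.Ed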
.Sh SEE ALSO
.Xr magic __FSECTION__ ,
.Xr strings 1 ,
.Xr od 1 ,
.Xr hexdump 1
.Sh STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behavior is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.\" URL: http://www.opengroup.org/onlinepubs/009695399/utilities/file.html
.Pp
The one significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.Bd -literal -offset indent
>10 string language impress\ (imPRESS data)
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
>10 string language\e impress (imPRESS data)
.Ed
.Pp
In addition, in this version, if a pattern string contains a backslash,
it must be escaped.
For example,
.Bd -literal -offset indent
0 string \ebegindata Andrew Toolkit document
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
0 string \e\ebegindata Andrew Toolkit document
.Ed
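.Pp
These field-splitting and escaping rules can be illustrated with a
small C sketch.
The function
.Fn next_field
is hypothetical and much simpler than the real magic parser.
It treats a backslash as making the next character literal, which
covers the escaped space and the doubled backslash in the examples
above, but it does not interpret any other escape sequence.
.Bd -literal -offset indent
#include <stddef.h>

#define BACKSLASH 92        /* ASCII code of the escape character */
#define TAB 9               /* ASCII code of a horizontal tab */

/*
 * Copy the next field of a magic line into out (at most size - 1
 * characters; size must be at least 1) and return a pointer just past
 * it.  Blanks separate fields; a backslash makes the next character
 * literal, so an escaped space stays inside the field and a doubled
 * backslash yields one backslash.  No other escapes are interpreted.
 */
const char *
next_field(const char *p, char *out, size_t size)
{
    size_t len = 0;

    while (*p == ' ' || *p == TAB)          /* skip leading blanks */
        p++;
    while (*p != 0 && *p != ' ' && *p != TAB) {
        if (*p == BACKSLASH && p[1] != 0)
            p++;                            /* keep the next char as is */
        if (len + 1 < size)
            out[len++] = *p;
        p++;
    }
    out[len] = 0;
    return p;
}
.Ed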
.Pp
SunOS releases 3.2 and later from Sun Microsystems include a
.Nm
command derived from the System V one, but with some extensions.
My version differs from Sun's only in minor ways.
It includes the extension of the
.Sq &
operator, used, for example, as
.Bd -literal -offset indent
>16 long&0x7fffffff >0 not stripped
.Ed
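.Pp
The effect of such an entry can be sketched in C.
The sketch reads the 32-bit value at offset 16, clears the bits not in
the mask, and compares the result with 0.
The function names are hypothetical, and the sketch assumes
little-endian data, whereas the magic
.Dq long
type uses the byte order of the host.
.Bd -literal -offset indent
#include <stddef.h>
#include <stdint.h>

/* Read a 32-bit value at off, assuming little-endian data. */
static int
read_le32(const unsigned char *buf, size_t n, size_t off, uint32_t *v)
{
    if (off > n || n - off < 4)
        return 0;                   /* not enough data: no match */
    *v = (uint32_t)buf[off] | (uint32_t)buf[off + 1] << 8 |
        (uint32_t)buf[off + 2] << 16 | (uint32_t)buf[off + 3] << 24;
    return 1;
}

/*
 * Evaluate an entry of the form "OFFSET long&MASK >VALUE": read the
 * value, keep only the bits set in the mask, then compare.  The
 * comparison is unsigned here; with a mask that clears the sign bit,
 * as 0x7fffffff does, a signed comparison gives the same answer.
 */
int
masked_greater(const unsigned char *buf, size_t n, size_t off,
    uint32_t mask, uint32_t value)
{
    uint32_t v;

    return read_le32(buf, n, off, &v) && (v & mask) > value;
}
.Ed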
.Sh MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
Christos Zoulas (address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
.Pp
The order of entries in the magic file is significant.
Depending on what system you are using, the order that
they are put together may be incorrect.
.Sh EXAMPLES
.Bd -literal -offset indent
$ file file.c file /dev/{wd0a,hda}
file.c: C program text
file: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
dynamically linked (uses shared libs), stripped
/dev/wd0a: block special (0/0)
/dev/hda: block special (3/0)
$ file -s /dev/wd0{b,d}
/dev/wd0b: data
/dev/wd0d: x86 boot sector
$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
/dev/hda: x86 boot sector
/dev/hda1: Linux/i386 ext2 filesystem
/dev/hda2: x86 boot sector
/dev/hda3: x86 boot sector, extended partition table
/dev/hda4: Linux/i386 ext2 filesystem
/dev/hda5: Linux/i386 swap file
/dev/hda6: Linux/i386 swap file
/dev/hda7: Linux/i386 swap file
/dev/hda8: Linux/i386 swap file
/dev/hda9: empty
/dev/hda10: empty
$ file -i file.c file /dev/{wd0a,hda}
file.c: text/x-c
file: application/x-executable, dynamically linked (uses shared libs),
not stripped
/dev/hda: application/x-not-regular-file
/dev/wd0a: application/x-not-regular-file
.Ed
.Sh HISTORY
There has been a
.Nm
command in every
.Dv UNIX
since at least Research Version 4
(man page dated November, 1973).
The System V version introduced one significant change:
the external list of magic number types.
This slowed the program down slightly but made it a lot more flexible.
.Pp
This program, based on the System V version,
was written by Ian Darwin <ian@darwinsys.com>
without looking at anybody else's source code.
.Pp
John Gilmore revised the code extensively, making it better than
the first version.
Geoff Collyer found several inadequacies
and provided some magic file entries.
Contribution of the
.Sq &
operator by Rob McMahon, cudcv@warwick.ac.uk, 1989.
.Pp
Guy Harris, guy@netapp.com, made many changes from 1993 to the present.
.Pp
Primary development and maintenance from 1990 to the present by
Christos Zoulas (christos@astron.com).
.Pp
Altered by Chris Lowth, chris@lowth.com, 2000,
to handle the
.Fl i
option to output MIME type strings, using an alternative
magic file and internal logic.
.Pp
Altered by Eric Fischer (enf@pobox.com), July, 2000,
to identify character codes and attempt to identify the languages
of non-ASCII files.
.Pp
The list of contributors to the
.Pa Magdir
directory (source for the
.Pa __MAGIC__
file) is too long to include here.
You know who you are; thank you.
.Sh LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
Covered by the standard Berkeley Software Distribution copyright; see the file
LEGAL.NOTICE in the source distribution.
.Pp
The files
.Dv tar.h
and
.Dv is_tar.c
were written by John Gilmore from his public-domain
.Xr tar 1
program, and are not covered by the above license.
.Sh BUGS
There must be a better way to automate the construction of the Magic
file from all the glop in Magdir.
What is it?
.\" Compilation support has been done
.\" Better yet, the magic file should be compiled into binary (say,
.\" .Xr ndbm 3
.\" or, better yet, fixed-length
.\" .Dv ASCII
.\" strings for use in heterogeneous network environments) for faster startup.
.\" Then the program would run as fast as the Version 7 program of the same
.\" name, with the flexibility of the System V version.
.Pp
.Nm
uses several algorithms that favor speed over accuracy,
so it can be misled about the contents of
text
files.
.Pp
The support for text files (primarily for programming languages)
is simplistic, inefficient and requires recompilation to update.
.\" Else support has been done
.\" There should be an
.\" .Dv else
.\" clause to follow a series of continuation lines.
.\" .Pp
.\" Regular expression support has been done
.\" The magic file and keywords should have regular expression support.
.Pp
The use of
.Dv ASCII TAB
as a field delimiter in the magic files is ugly and makes
it hard to edit the files, but is entrenched.
.Pp
It might be advisable to allow upper-case letters in keywords
for, e.g.,
.Xr troff 1
commands vs. man page macros.
Regular expression support would make this easy.
.Pp
The program doesn't grok
.Dv FORTRAN .
It should be able to recognize
.Dv FORTRAN
by seeing some keywords which
appear indented at the start of a line.
Regular expression support would make this easy.
.Pp
The list of keywords in
.Dv ascmagic
probably belongs in the Magic file.
This could be done by using some keyword like
.Sq *
for the offset value.
.Pp
.\" Sorting has been done.
.\" Another optimization would be to sort
.\" the magic file so that we can just run down all the
.\" tests for the first byte, first word, first long, etc, once we
.\" have fetched it.
Complain about conflicts in the magic file entries.
Make a rule that the magic entries sort based on file offset rather
than position within the magic file?
.Pp
The program should provide a way to give an estimate
of
.Dq how good
a guess is.
We end up removing guesses (e.g.,
.Dq "From "
as the first 5 characters of a file) because
they are not as good as other guesses (e.g.,
.Dq Newsgroups:
versus
.Dq Return-Path: ) .
Still, if the others don't pan out, it should be possible to use the
first guess.
.Pp
This program is slower than some vendors' file commands.
The new support for multiple character codes makes it even slower.
.Pp
This manual page, and particularly this section, is too long.
.Sh RETURN CODE
.Nm
almost always returns 0.
It returns a different code if it cannot open a file.
.Sh AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
on
.Dv ftp.astron.com
as
.Pa /pub/file/file-X.YZ.tar.gz .
.Pp
This Debian version adds a number of new magic entries.
It can be obtained from every site carrying a Debian distribution
(that is,
.Dv ftp.debian.org
and its mirrors).