
.\" $File:,v 1.66 2007/10/23 19:58:59 christos Exp $
.Dd January 8, 2007
.Dt FILE __CSECTION__
.Os
.Sh NAME
.Nm file
.Nd determine file type
.Sh SYNOPSIS
.Nm
.Op Fl bchikLnNprsvz0
.Op Fl -mime-type
.Op Fl -mime-encoding
.Op Fl e Ar testname
.Op Fl f Ar namefile
.Op Fl F Ar separator
.Op Fl m Ar magicfiles
.Ar file
.Nm
.Fl C
.Op Fl m Ar magicfile
.Sh DESCRIPTION
This manual page documents version __VERSION__ of the
.Nm
command.
.Pp
.Nm
tests each argument in an attempt to classify it.
There are three sets of tests, performed in this order:
filesystem tests, magic number tests, and language tests.
The
.Em first
test that succeeds causes the file type to be printed.
.Pp
The type printed will usually contain one of the words
.Em text
(the file contains only
printing characters and a few common control
characters and is probably safe to read on an
.Dv ASCII
terminal),
.Em executable
(the file contains the result of compiling a program
in a form understandable to some
.Dv UNIX
kernel or another),
or
.Em data ,
meaning anything else (data is usually
.Sq binary
or non-printable).
Exceptions are well-known file formats (core files, tar archives)
that are known to contain binary data.
When modifying the file
.Pa __MAGIC__
or the program itself, make sure to
.Em "preserve these keywords" .
People depend on knowing that all the readable files in a directory
have the word
.Dq text
printed.
Don't do as Berkeley did and change
.Dq shell commands text
to
.Dq shell script .
Note that the file
.Pa __MAGIC__
is built mechanically from a large number of small files in
the subdirectory
.Pa Magdir
in the source distribution of this program.
.Pp
The filesystem tests are based on examining the information returned by the
.Xr stat 2
system call.
The program checks whether the file is empty
or is some sort of special file.
Any known file types appropriate to the system you are running on
(sockets, symbolic links, or named pipes (FIFOs) on those systems that
implement them)
are recognized if they are defined in
the system header file
.In sys/stat.h .
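.Pp
The filesystem tests can be pictured with the following minimal C sketch.
It is an illustration only, not the code used by
.Nm :
the function
.Fn fs_test
and its result codes are hypothetical, and it assumes a POSIX system.
It calls
.Xr lstat 2
rather than
.Xr stat 2
so that a symbolic link is reported instead of followed.
.Bd -literal -offset indent
#define _POSIX_C_SOURCE 200809L
#include <sys/stat.h>

/* Hypothetical result codes for the filesystem tests. */
enum fs_type {
    FS_ERROR, FS_DIRECTORY, FS_SYMLINK, FS_FIFO, FS_SOCKET,
    FS_CHAR, FS_BLOCK, FS_EMPTY, FS_REGULAR
};

/* Classify path from its lstat(2) information.  FS_REGULAR means
   "non-empty ordinary file": go on to the magic and text tests. */
enum fs_type fs_test(const char *path)
{
    struct stat sb;

    if (lstat(path, &sb) == -1)
        return FS_ERROR;
    if (S_ISDIR(sb.st_mode))
        return FS_DIRECTORY;
#ifdef S_ISLNK                  /* only where the system defines it */
    if (S_ISLNK(sb.st_mode))
        return FS_SYMLINK;
#endif
#ifdef S_ISFIFO
    if (S_ISFIFO(sb.st_mode))
        return FS_FIFO;
#endif
#ifdef S_ISSOCK
    if (S_ISSOCK(sb.st_mode))
        return FS_SOCKET;
#endif
    if (S_ISCHR(sb.st_mode))
        return FS_CHAR;
    if (S_ISBLK(sb.st_mode))
        return FS_BLOCK;
    if (sb.st_size == 0)
        return FS_EMPTY;
    return FS_REGULAR;
}
.Ed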
.Pp
The magic number tests are used to check for files with data in
particular fixed formats.
The canonical example of this is a binary executable (compiled program)
.Dv a.out
file, whose format is defined in
.In elf.h ,
.In a.out.h
and possibly
.In exec.h
in the standard include directory.
These files have a
.Sq "magic number"
stored in a particular place
near the beginning of the file that tells the
.Dv UNIX
operating system
that the file is a binary executable, and which of several types thereof.
The concept of a
.Sq "magic number"
has been applied by extension to data files.
Any file with some invariant identifier at a small fixed
offset into the file can usually be described in this way.
The information identifying these files is read from the compiled
magic file
.Pa __MAGIC__.mgc ,
or from
.Pa __MAGIC__
if the compiled file does not exist.
In addition,
.Nm
will look in
.Pa $HOME/.magic.mgc
or
.Pa $HOME/.magic
for magic entries.
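.Pp
A magic number test can be sketched in C as a table of byte strings, each expected at a fixed offset.
This is a hypothetical illustration rather than the way
.Nm
reads and applies
.Pa __MAGIC__ :
the table is hard-coded, and the names
.Fn magic_test
and
.Vt struct magic_entry
are invented.
The signatures used are the ELF header bytes, the gzip header bytes, and the
.Dq ustar
string found at offset 257 of a POSIX tar header.
.Bd -literal -offset indent
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for one magic entry: a byte string that must
   appear at a fixed offset from the start of the file. */
struct magic_entry {
    size_t offset;
    const unsigned char *bytes;
    size_t len;
    const char *desc;
};

static const unsigned char elf_magic[] = { 0x7f, 'E', 'L', 'F' };
static const unsigned char gzip_magic[] = { 0x1f, 0x8b };
static const unsigned char tar_magic[] = { 'u', 's', 't', 'a', 'r' };

/* Entries are tried in order, so, as in the magic file, order matters. */
static const struct magic_entry magic_table[] = {
    { 0, elf_magic, sizeof elf_magic, "ELF" },
    { 0, gzip_magic, sizeof gzip_magic, "gzip compressed data" },
    { 257, tar_magic, sizeof tar_magic, "POSIX tar archive" },
};

/* Return the description of the first entry that matches the first
   nbytes bytes of the file held in buf, or NULL if none matches. */
const char *magic_test(const unsigned char *buf, size_t nbytes)
{
    size_t i;

    for (i = 0; i < sizeof magic_table / sizeof magic_table[0]; i++) {
        const struct magic_entry *m = &magic_table[i];

        if (m->offset + m->len <= nbytes &&
            memcmp(buf + m->offset, m->bytes, m->len) == 0)
            return m->desc;
    }
    return NULL;
}
.Ed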
.Pp
If a file does not match any of the entries in the magic file,
it is examined to see if it seems to be a text file.
ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets
(such as those used on Macintosh and IBM PC systems),
UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC
character sets can be distinguished by the different
ranges and sequences of bytes that constitute printable text
in each set.
If a file passes any of these tests, its character set is reported.
ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified
as
.Dq text
because they will be mostly readable on nearly any terminal;
UTF-16 and EBCDIC are only
.Dq character data
because, while
they contain text, it is text that will require translation
before it can be read.
In addition,
.Nm
will attempt to determine other characteristics of text-type files.
If the lines of a file are terminated by CR, CRLF, or NEL, instead
of the Unix-standard LF, this will be reported.
Files that contain embedded escape sequences or overstriking
will also be identified.
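.Pp
For plain ASCII, the text check and the line-terminator report can be sketched in C as follows.
This is a simplified, hypothetical illustration, not the code in
.Nm :
the function names and the exact set of accepted control characters
(backspace, tab, line feed, form feed, carriage return and escape)
are assumptions, and the other character sets and NEL line endings are not handled.
.Bd -literal -offset indent
#include <stddef.h>

/* Hypothetical flags for the line terminators seen. */
enum { TERM_LF = 1, TERM_CRLF = 2, TERM_CR = 4 };

/* Bytes accepted in ASCII text: printable characters plus BS (0x08),
   TAB (0x09), LF (0x0a), FF (0x0c), CR (0x0d) and ESC (0x1b). */
static int ascii_text_byte(unsigned char c)
{
    return (c >= 0x20 && c <= 0x7e) ||
        c == 0x08 || c == 0x09 || c == 0x0a ||
        c == 0x0c || c == 0x0d || c == 0x1b;
}

/* Return 1 if the n bytes in buf look like ASCII text, setting *terms
   to the TERM_* flags for the line endings seen; return 0 otherwise. */
int ascii_text(const unsigned char *buf, size_t n, int *terms)
{
    size_t i;

    *terms = 0;
    for (i = 0; i < n; i++) {
        if (!ascii_text_byte(buf[i]))
            return 0;
        if (buf[i] == 0x0d) {                   /* CR */
            if (i + 1 < n && buf[i + 1] == 0x0a) {
                *terms |= TERM_CRLF;
                i++;                            /* consume the LF too */
            } else {
                *terms |= TERM_CR;
            }
        } else if (buf[i] == 0x0a) {            /* LF on its own */
            *terms |= TERM_LF;
        }
    }
    return 1;
}
.Ed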
.Pp
Once
.Nm
has determined the character set used in a text-type file,
it will
attempt to determine in what language the file is written.
The language tests look for particular strings (see
.In names.h )
that can appear anywhere in the first few blocks of a file.
For example, the keyword
.Em .br
indicates that the file is most likely a
.Xr troff 1
input file, just as the keyword
.Em struct
indicates a C program.
These tests are less reliable than the previous
two groups, so they are performed last.
The language test routines also test for some miscellany
(such as
.Xr tar 1
archives).
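.Pp
A language test of this kind amounts to a bounded substring search over the first part of the file, as in the C sketch below.
The sketch is hypothetical: its two-entry table only mirrors the
.Em .br
and
.Em struct
examples above and is not the contents of
.In names.h ,
and the function
.Fn lang_test
and its
.Fa limit
argument are invented.
.Bd -literal -offset indent
#include <stddef.h>
#include <string.h>

/* Hypothetical keyword table: a string that may appear anywhere in
   the examined bytes, and the file type it suggests. */
struct lang_key {
    const char *word;
    const char *type;
};

static const struct lang_key lang_keys[] = {
    { ".br", "troff input" },
    { "struct", "C program" },
};

/* Bounded substring search; buf need not be NUL-terminated. */
static int contains(const unsigned char *buf, size_t n, const char *w)
{
    size_t i, len = strlen(w);

    for (i = 0; len <= n && i <= n - len; i++)
        if (memcmp(buf + i, w, len) == 0)
            return 1;
    return 0;
}

/* Look only at the first limit bytes (the "first few blocks") and
   return the type for the first keyword in table order that occurs
   there, or NULL if none does. */
const char *lang_test(const unsigned char *buf, size_t n, size_t limit)
{
    size_t i;

    if (n > limit)
        n = limit;
    for (i = 0; i < sizeof lang_keys / sizeof lang_keys[0]; i++)
        if (contains(buf, n, lang_keys[i].word))
            return lang_keys[i].type;
    return NULL;
}
.Ed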
.Pp
Any file that cannot be identified as having been written
in any of the character sets listed above is simply said to be
.Dq data .
.Sh OPTIONS
.Bl -tag -width indent
.It Fl b , -brief
Do not prepend filenames to output lines (brief mode).
.It Fl c , -checking-printout
Cause a checking printout of the parsed form of the magic file.
This is usually used in conjunction with the
.Fl m
flag to debug a new magic file before installing it.
.It Fl C , -compile
Write a
.Pa magic.mgc
output file that contains a pre-parsed version of the magic file.
.It Fl e , -exclude Ar testname
Exclude the test named in
.Ar testname
from the list of tests made to determine the file type.
Valid test names are:
.Bl -tag -width compress
.It apptype
Don't check for
.Dv EMX
application type (only on EMX).
.It ascii
Don't check for the various types of ASCII text files.
.It compress
Don't look for, or inside, compressed files.
.It elf
Don't print ELF details.
.It fortran
Don't look for Fortran sequences inside ASCII files.
.It soft
Don't consult magic files.
.It tar
Don't examine tar files.
.It token
Don't look for known tokens inside ASCII files.
.It troff
Don't look for troff sequences inside ASCII files.
.El
.It Fl f , -files-from Ar namefile
Read the names of the files to be examined from
.Ar namefile
(one per line)
before the argument list.
Either
.Ar namefile
or at least one filename argument must be present;
to test the standard input, use
.Sq -
as a filename argument.
.It Fl F , -separator Ar separator
Use the specified string as the separator between the filename and the
result.
The default is
.Sq \&: .
.It Fl h , -no-dereference
This option causes symlinks not to be followed
(on systems that support symbolic links).
This is the default if the
environment variable
is not defined.
.It Fl i , -mime
Causes
.Nm
to output MIME type strings rather than the more
traditional human-readable ones.
Thus it may say
.Dq text/plain charset=us-ascii
rather than
.Dq ASCII text .
In order for this option to work,
.Nm
changes the way it handles files recognized by the command itself
(such as many of the text file types, directories, etc.)
and makes use of an alternative
.Dq magic
file (see the
.Sx FILES
section below).
.It Fl -mime-type , -mime-encoding
Like
.Fl i ,
but print only the specified element(s).
.It Fl k , -keep-going
Don't stop at the first match; keep going.
.It Fl L , -dereference
This option causes symlinks to be followed, as the option of the same name in
.Xr ls 1
does (on systems that support symbolic links).
This is the default if the environment variable
is defined.
.It Fl m , -magic-file Ar list
Specify an alternate list of files containing magic numbers.
This can be a single file or a colon-separated list of files.
If a compiled magic file is found alongside a file in the list, it is used instead.
With the
.Fl i
or
.Fl -mime
option, the program adds
.Dq .mime
to each file name.
.It Fl n , -no-buffer
Force the standard output to be flushed after checking each file.
This is only useful when checking a list of files.
It is intended to be used by programs that want filetype output from a pipe.
.It Fl N , -no-pad
Don't pad filenames so that they align in the output.
.It Fl p , -preserve-date
On systems that support
.Xr utime 2
or
.Xr utimes 2 ,
attempt to preserve the access time of analyzed files, so that it looks as if
.Nm
never read them.
.It Fl r , -raw
Don't translate unprintable characters to \eooo.
Normally
.Nm
translates unprintable characters to their octal representation.
.It Fl s , -special-files
Normally,
.Nm
only attempts to read and determine the type of argument files which
.Xr stat 2
reports are ordinary files.
This prevents problems, because reading special files may have peculiar
consequences.
Specifying the
.Fl s
option causes
.Nm
to also read argument files which are block or character special files.
This is useful for determining the filesystem types of the data in raw
disk partitions, which are block special files.
This option also causes
.Nm
to disregard the file size as reported by
.Xr stat 2 ,
since on some systems it reports a zero size for raw disk partitions.
.It Fl v , -version
Print the version of the program and exit.
.It Fl z , -uncompress
Try to look inside compressed files.
.It Fl 0 , -print0
Output a null character
.Sq \e0
after the end of the filename.
This makes it easy to
.Xr cut 1
the output.
It does not affect the separator, which is still printed.
.It Fl -help
Print a help message and exit.
.El
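.Pp
The date preservation performed by
.Fl p
can be sketched with
.Xr stat 2
and
.Xr utime 2 :
record the file's times before reading it, then put them back.
This is a hypothetical illustration, not necessarily how
.Nm
does it, and the helper names are invented.
Because
.Xr utime 2
sets the modification time as well, the saved modification time is written back together with the access time.
.Bd -literal -offset indent
#include <sys/stat.h>
#include <utime.h>

/* Hypothetical helper for -p: record the access and modification
   times of path before it is read.  Returns 0 on success, -1 on error. */
int save_times(const char *path, struct utimbuf *saved)
{
    struct stat sb;

    if (stat(path, &sb) == -1)
        return -1;
    saved->actime = sb.st_atime;
    saved->modtime = sb.st_mtime;
    return 0;
}

/* Put back the times recorded by save_times(), so that the access
   time no longer shows that the file was read. */
int restore_times(const char *path, const struct utimbuf *saved)
{
    return utime(path, saved);
}
.Ed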
.Sh FILES
.Bl -tag -width __MAGIC__.mime.mgc -compact
.It Pa __MAGIC__.mgc
Default compiled list of magic numbers.
.It Pa __MAGIC__
Default list of magic numbers.
.It Pa __MAGIC__.mime.mgc
Default compiled list of magic numbers, used to output MIME types when
the
.Fl i
option is specified.
.It Pa __MAGIC__.mime
Default list of magic numbers, used to output MIME types when the
.Fl i
option is specified.
.El
.Sh ENVIRONMENT
The environment variable
.Dv MAGIC
can be used to set the default magic number file name.
If that variable is set, then
.Nm
will not attempt to open
.Pa $HOME/.magic .
.Nm
adds
.Dq .mime
and/or
.Dq .mgc
to the value of this variable as appropriate.
.Pp
The environment variable
controls (on systems that support symbolic links) whether
.Nm
follows symlinks.
If it is set,
.Nm
follows symlinks; otherwise it does not.
This is also controlled by the
.Fl L
and
.Fl h
options.
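.Pp
The way the magic file name is derived from
.Dv MAGIC
can be sketched in C as follows.
This is a hypothetical illustration: the helper names are invented, the compiled-in default is passed in by the caller, and the suffix order follows the names listed under
.Sx FILES ,
such as
.Pa __MAGIC__.mime.mgc .
.Bd -literal -offset indent
#include <stdlib.h>
#include <string.h>

/* Hypothetical: the base name comes from $MAGIC when it is set,
   otherwise from the compiled-in default passed as deflt. */
const char *magic_base(const char *deflt)
{
    const char *env = getenv("MAGIC");

    return env != NULL ? env : deflt;
}

/* Append ".mime" (MIME output requested) and/or ".mgc" (compiled
   form) to base, writing into out.  Returns -1 if out is too small. */
int magic_name(char *out, size_t size, const char *base,
    int mime, int compiled)
{
    size_t need = strlen(base) + (mime ? 5 : 0) + (compiled ? 4 : 0) + 1;

    if (need > size)
        return -1;
    strcpy(out, base);
    if (mime)
        strcat(out, ".mime");
    if (compiled)
        strcat(out, ".mgc");
    return 0;
}
.Ed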
.Sh SEE ALSO
.Xr magic __FSECTION__ ,
.Xr strings 1 ,
.Xr od 1 ,
.Xr hexdump 1
.Sh STANDARDS CONFORMANCE
This program is believed to exceed the System V Interface Definition
of FILE(CMD), as near as one can determine from the vague language
contained therein.
Its behavior is mostly compatible with the System V program of the same name.
This version knows more magic, however, so it will produce
different (albeit more accurate) output in many cases.
.\" URL:
.Pp
The one significant difference
between this version and System V
is that this version treats any white space
as a delimiter, so that spaces in pattern strings must be escaped.
For example,
.Bd -literal -offset indent
>10 string language impress\ (imPRESS data)
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
>10 string language\e impress (imPRESS data)
.Ed
.Pp
In addition, in this version, if a pattern string contains a backslash,
it must be escaped.
For example,
.Bd -literal -offset indent
0 string \ebegindata Andrew Toolkit document
.Ed
.Pp
in an existing magic file would have to be changed to
.Bd -literal -offset indent
0 string \e\ebegindata Andrew Toolkit document
.Ed
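.Pp
The two escaping rules above can be sketched as a C routine that reads one pattern-string field.
It is a simplified, hypothetical illustration, not the magic file parser of
.Nm :
the name
.Fn read_field
is invented, and only these two rules are handled
(an unescaped space or tab ends the field; a backslash makes the next character literal),
not any other escape sequences the parser may accept.
.Bd -literal -offset indent
#include <stddef.h>

#define BACKSLASH 0x5c          /* the backslash character */
#define TAB       0x09

/* Hypothetical sketch: copy one pattern-string field from *src into
   dst (of the given size), stopping at the first unescaped space or
   tab.  A backslash makes the following character literal.  Returns
   the number of characters stored and leaves *src at the delimiter
   (or at the end of the string). */
size_t read_field(const char **src, char *dst, size_t size)
{
    const char *s = *src;
    size_t n = 0;

    while (*s != 0 && *s != ' ' && *s != TAB) {
        if (*s == BACKSLASH && s[1] != 0)
            s++;                /* keep the escaped character as is */
        if (n + 1 < size)
            dst[n++] = *s;
        s++;
    }
    if (size > 0)
        dst[n] = 0;             /* n < size here, so this is in bounds */
    *src = s;
    return n;
}
.Ed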
.Pp
SunOS releases 3.2 and later from Sun Microsystems include a
.Nm
command derived from the System V one, but with some extensions.
My version differs from Sun's only in minor ways.
It includes the extension of the
.Sq &
operator, used, for example, as
.Bd -literal -offset indent
>16 long&0x7fffffff >0 not stripped
.Ed
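.Pp
The entry above fetches the 4-byte value at offset 16, ANDs it with
.Li 0x7fffffff ,
and prints
.Dq not stripped
if the result is greater than 0.
A hypothetical C sketch of that evaluation follows; the function name is invented.
As for a magic
.Dq long ,
the value is read in the machine's native byte order.
The comparison is unsigned, which gives the same answer as a signed one here because the mask clears the sign bit.
.Bd -literal -offset indent
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch of ">16 long&0x7fffffff >0": fetch 4 bytes at
   offset in native byte order, AND them with mask, and compare the
   result with value.  Returns 1 if the entry matches, 0 if it does
   not, and -1 if the buffer is too short to hold the value. */
int long_and_gt(const unsigned char *buf, size_t n, size_t offset,
    uint32_t mask, uint32_t value)
{
    uint32_t v;

    if (offset + sizeof v > n)
        return -1;
    memcpy(&v, buf + offset, sizeof v);
    return (v & mask) > value;
}
.Ed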
.Sh MAGIC DIRECTORY
The magic file entries have been collected from various sources,
mainly USENET, and contributed by various authors.
Christos Zoulas (address below) will collect additional
or corrected magic file entries.
A consolidation of magic file entries
will be distributed periodically.
.Pp
The order of entries in the magic file is significant.
Depending on what system you are using, the order that
they are put together may be incorrect.
If your old
.Nm
command uses a magic file,
keep the old magic file around for comparison purposes
(rename it to
.Pa __MAGIC__.orig ).
.Sh EXAMPLES
.Bd -literal -offset indent
$ file file.c file /dev/{wd0a,hda}
file.c: C program text
file: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
dynamically linked (uses shared libs), stripped
/dev/wd0a: block special (0/0)
/dev/hda: block special (3/0)
$ file -s /dev/wd0{b,d}
/dev/wd0b: data
/dev/wd0d: x86 boot sector
$ file -s /dev/hda{,1,2,3,4,5,6,7,8,9,10}
/dev/hda: x86 boot sector
/dev/hda1: Linux/i386 ext2 filesystem
/dev/hda2: x86 boot sector
/dev/hda3: x86 boot sector, extended partition table
/dev/hda4: Linux/i386 ext2 filesystem
/dev/hda5: Linux/i386 swap file
/dev/hda6: Linux/i386 swap file
/dev/hda7: Linux/i386 swap file
/dev/hda8: Linux/i386 swap file
/dev/hda9: empty
/dev/hda10: empty
$ file -i file.c file /dev/{wd0a,hda}
file.c: text/x-c
file: application/x-executable
/dev/wd0a: application/x-not-regular-file
/dev/hda: application/x-not-regular-file
.Ed
.Sh HISTORY
There has been a
.Nm
command in every
.Dv UNIX
since at least Research Version 4
(man page dated November 1973).
The System V version introduced one significant change:
the external list of magic number types.
This slowed the program down slightly but made it a lot more flexible.
.Pp
This program, based on the System V version,
was written by Ian Darwin
without looking at anybody else's source code.
.Pp
John Gilmore revised the code extensively, making it better than
the first version.
Geoff Collyer found several inadequacies
and provided some magic file entries.
The
.Sq &
operator was contributed by Rob McMahon in 1989.
.Pp
Guy Harris made many changes from 1993 to the present.
.Pp
Primary development and maintenance from 1990 to the present by
Christos Zoulas.
.Pp
Altered by Chris Lowth, 2000:
handle the
.Fl i
option to output MIME type strings, using an alternative
magic file and internal logic.
.Pp
Altered by Eric Fischer, July 2000,
to identify character codes and attempt to identify the languages
of non-ASCII files.
.Pp
The list of contributors to the
.Pa Magdir
directory (source for the
.Pa __MAGIC__
file) is too long to include here.
You know who you are; thank you.
.Sh LEGAL NOTICE
Copyright (c) Ian F. Darwin, Toronto, Canada, 1986-1999.
Covered by the standard Berkeley Software Distribution copyright; see the file
.Pa LEGAL.NOTICE
in the source distribution.
.Pp
The files
.Pa tar.h
and
.Pa is_tar.c
were written by John Gilmore from his public-domain
.Xr tar 1
program, and are not covered by the above license.
.Sh BUGS
There must be a better way to automate the construction of the Magic
file from all the glop in Magdir.
What is it?
.\" Compilation support has been done
.\" Better yet, the magic file should be compiled into binary (say,
.\" .Xr ndbm 3
.\" or, better yet, fixed-length
.\" .Dv ASCII
.\" strings for use in heterogeneous network environments) for faster startup.
.\" Then the program would run as fast as the Version 7 program of the same
.\" name, with the flexibility of the System V version.
.Pp
.Nm
uses several algorithms that favor speed over accuracy,
so it can be misled about the contents of
text
files.
.Pp
The support for text files (primarily for programming languages)
is simplistic, inefficient and requires recompilation to update.
.\" Else support has been done
.\" There should be an
.\" .Dv else
.\" clause to follow a series of continuation lines.
.\" .Pp
.\" Regular expression support has been done
.\" The magic file and keywords should have regular expression support.
The magic files' use of
.Dv ASCII TAB
as a field delimiter is ugly and makes
it hard to edit the files, but is entrenched.
.Pp
It might be advisable to allow upper-case letters in keywords,
for example to distinguish
.Xr troff 1
commands from man page macros.
Regular expression support would make this easy.
.Pp
The program doesn't grok
.Dv FORTRAN .
It should be able to recognize
.Dv FORTRAN
by seeing some keywords which
appear indented at the start of a line.
Regular expression support would make this easy.
.Pp
The list of keywords in
.Dv ascmagic
probably belongs in the Magic file.
This could be done by using some keyword like
.Sq *
for the offset value.
.Pp
.\" Sorting has been done.
.\" Another optimization would be to sort
.\" the magic file so that we can just run down all the
.\" tests for the first byte, first word, first long, etc, once we
.\" have fetched it.
Complain about conflicts in the magic file entries.
Make a rule that the magic entries sort based on file offset rather
than position within the magic file?
.Pp
The program should provide a way to give an estimate
of
.Dq how good
a guess is.
We end up removing guesses (for example,
.Dq "From "
as the first 5 characters of the file) because
they are not as good as other guesses (for example,
.Dq Newsgroups:
versus
.Dq Return-Path: ) .
Still, if the others don't pan out, it should be possible to use the
first guess.
.Pp
This program is slower than some vendors' file commands.
The new support for multiple character codes makes it even slower.
.Pp
This manual page, and particularly this section, is too long.
.Sh AVAILABILITY
You can obtain the original author's latest version by anonymous FTP
on
.Dv
as
.Dv /pub/file/file-X.YZ.tar.gz .