| 123456789101112131415161718192021222324252627282930313233343536373839404142434445464748495051525354555657585960616263646566676869707172737475767778798081828384858687888990919293949596979899100101102103104105106107108109110111112113114115116117118119120121122123124125126127128129130131132133134135136137138139140141142143144145146147148149150151152153154155156157158159160161162163164165166167168169170171172173174175176177178179180181182183184185186187188189190191192193194195196197198199200201202203204205206207208209210211212213214215216217218219220221222223224225226227228229230231232233234235236237238239240241242243244245246247248249250251252253254255256257258259260261262263264265266267268269270271272273274275276277278279280281282283284285286287288289290291292293294295296297298299300301302303304305306307308309310311312313314315316317318319320321322323324325326327328329330331332333334335336337338339340341342343344345346347348349350351352353354355356357358359360361362363364365366367368369370371372373374375376377378379380381382383384385386387388389390391392393394395396397398399400401402403404405406407408409410411412413414415416417418419420421422423424425426427428429430431432433434435436437438439440441442443444445446447448449450451452453454455456457458459460461462463464465466467468469470471472473474475476477478479480481482483484485486487488489490491492493494495496497498499500501502503504505506507508509510511512513514515516517518519520521522523524525526527528529530531532533534535536537538539540541542543544545546547548549550551552553554555556557558559560561562563564565 | 
							- .\" $File: magic.man,v 1.71 2011/12/07 11:58:24 rrt Exp $
 
- .Dd April 20, 2011
 
- .Dt MAGIC __FSECTION__
 
- .Os
 
- .\" install as magic.4 on USG, magic.5 on V7, Berkeley and Linux systems.
 
- .Sh NAME
 
- .Nm magic
 
- .Nd file command's magic pattern file
 
- .Sh DESCRIPTION
 
- This manual page documents the format of the magic file as
 
- used by the
 
- .Xr file __CSECTION__
 
- command, version __VERSION__.
 
- The
 
- .Xr file __CSECTION__
 
- command identifies the type of a file using,
 
- among other tests,
 
- a test for whether the file contains certain
 
- .Dq "magic patterns" .
 
- The file
 
- .Pa __MAGIC__
 
- specifies what patterns are to be tested for, what message or
 
- MIME type to print if a particular pattern is found,
 
- and additional information to extract from the file.
 
- .Pp
 
- Each line of the file specifies a test to be performed.
 
- A test compares the data starting at a particular offset
 
- in the file with a byte value, a string or a numeric value.
 
- If the test succeeds, a message is printed.
 
- The line consists of the following fields:
 
- .Bl -tag -width ".Dv message"
 
- .It Dv offset
 
- A number specifying the offset, in bytes, into the file of the data
 
- which is to be tested.
 
- .It Dv type
 
- The type of the data to be tested.
 
- The possible values are:
 
- .Bl -tag -width ".Dv lestring16"
 
- .It Dv byte
 
- A one-byte value.
 
- .It Dv short
 
- A two-byte value in this machine's native byte order.
 
- .It Dv long
 
- A four-byte value in this machine's native byte order.
 
- .It Dv quad
 
- An eight-byte value in this machine's native byte order.
 
- .It Dv float
 
- A 32-bit single precision IEEE floating point number in this machine's native byte order.
 
- .It Dv double
 
- A 64-bit double precision IEEE floating point number in this machine's native byte order.
 
- .It Dv string
 
- A string of bytes.
 
- The string type specification can be optionally followed
 
- by /[WwcCtb]*.
 
- The
 
- .Dq W
 
- flag compacts whitespace in the target, which must
 
- contain at least one whitespace character.
 
- If the magic has
 
- .Dv n
 
- consecutive blanks, the target needs at least
 
- .Dv n
 
- consecutive blanks to match.
 
- The
 
- .Dq w
 
- flag treats every blank in the magic as an optional blank.
 
- The
 
- .Dq c
 
- flag specifies case insensitive matching: lower case
 
- characters in the magic match both lower and upper case characters in the
 
- target, whereas upper case characters in the magic only match upper case
 
- characters in the target.
 
- The
 
- .Dq C
 
- flag specifies case insensitive matching: upper case
 
- characters in the magic match both lower and upper case characters in the
 
- target, whereas lower case characters in the magic only match upper case
 
- characters in the target.
 
- To do a complete case insensitive match, specify both
 
- .Dq c
 
- and
 
- .Dq C .
 
- The
 
- .Dq t
 
- flag forces the test to be done for text files, while the
 
- .Dq b
 
- flag forces the test to be done for binary files.
 
- .It Dv pstring
 
- A Pascal-style string where the first byte/short/int is interpreted as the an
 
- unsigned length.
 
- The length defaults to byte and can be specified as a modifier.
 
- The following modifiers are supported:
 
- .Bl -tag -compact -width B
 
- .It B
 
- A byte length (default).
 
- .It H
 
- A 2 byte big endian length.
 
- .It h
 
- A 2 byte big little length.
 
- .It L
 
- A 4 byte big endian length.
 
- .It l
 
- A 4 byte big little length.
 
- .It J
 
- The length includes itself in its count.
 
- .El
 
- The string is not NUL terminated.
 
- .Dq J
 
- is used rather than the more
 
- valuable
 
- .Dq I
 
- because this type of length is a feature of the JPEG
 
- format.
 
- .It Dv date
 
- A four-byte value interpreted as a UNIX date.
 
- .It Dv qdate
 
- A eight-byte value interpreted as a UNIX date.
 
- .It Dv ldate
 
- A four-byte value interpreted as a UNIX-style date, but interpreted as
 
- local time rather than UTC.
 
- .It Dv qldate
 
- An eight-byte value interpreted as a UNIX-style date, but interpreted as
 
- local time rather than UTC.
 
- .It Dv beid3
 
- A 32-bit ID3 length in big-endian byte order.
 
- .It Dv beshort
 
- A two-byte value in big-endian byte order.
 
- .It Dv belong
 
- A four-byte value in big-endian byte order.
 
- .It Dv bequad
 
- An eight-byte value in big-endian byte order.
 
- .It Dv befloat
 
- A 32-bit single precision IEEE floating point number in big-endian byte order.
 
- .It Dv bedouble
 
- A 64-bit double precision IEEE floating point number in big-endian byte order.
 
- .It Dv bedate
 
- A four-byte value in big-endian byte order,
 
- interpreted as a Unix date.
 
- .It Dv beqdate
 
- An eight-byte value in big-endian byte order,
 
- interpreted as a Unix date.
 
- .It Dv beldate
 
- A four-byte value in big-endian byte order,
 
- interpreted as a UNIX-style date, but interpreted as local time rather
 
- than UTC.
 
- .It Dv beqldate
 
- An eight-byte value in big-endian byte order,
 
- interpreted as a UNIX-style date, but interpreted as local time rather
 
- than UTC.
 
- .It Dv bestring16
 
- A two-byte unicode (UCS16) string in big-endian byte order.
 
- .It Dv leid3
 
- A 32-bit ID3 length in little-endian byte order.
 
- .It Dv leshort
 
- A two-byte value in little-endian byte order.
 
- .It Dv lelong
 
- A four-byte value in little-endian byte order.
 
- .It Dv lequad
 
- An eight-byte value in little-endian byte order.
 
- .It Dv lefloat
 
- A 32-bit single precision IEEE floating point number in little-endian byte order.
 
- .It Dv ledouble
 
- A 64-bit double precision IEEE floating point number in little-endian byte order.
 
- .It Dv ledate
 
- A four-byte value in little-endian byte order,
 
- interpreted as a UNIX date.
 
- .It Dv leqdate
 
- An eight-byte value in little-endian byte order,
 
- interpreted as a UNIX date.
 
- .It Dv leldate
 
- A four-byte value in little-endian byte order,
 
- interpreted as a UNIX-style date, but interpreted as local time rather
 
- than UTC.
 
- .It Dv leqldate
 
- An eight-byte value in little-endian byte order,
 
- interpreted as a UNIX-style date, but interpreted as local time rather
 
- than UTC.
 
- .It Dv lestring16
 
- A two-byte unicode (UCS16) string in little-endian byte order.
 
- .It Dv melong
 
- A four-byte value in middle-endian (PDP-11) byte order.
 
- .It Dv medate
 
- A four-byte value in middle-endian (PDP-11) byte order,
 
- interpreted as a UNIX date.
 
- .It Dv meldate
 
- A four-byte value in middle-endian (PDP-11) byte order,
 
- interpreted as a UNIX-style date, but interpreted as local time rather
 
- than UTC.
 
- .It Dv indirect
 
- Starting at the given offset, consult the magic database again.
 
- .It Dv regex
 
- A regular expression match in extended POSIX regular expression syntax
 
- (like egrep).
 
- Regular expressions can take exponential time to process, and their
 
- performance is hard to predict, so their use is discouraged.
 
- When used in production environments, their performance
 
- should be carefully checked.
 
- The type specification can be optionally followed by
 
- .Dv /[c][s] .
 
- The
 
- .Dq c
 
- flag makes the match case insensitive, while the
 
- .Dq s
 
- flag update the offset to the start offset of the match, rather than the end.
 
- The regular expression is tested against line
 
- .Dv N + 1
 
- onwards, where
 
- .Dv N
 
- is the given offset.
 
- Line endings are assumed to be in the machine's native format.
 
- .Dv ^
 
- and
 
- .Dv $
 
- match the beginning and end of individual lines, respectively,
 
- not beginning and end of file.
 
- .It Dv search
 
- A literal string search starting at the given offset.
 
- The same modifier flags can be used as for string patterns.
 
- The modifier flags (if any) must be followed by
 
- .Dv /number
 
- the range, that is, the number of positions at which the match will be
 
- attempted, starting from the start offset.
 
- This is suitable for
 
- searching larger binary expressions with variable offsets, using
 
- .Dv \e
 
- escapes for special characters.
 
- The offset works as for regex.
 
- .It Dv default
 
- This is intended to be used with the test
 
- .Em x
 
- (which is always true) and a message that is to be used if there are
 
- no other matches.
 
- .El
 
- .Pp
 
- Each top-level magic pattern (see below for an explanation of levels)
 
- is classified as text or binary according to the types used.
 
- Types
 
- .Dq regex
 
- and
 
- .Dq search
 
- are classified as text tests, unless non-printable characters are used
 
- in the pattern.
 
- All other tests are classified as binary.
 
- A top-level
 
- pattern is considered to be a test text when all its patterns are text
 
- patterns; otherwise, it is considered to be a binary pattern.
 
- When
 
- matching a file, binary patterns are tried first; if no match is
 
- found, and the file looks like text, then its encoding is determined
 
- and the text patterns are tried.
 
- .Pp
 
- The numeric types may optionally be followed by
 
- .Dv \*[Am]
 
- and a numeric value,
 
- to specify that the value is to be AND'ed with the
 
- numeric value before any comparisons are done.
 
- Prepending a
 
- .Dv u
 
- to the type indicates that ordered comparisons should be unsigned.
 
- .It Dv test
 
- The value to be compared with the value from the file.
 
- If the type is
 
- numeric, this value
 
- is specified in C form; if it is a string, it is specified as a C string
 
- with the usual escapes permitted (e.g. \en for new-line).
 
- .Pp
 
- Numeric values
 
- may be preceded by a character indicating the operation to be performed.
 
- It may be
 
- .Dv = ,
 
- to specify that the value from the file must equal the specified value,
 
- .Dv \*[Lt] ,
 
- to specify that the value from the file must be less than the specified
 
- value,
 
- .Dv \*[Gt] ,
 
- to specify that the value from the file must be greater than the specified
 
- value,
 
- .Dv \*[Am] ,
 
- to specify that the value from the file must have set all of the bits
 
- that are set in the specified value,
 
- .Dv ^ ,
 
- to specify that the value from the file must have clear any of the bits
 
- that are set in the specified value, or
 
- .Dv ~ ,
 
- the value specified after is negated before tested.
 
- .Dv x ,
 
- to specify that any value will match.
 
- If the character is omitted, it is assumed to be
 
- .Dv = .
 
- Operators
 
- .Dv \*[Am] ,
 
- .Dv ^ ,
 
- and
 
- .Dv ~
 
- don't work with floats and doubles.
 
- The operator
 
- .Dv !\&
 
- specifies that the line matches if the test does
 
- .Em not
 
- succeed.
 
- .Pp
 
- Numeric values are specified in C form; e.g.
 
- .Dv 13
 
- is decimal,
 
- .Dv 013
 
- is octal, and
 
- .Dv 0x13
 
- is hexadecimal.
 
- .Pp
 
- For string values, the string from the
 
- file must match the specified string.
 
- The operators
 
- .Dv = ,
 
- .Dv \*[Lt]
 
- and
 
- .Dv \*[Gt]
 
- (but not
 
- .Dv \*[Am] )
 
- can be applied to strings.
 
- The length used for matching is that of the string argument
 
- in the magic file.
 
- This means that a line can match any non-empty string (usually used to
 
- then print the string), with
 
- .Em \*[Gt]\e0
 
- (because all non-empty strings are greater than the empty string).
 
- .Pp
 
- The special test
 
- .Em x
 
- always evaluates to true.
 
- .It Dv message
 
- The message to be printed if the comparison succeeds.
 
- If the string contains a
 
- .Xr printf 3
 
- format specification, the value from the file (with any specified masking
 
- performed) is printed using the message as the format string.
 
- If the string begins with
 
- .Dq \eb ,
 
- the message printed is the remainder of the string with no whitespace
 
- added before it: multiple matches are normally separated by a single
 
- space.
 
- .El
 
- .Pp
 
- An APPLE 4+4 character APPLE creator and type can be specified as:
 
- .Bd -literal -offset indent
 
- !:apple	CREATYPE
 
- .Ed
 
- .Pp
 
- A MIME type is given on a separate line, which must be the next
 
- non-blank or comment line after the magic line that identifies the
 
- file type, and has the following format:
 
- .Bd -literal -offset indent
 
- !:mime	MIMETYPE
 
- .Ed
 
- .Pp
 
- i.e. the literal string
 
- .Dq !:mime
 
- followed by the MIME type.
 
- .Pp
 
- An optional strength can be supplied on a separate line which refers to
 
- the current magic description using the following format:
 
- .Bd -literal -offset indent
 
- !:strength OP VALUE
 
- .Ed
 
- .Pp
 
- The operand
 
- .Dv OP
 
- can be:
 
- .Dv + ,
 
- .Dv - ,
 
- .Dv * ,
 
- or
 
- .Dv /
 
- and
 
- .Dv VALUE
 
- is a constant between 0 and 255.
 
- This constant is applied using the specified operand
 
- to the currently computed default magic strength.
 
- .Pp
 
- Some file formats contain additional information which is to be printed
 
- along with the file type or need additional tests to determine the true
 
- file type.
 
- These additional tests are introduced by one or more
 
- .Em \*[Gt]
 
- characters preceding the offset.
 
- The number of
 
- .Em \*[Gt]
 
- on the line indicates the level of the test; a line with no
 
- .Em \*[Gt]
 
- at the beginning is considered to be at level 0.
 
- Tests are arranged in a tree-like hierarchy:
 
- if the test on a line at level
 
- .Em n
 
- succeeds, all following tests at level
 
- .Em n+1
 
- are performed, and the messages printed if the tests succeed, until a line
 
- with level
 
- .Em n
 
- (or less) appears.
 
- For more complex files, one can use empty messages to get just the
 
- "if/then" effect, in the following way:
 
- .Bd -literal -offset indent
 
- 0      string   MZ
 
- \*[Gt]0x18  leshort  \*[Lt]0x40   MS-DOS executable
 
- \*[Gt]0x18  leshort  \*[Gt]0x3f   extended PC executable (e.g., MS Windows)
 
- .Ed
 
- .Pp
 
- Offsets do not need to be constant, but can also be read from the file
 
- being examined.
 
- If the first character following the last
 
- .Em \*[Gt]
 
- is a
 
- .Em \&(
 
- then the string after the parenthesis is interpreted as an indirect offset.
 
- That means that the number after the parenthesis is used as an offset in
 
- the file.
 
- The value at that offset is read, and is used again as an offset
 
- in the file.
 
- Indirect offsets are of the form:
 
- .Em (( x [.[bislBISL]][+\-][ y ]) .
 
- The value of
 
- .Em x
 
- is used as an offset in the file.
 
- A byte, id3 length, short or long is read at that offset depending on the
 
- .Em [bislBISLm]
 
- type specifier.
 
- The capitalized types interpret the number as a big endian
 
- value, whereas the small letter versions interpret the number as a little
 
- endian value;
 
- the
 
- .Em m
 
- type interprets the number as a middle endian (PDP-11) value.
 
- To that number the value of
 
- .Em y
 
- is added and the result is used as an offset in the file.
 
- The default type if one is not specified is long.
 
- .Pp
 
- That way variable length structures can be examined:
 
- .Bd -literal -offset indent
 
- # MS Windows executables are also valid MS-DOS executables
 
- 0           string  MZ
 
- \*[Gt]0x18       leshort \*[Lt]0x40   MZ executable (MS-DOS)
 
- # skip the whole block below if it is not an extended executable
 
- \*[Gt]0x18       leshort \*[Gt]0x3f
 
- \*[Gt]\*[Gt](0x3c.l)  string  PE\e0\e0  PE executable (MS-Windows)
 
- \*[Gt]\*[Gt](0x3c.l)  string  LX\e0\e0  LX executable (OS/2)
 
- .Ed
 
- .Pp
 
- This strategy of examining has a drawback: You must make sure that
 
- you eventually print something, or users may get empty output (like, when
 
- there is neither PE\e0\e0 nor LE\e0\e0 in the above example)
 
- .Pp
 
- If this indirect offset cannot be used directly, simple calculations are
 
- possible: appending
 
- .Em [+-*/%\*[Am]|^]number
 
- inside parentheses allows one to modify
 
- the value read from the file before it is used as an offset:
 
- .Bd -literal -offset indent
 
- # MS Windows executables are also valid MS-DOS executables
 
- 0           string  MZ
 
- # sometimes, the value at 0x18 is less that 0x40 but there's still an
 
- # extended executable, simply appended to the file
 
- \*[Gt]0x18       leshort \*[Lt]0x40
 
- \*[Gt]\*[Gt](4.s*512) leshort 0x014c  COFF executable (MS-DOS, DJGPP)
 
- \*[Gt]\*[Gt](4.s*512) leshort !0x014c MZ executable (MS-DOS)
 
- .Ed
 
- .Pp
 
- Sometimes you do not know the exact offset as this depends on the length or
 
- position (when indirection was used before) of preceding fields.
 
- You can specify an offset relative to the end of the last up-level
 
- field using
 
- .Sq \*[Am]
 
- as a prefix to the offset:
 
- .Bd -literal -offset indent
 
- 0           string  MZ
 
- \*[Gt]0x18       leshort \*[Gt]0x3f
 
- \*[Gt]\*[Gt](0x3c.l)  string  PE\e0\e0    PE executable (MS-Windows)
 
- # immediately following the PE signature is the CPU type
 
- \*[Gt]\*[Gt]\*[Gt]\*[Am]0       leshort 0x14c     for Intel 80386
 
- \*[Gt]\*[Gt]\*[Gt]\*[Am]0       leshort 0x184     for DEC Alpha
 
- .Ed
 
- .Pp
 
- Indirect and relative offsets can be combined:
 
- .Bd -literal -offset indent
 
- 0             string  MZ
 
- \*[Gt]0x18         leshort \*[Lt]0x40
 
- \*[Gt]\*[Gt](4.s*512)   leshort !0x014c MZ executable (MS-DOS)
 
- # if it's not COFF, go back 512 bytes and add the offset taken
 
- # from byte 2/3, which is yet another way of finding the start
 
- # of the extended executable
 
- \*[Gt]\*[Gt]\*[Gt]\*[Am](2.s-514) string  LE      LE executable (MS Windows VxD driver)
 
- .Ed
 
- .Pp
 
- Or the other way around:
 
- .Bd -literal -offset indent
 
- 0                 string  MZ
 
- \*[Gt]0x18             leshort \*[Gt]0x3f
 
- \*[Gt]\*[Gt](0x3c.l)        string  LE\e0\e0  LE executable (MS-Windows)
 
- # at offset 0x80 (-4, since relative offsets start at the end
 
- # of the up-level match) inside the LE header, we find the absolute
 
- # offset to the code area, where we look for a specific signature
 
- \*[Gt]\*[Gt]\*[Gt](\*[Am]0x7c.l+0x26) string  UPX     \eb, UPX compressed
 
- .Ed
 
- .Pp
 
- Or even both!
 
- .Bd -literal -offset indent
 
- 0                string  MZ
 
- \*[Gt]0x18            leshort \*[Gt]0x3f
 
- \*[Gt]\*[Gt](0x3c.l)       string  LE\e0\e0 LE executable (MS-Windows)
 
- # at offset 0x58 inside the LE header, we find the relative offset
 
- # to a data area where we look for a specific signature
 
- \*[Gt]\*[Gt]\*[Gt]\*[Am](\*[Am]0x54.l-3)  string  UNACE  \eb, ACE self-extracting archive
 
- .Ed
 
- .Pp
 
- Finally, if you have to deal with offset/length pairs in your file, even the
 
- second value in a parenthesized expression can be taken from the file itself,
 
- using another set of parentheses.
 
- Note that this additional indirect offset is always relative to the
 
- start of the main indirect offset.
 
- .Bd -literal -offset indent
 
- 0                 string       MZ
 
- \*[Gt]0x18             leshort      \*[Gt]0x3f
 
- \*[Gt]\*[Gt](0x3c.l)        string       PE\e0\e0 PE executable (MS-Windows)
 
- # search for the PE section called ".idata"...
 
- \*[Gt]\*[Gt]\*[Gt]\*[Am]0xf4          search/0x140 .idata
 
- # ...and go to the end of it, calculated from start+length;
 
- # these are located 14 and 10 bytes after the section name
 
- \*[Gt]\*[Gt]\*[Gt]\*[Gt](\*[Am]0xe.l+(-4)) string       PK\e3\e4 \eb, ZIP self-extracting archive
 
- .Ed
 
- .Sh SEE ALSO
 
- .Xr file __CSECTION__
 
- \- the command that reads this file.
 
- .Sh BUGS
 
- The formats
 
- .Dv long ,
 
- .Dv belong ,
 
- .Dv lelong ,
 
- .Dv melong ,
 
- .Dv short ,
 
- .Dv beshort ,
 
- .Dv leshort ,
 
- .Dv date ,
 
- .Dv bedate ,
 
- .Dv medate ,
 
- .Dv ledate ,
 
- .Dv beldate ,
 
- .Dv leldate ,
 
- and
 
- .Dv meldate
 
- are system-dependent; perhaps they should be specified as a number
 
- of bytes (2B, 4B, etc),
 
- since the files being recognized typically come from
 
- a system on which the lengths are invariant.
 
- .\"
 
- .\" From: guy@sun.uucp (Guy Harris)
 
- .\" Newsgroups: net.bugs.usg
 
- .\" Subject: /etc/magic's format isn't well documented
 
- .\" Message-ID: <2752@sun.uucp>
 
- .\" Date: 3 Sep 85 08:19:07 GMT
 
- .\" Organization: Sun Microsystems, Inc.
 
- .\" Lines: 136
 
- .\"
 
- .\" Here's a manual page for the format accepted by the "file" made by adding
 
- .\" the changes I posted to the S5R2 version.
 
- .\"
 
- .\" Modified for Ian Darwin's version of the file command.
 
 
  |