Import upstream version 0.4.15

Adam Hupp 6 years ago
commit
44e931be1a
20 changed files with 843 additions and 0 deletions
  1. 2 0
      .gitignore
  2. 26 0
      .travis.yml
  3. 21 0
      LICENSE
  4. 2 0
      MANIFEST.in
  5. 120 0
      README.md
  6. 0 0
      __init__.py
  7. 301 0
      magic.py
  8. 5 0
      setup.cfg
  9. 34 0
      setup.py
  10. 3 0
      stdeb.cfg
  11. 0 0
      test/__init__.py
  12. 14 0
      test/run.sh
  13. 111 0
      test/test.py
  14. BIN
      test/testdata/keep-going.jpg
  15. 1 0
      test/testdata/lambda
  16. BIN
      test/testdata/magic._pyc_
  17. BIN
      test/testdata/test.gz
  18. 199 0
      test/testdata/test.pdf
  19. 2 0
      test/testdata/text-iso8859-1.txt
  20. 2 0
      test/testdata/text.txt

+ 2 - 0
.gitignore

@@ -0,0 +1,2 @@
+deb_dist
+python_magic.egg-info

+ 26 - 0
.travis.yml

@@ -0,0 +1,26 @@
+language: python
+
+# needed to use trusty
+sudo: required
+
+dist: xenial
+
+python:
+  - "2.6"
+  - "2.7"
+  - "3.3"
+  - "3.4"
+  - "3.5"
+  - "3.6"
+  - "nightly"
+
+install:
+  - pip install coverage
+  - python setup.py install
+
+script:
+  - coverage run setup.py test
+
+after_success:
+  - pip install coveralls && coveralls
+  - pip install codecov && codecov

+ 21 - 0
LICENSE

@@ -0,0 +1,21 @@
+The MIT License (MIT)
+
+Copyright (c) 2001-2014 Adam Hupp
+
+Permission is hereby granted, free of charge, to any person obtaining a copy
+of this software and associated documentation files (the "Software"), to deal
+in the Software without restriction, including without limitation the rights
+to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+copies of the Software, and to permit persons to whom the Software is
+furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
+AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
+SOFTWARE.

+ 2 - 0
MANIFEST.in

@@ -0,0 +1,2 @@
+include *.py
+include LICENSE

+ 120 - 0
README.md

@@ -0,0 +1,120 @@
+# python-magic
+[![PyPI version](https://badge.fury.io/py/python-magic.svg)](https://badge.fury.io/py/python-magic)
+[![Build Status](https://travis-ci.org/ahupp/python-magic.svg?branch=master)](https://travis-ci.org/ahupp/python-magic)
+
+python-magic is a python interface to the libmagic file type
+identification library.  libmagic identifies file types by checking
+their headers according to a predefined list of file types. This
+functionality is exposed to the command line by the Unix command
+`file`.
+
+## Usage
+
+```python
+>>> import magic
+>>> magic.from_file("testdata/test.pdf")
+'PDF document, version 1.2'
+>>> magic.from_buffer(open("testdata/test.pdf", "rb").read(1024))
+'PDF document, version 1.2'
+>>> magic.from_file("testdata/test.pdf", mime=True)
+'application/pdf'
+```
+
+There is also a `Magic` class that provides more direct control,
+including overriding the magic database file and turning on character
+encoding detection.  This is not recommended for general use.  In
+particular, it's not safe for sharing across multiple threads and
+will throw if this is attempted.
+
+```python
+>>> f = magic.Magic(uncompress=True)
+>>> f.from_file('testdata/test.gz')
+'ASCII text (gzip compressed data, was "test", last modified: Sat Jun 28
+21:32:52 2008, from Unix)'
+```
+
+You can also combine the flag options:
+
+```python
+>>> f = magic.Magic(mime=True, uncompress=True)
+>>> f.from_file('testdata/test.gz')
+'text/plain'
+```
+
+## Versioning
+
+Minor version bumps should be backwards compatible.  Major bumps are not.
+
+## Name Conflict
+
+There are, sadly, two libraries which use the module name `magic`.  Both have been around for quite a while.  If you are using this module and get an error using a method like `open`, your code is expecting the other one.  Hopefully one day these will be reconciled.
+
+## Installation
+
+The current stable version of python-magic is available on pypi and
+can be installed by running `pip install python-magic`.
+
+Other sources:
+
+- pypi: http://pypi.python.org/pypi/python-magic/
+- github: https://github.com/ahupp/python-magic
+
+### Windows
+
+You'll need DLLs for libmagic.  @julian-r has uploaded a version of this project that includes binaries to pypi:
+https://pypi.python.org/pypi/python-magic-bin/0.4.14
+
+In the past, another source of the libraries has been [File for Windows](http://gnuwin32.sourceforge.net/packages/file.htm).  You will need to copy the file `magic` out of `[binary-zip]\share\misc`, and pass its location to `Magic(magic_file=...)`.
+
+If you are using a 64-bit build of python, you'll need 64-bit libmagic binaries which can be found here: https://github.com/pidydx/libmagicwin64. Newer versions can be found here: https://github.com/nscaife/file-windows.
+
+
+
+### OSX
+
+- When using Homebrew: `brew install libmagic`
+- When using macports: `port install file`
+
+### Troubleshooting
+
+- 'MagicException: could not find any magic files!': some
+  installations of libmagic do not correctly point to their magic
+  database file.  Try specifying the path to the file explicitly in the
+  constructor: `magic.Magic(magic_file="path_to_magic_file")`.
+
+- 'WindowsError: [Error 193] %1 is not a valid Win32 application':
+  Attempting to run the 32-bit libmagic DLL in a 64-bit build of
+  python will fail with this error.  Here are 64-bit builds of libmagic for windows: https://github.com/pidydx/libmagicwin64
+
+- 'WindowsError: exception: access violation writing 0x00000000': This may indicate you are mixing
+  Windows Python and Cygwin Python. Make sure your libmagic and python builds are consistent.
+
+## Author
+
+Written by Adam Hupp in 2001 for a project that never got off the
+ground.  It originally used SWIG for the C library bindings, but
+switched to ctypes once that was part of the python standard library.
+
+You can contact me via my [website](http://hupp.org/adam) or
+[github](http://github.com/ahupp).
+
+## Contributors
+
+Thanks to these folks on github who submitted features and bugfixes.
+
+-   Amit Sethi
+-   [bigben87](https://github.com/bigben87)
+-   [fallgesetz](https://github.com/fallgesetz)
+-   [FlaPer87](https://github.com/FlaPer87)
+-   [lukenowak](https://github.com/lukenowak)
+-   NicolasDelaby
+-   sacha@ssl.co.uk
+-   SimpleSeb
+-   [tehmaze](https://github.com/tehmaze)
+
+## License
+
+python-magic is distributed under the MIT license.  See the included
+LICENSE file for details.
+
+
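
Note on the README's "combine the flag options" example: the keyword arguments to `Magic` end up as a single libmagic flag word, built with bitwise OR. The sketch below reproduces that combination using the constant values from `magic.py` in this commit. `build_flags` is a hypothetical helper written only for illustration; it is not part of the module.

```python
# Flag values copied from magic.py in this commit.
MAGIC_NONE = 0x000000
MAGIC_COMPRESS = 0x000004
MAGIC_MIME = 0x000010
MAGIC_CONTINUE = 0x000020
MAGIC_MIME_ENCODING = 0x000400


def build_flags(mime=False, mime_encoding=False, keep_going=False,
                uncompress=False):
    """Hypothetical helper mirroring how Magic.__init__ ORs its options."""
    flags = MAGIC_NONE
    if mime:
        flags |= MAGIC_MIME
    if mime_encoding:
        flags |= MAGIC_MIME_ENCODING
    if keep_going:
        flags |= MAGIC_CONTINUE
    if uncompress:
        flags |= MAGIC_COMPRESS
    return flags


# Same option set as magic.Magic(mime=True, uncompress=True)
combined = build_flags(mime=True, uncompress=True)
print(hex(combined))
```

Because each option maps to a distinct bit, the options can be enabled in any combination without interfering with one another.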

+ 0 - 0
__init__.py


+ 301 - 0
magic.py

@@ -0,0 +1,301 @@
+"""
+magic is a wrapper around the libmagic file identification library.
+
+See README for more information.
+
+Usage:
+
+>>> import magic
+>>> magic.from_file("testdata/test.pdf")
+'PDF document, version 1.2'
+>>> magic.from_file("testdata/test.pdf", mime=True)
+'application/pdf'
+>>> magic.from_buffer(open("testdata/test.pdf", "rb").read(1024))
+'PDF document, version 1.2'
+>>>
+
+
+"""
+
+import sys
+import glob
+import os.path
+import ctypes
+import ctypes.util
+import threading
+
+from ctypes import c_char_p, c_int, c_size_t, c_void_p
+
+
+class MagicException(Exception):
+    def __init__(self, message):
+        super(MagicException, self).__init__(message)
+        self.message = message
+
+
+class Magic:
+    """
+    Magic is a wrapper around the libmagic C library.
+
+    """
+
+    def __init__(self, mime=False, magic_file=None, mime_encoding=False,
+                 keep_going=False, uncompress=False):
+        """
+        Create a new libmagic wrapper.
+
+        mime - if True, mimetypes are returned instead of textual descriptions
+        mime_encoding - if True, codec is returned
+        magic_file - use a magic database other than the system default
+        keep_going - don't stop at the first match, keep going
+        uncompress - Try to look inside compressed files.
+        """
+        self.flags = MAGIC_NONE
+        if mime:
+            self.flags |= MAGIC_MIME
+        if mime_encoding:
+            self.flags |= MAGIC_MIME_ENCODING
+        if keep_going:
+            self.flags |= MAGIC_CONTINUE
+
+        if uncompress:
+            self.flags |= MAGIC_COMPRESS
+
+        self.cookie = magic_open(self.flags)
+        self.lock = threading.Lock()
+        
+        magic_load(self.cookie, magic_file)
+
+    def from_buffer(self, buf):
+        """
+        Identify the contents of `buf`
+        """
+        with self.lock:
+            try:
+                # if we're on python3, convert buf to bytes
+                # otherwise this string is passed as wchar*
+                # which is not what libmagic expects
+                if type(buf) == str and str != bytes:
+                    buf = buf.encode('utf-8', errors='replace')
+                return maybe_decode(magic_buffer(self.cookie, buf))
+            except MagicException as e:
+                return self._handle509Bug(e)
+
+    def from_file(self, filename):
+        # raise FileNotFoundError (Python 3) or IOError (Python 2) if the file does not exist
+        with open(filename):
+            pass
+        with self.lock:
+            try:
+                return maybe_decode(magic_file(self.cookie, filename))
+            except MagicException as e:
+                return self._handle509Bug(e)
+
+    def _handle509Bug(self, e):
+        # libmagic 5.09 has a bug where it might fail to identify the
+        # mimetype of a file and returns null from magic_file (and
+        # likely _buffer), but also does not return an error message.
+        if e.message is None and (self.flags & MAGIC_MIME):
+            return "application/octet-stream"
+        else:
+            raise e
+        
+    def __del__(self):
+        # no _thread_check here because there can be no other
+        # references to this object at this point.
+
+        # during shutdown magic_close may have been cleared already so
+        # make sure it exists before using it.
+
+        # the self.cookie check should be unnecessary and was an
+        # incorrect fix for a threading problem, however I'm leaving
+        # it in because it's harmless and I'm slightly afraid to
+        # remove it.
+        if self.cookie and magic_close:
+            magic_close(self.cookie)
+            self.cookie = None
+
+_instances = {}
+
+def _get_magic_type(mime):
+    i = _instances.get(mime)
+    if i is None:
+        i = _instances[mime] = Magic(mime=mime)
+    return i
+
+def from_file(filename, mime=False):
+    """
+    Accepts a filename and returns the detected filetype.  Return
+    value is the mimetype if mime=True, otherwise a human readable
+    name.
+
+    >>> magic.from_file("testdata/test.pdf", mime=True)
+    'application/pdf'
+    """
+    m = _get_magic_type(mime)
+    return m.from_file(filename)
+
+def from_buffer(buffer, mime=False):
+    """
+    Accepts a binary string and returns the detected filetype.  Return
+    value is the mimetype if mime=True, otherwise a human readable
+    name.
+
+    >>> magic.from_buffer(open("testdata/test.pdf", "rb").read(1024))
+    'PDF document, version 1.2'
+    """
+    m = _get_magic_type(mime)
+    return m.from_buffer(buffer)
+
+
+
+
+libmagic = None
+# Let's try to find magic or magic1
+dll = ctypes.util.find_library('magic') or ctypes.util.find_library('magic1') or ctypes.util.find_library('cygmagic-1')
+
+# This is necessary because find_library returns None if it doesn't find the library
+if dll:
+    libmagic = ctypes.CDLL(dll)
+
+if not libmagic or not libmagic._name:
+    windows_dlls = ['magic1.dll','cygmagic-1.dll']
+    platform_to_lib = {'darwin': ['/opt/local/lib/libmagic.dylib',
+                                  '/usr/local/lib/libmagic.dylib'] +
+                         # Assumes there will only be one version installed
+                         glob.glob('/usr/local/Cellar/libmagic/*/lib/libmagic.dylib'),
+                       'win32': windows_dlls,
+                       'cygwin': windows_dlls,
+                       'linux': ['libmagic.so.1'],    # fallback for some Linuxes (e.g. Alpine) where library search does not work
+                      }
+    platform = 'linux' if sys.platform.startswith('linux') else sys.platform
+    for dll in platform_to_lib.get(platform, []):
+        try:
+            libmagic = ctypes.CDLL(dll)
+            break
+        except OSError:
+            pass
+
+if not libmagic or not libmagic._name:
+    # It is better to raise an ImportError since we are importing magic module
+    raise ImportError('failed to find libmagic.  Check your installation')
+
+magic_t = ctypes.c_void_p
+
+def errorcheck_null(result, func, args):
+    if result is None:
+        err = magic_error(args[0])
+        raise MagicException(err)
+    else:
+        return result
+
+def errorcheck_negative_one(result, func, args):
+    if result == -1:
+        err = magic_error(args[0])
+        raise MagicException(err)
+    else:
+        return result
+
+
+# return str on python3.  Don't want to unconditionally
+# decode because that results in unicode on python2
+def maybe_decode(s):
+    if str == bytes:
+        return s
+    else:
+        return s.decode('utf-8')
+    
+def coerce_filename(filename):
+    if filename is None:
+        return None
+
+    # ctypes will implicitly convert unicode strings to bytes with
+    # .encode('ascii').  If you use the filesystem encoding 
+    # then you'll get inconsistent behavior (crashes) depending on the user's
+    # LANG environment variable
+    is_unicode = (sys.version_info[0] <= 2 and
+                  isinstance(filename, unicode)) or \
+                  (sys.version_info[0] >= 3 and
+                   isinstance(filename, str))
+    if is_unicode:
+        return filename.encode('utf-8', 'surrogateescape')
+    else:
+        return filename
+
+magic_open = libmagic.magic_open
+magic_open.restype = magic_t
+magic_open.argtypes = [c_int]
+
+magic_close = libmagic.magic_close
+magic_close.restype = None
+magic_close.argtypes = [magic_t]
+
+magic_error = libmagic.magic_error
+magic_error.restype = c_char_p
+magic_error.argtypes = [magic_t]
+
+magic_errno = libmagic.magic_errno
+magic_errno.restype = c_int
+magic_errno.argtypes = [magic_t]
+
+_magic_file = libmagic.magic_file
+_magic_file.restype = c_char_p
+_magic_file.argtypes = [magic_t, c_char_p]
+_magic_file.errcheck = errorcheck_null
+
+def magic_file(cookie, filename):
+    return _magic_file(cookie, coerce_filename(filename))
+
+_magic_buffer = libmagic.magic_buffer
+_magic_buffer.restype = c_char_p
+_magic_buffer.argtypes = [magic_t, c_void_p, c_size_t]
+_magic_buffer.errcheck = errorcheck_null
+
+def magic_buffer(cookie, buf):
+    return _magic_buffer(cookie, buf, len(buf))
+
+
+_magic_load = libmagic.magic_load
+_magic_load.restype = c_int
+_magic_load.argtypes = [magic_t, c_char_p]
+_magic_load.errcheck = errorcheck_negative_one
+
+def magic_load(cookie, filename):
+    return _magic_load(cookie, coerce_filename(filename))
+
+magic_setflags = libmagic.magic_setflags
+magic_setflags.restype = c_int
+magic_setflags.argtypes = [magic_t, c_int]
+
+magic_check = libmagic.magic_check
+magic_check.restype = c_int
+magic_check.argtypes = [magic_t, c_char_p]
+
+magic_compile = libmagic.magic_compile
+magic_compile.restype = c_int
+magic_compile.argtypes = [magic_t, c_char_p]
+
+
+
+MAGIC_NONE = 0x000000 # No flags
+MAGIC_DEBUG = 0x000001 # Turn on debugging
+MAGIC_SYMLINK = 0x000002 # Follow symlinks
+MAGIC_COMPRESS = 0x000004 # Check inside compressed files
+MAGIC_DEVICES = 0x000008 # Look at the contents of devices
+MAGIC_MIME = 0x000010 # Return a mime string
+MAGIC_MIME_ENCODING = 0x000400 # Return the MIME encoding
+MAGIC_CONTINUE = 0x000020 # Return all matches
+MAGIC_CHECK = 0x000040 # Print warnings to stderr
+MAGIC_PRESERVE_ATIME = 0x000080 # Restore access time on exit
+MAGIC_RAW = 0x000100 # Don't translate unprintable chars
+MAGIC_ERROR = 0x000200 # Handle ENOENT etc as real errors
+
+MAGIC_NO_CHECK_COMPRESS = 0x001000 # Don't check for compressed files
+MAGIC_NO_CHECK_TAR = 0x002000 # Don't check for tar files
+MAGIC_NO_CHECK_SOFT = 0x004000 # Don't check magic entries
+MAGIC_NO_CHECK_APPTYPE = 0x008000 # Don't check application type
+MAGIC_NO_CHECK_ELF = 0x010000 # Don't check for elf details
+MAGIC_NO_CHECK_ASCII = 0x020000 # Don't check for ascii files
+MAGIC_NO_CHECK_TROFF = 0x040000 # Don't check ascii/troff
+MAGIC_NO_CHECK_FORTRAN = 0x080000 # Don't check ascii/fortran
+MAGIC_NO_CHECK_TOKENS = 0x100000 # Don't check ascii/tokens
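
Note on the `errcheck` hooks in `magic.py`: ctypes calls an `errcheck` callable as `(result, func, args)` after every foreign call and uses its return value as the call's result. The sketch below imitates that convention in pure Python so it runs without libmagic. `fake_magic_error`, `call_with_errcheck`, `ok_load` and `failing_load` are stand-ins invented for illustration. The check compares with `==`, since `is` tests object identity and only works for integers because of CPython caching.

```python
class MagicException(Exception):
    """Local stand-in for magic.MagicException."""


def fake_magic_error(cookie):
    # Stand-in for libmagic's magic_error(); returns a canned message.
    return "could not find any magic files!"


def errorcheck_negative_one(result, func, args):
    # Same (result, func, args) shape that ctypes passes to an errcheck hook.
    # Value comparison, not identity comparison.
    if result == -1:
        raise MagicException(fake_magic_error(args[0]))
    return result


def call_with_errcheck(func, errcheck, *args):
    # Hypothetical driver: roughly what ctypes does when func.errcheck is set.
    return errcheck(func(*args), func, args)


def ok_load(cookie, filename):
    # Pretend the database loaded; magic_load returns 0 on success.
    return 0


def failing_load(cookie, filename):
    # Pretend the load failed; magic_load returns -1 on failure.
    return -1


status = call_with_errcheck(ok_load, errorcheck_negative_one, None, None)

try:
    call_with_errcheck(failing_load, errorcheck_negative_one, None, "nonexistent")
except MagicException as exc:
    error_text = str(exc)
```

Routing failures through one hook keeps the wrapper functions free of repeated return-code checks.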

+ 5 - 0
setup.cfg

@@ -0,0 +1,5 @@
+[global]
+command_packages=stdeb.command
+
+[bdist_wheel]
+universal = 1

+ 34 - 0
setup.py

@@ -0,0 +1,34 @@
+#!/usr/bin/env python
+# -*- coding: utf-8 -*-
+
+from setuptools import setup
+
+setup(name='python-magic',
+      description='File type identification using libmagic',
+      author='Adam Hupp',
+      author_email='adam@hupp.org',
+      url="http://github.com/ahupp/python-magic",
+      version='0.4.15',
+      py_modules=['magic'],
+      long_description="""This module uses ctypes to access the libmagic file type
+identification library.  It makes use of the local magic database and
+supports both textual and MIME-type output.
+""",
+      keywords="mime magic file",
+      license="MIT",
+      test_suite='test',
+      classifiers=[
+          'Intended Audience :: Developers',
+          'License :: OSI Approved :: MIT License',
+          'Programming Language :: Python',
+          'Programming Language :: Python :: 2',
+          'Programming Language :: Python :: 2.6',
+          'Programming Language :: Python :: 2.7',
+          'Programming Language :: Python :: 3',
+          'Programming Language :: Python :: 3.3',
+          'Programming Language :: Python :: 3.4',
+          'Programming Language :: Python :: 3.5',
+          'Programming Language :: Python :: 3.6',
+          'Programming Language :: Python :: Implementation :: CPython',
+      ],
+      )

+ 3 - 0
stdeb.cfg

@@ -0,0 +1,3 @@
+[python-magic]
+Depends: libmagic1
+Conflicts: python-magic

+ 0 - 0
test/__init__.py


+ 14 - 0
test/run.sh

@@ -0,0 +1,14 @@
+#!/bin/sh
+
+
+# ensure we can use unicode filenames in the test
+export LC_ALL=en_US.UTF-8
+THISDIR=`dirname $0`
+export PYTHONPATH=${THISDIR}/..
+
+echo "python2.6"
+python2.6 ${THISDIR}/test.py
+echo "python2.7"
+python2.7 ${THISDIR}/test.py
+echo "python3"
+python3 ${THISDIR}/test.py

+ 111 - 0
test/test.py

@@ -0,0 +1,111 @@
+import os, sys
+# for output which reports a local time
+os.environ['TZ'] = 'GMT'
+import shutil
+import os.path
+import unittest
+
+import magic
+
+class MagicTest(unittest.TestCase):
+    TESTDATA_DIR = os.path.join(os.path.dirname(__file__), 'testdata')
+
+    def assert_values(self, m, expected_values):
+        for filename, expected_value in expected_values.items():
+            try:
+                filename = os.path.join(self.TESTDATA_DIR, filename)
+            except TypeError:
+                filename = os.path.join(self.TESTDATA_DIR.encode('utf-8'), filename)
+
+            
+            if type(expected_value) is not tuple:
+                expected_value = (expected_value,)
+
+            for i in expected_value:
+                with open(filename, 'rb') as f:
+                    buf_value = m.from_buffer(f.read())
+
+                file_value = m.from_file(filename)
+                if buf_value == i and file_value == i:
+                    break
+            else:
+                self.assertTrue(False, "no match for " + repr(expected_value))
+
+    def test_from_buffer_str_and_bytes(self):
+        m = magic.Magic(mime=True)
+        s = '#!/usr/bin/env python\nprint("foo")'
+        self.assertEqual("text/x-python", m.from_buffer(s))
+        b = b'#!/usr/bin/env python\nprint("foo")'
+        self.assertEqual("text/x-python", m.from_buffer(b))
+                
+    def test_mime_types(self):
+        dest = os.path.join(MagicTest.TESTDATA_DIR, b'\xce\xbb'.decode('utf-8'))
+        shutil.copyfile(os.path.join(MagicTest.TESTDATA_DIR, 'lambda'), dest)
+        try:
+            m = magic.Magic(mime=True)
+            self.assert_values(m, {
+                'magic._pyc_': 'application/octet-stream',
+                'test.pdf': 'application/pdf',
+                'test.gz': 'application/gzip',
+                'text.txt': 'text/plain',
+                b'\xce\xbb'.decode('utf-8'): 'text/plain',
+                b'\xce\xbb': 'text/plain',
+            })
+        finally:
+            os.unlink(dest)
+
+    def test_descriptions(self):
+        m = magic.Magic()
+        os.environ['TZ'] = 'UTC'  # To get the last modified date of test.gz in UTC
+        try:
+            self.assert_values(m, {
+                'magic._pyc_': 'python 2.4 byte-compiled',
+                'test.pdf': 'PDF document, version 1.2',
+                'test.gz':
+                ('gzip compressed data, was "test", from Unix, last modified: Sun Jun 29 01:32:52 2008',
+                 'gzip compressed data, was "test", last modified: Sun Jun 29 01:32:52 2008, from Unix'),
+                'text.txt': 'ASCII text',
+            })
+        finally:
+            del os.environ['TZ']
+
+    def test_mime_encodings(self):
+        m = magic.Magic(mime_encoding=True)
+        self.assert_values(m, {
+            'text-iso8859-1.txt': 'iso-8859-1',
+            'text.txt': 'us-ascii',
+        })
+
+    def test_errors(self):
+        m = magic.Magic()
+        self.assertRaises(IOError, m.from_file, 'nonexistent')
+        self.assertRaises(magic.MagicException, magic.Magic,
+                          magic_file='nonexistent')
+        os.environ['MAGIC'] = 'nonexistent'
+        try:
+            self.assertRaises(magic.MagicException, magic.Magic)
+        finally:
+            del os.environ['MAGIC']
+
+    def test_keep_going(self):
+        filename = os.path.join(self.TESTDATA_DIR, 'keep-going.jpg')
+
+        m = magic.Magic(mime=True)
+        self.assertEqual(m.from_file(filename), 'image/jpeg')
+        
+        m = magic.Magic(mime=True, keep_going=True)
+        self.assertEqual(m.from_file(filename), 'image/jpeg')
+
+
+    def test_rethrow(self):
+        old = magic.magic_buffer
+        try:
+            def t(x,y):
+                raise magic.MagicException("passthrough")
+            magic.magic_buffer = t
+            
+            self.assertRaises(magic.MagicException, magic.from_buffer, "hello", True)
+        finally:
+            magic.magic_buffer = old
+if __name__ == '__main__':
+    unittest.main()
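
Note on `assert_values` in `test/test.py`: an expected value may be a single string or a tuple of alternatives, which lets the tests pass when libmagic releases word the same description differently (as with the two `test.gz` strings). The matching rule can be sketched on its own as below. `matches_any` is a hypothetical helper, not a function in the test suite.

```python
def matches_any(observed, expected):
    """Return True if `observed` equals `expected` or one of its alternatives."""
    # Normalise a single expected value to a one-element tuple,
    # as assert_values does before looping.
    if not isinstance(expected, tuple):
        expected = (expected,)
    return any(observed == candidate for candidate in expected)


# Alternative wordings taken from test_descriptions in this commit.
gz_descriptions = (
    'gzip compressed data, was "test", from Unix, last modified: Sun Jun 29 01:32:52 2008',
    'gzip compressed data, was "test", last modified: Sun Jun 29 01:32:52 2008, from Unix',
)

single_ok = matches_any('ASCII text', 'ASCII text')
tuple_ok = matches_any(gz_descriptions[1], gz_descriptions)
mismatch = matches_any('PDF document, version 1.2', gz_descriptions)
```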

BIN
test/testdata/keep-going.jpg


+ 1 - 0
test/testdata/lambda

@@ -0,0 +1 @@
+test

BIN
test/testdata/magic._pyc_


BIN
test/testdata/test.gz


+ 199 - 0
test/testdata/test.pdf

@@ -0,0 +1,199 @@
+%PDF-1.2
+7 0 obj
+[5 0 R/XYZ 111.6 757.86]
+endobj
+13 0 obj
+<<
+/Title(About this document)
+/A<<
+/S/GoTo
+/D(subsection.1.1)
+>>
+/Parent 12 0 R
+/Next 14 0 R
+>>
+endobj
+15 0 obj
+<<
+/Title(Compiling with GHC)
+/A<<
+/S/GoTo
+/D(subsubsection.1.2.1)
+>>
+/Parent 14 0 R
+/Next 16 0 R
+>>
+endobj
+16 0 obj
+<<
+/Title(Compiling with Hugs)
+/A<<
+/S/GoTo
+/D(subsubsection.1.2.2)
+>>
+/Parent 14 0 R
+/Prev 15 0 R
+>>
+endobj
+14 0 obj
+<<
+/Title(Compatibility)
+/A<<
+/S/GoTo
+/D(subsection.1.2)
+>>
+/Parent 12 0 R
+/Prev 13 0 R
+/First 15 0 R
+/Last 16 0 R
+/Count -2
+/Next 17 0 R
+>>
+endobj
+17 0 obj
+<<
+/Title(Reporting bugs)
+/A<<
+/S/GoTo
+/D(subsection.1.3)
+>>
+/Parent 12 0 R
+/Prev 14 0 R
+/Next 18 0 R
+>>
+endobj
+18 0 obj
+<<
+/Title(History)
+/A<<
+/S/GoTo
+/D(subsection.1.4)
+>>
+/Parent 12 0 R
+/Prev 17 0 R
+/Next 19 0 R
+>>
+endobj
+19 0 obj
+<<
+/Title(License)
+/A<<
+/S/GoTo
+/D(subsection.1.5)
+>>
+/Parent 12 0 R
+/Prev 18 0 R
+>>
+endobj
+12 0 obj
+<<
+/Title(Introduction)
+/A<<
+/S/GoTo
+/D(section.1)
+>>
+/Parent 11 0 R
+/First 13 0 R
+/Last 19 0 R
+/Count -5
+/Next 20 0 R
+>>
+endobj
+21 0 obj
+<<
+/Title(Running a parser)
+/A<<
+/S/GoTo
+/D(subsection.2.1)
+>>
+/Parent 20 0 R
+/Next 22 0 R
+>>
+endobj
+22 0 obj
+<<
+/Title(Sequence and choice)
+/A<<
+/S/GoTo
+/D(subsection.2.2)
+>>
+/Parent 20 0 R
+/Prev 21 0 R
+/Next 23 0 R
+>>
+endobj
+23 0 obj
+<<
+/Title(Predictive parsers)
+/A<<
+/S/GoTo
+/D(subsection.2.3)
+>>
+/Parent 20 0 R
+/Prev 22 0 R
+/Next 24 0 R
+>>
+endobj
+24 0 obj
+<<
+/Title(Adding semantics)
+/A<<
+/S/GoTo
+/D(subsection.2.4)
+>>
+/Parent 20 0 R
+/Prev 23 0 R
+/Next 25 0 R
+>>
+endobj
+25 0 obj
+<<
+/Title(Sequences and seperators)
+/A<<
+/S/GoTo
+/D(subsection.2.5)
+>>
+/Parent 20 0 R
+/Prev 24 0 R
+/Next 26 0 R
+>>
+endobj
+26 0 obj
+<<
+/Title(Improving error messages)
+/A<<
+/S/GoTo
+/D(subsection.2.6)
+>>
+/Parent 20 0 R
+/Prev 25 0 R
+/Next 27 0 R
+>>
+endobj
+27 0 obj
+<<
+/Title(Expressions)
+/A<<
+/S/GoTo
+/D(subsection.2.7)
+>>
+/Parent 20 0 R
+/Prev 26 0 R
+/Next 28 0 R
+>>
+endobj
+28 0 obj
+<<
+/Title(Lexical analysis)
+/A<<
+/S/GoTo
+/D(subsection.2.8)
+>>
+/Parent 20 0 R
+/Prev 27 0 R
+/Next 29 0 R
+>>
+endobj
+30 0 obj
+<<
+/Title(Lexeme parsers

+ 2 - 0
test/testdata/text-iso8859-1.txt

@@ -0,0 +1,2 @@
+This is a web page encoded in iso-8859-1
+éèàùôâïî

+ 2 - 0
test/testdata/text.txt

@@ -0,0 +1,2 @@
+Hello, World!
+