#------------------------------------------------------------------------------
# $File: statistics,v 1.2 2020/10/08 17:51:53 christos Exp $
# statistics: file(1) magic for statistics related software
#
# From Remy Rampin
# Stata is a statistical software tool that was created in 1985. While I
# don't personally use it, data files in its native (proprietary) format
# are common (.dta files).
#
# Because they are so common, especially in statistical and social
# sciences, Stata files and SPSS files can be opened by a lot of modern
# software; for example, Python's pandas package provides built-in
# support for them (read_stata() and read_spss()).
#
# I noticed that the magic database includes an entry for SPSS files but
# not Stata files. Stata files for Stata 13 and newer (formats 117, 118,
# and 119) always begin with the string "<stata_dta><header>" as per
# https://www.stata.com/help.cgi?dta#definition
#
# The format version number always follows, for example:
# <stata_dta><header><release>117</release>
# <stata_dta><header><release>118</release>
#
# Therefore the following line would do the trick:
# 0 string <stata_dta><header> Stata Data File
#
# (I'm sure the version number could be captured as well but I did not
# manage this without a regex)
#
# Unfortunately the previous formats (created by Stata before 13, which
# was released in 2013) are harder to recognize. Format 115 starts with the
# four bytes 0x73010100 or 0x73020100, format 114 with 0x72010100 or
# 0x72020100, format 113 with 0x71010101 or 0x71020101.
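#
# A minimal, untested sketch of how the older signatures listed above
# might be written, assuming each four-byte prefix is compared as a single
# big-endian 32-bit value (belong). The "(Release NNN)" wording is an
# assumption chosen to mirror the entry for the newer formats. The lines
# are left commented out because such short signatures could easily match
# unrelated files:
# 0 belong 0x73010100 Stata Data File (Release 115)
# 0 belong 0x73020100 Stata Data File (Release 115)
# 0 belong 0x72010100 Stata Data File (Release 114)
# 0 belong 0x72020100 Stata Data File (Release 114)
# 0 belong 0x71010101 Stata Data File (Release 113)
# 0 belong 0x71020101 Stata Data File (Release 113)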
#
# For additional reference, the Library of Congress website has an entry
# for the Stata Data File Format 118:
# https://www.loc.gov/preservation/digital/formats/fdd/fdd000471.shtml
#
# Examples of those files can be found on Zenodo:
# https://zenodo.org/search?page=1&size=20&q=&file_type=dta
0 string \<stata_dta\>\<header\>\<release\> Stata Data File
>&0 regex [0-9]* (Release %s)