Updated: 04/30/2006

Simple method for indexing MS Word documents

Building indexers/spiders that can read binary MS Word (.doc) documents can be difficult, especially on *nix servers, which don't support PHP's COM abilities. Solutions usually involve installing binaries on the server, which is often impossible or disallowed. This simple PHP snippet makes a pretty good job of extracting text from an MS Word document for use in a search index. While not pretending to be perfect, it has proved itself useful on thousands of test documents.
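
As a rough illustration of the kind of approach described above, here is a minimal sketch in plain PHP. It is not necessarily the original snippet. It assumes the document stores its text as 8-bit characters with paragraphs ending in a carriage return (0x0D), and it treats any chunk containing a NUL byte as binary structure to be skipped. The function names extract_word_text and extract_text_from_bytes are hypothetical, chosen only for this example.

```php
<?php
// Hypothetical helper: heuristic text extraction from a binary .doc file.
// Returns an empty string if the file cannot be read.
function extract_word_text(string $path): string
{
    if (!is_readable($path)) {
        return '';
    }
    $data = file_get_contents($path);
    return $data === false ? '' : extract_text_from_bytes($data);
}

// Hypothetical helper: applies the heuristic to raw bytes already in memory.
function extract_text_from_bytes(string $data): string
{
    $out = [];
    // Assumption: paragraphs in the binary file end with a carriage return.
    foreach (explode("\r", $data) as $chunk) {
        // Skip empty chunks and chunks containing NUL bytes, which are
        // most likely binary structures rather than 8-bit text.
        if ($chunk === '' || strpos($chunk, "\0") !== false) {
            continue;
        }
        // Keep printable ASCII plus tabs and newlines; drop everything else.
        $clean = trim(preg_replace('/[^\x20-\x7E\t\n]/', '', $chunk));
        if ($clean !== '') {
            $out[] = $clean;
        }
    }
    // Join the surviving paragraphs with spaces for use in a search index.
    return implode(' ', $out);
}
```

Calling extract_word_text('/path/to/file.doc') would return a single string suitable for feeding to an indexer. Because of the NUL-byte check, documents whose text is stored as 16-bit Unicode will likely yield little or nothing, and some stray fragments of metadata may slip through, so the output is best treated as approximate.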
©2003-2019 jCay.com