Extract text from .rtf and .doc files with PHP and COM
Ever wanted to extract text from a Microsoft Word document with PHP and COM? It's not as hard as it seems. Save a Word document on your computer (the code below expects C:\TEST.doc) and run this code on a Windows machine with Microsoft Word installed:
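```php
<?php
// Windows only: requires Microsoft Word and PHP's COM extension (php_com_dotnet).
$name = 'C:\\TEST.doc'; // use an absolute path

try {
    // Since PHP 5, a failed "new COM" throws com_exception instead of
    // returning false, so "or die()" would never fire.
    $word = new COM('Word.Application');
} catch (com_exception $e) {
    die('Unable to instantiate application object');
}

try {
    // Documents->Open() returns the opened Document object.
    $doc = $word->Documents->Open($name);
    // Content is a Range covering the main text; Text returns it as a string.
    $content = $doc->Content->Text;
    $doc->Close(false); // close without saving changes
} catch (Exception $e) {
    $content = 'Error!';
}

// Quit Word, otherwise a WINWORD.EXE process keeps running in the background.
$word->Quit();
$word = null;

echo $content;
?>
```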
You can also loop over a folder and extract the text from every .doc file in it.
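Here is a minimal sketch of the folder approach, assuming the same Windows and Word setup as above. The helper names `listWordFiles()` and `extractFolder()` are hypothetical, not part of any library. The sketch also picks up .rtf files, since Word opens those too, and it reuses a single Word instance for the whole folder instead of starting Word once per file.

```php
<?php
// Hypothetical helper: list .doc/.rtf files in a folder (case-insensitive extension match).
function listWordFiles(string $dir): array {
    $files = [];
    foreach (new DirectoryIterator($dir) as $file) {
        $ext = strtolower($file->getExtension());
        if ($file->isFile() && in_array($ext, ['doc', 'rtf'], true)) {
            $files[] = $file->getPathname();
        }
    }
    sort($files);
    return $files;
}

// Hypothetical helper: extract the text of every file with one shared Word instance.
// Pass an absolute folder path so Word receives absolute file paths.
// Returns [path => text], with null for files Word could not open.
function extractFolder(string $dir): array {
    $texts = [];
    $files = listWordFiles($dir);
    if (!$files) {
        return $texts; // nothing to do, so don't start Word at all
    }
    $word = new COM('Word.Application'); // Windows + Word only
    try {
        foreach ($files as $path) {
            try {
                $doc = $word->Documents->Open($path);
                $texts[$path] = $doc->Content->Text;
                $doc->Close(false); // close without saving changes
            } catch (Exception $e) {
                $texts[$path] = null;
            }
        }
    } finally {
        $word->Quit(); // always shut Word down, even after an error
    }
    return $texts;
}
```

For example, `foreach (extractFolder('C:\\docs') as $path => $text) { echo $path, ': ', $text, PHP_EOL; }` prints each file's text. A file that fails to open maps to null instead of aborting the whole run.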

September 17th, 2009 at 11:04 am
Thanks for this info!