There's a real lack of decent-quality text processing software in PHP, but we think ours is pretty good. It uses pure UTF-8 (even if the incoming text isn't UTF-8 yet), can calculate highly accurate readability scores, and can extract all the interesting (human-readable, sentence-form) text from HTML. It can also extract meaningful terms from text, but this part needs the most work, so stay tuned for version 0.2, which should fix most of the inaccuracies and errors in the Term Extractor.
textlib version 0.2
We also provide additional data about Daylife Sources at no charge. It is currently available as a gzipped SQL file; XML and CSV versions are coming soon.