Given an HTML document, it extracts the main body text and cleans it up.
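
As a rough illustration of this kind of extraction, here is a minimal sketch using only Python's standard-library `html.parser`. It is not the tool's actual implementation. The function name `extract_main_text`, the `SKIP_TAGS` and `BLOCK_TAGS` sets, and the cleanup rules (collapsing whitespace, dropping empty lines) are all assumptions made for the example.

```python
import re
from html.parser import HTMLParser

# Hypothetical choice: tags whose contents are assumed NOT to be main body text.
SKIP_TAGS = {"script", "style", "noscript", "title", "nav", "header", "footer", "aside", "form"}
# Hypothetical choice: tags assumed to start or end a block of text.
BLOCK_TAGS = {"p", "div", "br", "li", "tr", "blockquote", "section", "article",
              "h1", "h2", "h3", "h4", "h5", "h6"}


class _TextCollector(HTMLParser):
    """Collects text outside SKIP_TAGS and marks block boundaries with newlines."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.skip_depth = 0  # > 0 while inside a skipped element
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self.skip_depth += 1
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS:
            if self.skip_depth > 0:
                self.skip_depth -= 1
        elif tag in BLOCK_TAGS and self.skip_depth == 0:
            self.parts.append("\n")

    def handle_data(self, data):
        if self.skip_depth == 0:
            self.parts.append(data)


def extract_main_text(html: str) -> str:
    """Return visible text with whitespace collapsed, one block per line."""
    parser = _TextCollector()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Cleanup step: collapse runs of whitespace and drop empty lines.
    lines = (re.sub(r"\s+", " ", line).strip() for line in text.split("\n"))
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = "<body><nav>Home</nav><p>Hello   <b>world</b>.</p><script>var x=1;</script></body>"
    print(extract_main_text(sample))  # prints: Hello world.
```

A tag blocklist like this is the simplest heuristic available. Real pages often need something stronger, such as scoring elements by text density, to separate the main content from surrounding boilerplate.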
