AGHParser is an HTML content extractor. It is completely modular, so you do not need to write any confusing regular expressions. You give it either the name of an HTML file or the 'dirty' HTML stored in a char array, and you get the output in a char array, so there is no need to parse an XML file afterwards (I have seen some HTML parsing libraries that write their output to XML files).
So, back to where we were. You give this library a tag to extract content from, and you can also specify tags that you want removed from the output. Since XML uses the same tag-based syntax as HTML, I think AGHParser can be used to parse XML files too (though this has not been tested).
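To make the extract-then-strip idea concrete, here is a rough sketch of it. This is not AGHParser's actual API: the function names (extract_first, strip_tags) are made up for illustration, and it uses std::string instead of char arrays. It also assumes the tag is not nested inside itself, that attribute values contain no '>', and that "removing" a tag means dropping its markup while keeping the text inside it.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Case-insensitive search for `needle` in `hay`, starting at index `from`.
static std::size_t find_ci(const std::string& hay, const std::string& needle,
                           std::size_t from) {
    if (needle.empty()) return std::string::npos;
    for (std::size_t i = from; i + needle.size() <= hay.size(); ++i) {
        std::size_t j = 0;
        while (j < needle.size() &&
               std::tolower(static_cast<unsigned char>(hay[i + j])) ==
               std::tolower(static_cast<unsigned char>(needle[j]))) {
            ++j;
        }
        if (j == needle.size()) return i;
    }
    return std::string::npos;
}

// True if a tag name ends at `pos`, so that "<b" does not match "<body".
static bool name_ends_at(const std::string& s, std::size_t pos) {
    return pos >= s.size() || s[pos] == '>' || s[pos] == '/' ||
           std::isspace(static_cast<unsigned char>(s[pos]));
}

// Finds `prefix` (e.g. "<div" or "</div") as a whole tag name.
static std::size_t find_tag(const std::string& s, const std::string& prefix,
                            std::size_t from) {
    std::size_t pos = from;
    while ((pos = find_ci(s, prefix, pos)) != std::string::npos) {
        if (name_ends_at(s, pos + prefix.size())) return pos;
        ++pos;
    }
    return std::string::npos;
}

// Hypothetical helper: returns the text between the first <tag ...> and the
// next </tag>. Returns "" if the tag is not found.
std::string extract_first(const std::string& html, const std::string& tag) {
    std::size_t start = find_tag(html, "<" + tag, 0);
    if (start == std::string::npos) return "";
    std::size_t body = html.find('>', start);
    if (body == std::string::npos) return "";
    ++body;  // skip past the '>' of the opening tag
    std::size_t end = find_tag(html, "</" + tag, body);
    if (end == std::string::npos) end = html.size();  // unclosed: take the rest
    return html.substr(body, end - body);
}

// Erases every piece of markup that starts with `prefix`, up to its '>'.
static void erase_markup(std::string& text, const std::string& prefix) {
    std::size_t pos = 0;
    while ((pos = find_tag(text, prefix, pos)) != std::string::npos) {
        std::size_t gt = text.find('>', pos);
        if (gt == std::string::npos) break;  // unterminated tag: leave as-is
        text.erase(pos, gt - pos + 1);
    }
}

// Hypothetical helper: drops the opening and closing markup of each listed
// tag, keeping the text the tags enclosed.
std::string strip_tags(std::string text, const std::vector<std::string>& tags) {
    for (std::size_t i = 0; i < tags.size(); ++i) {
        erase_markup(text, "<" + tags[i]);
        erase_markup(text, "</" + tags[i]);
    }
    return text;
}
```

With these helpers, calling extract_first(page, "div") and then strip_tags on the result with a list such as "b" and "a" mirrors the "tag to extract, tags to remove" workflow. A real parser has to cope with nesting, comments and malformed markup, which this sketch ignores.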
How it started
Have you ever tried the query 'define:Ogre', or any similar query where you precede a word with 'define:' (note that there is no gap between ':' and the word)? If you haven't, then try http://www.google.com/search?q=define:Ogre now!
You will get well-formatted meanings of the word along with links to the sources of the definitions and also related phrases. The definitions are very accurate, so much so that I fire up my browser every time I need to check the meaning of a word on Google. So I thought, why not create a desktop application, a dictionary, which sends any word given to it to Google using a URL of the form www.google.com/search?q=define:TheWord. Try it; it works! You can use similar queries to search in Google Images and videos too! So the concept was to query Google.
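Building that query URL from a word could look something like the sketch below. The function name define_url is hypothetical and is not part of AGHParser or GDefineParser. The percent-encoding step is my own addition, on the assumption that words may contain spaces or other characters that are not safe to put in a URL as-is.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper: builds a "define:" search URL for `word`.
// Bytes other than letters, digits, '-', '_', '.' and '~' are
// percent-encoded (an assumption; the original text uses plain words only).
std::string define_url(const std::string& word) {
    static const char hex[] = "0123456789ABCDEF";
    std::string url = "http://www.google.com/search?q=define:";
    for (std::size_t i = 0; i < word.size(); ++i) {
        unsigned char c = static_cast<unsigned char>(word[i]);
        if (std::isalnum(c) || c == '-' || c == '_' || c == '.' || c == '~') {
            url += static_cast<char>(c);
        } else {
            url += '%';
            url += hex[c >> 4];   // high nibble
            url += hex[c & 15];   // low nibble
        }
    }
    return url;
}
```

For a plain word such as "Ogre" this gives exactly the URL shown earlier; a phrase with a space in it gets the space encoded as %20.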
Anyway, the first thing I needed was an HTML parser. I knew that many such libraries might already exist, but for the sake of learning I decided to create my own. After 3 days of coding, 2 days of debugging and 1 day of preparing this package, I am tired. Also, due to my busy schedule, I think I can no longer make that software. That's why I have created this package and released this library under the GPL. The accompanying example program (GDefineParser) is a sort of command-line version of that dictionary. It was coded fast and dirty (it is also under the GPL). So, please make the dictionary software (I will of course help if I am in a position to do so). ;-)
NOTE: If you compile the example program with gcc, it may sometimes crash silently without giving any output. It works fine when compiled with MS VC++ 6.0 or MS VC++ 2005. I don't know why. If you know, then please do inform me. :-)