Text mining for discourse processing
Text mining is the extraction of knowledge from text. What text mining approaches have in common is that they must be applicable to large collections of texts written in natural language.
In this work we are interested in the discourse level: how propositions, sentences, and groups of sentences are organised into a coherent whole. Sentiment analysis, for instance, makes it possible to classify a text as positive or negative. By approaching text mining at the discourse level, we aim to add an explanatory dimension. For example, we could identify the main reasons why a given opinion is positive or negative. In a biomedical context, we could also provide information about the sequence of symptoms that leads to a particular illness.
Discourse formalisms and theories provide a framework for describing documents above the sentence level. Recent advances in natural language processing and data mining will allow us to compute the discourse representations of documents as graphs, and thus to apply complex data mining techniques to infer new knowledge from them.
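
To make the idea of a discourse graph concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the class names (`Relation`, `DiscourseGraph`), the relation labels ("Explanation", "Contrast"), and the hand-written example review are all hypothetical and do not come from any particular discourse theory or parser. Discourse units are nodes, labelled discourse relations are edges, and a simple query retrieves the units that explain an opinion.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Relation:
    """A labelled edge between two discourse units (hypothetical schema)."""
    source: int  # index of the supporting unit
    label: str   # illustrative relation name, e.g. "Explanation"
    target: int  # index of the unit being supported


@dataclass
class DiscourseGraph:
    """Discourse units as nodes, discourse relations as labelled edges."""
    units: List[str]
    relations: List[Relation]

    def supports(self, target: int, label: str) -> List[str]:
        """Return the text of every unit linked to `target` by `label`."""
        return [
            self.units[r.source]
            for r in self.relations
            if r.target == target and r.label == label
        ]


def relation_counts(graphs: Iterable[DiscourseGraph]) -> Counter:
    """Count relation labels over a collection of discourse graphs."""
    return Counter(r.label for g in graphs for r in g.relations)


# Hand-annotated toy review (an assumption, not the output of a parser).
review = DiscourseGraph(
    units=[
        "The phone is great.",
        "The battery lasts two days.",
        "The screen is sharp.",
        "It is a bit heavy, though.",
    ],
    relations=[
        Relation(1, "Explanation", 0),
        Relation(2, "Explanation", 0),
        Relation(3, "Contrast", 0),
    ],
)

# Reasons given for the positive opinion expressed in unit 0.
reasons = review.supports(0, "Explanation")
counts = relation_counts([review])
```

In this sketch, `reasons` holds the two units that justify the opinion, which is the kind of explanation a sentence-level sentiment classifier alone would not provide. Counting relation labels over many such graphs is one very simple stand-in for the graph mining techniques mentioned above.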