ReaderBench extracts topics from a text by computing, for each term, a linear combination of three factors:
the individual normalized term frequency of each term;
the semantic similarity between the term and the whole document, computed through the cohesion function (the average of the LSA cosine similarity and the inverse of the LDA Jensen-Shannon divergence), which ensures global resemblance and significance;
a weighted similarity with the corresponding semantic chain, multiplied by the importance of the chain (the similarity between the chain and the entire document).
In the end, this score is added to the overall concept relevance, alongside all the other factors.
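The combination described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ReaderBench's actual code: the function names, the equal default weights, and the reading of "inverse of the divergence" as 1 - JSD (with base-2 logarithms, so the divergence lies in [0, 1]) are all hypothetical choices made for the example.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors (e.g. LSA vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def js_divergence(p, q):
    """Jensen-Shannon divergence between two topic distributions (log base 2)."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing to the sum.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return (kl(p, m) + kl(q, m)) / 2

def cohesion(lsa_a, lsa_b, lda_a, lda_b):
    """Average of LSA cosine similarity and the 'inverse' of the LDA divergence.

    Assumption: the inverse is taken as 1 - JSD.
    """
    return (cosine(lsa_a, lsa_b) + (1.0 - js_divergence(lda_a, lda_b))) / 2

def topic_relevance(norm_tf, term_doc_cohesion, term_chain_sim, chain_doc_sim,
                    weights=(1.0, 1.0, 1.0)):
    """Linear combination of the three factors; the weights are hypothetical."""
    w_tf, w_doc, w_chain = weights
    return (w_tf * norm_tf
            + w_doc * term_doc_cohesion
            + w_chain * term_chain_sim * chain_doc_sim)
```

Here term_doc_cohesion would be the result of cohesion() applied to the term and the whole document, and chain_doc_sim plays the role of the chain's importance.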
Voice extraction in forum or chat discussions is similar to topic extraction. A voice's importance is computed as the sum of the scores of the sentences in which at least one concept from the corresponding semantic chain occurs.
This sentence score already takes the previous relevance scores into account, because it is computed as the coverage of document topics. It is computed this way in order to avoid a double dependency between topic relevance scoring and the voice's importance score.
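A minimal sketch of this voice scoring follows, again as an illustration rather than ReaderBench's implementation. It assumes that sentences are already tokenized into lists of concepts, that topic relevance scores are available as a dictionary, and that "coverage of document topics" means the summed relevance of the topics present in a sentence; the names sentence_score and voice_importance are hypothetical.

```python
def sentence_score(sentence_terms, topic_relevance):
    """Coverage of document topics by one sentence.

    Assumption: coverage is the summed relevance of the distinct
    document topics that appear in the sentence.
    """
    return sum(topic_relevance.get(term, 0.0) for term in set(sentence_terms))

def voice_importance(chain_concepts, sentences, topic_relevance):
    """Sum the scores of sentences containing at least one concept of the chain."""
    chain = set(chain_concepts)
    return sum(
        sentence_score(sentence, topic_relevance)
        for sentence in sentences
        if chain & set(sentence)  # at least one shared concept
    )

# Toy data, invented for the example.
relevance = {"cat": 0.5, "dog": 0.25, "tree": 1.0}
sentences = [["cat", "tree"], ["dog"], ["tree"]]
importance = voice_importance({"cat", "dog"}, sentences, relevance)
```

In this toy run only the first two sentences mention a concept from the chain, so the third sentence does not contribute. The voice score reads the topic relevance values but never feeds back into them, which matches the one-way dependency described above.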