"Textual cohesion addresses the local connections in a text based primarily on features that signal relationships between constituent elements (words or sentences)" (Dascalu, 2013, p. 28). Cohesion is one of the core functions of ReaderBench, applied both to input (texts to be read) and to output (learners' productions).
Cohesion is a function of three main parameters:
the inverse normalized distance between textual elements;
lexical proximity derived from ontologies such as WordNet;
semantic similarity measured with both LSA (Latent Semantic Analysis) and LDA (Latent Dirichlet Allocation).
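The following is a minimal sketch of how these three parameters could be combined into one cohesion value between two textual elements. It is not ReaderBench's implementation. The equal-weight average of the three similarities, the use of 1/distance as the distance factor, and the 1 − Jensen-Shannon divergence similarity for LDA topic distributions are all assumptions. The WordNet-based similarity is assumed to be precomputed by an external tool and passed in as a number. The function names are hypothetical.

```python
import math
from typing import Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two LSA vectors (0.0 if either is a zero vector)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def jensen_shannon_similarity(p: Sequence[float], q: Sequence[float]) -> float:
    """1 - JSD (base 2) between two LDA topic distributions; result lies in [0, 1]."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: Sequence[float], y: Sequence[float]) -> float:
        # Terms with x_i = 0 contribute nothing; y_i = m_i > 0 whenever x_i > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 1.0 - (0.5 * kl(p, m) + 0.5 * kl(q, m))


def cohesion(distance: int, wordnet_sim: float,
             lsa_u: Sequence[float], lsa_v: Sequence[float],
             lda_p: Sequence[float], lda_q: Sequence[float]) -> float:
    """Hypothetical aggregation: mean of the three similarities scaled by 1/distance.

    `distance` is assumed to count how many elements apart the two units are (>= 1).
    """
    if distance < 1:
        raise ValueError("distance must be >= 1")
    semantic = (wordnet_sim
                + cosine(lsa_u, lsa_v)
                + jensen_shannon_similarity(lda_p, lda_q)) / 3
    return semantic / distance
```

Averaging keeps the result in [0, 1] when every similarity does. A real system might instead weight the measures differently or normalize the distance term over the document length.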
The cohesion-based score of each textual element is computed as follows:
each sentence is initially assigned an individual score obtained by summing, over the concepts it contains, each concept's normalized term frequency multiplied by that concept's relevance, which is assigned globally during topic identification; this score measures the extent to which the sentence conveys the main concepts of the overall conversation;
afterwards, at block level (utterance or paragraph), individual sentence scores are weighted by cohesion measures and summed to obtain the inner-block score;
moving up the discourse decomposition model (document > block > sentence), inter-block cohesive links are used to augment the inner-block scores, with block-document similarities also serving as a weighting factor for block importance. Moreover, the first and last sentences of each block have no preceding or following adjacency link within that block, which would otherwise bias their evaluation; to compensate, their scores are increased through the cohesive link to the previous and the next block, respectively;
finally, all block scores are combined at document level, each weighted by the cohesion of its block-document hierarchical link, to determine the overall score of the reading material or conversation.
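The hierarchical scoring steps above can be sketched as follows. This is a simplified illustration under stated assumptions, not ReaderBench's code. All cohesion weights are assumed to be precomputed: one sentence-block value per sentence, an inter-block matrix, and one block-document value per block. The inter-block augmentation (adding neighbouring blocks' inner scores weighted by inter-block cohesion) is one hypothetical reading of the description. The first/last-sentence adjustment is omitted. The function names and data layout are hypothetical.

```python
from typing import Dict, List

Sentence = Dict[str, float]  # concept -> normalized term frequency
Block = List[Sentence]


def sentence_score(sentence: Sentence, relevance: Dict[str, float]) -> float:
    """Sum over concepts of normalized term frequency x global concept relevance."""
    return sum(tf * relevance.get(concept, 0.0) for concept, tf in sentence.items())


def inner_block_score(block: Block, sent_block_coh: List[float],
                      relevance: Dict[str, float]) -> float:
    """Sentence scores weighted by their (assumed precomputed) cohesion weights."""
    return sum(w * sentence_score(s, relevance) for s, w in zip(block, sent_block_coh))


def document_score(blocks: List[Block],
                   sent_block_coh: List[List[float]],
                   inter_block_coh: List[List[float]],
                   block_doc_coh: List[float],
                   relevance: Dict[str, float]) -> float:
    inner = [inner_block_score(b, c, relevance) for b, c in zip(blocks, sent_block_coh)]
    n = len(blocks)
    # Hypothetical augmentation: add other blocks' inner scores,
    # weighted by the inter-block cohesion with block i.
    augmented = [
        inner[i] + sum(inter_block_coh[i][j] * inner[j] for j in range(n) if j != i)
        for i in range(n)
    ]
    # Combine at document level, weighted by block-document cohesion.
    return sum(w * a for w, a in zip(block_doc_coh, augmented))
```

For example, with two one-sentence blocks, relevance {"cohesion": 1.0, "graph": 0.5}, an inter-block cohesion of 0.2 and block-document weights [1.0, 0.5], the first sentence scores 0.5 × 1.0 + 0.5 × 0.5 = 0.75.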
The following description is based on Crossley et al. (2017). ReaderBench (RB) generates a cohesion graph that uses cohesion values to determine connections between discourse elements. The cohesion graph is a multi-layered structure containing nodes of different types (Dascalu, 2014) and the links between them. A central node, representing the conversation's thread, is divided into contributions, which are further divided into sentences and words. Links are built between nodes to compute cohesion scores that denote the relevance of a contribution within the conversation, or the impact of a word within a sentence or contribution. Other links are generated between adjacent contributions and are used to detect changes in topic or in the conversation's thread. These changes are reflected by cohesion gaps between units of text. Explicit links, created through an interface functionality such as the "reply-to" option, are also included in the cohesion graph. In addition, cohesive links determined using semantic similarity techniques are added between related contributions within a window of at most 20 successive contributions, which can be considered the maximum span for this type of cohesive link (Rebedea, 2012).
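A minimal sketch of the contribution-level links described above follows. It covers explicit reply-to links, adjacency links, semantic links within a 20-contribution window, and detection of cohesion gaps. It is an illustration, not ReaderBench's implementation. The Jaccard word-overlap similarity is a swapped-in stand-in for the LSA/LDA/WordNet measures, and both thresholds are hypothetical. The window is assumed to cover the 20 contributions preceding the current one. The sentence and word layers of the graph are omitted.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

# Maximum span for semantic links, following the 20-contribution window (Rebedea, 2012).
MAX_SPAN = 20

Edge = Tuple[int, int, str, float]  # (source, target, kind, weight)


@dataclass
class Contribution:
    text: str
    reply_to: Optional[int] = None  # index of the contribution explicitly replied to


def jaccard(a: str, b: str) -> float:
    """Stand-in similarity (word-set overlap); ReaderBench relies on LSA, LDA and WordNet."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


def build_cohesion_graph(
    contribs: List[Contribution],
    sim: Callable[[str, str], float] = jaccard,
    threshold: float = 0.3,  # hypothetical cut-off for adding a semantic link
) -> List[Edge]:
    edges: List[Edge] = []
    for i, c in enumerate(contribs):
        # Explicit link created through the interface ("reply-to").
        if c.reply_to is not None:
            edges.append((c.reply_to, i, "explicit", sim(contribs[c.reply_to].text, c.text)))
        # Link between adjacent contributions, used to detect cohesion gaps.
        if i > 0:
            edges.append((i - 1, i, "adjacent", sim(contribs[i - 1].text, c.text)))
        # Semantic links to earlier, non-adjacent contributions within the window.
        for j in range(max(0, i - MAX_SPAN), i - 1):
            w = sim(contribs[j].text, c.text)
            if w >= threshold:
                edges.append((j, i, "semantic", w))
    return edges


def cohesion_gaps(edges: List[Edge], threshold: float = 0.1) -> List[Tuple[int, int]]:
    """Adjacent pairs whose cohesion falls below a (hypothetical) threshold."""
    return [(s, t) for s, t, kind, w in edges if kind == "adjacent" and w < threshold]
```

Bounding the semantic search to the preceding window keeps the number of comparisons linear in the conversation length rather than quadratic.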
Crossley, S. A., Dascalu, M., McNamara, D. S., Baker, R., & Trausan-Matu, S. (2017). Predicting Success in Massive Open Online Courses (MOOCs) Using Cohesion Network Analysis. 12th Int. Conf. on Computer-Supported Collaborative Learning (CSCL 2017). Philadelphia, PA: ISLS.
Dascalu, M. (2014). Analysing discourse and text complexity for learning and collaborating. New York: Springer.
Rebedea, T. (2012). Computer-Based Support and Feedback for Collaborative Chat Conversations and Discussion Forums (Doctoral dissertation). University Politehnica of Bucharest, Bucharest, Romania.