This section describes the main features of document cohesion flow processing (Crossley et al., 2016).
As Crossley et al. (2016, p. 765) put it: cohesion flow is "[…] a measure of a document’s structure derived from the order of different paragraphs and of the manner in which they combine to hold the text together. A text that demonstrates strong cohesion flow by linking ideas between paragraphs will likely be a more coherent text. This would allow ideas to bond together and flow smoothly from one paragraph to another, creating a text that readers can more easily comprehend. In addition, the ability to illustrate and automatically assess cohesion flow would enable researchers to observe how text segments in a document fit together and examine how the sequencing of these segments may affect readers’ comprehension."
ReaderBench analyses the cohesion flow between paragraphs rather than between sentences, using an aggregated measure of cohesion (computed through Latent Semantic Analysis, LSA, or Latent Dirichlet Allocation, LDA). The following measures are computed (Crossley et al., 2016, p. 766):
Absolute position accuracy: number of paragraphs that, after performing the topological sort on the cohesion flow graph, are in the correct position (the ordered paragraph index is the same as the initial index).
Absolute distance accuracy: the absolute value of the difference between the ordered and the initial paragraph indices. A value closer to 0 characterizes a more cohesive text in terms of adjacency links.
Adjacency accuracy: how closely the links in the cohesion flow graph follow the idea of adjacency (maximum flow between consecutive paragraphs), computed as the sum of the absolute values of (j - i - 1) over all pairs where cohesion[i, j] > 0.
Average flow cohesion: the average cohesion in the cohesion flow graph, i.e., the average of all cohesion[i, j].
Spearman correlation: the rank correlation between the ordered paragraph index and the initial sequence index.
Max order sequence: the length of the longest sequence of ordered paragraph indices that follows an ascending trend, which indicates whether the flow moves forward in the document.
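The measures above can be illustrated with a minimal Python sketch. It is not ReaderBench's implementation: the function name flow_measures, the input layout (order[k] is the initial index of the paragraph placed at position k after the topological sort, cohesion is a square matrix), and several readings of the definitions are assumptions. In particular, the sketch sums the absolute index differences over all paragraphs, averages cohesion only over off-diagonal entries greater than 0, treats the "ascending trend" as a contiguous run, and uses the no-ties Spearman formula, which is valid because both index sequences are permutations.

```python
from typing import Dict, Sequence


def flow_measures(order: Sequence[int],
                  cohesion: Sequence[Sequence[float]]) -> Dict[str, float]:
    """Illustrative cohesion flow measures (hypothetical helper)."""
    n = len(order)
    # Distance between each paragraph's ordered position and its initial index.
    diffs = [abs(initial - pos) for pos, initial in enumerate(order)]

    # Edges of the flow graph: off-diagonal entries with positive cohesion
    # (assumption: the diagonal and zero entries are not part of the graph).
    edges = [(i, j) for i in range(n) for j in range(n)
             if i != j and cohesion[i][j] > 0]

    # Longest contiguous ascending run in the ordered indices (assumption:
    # a longest increasing subsequence is another possible reading).
    longest = run = 1 if n else 0
    for prev, cur in zip(order, order[1:]):
        run = run + 1 if cur > prev else 1
        longest = max(longest, run)

    return {
        "absolute_position_accuracy": sum(1 for d in diffs if d == 0),
        "absolute_distance_accuracy": sum(diffs),
        "adjacency_accuracy": sum(abs(j - i - 1) for i, j in edges),
        "average_flow_cohesion": (
            sum(cohesion[i][j] for i, j in edges) / len(edges) if edges else 0.0
        ),
        # Spearman's rho for two rankings without ties.
        "spearman": (
            1 - 6 * sum(d * d for d in diffs) / (n * (n * n - 1)) if n > 1 else 1.0
        ),
        "max_order_sequence": longest,
    }


# Made-up example: four paragraphs, with paragraphs 1 and 2 swapped by the sort.
example_order = [0, 2, 1, 3]
example_cohesion = [
    [0.0, 0.5, 0.3, 0.0],
    [0.0, 0.0, 0.0, 0.4],
    [0.0, 0.0, 0.0, 0.6],
    [0.0, 0.0, 0.0, 0.0],
]
print(flow_measures(example_order, example_cohesion))
```

In this made-up example, the two links that skip a paragraph (0 to 2 and 1 to 3) each add 1 to the adjacency measure, while links between consecutive paragraphs add 0.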
In the visualization of the cohesion flow graph, edge lengths are proportional to the inverse of the semantic relatedness between paragraphs, while elasticity coefficients are used to obtain a more realistic layout that minimizes edge crossings and the overall network energy. The size of each paragraph node can be proportional to its betweenness centrality, i.e., the number of times it acts as a bridge along the shortest paths between pairs of other paragraphs from the input text (Crossley et al., 2016, p. 766).
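To make the node-sizing idea concrete, the following Python sketch computes betweenness centrality on a small undirected paragraph graph. It is an illustration under stated assumptions, not the ReaderBench code: the edge length is taken to be 1 / cohesion, the helper names (build_graph, shortest_paths, betweenness) are hypothetical, and shortest paths are found with a plain Dijkstra search that also counts how many shortest paths reach each node.

```python
import heapq
import math
from typing import Dict, Tuple

Graph = Dict[int, Dict[int, float]]  # graph[u][v] = edge length


def build_graph(cohesion: Dict[Tuple[int, int], float]) -> Graph:
    """Undirected graph with edge length 1 / cohesion (assumed mapping)."""
    graph: Graph = {}
    for (u, v), value in cohesion.items():
        if value > 0:
            graph.setdefault(u, {})[v] = 1.0 / value
            graph.setdefault(v, {})[u] = 1.0 / value
    return graph


def shortest_paths(graph: Graph, source: int):
    """Dijkstra returning distances and the number of shortest paths."""
    dist = {v: math.inf for v in graph}
    count = {v: 0 for v in graph}
    dist[source], count[source] = 0.0, 1
    heap, done = [(0.0, source)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in done:
            continue
        done.add(u)
        for v, length in graph[u].items():
            if v in done:
                continue
            candidate = d + length
            if math.isclose(candidate, dist[v]):
                count[v] += count[u]      # another equally short path
            elif candidate < dist[v]:
                dist[v], count[v] = candidate, count[u]
                heapq.heappush(heap, (candidate, v))
    return dist, count


def betweenness(graph: Graph) -> Dict[int, float]:
    """Fraction of shortest paths between other node pairs passing through v."""
    nodes = sorted(graph)
    info = {s: shortest_paths(graph, s) for s in nodes}
    score = {v: 0.0 for v in nodes}
    for a, s in enumerate(nodes):
        for t in nodes[a + 1:]:
            d_st = info[s][0][t]
            if math.isinf(d_st):
                continue  # s and t are not connected
            for v in nodes:
                if v in (s, t):
                    continue
                # v lies on a shortest s-t path if the two legs add up to d_st.
                if math.isclose(info[s][0][v] + info[v][0][t], d_st):
                    score[v] += info[s][1][v] * info[v][1][t] / info[s][1][t]
    return score


# Made-up cohesion values between four paragraphs.
example = build_graph({(0, 1): 0.5, (1, 2): 0.5, (2, 3): 0.5, (0, 2): 0.2})
print(betweenness(example))
```

In this made-up graph, the weak direct link between paragraphs 0 and 2 is longer than the path through paragraph 1, so paragraphs 1 and 2 act as bridges and would be drawn larger than paragraphs 0 and 3.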
Crossley, S. A., Dascalu, M., Trausan-Matu, S., Allen, L., & McNamara, D. S. (2016). Document Cohesion Flow: Striving towards Coherence. In 38th Annual Meeting of the Cognitive Science Society (pp. 764–769). Philadelphia, PA: Cognitive Science Society.