The textual data come from the open-source digitized Nestle 1904 edition of the Greek New Testament (GNT), available in CSV format via GitHub from here. The data are in an ideal form for analysis, as there are separate columns for the book and verse reference, the word as it appears in the text, its POS tag, its lemma, and more.
For the interactive 3D network visualization below, I used only a subselection of lemmata from the source text, i.e. only semantically crucial words (substantives, verbs, adjectives and adverbs) in their basic forms. For example, instead of analyzing “see”, “sees”, “saw” and “seeing” independently, I use only “see”. In highly inflected languages like Greek, Latin or Czech, this reduction to basic forms represents one of the main challenges for text visualization.
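A minimal sketch of this filtering step, assuming the CSV has columns named `pos` and `lemma` and that content words can be recognized by POS-tag prefixes (`N-`, `V-`, `A-`, `ADV`, loosely modelled on Robinson-style morphology codes). The column names, the tag scheme and the sample rows (the opening words of Matthew 1:2) are illustrative assumptions, not the verified layout of the actual file.

```python
import csv
import io

# Hypothetical sample: column names and POS tags are assumptions and may
# differ from the real Nestle 1904 CSV.
SAMPLE_CSV = """book_verse,word,pos,lemma
Matt 1:2,Ἀβραὰμ,N-PRI,Ἀβραάμ
Matt 1:2,ἐγέννησεν,V-AAI-3S,γεννάω
Matt 1:2,τὸν,T-ASM,ὁ
Matt 1:2,Ἰσαάκ,N-PRI,Ἰσαάκ
Matt 1:2,Ἰσαὰκ,N-PRI,Ἰσαάκ
Matt 1:2,δὲ,CONJ,δέ
Matt 1:2,ἐγέννησεν,V-AAI-3S,γεννάω
"""

# Assumed tag prefixes for nouns (incl. proper nouns), verbs, adjectives, adverbs.
CONTENT_PREFIXES = ("N-", "V-", "A-", "ADV")


def content_lemmata(csv_text):
    """Return the lemmata of content words, in text order."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Keep only rows whose POS tag marks a content word; use the lemma,
    # so that all inflected forms collapse to one basic form.
    return [row["lemma"] for row in reader if row["pos"].startswith(CONTENT_PREFIXES)]
```

On the sample, the article and the conjunction are dropped, and both occurrences of ἐγέννησεν, like the two spellings of Ἰσαάκ, are reduced to a single lemma each, which is exactly the reduction the paragraph above describes.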
Network Formation Method
The network data in the background were formed using an algorithm originally developed for analyzing collocations, relying on the Python NLTK library. Here I use it in a modified form. The algorithm goes through the whole raw text (a list of extracted words in their basic forms) from beginning to end. In each step it focuses on only three neighbouring words and forms a link between each pair of them, if such a link is not already present. If the link between a pair of words already exists, it makes this link stronger by one (i.e., to use network analysis terminology, it increases the weight of the link by one). As a result, it produces a list of word pairs characterized by how often they occur close to each other, i.e. within a neighbourhood of three words. The neighbourhood window can easily be made broader in the code if needed, but a window of three seems to work quite well so far.
The resulting list is then used to form a network that can be analyzed with standard network analysis metrics using other Python packages, such as iGraph (a rather minimalistic tool, but one that communicates easily with plotly) or NetworkX (https://networkx.github.io/), an advanced network analysis tool.
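The sliding-window procedure can be sketched in plain Python. This is not the original NLTK-based code (which is lost), just a minimal reconstruction of the steps described above. Two details are assumptions: pairs are treated as undirected (stored in sorted order), and a lemma repeated inside one window is not linked to itself.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(lemmata, window=3):
    """Count how often each unordered pair of lemmata shares a sliding window.

    A window of `window` consecutive lemmata slides over the text; every pair
    inside it gets a link, and each further occurrence adds 1 to its weight.
    """
    weights = Counter()
    # Number of window positions; a text shorter than the window is one window.
    positions = max(len(lemmata) - window + 1, 1)
    for start in range(positions):
        for a, b in combinations(lemmata[start:start + window], 2):
            if a == b:
                # Assumption: no self-links for a lemma repeated in the window.
                continue
            # Sort the pair so (a, b) and (b, a) are the same undirected link.
            weights[tuple(sorted((a, b)))] += 1
    return weights
```

With `window=3`, adjacent words share up to two windows while words two positions apart share only one, so closer neighbours end up with larger weights; passing a larger `window` widens the neighbourhood. The counter can then be loaded into NetworkX as a weighted undirected graph, e.g. `G = nx.Graph()` followed by `G.add_weighted_edges_from((a, b, w) for (a, b), w in weights.items())`.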
The visualization of the network is independent of the network as such and relies on its own algorithms. It was created with the Python plotly library. Although it cannot capture all relevant features of the underlying network data, it is still useful, as it helps to explore the basic structure of the network visually.
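Because plotly only needs coordinates, the visualization can be kept separate from the network analysis. A minimal sketch, assuming the 3D node positions already come from some layout algorithm (for example NetworkX's `spring_layout(G, dim=3)`), of the common way to flatten edges into a single line trace; the helper names are hypothetical.

```python
def edge_line_coords(edges, pos):
    """Flatten edges into x, y, z lists for a single plotly line trace.

    Each edge contributes its two end points followed by None, which makes
    plotly break the line between consecutive edges.
    """
    xs, ys, zs = [], [], []
    for a, b in edges:
        (x0, y0, z0), (x1, y1, z1) = pos[a], pos[b]
        xs += [x0, x1, None]
        ys += [y0, y1, None]
        zs += [z0, z1, None]
    return xs, ys, zs


def node_coords(pos):
    """Split a {node: (x, y, z)} mapping into parallel lists for a marker trace."""
    nodes = list(pos)
    xs = [pos[n][0] for n in nodes]
    ys = [pos[n][1] for n in nodes]
    zs = [pos[n][2] for n in nodes]
    return nodes, xs, ys, zs
```

The lists can then be passed to `plotly.graph_objects.Scatter3d`: one trace with `mode="lines"` for the edges and one with `mode="markers"` and `text=nodes` (for hover labels) for the nodes.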
Regrettably, I have probably lost the code: by coincidence, my computer crashed just when I was (unsuccessfully) experimenting with a new backup method. However, I still have the visualizations and (hopefully) the skills to write it again. My ongoing attempt to reproduce the scripts is in this GitHub repository.