For the next few weeks, two of the three graphs are effectively down. The teacher-student graph is still up and running. But the local interaction and global interaction graphs are only sparsely populated. Please bear with us.
This is the result of refactoring the text analytics code. We segmented the code into four stages, namely (a) statement processing, (b) person identification, (c) HTML generation and (d) relationship processing. We now also tag some (English) verbs that indicate speech, not just names. And rather than building every graph up front, we build each graph on demand, at page load.
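As a rough illustration of the speech-verb tagging (a sketch, not our actual code), the snippet below wraps a few English speech verbs in a span. The verb list, the `speech-verb` class name and the function name are all made up for this example.

```python
import re

# Hypothetical verb list; the real list used on the site is not shown in this post.
SPEECH_VERBS = ["said", "asked", "answered", "replied", "objected"]

# Match any listed verb as a whole word, ignoring case.
SPEECH_RE = re.compile(r"\b(" + "|".join(SPEECH_VERBS) + r")\b", re.IGNORECASE)


def tag_speech_verbs(text: str) -> str:
    """Wrap each speech verb in a span (the class name is an assumption)."""
    return SPEECH_RE.sub(r'<span class="speech-verb">\1</span>', text)
```

Tagging the verb separately from the names gives the later relationship stage something concrete to anchor on, such as "X said to Y".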
The benefit of this refactoring is that processing the entire Talmudic text takes about 10 minutes rather than 5 hours, and the intermediate results of each stage are stored for inspection and improvement. However, the stage (a) output doesn't play as nicely as it used to with the regular expressions used in stage (d). Once these are re-aligned, we should be able to restore these graphs and add more complex or harder-to-detect relationships to them.
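To make the idea of stored intermediate results concrete, here is a minimal sketch of a staged pipeline in which each stage writes its output to a JSON file and later runs reuse that file. It is an illustration under assumptions, not our production code: the file names, the `run_stage` helper, the name list and the toy stand-ins for stages (a) and (b) are all hypothetical.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable


def run_stage(name: str, stage: Callable[[Any], Any], payload: Any,
              out_dir: Path, force: bool = False) -> Any:
    """Run one stage, or reuse its saved output if it already exists."""
    path = out_dir / f"{name}.json"
    if path.exists() and not force:
        return json.loads(path.read_text(encoding="utf-8"))
    result = stage(payload)
    out_dir.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps Hebrew/Aramaic readable when inspecting the file.
    path.write_text(json.dumps(result, ensure_ascii=False, indent=2),
                    encoding="utf-8")
    return result


# Toy stand-in for stage (a): split text into statements on periods.
def split_statements(text: str) -> list:
    return [s.strip() for s in text.split(".") if s.strip()]


# Hypothetical name list for the toy stage (b).
KNOWN_NAMES = ["Rami bar Ḥama", "Rav Yitzḥak bar Yehuda"]


# Toy stand-in for stage (b): record which known names each statement mentions.
def tag_people(statements: list) -> list:
    return [{"text": s, "people": [n for n in KNOWN_NAMES if n in s]}
            for s in statements]


def run_pipeline(text: str, out_dir: Path, force: bool = False) -> list:
    statements = run_stage("a_statements", split_statements, text, out_dir, force)
    return run_stage("b_people", tag_people, statements, out_dir, force)
```

Because each stage reads only the previous stage's saved file, a later stage can be rerun and debugged on its own, and a mismatch between one stage's output format and another's regular expressions can be examined directly in the stored files.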
An example of a relationship that is harder to detect is on today's daf (Zevachim 96b):
אמר ליה בעי מיני מילתא דאיפשיט לך כי מתניתא
Neither the speaker nor the listener is present in the Aramaic. The English translation has:
“Rami bar Ḥama said to him: Ask me about a matter, which I will resolve for you in accordance with a mishna.”
which has the speaker but not the listener. We can obtain the listener, Rav Yitzḥak bar Yehuda, from context. This functionality should be easier to isolate and focus on once it lives in a stand-alone component that runs in 2 minutes, working from the output of the previous stages.
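One simple way such a stand-alone component might fill in a missing listener is sketched below. This is only a heuristic we are assuming for illustration (take the most recently mentioned person who is not the speaker); the `Statement` class and `resolve_listeners` function are hypothetical names, and the real component may well need something smarter.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Statement:
    speaker: Optional[str] = None
    listener: Optional[str] = None
    # Everyone named in the statement, including the speaker if named.
    people: List[str] = field(default_factory=list)


def resolve_listeners(statements: List[Statement]) -> List[Statement]:
    """Fill in missing listeners from earlier context (assumed heuristic)."""
    resolved = []
    recent: List[str] = []  # people seen so far, most recent last
    for st in statements:
        listener = st.listener
        if st.speaker and listener is None:
            # Walk back through earlier mentions, skipping the speaker.
            for name in reversed(recent):
                if name != st.speaker:
                    listener = name
                    break
        resolved.append(Statement(st.speaker, listener, list(st.people)))
        # Update the context only after resolving the current statement.
        for name in st.people:
            if name in recent:
                recent.remove(name)
            recent.append(name)
    return resolved
```

Given an earlier statement that names Rav Yitzḥak bar Yehuda, followed by a statement whose speaker is Rami bar Ḥama with no listener, this heuristic would assign Rav Yitzḥak bar Yehuda as the listener. Statements with no earlier context are left with the listener unset.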