AI will unlock the mysteries of the Vatican Secret Archives
The Vatican Secret Archives is often listed among the world's top ten forbidden places. It is the Pope's own archive and one of the oldest and richest church archives in Europe.
Much of its collection has never been transcribed, so even the church's own archivists do not know everything it contains. Now a machine vision system is set to unveil the mysteries of its medieval texts.
The Vatican Secret Archives is legendary. The private letters and other documents of past popes preserved there are said to date back to the 8th century AD, and its shelving stretches for 85 kilometers.
The archive is heavily guarded. Scholars have been admitted since 1881, but the documents they can consult are extremely limited. Even so, the amount of information is considerable.
For example, a 60-meter-long parchment records the trials of the Knights Templar in France, which lasted for several years beginning in 1307. The collection also includes a letter from Michelangelo, King Henry VIII's petition to have his marriage annulled, and a letter that Mary, Queen of Scots, wrote before her beheading.
In addition, the archives contain shorter pieces of correspondence. For example, during the American Civil War, Abraham Lincoln and Jefferson Davis each wrote to Pope Pius IX to try to win his support for their respective sides, the Union and the Confederacy. The correspondence between the Pope and the Nazi regime during the Second World War has never been published. In fact, all files after 1939 are completely confidential.
Although these documents may not be published, the archives have a studio for imaging and preserving them. Like many other historical archives, they have begun to digitize their documents so that scholars can study them in depth.
However, the archives are far too large to transcribe by hand. So, can machine vision technology do the job?
Donatella Firmani of Roma Tre University in Rome, Italy, and her colleagues believe it can. They have launched the "In Codice Ratio" project to develop a system that automatically transcribes documents from the Vatican Secret Archives (a collection known as the Vatican Registers).
The corpus contains 18,000 pages of official letters from the 13th century. They cover a wide range of topics, from church matters to dealings with kings and queens, and from politics to religion, all across Europe. Firmani and her team said: "These documents have never been transcribed before, so the historical significance is unprecedented."
Medieval texts pose particular challenges for machine vision. Handwriting styles vary from manuscript to manuscript, adjacent letters are often joined in a single continuous stroke, and scribes used special abbreviations, so conventional character recognition algorithms are not up to the task of transcription.
To get around this, researchers have developed computer vision systems that recognize entire words rather than individual letters. The results are still far from ideal, however. Most words appear only a few times even in a long document, so it is difficult to build a data set large enough for machine learning.
Now Firmani and her team have devised a new way to train handwriting recognition systems: split words into strokes and then fit the strokes together like jigsaw pieces. They said: "We want to develop a mature system that can transcribe as many manuscripts as possible."
After the system splits a word into strokes, it tries to combine the strokes into letters, considers all possible combinations, and finally eliminates every combination that does not conform to the rules of the language.
For example, the same three strokes can be read as "iii" or as "m", and the former is ruled out because the language does not allow it. The same strokes can also be read as "in" or "ni", in which case the system has to look at the whole word and its context before making a choice.
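The stroke-combination step described above can be illustrated with a minimal Python sketch. It is not the team's implementation. The stroke counts per letter (`MINIMS_PER_LETTER`) and the toy rule in `plausible` are assumptions made up for this example; a real system would use a proper model of the language.

```python
from typing import Dict, List

# Assumption for illustration: the number of vertical strokes each letter uses.
MINIMS_PER_LETTER: Dict[str, int] = {"i": 1, "n": 2, "m": 3}


def candidate_readings(n_strokes: int) -> List[str]:
    """Return every letter sequence that uses exactly n_strokes strokes."""
    if n_strokes == 0:
        return [""]
    readings = []
    for letter, width in MINIMS_PER_LETTER.items():
        if width <= n_strokes:
            # Consume `width` strokes for this letter, then read the rest.
            for rest in candidate_readings(n_strokes - width):
                readings.append(letter + rest)
    return readings


def plausible(reading: str) -> bool:
    """Toy stand-in for a language check: reject three identical letters in a row."""
    return not any(
        reading[k] == reading[k + 1] == reading[k + 2]
        for k in range(len(reading) - 2)
    )


all_readings = candidate_readings(3)
kept = [r for r in all_readings if plausible(r)]
print(all_readings)  # ['iii', 'in', 'ni', 'm']
print(kept)          # ['in', 'ni', 'm']
```

The filter removes "iii" but leaves "in", "ni" and "m", which is why the surrounding word and context are still needed to pick a final reading.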
The Firmani team first created a data set to train a neural-network-based computer vision system.
This data set has to be labeled so that the vision system can learn the mapping from strokes to possible letters.
They crowdsourced the labeling. The jigsaw-like words were turned into pattern recognition questions, similar to picture CAPTCHAs, and presented to 120 high school students, who manually labeled a data set containing 15,000 words in a few hours.
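The article does not say how the answers from different students were combined. One common approach, shown here purely as an assumption, is to keep a label only when a clear majority of annotators agree. The function name `majority_label` and the threshold `min_agreement` are hypothetical.

```python
from collections import Counter
from typing import List, Optional


def majority_label(votes: List[str], min_agreement: float = 0.5) -> Optional[str]:
    """Return the most common label if its share of votes exceeds min_agreement."""
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    # Discard the sample when annotators do not agree strongly enough.
    return label if count / len(votes) > min_agreement else None


print(majority_label(["m", "m", "n"]))  # m
print(majority_label(["i", "n"]))       # None
```

Samples without a clear winner could be shown to more annotators or left out of the training set.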
The results are promising. The Firmani team said: "We are able to accurately transcribe 65% of the letter images in the dataset."
This is clearly a significant step for historians and for the transcription of medieval texts. But there are still problems to overcome. For example, the system so far handles only lowercase letters, so the next step is to extend the data set to include uppercase letters and the abbreviations used in medieval texts.
It is unclear how the Vatican Secret Archives will use this technology, or whether the Vatican Registers will be made public once they have been transcribed.
But even if the documents are not published, the technology developed by the Firmani team can help scholars carry out in-depth research in related fields. For example, transcribed documents can be analyzed using data such as word and phrase frequencies and how they change over time, which offers an important starting point for studying historical culture.
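As a rough illustration of that kind of analysis, the following Python sketch counts word frequencies per year across a set of transcribed documents. The function name `frequency_by_year`, the `(year, text)` input format and the sample texts are all invented for this example and do not come from the archive.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def frequency_by_year(documents: Iterable[Tuple[int, str]]) -> Dict[int, Counter]:
    """Count how often each word appears in the documents of each year."""
    counts: Dict[int, Counter] = defaultdict(Counter)
    for year, text in documents:
        # Naive tokenization: lowercase and split on whitespace.
        counts[year].update(text.lower().split())
    return dict(counts)


# Made-up sample data, not real transcriptions.
sample = [
    (1216, "papa rex papa"),
    (1216, "rex"),
    (1250, "papa ecclesia"),
]
freq = frequency_by_year(sample)
print(freq[1216]["papa"])  # 2
print(freq[1250]["rex"])   # 0
```

Comparing the counts for the same word across years gives a simple view of how its usage changes over time.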