Insert NER Plugin into ttw
pBxr committed Oct 20, 2024
1 parent bf595be commit 2002ed7
Showing 7 changed files with 519 additions and 128 deletions.
5 changes: 5 additions & 0 deletions .gitignore
@@ -0,0 +1,5 @@
*.docx

*.bak

*.exe
32 changes: 22 additions & 10 deletions README.md
@@ -35,7 +35,7 @@ ttw consists of two components:
- it also runs several integrity checks on the files
(- step by step it will also take over the functions from the `c++` core)

2.) The `c++` core (`tagtool_v2-0-0.exe`):
2.) The `c++` core (`tagtool_v2-1-0.exe`):
- it runs most of the main tasks
- using the `Python` framework it needs to be embedded into the framework's main folder
- like in former releases it still can be run as a standalone application using a terminal.
@@ -57,13 +57,13 @@ pyinstaller -wF --icon="Logo.ico" TagTool_WiZArd_Start.py

Result is `TagTool_WiZArd_Start.exe`.

3.) Create `tagtool_v2-0-0.exe` using this repo (`cpp_core`):
A simple way to create the `tagtool_v2-0-0.exe` file from the `c++` core is to use Embarcadero Dev-C++ 6.3.:
3.) Create `tagtool_v2-1-0.exe` using this repo (`cpp_core`):
A simple way to create the `tagtool_v2-1-0.exe` file from the `c++` core is to use Embarcadero Dev-C++ 6.3.:
- Open the `.dev` file and add all `c++` files to your project (`main.cpp` and all header files (`.h`))
- If using Embarcadero Dev-C++ 6.3 add "`-std=c++17`" in Project Options -> Parameters -> C++ compilers.
- Run "Rebuild all".

Result is `tagtool_v2-0-0.exe`
Result is `tagtool_v2-1-0.exe`

## How to setup and run

@@ -74,7 +74,7 @@ Result is `tagtool_v2-0-0.exe`
- ttw_help.html
- Logo.ico
- Logo.gif
- tagtool_v2-0-0.exe (how to create the `.exe` file from the `c++` core see above)
- tagtool_v2-1-0.exe (how to create the `.exe` file from the `c++` core see above)
- and the \resources folder (with all necessary files downloaded together with the ttw release)

If you create a shortcut on your desktop to start `TagTool_WiZArd_Start.exe` you don't have to touch the ttw folder again.
@@ -90,23 +90,34 @@ For preparing the `.csv` files and all other questions how to run the applicatio

**Alternatively: Stand alone from console:**

After compiling the binary (tagtool_v2-0-0.exe, see above) open a terminal and run "tagtool_v2-0-0.exe" either with the parameter "--help" to get further information or together with the name of the file you want to process.
After compiling the binary (tagtool_v2-1-0.exe, see above) open a terminal and run "tagtool_v2-1-0.exe" either with the parameter "--help" to get further information or together with the name of the file you want to process.
Be sure not to omit the `.html`-ending of the file you want to process.
Be sure that all necessary files are saved in the **same folder** together with the `tagtool_v2-0-0.exe` file, i. e.
Be sure that all necessary files are saved in the **same folder** together with the `tagtool_v2-1-0.exe` file, i. e.
- 01_MetadataValueList.csv
- 02_AuthorYearList.csv
- 03_ImageCreditList.csv
- 04_ToSearchAndReplaceList.csv
- article.html
- tagtool_v2-0-0.exe
- tagtool_v2-1-0.exe
- \resources

See "--help" to find all necessary information to run the application in a standalone version.
For preparing the `.csv` files see `ttw_help.html`.

## New in v2.1.0

Starting with v2.1.0, `ttw` comes with a test version of a `Named Entity Recognition (NER)` plugin. The NER plugin needs a specific environment and several additional libraries with their own dependencies, so it is switched off by default in the release versions to avoid conflicts. If you want to test the plugin:
- Prepare your environment carefully; the complete documentation is in the README.md file here: https://github.com/pBxr/NER_Plugin_for_ttw.
- Activate the plugin in the `Python` source code before running the Python files again: in `TagTool_WiZArd_Start.py`, set `NER_Plugin_Switch` to `True`.

This first test version does not yet address the insufficient quality of the `iDAI.gazetteer` query results or the web service's default query limit. Filter mechanisms to improve the quality of the results are a task for forthcoming commits.
For more information see the "Help" file and especially the documentation here: https://github.com/pBxr/NER_Plugin_for_ttw.
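
The combination of a switch and a quick environment check described above can be sketched as follows. This is a minimal illustration, not the code of this commit: the helper `plugin_can_start` and its parameters are hypothetical, and it uses `importlib.util.find_spec` to test whether a top-level module is installed without importing it, whereas the commit itself tries `from transformers import pipeline` inside a `try` block.

```python
import importlib.util
from typing import Tuple


def plugin_can_start(switch_on: bool, required_module: str = "transformers") -> Tuple[bool, str]:
    """Hypothetical helper: decide whether an optional plugin may be opened.

    Returns a flag plus a message that a GUI could show in a message box.
    """
    if not switch_on:
        return False, "NER Plugin must be activated first."
    # find_spec returns None when a top-level module is not installed
    if importlib.util.find_spec(required_module) is None:
        return False, "The required NER libraries do not seem to be installed."
    return True, "OK"


# Usage: with the switch off, the environment is never inspected
ok, message = plugin_can_start(False)
```

A check like this only shows that the package can be found; it says nothing about model files or version conflicts, so it is a quick plausibility test rather than a full verification of the environment.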

New also:
- Function to convert tables to XML, implemented with Beautiful Soup (therefore not available when using the console version).

## New in v2.0.0

- Starting with v2.0.0 ttw comes with a GUI, based on `Python/tkinter`. Although the `c++` core can still be used as terminal standalone application (`tagtool_v2-0-0.exe`, see above), it is not recommended, because the `Python` framework does several integrity checks.
- Starting with v2.0.0 ttw comes with a GUI, based on `Python/tkinter`. Although the `c++` core can still be used as terminal standalone application (`tagtool_v2-1-0.exe`, see above), it is not recommended, because the `Python` framework does several integrity checks.

Also new to previous versions:
- The article file and value lists no longer need to be saved in the same folder with ttw, any directory can be chosen.
@@ -157,4 +168,5 @@ Therefore new in v1.3.0: Additional mode implemented when ttw is called from web
## See also

- For ttw_webx see https://github.com/pBxr/ttw_WebExtension
- ID_Extractor (ID_Ex) for extracting IDs and references from `.jats` article files, especially for the above mentioned journals, see https://github.com/pBxr/ID_Extractor
- ID_Extractor (ID_Ex) for extracting IDs and references from `.jats` article files, especially for the above mentioned journals, see https://github.com/pBxr/ID_Extractor
- Test Environment for a TagTool_WiZArD Named Entity Recognition Plugin, see https://github.com/pBxr/NER_Plugin_for_ttw.
2 changes: 1 addition & 1 deletion cpp_core/TagTool_WiZArd.dev
@@ -19,7 +19,7 @@ ObjectOutput=
LogOutput=
LogOutputEnabled=0
OverrideOutput=1
OverrideOutputName=tagtool_v2-0-0.exe
OverrideOutputName=tagtool_v2-1-0.exe
HostApplication=
UseCustomMakefile=0
CustomMakefile=
4 changes: 2 additions & 2 deletions cpp_core/ttwClasses.h
@@ -634,8 +634,8 @@ string strongEndXML_ = "</bold>";

//Global settings and switches...

string versionNumber = "v2-0-0";
string versionTag = "v2.0.0";
string versionNumber = "v2-1-0";
string versionTag = "v2.1.0";

bool firstRun=true;
bool nextRunIsSet=true;
209 changes: 121 additions & 88 deletions python_frame/TagTool_WiZArd_Start.py
@@ -17,7 +17,7 @@
from Settings import files

import pyScripts as pyScr
import pyNER


class MainWindow(tkinter.Frame):

@@ -143,7 +143,7 @@ def actualize_widgets(self):

#NER plugin button
self.buttonStartNER = ttk.Button(self, text = "Open NER plugin", style = "TButton",
command=lambda: self.set_NER_settings())
command=lambda: self.basic_NER_lib_check())
self.buttonStartNER["width"] = 20
self.buttonStartNER.grid(column = 3, row = heightRow1 + 2, sticky="nw")

@@ -188,6 +188,51 @@ def actualize_widgets(self):
self.buttonOpenBrowser["width"] = 25
self.buttonOpenBrowser.grid(column = 2, row = heightRow1 + 3, sticky="nw")

def basic_NER_lib_check(self):

if NER_Plugin_Switch == False:
textInfo = ("NER Plugin must be activated first.\n\n"
"See \"About\" -> \"Help\" for instructions.\n\n\n")
tkinter.messagebox.showwarning(title="ERROR", \
message=textInfo)
return

if self.files.fileName == "":
tkinter.messagebox.showwarning(title="ERROR", \
message="No file selected!")
self.settings.selectedFileIsReady = False
self.bgColorTboxArticle = "red"
self.actualize_widgets()
return

#Extensive tests omitted; this is only a quick check
#whether the necessary environment can be assumed to exist.
try:
from transformers import pipeline
except ModuleNotFoundError as err:
textInfo = ("The required NER libraries do not seem to be installed.\n\n"
"Check your environment.\n\n"
"See \"About\" -> \"Help\" for instructions.\n\n\n")
tkinter.messagebox.showwarning(title="ERROR", \
message=textInfo)
return
else:
yesnoResult = tkinter.messagebox.askyesno(title="Important Information", \
message=( "The NER Plugin that opens next must be run before "
"preparing the file \"04_ToSearchAndReplaceList.csv\".\n\n"
"So:\n"
"1. Run the NER Plugin first. Its results will be saved "
"in the file \"NER_results\\02_Gazetteer_IDs_DRAFT.csv\"\n\n"
"2. Copy the entries you have approved and selected into "
"\"04_ToSearchAndReplaceList.csv\"\n\n"
"3. After having prepared the other mandatory .csv files run TagTool.\n\n\n"
"Do you wish to continue?"
))
if yesnoResult == False:
return
else:
self.set_NER_settings()

def create_MenuBar(self):
self.menueBarFile = tkinter.Menu(self.menu, tearoff=False)

@@ -270,9 +315,7 @@ def reset_app(self):
def run_NER_Plugin(self, window):

#In this version the default settings cannot be changed so they are hard coded
pyNER.run_NER_process(self.files, self.settings)

textInfo = "Process finished. Check result"
success, textInfo = pyNER.run_NER_process(self.files, self.settings)

tkinter.messagebox.showinfo(title="Info", \
message=textInfo)
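
In the hunk above, `run_NER_process` is shown returning a pair (`success`, `textInfo`), while the message is always displayed with `showinfo`. One possible way to make use of the flag is sketched below; `report_result` is a hypothetical helper and not part of this commit, and the dialog functions are passed in as parameters so that the sketch runs without a display.

```python
def report_result(success, text_info, show_info, show_warning):
    """Hypothetical helper: route the plugin's result to a matching dialog.

    show_info and show_warning are callables with the keyword arguments
    title and message, e.g. tkinter.messagebox.showinfo and
    tkinter.messagebox.showwarning.
    """
    if success:
        show_info(title="Info", message=text_info)
    else:
        show_warning(title="ERROR", message=text_info)


# Usage with stand-in callables that just record what would be shown
shown = []
report_result(True, "Process finished.",
              lambda **kw: shown.append(("info", kw)),
              lambda **kw: shown.append(("warning", kw)))
```

Inside the GUI the call could look like `report_result(success, textInfo, tkinter.messagebox.showinfo, tkinter.messagebox.showwarning)`.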
@@ -348,89 +391,71 @@ def set_functions(self):


def set_NER_settings(self):

setNER_PluginWindow = tkinter.Toplevel()
setNER_PluginWindow.geometry('600x500')
setNER_PluginWindow.title('Named Entity Recognition Plugin')
setNER_PluginWindow.iconbitmap(self.settings.cwd+"\\Logo.ico")

#Models
self.groupModels = tkinter.LabelFrame(setNER_PluginWindow)
self.groupModels["text"] = "Models"
self.groupModels.grid(sticky="w", pady = 10, padx = 10)
self.tboxModels = Text(self.groupModels, height=len(self.settings.NER_Settings['Model']), width=70,
background=self.settings.colorNeutral)
self.tboxModels.configure(font=self.textFont)
self.tboxModels.grid()
for item in self.settings.NER_Settings['Model']:
self.tboxModels.insert("end", item + "\n")
self.tboxModels.config(state='disabled')

#Entities
self.groupEntities = tkinter.LabelFrame(setNER_PluginWindow)
self.groupEntities["text"] = "Entity Types"
self.groupEntities.grid(sticky="w", pady = 10, padx = 10)
self.tboxEntities = Text(self.groupEntities, height=len(self.settings.NER_Settings['Entity Type']), width=70,
background=self.settings.colorNeutral)
self.tboxEntities.configure(font=self.textFont)
self.tboxEntities.grid()
for item in self.settings.NER_Settings['Entity Type']:
self.tboxEntities.insert("end", item + "\n")
self.tboxEntities.config(state='disabled')

#Sources
self.groupSources = tkinter.LabelFrame(setNER_PluginWindow)
self.groupSources["text"] = "Ways of Source Text Extraction"
self.groupSources.grid(sticky="w", pady = 10, padx = 10)
self.tboxSources = Text(self.groupSources, height=len(self.settings.NER_Settings['Source']), width=70,
background=self.settings.colorNeutral)
self.tboxSources.configure(font=self.textFont)
self.tboxSources.grid()
for item in self.settings.NER_Settings['Source']:
self.tboxSources.insert("end", item + "\n")
self.tboxSources.config(state='disabled')

if self.files.fileName == "":
tkinter.messagebox.showwarning(title="ERROR", \
message="No file selected!")
self.settings.selectedFileIsReady = False
self.bgColorTboxArticle = "red"
self.actualize_widgets()
else:
tkinter.messagebox.showwarning(title="Important Information", \
message=( "The NER Plugin that opens next needs to be run before "
"preparing the file \"04_ToSearchAndReplaceList.csv\".\n\n"
"So:\n"
"1. Run the NER Plugin first. Its results will be saved "
"in the file \"NER_results\\02_Gazetteer_IDs_DRAFT.csv\"\n\n"
"2. Copy the entries you have approved into "
"\"04_ToSearchAndReplaceList.csv\"\n\n"
"3. After having prepared the other mandatory files run TagTool"
))

setNER_PluginWindow = tkinter.Toplevel()
setNER_PluginWindow.geometry('600x500')
setNER_PluginWindow.title('Named Entity Recognition Plugin')
setNER_PluginWindow.iconbitmap(self.settings.cwd+"\\Logo.ico")

#Models
self.groupModels = tkinter.LabelFrame(setNER_PluginWindow)
self.groupModels["text"] = "Models"
self.groupModels.grid(sticky="w", pady = 10, padx = 10)
self.tboxModels = Text(self.groupModels, height=len(self.settings.NER_Settings['Model']), width=70,
background=self.settings.colorNeutral)
self.tboxModels.configure(font=self.textFont)
self.tboxModels.grid()
for item in self.settings.NER_Settings['Model']:
self.tboxModels.insert("end", item + "\n")
self.tboxModels.config(state='disabled')

#Entities
self.groupEntities = tkinter.LabelFrame(setNER_PluginWindow)
self.groupEntities["text"] = "Entity Types"
self.groupEntities.grid(sticky="w", pady = 10, padx = 10)
self.tboxEntities = Text(self.groupEntities, height=len(self.settings.NER_Settings['Entity Type']), width=70,
background=self.settings.colorNeutral)
self.tboxEntities.configure(font=self.textFont)
self.tboxEntities.grid()
for item in self.settings.NER_Settings['Entity Type']:
self.tboxEntities.insert("end", item + "\n")
self.tboxEntities.config(state='disabled')

#Sources
self.groupSources = tkinter.LabelFrame(setNER_PluginWindow)
self.groupSources["text"] = "Ways of Source Text Extraction"
self.groupSources.grid(sticky="w", pady = 10, padx = 10)
self.tboxSources = Text(self.groupSources, height=len(self.settings.NER_Settings['Source']), width=70,
background=self.settings.colorNeutral)
self.tboxSources.configure(font=self.textFont)
self.tboxSources.grid()
for item in self.settings.NER_Settings['Source']:
self.tboxSources.insert("end", item + "\n")
self.tboxSources.config(state='disabled')

#In this version the default settings cannot be changed so they are hard coded
#Selected Settings
self.groupSettings = tkinter.LabelFrame(setNER_PluginWindow)
self.groupSettings["text"] = "Selected Settings"
self.groupSettings.grid(sticky="w", pady = 10, padx = 10)
self.tboxSettings = Text(self.groupSettings, height=3, width=70,
background=self.settings.okGreen)
self.tboxSettings.configure(font=self.textFont)
self.tboxSettings.grid()
for x, y in self.settings.NER_SettingsSet.items():
self.tboxSettings.insert("end", x + ": " + y + "\n")
self.tboxSettings.config(state='disabled')

self.buttonRunNER = ttk.Button(setNER_PluginWindow, text = "Run NER Plugin", style = "TButton",
command=lambda: self.run_NER_Plugin(setNER_PluginWindow))
self.buttonRunNER.grid(sticky="e")
#In this version the default settings cannot be changed so they are hard coded
#Selected Settings
self.groupSettings = tkinter.LabelFrame(setNER_PluginWindow)
self.groupSettings["text"] = "Selected Settings"
self.groupSettings.grid(sticky="w", pady = 10, padx = 10)
self.tboxSettings = Text(self.groupSettings, height=3, width=70,
background=self.settings.okGreen)
self.tboxSettings.configure(font=self.textFont)
self.tboxSettings.grid()
for x, y in self.settings.NER_SettingsSet.items():
self.tboxSettings.insert("end", x + ": " + y + "\n")
self.tboxSettings.config(state='disabled')

self.buttonRunNER = ttk.Button(setNER_PluginWindow, text = "Run NER Plugin", style = "TButton",
command=lambda: self.run_NER_Plugin(setNER_PluginWindow))
self.buttonRunNER.grid(sticky="e")

self.tboxInfo = Text(setNER_PluginWindow, height=1, width=70, background="#ffff66")
self.tboxInfo.configure(font=self.textFont)
infoText = "NOTE: In this test version the selected settings are predefined."
self.tboxInfo.insert("end", infoText)
self.tboxInfo.grid(sticky = "w", pady = 10, padx = 10)
self.tboxInfo.config(state='disabled')
self.tboxInfo = Text(setNER_PluginWindow, height=1, width=70, background="#ffff66")
self.tboxInfo.configure(font=self.textFont)
infoText = "NOTE: In this test version the selected settings are predefined."
self.tboxInfo.insert("end", infoText)
self.tboxInfo.grid(sticky = "w", pady = 10, padx = 10)
self.tboxInfo.config(state='disabled')


def show_help(self):
@@ -474,7 +499,8 @@ def start_process(self):
subprocess.run(pandocCall, stdout=FNULL, stderr=FNULL, shell=False)

#Step 2: Run ttw
ttwCall = "\"" + self.settings.cwd + "\\tagtool_v2-0-0.exe\"" + " \""\
versionNumberCall = versionNumber.replace(".","-")
ttwCall = "\"" + self.settings.cwd + "\\tagtool_v"+versionNumberCall+".exe\"" + " \""\
+ self.files.projectPath + self.settings.target + "\""

#In case of whitespaces
@@ -500,7 +526,14 @@ def start_process(self):

root = tkinter.Tk()
global versionNumber
versionNumber = "2.0.0"
versionNumber = "2.1.0"

#Here is the switch if you want to test the NER Plugin
global NER_Plugin_Switch
NER_Plugin_Switch = False
if NER_Plugin_Switch == True:
import pyNER

currentDirectory = os.getcwd()
titleText = "Welcome to TagToolWiZArd application " + "(v"+versionNumber+")"
root.title(titleText)
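
The hunks above derive the name of the `c++` core binary from `versionNumber` by replacing the dots with hyphens. The same idea can be sketched in isolation as shown below. This is an illustration under assumptions: `build_ttw_call` and the example paths are hypothetical, and the sketch returns an argument list instead of the quoted command string that the commit builds by hand.

```python
from pathlib import PureWindowsPath


def build_ttw_call(cwd: str, version: str, target_file: str) -> list:
    """Hypothetical helper: assemble the call for the c++ core as a list."""
    # "2.1.0" -> "tagtool_v2-1-0.exe", mirroring versionNumber.replace(".", "-")
    exe_name = "tagtool_v" + version.replace(".", "-") + ".exe"
    exe_path = PureWindowsPath(cwd) / exe_name
    return [str(exe_path), target_file]


# Example paths are made up; the second one contains a space on purpose
call = build_ttw_call("C:\\ttw", "2.1.0", "C:\\projects\\my article\\article.html")
```

When such a list is passed to `subprocess.run(call, shell=False)`, each element is handed over as one argument, so paths containing whitespace do not need manual quoting.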