
Text2Process

Rawtec300 edited this page Jul 4, 2021 · 47 revisions

General Information: Group organization

This project followed the Scrum process model with two-week sprints. At the end of each sprint, a short sprint review covered what went well, what went badly, which tasks were still open, and how to proceed. In the short sprint planning that followed, the team decided which epics to address in the next sprint and which developers to assign to each epic.

The following roles were assigned for the duration of the project:

Scrum Master: Jonas Trautmann

Group: Harun Bajric, Noah Colby, Jannik Steck, Jonas Trautmann

Period: 10.05.2021 - 12.07.2021

The last week of the project was used to prepare the documentation.

Epic: Work on parallel splits

(Sprint 1,2 / Members: Harun Bajric, Noah Colby, Jannik Steck, Jonas Trautmann)

  • BRANCH: ControlFlowStructures

Original problem:

  • Parallel workflows were not being recognized or being interpreted incorrectly.

Main issues:

  • Old Stanford parser
  • Parallel workflows rely heavily on context --> the structure of the code does not support this
  • Some marker words for parallel workflows were not recognized, were treated differently from others, or were not specified
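The marker-word issue can be illustrated with a small sketch. It is not T2P's implementation: the marker list `ASSUMED_MARKERS` and the function `find_parallel_markers` are hypothetical, and the sample sentence is the example sentence used elsewhere on this page. The point is only that every marker is checked the same way and that the list is specified in one place.

```python
import re
from typing import Iterable, List

# Assumed marker words for illustration only; the project's actual list is not documented here.
ASSUMED_MARKERS = ("while", "meanwhile", "in parallel", "at the same time")


def find_parallel_markers(sentence: str,
                          markers: Iterable[str] = ASSUMED_MARKERS) -> List[str]:
    """Return every marker that occurs in the sentence as a whole word or phrase."""
    found = []
    for marker in markers:
        # Word boundaries keep "while" from matching inside "meanwhile".
        pattern = r"\b" + re.escape(marker) + r"\b"
        if re.search(pattern, sentence, flags=re.IGNORECASE):
            found.append(marker)
    return found


sentence = ("The history is checked then the funds are generated "
            "while the authorization is tested.")
print(find_parallel_markers(sentence))
```

A word list like this cannot capture context on its own, which is the limitation named in the second bullet above.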

Information about the Stanford parser version:

  • Current build is from 2010 --> complex sentences with parallel workflows or loops are not handled correctly
  • An update will be very time-consuming --> the API changed in a major way --> T2P may have to be rebuilt from the ground up

Findings:

  • The syntax tree returned by the 2010 build of the Stanford parser is incorrect --> update the version
  • Updating the Stanford parser version will require a lot of time --> the newest version is much newer than the current build
  • The latest Stanford parser release returns the correct tree

Idea:

  • Workaround with a Python service

Python Service workaround:

  • To allow WoPeD to use the new Stanford parser version without major changes to the codebase, a Python script was created that starts a separate server, which computes the tree using the new Stanford parser version. For each sentence, the service is called via a POST request and a Tree (Stanford CoreNLP API) is returned. After that, the program resumes its work as usual.

  • Input for Service: String

  • Output from Service: String

  • The CoreNLP version can be updated via the Dockerfile
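A minimal sketch of such a service, assuming a plain-text POST body and a plain-text response (the page only states that input and output are strings). `parse_sentence` is a hypothetical placeholder for the actual parser call, and the names `handle_parse_request`, `ParseHandler`, `serve`, and `request_tree` are illustrative and not taken from the project:

```python
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def parse_sentence(sentence: str) -> str:
    """Hypothetical placeholder: the real service would call the Stanford parser here."""
    leaves = " ".join(f"(X {token})" for token in sentence.split())
    return f"(ROOT {leaves})"


def handle_parse_request(body: bytes) -> bytes:
    """Decode the posted sentence, parse it, and encode the tree string."""
    return parse_sentence(body.decode("utf-8")).encode("utf-8")


class ParseHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        # Read exactly the number of bytes announced by the client.
        length = int(self.headers.get("Content-Length", 0))
        payload = handle_parse_request(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)


def serve(port: int) -> None:
    """Run the service until interrupted; the port is chosen by the caller."""
    HTTPServer(("0.0.0.0", port), ParseHandler).serve_forever()


def request_tree(url: str, sentence: str) -> str:
    """Client side: one POST per sentence, returning the tree as a string."""
    request = urllib.request.Request(url, data=sentence.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request) as response:
        return response.read().decode("utf-8")
```

Calling `serve(port)` starts the sketch, and `request_tree` shows the per-sentence POST call from the caller's side. Keeping the parsing step in its own function means only `parse_sentence` would need to change when the parser version is updated.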

Work on loops:

  • Major issues with the parallel workflows --> loops were deprioritized to free up capacity for the work on parallel workflows
  • Generalized approach in Epic: Conception

Two transitions in one path:

  • Two transitions are shown in one path
  • Example Sentence: The process is registered. The history is checked then the funds are generated while the authorization is tested. The process is completed
  • Generalized approach and more information in Epic: Conception

Epic: Labels

(Sprint 2 / Members: Harun Bajric, Jonas Trautmann)

While using WoPeD, it was noticed that punctuation marks are occasionally included in the generated process models and that labels sometimes contain too much information. The aim of this epic is to fix both problems.

The first step was to identify when too much information is written into a label. It turned out that the problem is connected to the control flow structures: if parallelism is misinterpreted, labels contain the words that are used to express parallelism. Since the interpretation of parallelism was improved, this problem has not occurred again.

To find out why labels protrude into other labels, the code that determines the label size was located. It is in the NameModul.java class of the main project. There, the label size was determined via autosize, which causes labels with a lot of text to protrude into others. The labels were therefore set to a fixed size and centered. To avoid text being cut off unattractively, the text is truncated after a certain number of characters and the rest is replaced with "...". The full label text can still be viewed by double-clicking the label.
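The truncation rule can be sketched as follows. This is an illustration in Python, not the code in NameModul.java; the function name `truncate_label`, the default limit of 20 characters, and the choice to count the "..." toward the limit are all assumptions.

```python
def truncate_label(text: str, max_chars: int = 20, ellipsis: str = "...") -> str:
    """Return the text unchanged if it fits, otherwise cut it and append the ellipsis."""
    if max_chars < len(ellipsis):
        raise ValueError("max_chars must leave room for the ellipsis")
    if len(text) <= max_chars:
        return text
    # Reserve space for the ellipsis so the result never exceeds max_chars.
    kept = text[: max_chars - len(ellipsis)].rstrip()
    return kept + ellipsis


print(truncate_label("check history"))
print(truncate_label("generate the funds for the customer"))
```

Because the full text is only shortened for display, the untruncated string has to be kept alongside it so that it can still be shown on double-click.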

The last problem, punctuation marks being carried over arbitrarily, could not be reproduced and was not pursued further after consultation with the supervisor.

Epic: Different kinds of imports

(Sprint 3 / Members: Noah Colby, Jannik Steck)

  • BRANCH: FileExtensions

Idea

  • Extend the file formats of the input text
  • In the first step, only PDF
  • Use of TikaParser --> the parser detects the format automatically --> only one method is needed
  • TikaParser: Link to TikaParser
  • Use of Aspose to convert docx and pptx to PDF because Tika is not compatible with either
  • Aspose: Link to Aspose

Problem

  • The parser couldn't find docx and pptx

Solution

  • Workaround for the formats docx and pptx --> WoPeD converts the files (docx, pptx) into a PDF file
  • After the file has been read correctly, the converted PDF file is deleted
  • The following file types can now be imported on Mac and Windows: rtf, txt, doc, docx, ppt, pptx, pdf
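The import flow described above can be sketched as follows. This is a Python illustration rather than the project's Java code: `extract_text` stands in for the Tika-based reader and `convert_to_pdf` for the Aspose-based conversion, and both are hypothetical callables passed in by the caller so that no library API is assumed.

```python
import os
from pathlib import Path
from typing import Callable

SUPPORTED = {".rtf", ".txt", ".doc", ".docx", ".ppt", ".pptx", ".pdf"}
NEEDS_PDF_CONVERSION = {".docx", ".pptx"}


def read_process_text(path: str,
                      extract_text: Callable[[str], str],
                      convert_to_pdf: Callable[[str], str]) -> str:
    """Read the text of a supported file, converting docx/pptx to a temporary PDF first."""
    suffix = Path(path).suffix.lower()
    if suffix not in SUPPORTED:
        raise ValueError(f"unsupported file type: {suffix}")
    if suffix not in NEEDS_PDF_CONVERSION:
        return extract_text(path)
    pdf_path = convert_to_pdf(path)  # returns the path of the converted PDF
    try:
        return extract_text(pdf_path)
    finally:
        # Remove the intermediate PDF whether or not reading succeeded.
        if os.path.exists(pdf_path):
            os.remove(pdf_path)
```

The sketch deletes the intermediate PDF even when reading fails, which is a slightly stricter cleanup than the one described above; the original input file is never touched.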

Epic: Work on BPMN

(Sprint 3 / Members: Jonas Trautmann)

Initial situation: T2P provides the ability to create process models in an angular.js-based frontend. At the beginning of this project, this was only possible for Petri nets. The target state was to support another common notation (BPMN).

To accomplish this, radio buttons were added to the frontend to choose between PNML and BPMN. Depending on this choice, the process model is displayed in the canvas as either PNML or BPMN, and the other model is hidden using ngif. This is done in index.html. The different representation is achieved by extending the Petrinet.js class with another component called BPMN, which renders the created process model using the symbols typical of BPMN notation. To add more symbols, they have to be added in the bower_components (class vis.js). Examples such as https://www.bitdegree.org/learn/best-code-editor/html-canvas-tag-example-2 can serve as a guide for drawing these shapes. In addition, another button was added that downloads the respective process model as a TXT file containing the XML. radioService.js determines which XML file is downloaded. To create the BPMN XML, an additional endpoint (/generateBPMN) was added to the t2pcontroller, which returns the corresponding XML in response to a POST request.
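A call to the new endpoint might be prepared as in the sketch below. Only the path /generateBPMN and the POST method come from the description above; the base URL, the plain-text request body, and the function name `build_generate_bpmn_request` are assumptions made for illustration.

```python
import urllib.request


def build_generate_bpmn_request(base_url: str, process_text: str) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /generateBPMN endpoint."""
    url = base_url.rstrip("/") + "/generateBPMN"
    return urllib.request.Request(
        url,
        data=process_text.encode("utf-8"),  # assumed: the process text is sent as the body
        headers={"Content-Type": "text/plain; charset=utf-8"},
        method="POST",
    )


# Hypothetical base URL; replace it with the address of the running T2P service.
request = build_generate_bpmn_request("http://localhost:8080/t2p", "The process is registered.")
print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would send it; the response body is then expected to be the BPMN XML that the download button saves.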

Additional work needed: Some classes should be split logically. For example, the BPMN component currently in petrinet.js should be moved to a separate class bpmn.js. In addition, a more up-to-date version of Angular should be used in the near future. The LTS of the currently used version expires at the end of 2021, after which security-relevant updates will no longer be provided.

Epic: Conception

(Sprint 4 / Members: Jannik Steck)

  • There are concepts for loops and parallel splits
  • Note: the concepts are written in German
  • Both concepts are in the last comment: conception

Epic: Secondary task

(Sprint 4 / Members: Jannik Steck, Jonas Trautmann)

  • Wrong error message under the header "URL-Fehler" (URL error) --> In the class T2PUI.java in the main WoPeD project, the wrong error message was mapped to a 500 error. It was replaced with the correct message T2PUI.500Error.Text. Furthermore, the text for T2PUI.GeneralError.Text in Messages.properties and Messages_en.properties was rewritten to be more meaningful.

  • Code refactoring
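The error-message fix in the first item can be pictured as a lookup from HTTP status code to message key. The two keys come from the description above; the function `message_key_for_status` and the use of the general message as a fallback for all other codes are assumptions, and the actual logic lives in T2PUI.java.

```python
# Message keys named in the description; the mapping structure itself is illustrative.
ERROR_MESSAGE_KEYS = {500: "T2PUI.500Error.Text"}
GENERAL_ERROR_KEY = "T2PUI.GeneralError.Text"


def message_key_for_status(status_code: int) -> str:
    """Return the message key for a status code, falling back to the general error text."""
    return ERROR_MESSAGE_KEYS.get(status_code, GENERAL_ERROR_KEY)


print(message_key_for_status(500))
print(message_key_for_status(404))
```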

Further required work

  • Implement concepts (Epic: Conception)
  • Update of the Stanford Parser version
  • BPMN
  • Continue improving parallelism