Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Prevent duplicate and invalid titles #6326

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

BartChris
Copy link
Collaborator

Fixes
#6303
#6055

@BartChris BartChris force-pushed the fix_duplicate_titles branch 2 times, most recently from e8c63dd to f23289c Compare November 22, 2024 12:51
@@ -367,7 +367,7 @@ public enum ParameterCore implements ParameterInterface {
/**
* Validation of process title via regular expression.
*/
VALIDATE_PROCESS_TITLE_REGEX(new Parameter<>("validateProzessTitelRegex", "[\\w-]*")),
VALIDATE_PROCESS_TITLE_REGEX(new Parameter<>("validateProzessTitelRegex", "^[a-zA-Z0-9_-]+$")),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the new default value not even updated in the kitodo_config.properties files?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering: This is a hard limitation of the Kitodo software, some titles lead to problems later. Should we allow to change that in the config.properties at all? Or should we hardcode that in the application? Additionally one can think of allowing additional regexes which might be checked on top of the minimal requirements. See #6055

cc @solth

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At SLUB we allow even the . inside the process title as for historical reasons this was needed to create the processes. Maybe this can be changed now but this should be discussed.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more note: the w meta character in regular expressions stands for a-z, A-Z, 0-9, including _ (underscore) . So the expression can be simplified to ^[\\w-]+$?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about chosing a new parameter name "validateProcessTitle" which avoids the current "denglisch" (English-German mix)?

Copy link
Collaborator Author

@BartChris BartChris Dec 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more note: the w meta character in regular expressions stands for a-z, A-Z, 0-9, including _ (underscore) . So the expression can be simplified to ^[\\w-]+$?

Thanks i changed that. It turned out that [\\w-]+ in combination with Java's matches als works (the problem was that the necessary validation is just not called during process creation), but i think the changed regex is more expressive.

What about chosing a new parameter name "validateProcessTitle" which avoids the current "denglisch" (English-German mix)?

I also have sympathy for that, but that would require institutions having to adapt their config. I think the Pull Request, if accepted, should probably be backported which would probably be easier, when we do not change the configuration.

@BartChris BartChris force-pushed the fix_duplicate_titles branch from f23289c to 6037595 Compare November 22, 2024 12:55
}

@Test
public void shouldNotAllowDuplicateTitles() throws Exception {
Copy link
Collaborator Author

@BartChris BartChris Nov 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably have to take into account the configuration here. On some systems this is allowed...
Edit: maybe not necessary.

@BartChris BartChris force-pushed the fix_duplicate_titles branch from a035cde to 849dadc Compare December 6, 2024 10:43
@BartChris BartChris marked this pull request as ready for review December 6, 2024 11:05
@BartChris
Copy link
Collaborator Author

BartChris commented Dec 6, 2024

I think, for now, it would be less disruptive if institutions are allowed to override the regex in the configuration. This approach might introduce the possibility of configurations that Kitodo cannot handle (e.g., titles containing whitespace). However, one could consider introducing a baseline regex in the future—one that enforces basic constraints, such as disallowing whitespace (defined in the application and not changeable).

The configuration value could then be redefined and renamed to build on top of this baseline regex, allowing institutions to apply additional rules. For example, institutions could enforce identifiers that always start with a specific prefix, such as "digi_".

Copy link
Collaborator

@henning-gerhardt henning-gerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm tested this change and let a few code comments inside.

The changes are working if you have access to all projects or at least to the same projects where a process is already existing. It did not work if the already process is an other project where the user has no access.

As in both referenced issues there is no mention if process duplication check should work over all projects for a client independent of the user as project access or not I'm can not judge if this is solution is in current state correct or not. In my opinion the duplication check should work over all projects of a client and not only over the projects where the user has access.

}

// Helper to clean up database and file system
private void cleanUpProcess(Process process) throws Exception {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a note: this method is only executed if all method executions before calling this method was successful as this method is called at the end of test execution. This could lead to a dirty repository and tests which are executed after this could executed in a wrong way. Maybe an approach with @AfterEach should be considered so the cleanup is done independent of a successful or failed test.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, i tried to adapt the tests.

@@ -404,7 +404,7 @@ numberOfMetaBackups=8
useMetadatenvalidierung=true

# Validierung der Vorgangstitel ueber regulaeren Ausdruck
validateProzessTitelRegex=[\\w-]*
validateProzessTitelRegex=^[\\w-]+$
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should the other kitodo_config.properties files with this entry be adjusted too?

@henning-gerhardt
Copy link
Collaborator

For an other issue I switched to release 3.7.1 and tried by accident to create a new monographic process from a catalogue but I got the message, that this process is already existing after I clicked on the save button in the create process form. So at least some kind of title duplication check is done even without this change.

@BartChris BartChris marked this pull request as draft December 10, 2024 19:04
@BartChris BartChris marked this pull request as ready for review December 11, 2024 17:05
@BartChris
Copy link
Collaborator Author

BartChris commented Dec 11, 2024

The changes are working if you have access to all projects or at least to the same projects where a process is already existing. It did not work if the already process is an other project where the user has no access.

As in both referenced issues there is no mention if process duplication check should work over all projects for a client independent of the user as project access or not I'm can not judge if this is solution is in current state correct or not. In my opinion the duplication check should work over all projects of a client and not only over the projects where the user has access.

I agree, the check is useless if it does not check every title of every project (in the same client); i adapted the logic and the tests to ensure that duplicate titles are not allowed on the same client. (No matter if the user has access to the projects or not)

@BartChris BartChris force-pushed the fix_duplicate_titles branch from 2c3ce88 to b843df8 Compare December 11, 2024 17:08
Integer processId = newProcess.getId();
processService.remove(processId);
fileService.delete(URI.create(processId.toString()));
createdProcess = underTest.getMainProcess();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The determination of the created process could be moved before the first assertion is executed so you have anytime the right process information before cleaning up. If adjusted here it should be adjusted on the other places too.

Copy link
Collaborator Author

@BartChris BartChris Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, good idea. I did that.

Copy link
Collaborator

@henning-gerhardt henning-gerhardt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general the changes are right. My two notes (adjustment of the other existing kitodo_config.properties files and the determination of the created process inside the tests) can be done and even not they do nothing wrong.

// Second process creation with duplicate title
CreateProcessForm underTestTwo = setupCreateProcessForm("Monograph");
underTestTwo.getMainProcess().setTitle("title");
underTestTwo.getMainProcess().setProject(ServiceManager.getProjectService().getById(secondProjectId));
Copy link
Collaborator Author

@BartChris BartChris Dec 12, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In my previous implementation, I used project 2 for both process creations in the shouldNotAllowProcessTitlesInProjectsTheUserDoesNotBelongTo test. I changed that because the second process creation should happen in project 1, where user 2 has access. This ensures that the failure occurs due to a duplicate title, not because the user lacks access to the project. While it worked before, this change clarifies the test's intent.

@solth
Copy link
Member

solth commented Jan 23, 2025

However, one could consider introducing a baseline regex in the future—one that enforces basic constraints, such as disallowing whitespace (defined in the application and not changeable).

I think this is a very good idea. This would prevent technically problematic titles while still keeping some flexibility for institutions that want or need slight variations from the default in their process titles!

@BartChris
Copy link
Collaborator Author

However, one could consider introducing a baseline regex in the future—one that enforces basic constraints, such as disallowing whitespace (defined in the application and not changeable).

I think this is a very good idea. This would prevent technically problematic titles while still keeping some flexibility for institutions that want or need slight variations from the default in their process titles!

If we go down this road, we should maybe make this configurable in the project(?) settings and remove the value from the Kitodo config, to allow for more flexibility.

@henning-gerhardt
Copy link
Collaborator

However, one could consider introducing a baseline regex in the future—one that enforces basic constraints, such as disallowing whitespace (defined in the application and not changeable).

I think this is a very good idea. This would prevent technically problematic titles while still keeping some flexibility for institutions that want or need slight variations from the default in their process titles!

If we go down this road, we should maybe make this configurable in the project(?) settings and remove the value from the Kitodo config, to allow for more flexibility.

Read I this correct that this value is moved from a global / general place to a project specific place? How should this value be transferred if you have a lot of projects which need the same value? Did we really need different values for the projects?

@BartChris
Copy link
Collaborator Author

BartChris commented Jan 23, 2025

Read I this correct that this value is moved from a global / general place to a project specific place? How should this value be transferred if you have a lot of projects which need the same value? Did we really need different values for the projects?

Good point. Maybe a 'layererd' architecture

  1. Global general setting in the application
  2. Instance specific setting in the config.properties
  3. Project specific setting in the project
    (4. Maybe even in the tenant (Mandant), to have tenant-specific settings)

@henning-gerhardt
Copy link
Collaborator

Read I this correct that this value is moved from a global / general place to a project specific place? How should this value be transferred if you have a lot of projects which need the same value? Did we really need different values for the projects?

Good point. Maybe a 'layererd' architecture

1. Global general setting in the **application**

2. **Instance specific** setting in the config.properties

3. **Project specific** setting in the project
   (4. Maybe even in the tenant (Mandant), to have tenant-specific settings)

If we use a layered architecture, I would use

  1. global / instance setting in kitodo_config.properties with an application coded value as fallback (like in many other places already used)
  2. optional tenant / client setting
  3. project specific setting

@solth
Copy link
Member

solth commented Jan 23, 2025

Good point. Maybe a 'layererd' architecture

Yes, I just had the same idea. Not sure, if we need all these layers, though - perhaps just a global parameter in kitodo_config.properties that can be overwritten for each project in the projects settings in the GUI would be enough. Not sure if a client wide setting is really required.

Tbh, I also think it is not asked too much to update one string manually in multiple projects, even if an institution has more than just a handful of projects - this is just a simple copy & paste task, after all.

@solth
Copy link
Member

solth commented Jan 23, 2025

I agree, the check is useless if it does not check every title of every project (in the same client); i adapted the logic and the tests to ensure that duplicate titles are not allowed on the same client. (No matter if the user has access to the projects or not)

I think we have to keep in mind that in special circumstances a user might see processes of projects that do not belong to one of the clients he is assigned to - specifically when he has the special permission to link new processes to processes of unassigned projects, see #5869 - and if and how a duplicate check would have to take this into account.

@BartChris
Copy link
Collaborator Author

BartChris commented Jan 23, 2025

I agree, the check is useless if it does not check every title of every project (in the same client); i adapted the logic and the tests to ensure that duplicate titles are not allowed on the same client. (No matter if the user has access to the projects or not)

I think we have to keep in mind that in special circumstances a user might see processes of projects that do not belong to one of the clients he is assigned to - specifically when he has the special permission to link new processes to processes of unassigned projects, see #5869 - and if and how a duplicate check would have to take this into account.

What i implemented right now is that the search for duplicate titles is done across all projects (of the current client/tenant), even in those the user does NOT have access to. In some setups having duplicate titles could have really bad consequences, so it should be prevented. Some user might not even see the process titles in the list, which he or she is not allowed to set, but that's the way it is.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants