Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Split imports on preprocess #3389

Merged
merged 47 commits into from
Aug 5, 2024
Merged

Split imports on preprocess #3389

merged 47 commits into from
Aug 5, 2024

Conversation

awildturtok
Copy link
Collaborator

No description provided.

@awildturtok awildturtok force-pushed the feature/split-cqpps branch from f1964db to f38d321 Compare April 23, 2024 15:30
@awildturtok awildturtok marked this pull request as ready for review April 23, 2024 16:01
@awildturtok awildturtok requested a review from thoniTUB April 23, 2024 16:01

private final Map<String, Integer> starts;
private final Map<String, Integer> lengths;
// TODO make sure that everyone respects this is an end not a length
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

noch offen?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lol, war an mich selber gerichtet, habe es aber noch nicht angeschaut.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hier ist noch kein outdated dranne

Copy link
Collaborator

@thoniTUB thoniTUB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ich finde das mit dem ConsistentHash cool, aber muss es nochmal durch denken.

preprocessJobs(jobs, buckets, config);


log.info("Successfully Preprocess {} Jobs:", success.size());
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
log.info("Successfully Preprocess {} Jobs:", success.size());
log.info("Successfully preprocessed {} jobs:", success.size());

entity2Bucket.add(entity, bucket);

return bucket;
public void assignEntityBucket(String entity, int bucket) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public void assignEntityBucket(String entity, int bucket) {
public void assignEntityToBucket(String entity, int bucket) {

@@ -31,17 +31,15 @@ class ClusterImportHandler implements ImportHandler {
@SneakyThrows
@Override
public void updateImport(Namespace namespace, InputStream inputStream) {
ImportJob job = ImportJob.createOrUpdate(
final Table table = ImportJob.createOrUpdate(
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ja vorher war es besser, hat aber alles auf einmal im RAM gehalten, das will ich ja mit dem PR vermeiden. Aber der Name ist tatsächlich nicht mehr passend.

@awildturtok awildturtok requested a review from thoniTUB June 6, 2024 08:30
Comment on lines 75 to 78
/**
* Handle validity and update logic.
*/
public static Table createOrUpdate(DistributedNamespace namespace, boolean update, PreprocessedHeader header) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vielleicht nochmal splitten, der Kommentar passt besser als der Funktionsname

# Conflicts:
#	backend/src/main/java/com/bakdata/conquery/mode/cluster/ClusterImportHandler.java
#	backend/src/main/java/com/bakdata/conquery/models/jobs/ImportJob.java
#	backend/src/main/java/com/bakdata/conquery/models/preproc/PreprocessedHeader.java
@awildturtok awildturtok requested a review from thoniTUB June 12, 2024 13:55
Copy link
Collaborator

@thoniTUB thoniTUB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bitte ausgiebig testen mit großen daten vor dem Mergen


private final Map<String, Integer> starts;
private final Map<String, Integer> lengths;
// TODO make sure that everyone respects this is an end not a length
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe

@awildturtok awildturtok force-pushed the feature/split-cqpps branch from 220f5ab to 92c3561 Compare June 24, 2024 13:14
@awildturtok awildturtok requested a review from thoniTUB June 26, 2024 14:10
/**
* select, then send buckets.
*/
public static WorkerId sendBucket(Bucket bucket, WorkerInformation responsibleWorker) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
public static WorkerId sendBucket(Bucket bucket, WorkerInformation responsibleWorker) {
private static WorkerId sendBucket(Bucket bucket, WorkerInformation responsibleWorker) {

@thoniTUB
Copy link
Collaborator

Bitte noch mal über die offenen Konversationen schauen

@awildturtok awildturtok merged commit bcad24d into develop Aug 5, 2024
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants