
Big merge from bruno's branch, more info below #989

Open · wants to merge 5 commits into noetic-devel
Conversation

@brunofavs (Collaborator) commented Oct 8, 2024

This is a big merge of the work I did during my MSc thesis. Among others, these are the changes:

  • Added rrbot and softbot batch examples.
  • Fixed multiple small bugs associated with batch execution. I will test this thoroughly now while running the batches for Zau. I think I was the last one using batch executions, so my fixes should still make sense.
  • Added the plotting script for the batch execution.
  • Added a few utility functions, such as "stripAutomaticSuffixes", "mutually_inclusive_conditions", "parse_list_of_transformations" and "deleteCollectionFromDataset".
  • Renamed the function "addNoiseToJointParameters" to "addBiasToJointParameter" to better reflect its behavior.
  • Reworked how noise works with rotations: it now uses a uniform distribution instead of choice, and all noises now go through a main "computeNoise" function (see the sketch after this list).
  • Added the functionality to add noise to specific transformations in the calibrate script.
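
As a rough illustration, a minimal sketch of what a unified noise helper along these lines could look like; the actual computeNoise in the PR may differ, and this signature is an assumption:

import numpy as np

def computeNoise(magnitude):
    # Assumption: draw the noise uniformly in [-magnitude, magnitude] instead
    # of picking from a fixed set with random.choice, so rotation noise is
    # a continuous value rather than one of a few discrete options.
    return float(np.random.uniform(-magnitude, magnitude))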

@brunofavs (Collaborator, Author)

I tested a full calibration for Zau and all evaluations, and the results were the same. I will be running all the tests on this branch to ensure it's ready to be merged.

Most alterations concern how the noise is added, though, and I won't be adding noise on Zau...

This PR should have come many months ago; I'm really sorry for that. I completely forgot I had this work only on the bruno branch. I only found out because I noticed that the plotting script I had developed for the batch executions was missing from the main branch.

@miguelriemoliveira (Member)

Hi @brunofavs ,

thanks for the contribution.

Right now @Kazadhum is doing some experiments, and since this is a very large commit I am not sure whether it will impact something important ...

@Kazadhum and @manuelgitgomes , what do you think?

@miguelriemoliveira (Member)

There are some files here that are core to ATOM. I am a bit nervous about changing so much at once ... not sure.

M atom_calibration/scripts/calibrate (29)
M atom_calibration/src/atom_calibration/collect/data_collector.py (9)
M atom_calibration/src/atom_calibration/collect/lidar3d_labeler (4)
M atom_core/src/atom_core/config_io.py (19)
M atom_core/src/atom_core/dataset_io.py (136)
M atom_core/src/atom_core/naming.py (2)
M atom_core/src/atom_core/optimization_utils.py (1)
M atom_core/src/atom_core/utilities.py (40)

@Kazadhum (Collaborator) commented Oct 9, 2024

Hello @brunofavs, @miguelriemoliveira and @manuelgitgomes! I think I agree with professor Miguel here. This is a rather large PR when we're at the data collection stage for two separate projects. It sounds like an inopportune time to make changes this wide to the code base. Nevertheless, I think it's important we keep the work you did during your MSc!

So I'd say keep this PR open for now and once we're done with our projects we can take a closer look.

Do you agree?

@manuelgitgomes (Collaborator) left a comment

Hello @miguelriemoliveira, @Kazadhum, @brunofavs.

To preface this review: I have not run any of these commands, so I cannot attest to the correct functioning of this code.

The concept behind the additions is noteworthy, as they introduce a new way to inject noise into the datasets, as well as a revamped results extraction.

However, there are some decisions I do not understand, such as the deletion of some files and the addition of a new way to filter collections.
I have left specific comments on those decisions.
Some of them need documentation as well.

The code is also a bit sloppy at times, with merge conflicts left unresolved and big blocks of code commented out.

I suggest that @brunofavs clarify his decisions, enhance the documentation, and clean up the code before this can be merged.
I also suggest renaming this PR to something clearer.
I also agree with @miguelriemoliveira and @Kazadhum that, if results are currently being taken, this merge should happen afterwards.
Nevertheless, thank you @brunofavs for your contribution!

atom_batch_execution/experiments/softbot_example/README.md (outdated, resolved)
atom_batch_execution/experiments/softbot_example/data.yml (outdated, resolved)
Collaborator:

Should these be removed? Better to consult with @miguelriemoliveira. This applies to all results.

Collaborator (Author):

I think my thought back then was:

  • If the bagfiles and datasets of the atom_examples are stored on the NAS, and the examples' configurations are stored in the repo, the same logic should apply: the batch configurations can live in the repository, whilst the data should live on the NAS.

Comment on lines +87 to +89
collections_to_remove = data['collections_to_remove']

args["collection_selection_function"] = lambda x : int(x) not in collections_to_remove
Collaborator:

Is this really needed? I think we are creating two different ways to do the same thing. I know this is a wrapper around the old way, but I suggest we stay consistent in the code base: all of our scripts should select collections in the same way.

Collaborator (Author):

When I first started working with the batch executions, I asked myself the same thing. This line of code was already in the data.yml template (line 19):

package_path: "package://softbot_calibration"
dataset_path: '$ATOM_DATASETS/softbot/train'
collections_to_remove: []

It was not functional, however. I recall discussing this with @miguelriemoliveira, and we ended up deciding to leave it there and make it functional, to allow greater flexibility.

The user could use the remove_collections_from_atom_dataset script instead, but it wouldn't be on the fly, so it could be cumbersome in some cases. Also, I see that script as geared more towards removing collections that are purely bad and never going to be used.

Picture a use case where the user wants to run two experiments, using half of the dataset in each, to test for example the impact of odometry error or something similar. With this on-the-fly functionality it's easy to do. Other scripts, such as calibrate or the evaluations, also allow selecting collections on the fly, so I think the same logic applies here.

It also filters the collections in the same standardized way, using:

filterCollectionsFromDataset(dataset, args)
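
For context, this is the same kind of predicate that calibrate and the evaluation scripts accept via csf. A minimal sketch of the equivalence; the collection IDs and the exact command-line syntax here are assumptions:

collections_to_remove = [2, 5]  # hypothetical entries from data.yml

# The batch script builds the selection function itself:
collection_selection_function = lambda x: int(x) not in collections_to_remove

# Roughly the predicate a user might pass to calibrate by hand
# (exact flag syntax is an assumption):
#   ./calibrate -csf "lambda x: int(x) not in [2, 5]"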

atom_core/src/atom_core/dataset_io.py (outdated, resolved)
atom_core/src/atom_core/dataset_io.py (outdated, resolved)
atom_core/src/atom_core/dataset_io.py (outdated, resolved)
Collaborator:

This change is unneeded.

Collaborator (Author):

Which one? You did not mention any lines.

atom_core/src/atom_core/utilities.py (resolved)
@miguelriemoliveira (Member)

Hi @brunofavs ,

from all I have seen, I think this PR is too large.

The idea is to have smaller PRs focused on specific functionalities.

So in my opinion you could for example do a PR for the change in the randomization mechanism, another for the plot scripts, etc.

As it is, I think this PR is just a dump of months of work, and it is too risky to include straight into ATOM.

@brunofavs (Collaborator, Author)

Hi @miguelriemoliveira

> The idea is to have smaller PRs focused on specific functionalities.

Yep I agree 100%.

> So in my opinion you could for example do a PR for the change in the randomization mechanism, another for the plot scripts, etc.

Yeah that may be safer.

So... list of PRs:

  1. Change in randomization
  2. Functionality to add noise to individual TFs
  3. Plotting/batch
  4. Misc. smaller changes

My only fear is that some features depend on other features. I'm pretty sure 2 depends on 1; 3 should be completely isolated.

Unfortunately, GitHub doesn't allow dependencies between PRs: https://stackoverflow.com/questions/26619478/are-dependent-pull-requests-in-github-possible

@miguelriemoliveira (Member)

> So... list of PRs:
>   1. Change in randomization
>   2. Functionality to add noise to individual TFs
>   3. Plotting/batch
>   4. Misc. smaller changes

Sounds nice. As for dependencies, you can make sure that PR 2 is only submitted after PR 1 is accepted, and it should be fine.

@brunofavs (Collaborator, Author)

Hi @manuelgitgomes !

Thanks a lot for your review. Even though you didn't execute the commands, you made a lot of remarkable observations.
I have now finished going through your review comments. I left open the ones I think are still up for discussion.

I'm referencing a commit here with the changes.

@brunofavs (Collaborator, Author)

> So... list of PRs:
> 1. Change in randomization
> 2. Functionality to add noise to individual TFs
> 3. Plotting/batch
> 4. Misc. smaller changes

I have actually been looking into this. I think at this point splitting the PR has a higher probability of screwing something up.

For one, I don't know the order in which I implemented each functionality. I'm pretty sure the commits in which the features were implemented have some overlap, meaning I can't fully isolate them. I also worked on these features in parallel, meaning the commits behind a certain feature are not one interval (from commit x to commit y) but a list of smaller intervals.

I would have to create 4 new branches, each based on the noetic-devel branch, and cherry-pick the individual commits from the bruno branch onto these new branches. Then I would have to bring the review fixes I already spent yesterday afternoon on into each of these branches. The thing is, I made all those fixes in a single commit, so it wouldn't make sense to bring the fixes for feature x into the branch of feature y.

Going forward, I'm thinking of adopting feature branching + squash merges to keep the repo's history cleaner and make this kind of issue easier to resolve in the future. I feel a little in over my head with this version control problem.

@Kazadhum (Collaborator) left a comment

Hi @brunofavs! I just finished going through these changes.

Overall, I agree with the sentiment that this is not the normal size for a PR. This PR touches 50 files, 13 of which are existing (and core) scripts. I do like, however, the idea of having separate branches for different functionalities and having more frequent PRs and code reviews. I think engaging more with each other's code in this way would make ATOM's code base more consistent.

As mentioned previously, I'd recommend not merging at the moment. Besides the enormous amount of changes made here, which have some probability of breaking something or introducing a bug somewhere, there are questions of code structure and consistency I've pointed out that I'd like to have answered. Additionally, I'd like to know the reason some of these changes were made. If there are issues related to these changes, please link them when you reply to my comments on each line.

I also think that the title of this PR should be changed to something more explicit and self-explanatory.

Comment on lines 73 to 76
ap.add_argument("-rs", "--run_suffix", help="Suffix used to signal multiple runs of the same experiment.",
required=False, default='_run', type=str)
ap.add_argument("-fs", "--fold_suffix", help="Suffix used to signal multiple folds of the same run.",
required=False, default='_fold', type=str)
Collaborator:

This isn't an issue with this change, but our batch execution mechanism is growing complex, at least w.r.t. how we define certain things: what does "run" mean, what is a fold, and what do these arguments do in this script as opposed to the process results script? The documentation should be updated to explain this soon, I think.

Collaborator (Author):

I don't remember adding this myself. I saw that @manuelgitgomes implemented something to do with this in #899 (7d691cf). Previously this was used to determine the suffix size for some string-manipulation calculations. Now the only mentions of "suffix" or of those args are in their declarations, so I'm pretty sure it's deprecated. I will remove these.

Comment on lines +85 to +93
dataset = loadJSONFile(data['dataset_path'])

collections_to_remove = data['collections_to_remove']

args["collection_selection_function"] = lambda x : int(x) not in collections_to_remove
args['use_incomplete_collections'] = None
args['remove_partial_detections'] = None

filterCollectionsFromDataset(dataset,args)
Collaborator:

I'm not sure I like the inclusion of csf in batch executions.

I think batch executions should be as agnostic as possible to ATOM, and I think including this argument might "bloat" the script.

At least it seems "cleaner" to me that any ATOM dataset that goes into batch executions should already have had its problematic collections removed. I'd rather use the remove_collections_from_dataset script for that, and only then start running batch executions on the clean dataset.

csf is useful in the calibrate and the evaluation scripts because it allows for quick testing and granular analysis of the impact certain collections have on a calibration procedure, which I'd argue is not the point of batch executions.

Collaborator (Author):

#989 (comment)

I discussed this with @Kazadhum and we can't seem to reconcile our opinions.

@Kazadhum argues it makes the code more confusing and that the user can just filter the dataset beforehand.

I argue that it makes the code more flexible and it does not hurt the readability of the code.

It's a small issue, and we both agreed that either decision would not have a huge impact on ATOM.

@miguelriemoliveira we might need your opinion on this. What is your take on this?

atom_batch_execution/scripts/plot_graphs (resolved)
# print('translation_delta = ' + str(translation_delta))

# global metrics
translation_error = float(abs(np.average(translation_delta)))
rotation_error = float(np.average([abs(roll), abs(pitch), abs(yaw)]))
translation_error = np.linalg.norm(translation_delta)
Collaborator:

There's another transformation comparison function in atom_core/transformations.py, called compareTransforms(), which should use the same metric but still uses the average as before. Please change it (in this branch) to conform to this, and add that change to this pull request.
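
As illustration, a minimal sketch of the norm-based metric being requested, using the variable names from the snippet above (the helper name is hypothetical):

import numpy as np

def computeTransformErrors(translation_delta, roll, pitch, yaw):
    # Translation error as the Euclidean norm of the delta vector,
    # replacing the earlier mean of its components.
    translation_error = float(np.linalg.norm(translation_delta))
    # Rotation error kept as the average of the absolute Euler angles.
    rotation_error = float(np.average([abs(roll), abs(pitch), abs(yaw)]))
    return translation_error, rotation_error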

Comment on lines +127 to +131
ap.add_argument("-ntfl", "--noisy_tf_links", type=parse_list_of_transformations, help='''Pairs of links defining the transformations to which noise will be added, in the format linkParent1:linkChild2,linkParentA:linkChildB (links defining a transformation separated by
: and different pairs separeted by ,)
Note : This flag overrides the -nig flag''')
ap.add_argument("-ntfv", "--noisy_tf_values", nargs=2, type=float, metavar=("translation", "rotation"),
help="Translation(m) and Rotation(rad)(respectively) noise values to add to the transformations specified by-ntfl")
Collaborator:

This seems confusing when you consider that we now have 6 flags (I think) just for defining noise, and they do 3 different things:

  • Add noise to the sensor and pattern TFs (-nig)
  • Add noise to other TFs, whose parent and child links need to be given (-ntfl, -ntfv)
  • Add noise to the joints (-jbn/-jbv/-jbp)

Given how confusing it is to navigate these arguments, I think -nig's help text needs to be more explicit. Additionally, and most importantly, I think you could just use -ntfl with the same noise values as -nig. This allows for less control over specific noise values but makes things a lot clearer, I think. You could think of the -ntfl flag as an additional list of TFs to add the noise from -nig to.

It also eliminates the need for yet another function that adds noise in ATOM, addNoiseFromNoisyTFLinks().

I think for anyone who didn't work on the development of ATOM, or even those who did, this is much clearer.


As a small aside, this can't be done for joints, because you need to give the joint name and the joint parameter you want to influence, and then the value for each param. Additionally, you can't use noise values for joints of the same magnitude as those you might use for, say, -nig.
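
For reference, a minimal sketch of what a parser for the -ntfl format described above could look like; the actual parse_list_of_transformations in the PR may differ:

def parse_list_of_transformations(arg):
    # "parentA:childA,parentB:childB" -> [('parentA', 'childA'), ('parentB', 'childB')]
    pairs = []
    for pair in arg.split(','):
        parent, child = pair.split(':')
        pairs.append((parent.strip(), child.strip()))
    return pairs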

atom_core/src/atom_core/dataset_io.py (resolved)
transform_key = generateKey(calibration_parent, calibration_child, suffix='')

atomWarn(f'Adding noise bigger than 1.0m/rad right now is bugged and might yield unexpected results. Check issue #929 for more information')
Collaborator:

After reading the issues, I think I agree with @miguelriemoliveira in #929. This should be an error thrown when noise_trans or noise_rot is greater than 1, instead of just a warning printed every time noise is added.

This string doesn't need to be an f-string; besides, in terms of code uniformity, ATOM doesn't really use f-strings as a norm. It's not that important, but it would be nice to have consistency in the code's format, I think.

In terms of the message itself, I'd change "right now" to "at the moment"; it's cleaner.
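
A minimal sketch of the suggested check, assuming noise_trans and noise_rot hold the requested noise magnitudes:

if noise_trans > 1.0 or noise_rot > 1.0:
    # Fail fast instead of warning on every call, as suggested above.
    raise ValueError('Adding noise bigger than 1.0m/rad is bugged at the moment '
                     'and might yield unexpected results. '
                     'Check issue #929 for more information.')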

@@ -61,21 +63,28 @@ def compareAtomTransforms(transform_1, transform_2):
t2 = quaternion_matrix(transform_2['quat'])
t2[0:3, 3] = transform_2['trans']

# Method: If T1 and T2 are the same, then multiplying one by the inverse of the other will produce an identity matrix, with zero translation and rotation. So we do the multiplication and then evaluate the amount of rotation and translation in the resulting matrix.
v = t2[0:3, 3] - t1[0:3, 3]
Collaborator:

Why was this line added?

atom_core/src/atom_core/utilities.py (outdated, resolved)
@@ -187,7 +196,7 @@ def printComparisonToGroundTruth(
if dataset['calibration_config']['additional_tfs'] is not None:
for additional_tf_key, additional_tf in dataset['calibration_config']['additional_tfs'].items():

transform_key = generateKey(additional_tf["parent_link"], sensor["child_link"])
transform_key = generateKey(additional_tf["parent_link"], additional_tf["child_link"])
row = [transform_key, Fore.LIGHTCYAN_EX + additional_tf_key + Style.RESET_ALL]

Collaborator:

Could you explain these changes?
