
Using dynamic number of labels #56

Open · wants to merge 5 commits into master

Conversation

@RozDavid commented Feb 8, 2021

Addressing my own issue #55: this uses dynamically sized semantic prior matrices that are initialized at runtime, reading the number of labels from the launch file as a rosparam.
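
In essence, the change swaps the compile-time constant kTotalNumberOfLabels for a value read at startup; a minimal sketch of the idea (names follow the diffs below, the surrounding node setup is omitted):

```cpp
// Read the label count from the parameter server (set via the launch file),
// then size the semantic prior/likelihood matrices at runtime instead of
// fixing their dimensions at compile time.
int total_number_of_labels = 21;  // default if the rosparam is not set
nh_private.param("total_number_of_labels",
                 total_number_of_labels,
                 total_number_of_labels);

Eigen::Matrix<SemanticProbability, Eigen::Dynamic, Eigen::Dynamic>
    semantic_likelihood(total_number_of_labels, total_number_of_labels);
```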

I believe the biggest concern could be performance, so here is a comparison of timings on the provided simulated semantic rosbag file.

With static labels


[ INFO] [1612796164.603515954, 60.342383269]: Layer memory: 16679159
[ INFO] [1612796164.603544030, 60.342383269]: Updating mesh.
[ INFO] [1612796164.620216339, 60.367825609]: Updating mesh.78.072170               
[ INFO] [1612796164.639555585, 60.414999999]: Integrating a pointcloud with 345600 points.
[ INFO] [1612796164.711103207, 60.544733472]: Finished integrating in 0.071499 seconds, have 339 blocks.
[ INFO] [1612796164.711173976, 60.544733472]: Timings: 
SM Timing
-----------
inserting_missed_blocks	     59	00.000255	(00.000004 +- 00.000003)	[00.000001,00.000094]
integrate/fast         	     59	05.165177	(00.087545 +- 00.009905)	[00.065052,00.165955]
mesh/publish           	    108	00.286112	(00.002649 +- 00.003277)	[00.000005,00.009061]
mesh/update            	    108	00.578068	(00.005352 +- 00.004863)	[00.000168,00.014176]
ptcloud_preprocess     	     59	00.590554	(00.010009 +- 00.004439)	[00.008309,00.034639]
remove_distant_blocks  	     59	00.000979	(00.000017 +- 00.000005)	[00.000006,00.000031]

With dynamic labels

[ INFO] [1612795707.691550318, 59.152784140]: Layer memory: 16629958
[ INFO] [1612795707.691564063, 59.152784140]: Updating mesh.
[ INFO] [1612795707.707433151, 59.172894621]: Updating mesh.78.072170               
[ INFO] [1612795707.728666358, 59.233487071]: Updating mesh.78.072170               
[ INFO] [1612795707.775409911, 59.323973636]: Integrating a pointcloud with 345600 points.
[ INFO] [1612795707.861115732, 59.483209653]: Finished integrating in 0.085651 seconds, have 339 blocks.
[ INFO] [1612795707.861209100, 59.483209653]: Timings: 
SM Timing
-----------
inserting_missed_blocks	     54	00.000232	(00.000004 +- 00.000003)	[00.000001,00.000066]
integrate/fast         	     54	05.019893	(00.092961 +- 00.010746)	[00.070486,00.244003]
mesh/publish           	     98	00.245623	(00.002506 +- 00.003167)	[00.000007,00.009175]
mesh/update            	     98	00.499049	(00.005092 +- 00.005313)	[00.000144,00.018225]
ptcloud_preprocess     	     54	00.486605	(00.009011 +- 00.000462)	[00.008290,00.018598]
remove_distant_blocks  	     54	00.000844	(00.000016 +- 00.000005)	[00.000002,00.000034]

@ToniRV (Collaborator) left a comment


Hi @RozDavid, fantastic work!
I wonder if there could be a way to make it dynamic yet known at compile time to avoid the slow-down? I am not sure whether Eigen vectorizes dynamic matrices by default, or whether any memory alignment needs to be done by hand (I'd expect Eigen to do that automagically).
Otherwise, we could alternatively also allow static size matrices, perhaps using:
i) The quick (a bit ugly) way would be to use pre-processor directives to switch between dynamic and static...
ii) Perhaps making the hardcoded number of labels a #define directive which we could set from the CMakeLists.txt (though that doesn't avoid the re-compilation; see the sketch after this list)...
iii) There must be a better way than i) and ii) :)
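
A minimal sketch of option ii), assuming a hypothetical TOTAL_NUMBER_OF_LABELS macro injected from CMakeLists.txt (e.g. via add_definitions(-DTOTAL_NUMBER_OF_LABELS=21)):

```cpp
// Fallback when the build system does not define the macro; the macro name
// is hypothetical and only illustrates the approach.
#ifndef TOTAL_NUMBER_OF_LABELS
#define TOTAL_NUMBER_OF_LABELS 21
#endif

static constexpr size_t kTotalNumberOfLabels = TOTAL_NUMBER_OF_LABELS;

// The matrix stays fixed-size, so Eigen can keep vectorizing it, but the
// label count is still configurable per build (at the price of recompiling).
typedef Eigen::
    Matrix<SemanticProbability, kTotalNumberOfLabels, kTotalNumberOfLabels>
        SemanticLikelihoodFunction;
```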

.gitignore Outdated
@@ -0,0 +1 @@
cmake-build-debug/
Collaborator:

Can you delete this? (or rather add it to your global gitignore).

Suggested change
- cmake-build-debug/

@@ -53,6 +53,8 @@ class SemanticLabel2Color {
// Make these public if someone wants to access them directly.
ColorToSemanticLabelMap color_to_semantic_label_;
SemanticLabelToColorMap semantic_label_to_color_map_;

size_t number_of_colored_labels;
Collaborator:

we use a trailing _ for members of a class.

Suggested change
- size_t number_of_colored_labels;
+ size_t number_of_colored_labels_;

    SemanticProbabilities;
// A `#Labels X #Labels` Eigen matrix where each `j` column represents the
// probability of observing label `j` when current label is `i`, where `i`
// is the row index of the matrix.
typedef Eigen::
-    Matrix<SemanticProbability, kTotalNumberOfLabels, kTotalNumberOfLabels>
+    Matrix<SemanticProbability, Eigen::Dynamic, Eigen::Dynamic>
        SemanticLikelihoodFunction;
Collaborator:

Great!
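
One consequence of the Eigen::Dynamic typedef worth keeping in mind: a dynamic-size matrix starts out 0x0 and must be sized at runtime before use (a sketch, assuming the label count comes from the integrator config):

```cpp
// With fixed dimensions Eigen sized this at construction; now the caller
// has to do it explicitly, e.g. in the integrator's constructor.
SemanticLikelihoodFunction semantic_log_likelihood;
semantic_log_likelihood.resize(semantic_config_.total_number_of_labels,
                               semantic_config_.total_number_of_labels);
```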

@@ -76,6 +76,10 @@ class SemanticIntegratorBase {
// probability of non-match = 1 - measurement_probability_.
SemanticProbability semantic_measurement_probability_ = 0.9f;

// The total number of labels in the segmentation node.
// Used by the confidence matrices.
size_t total_number_of_labels = 21;
Collaborator:

Suggested change
- size_t total_number_of_labels = 21;
+ size_t total_number_of_labels_ = 21;

@@ -62,6 +62,7 @@ SemanticLabel2Color::SemanticLabel2Color(const std::string& filename)
// TODO(Toni): remove
// Assign color 255,255,255 to unknown object 0u
color_to_semantic_label_[HashableColor::White()] = 0u;
number_of_colored_labels = row_number;
Collaborator:

Suggested change
- number_of_colored_labels = row_number;
+ number_of_colored_labels_ = row_number;

@@ -325,9 +344,9 @@ void SemanticIntegratorBase::normalizeProbabilities(
   if (normalization_factor != 0.0) {
     unnormalized_probs->normalize();
   } else {
-    CHECK_EQ(unnormalized_probs->size(), kTotalNumberOfLabels);
+    CHECK_EQ(unnormalized_probs->size(), semantic_config_.total_number_of_labels);
Collaborator:

Suggested change
- CHECK_EQ(unnormalized_probs->size(), semantic_config_.total_number_of_labels);
+ CHECK_EQ(unnormalized_probs->size(), semantic_config_.total_number_of_labels_);

-  SemanticProbabilities semantic_label_frequencies =
-      SemanticProbabilities::Zero();
+  SemanticProbabilities semantic_label_frequencies;
+  semantic_label_frequencies.resize(semantic_config_.total_number_of_labels, 1);
Collaborator:

Suggested change
- semantic_label_frequencies.resize(semantic_config_.total_number_of_labels, 1);
+ semantic_label_frequencies.resize(semantic_config_.total_number_of_labels_, 1);
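
Worth noting: resize() leaves the coefficients uninitialized, whereas the old fixed-size code zero-initialized them via Zero(). Assuming SemanticProbabilities is now a dynamic-size vector, Eigen's Zero(rows, cols) overload sizes and clears in one step (a sketch):

```cpp
// Equivalent to the old Zero() semantics, but for a runtime-sized vector.
SemanticProbabilities semantic_label_frequencies =
    SemanticProbabilities::Zero(semantic_config_.total_number_of_labels_, 1);
```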

@@ -319,7 +320,7 @@ void MergedSemanticTsdfIntegrator::integrateVoxel(
                                merged_weight, voxel);

   SemanticVoxel* semantic_voxel =
-      allocateStorageAndGetSemanticVoxelPtr(global_voxel_idx, &semantic_block, &semantic_block_idx);
+      allocateStorageAndGetSemanticVoxelPtr(global_voxel_idx, semantic_config_.total_number_of_labels, &semantic_block, &semantic_block_idx);
Collaborator:

Suggested change
- allocateStorageAndGetSemanticVoxelPtr(global_voxel_idx, semantic_config_.total_number_of_labels, &semantic_block, &semantic_block_idx);
+ allocateStorageAndGetSemanticVoxelPtr(global_voxel_idx, semantic_config_.total_number_of_labels_, &semantic_block, &semantic_block_idx);

  nh_private.param("total_number_of_labels",
                   total_number_of_labels,
                   total_number_of_labels);
  semantic_config.total_number_of_labels = static_cast<size_t>(total_number_of_labels);
Collaborator:

Suggested change
- semantic_config.total_number_of_labels = static_cast<size_t>(total_number_of_labels);
+ semantic_config.total_number_of_labels = static_cast<size_t>(total_number_of_labels_);

@@ -48,6 +48,13 @@ getSemanticTsdfIntegratorConfigFromRosParam(const ros::NodeHandle& nh_private) {
semantic_config.semantic_measurement_probability_ =
static_cast<SemanticProbability>(semantic_measurement_probability);

// Get the total number of labels of the prediction
int total_number_of_labels = static_cast<int>(semantic_config.total_number_of_labels);
Collaborator:

Suggested change
- int total_number_of_labels = static_cast<int>(semantic_config.total_number_of_labels);
+ int total_number_of_labels = static_cast<int>(semantic_config.total_number_of_labels_);
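
For context, the two hunks above combine into roughly this flow in getSemanticTsdfIntegratorConfigFromRosParam (a sketch with the suggested trailing-underscore naming applied):

```cpp
// ros::NodeHandle::param has no size_t overload, so round-trip through int:
// seed the local with the config default, read the rosparam, cast back.
int total_number_of_labels =
    static_cast<int>(semantic_config.total_number_of_labels_);
nh_private.param("total_number_of_labels",
                 total_number_of_labels,
                 total_number_of_labels);
semantic_config.total_number_of_labels_ =
    static_cast<size_t>(total_number_of_labels);
```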

@RozDavid (Author) commented Feb 9, 2021

Hey @ToniRV,

Thanks a lot for the feedback and also for the ideas.
I think it would be really nice to have this as a runtime operation, but you are also perfectly right that it shouldn't come at the cost of performance.

I am sadly not an expert here, but I did some quick research on Eigen's dynamic memory allocation and found the following (source):

Here is the relevant constructor:

inline DenseStorage(int size, int rows, int) : m_data(internal::aligned_new<T>(size)), m_rows(rows) {}

Here, the m_data member is the actual array of coefficients of the matrix. As you see, it is dynamically allocated. Rather than calling new[] or malloc(), as you can see, we have our own internal::aligned_new defined in src/Core/util/Memory.h. What it does is that if vectorization is enabled, then it uses a platform-specific call to allocate a 128-bit-aligned array, as that is very useful for vectorization with both SSE2 and AltiVec. If vectorization is disabled, it amounts to the standard new[].

I believe that as long as we don't change the EIGEN_DONT_ALIGN and EIGEN_MAX_ALIGN_BYTES directives (source) in the CMake build or with #defines, the memory alignment should be all right and the matrices will be vectorized automatically.
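
One way to sanity-check that claim outside the node (a standalone sketch, not part of the PR):

```cpp
#include <Eigen/Dense>
#include <cstdint>
#include <iostream>

int main() {
  // Dynamic matrices allocate through internal::aligned_new, so the data
  // pointer should satisfy Eigen's default alignment unless the
  // EIGEN_DONT_ALIGN / EIGEN_MAX_ALIGN_BYTES directives were overridden.
  Eigen::MatrixXf m(21, 21);
  const bool aligned =
      EIGEN_MAX_ALIGN_BYTES == 0 ||
      reinterpret_cast<std::uintptr_t>(m.data()) % EIGEN_MAX_ALIGN_BYTES == 0;
  std::cout << "max align bytes: " << EIGEN_MAX_ALIGN_BYTES
            << ", data aligned: " << std::boolalpha << aligned << std::endl;
}
```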

What do you think of this? If I replace the initialization with internal::aligned_new<T>, could that bridge the performance gap?

@ToniRV (Collaborator) commented Feb 9, 2021

@RozDavid I also understood, when re-reading the Eigen docs, that the operations should already be vectorized... So it might well be that we can't improve on that. It's fine for me; I think this adds a lot of flexibility anyway.
Another thing we should be careful about is the map size: having a 128-bit-aligned array on a per-voxel basis may dramatically increase the size of the volumetric map. Maybe you could try to generate a map with and without this feature and see how many MB one is with respect to the other? The rosservice save_map should do just that.
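
To make the concern concrete: a fixed-size SemanticProbabilities vector lives inline in each voxel, while a dynamic one stores only a pointer plus a size and puts the coefficients on the heap, adding per-voxel allocation overhead (an illustrative standalone sketch; 21 is just the default label count):

```cpp
#include <Eigen/Dense>
#include <iostream>

// Inline storage: 21 floats live directly inside the voxel struct.
using StaticProbs = Eigen::Matrix<float, 21, 1>;
// Heap storage: the voxel holds a pointer + size; the coefficients are
// allocated separately (via Eigen's aligned_new).
using DynamicProbs = Eigen::Matrix<float, Eigen::Dynamic, 1>;

int main() {
  std::cout << "static footprint:  " << sizeof(StaticProbs) << " bytes\n"
            << "dynamic footprint: " << sizeof(DynamicProbs)
            << " bytes (+ one heap block per voxel)\n";
}
```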

@RozDavid (Author) commented Feb 9, 2021

Hello @ToniRV,

I ran a few quick tests comparing the static and dynamic approaches with different numbers of labels for the probability matrices.

For the static case I recompiled the code with kTotalNumberOfLabels=128 and changed the hardcoded prior to the appropriate number, then tested the same 21 and 128 label settings with the dynamic sizes as well. I ran the full rosbag, saved the layer in a vxblx file, and copied the timings for the last pointcloud integration, with the maximum number of initialized blocks. The stats are copied at the end of this comment.

It was a bit surprising to me that there is no difference in the layer sizes, only in the integration times. My interpretation is that Eigen allocates fixed-size memory up to a certain matrix size (it turns out this threshold might be larger than 128).

As all the vxblx file sizes are between 66.3 and 67.2 MB, the only differences here are the number of allocated blocks and an approximately 15% performance loss in pointcloud integration.

Surely it depends on the use case whether the flexibility is worth the performance loss, but I wanted to share this with you either way, whether you choose to merge or not.

The results can be compared here:

##########
Dynamic 21
##########

Vxblx file size: 67.1Mb

[ INFO] [1612890061.997756189, 121.805000000]: Integrating a pointcloud with 345600 points.
[ INFO] [1612890062.089571026, 121.805000000]: Finished integrating in 0.091762 seconds, have 1207 blocks.
[ INFO] [1612890062.089766452, 121.805000000]: Timings: 
SM Timing
-----------
inserting_missed_blocks	    285	00.001599	(00.000006 +- 00.000001)	[00.000001,00.000136]
integrate/fast         	    285	27.095124	(00.095071 +- 00.004953)	[00.058792,00.243477]
mesh/publish           	    399	01.489049	(00.003732 +- 00.002256)	[00.000006,00.013048]
mesh/update            	    399	03.108085	(00.007790 +- 00.002853)	[00.000171,00.018809]
ptcloud_preprocess     	    285	02.596166	(00.009109 +- 00.000871)	[00.008302,00.036305]
remove_distant_blocks  	    285	00.019437	(00.000068 +- 00.000045)	[00.000004,00.000402]

[ INFO] [1612890062.089818329, 121.805000000]: Layer memory: 59385627
[ INFO] [1612890062.089849037, 121.805000000]: Updating mesh.

##########
Dynamic 128
##########

Vxblx file size: 66.3Mb

[ INFO] [1612889885.600096098, 121.805000000]: Integrating a pointcloud with 345600 points.
[ INFO] [1612889886.008610556, 121.805000000]: Finished integrating in 0.408455 seconds, have 1205 blocks.
[ INFO] [1612889886.008808061, 121.805000000]: Timings: 
SM Timing
-----------
inserting_missed_blocks	     83	00.001354	(00.000016 +- 00.000028)	[00.000001,00.000178]
integrate/fast         	     83	36.546054	(00.440314 +- 00.047001)	[00.274163,00.750914]
mesh/publish           	     92	00.422405	(00.004591 +- 00.001759)	[00.000006,00.011521]
mesh/update            	     92	00.844049	(00.009174 +- 00.001211)	[00.000178,00.014639]
ptcloud_preprocess     	     83	00.757355	(00.009125 +- 00.001122)	[00.008222,00.022541]
remove_distant_blocks  	     83	00.006077	(00.000073 +- 00.000031)	[00.000002,00.000203]

[ INFO] [1612889886.008843359, 121.805000000]: Layer memory: 59287225

##########
Static 21
##########

Vxblx file size: 67.2Mb

[ INFO] [1612890454.482768384, 121.805000000]: Integrating a pointcloud with 345600 points.
[ INFO] [1612890454.564294501, 121.805000000]: Finished integrating in 0.081479 seconds, have 1209 blocks.
[ INFO] [1612890454.564449636, 121.805000000]: Timings: 
SM Timing
-----------
inserting_missed_blocks	    290	00.001434	(00.000005 +- 00.000001)	[00.000001,00.000117]
integrate/fast         	    290	24.979322	(00.086136 +- 00.004859)	[00.053003,00.161756]
mesh/publish           	    493	01.521718	(00.003087 +- 00.003551)	[00.000007,00.008997]
mesh/update            	    493	03.285606	(00.006665 +- 00.005909)	[00.000138,00.017277]
ptcloud_preprocess     	    290	02.654694	(00.009154 +- 00.002964)	[00.008272,00.034363]
remove_distant_blocks  	    290	00.018787	(00.000065 +- 00.000037)	[00.000002,00.000652]

[ INFO] [1612890454.564478470, 121.805000000]: Layer memory: 59484029

##########
Static 128
##########

Vxblx file size: 66.5Mb

[ INFO] [1612891013.389336555, 121.805000000]: Integrating a pointcloud with 345600 points.
[ INFO] [1612891013.729743330, 121.805000000]: Finished integrating in 0.340356 seconds, have 1204 blocks.
[ INFO] [1612891013.729900746, 121.805000000]: Timings: 
SM Timing
-----------
inserting_missed_blocks	     97	00.001169	(00.000012 +- 00.000019)	[00.000001,00.000123]
integrate/fast         	     97	35.617008	(00.367186 +- 00.036256)	[00.230930,00.561777]
mesh/publish           	    106	00.547046	(00.005161 +- 00.001862)	[00.000006,00.012017]
mesh/update            	    106	01.092757	(00.010309 +- 00.001616)	[00.000182,00.018571]
ptcloud_preprocess     	     97	00.884389	(00.009117 +- 00.000556)	[00.008329,00.018017]
remove_distant_blocks  	     97	00.005814	(00.000060 +- 00.000037)	[00.000003,00.000304]

[ INFO] [1612891013.729936872, 121.805000000]: Layer memory: 59238024
[ INFO] [1612891013.729965223, 121.805000000]: Updating mesh.
