Failing to retrieve service type, brings node down #262
Comments
Internal tracking ticket: FG-4894 |
It is the broken bond that causes the node to go down, without it Regardless, I have submitted #263 which I think should make service type retrieval a lot more robust. Could you give it a test drive? Also, if you are not intending to use the |
Thank you for the prompt response @achim-k! That does look to have resolved the issue with the node crashing. |
Looks like it's not fully fixed. We get another broken bond when we terminate a Foxglove Studio instance connected to this patched bridge. This prevents any subsequent connections.
|
Trying with services disabled + your fix branch, we get:
|
It seems that the bond between the nodelet manager and the nodelet loader breaks, causing foxglove bridge to shut down. You could try changing the following line
to
to disable the bonding mechanism. Usually the bond should not break if the nodelet manager and nodelet loader are running on the same host (see also https://answers.ros.org/question/9700/nodelets-and-bond-timeouts/). If that's the case, try reducing the load by setting a |
### Public-Facing Changes
Make ROS1 service type retrieval more robust
### Description
For ROS1, foxglove bridge has to retrieve the service type from the service server (by opening a connection) as the ROS master does not store the service type. This patch makes service type retrieval more robust by
- Fixing the service link object getting out of scope too early
- Better exception handling
- Allowing users to set a custom timeout for service type retrieval

Fixes #262
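The timeout and exception-handling ideas from this patch description can be sketched roughly as follows. This is a minimal illustration and not the code from #263: `FakeServiceLink`, its `poll()` method and `retrieveServiceType` are hypothetical stand-ins that do not use any roscpp API, and the real bridge may wait for the result differently.

```cpp
#include <chrono>
#include <iostream>
#include <optional>
#include <stdexcept>
#include <string>
#include <thread>

// Hypothetical stand-in for a connection to a service server (not a roscpp type).
struct FakeServiceLink {
  int polls_until_ready;  // how many polls return nothing before the type arrives
  bool fail;              // simulate a connection that breaks

  // Returns the service type once available, std::nullopt while still waiting.
  std::optional<std::string> poll() {
    if (fail) throw std::runtime_error("connection dropped");
    if (polls_until_ready-- > 0) return std::nullopt;
    return std::string("std_srvs/Empty");
  }
};

// Tries to obtain the service type until the timeout expires.
// Errors and timeouts yield std::nullopt instead of terminating the process.
std::optional<std::string> retrieveServiceType(FakeServiceLink& link,
                                               std::chrono::milliseconds timeout) {
  const auto deadline = std::chrono::steady_clock::now() + timeout;
  try {
    while (std::chrono::steady_clock::now() < deadline) {
      if (auto type = link.poll()) return type;
      std::this_thread::sleep_for(std::chrono::milliseconds(1));
    }
    std::cerr << "Timed out while retrieving service type\n";
  } catch (const std::exception& e) {
    std::cerr << "Failed to retrieve service type: " << e.what() << "\n";
  }
  return std::nullopt;
}
```

A caller would treat an empty result as "service type unknown" and carry on, rather than letting the failure propagate.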
Disabling the bond now has the nodelet manager calling out its death:
|
Is the log shortened? Is there any reason given why the node is dying? You can also try running the nodelet as a single node:
|
No, that is the full log; no additional context is given. I think the bond breaking was pointing to the fact that the code is in fact dying. I can try that. Have you been able to reproduce on your end? |
@achim-k Running as standalone gave us a few minutes of uptime, but it ultimately crashed with the same output.
|
Unfortunately not 😕 Is there anything special about your setup (network setup, ROS master on a different host, ...)? Could you launch with
# Path might be different on your system
gdb -ex=run --args ../../install_ros1/lib/foxglove_bridge/foxglove_bridge _num_worker_threads:=4 |
@achim-k The previous output is with the node being launched as so:
rosrun foxglove_bridge foxglove_bridge # _num_worker_threads:=4 _port:=8765
Thank you for trying, we are not doing anything special networking-wise. All nodes are running on a single host, with Foxglove Studio running on separate developer machine(s). I will take gdb for a spin and report back what I find. |
@achim-k GDB backtrace:
Also here is the output of valgrind:
|
Great, there is definitely something wrong. I'm suspecting that the connection ptr is a null pointer, and we currently do not check that one:
Would you be able to get another backtrace with debug symbols so we know exactly in which line number the segfault happens? You can do that by building with |
backtrace with debug flags:
|
@carlosatrios could you give #265 a try? Judging from the backtrace, ros::ServiceManager::createServiceServerLink already returns an invalid pointer. This can happen due to the following reasons: In your case it's probably either 2. or 3. Maybe try enabling debug logging and check whether any roscpp-related error message shows up. |
### Public-Facing Changes
Fix invalid pointers not being caught
### Description
- Fixes invalid pointers not being caught, leading to a segmentation fault as can be seen in #262 (comment)
- Also exposes the `service_type_retrieval_timeout_ms` ROS1 parameter through the launch file and adds it to the readme (forgot that in #263)

Fixes #262 (hopefully)
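The kind of check this fix describes can be illustrated with a small sketch. It is an assumption-laden example rather than the code from #265: `FakeLink`, `createLink` and `collectServiceTypes` are hypothetical names, and `createLink` only imitates a factory that may hand back a null pointer.

```cpp
#include <iostream>
#include <map>
#include <memory>
#include <string>
#include <vector>

// Hypothetical stand-in for a service server link (not a roscpp type).
struct FakeLink {
  std::string type;
};

// Hypothetical factory that may return a null pointer,
// here for any service named "/unreachable".
std::shared_ptr<FakeLink> createLink(const std::string& service) {
  if (service == "/unreachable") return nullptr;
  return std::make_shared<FakeLink>(FakeLink{"std_srvs/Empty"});
}

// Collects the types of all reachable services and skips the others,
// so one bad service does not take the whole process down.
std::map<std::string, std::string> collectServiceTypes(
    const std::vector<std::string>& services) {
  std::map<std::string, std::string> types;
  for (const auto& service : services) {
    const auto link = createLink(service);
    if (!link) {  // check before dereferencing to avoid a segmentation fault
      std::cerr << "Failed to retrieve service type for " << service << "\n";
      continue;
    }
    types[service] = link->type;
  }
  return types;
}
```

With this pattern, a service whose link cannot be created is logged and skipped, and the remaining services are still advertised.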
Your fix looks to have worked. Thank you for all the help! |
Description
The node aborts and crashes after repeated "Failed to retrieve service type" errors.
Steps To Reproduce
Launch foxglove-bridge alongside instances of RViz that are hosted on a second machine.
Expected Behavior
The expectation is that the node continues running even if it is unable to retrieve the type of specific services.