-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Data races TThread TTimer TApplication #8365
Comments
I managed to reduce a bit the number of data races with the attached patch.
|
Is this still an issue after making the collection 'thread-safe'?
Does it say 'which' lock is still being held? |
Thanks for the reply. Yes, it is still an issue because the problem is in TList itself not being (completely) thread-safe even if UseRWLock is called. In particular, there is a suspicious Find attached the complete log. The issue is that when a timer is added, it calls FindObject: but at the same time, a new object might have been added in the meanwhile by the other thread: Maybe a mutex should be added inside the TList class to handle this? |
Side note: I was able to reproduce the same race and deadlock on different computers using:
|
Another comment: If I change the TThread::Printf to std::cout, the deadlock and data races disappear. So this points that the data race happens when Printf is called, which creates as TTimer that is somehow lost during the TObjLink creation. |
The race is likely unrelated to the dead lock but maybe that it is time to follow the comment's advise:
well unless we really still have:
do we? |
Well, that's what I suspect based on the Helgrind log, because Printf creates a TThreadTimer and then it's racing with NewLink: (Side note: the NewLink function appears also in several places in valgrind-root.supp, so that a related memory leak might be hidden.) |
Good point! The change is however non-trivial, because there is: |
In the last screen shot, the "no lock held" in the write is suspicious. It is suppose to have take the write lock in |
Yes, it's weird indeed. Well, I call ROOT::EnableThreadSafety before any thread starts, so that should be fine. the list is immediately made thread-safe in the function TSystem::Init,
I can try to rewrite it using an intermediary variable to lock it before assigning it to fTimers, to see if it changes something. |
If I do:
then I get this helgrind log, seems similar: |
This is the gdb output in Release mode:
And here in Debug mode:
again debug, but a different trace
And in another computer:
|
One hypothesis: there is lock is due to a race between TThread::Join and TThread::Printf. When stopping, the thread is asked to "join". After that, a Printf is issued, which creates a TThreadTimer and retrieves TThread::Self and TThread::GetThread before locking the mutex, and then it hangs at condimp->Wait. |
Describe the bug
I encounter some data races when using TThread. They are reported by helgrind, and I also see sometimes dead-locks, see full discussion on https://root-forum.cern.ch/t/trentrantrwlock-thread-lock-program-freezes/45116/14
--> TThread::fgXact is accessed unprotected, it could have been changed at the same time by XARequest:
and several more can be seen in helgrind.log in the forum post, or in helgrind.xml attached here. helgrind.xml.zip
When opening the XML with QtCreator, they are rendered nicely:
Expected behavior
No data races are found. Or they are added to helgrind-root.supp
To Reproduce
Setup
Additional context
https://root-forum.cern.ch/t/trentrantrwlock-thread-lock-program-freezes/45116/14
#8297
The text was updated successfully, but these errors were encountered: