Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java-Implementation forgets element namespaces #902

Closed
rubiii opened this issue May 12, 2013 · 15 comments
Closed

Java-Implementation forgets element namespaces #902

rubiii opened this issue May 12, 2013 · 15 comments

Comments

@rubiii
Copy link

rubiii commented May 12, 2013

There is a difference between the latest C and Java version of Nokogiri related to namespaces
defined on child elements with a WSDL file I was testing.

The code to reproduce this problem can be found at rubiii/nokogiri_namespace_issue.
The Travis build shows how the spec only fails on JRuby.

Let me know if you need anything else.

@andrewhr
Copy link

I was able to reproduce this bug with a more simple XML example:

<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element xmlns:quer="http://api.geotrust.com/webtrust/query"/>
  <xs:element xmlns:quer="http://api.geotrust.com/webtrust/query"/>
</xs:schema>

If we iterate over some xpath like //xs:element asking for each element's #namespace only the first time we get the right answer. The behavior is correct with other ruby implementations.

@rubiii
Copy link
Author

rubiii commented May 12, 2013

@andrewhr it seems to be somehow related to iterating over a NodeSet. finding a single element and checking its namespaces works for me:

# works
document.root.at_xpath('./xs:element', 'xs' => 'http://www.w3.org/2001/XMLSchema')

@andrewhr
Copy link

@rubiii I don't think so. at_xpath uses iteration internally?

In the PR I've sent you I've used the xpath's last() and the issue still occurs.

@rubiii
Copy link
Author

rubiii commented May 27, 2013

@yokolet would you mind taking a look at this? it's a very annoying problem, but simple to reproduce.

@rubiii
Copy link
Author

rubiii commented May 27, 2013

updated the specs to verify this has nothing to do with xpath. it also happens when traversing.
but, somehow the namespace is only missing for the last element node in the example, not for its equal sibling.
hope this helps.

@rubiii
Copy link
Author

rubiii commented Jun 7, 2013

this appears to be related to the namespace cache. i don't really get it right now, but somehow cache entries are associated with a single ownerNode? and if the same prefix+href is already registered, it won't be added twice. which is how a cache should work i guess. but since cache entries belong to a single ownerNode, this check fails.

@rubiii
Copy link
Author

rubiii commented Jul 4, 2013

anything else i can do to get this resolved?

@rdingwell
Copy link

This may be related to the issue I discovered while tracking down a bug dealing the defautl namespace aliasing, and yes it is in the namespace cache. You can check it out here #940

@rubiii
Copy link
Author

rubiii commented Jul 19, 2013

@rdingwell i think i incorporated your fix into a fork of nokogiri, compiled the jar and included it into my test repository via bundler and my tests are still failing. i need to come up with a nokogiri testcase to make this a little easier. maybe you can run these tests to see if your change is realted to this problem.

@rdingwell
Copy link

I've been able to track this down a little further and there are some real problems with the NokogiriNamespaceCache and the way in which it is used. When parsing a document all of the namespaces are cached but the NokogiriNamespaceCache expects a prefix -> URI mapping to only occur once in a document.

So in your example the first element contains the namespace mapping but the second one does not because the NokogiriNamespaceCache already contains that prefix -> URI mapping and it is associated with the first node. The namespace caching in general does not look like it has been very well tested and storing it at the document level seems like a bad idea and storing it in a lazy loading fashion on the node seems like it would be far less error prone and simpler to implement. There wouldn't be a need for attempting to create a hashCode or mapping things to nodes and ensuring correct ownership. I;m not really convinced that what is there now would accurately handle re-defining a prefix to a different uri at a nested level, which is perfectly legal xml.

I'll see if there is something that I can put together seeing how I would really like some of the things that I have to run under jruby and currently cannot due to the namespace issue in the java impl of nokogiri.

BTW, heres the testcase equiv of your spec file.

require "helper"

module Nokogiri
  module XML
    class TestNamespacePreservation < Nokogiri::TestCase

      def setup
        @xml = Nokogiri.XML('
          <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
            <xs:element xmlns:quer="http://api.geotrust.com/webtrust/query"/>
            <xs:element xmlns:quer="http://api.geotrust.com/webtrust/query"/>
          </xs:schema>
        ')
      end

      def test_xpath   
          first = @xml.at_xpath('//xs:element', 'xs' => 'http://www.w3.org/2001/XMLSchema')
          last = @xml.at_xpath('//xs:element[last()]', 'xs' => 'http://www.w3.org/2001/XMLSchema')
          assert_equal 'http://api.geotrust.com/webtrust/query' , first.namespaces['xmlns:quer'], "Should contain quer namespace"
          assert_equal 'http://api.geotrust.com/webtrust/query' , last.namespaces['xmlns:quer'], "Should contain quer namespace"
      end

      def test_traversing   
          first = @xml.root.element_children.first
          last = @xml.root.element_children.last
          assert_equal 'http://api.geotrust.com/webtrust/query' , first.namespaces['xmlns:quer'], "Should contain quer namespace"
          assert_equal 'http://api.geotrust.com/webtrust/query' , last.namespaces['xmlns:quer'], "Should contain quer namespace"
      end
    end
  end
end

@rubiii
Copy link
Author

rubiii commented Jul 20, 2013

@rdingwell thanks for verifying my assumption!
i'd still like to get some kind of affirmation from the core team though.

@rdingwell
Copy link

Just to keep you updated I issued a pull request that if gets accepted should resolve your issue. #959

@rubiii
Copy link
Author

rubiii commented Aug 23, 2013

thanks @rdingwell

@yokolet
Copy link
Member

yokolet commented Apr 15, 2014

Hi,
This bug should be fixed on master. I added the same test file to the master, which passes.

@yokolet yokolet closed this as completed Apr 15, 2014
@rubiii
Copy link
Author

rubiii commented Apr 16, 2014

nice! thanks @yokolet

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants