GeoShape box format unclear #101

Closed
ashepherd opened this issue Apr 29, 2020 · 27 comments

@ashepherd
Member

The following NCEI dataset at the Google Dataset Search shows an "Area Covered" section with a map plot of the GeoShape box field.
Dataset: https://data.nodc.noaa.gov/cgi-bin/iso?id=gov.noaa.ngdc.mgg.dem:11503

Screen Shot 2020-04-29 at 9 10 46 AM

It appears they use the GeoShape box format "{south} {west} {north} {east}" (space-delimited), whereas our guidelines specify commas.

As an example of Area Covered not appearing for a Dataset using commas, see:
https://www.bco-dmo.org/dataset/753124

Compare using the Google Structured Data Testing Tool:

Credit: @atmickle

@ashepherd ashepherd added this to the v1.2 milestone Apr 29, 2020
@ashepherd ashepherd self-assigned this Apr 29, 2020
@ashepherd ashepherd added bug Something isn't working good first issue Good for newcomers Update Documentation updates to the guidance docs labels Apr 29, 2020
@ashepherd
Member Author

related to schemaorg/schemaorg#1538

@fils
Collaborator

fils commented Apr 29, 2020

@ashepherd I'd like to work with you on this to update this gist example to test some of these approaches.

https://gist.githubusercontent.com/fils/8738793069ae18fc368f04b2ace7118d/raw/7de6217d8f67e28f8593ac132c76954399c89307/spatialtest.jsonld

Currently that will frame correctly:
https://fence.gleaner.io/frame?url=https%3A%2F%2Fgist.githubusercontent.com%2Ffils%2F8738793069ae18fc368f04b2ace7118d%2Fraw%2F7de6217d8f67e28f8593ac132c76954399c89307%2Fspatialtest.jsonld&frame=spatial

But I can't frame out the spatial elements in your examples above. So I'd like to resolve that with you on this issue.

@smrgeoinfo
Contributor

smrgeoinfo commented Apr 30, 2020

hmm, https://schema.org/GeoShape documentation has:
sdo:box type: text, "A box is the area enclosed by the rectangle formed by two points. The first point is the lower corner, the second point is the upper corner. A box is expressed as two points separated by a space character."
and on sdo:GeoShape "A GeoShape can be described using several properties whose values are based on latitude/longitude pairs. Either whitespace or commas can be used to separate latitude and longitude; whitespace should be used when writing a list of several such points."

they say 'latitude/longitude' pair, which would appear to imply Y X coordinate ordering. The given documentation apparently allows
Y1,X1 Y2,X2 OR
Y1 X1 Y2 X2
where Y1 X1 is 'lower corner' and Y2 X2 is 'upper corner'
As is typical of Schema.org documentation, various questions are left unanswered: is Y1 X1 the lower eastern or western corner? Are longitudes (X values) reported as east or west longitude? Are values decimal degrees or DMS? What spheroid is assumed for the lat/long values?

The documentation for sdo:GeoCoordinates asserts that lat and long should use WGS84 spheroid. It would be logical to propagate this to the convention for points in sdo:GeoShape

note discussion at schemaorg/schemaorg#1538
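
To make the two readings above concrete, here is a minimal Python sketch of a parser that accepts either delimiter style. The function name `parse_box` is hypothetical, and latitude-first ordering is an assumption taken from the "latitude/longitude pairs" wording; the schema.org text does not confirm it.

```python
import re


def parse_box(text):
    """Parse a schema.org box string into (lat1, lon1, lat2, lon2).

    Assumes latitude-first ordering (an assumption, not stated by schema.org).
    Accepts "Y1,X1 Y2,X2", "Y1 X1 Y2 X2", and "Y1, X1 Y2, X2".
    """
    # Split on any run of commas and/or whitespace.
    parts = [p for p in re.split(r"[,\s]+", text.strip()) if p]
    if len(parts) != 4:
        raise ValueError(f"expected 4 numbers, got {len(parts)}: {text!r}")
    lat1, lon1, lat2, lon2 = (float(p) for p in parts)
    # A latitude outside [-90, 90] usually means the pair was written X,Y.
    for lat in (lat1, lat2):
        if not -90.0 <= lat <= 90.0:
            raise ValueError(f"latitude out of range: {lat}")
    return lat1, lon1, lat2, lon2
```

The range check is only a heuristic: a swapped pair whose longitude happens to fall within ±90 would pass unnoticed.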

@ahayes

ahayes commented May 4, 2020

Looking at https://schema.org/box

A box is the area enclosed by the rectangle formed by two points. The first point is the lower corner, the second point is the upper corner. A box is expressed as two points separated by a space character.

Lower and upper must be referring to numerical values. i.e. min and max values. So {minX minY "point"}{space character}{maxX maxY "point"}. This fits well with many geo software stack formats.

Point doesn't seem to be defined, but https://schema.org/latitude is stated as

The latitude of a location. For example 37.42242 (WGS 84).

https://schema.org/longitude is similarly

The longitude of a location. For example -122.08585 (WGS 84).

And as @smrgeoinfo mentioned, https://schema.org/GeoShape is

The geographic shape of a place. A GeoShape can be described using several properties whose values are based on latitude/longitude pairs. Either whitespace or commas can be used to separate latitude and longitude; whitespace should be used when writing a list of several such points.

So I think we can safely say that both "minX minY maxX maxY" and "minX,minY maxX,maxY" should be acceptable formats for box. For human readability it would probably be better to use the latter, although it would have been much nicer to use spaces between elements with fixed numbers of things (lon and lat forming a point, for example) and commas between elements when you can have an arbitrary number of them (N coordinates forming a shape, for example). But I guess it is too late?

@ahayes

ahayes commented May 4, 2020

Rereading, I think @ashepherd suggests Google may be using minY minX maxY maxX. That would be awkwardly different from tons of software systems out there that use lon (X) lat (Y) rather than lat lon.

@smrgeoinfo
Contributor

smrgeoinfo commented May 8, 2020

@ahayes yes-- it certainly looks like Google is doing Y, X not X, Y
Slight revision of the suggested language from schemaorg/schemaorg#1538

A box defines a rectangular area on the surface of the earth defined by point locations of the southwest corner and northeast corner of the rectangle in latitude-longitude coordinates. The coordinate reference system is assumed to be WGS 84. Point locations are comma-delimited tuples of {latitude, east longitude} (y, x), following WKT and GeoSPARQL conventions; the two corner coordinate points are separated by a space. 'East longitude' means positive longitude values are east of the prime (Greenwich) meridian.
example:
"box": "-17.45,-149.8727 34.407,-64.6353"

NOTES: Special care must be taken for bounding boxes that cross the 180 longitude meridian; many spatial data processors will not correctly interpret the bounding coordinates even if they follow the southwest, northeast corner convention, resulting in boxes that span the circumference of the Earth, excluding the actual area of interest. For applications operating with data in the vicinity of longitude 180, testing is strongly recommended to determine if it works for bounding boxes crossing longitude 180; an alternative is to define two bounding boxes, one on each side of 180.

East longitude values can be reported 0 <= X <= 360 or -180 <= X <= 180. Some applications will fail under one or the other of these conventions. Recommendation is to use -180 <= X <= 180.

Following this recommendation, for bounding boxes that cross the antimeridian at ±180° longitude, the West longitude value will be numerically greater than the East longitude value. For example, to describe Fiji the box might be
"box": "-19,176 -15,-178"

One other thing to be aware of is that the coordinate ordering used by Google -- Y (lat), X (long) is the reverse of various other commonly used conventions.
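
The antimeridian notes above can be sketched in Python as follows. This is a minimal illustration, assuming longitudes are already in the recommended -180 to 180 range; the function names are hypothetical, and the output uses the comma-delimited "lat,lon lat,lon" form from the suggested language.

```python
def crosses_antimeridian(west, east):
    """With longitudes in [-180, 180], west > east signals a crossing."""
    return west > east


def split_at_antimeridian(south, west, north, east):
    """Return one or two (south, west, north, east) boxes, none crossing 180."""
    if not crosses_antimeridian(west, east):
        return [(south, west, north, east)]
    # One box on each side of the 180 meridian.
    return [(south, west, north, 180.0), (south, -180.0, north, east)]


def to_box_string(south, west, north, east):
    """Format as 'lat,lon lat,lon' (southwest corner, then northeast corner)."""
    def fmt(value):
        # Up to 10 significant digits, without a trailing '.0'.
        return format(value, ".10g")
    return f"{fmt(south)},{fmt(west)} {fmt(north)},{fmt(east)}"
```

For the Fiji example, `to_box_string(-19, 176, -15, -178)` reproduces the single crossing box, while `split_at_antimeridian(-19, 176, -15, -178)` gives the two-box alternative mentioned in the notes.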

@ashepherd
Member Author

@smrgeoinfo @ahayes if there's more about geospatial that we want to change, can we create a new issue for that and link to this discussion from there?

I think if we can update our guidelines to specify that Google Dataset Search only displays an inset map when you follow their (unconventional, and not well-documented) format, that would help our adopters in the short term. But I think you both have raised some good points for more inspection if you like?

@njarboe
Collaborator

njarboe commented Nov 3, 2020

MagIC is describing boxes with spaces (latMin lonMin latMax lonMax):
Screen Shot 2020-11-02 at 4 08 00 PM

Google dataset search does not display the box. Only a single marker:
https://datasetsearch.research.google.com/search?query=10.1038%2FS41561-019-0443-2&docid=wO1WA0NjD92Edr%2F2AAAAAA%3D%3D
Screen Shot 2020-11-02 at 4 26 17 PM

@smrgeoinfo
Contributor

smrgeoinfo commented Nov 3, 2020

Some updates--

Google dataset search displays bounding boxes encoded as two points, lower left (SW) then upper right (NE); points are encoded as decimal degrees lat long (Y X), presented as a space delimited list. e.g. This works in Google Dataset Search to display the bbox in the search result:
{"@type":"GeoShape","box": "26 -175 62 -42"}
{"@type":"GeoShape","box": "19.27 -158.14 21.37 -155.05"} hawaii
{"@type":"GeoShape","box":"-34.67479971 17.24863596 -30.16987477 21.41831817","elevation":"11.5m/41.1m"} south africa
{"@type":"GeoShape","box": "-126.0551, 30.0957 -111.6698, 44.3858"} west coast N. America
{"@type":"GeoShape","box": "-110.125, 32 -110, 32.125"} Arizona ??? works for X,Y comma separated???
{"@type":"GeoShape","box": "37.3, -27.8 37.3, -27.8"} east of south africa

doesn't display a bounding box in Google dataset search:
{"@type":"GeoShape","box":"-38.5 129 -26 141"} australia -- this appears to be correct Y X, SW then NE...
{"@type":"GeoShape","box":"-23.44915 151.89883 -23.43232 151.924951"} Heron Reef (Australia)
{"@type":"GeoShape", "box": "-82.0000 -57.9364 -24.0000 3.9272" } Peru, Bolivia

GeoRSS The original schema.org design was apparently adopted from GeoRSS (see this github discussion). The OGC GeoRSS spec states: "A bounding box is a rectangular region, often used to define the extents of a map or a rough area of interest. A box SHALL contain two space separate latitude-longitude coordinate pairs, with each pair separated by whitespace. The first pair defines the lower left corner of the box and the second pair defines the upper right corner of the box. example: <georss:box>42.943 -71.032 43.039 -69.856</georss:box>."

GeoJSON spec for position states "A position is an [JSON] array of numbers. There MUST be two or more elements. The first two elements are longitude and latitude, or easting and northing, precisely in that order and using decimal numbers."... for bbox: "The value of the bbox member MUST be an array of length 2*n where n is the number of dimensions represented in the contained geometries, with all axes of the most southwesterly point followed by all axes of the more northeasterly point."

THIS is opposite of Google order.

WKT: from OGC Spec 18-010r7: Requirement: The WKT representation of a <geographic bounding box> shall be: <geographic bounding box> ::= <geographic bounding box keyword> <left delimiter> <**lower left latitude**> <wkt separator> <**lower left longitude**> <wkt separator> <**upper right latitude**> <wkt separator> <**upper right longitude**> <right delimiter>
"bounding box latitude coordinates shall be given in decimal degrees in the range -90 to +90, longitude coordinates shall be given in decimal degrees in the range -180 to +180 relative to the international reference meridian."

EXAMPLE 1 Geographic bounding box enveloping offshore Netherlands: BBOX[51.43,2.54,55.77,6.40]

EXAMPLE 2 Geographic bounding box enveloping offshore New Zealand and crossing the 180° meridian: BBOX[-55.95,160.60,-25.88,-171.20]

WKT : (from MS Bing docs) "The coordinates in a WKT shape are ordered as “longitude latitude” which is important to remember as this is the opposite convention used by Bing Maps. "

SO-- the Microsoft statement is not consistent with the OGC Spec, but Bing Maps apparently is. Google dataset search is consistent with OGC Spec 18-010r7 (WKT) and with OGC spec 17-002r1 (GeoRSS).
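
Since the GeoJSON bbox order (west, south, east, north) is the reverse of the GeoRSS-style box order (south, west, north, east), converting between them is just a reordering. A minimal Python sketch, with hypothetical function names, assuming a space-delimited box and a 2D bbox:

```python
def box_to_geojson_bbox(box):
    """'south west north east' (lat lon lat lon) -> [west, south, east, north]."""
    south, west, north, east = (float(v) for v in box.split())
    return [west, south, east, north]


def geojson_bbox_to_box(bbox):
    """[west, south, east, north] -> 'south west north east'."""
    west, south, east, north = bbox
    return f"{south} {west} {north} {east}"
```

Applied to the GeoRSS example, `box_to_geojson_bbox("42.943 -71.032 43.039 -69.856")` returns `[-71.032, 42.943, -69.856, 43.039]`.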

@smrgeoinfo
Contributor

smrgeoinfo commented Nov 3, 2020

@njarboe -- very interesting. The point located on the map is not in any of the lat long locations in the data. Closest are the corners of box 4: 24.5 95.6 24.6 95.7

@smrgeoinfo
Contributor

smrgeoinfo commented Nov 3, 2020

My conclusion:
use the OGC 18-010r7 WKT convention, with a space delimited list: Y1 X1 Y2 X2 where first point is SW or lower left, second point is NE or upper right.

This might not work in the southern hemisphere. I couldn't find anything in Google Data search that showed a map south of the equator...

@ashepherd
Member Author

@njarboe that's interesting. Looking at the shadow DOM, the map is defined using a div with an attribute of:

data-geojson-string="{"type":"Feature","bbox":[94.6698,23.5466,96.6698,25.5466],"geometry":{"type":"GeometryCollection","geometries":[{"type":"Point","coordinates":[95.6698,24.5466]}]}}"
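
The numbers in that attribute can be checked directly. The Python sketch below reads the GeoJSON in its usual (longitude, latitude) order and compares the bbox with the marker point. It is only a reading of this one value, not a statement about how Google builds the map.

```python
import json

# The attribute value quoted above, as a JSON string.
raw = (
    '{"type":"Feature","bbox":[94.6698,23.5466,96.6698,25.5466],'
    '"geometry":{"type":"GeometryCollection","geometries":'
    '[{"type":"Point","coordinates":[95.6698,24.5466]}]}}'
)
feature = json.loads(raw)

# GeoJSON order: bbox is [west, south, east, north]; positions are [lon, lat].
west, south, east, north = feature["bbox"]
lon, lat = feature["geometry"]["geometries"][0]["coordinates"]

center_lon = (west + east) / 2
center_lat = (south + north) / 2
half_width = (east - west) / 2
half_height = (north - south) / 2
```

Here the bbox is centered on the marker and extends one degree in each direction, so values such as 23.5466 would come from padding the point rather than from the harvested metadata.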

@njarboe
Collaborator

njarboe commented Nov 3, 2020

Here is the link to the MagIC page for this contribution that has the schema.org header, if you want to see our full schema.org current layout. https://www2.earthref.org/MagIC/doi/10.1038/S41561-019-0443-2

@ashepherd That is a bit strange, as 23.5466 is not in our schema.org header for that data contribution.

@ahayes

ahayes commented Nov 3, 2020

@njarboe Google calculating a centroid maybe?

@smrgeoinfo Sadly, OGC can't even agree across their specs. WMS and related web services specify bounding boxes as min_x min_y max_x max_y. Same idea with SW/NE corners but X Y rather than Y X. This is also how browsers handle screen coordinates so it makes some sense given the web focus.

I don't see any reason why it wouldn't work in the southern hemisphere unless Google was sloppy on their implementation. Other places that more commonly break in various implementations are boxes near/enclosing a pole or spanning 180 longitude. They pose a problem for people trying to come up with bounding boxes to type in too.

@dr-shorthair
Collaborator

Other places that more commonly break in various implementations are boxes near/enclosing a pole or spanning 180 longitude.

Which is why the lower-left/upper-right approach is best, as it is unambiguous.

Re OGC confusion - yeah, OGC is a bottom-up organization, and the coordinating layer was not very effective 20 years ago.

@rduerr
Collaborator

rduerr commented Nov 4, 2020 via email

@dr-shorthair
Collaborator

Haha - yeah you are right @rduerr . I was only thinking about the 180 meridian.

Is there a good rule for bounding-box whose edges are meridians and parallels, that accommodates pole-crossing?

@rduerr
Collaborator

rduerr commented Nov 4, 2020 via email

@dr-shorthair
Collaborator

In that case it looks like 'bounding box' just can't be used for polar-crossing data.
Bounding boxes have edges that are meridians and parallels by definition.
If this doesn't apply, then your use-case is out of scope.

Yeah - I know that looks blunt, but this standard style of bounding box is so common and so useful everywhere else, I don't think a change to accommodate polar people is at all practical.

@rduerr
Collaborator

rduerr commented Nov 5, 2020 via email

@smrgeoinfo
Contributor

smrgeoinfo commented Nov 5, 2020

I found a couple of datasets in Google Dataset Search that show bounding boxes in the southern hemisphere and added them to the list above. My empirical conclusion is that Google Dataset Search will plot a box:

  1. with a space delimited list: {Y1 X1 Y2 X2} where first point is SW or lower left, second point is NE or upper right,
  2. with a comma delimited pair of points {X1 Y1, X2 Y2} where first point is SW or lower left, second point is NE or upper right.

(X is longitude, Y is latitude)

If you're in a polar region, use a polygon to delimit the extent; Google dataset search won't display the GeoShape.
If you're south of the equator boxes in Australia and S. America didn't display; boxes in Africa did.... WTF?
No examples for boxes that cross the antimeridian.
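
A minimal Python sketch of form 1 from the list above (space-delimited Y1 X1 Y2 X2, SW then NE). The function names are hypothetical, and treating a box that touches ±90 latitude as "polar" is an assumption made for illustration.

```python
def dataset_search_box(south, west, north, east):
    """Return 'south west north east' as a space-delimited string."""
    if not -90 <= south <= north <= 90:
        raise ValueError("need -90 <= south <= north <= 90")
    if south == -90 or north == 90:
        # Assumption: a box reaching a pole is better described by a polygon.
        raise ValueError("box touches a pole; use a polygon instead")
    for lon in (west, east):
        if not -180 <= lon <= 180:
            raise ValueError(f"longitude out of range: {lon}")
    return f"{south} {west} {north} {east}"


def geoshape(south, west, north, east):
    """Wrap the box string in a schema.org GeoShape node."""
    return {"@type": "GeoShape", "box": dataset_search_box(south, west, north, east)}
```

For example, `geoshape(19.27, -158.14, 21.37, -155.05)` produces the Hawaii entry from the list of boxes that displayed. Nothing here explains why some southern-hemisphere boxes did not display.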

I'm updating the text in #104 to align with these observations.

@smrgeoinfo smrgeoinfo changed the title GeoShape box format unlcear GeoShape box format unclear Nov 5, 2020
@mbjones
Collaborator

mbjones commented Jan 28, 2021

These bounding box issues seem important, but we don't yet seem to have reached full conclusions on what to recommend, especially given the polar BB discussion. And there is no PR yet with a proposed ADR and revisions. So, I am punting this to v1.3, but speak up if you'd like to finish it off now.

@mbjones mbjones modified the milestones: v1.2, v1.3 (possibly 2.0) Jan 28, 2021
@dr-shorthair
Collaborator

@rduerr in practice what are the 'bounding box' requirements for polar regions?

Is it just the triangle with a pole as one vertex, a parallel as the opposite side, and two meridians?
i.e. three numbers plus a pole.

The classic bbox requires four numbers - usually lower-left, upper right.
The 'polar bbox' may just need a convention for the 'top right' (north pole) or 'bottom left' (south pole)
Does this work?

(50, 170) , (90, -130) is the triangle that would catch most of Alaska and crosses the 180 meridian.
(-90, 150) , (-60, -40) is the triangle that includes West Antarctica and the peninsula, and crosses the antimeridian.
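
To see how a consumer might interpret these two examples, here is a minimal Python sketch of a point-in-box test that wraps across the 180 meridian. The function name and the test coordinates (approximate locations for Anchorage and the Antarctic Peninsula) are illustrative assumptions, not part of any spec.

```python
def box_contains(sw, ne, lat, lon):
    """True if (lat, lon) lies in the box given by (lat, lon) corners sw and ne.

    When the west longitude exceeds the east longitude, the box is taken
    to cross the 180 meridian.
    """
    south, west = sw
    north, east = ne
    if not south <= lat <= north:
        return False
    if west <= east:
        return west <= lon <= east
    # Crossing case: from west up to 180, or from -180 up to east.
    return lon >= west or lon <= east
```

With this reading, `box_contains((50, 170), (90, -130), 61.2, -149.9)` is true for a point near Anchorage, and `box_contains((-90, 150), (-60, -40), -65.0, -62.0)` is true for a point on the Antarctic Peninsula.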

@smrgeoinfo
Contributor

@mbjones see pull request #104

@smrgeoinfo
Contributor

@rduerr @dr-shorthair in PR #104, suggestion is "For bounding boxes that include the north or south pole, schema:box will not work. Recommended practice is to use a schema:polygon to describe spatial location extents that include the poles. "

smrgeoinfo added a commit that referenced this issue Jan 28, 2021
@ashepherd ashepherd mentioned this issue Jan 29, 2021
8 tasks
@ashepherd
Member Author

ashepherd commented Aug 28, 2021

Notes from 8/26/2021 meeting:

The strategy used by UN with Ocean InfoHub:

  1. use schema.org/spatialCoverage if you care about Google (and other major search engines)

  2. For geoscience harvesters, use either:
    a. a Well-Known Text value with the predicate geosparql:asWKT with datatype of geosparql:wktLiteral
    b. a GeoJSON with a predicate of geosparql:asGeoJSON with datatype of geosparql:geoJSONLiteral
    c. for more complex geometries that would blow up the page weight of a schema.org snippet, there's a proposal to reference a GeoJSON document by URL. NOTE: This could follow the SOSO recommendation for linking metadata and supplemental documents. Though this assumes any GeoJSON document attached in this way is meant to serve as the spatial coverage, so we should think about this some more. Is there a more explicit predicate for linking a schema.org/Dataset to a GeoJSON document for its spatial coverage?

     :dataset-01 schema:subjectOf :geojson-01 .
     :geojson-01 a schema:DataDownload ;
         schema:encodingFormat "application/geo+json" ;
         schema:url "https://example.com/url/to/spatial-coverage-of-dataset-01.geojson" .
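
Options 1 and 2a can be combined in a single JSON-LD node. The Python sketch below is a minimal illustration, not the Ocean InfoHub pattern itself: attaching the WKT through `geosparql:hasGeometry` on the Place, and the helper names, are assumptions. The WKT ring is written in longitude-latitude order, the default (CRS84) for a `geosparql:wktLiteral` with no CRS URI, while the schema.org box keeps latitude-longitude order.

```python
import json

CONTEXT = {
    "@vocab": "https://schema.org/",
    "geosparql": "http://www.opengis.net/ont/geosparql#",
}


def wkt_polygon(south, west, north, east):
    """Closed rectangle as WKT, 'lon lat' order, counterclockwise."""
    ring = [(west, south), (east, south), (east, north), (west, north), (west, south)]
    return "POLYGON ((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"


def spatial_coverage(south, west, north, east):
    """Place node carrying both a schema.org box and a GeoSPARQL WKT literal."""
    return {
        "@type": "Place",
        # For search engines: 'south west north east', space-delimited.
        "geo": {"@type": "GeoShape", "box": f"{south} {west} {north} {east}"},
        # For geoscience harvesters (placement is an assumption).
        "geosparql:hasGeometry": {
            "@type": "geosparql:Geometry",
            "geosparql:asWKT": {
                "@type": "geosparql:wktLiteral",
                "@value": wkt_polygon(south, west, north, east),
            },
        },
    }


dataset = {
    "@context": CONTEXT,
    "@type": "Dataset",
    "name": "Example dataset (hypothetical)",
    "spatialCoverage": spatial_coverage(19.27, -158.14, 21.37, -155.05),
}
print(json.dumps(dataset, indent=2))
```

Keeping both encodings in one node means the two coordinate orders sit side by side, so it is worth generating them from the same four numbers rather than typing each by hand.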
    

@ashepherd
Member Author

Merged in #104


8 participants