[Dev] ElasticSearch's use of Spatial4j

dsmiley
Hi Florian,

I've been looking at ElasticSearch a little bit to see how it uses Spatial4j and Lucene-spatial.  I've noticed you are heavily involved over there; I invite you to join the Spatial4j list (very low traffic) -- CC'ed to this email.  Your input on the API is really valued.  I've been hard at work improving Spatial4j for a 0.4 release in about a week or so.  In this point release, the APIs are largely unchanged from a backwards-compatibility point of view -- I'm leaving more disruptive changes to a subsequent release.  This release is mostly about adding a customizable WKT parser that doesn't use JTS's WKTReader.  There are also some related polygon handling improvements, and a lot of it is now configurable.  

I noticed that ES's BasePolygonBuilder is doing dateline fixing/adjustment on polygons.  I briefly reviewed it, and it looks somewhat similar to what Spatial4j's JtsGeometry does, but perhaps it only shifts the polygon without slicing it into -180 to +180 segments as JtsGeometry does.  That would mean you need to shift when searching, and it probably won't work if the polygon is so wide that it wraps the dateline multiple times (e.g. a snake that wraps around the pole multiple times), although admittedly that is unlikely in practice.  Can you comment?  Has JtsGeometry's algorithm ever not worked for you or someone else?  Why is ElasticSearch taking its own approach here when, AFAIK, it's already solved?
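A minimal Java sketch of the "shift" approach may make the multiple-wrap concern concrete. `DatelineShift`, `unwrapLongitudes` and `unwrappedWidth` are hypothetical names, not ElasticSearch or Spatial4j code, and the sketch assumes no polygon edge spans more than 180 degrees of longitude. Unwrapping makes a ring continuous by pushing some longitudes past +/-180; an unwrapped width of 360 degrees or more means no single shift can bring the ring back into one -180..+180 world.

```java
import java.util.Arrays;

// Hypothetical helper (not ElasticSearch or Spatial4j code) illustrating the
// "shift" approach: unwrap a ring's longitudes so it is continuous, which may
// push some of them outside -180..+180.
final class DatelineShift {

    // Assumes no edge spans more than 180 degrees of longitude, so a jump of
    // more than 180 between adjacent vertices is read as a dateline crossing.
    static double[] unwrapLongitudes(double[] lons) {
        double[] out = new double[lons.length];
        double offset = 0;
        for (int i = 0; i < lons.length; i++) {
            if (i > 0) {
                double delta = lons[i] - lons[i - 1];
                if (delta > 180) {
                    offset -= 360; // crossed the dateline heading west
                } else if (delta < -180) {
                    offset += 360; // crossed the dateline heading east
                }
            }
            out[i] = lons[i] + offset;
        }
        return out;
    }

    // Longitude extent of the unwrapped ring. 360 or more means the ring wraps
    // the globe at least once, so no single shift fits it into one world.
    static double unwrappedWidth(double[] lons) {
        double[] u = unwrapLongitudes(lons);
        if (u.length == 0) {
            return 0;
        }
        return Arrays.stream(u).max().getAsDouble() - Arrays.stream(u).min().getAsDouble();
    }

    public static void main(String[] args) {
        double[] box = {170, -170, -170, 170};
        System.out.println(Arrays.toString(unwrapLongitudes(box))); // [170.0, 190.0, 190.0, 170.0]
        System.out.println(unwrappedWidth(box));                     // 20.0
        double[] snake = {0, 120, -120, 0, 120, -120};
        System.out.println(unwrappedWidth(snake));                   // 600.0
    }
}
```

The 20-degree box can be handled with one shift; the "snake" cannot, which is the case slicing into -180..+180 segments avoids.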

p.s. Spatial4j is at the pending/proposal stage of joining LocationTech: https://locationtech.org/proposals/spatial4j

~ David

_______________________________________________
dev mailing list
[hidden email]
http://lists.spatial4j.com/listinfo.cgi/dev-spatial4j.com

Re: [Dev] ElasticSearch's use of Spatial4j

Florian Schilling
Hi David,

I'm glad you noticed my work at Elasticsearch, and I'd like to join your mailing list (I just subscribed). I'm also looking forward to the latest changes to your API; I've already seen some improvements I like.
It's nice that you mentioned the BasePolygonBuilder. I think my approach is a little different from yours, since I was interested in the shape itself. The algorithm I implemented decomposes polygons at the dateline and, if necessary, re-composes them into a multipolygon. Note that this is done without shifting. So after this step we have a shape that no longer needs to be verified and shifted, which saves us a little performance in some cases. But this was just the first of several improvements we have planned. I think one of the next bigger steps will be to work on some internal support structures, like triangulations, and on some way of loading geometries only partially.
I guess there are lots of things to discuss, and ideas both sides can benefit from.
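The decompose-and-recompose idea can be sketched as a single-line polygon clip. This is an illustrative assumption, not the actual BasePolygonBuilder algorithm: it unwraps the ring's longitudes, applies Sutherland-Hodgman clipping against the meridian at +180 (or -180), and moves the spilled-over part back by 360 degrees, yielding the parts of a multipolygon with no shifted coordinates left over. `DatelineSplit` and its methods are hypothetical names; the sketch assumes an open ring of {lon, lat} vertices that spills over only one side of the world and does not enclose a pole.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not ElasticSearch code): split a ring that crosses the
// dateline into rings that each stay within -180..+180.
final class DatelineSplit {

    // Unwraps longitudes so adjacent vertices never jump by more than 180 degrees.
    static List<double[]> unwrap(List<double[]> ring) {
        List<double[]> out = new ArrayList<>();
        double offset = 0;
        for (int i = 0; i < ring.size(); i++) {
            double lon = ring.get(i)[0];
            if (i > 0) {
                double delta = lon - ring.get(i - 1)[0];
                if (delta > 180) offset -= 360;
                else if (delta < -180) offset += 360;
            }
            out.add(new double[] {lon + offset, ring.get(i)[1]});
        }
        return out;
    }

    // Sutherland-Hodgman clip against the half-plane lon <= c (keepWest) or lon >= c.
    static List<double[]> clip(List<double[]> ring, double c, boolean keepWest) {
        List<double[]> out = new ArrayList<>();
        int n = ring.size();
        for (int i = 0; i < n; i++) {
            double[] prev = ring.get((i + n - 1) % n);
            double[] cur = ring.get(i);
            boolean prevIn = keepWest ? prev[0] <= c : prev[0] >= c;
            boolean curIn = keepWest ? cur[0] <= c : cur[0] >= c;
            if (curIn != prevIn) {
                // The edge crosses the clip meridian: add the crossing point.
                double t = (c - prev[0]) / (cur[0] - prev[0]);
                out.add(new double[] {c, prev[1] + t * (cur[1] - prev[1])});
            }
            if (curIn) out.add(cur);
        }
        return out;
    }

    // Returns one ring if no split is needed, otherwise two rings in -180..+180.
    static List<List<double[]>> split(List<double[]> ring) {
        List<double[]> u = unwrap(ring);
        double min = Double.POSITIVE_INFINITY, max = Double.NEGATIVE_INFINITY;
        for (double[] p : u) {
            min = Math.min(min, p[0]);
            max = Math.max(max, p[0]);
        }
        List<List<double[]>> parts = new ArrayList<>();
        if (min >= -180 && max <= 180) {
            parts.add(u);
            return parts;
        }
        boolean spillsEast = max > 180;
        double c = spillsEast ? 180 : -180;
        double shift = spillsEast ? -360 : 360;
        List<double[]> inside = clip(u, c, spillsEast);   // part already in range
        List<double[]> outside = clip(u, c, !spillsEast); // part past the dateline
        List<double[]> moved = new ArrayList<>();
        for (double[] p : outside) moved.add(new double[] {p[0] + shift, p[1]});
        parts.add(inside);
        parts.add(moved);
        return parts;
    }
}
```

For the box (170,0), (-170,0), (-170,10), (170,10), `split` returns a western ring spanning 170..180 and an eastern ring spanning -180..-170. For a concave ring that crosses the dateline more than twice, Sutherland-Hodgman can leave zero-width edges along the meridian instead of fully separate parts, so a production version would need a more general polygon splitter.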

-- Florian


On Jan 3, 2014, at 2:17 AM, [hidden email] wrote:


Re: [Dev] ElasticSearch's use of Spatial4j

dsmiley
Florian,

So you're saying you're doing this at the E.S. level not because it handles cases Spatial4j's JtsGeometry doesn't handle, but because your code is faster?  If so, I think this would be a valuable contribution upstream into Spatial4j, so that E.S. needn't get into these kinds of low-level spatial details, letting it focus on the integration with search / information retrieval / indexing.  My goal/focus with Spatial4j is for it to handle the spatial-centric code/algorithms so that Lucene-spatial and any other consumers need not worry about how to handle a shape that spans the dateline, or how to compute how a rectangle intersects a circle, etc.

I'd like to review some additions in 0.4 with you.  I'm trying to add a lot of configurability to Spatial4j because there are many ways apps may wish to configure WKT handling.  For example, here are the new options in SpatialContextFactory and JtsSpatialContextFactory:
(I'm highlighting options that apply to WKT, but they are also intended to be applied by code using Spatial4j that receives externally provided shapes, such as GeoJson and/or whatever format ElasticSearch receives.)
* normWrapLongitude:  v0.3 always wrapped latitudes and longitudes, but I now believe this was a mistake.  By default, coordinates outside of the standard geodetic range now trigger an exception, but enabling this boolean will wrap a longitude.  Coordinate normalization happens via the new SpatialContext.normX and normY methods.
* precisionModel and precisionScale:  You can now configure the JTS PrecisionModel choice, and it will get applied via normX & normY methods.
* dateLineRule:  (width180, ccwRect, or none).  See the DatelineRule enum at the bottom of the file.  v0.3 used ccwRect, but that confused users because most spatial code doesn't honor the WKT spec on that issue, so the default in 0.4 is width180.
* allowMultiOverlap: v0.3 used to union() geometry collections because that resolved some issues that I now believe were caused by bad/questionable data in Spatial4j's countries test data set.  In v0.4 that's disabled by default, and it turns out that v0.4's new ShapeCollection (a replacement for JTS's GeometryCollection) handles this more gracefully, so the issue has gone away.
* validationRule: (See the ValidationRule enum at the bottom of the file.)  Validation can be turned off, and a shape that fails validation can be repaired using one of several approaches.
* autoPrepare: JtsGeometry will build a PreparedGeometry and use it to calculate intersections.  I haven't performance-tested it, but I believe this will be a worthwhile trade-off.
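To illustrate what a dateline rule has to decide, here is a hypothetical Java sketch for a rectangle whose two vertical edges sit at longitudes x1 and x2. It models only width180 and none; ccwRect also depends on vertex order and is left out. `RectDatelineRule` and its `Rule` enum are illustrative names, not Spatial4j's DatelineRule enum or its makeRectFromPoly implementation. The convention assumed here is that a result with minX > maxX denotes a rectangle crossing the dateline.

```java
// Hypothetical sketch of applying a dateline rule to a rectangle's longitudes.
final class RectDatelineRule {

    enum Rule { WIDTH180, NONE }

    // Returns {minX, maxX}; minX > maxX means the rectangle crosses the dateline.
    static double[] rectLongitudes(double x1, double x2, Rule rule) {
        double lo = Math.min(x1, x2);
        double hi = Math.max(x1, x2);
        if (rule == Rule.WIDTH180 && hi - lo > 180) {
            // The edges are more than half the globe apart, so read the
            // rectangle as going the shorter way, across the dateline.
            return new double[] {hi, lo};
        }
        return new double[] {lo, hi};
    }

    public static void main(String[] args) {
        // Edges at 170 and -170: 340 degrees apart one way, 20 degrees the other.
        double[] w = rectLongitudes(170, -170, Rule.WIDTH180);
        double[] n = rectLongitudes(170, -170, Rule.NONE);
        System.out.println(w[0] + " to " + w[1]); // 170.0 to -170.0 (20 degrees wide)
        System.out.println(n[0] + " to " + n[1]); // -170.0 to 170.0 (340 degrees wide)
    }
}
```

The same input yields a narrow dateline-crossing box under width180 and a nearly world-wide box under none, which is why the default matters for externally supplied data.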

So ideally, ElasticSearch would create a Map of String->String configuration options that it reads from a configuration file, and feed it to SpatialContextFactory.makeSpatialContext(map, classLoader) to get the context.  Currently I see it referring to JtsSpatialContext.GEO (technically an old constructor with a boolean for geo, which has now been removed), but this means ES's context isn't configurable, which is a shame.

Upon reading x & y coordinates received externally, it should call SpatialContext.normX(x) and normY(y), which conditionally apply normalization based on the normWrapLongitude & precisionModel options (this is new in v0.4).  Then it should call SpatialContext.verifyX(x) & verifyY(y), which throw exceptions if a coordinate is out of range of the world boundaries.  At the point where ES has a JTS geometry and needs to create a JtsGeometry, it should probably do more or less what JtsWktShapeParser.makeShapeFromGeometry(geom) does; that method applies the datelineRule, allowMultiOverlap, validationRule, and autoPrepare logic.  Maybe I should make that method public, but you could at least extend this class to re-use it.  Additionally, if you process a polygon that might be a rectangle, then you might want to call makeRectFromPoly(geom), which applies the datelineRule algorithm.
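The normalize-then-verify order can be sketched as follows. `CoordinateIntake`, its `Options` class, and the rounding rule (round to the nearest 1/precisionScale, in the spirit of a fixed JTS PrecisionModel) are assumptions for illustration; they mimic the behavior described for normX/normY and verifyX/verifyY but are not Spatial4j's implementation.

```java
// Hypothetical sketch, not Spatial4j's SpatialContext: normalize externally
// supplied coordinates first, then verify they lie within the world bounds.
final class CoordinateIntake {

    // Illustrative stand-in for the normWrapLongitude and precisionScale options.
    static final class Options {
        final boolean normWrapLongitude; // wrap out-of-range longitudes instead of rejecting them
        final double precisionScale;     // <= 0 disables rounding; else round to 1/scale

        Options(boolean normWrapLongitude, double precisionScale) {
            this.normWrapLongitude = normWrapLongitude;
            this.precisionScale = precisionScale;
        }
    }

    static double normX(double x, Options o) {
        if (o.normWrapLongitude && (x < -180 || x > 180)) {
            x = ((x + 180) % 360 + 360) % 360 - 180; // wrap into [-180, 180)
        }
        return round(x, o);
    }

    // Latitudes are never wrapped; only rounding applies.
    static double normY(double y, Options o) {
        return round(y, o);
    }

    private static double round(double v, Options o) {
        return o.precisionScale > 0 ? Math.round(v * o.precisionScale) / o.precisionScale : v;
    }

    static void verifyX(double x) {
        if (x < -180 || x > 180) {
            throw new IllegalArgumentException("Longitude out of range: " + x);
        }
    }

    static void verifyY(double y) {
        if (y < -90 || y > 90) {
            throw new IllegalArgumentException("Latitude out of range: " + y);
        }
    }

    // Normalize first, then verify.
    static double[] readPoint(double x, double y, Options o) {
        double nx = normX(x, o);
        double ny = normY(y, o);
        verifyX(nx);
        verifyY(ny);
        return new double[] {nx, ny};
    }
}
```

With wrapping enabled, `readPoint(190, 10, ...)` yields (-170, 10); with the default of no wrapping, the same call throws instead of silently moving the point, matching the 0.4 default described above.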

Looking to the near future (possibly today, in fact), I want to add a Spatial4j abstraction for an internal binary shape codec -- not one designed for standard interoperability (like WKB), but something intended to be read & written only by Spatial4j.  The intention is that a Shape could be serialized to bytes so that Lucene-spatial could put it in DocValues.  In fact, Spatial4j 0.3 already has something here -- JtsShapeReadWriter has a couple of methods that do this (not written by me; I suspect Ryan McKinley if not Chris Male).  I want to revamp this a little and, critically, add an abstraction that lets code reference the class and read/write non-JTS shapes without requiring JTS.  Ultimately, it would be cool if one day it could use a much more compact encoding that uses only the number of bytes needed per coordinate for a desired precision level, maybe with delta encoding, thereby making the spatial data smaller and more likely to fit in the OS's disk cache.
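One way such a precision-aware, delta-encoded format could work is to quantize each coordinate, store per-axis deltas, zigzag-map them, and write them as variable-length integers. This is a hypothetical codec, not the JtsShapeReadWriter format or anything in Spatial4j; `DeltaCoordCodec`, the interleaved x,y layout, and the `scale` parameter (units per degree, e.g. 1e5 for roughly meter-level precision) are assumptions.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical compact codec for interleaved x,y coordinates (not Spatial4j's
// or JTS's format). Each value is quantized to 1/scale units, stored as the
// difference from the previous value on the same axis, zigzag-mapped so small
// negative deltas stay small, and written as a variable-length integer.
final class DeltaCoordCodec {

    static byte[] encode(double[] coords, double scale) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeVarLong(out, coords.length);
        long[] prev = new long[2]; // last quantized x and y
        for (int i = 0; i < coords.length; i++) {
            long q = Math.round(coords[i] * scale); // quantize to the chosen precision
            long delta = q - prev[i & 1];
            prev[i & 1] = q;
            writeVarLong(out, (delta << 1) ^ (delta >> 63)); // zigzag encode
        }
        return out.toByteArray();
    }

    static double[] decode(byte[] bytes, double scale) {
        int[] pos = {0};
        int n = (int) readVarLong(bytes, pos);
        double[] coords = new double[n];
        long[] prev = new long[2];
        for (int i = 0; i < n; i++) {
            long z = readVarLong(bytes, pos);
            prev[i & 1] += (z >>> 1) ^ -(z & 1); // zigzag decode, then undo the delta
            coords[i] = prev[i & 1] / scale;
        }
        return coords;
    }

    // 7 bits per byte, high bit set on every byte except the last.
    private static void writeVarLong(ByteArrayOutputStream out, long v) {
        while ((v & ~0x7FL) != 0) {
            out.write((int) ((v & 0x7F) | 0x80));
            v >>>= 7;
        }
        out.write((int) v);
    }

    private static long readVarLong(byte[] bytes, int[] pos) {
        long result = 0;
        for (int shift = 0; ; shift += 7) {
            byte b = bytes[pos[0]++];
            result |= (long) (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
    }
}
```

Only the first point pays for full-size values; nearby vertices of a polygon then typically cost one byte per coordinate instead of eight. A real shape codec would also need a shape-type header, which this sketch omits.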

Florian, in what bigger picture do you imagine the triangulations and partial-geometry loading you speak of being applied?

~ David




On Fri, Jan 3, 2014 at 7:28 AM, Florian Schilling <[hidden email]> wrote: