[Dev] WKT, Buffered LineStrings, Circles, and back-compat

dsmiley
I've been working on the new WktShapeParser and I think it's in pretty good shape.  It's extensible, but not too complicated.  There are a few remaining issues related to it:
(sorry in advance, this is kinda long)

* EMPTY shapes are kinda experimental. More testing is needed but honestly I care very little about it; I just wanted the WKT parser to accept it.  

* Buffering.  It'd be pretty nice to add a pseudo-shape syntax that's really an operator on a shape it contains:  BUFFER(POINT(X Y), 2.34). This would correspond to a new method on Shape.  Then there's a question of how to handle the corners.  JTS's API offers a variety of buffering parameters (see BufferParameters); I've played with them a bit.  Cap-style applies to line strings, join-style applies to polygons.  By default, everything is effectively rounded, which seems fine, so maybe we shouldn't worry about these options now.   FYI BufferedLine (a Shape I added that you may be unfamiliar with) is square-ended.

* In a similar vein to BUFFER, BBOX(shape) could be added.  It'd be nice some day to also add a bounding-circle.  Note JTS has an algorithm for this: c.v.j.algorithm.MinimumBoundingCircle

* Circles.  By circle, I don't mean a circular boundary / line, I mean the area encompassed by a circle.  The way to officially specify a circle in WKT is kinda crazy -- it involves something like this: CURVEPOLYGON(CIRCULARSTRING( .... 4 points... )).  And IMO it's problematic to specify it this way in a spherical model because a circle on a sphere has its farthest left & right points at a latitude slightly above the middle latitude between its top & bottom -- if the circle is centered somewhere above the equator.  Likewise flip that for below the equator.  Anyway that's a tangent; bottom line is there should be a simpler way.  To keep it simple, maybe just CIRCLE((X Y),D).    That ends up being equivalent to the buffer example earlier. I know we've talked about this before.  Higher up the stack (e.g. Solr / ES) I surmise it would be rare to even use such syntax because, say in Solr's case, there's a query parser, geofilt, that directly has the notion of a circle as part of its parameters.  

* Error-checking shapes:  I plan on creating a ShapeFactory interface; many of the methods on context are implementations of that interface.  The context will provide a non-validating ShapeFactory instance, and one that wraps that one to add error checking. The context methods will delegate to the error-checking one.  The non-validating implementation could be used by, say, Rectangle.getCenter() and some other potential places where the consumer is certain the input is right.  WKT should validate by default.  It may be really helpful to users to have a validating mode for polygons such that if a polygon isn't valid (likely due to a difference in projection from the original polygon), it gets buffer(0) applied -- a technique from the JTS FAQ for correcting invalid polygons.

* WKT writing:  There is none, right now :-/   problem?  Even with a WKT writer, this brings up a related question/concern:  The round-trip-ability of reading a shape and then writing it.  Theoretically, if you read a shape then emit it, the output should be (nearly) identical to the original, except perhaps for case and # significant digits, etc.  But if some amount of optimization happens when a shape is created, say converting a single-element ShapeCollection to its single component, or converting a 0-distance circle to its center (neither of these are done now, BTW), then the output isn't going to look like the original.  Similar are the modifications done to a Geometry that crosses the dateline.  Not necessarily a problem but if that were deemed "bad" to do automatically, then presumably there would be some optimize() method that returns an optimized version.  And FYI presently we don't "prepare" the JTS geometry in a JtsGeometry; we should probably do that.

* Backwards-compatibility:  The WktShapeParser is basically supposed to be the replacement for Shape Read/Writer.  ShapeReader handles "x y" and "y, x" and the legacy circle syntax, and the legacy rectangle syntax.  I think we've already agreed that Spatial4j isn't going to do that anymore; it'll just do WKT (plus WKT-looking extensions).  We could create a class to contain that syntax, LegacyShapeReader, and then clients (e.g. Solr) could use it.

* Longer term I like the idea of reading/writing Shapes to a binary stream.  It wouldn't work quite like WKT Shape parsing is done though... for example reading the binary shape wouldn't by default "validate" (i.e. do error checking) because it was already assumed to be valid when written.  That could easily be flipped with a switch though.  And the data written would be the optimized shapes, so that for example the bounding box wouldn't need to be re-computed.  And the numbers could all be offsets from a corner of the bounding-box, thereby reducing the precision (# bytes) needed to write many shapes.
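
The bounding-box offset idea in the last bullet could look roughly like the sketch below. It is only an illustration under stated assumptions: the class name OffsetCodecSketch, the fixed resolution SCALE (1e-6 units per step) and the byte layout are all hypothetical and not part of Spatial4j. It stores the minimum corner as two doubles, then each coordinate as a 4-byte integer offset from that corner instead of an 8-byte double.

```java
import java.nio.ByteBuffer;

/** Hypothetical sketch, not a Spatial4j API: points stored as offsets from the bbox min corner. */
final class OffsetCodecSketch {
    // Assumed fixed resolution: one integer step equals 1e-6 coordinate units.
    static final double SCALE = 1e6;

    /** Layout: minX (double), minY (double), count (int), then per point two int offsets. */
    static byte[] write(double[][] pts) {
        double minX = Double.POSITIVE_INFINITY;
        double minY = Double.POSITIVE_INFINITY;
        for (double[] p : pts) {
            minX = Math.min(minX, p[0]);
            minY = Math.min(minY, p[1]);
        }
        ByteBuffer buf = ByteBuffer.allocate(
                2 * Double.BYTES + Integer.BYTES + pts.length * 2 * Integer.BYTES);
        buf.putDouble(minX).putDouble(minY).putInt(pts.length);
        for (double[] p : pts) {
            // Offsets are non-negative and, for degrees, stay well below Integer.MAX_VALUE.
            buf.putInt((int) Math.round((p[0] - minX) * SCALE));
            buf.putInt((int) Math.round((p[1] - minY) * SCALE));
        }
        return buf.array();
    }

    /** Reads the layout back; no validation, the writer is trusted. */
    static double[][] read(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        double minX = buf.getDouble();
        double minY = buf.getDouble();
        double[][] pts = new double[buf.getInt()][2];
        for (double[] p : pts) {
            p[0] = minX + buf.getInt() / SCALE;
            p[1] = minY + buf.getInt() / SCALE;
        }
        return pts;
    }

    public static void main(String[] args) {
        double[][] pts = {{-74.0, 40.5}, {-73.5, 40.75}, {-73.9, 41.0}};
        byte[] data = write(pts);
        System.out.println(data.length + " bytes vs " + pts.length * 2 * Double.BYTES + " as raw doubles");
    }
}
```

The saving only shows up with many coordinates, since the corner and count add a fixed 20-byte header; the reader also gets the minimum corner of the bounding box without recomputing it.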

~ David

_______________________________________________
dev mailing list
[hidden email]
http://lists.spatial4j.com/listinfo.cgi/dev-spatial4j.com

Re: [Dev] WKT, Buffered LineStrings, Circles, and back-compat

Chris Male
Out of the ether I will chime in here since bizarrely I still work with this stuff.


On Tue, Nov 12, 2013 at 5:02 AM, [hidden email] <[hidden email]> wrote:
I've been working on the new WktShapeParser and I think it's in pretty good shape.  It's extensible, but not too complicated.  There are a few remaining issues related to it:
(sorry in advance, this is kinda long)

It's looking great.


* EMPTY shapes are kinda experimental. More testing is needed but honestly I care very little about it; I just wanted the WKT parser to accept it.  

Seems low priority.
 

* Buffering.  It'd be pretty nice to add a pseudo-shape syntax that's really an operator on a shape it contains:  BUFFER(POINT(X Y), 2.34). This would correspond to a new method on Shape.  Then there's a question of how to handle the corners.  JTS's API offers a variety of buffering parameters (see BufferParameters); I've played with them a bit.  Cap-style applies to line strings, join-style applies to polygons.  By default, everything is effectively rounded, which seems fine, so maybe we shouldn't worry about these options now.   FYI BufferedLine (a Shape I added that you may be unfamiliar with) is square-ended.

The actual building of the shape is done in the ShapeFactory, right? So we could have a configuration property on the factory that chooses how it does the buffering.
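
As a rough illustration of the proposed BUFFER(shape, distance) pseudo-shape, the sketch below splits such a string into the wrapped shape's WKT and the distance. It is not the WktShapeParser implementation; the class BufferSyntaxSketch and its Buffered result type are hypothetical names. How the buffered shape is then built (cap and join style) would be left to the factory, as suggested above, and is not shown.

```java
/** Hypothetical sketch, not the WktShapeParser: splits BUFFER(shape, distance). */
final class BufferSyntaxSketch {
    /** The wrapped shape's WKT text plus the buffer distance. */
    static final class Buffered {
        final String innerWkt;
        final double distance;

        Buffered(String innerWkt, double distance) {
            this.innerWkt = innerWkt;
            this.distance = distance;
        }
    }

    static Buffered parse(String wkt) {
        String s = wkt.trim();
        int open = s.indexOf('(');
        if (!s.regionMatches(true, 0, "BUFFER", 0, 6) || open < 0
                || !s.substring(6, open).trim().isEmpty() || !s.endsWith(")")) {
            throw new IllegalArgumentException("expected BUFFER(shape, distance): " + wkt);
        }
        String body = s.substring(open + 1, s.length() - 1);
        // The distance follows the last comma that is not nested inside the inner shape.
        int depth = 0;
        int split = -1;
        for (int i = 0; i < body.length(); i++) {
            char c = body.charAt(i);
            if (c == '(') depth++;
            else if (c == ')') depth--;
            else if (c == ',' && depth == 0) split = i;
        }
        if (split < 0) {
            throw new IllegalArgumentException("missing ', distance' in: " + wkt);
        }
        String inner = body.substring(0, split).trim();
        // Double.parseDouble throws NumberFormatException on a malformed distance.
        double distance = Double.parseDouble(body.substring(split + 1).trim());
        return new Buffered(inner, distance);
    }

    public static void main(String[] args) {
        Buffered b = parse("BUFFER(LINESTRING(0 0, 10 10), 0.5)");
        System.out.println(b.innerWkt + " buffered by " + b.distance);
    }
}
```

Because the inner text is handed back untouched, it can be fed to whatever parses ordinary shapes, which keeps BUFFER an operator on a shape rather than a shape type of its own.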
 
* In a similar vein to BUFFER, BBOX(shape) could be added.  It'd be nice some day to also add a bounding-circle.  Note JTS has an algorithm for this: c.v.j.algorithm.MinimumBoundingCircle

* Circles.  By circle, I don't mean a circular boundary / line, I mean the area encompassed by a circle.  The way to officially specify a circle in WKT is kinda crazy -- it involves something like this: CURVEPOLYGON(CIRCULARSTRING( .... 4 points... )).  And IMO it's problematic to specify it this way in a spherical model because a circle on a sphere has its farthest left & right points at a latitude slightly above the middle latitude between its top & bottom -- if the circle is centered somewhere above the equator.  Likewise flip that for below the equator.  Anyway that's a tangent; bottom line is there should be a simpler way.  To keep it simple, maybe just CIRCLE((X Y),D).    That ends up being equivalent to the buffer example earlier. I know we've talked about this before.  Higher up the stack (e.g. Solr / ES) I surmise it would be rare to even use such syntax because, say in Solr's case, there's a query parser, geofilt, that directly has the notion of a circle as part of its parameters.  

I say stay away from Circles.
 

* Error-checking shapes:  I plan on creating a ShapeFactory interface; many of the methods on context are implementations of that interface.  The context will provide a non-validating ShapeFactory instance, and one that wraps that one to add error checking. The context methods will delegate to the error-checking one.  The non-validating implementation could be used by, say, Rectangle.getCenter() and some other potential places where the consumer is certain the input is right.  WKT should validate by default.  It may be really helpful to users to have a validating mode for polygons such that if a polygon isn't valid (likely due to a difference in projection from the original polygon), it gets buffer(0) applied -- a technique from the JTS FAQ for correcting invalid polygons.

+1
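
The wrapping arrangement described in the quoted bullet can be sketched as a decorator. This is a minimal illustration, not the planned Spatial4j interface: the single point method, the double[] return type and the geodetic bounds of -180..180 and -90..90 are assumptions made for the example.

```java
/** Hypothetical sketch of a validating factory wrapping a non-validating one. */
final class ShapeFactorySketch {
    /** Assumed minimal interface; a real one would cover rectangles, circles, etc. */
    interface ShapeFactory {
        double[] point(double x, double y);
    }

    /** Trusts its caller, e.g. for internally computed values such as a rectangle's center. */
    static final ShapeFactory NON_VALIDATING = (x, y) -> new double[] {x, y};

    /** Adds bounds checks (assumed geodetic world bounds), then delegates. */
    static ShapeFactory validating(ShapeFactory delegate) {
        return (x, y) -> {
            // Written as !(in range) so that NaN is rejected too.
            if (!(x >= -180 && x <= 180)) {
                throw new IllegalArgumentException("x out of range: " + x);
            }
            if (!(y >= -90 && y <= 90)) {
                throw new IllegalArgumentException("y out of range: " + y);
            }
            return delegate.point(x, y);
        };
    }

    public static void main(String[] args) {
        ShapeFactory checked = validating(NON_VALIDATING);
        System.out.println(checked.point(10, 20).length + " coordinates accepted");
        try {
            checked.point(200, 0);
        } catch (IllegalArgumentException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

With this split, a WKT parser would hold the validating instance by default, while trusted internal callers could use the plain one and skip the checks.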
 

* WKT writing:  There is none, right now :-/   problem?  Even with a WKT writer, this brings up a related question/concern:  The round-trip-ability of reading a shape and then writing it.  Theoretically, if you read a shape then emit it, the output should be (nearly) identical to the original, except perhaps for case and # significant digits, etc.  But if some amount of optimization happens when a shape is created, say converting a single-element ShapeCollection to its single component, or converting a 0-distance circle to its center (neither of these are done now, BTW), then the output isn't going to look like the original.  Similar are the modifications done to a Geometry that crosses the dateline.  Not necessarily a problem but if that were deemed "bad" to do automatically, then presumably there would be some optimize() method that returns an optimized version.  And FYI presently we don't "prepare" the JTS geometry in a JtsGeometry; we should probably do that.

I don't think we need to support a round trip.  Like Lucene, we munge the data after it's ingested into Spatial4j.
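
To make the round-trip concern concrete, here is a small sketch of an explicit optimize step that collapses a single-member collection, one of the optimizations mentioned in the quoted bullet. Shapes are modelled as plain WKT strings and the names OptimizeSketch, toWkt and optimize are hypothetical; nothing here reflects existing Spatial4j behavior.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: an optimize step that changes what a writer would emit. */
final class OptimizeSketch {
    /** Writes the members back as a collection, preserving the original structure. */
    static String toWkt(List<String> members) {
        if (members.isEmpty()) {
            return "GEOMETRYCOLLECTION EMPTY";
        }
        return "GEOMETRYCOLLECTION(" + String.join(", ", members) + ")";
    }

    /** Collapses a single-member collection to that member; otherwise unchanged. */
    static String optimize(List<String> members) {
        return members.size() == 1 ? members.get(0) : toWkt(members);
    }

    public static void main(String[] args) {
        List<String> single = Collections.singletonList("POINT(1 2)");
        System.out.println("as read:   " + toWkt(single));
        System.out.println("optimized: " + optimize(single));
        System.out.println("unchanged: " + optimize(Arrays.asList("POINT(1 2)", "POINT(3 4)")));
    }
}
```

Keeping the collapse in a separate call is what would let a writer reproduce the input when the caller wants that, at the cost of not optimizing automatically.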
 

* Backwards-compatibility:  The WktShapeParser is basically supposed to be the replacement for Shape Read/Writer.  ShapeReader handles "x y" and "y, x" and the legacy circle syntax, and the legacy rectangle syntax.  I think we've already agreed that Spatial4j isn't going to do that anymore; it'll just do WKT (plus WKT-looking extensions).  We could create a class to contain that syntax, LegacyShapeReader, and then clients (e.g. Solr) could use it.

+1 to dumping the old non-standard behavior.
 

* Longer term I like the idea of reading/writing Shapes to a binary stream.  It wouldn't work quite like WKT Shape parsing is done though... for example reading the binary shape wouldn't by default "validate" (i.e. do error checking) because it was already assumed to be valid when written.  That could easily be flipped with a switch though.  And the data written would be the optimized shapes, so that for example the bounding box wouldn't need to be re-computed.  And the numbers could all be offsets from a corner of the bounding-box, thereby reducing the precision (# bytes) needed to write many shapes.

Could we use the shapefile binary format? It's a little clunky, but I have a parser for it already.
 


Re: [Dev] WKT, Buffered LineStrings, Circles, and back-compat

dsmiley
Hi Chris!

On Mon, Nov 11, 2013 at 4:23 PM, Chris Male <[hidden email]> wrote:

 

* Longer term I like the idea of reading/writing Shapes to a binary stream.  It wouldn't work quite like WKT Shape parsing is done though... for example reading the binary shape wouldn't by default "validate" (i.e. do error checking) because it was already assumed to be valid when written.  That could easily be flipped with a switch though.  And the data written would be the optimized shapes, so that for example the bounding box wouldn't need to be re-computed.  And the numbers could all be offsets from a corner of the bounding-box, thereby reducing the precision (# bytes) needed to write many shapes.

Could we use the shapefile binary format? It's a little clunky, but I have a parser for it already.


Disclaimer: I haven't studied the shapefile binary format.  An obvious benefit of choosing a known binary format is interoperability but the intention I have with a binary I/O stream as articulated above is a purely internal/optimized representation.  One that has formats for shapes that are unlikely in a shapefile/WKB (e.g. circle, buffered-linestring).  Ultimately I plan to use it in Lucene DocValues akin to Spatial Solr Sandbox / LSE's "JtsGeometryStrategy".  That one uses WKB right now.  The field of use for this is definitely not interoperability with other spatial systems -- that clearly calls for a shapefile, I realize.  This isn't an either-or; there's totally room for both.  

~ David


Re: [Dev] WKT, Buffered LineStrings, Circles, and back-compat

Chris Male
Obviously introducing new shapes (which I think is cool btw) presents a challenge when using an established format.  If you need any help with the shapefile format, we use it extensively at Palantir and I did some work with it for ElasticSearch.

