createIndexableFields takes too long

danieldar
Hello,
We have software that indexes data over time, and we ran into an issue with the createIndexableFields method.

We are indexing a large number of items, and creating the fields takes around 50 ms per item.

We created a small test to reproduce this:

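import org.apache.lucene.spatial.SpatialStrategy;
import org.apache.lucene.spatial.prefix.RecursivePrefixTreeStrategy;
import org.apache.lucene.spatial.prefix.tree.GeohashPrefixTree;
import org.junit.Test;

import com.spatial4j.core.context.jts.JtsSpatialContext;
import com.spatial4j.core.io.jts.JtsShapeReadWriter;
import com.spatial4j.core.shape.Shape;

@Test
public void shouldNotTakeSoLong() {
  long time = 0;

  JtsSpatialContext ctx = new JtsSpatialContext(true);
  SpatialStrategy strategy = new RecursivePrefixTreeStrategy(
      new GeohashPrefixTree(ctx, GeohashPrefixTree.getMaxLevelsPossible()), "Poly");
  Shape poly = new JtsShapeReadWriter(ctx).readShape("POLYGON ((51.3601528748574 35.6391295403325, 51.36114826413 35.6391295403325, 51.3611482590921 35.6383205844678, 51.3601528798953 35.6383205844678, 51.3601528748574 35.6391295403325))");

  // Run this several times so that one-off initialization costs don't dominate
  for (int i = 0; i < 10; i++) {
    long start = System.currentTimeMillis();
    strategy.createIndexableFields(poly);
    long end = System.currentTimeMillis();
    time = end - start; // only the last iteration's duration is kept
  }

  if (time > 5) // check the last run
    throw new RuntimeException("Should not take so long but took " + time + " ms");
}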
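The loop above keeps only the last iteration, timed with System.currentTimeMillis(), whose granularity can be coarse on some platforms. As a sketch using only the JDK (the class and method names here are hypothetical, not part of Lucene or Spatial4j), a warm-up phase followed by the median of many System.nanoTime() measurements gives a steadier number:

```java
import java.util.Arrays;

public class TimingSketch {

    /** Runs task `warmup` times untimed, then `runs` times timed; returns the median in milliseconds. */
    static double medianMillis(Runnable task, int warmup, int runs) {
        for (int i = 0; i < warmup; i++) {
            task.run(); // give the JIT a chance to compile hot paths before measuring
        }
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        long median = nanos[runs / 2]; // upper median when runs is even
        return median / 1_000_000.0;
    }

    public static void main(String[] args) {
        // Stand-in workload; in the test above this would be
        // () -> strategy.createIndexableFields(poly)
        Runnable work = () -> {
            double acc = 0;
            for (int i = 0; i < 100_000; i++) acc += Math.sqrt(i);
            if (acc < 0) System.out.println(acc); // keeps the loop from being optimized away
        };
        System.out.printf("median: %.3f ms%n", medianMillis(work, 20, 50));
    }
}
```

The median is less sensitive than a single sample to an occasional GC pause or OS scheduling hiccup.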

Any ideas how to improve this?

With thanks,
Daniel

Re: createIndexableFields takes too long

danieldar
I've attached the test file to make it easier: PolygonIndexingPerfTest.java

Re: [Dev] createIndexableFields takes too long

dsmiley
In reply to this post by danieldar
Hi Daniel,

First of all, this is the Spatial4j list, not the Lucene list, and your question is about the latter's spatial module. The exception would be if it comes down to how fast JtsGeometry.relate(rectangle) performs, which *might* be the slowest part.

The first thing I noticed in your code is this: when you use GeohashPrefixTree.getMaxLevelsPossible(), you are using the maximum level of precision possible, which is ridiculously precise; it's basically the precision a double gets you when used for latitude and longitude. Instead, use something like 11, which gets you better than one-meter precision. The prefix tree has methods to help you calculate a suitable value based on your requirements. For point data, the number of indexed terms grows directly with the number of levels in the prefix tree. And depending on how small "distErrPct" is on PrefixTreeStrategy, the number of levels can affect small non-point shapes too.
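To make the point-data relationship concrete, here is a simplified model (the class and method names are hypothetical): a point indexed in a geohash prefix tree yields roughly one term per level, namely each prefix of its geohash. The encoder below is the standard geohash algorithm; Lucene's actual term encoding differs in detail (for example, it marks leaf cells), so treat this as an illustration only.

```java
import java.util.ArrayList;
import java.util.List;

public class GeohashPointTerms {
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    /** Standard geohash: interleave longitude/latitude bisection bits, starting with longitude. */
    static String encode(double lat, double lon, int length) {
        double minLat = -90, maxLat = 90, minLon = -180, maxLon = 180;
        StringBuilder hash = new StringBuilder(length);
        boolean evenBit = true; // even bits refine longitude, odd bits refine latitude
        int bitCount = 0, value = 0;
        while (hash.length() < length) {
            if (evenBit) {
                double mid = (minLon + maxLon) / 2;
                if (lon >= mid) { value = (value << 1) | 1; minLon = mid; }
                else { value <<= 1; maxLon = mid; }
            } else {
                double mid = (minLat + maxLat) / 2;
                if (lat >= mid) { value = (value << 1) | 1; minLat = mid; }
                else { value <<= 1; maxLat = mid; }
            }
            evenBit = !evenBit;
            if (++bitCount == 5) { // every 5 bits become one base32 character
                hash.append(BASE32.charAt(value));
                bitCount = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    /** Simplified model: one term per level, i.e. every prefix of the point's geohash. */
    static List<String> pointTerms(double lat, double lon, int maxLevels) {
        String full = encode(lat, lon, maxLevels);
        List<String> terms = new ArrayList<>();
        for (int level = 1; level <= maxLevels; level++) {
            terms.add(full.substring(0, level));
        }
        return terms;
    }

    public static void main(String[] args) {
        System.out.println(pointTerms(42.6, -5.6, 5));         // [e, ez, ezs, ezs4, ezs42]
        System.out.println(pointTerms(42.6, -5.6, 11).size()); // 11
    }
}
```

Doubling the number of levels doubles the terms per point in this model, which is why the maximum possible level is costly.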

Also, understand that non-point shapes are sort of pixelated to a precision that depends on the size of the shape, calculated with distErrPct. The "spatial-demo" module in https://github.com/ryantxu/spatial-solr-sandbox lets you generate a KML file of this pretty easily and visualize it in Google Earth. By default, distErrPct is 2.5%, applied to the shape's bounding-box diagonal radius (the distance from its center to a corner). If that is more precision than you need, raise it; maybe 15% is fine for your needs? Only you can know.
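Applied to the polygon from the original test, a rough sketch of that calculation looks like this. It assumes distErr = distErrPct × (distance from the bounding-box center to a corner) and, as a simplification, measures that distance in plain degrees rather than on the sphere as a geodetic context would; DistErrSketch and distErr are hypothetical names.

```java
public class DistErrSketch {

    /**
     * Simplified model: distErrPct times the distance from the bbox center to a corner.
     * Planar distance in degrees; a geodetic SpatialContext would measure this on the sphere.
     */
    static double distErr(double minX, double maxX, double minY, double maxY, double distErrPct) {
        double halfW = (maxX - minX) / 2;
        double halfH = (maxY - minY) / 2;
        return Math.hypot(halfW, halfH) * distErrPct;
    }

    public static void main(String[] args) {
        // Bounding box of the POLYGON in the original test (x = longitude, y = latitude)
        double minX = 51.3601528748574, maxX = 51.36114826413;
        double minY = 35.6383205844678, maxY = 35.6391295403325;
        System.out.printf("2.5%%: %.2e degrees%n", distErr(minX, maxX, minY, maxY, 0.025));
        System.out.printf("15%%:  %.2e degrees%n", distErr(minX, maxX, minY, maxY, 0.15));
    }
}
```

Because the error scales with the shape's size, a tiny polygon like this one gets a tiny distErr (on the order of 1e-5 degrees at 2.5%), which in turn calls for a deep level in the tree.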

But to get to the bottom of it, yeah, it's probably pretty slow for polygons.  I'm sure indexing a rectangle of similar size would have a similar number of resulting indexed terms and yet take a fraction of the time.

If I had time, I'd love to explore where the hot spots are, but I'm swamped with work for months.

~ David

On Jan 28, 2013, at 6:10 AM, danieldar <[hidden email]> wrote:

_______________________________________________
dev mailing list
[hidden email]
http://lists.spatial4j.com/listinfo.cgi/dev-spatial4j.com

Re: [Dev] createIndexableFields takes too long

danieldar
Thank you very much for the quick reply.

It does indeed seem to be an issue with the level.
Is there a table somewhere that lists the precision for each number of levels?

Re: [Dev] createIndexableFields takes too long

dsmiley
There's a table on Wikipedia: http://en.wikipedia.org/wiki/Geohash#Worked_example
I've also got a local Excel sheet that I used to show more data.
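A table like that can also be generated from the geohash definition itself: level n uses 5n bits, with longitude getting ceil(5n/2) of them and latitude floor(5n/2). The sketch below (hypothetical class name) prints cell sizes per level; the meter column uses an assumed equatorial approximation of about 111,320 m per degree, and longitude cells shrink toward the poles.

```java
public class GeohashCellSizes {
    // Rough equatorial conversion; the ground width of a longitude degree shrinks with cos(latitude).
    static final double METERS_PER_DEGREE = 111_320.0;

    /** Level n has 5n bits; longitude gets ceil(5n/2) of them. */
    static double cellWidthDegrees(int level) {
        int lonBits = (5 * level + 1) / 2;
        return 360.0 / Math.pow(2, lonBits);
    }

    /** Latitude gets floor(5n/2) bits. */
    static double cellHeightDegrees(int level) {
        int latBits = (5 * level) / 2;
        return 180.0 / Math.pow(2, latBits);
    }

    public static void main(String[] args) {
        System.out.println("level  width(deg)    height(deg)   ~max side (m, equator)");
        for (int level = 1; level <= 12; level++) {
            double w = cellWidthDegrees(level), h = cellHeightDegrees(level);
            double meters = Math.max(w, h) * METERS_PER_DEGREE;
            System.out.printf("%5d  %-12.3e  %-12.3e  %.3f%n", level, w, h, meters);
        }
    }
}
```

Level 11 comes out at roughly 0.15 m per side, consistent with the "better than a meter" figure mentioned earlier in the thread.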

Instead of using a table, I suggest having Lucene spatial calculate a suitable level based on your requirements: grid.getLevelForDistance(distErr), where grid is your prefix tree.
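As a rough model of what such a calculation involves (a hypothetical stand-alone helper, not Lucene's implementation), one can pick the shortest geohash length whose cells are no wider and no taller than the allowed error, clamped to the tree's maximum level. The example distances are the approximate distErr values for the test polygon at 2.5% and 15%, computed in plain degrees.

```java
public class LevelForDistanceSketch {

    static double cellWidthDegrees(int level) { return 360.0 / Math.pow(2, (5 * level + 1) / 2); }
    static double cellHeightDegrees(int level) { return 180.0 / Math.pow(2, (5 * level) / 2); }

    /**
     * Hypothetical helper: shortest geohash length whose cells fit within distDegrees
     * in both dimensions, clamped to [1, maxLevels]. A distance of 0 means full precision.
     */
    static int levelForDistance(double distDegrees, int maxLevels) {
        if (distDegrees == 0) return maxLevels;
        for (int level = 1; level <= maxLevels; level++) {
            if (cellWidthDegrees(level) <= distDegrees && cellHeightDegrees(level) <= distDegrees) {
                return level;
            }
        }
        return maxLevels;
    }

    public static void main(String[] args) {
        int maxLevels = 11; // the level suggested earlier in the thread
        System.out.println(levelForDistance(1.6e-5, maxLevels)); // prints 10 (~2.5% distErr)
        System.out.println(levelForDistance(9.6e-5, maxLevels)); // prints 9  (~15% distErr)
    }
}
```

In this model, raising distErrPct from 2.5% to 15% saves one level for this polygon; since a polygon is covered by many cells, each level saved can remove a large share of the generated terms.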


On Mon, Jan 28, 2013 at 10:27 AM, danieldar <[hidden email]> wrote:

Re: [Dev] createIndexableFields takes too long

danieldar
Thank you very much.