Re: [Dev] Spatial4n

16 messages

Re: [Dev] Spatial4n

dsmiley@mitre.org
Administrator

On Apr 24, 2012, at 5:16 PM, Itamar Syn-Hershko wrote:

> Hi there,
>
> First, let me congratulate you on the work you did for Spatial search support for Lucene. I was just spending an hour or two reading about it, and it sure smells good.
>
> I'm a core developer for RavenDB, and we are having issues with spatial search, and this is what led me to your project. We use Lucene.NET which currently conforms to the 2.9.4 API. It is my intention to try and port Spatial4j to .NET and plug it as the spatial module for Lucene.NET to hopefully fix the issues we have with it.
>
> Before going ahead with this I was wondering: Are there any 3.x+ specific API calls you are making I'm going to need to work around while porting to the 2.9.4 API? Any other gotchas?

First of all, Spatial4j is supposed to be just about the shapes, not about indexing techniques for the shapes in a system such as Lucene.  That said, the code in Lucene 4's new spatial module originated here but is no longer in the current source tree.  The Solr piece, as you can see, is still here until it is transitioned as well.

For help in transitioning the code to Lucene 2.9.4, you may want to look at a utility class in SOLR-2155 that was used to port Lucene 4's  TermsEnum to Lucene 3 TermEnum:  https://github.com/dsmiley/SOLR-2155/blob/master/src/main/java/solr2155/lucene/TermsEnumCompatibility.java
I'm not sure what's in Lucene 2.9 in this regard.  But this utility class is really simple and it merely helped me port the code -- it's not required for you to port to 2.9.x.  You may just have to trudge through it, one piece at a time.  At least there are tests.

~ David

_______________________________________________
dev mailing list
[hidden email]
http://lists.spatial4j.com/listinfo.cgi/dev-spatial4j.com

Re: [Dev] Spatial4n

Itamar Syn-Hershko
Thanks, going to have a look at this tomorrow.


Re: [Dev] Spatial4n

Itamar Syn-Hershko
In reply to this post by dsmiley@mitre.org
Alright, I got Spatial4j ported to .NET and available at  https://github.com/synhershko/spatial4n . The tests should be ported shortly, but in the meantime I'm working on porting the Lucene integration.


I need your advice on a few issues regarding the Lucene spatial module. I'm asking here because the original authors are more likely to have good answers...

1. ShapeFieldCacheProvider.readShape accepts a BytesRef parameter which is being returned from TermsEnum - this I assume is the new Flex API of Lucene, but unfortunately it is not available in the API I'm working against (2.9.4 or 3.0.3). How do you propose to work around that? I have the payloads API at my disposal, but I'm hoping for something more elegant.

Given this code snippet:

protected override Point ReadShape(BytesRef term)
{
    scanCell = grid.GetNode(term.Bytes, term.Offset, term.Length, scanCell);
    return scanCell.IsLeaf() ? scanCell.GetShape().GetCenter() : null;
}

I only have the old Term object exposing the field name and term text.

2. What is the FunctionValues class? I couldn't find it anywhere, yet it is being returned from various ValueSource implementations.

3. Any other compatibility class you have available? :)

Thanks



Re: [Dev] Spatial4n

dsmiley@mitre.org
Administrator
On Apr 27, 2012, at 7:27 AM, Itamar Syn-Hershko wrote:

> Alright, I got Spatial4j ported to .NET and available at https://github.com/synhershko/spatial4n . The tests should be ported shortly, but in the meantime I'm working on porting the Lucene integration.
>
> I need your advice on a few issues regarding the Lucene spatial module. I'm asking here because the original authors are more likely to have good answers...

Of course; no prob.

> 1. ShapeFieldCacheProvider.readShape accepts a BytesRef parameter which is being returned from TermsEnum - this I assume is the new Flex API of Lucene, but unfortunately it is not available in the API I'm working against (2.9.4 or 3.0.3). How do you propose to work around that? I have the payloads API at my disposal, but I'm hoping for something more elegant.
>
> Given this code snippet:
>
> protected override Point ReadShape(BytesRef term)
> {
>     scanCell = grid.GetNode(term.Bytes, term.Offset, term.Length, scanCell);
>     return scanCell.IsLeaf() ? scanCell.GetShape().GetCenter() : null;
> }
>
> I only have the old Term object exposing the field name and term text.

The old Lucene API for this stuff simply uses a String.  The new API uses a BytesRef, which is a simple construct holding the byte array together with an offset and a length.  So you'll have to modify it accordingly; that part should be relatively easy compared to other issues.
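
To make the String versus byte-slice difference concrete, here is a minimal Java sketch. ByteSlice is a hypothetical stand-in written for illustration, not the real Lucene class, and the sketch assumes the term bytes are UTF-8 encoded text.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical stand-in for a byte-slice term; not the real Lucene class. */
final class ByteSlice {
    final byte[] bytes;   // backing array, possibly shared with other terms
    final int offset;     // index of the first byte belonging to this term
    final int length;     // number of bytes belonging to this term

    ByteSlice(byte[] bytes, int offset, int length) {
        this.bytes = bytes;
        this.offset = offset;
        this.length = length;
    }

    /** Decodes only the referenced window, assuming the bytes are UTF-8. */
    String utf8ToString() {
        return new String(bytes, offset, length, StandardCharsets.UTF_8);
    }

    /** Wraps a term string as a slice that covers the whole encoded array. */
    static ByteSlice fromString(String text) {
        byte[] encoded = text.getBytes(StandardCharsets.UTF_8);
        return new ByteSlice(encoded, 0, encoded.length);
    }
}
```

In a port to the older API, code that reads a window of bytes would instead receive the term text directly, so the offset and length bookkeeping disappears.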

> 2. What is the FunctionValues class? I couldn't find it anywhere, yet it is being returned from various ValueSource implementations.

That stuff gets complicated; it would help to explain the context.  In order to do sorting, all the values being sorted need to be in memory.  Lucene has the FieldCache but it doesn't support multiple values per field.  The ShapeFieldCache is a sub-part of the Lucene spatial module that keeps an in-memory cache of Points.  ValueSource & FunctionValues are a piece of the new Lucene API, originating from Solr, that returns a primitive value (float, int, etc.) for a given document.  A ValueSource/FunctionValues might be a simple view of the indexed data for a document, or it may be computed based on other value sources, such as calculating geospatial distance.  A FunctionQuery bridges the gap between Queries as you know them and ValueSource to provide a score value.  You'll see this hooked up in RecursivePrefixTreeStrategy.makeQuery().

I am not an expert in ValueSource / FunctionValues / FunctionQuery -- I find it quite confusing and so I might be explaining it wrong.  But I am sure that the objective of this subset of the codebase you are looking at is ultimately to support multi-value sort.
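
The idea of a per-document value that is either a plain view of stored data or computed from other sources can be sketched roughly as follows. PerDocValues and PerDocSource are hypothetical, simplified interfaces invented for this illustration; they are not the Lucene ValueSource / FunctionValues signatures, and the distance here is plain planar distance rather than a real geospatial calculation.

```java
/** Hypothetical simplification of a per-document value view. */
interface PerDocValues {
    double doubleVal(int docId);
}

/** Hypothetical simplification of a factory for such views. */
interface PerDocSource {
    PerDocValues getValues();
}

/** A simple view of stored data: one value per document, held in memory. */
final class ArraySource implements PerDocSource {
    private final double[] values;

    ArraySource(double[] values) {
        this.values = values;
    }

    @Override
    public PerDocValues getValues() {
        return docId -> values[docId];
    }
}

/** A source computed from two other sources: planar distance to a query point. */
final class PlanarDistanceSource implements PerDocSource {
    private final PerDocSource xs;
    private final PerDocSource ys;
    private final double queryX;
    private final double queryY;

    PlanarDistanceSource(PerDocSource xs, PerDocSource ys, double queryX, double queryY) {
        this.xs = xs;
        this.ys = ys;
        this.queryX = queryX;
        this.queryY = queryY;
    }

    @Override
    public PerDocValues getValues() {
        PerDocValues x = xs.getValues();
        PerDocValues y = ys.getValues();
        // The value is derived on demand from the wrapped sources.
        return docId -> Math.hypot(x.doubleVal(docId) - queryX, y.doubleVal(docId) - queryY);
    }
}
```

A sort or a scoring query would then ask the composed source for each candidate document's value, which is why everything underneath has to be available in memory.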

> 3. Any other compatibility class you have available? :)

No.

I'm sure this is all quite a bit of work.  Unfortunately for you, the Lucene spatial module code going forward is surely going to be restructured a bit; same for Spatial4j.

~ David



Re: [Dev] Spatial4n

Itamar Syn-Hershko
inline

I have a few more comments and questions, I'll send them separately

On Fri, Apr 27, 2012 at 8:26 PM, Smiley, David W. <[hidden email]> wrote:


> > 1. ShapeFieldCacheProvider.readShape accepts a BytesRef parameter which is being returned from TermsEnum - this I assume is the new Flex API of Lucene, but unfortunately it is not available in the API I'm working against (2.9.4 or 3.0.3). How do you propose to work around that? I have the payloads API at my disposal, but I'm hoping for something more elegant.
> >
> > Given this code snippet:
> >
> > protected override Point ReadShape(BytesRef term)
> > {
> >     scanCell = grid.GetNode(term.Bytes, term.Offset, term.Length, scanCell);
> >     return scanCell.IsLeaf() ? scanCell.GetShape().GetCenter() : null;
> > }
> >
> > I only have the old Term object exposing the field name and term text.
>
> The old Lucene API for this stuff simply uses a String.  The new API uses a BytesRef, which is a simple construct holding the byte array together with an offset and a length.  So you'll have to modify it accordingly; that part should be relatively easy compared to other issues.

So just to confirm: can I safely change this to work off of the term string alone?

I can see all that data is being saved for each Node and is being lazily evaluated, so I guess I can just go ahead and change BytesRef to Term and pass Node the term text directly (Intern'ed perhaps)?
 
> > 2. What is the FunctionValues class? I couldn't find it anywhere, yet it is being returned from various ValueSource implementations.
>
> That stuff gets complicated; it would help to explain the context.  In order to do sorting, all the values being sorted need to be in memory.  Lucene has the FieldCache but it doesn't support multiple values per field.  The ShapeFieldCache is a sub-part of the Lucene spatial module that keeps an in-memory cache of Points.  ValueSource & FunctionValues are a piece of the new Lucene API, originating from Solr, that returns a primitive value (float, int, etc.) for a given document.  A ValueSource/FunctionValues might be a simple view of the indexed data for a document, or it may be computed based on other value sources, such as calculating geospatial distance.  A FunctionQuery bridges the gap between Queries as you know them and ValueSource to provide a score value.  You'll see this hooked up in RecursivePrefixTreeStrategy.makeQuery().
>
> I am not an expert in ValueSource / FunctionValues / FunctionQuery -- I find it quite confusing and so I might be explaining it wrong.  But I am sure that the objective of this subset of the codebase you are looking at is ultimately to support multi-value sort.

Thanks, need to get used to that new API.

I just ported FunctionQuery and confirmed ValueSource exists in Lucene.NET. Any idea where I can find the sources for FunctionValues?
 

> > 3. Any other compatibility class you have available? :)
>
> No.
>
> I'm sure this is all quite a bit of work.  Unfortunately for you, the Lucene spatial module code going forward is surely going to be restructured a bit; same for Spatial4j.

That doesn't really matter. Once we get the spatial module working for .NET, we can hold off on those improvements and restructuring until the .NET version catches up with the latest API.


Re: [Dev] Spatial4n

dsmiley

On Apr 28, 2012, at 7:16 PM, Itamar Syn-Hershko wrote:


> So just to confirm: can I safely change this to work off of the term string alone?

Yes.

> I can see all that data is being saved for each Node and is being lazily evaluated, so I guess I can just go ahead and change BytesRef to Term and pass Node the term text directly (Intern'ed perhaps)?

No intern'ing necessary.

> I just ported FunctionQuery and confirmed ValueSource exists in Lucene.NET. Any idea where I can find the sources for FunctionValues?

No; I dunno.  You might want to consider implementing this, a cache of data to sort on, in a way that makes sense to you.  Although the API is hard to work with (ValueSource, etc.) the task at hand is simple, and perhaps you should not bother trying to conform to Lucene's API if you don't have to for what you're doing in RavenDB.

By the way, I fully admit the cache structure, an array (of size maxDoc) of Lists of Points, is neither space-efficient nor GC-friendly.  It's something simple that works for a use case that was someone else's, not mine.  When I decide to improve this, which is unlikely any time soon, I'd do it quite differently.  Firstly, it'll be segment-based instead of index-based.  And I would do a 2-pass algorithm:  The first pass would build out an array of point counts per document -- an array of integers, the length of which is maxDoc.  I might even index this so I can just read it in instead of full-scanning the terms and incrementing counters per term.  With this information, I can build an array of latitudes and longitudes and an offset lookup array mapping docId to the lat-lon index of the first value.
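
A rough Java sketch of that two-pass layout might look like the following. FlatPointCache and its field names are made up for illustration, and a list of {lat, lon} pairs per document stands in for scanning the index terms. As a small variation on the description above, the counts are folded directly into an offsets array of length maxDoc + 1.

```java
import java.util.List;

/** Sketch of a flat, array-backed point cache; all names are illustrative. */
final class FlatPointCache {
    final int[] offsets;  // offsets[doc] = index of the doc's first point; offsets[maxDoc] = total
    final double[] lats;  // latitudes of all points, grouped by document
    final double[] lons;  // longitudes, parallel to lats

    /** pointsPerDoc.get(doc) holds {lat, lon} pairs for that document. */
    FlatPointCache(List<List<double[]>> pointsPerDoc) {
        int maxDoc = pointsPerDoc.size();

        // Pass 1: count points per document and accumulate start offsets.
        offsets = new int[maxDoc + 1];
        for (int doc = 0; doc < maxDoc; doc++) {
            offsets[doc + 1] = offsets[doc] + pointsPerDoc.get(doc).size();
        }

        // Pass 2: fill the parallel coordinate arrays at the computed offsets.
        lats = new double[offsets[maxDoc]];
        lons = new double[offsets[maxDoc]];
        for (int doc = 0; doc < maxDoc; doc++) {
            int i = offsets[doc];
            for (double[] point : pointsPerDoc.get(doc)) {
                lats[i] = point[0];
                lons[i] = point[1];
                i++;
            }
        }
    }

    /** Number of points cached for the given document. */
    int count(int doc) {
        return offsets[doc + 1] - offsets[doc];
    }
}
```

Compared with an array of Lists of Points, this keeps three primitive arrays no matter how many documents there are, which is the space and GC benefit being described.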

~ David



Re: [Dev] Spatial4n

Itamar Syn-Hershko
Is there, by any chance, a previous version of this source code that worked with the 3.x API (when there was no FunctionValues class)? If so, was it making the same assumptions or was it taking a different approach?


Re: [Dev] Spatial4n

Chris Male
FunctionValues was known as DocValues in previous versions of Lucene.  It was renamed to FunctionValues in trunk to differentiate it from Column Stride Values, which are also known as DocValues (and were renamed to IndexDocValues).  Confusing, I know.


--
Chris Male | Software Developer | DutchWorks | www.dutchworks.nl


Re: [Dev] Spatial4n

Itamar Syn-Hershko
Yeah I figured that out, thanks :)


Re: [Dev] Spatial4n

Itamar Syn-Hershko
In reply to this post by Itamar Syn-Hershko
I ended up doing some more hacks to try and keep as much as possible of the original API. I'm now porting the tests so I can see if it is actually working.

On another matter - I couldn't find SimpleSpatialContext and SimpleSpatialContextFactory anywhere in the sources, only in the original LSP homepage, although they are referenced by the tests. I'll be porting them from there, but thought you should know.

On Mon, Apr 30, 2012 at 12:27 PM, Itamar Syn-Hershko <[hidden email]> wrote:
Is there, by any chance, a previous version of this source code that worked with the 3.x API (when there was no FunctionValues class)? If so, was it making the same assumptions or was it taking a different approach?


On Sun, Apr 29, <a href="tel:2012" value="+9722012" target="_blank">2012 at 7:22 AM, David Smiley <[hidden email]> wrote:

On Apr 28, <a href="tel:2012" value="+9722012" target="_blank">2012, at 7:16 PM, Itamar Syn-Hershko wrote:

inline

I have a few more comments and questions, I'll send them separately

On Fri, Apr 27, <a href="tel:2012" value="+9722012" target="_blank">2012 at 8:26 PM, Smiley, David W. <[hidden email]> wrote:


1. ShapeFieldCacheProvider.readShape accepts a ByteRef parameter which is being returned from TermEnum - this I assume is the new Flex API of Lucene, but unfortunately it is not available in the API I'm working against (2.9.4 or 3.0.3). How do you propose to work around that? I have the payloads API at my disposal, but I'm hoping for something more elegant

Given this code snippet:

protected override Point ReadShape(BytesRef term)
{
scanCell = grid.GetNode(term.Bytes, term.Offset, term.Length, scanCell);
return scanCell.IsLeaf() ? scanCell.GetShape().GetCenter() : null;
}

I only have the old Term object exposing the field name and term text.

The old Lucene API for this stuff is to simply use a String.  The new API uses a ByteRef, which is a simple construct holding the byte array an offsets.  So you'll have to modify it accordingly; that part should be relatively easy compared to other issues.

So just to confirm: can I safely change this to work off of the term string alone?

Yes.

I can see all that data is being saved for each Node and is being lazily evaluated, so I guess I can just go ahead and change ByteRef to Term and pass Node the term text directly (Intern'ed perhaps)?

No intern'ing necessary.

I just ported FunctionQuery and confirmed ValueSource exist in Lucene.NET. Any idea where I can find the sources for FunctionValues ? 

No; I dunno.  You might want to consider implementing this, a cache of data to sort on, in a way that makes sense to you.  Although the API is hard to work with (ValueSource, etc.) the task at hand is simple, and perhaps you should not bother trying to conform to Lucene's API if you don't have to for what you're doing in RavenDB.

By the way, I fully admit the cache structure, an array (of size maxDoc) of Lists of Points, is hardly space-efficient nor GC friendly.  It's something simple that works for a use case that was someone else's, not mine.  When I decide to improve this, which is unlikely any time soon, I'd do it quite differently.  Firstly, it'll be segment based instead of index based.  And I would do a 2-pass algorithm:  The first pass would build out an array of point-lengths per document -- an array of integers, the length of which is maxDoc.  I might even index this so I can just read it in instead of full-scanning the terms and incrementing counters per term.  With this information, I can build an array of latitudes and longitudes and an offset lookup array mapping docId to the lat-lon index of the first value.

~ David





Re: [Dev] Spatial4n

dsmiley
The Simple* classes got refactored into their base classes, so it's now just SpatialContext and SpatialContextFactory.

On Mon, May 7, 2012 at 2:06 AM, Itamar Syn-Hershko <[hidden email]> wrote:
I ended up doing some more hacks to try to keep as much of the original API as possible. I'm now porting the tests so I can see whether it actually works.


On another matter: I couldn't find SimpleSpatialContext and SimpleSpatialContextFactory anywhere in the sources, only on the original LSP homepage, although they are referenced by the tests. I'll be porting them from there, but thought you should know.

On Mon, Apr 30, 2012 at 12:27 PM, Itamar Syn-Hershko <[hidden email]> wrote:
Is there, by any chance, a previous version of this source code that worked with the 3.x API (when there was no FunctionValues class)? If so, was it making the same assumptions or was it taking a different approach?



Re: [Dev] Spatial4n

Itamar Syn-Hershko
OK, I just finished porting everything. One of the 5 tests in the Lucene integration part is failing. I'm checking now to see what's going on exactly, but basically I'm pretty much in the dark.

I was wondering if you guys could add more tests, at least sanity checks. The previous spatial contrib has some fairly good tests, and this new implementation has very sparse tests.

On Tue, May 8, 2012 at 6:16 AM, [hidden email] <[hidden email]> wrote:
The Simple* classes got refactored into their base classes, so it's now just SpatialContext and SpatialContextFactory.



Re: [Dev] Spatial4n

dsmiley
Itamar,

I'm glad the porting is going well.

RE tests, there are more tests you probably haven't seen at the Solr level because I found testing there easier, but I think I should have done those at the Lucene spatial level.  Still, of course, it could use more tests.  If something in particular comes to mind, then let me know.

~ David

On May 10, 2012, at 9:34 PM, Itamar Syn-Hershko wrote:

OK, I just finished porting everything. One of the 5 tests in the Lucene integration part is failing. I'm checking now to see what's going on exactly, but basically I'm pretty much in the dark.

I was wondering if you guys could add more tests, at least sanity checks. The previous spatial contrib has some fairly good tests, and this new implementation has very sparse tests.


Re: [Dev] Spatial4n

Itamar Syn-Hershko
OK, I'm a bit confused.

I assume you are referring to the spatial4j-solr folder, which I indeed haven't ported. Are the files there needed to achieve spatial search with Lucene?

Any chance you'll be porting those Solr tests to the lucene-solr spatial module as Lucene tests?

Tests that do come to mind are the old Spatial tests from the previous spatial module - they were quite handy in helping to figure out how to use the spatial module correctly, and also tested some real scenarios. I have a few other real scenarios that I'm planning on adding as well.

On Fri, May 11, 2012 at 6:20 AM, David Smiley <[hidden email]> wrote:
Itamar,

I'm glad the porting is going well.

RE tests, there are more tests you probably haven't seen at the Solr level because I found testing there easier, but I think I should have done those at the Lucene spatial level.  Still, of course, it could use more tests.  If something in particular comes to mind, then let me know.

~ David


Re: [Dev] Spatial4n

dsmiley


On Fri, May 11, 2012 at 7:01 AM, Itamar Syn-Hershko <[hidden email]> wrote:
OK, I'm a bit confused.

I assume you are referring to the spatial4j-solr folder, which I indeed haven't ported. Are the files there needed to achieve spatial search with Lucene?

spatial4j-solr is indeed what I am referring to.  Those files are definitely *not* needed to achieve spatial search with Lucene.  It consists of fairly straightforward Solr field-type adapters.


Any chance you'll be porting those Solr tests to the lucene-solr spatial module as Lucene tests?

Well, there's a strong chance I'll improve the tests; I'm not yet sure if the low-hanging fruit is to port tests or to create new ones.  Before either happens, next on the agenda is working towards getting spatial4j-solr into Solr trunk.  Some refactorings that Ryan has done need to be synchronized first, though.
 
Tests that do come to mind are the old Spatial tests from the previous spatial module - they were quite handy in helping to figure out how to use the spatial module correctly, and also tested some real scenarios. I have a few other real scenarios that I'm planning on adding as well.

When I next look at improving the tests, I will first look at these old spatial4j-contrib tests to see if they are worth emulating.
 


Re: [Dev] Spatial4n

Itamar Syn-Hershko
I finished porting the module and all tests, and am now verifying it against some real-world scenarios we had trouble with under the previous implementation. So far it looks good, but we do have one failing test, which I'll send in a new thread.

Thanks for your help; hopefully you'll be able to add more thorough tests soon.

On Fri, May 11, 2012 at 5:01 PM, [hidden email] <[hidden email]> wrote:


On Fri, May 11, 2012 at 7:01 AM, Itamar Syn-Hershko <[hidden email]> wrote:
OK, I'm a bit confused.

I assume you are referring to the spatial4j-solr folder, which I indeed haven't ported. Are the files there needed to achieve spatial search with Lucene?

spatial4j-solr is indeed what I am referring to.  Those files are definitely *not* needed to achieve spatial search with Lucene.  It consists of fairly straightforward Solr field-type adapters.


Any chance you'll be porting those Solr tests to the lucene-solr spatial module as Lucene tests?

Well, there's a strong chance I'll improve the tests; I'm not yet sure if the low-hanging fruit is to port tests or to create new ones.  Before either happens, next on the agenda is working towards getting spatial4j-solr into Solr trunk.  Some refactorings that Ryan has done need to be synchronized first, though.
 
Tests that do come to mind are the old Spatial tests from the previous spatial module - they were quite handy in helping to figure out how to use the spatial module correctly, and also tested some real scenarios. I have a few other real scenarios that I'm planning on adding as well.

When I next look at improving the tests, I will first look at these old spatial4j-contrib tests to see if they are worth emulating.
 
