Re: Questions about the SIFT algorithm
Message-ID:<6dd494bd-7dc0-4868-93e5-8b361f140049@l77g2000hse.googlegroups.com>
Subject: Re: Questions about the SIFT algorithm
Date:Thu, 23 Oct 2008 10:25:02 +0100
Hi.

On Oct 22, 1:37 am, elite_2...@126.com wrote:
> I am a postgraduate student, and working on research of SIFT
> algorithm, but there are some concepts which are unclear to me...
>
> How to make full understanding of scale invariant in the name of
> "SIFT".

It basically means that the algorithm is invariant to changes in object scale (size). In other words, you can take the original image, resize it, and still match SIFT descriptors generated from the resized picture to descriptors generated from the original.

> "we choose to divide each octave of scale space into an integer
> number, s, of intervals. we must produce s+3 images in the stack of
> blurred images for each octave, so that final extrema detection covers
> a complete octave", in this sentence, why must produce s+3 images, and
> how to understand "final extrema detection covers a complete octave".
> would you give me some ideas or advice, many thanks in advance.

Now, I may be wrong here but this is my understanding. What you do in each octave is 1) blur the image with increasing sigmas, 2) take the difference of Gaussians (DoG) of adjacent blurred layers and then 3) compare each DoG layer to its two adjacent DoG layers to detect local extrema.

I don't know exactly what the purpose of that s is, but I'd say that if you choose s=0 you'll end up with 3 blurred images, which lets you produce 2 DoGs. That isn't enough to do a proper extrema detection: you need to compare a pixel in one layer to its two adjacent layers, and with 2 DoGs you don't have a layer which has layers on "each side". If s=1 you get 4 blurred images, enabling 3 DoGs, which means that you get 1 "proper" layer to perform extrema detection on (with adjacent layers that are both less and more blurred).

This is my understanding, but just for full disclosure, I've never implemented SIFT myself so I may be wrong about this. I just hope that it gives you some food for thought at least and doesn't confuse you even more.

Best regards,
Stefan.
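The counting argument above can be sketched in a few lines of Python. This is only an illustration of my reading of the quoted sentence: `layer_counts` is a made-up helper name, not part of any SIFT library, and it assumes that each DoG is the difference of two adjacent blurred images and that extrema are only searched in DoG layers that have a neighbour on both sides.

```python
def layer_counts(s):
    """Return (blurred, dogs, extrema_layers) for s intervals per octave.

    Hypothetical helper for illustration only; not from any SIFT library.
    """
    blurred = s + 3                     # images in the blurred stack
    dogs = blurred - 1                  # one DoG per pair of adjacent blurred images
    extrema_layers = max(dogs - 2, 0)   # first and last DoG lack one neighbour
    return blurred, dogs, extrema_layers


for s in range(4):
    blurred, dogs, layers = layer_counts(s)
    print(f"s={s}: {blurred} blurred, {dogs} DoGs, {layers} extrema layers")
```

Under these assumptions, s+3 blurred images leave exactly s layers on which extrema detection is possible, which matches the s=0 and s=1 cases described above.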
Message-ID:<4204b25f-c2ed-4b8f-a436-c2afab4c3059@g17g2000prg.googlegroups.com>
Subject: Re: Questions about the SIFT algorithm
Date:Fri, 24 Oct 2008 10:53:54 +0100
Hello

On Oct 23, 5:25 pm, Stefán Freyr wrote:
> I don't know exactly what the purpose of that s is but I'd say that if
> you choose s=0 you'll end up with 3 blurred images which would enable
> you to produce 2 DoGs which won't be enough to do a proper extrema
> detection (you need to compare a pixel in one layer to its two
> adjacent layers and with 2 DoGs you don't have a layer which has
> layers on "each side"). If s=1 you get 4 blurred images enabling 3
> DoGs which means that you are getting 1 "proper" layer to perform
> extrema detection on (with adjacent layers that are both less and more
> blurred).

I am very glad to receive your response. After reading your words, I now have a better understanding of the "s+3": it means that if we need to generate 3 DoG levels, we need 3+1 smoothed images. I am also reading some literature on scale-space theory (the work of Tony Lindeberg) so that I can get a clear idea of "SIFT".

Thanks a lot once more,
your new friend
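As a rough check of the "covers a complete octave" part of the question, here is a small Python sketch. It assumes the convention that adjacent blurred images in an octave differ by a constant factor k = 2^(1/s); the function name `extrema_scales`, the labelling of each DoG layer by its lower sigma, and the base value sigma0 = 1.6 are illustrative choices, not taken from this thread.

```python
def extrema_scales(s, sigma0=1.6):
    """Scales of the DoG layers that have a neighbour on both sides.

    Hypothetical helper for illustration; requires s >= 1.
    Assumes blurred image i has sigma0 * k**i with k = 2 ** (1 / s),
    and DoG layer i is built from blurred images i and i + 1.
    """
    k = 2.0 ** (1.0 / s)
    blurred = [sigma0 * k ** i for i in range(s + 3)]   # s + 3 blurred images
    dogs = blurred[:-1]                                 # s + 2 DoGs, labelled by lower sigma
    return dogs[1:-1]                                   # s layers usable for extrema detection


scales = extrema_scales(3)
print([round(x, 3) for x in scales])
```

With these assumptions the s usable layers each cover one interval of ratio k, and the last one sits at 2 * sigma0, so together they span a factor of k^s = 2, that is, one full octave.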


