Because if you follow the line of reasoning in my previous messages and the rest of this one, I'm effectively advising that you implement Lengyel's algorithm on the CPU.
>
- I highly suggest you stick to float internally [...]
I still think `float' is a better option for generating the SDF, especially for lower-resolution glyphs, where fixed-point produces somewhat straight lines instead of smooth curves (which is not noticeable at a brief glance). But my concern is that FreeType doesn't use floats at the moment, and I don't think it would be a good idea to add float support just for the sake of this project.
It's a tradeoff between one thing and the other, and I can't decide which would be best considering the current state of the library. I would say I'm inclined toward fixed-point integers simply because FreeType doesn't use floats.
Also, why doesn't FreeType use floats? Is it just because of platforms that lack a floating-point type, or are there more reasons? This question has been on my mind for quite some time.
I addressed that in my reply to Werner a few minutes ago.
>
Have you measured performance? I'm fairly sure the float can be made both more robust and faster.
I just did; here are the results with compiler optimizations turned on, timed with the chrono library:
A) Line segment: ~0.026 microseconds
B) Conic Bézier: ~0.168 microseconds
C) Cubic Bézier: ~0.469 microseconds
[I have also attached the gprof output in case you are interested. Note
that the gprof output is without any compiler optimization]
To compare it to fixed-point check here:
So basically you have already shown that fixed-point is slower and introduces artefacts. To me that was unnecessary, as both outcomes are well understood by anyone who has worked in this field. But now that you have, I strongly advise you to stick with float.
> - Your Newton-Raphson is solid and your performance numbers look amazing. I think you should stick with this approach instead of subdividing.
> As was suggested by others, do experiment with Raphson on your quadratic as well.
Yes, I will try Newton-Raphson on the quadratic, although I don't think it will beat solving the cubic equation directly. And I will stick with the current approach until I find something even faster.
I take that back for now. It might be workable, but stick with what you have; we can find other ways to make your cubic solving faster. More on my reversal regarding Raphson below.
>
* Currently you abandon as soon as the factor falls outside of [0,1]. That's wrong. The factor might go out but come back in on the next iteration.
I was doing that initially, but I saw that when the factor goes `out' and comes back `in', it has the same value as the previous `in' value. This causes two extra passes of a fairly expensive iteration, so I decided to break as soon as it leaves the range. But, yeah, I will check whether that is wrong and decide accordingly after further testing.
I thought about that a lot and did some research. This is a good summary:
Basically, if the solution is pulled out of the [0,1] range, you can clamp it. If it keeps getting pulled out again right after clamping, you can discard it. That post also suggests that what you were seeing was caused by your fixed-point limitations. Were you testing with float or fixed? This is the part I'm referring to:
"Occasionally, it is helpful to remember that Newton's method exhibits one sided convergence in the limit, i.e. if the root is a simple, then the residuals 𝑓(𝑥𝑛)f(xn) eventually have the same sign. Deviation from this behavior indicates that you are pushing against the limitations imposed by floating point arithmetic."
That's highly likely, given that you are dividing by the derivative without any near-zero check on it.
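To make the clamp-then-discard strategy concrete, here is a minimal sketch in C (all names are mine, not from FreeType or your branch): one guarded Newton-Raphson step that bails out on a near-zero derivative, clamps an iterate that escapes [0,1] once, and discards it if it escapes again right after clamping.

```c
#include <math.h>

/* One guarded Newton-Raphson step on f(t) = 0 with t constrained
 * to [0, 1].  `f' is the derivative of the squared-distance
 * polynomial, `df' its derivative.  Returns 0 on failure: the
 * derivative is too close to zero, or the iterate escaped [0,1]
 * immediately after having been clamped (so we discard it). */
static int
newton_step( double (*f)( double ), double (*df)( double ),
             double *t, int *clamped )
{
  double  d = df( *t );

  if ( fabs( d ) < 1e-12 )      /* near-zero derivative: bail out */
    return 0;

  *t -= f( *t ) / d;

  if ( *t < 0.0 || *t > 1.0 )   /* pulled outside [0,1]           */
  {
    if ( *clamped )             /* escaped again: discard         */
      return 0;
    *t       = *t < 0.0 ? 0.0 : 1.0;  /* clamp once               */
    *clamped = 1;
  }
  else
    *clamped = 0;

  return 1;
}
```

The caller runs this in a small loop and treats a 0 return as "no usable root from this starting point".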
>
* I've convinced myself, but probably can't prove, that MAX_DIVISIONS=2 is enough for always finding the closest point. That would speed up significantly.
It would certainly speed up the process a lot, but with the current implementation, here are the results:
Maybe if I don't break out when the factor falls outside [0.0, 1.0], MAX_DIVISIONS=2 might work. But currently there are issues when using MAX_DIVISIONS=2.
I take that back. Moreover, I'm convinced that with 4 you still get very noticeable artefacts even with correctly constructed fonts.
When I was thinking it through last time, I mistakenly assumed there are three roots, whereas there are five. Moreover, the positions of the roots can be arbitrarily close to each other, and there is no fast criterion for judging the quality of the Raphson output or for knowing when to terminate. As such, I believe the method cannot be made both correct (as in not producing artefacts) and fast at the same time. So I return to my earlier suggestion: convert cubics to a series of quadratics. Note that this conversion needs to be done only once, not per pixel.
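For what it's worth, the five comes from a simple degree count (standard Bézier facts, nothing specific to your code):

```latex
% Squared distance from a sample point p to a cubic Bezier B(t):
D(t) = \lVert B(t) - p \rVert^2 , \qquad
\deg B = 3 \;\Rightarrow\; \deg D = 6 .
% Its stationary points satisfy
D'(t) = 2 \,\bigl( B(t) - p \bigr) \cdot B'(t) = 0 ,
% a polynomial of degree 3 + 2 = 5, hence up to five real roots.
% For a quadratic Bezier (deg B = 2) the same product has degree 3:
% the cubic equation you are already solving.
```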
For the conversion I suggest lifting the core ideas from cu2qu, ideally with the cusp issue handled:
I'm happy to discuss that. But I also recognize that the algorithm can change independently of integrating the work into FreeType.
So if you convert cubics to quadratics and divide the bitmap into coarse regions for speedup, that is basically Lengyel's approach at a high level, although he rasterizes instead of generating an SDF. For such a coarse grid I suggest 8x8 samples; that lends itself well to SIMD and feels like the right size to me.
[Note that you can do your Newton-Raphson on 8 t's together with SIMD as well. But I don't think you should explore that.]
> - Your quadratic code always adds t=0 and t=1 as roots. You don't need to. Only clamp the real roots to [0,1] range and remove duplicates.
That is done to also check the endpoints. There are cases where:
A) the cubic equation has no real roots (coefficient of x^3 = 0)
B) all real roots lie within the range [0.0, 1.0]
In these cases it becomes necessary to check the endpoints of the Bézier curve, hence t = 0 and t = 1.
I understand why you are doing it, but I believe it's not necessary. Let me try again: see if you can prove the following for yourself. I have proved it and can share. (If only we had a mathematician around ;))
Lemma: if the closest point on the curve over [0,1] is the endpoint at t=1 and the cubic equation has no root at t=1, then the cubic equation must have at least one real root at some t > 1. Similarly, if the closest point is the endpoint at t=0 and the cubic equation has no root at t=0, then it must have at least one real root at some t < 0.
As such, you just need to compute all real roots, clamp them to [0,1], and remove duplicates.
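A sketch of that post-processing step (my own code, not from your branch):

```c
#include <math.h>

/* Clamp candidate roots to [0, 1] and drop duplicates (within `eps').
 * Returns the number of distinct clamped candidates written to `out',
 * which must have room for `n_roots' values. */
static int
clamp_and_dedup( const double  *roots, int  n_roots,
                 double  *out, double  eps )
{
  int  i, j, count = 0;

  for ( i = 0; i < n_roots; i++ )
  {
    double  t   = roots[i];
    int     dup = 0;

    if ( t < 0.0 )
      t = 0.0;
    else if ( t > 1.0 )
      t = 1.0;

    for ( j = 0; j < count; j++ )
      if ( fabs( out[j] - t ) < eps )
      {
        dup = 1;
        break;
      }

    if ( !dup )
      out[count++] = t;
  }
  return count;
}
```

Note that roots clamped from both sides of 1 (or 0) collapse into one candidate, which is exactly the deduplication the lemma relies on.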
> - I still think an option to get A8 output is desired, since that's enough for a respectable rendering in most
> situations and is 50% more efficient than a 16-bit. Also, most GPU hardware doesn't support 16-bit textures.
Sure, that won't be a problem; I can add two output formats.
Isn't `DXGI_FORMAT_R16_SNORM' a 16-bit texture? I don't know about hardware, but in shaders I think you can easily get 16-bit and even 32-bit textures these days. Please correct me if I'm wrong.
Emphasis on "these days". If you're designing for what's available "these days", then banning float calculation makes zero sense. I take back "most GPU hardware" and that point; the memory consideration stands.
> * The final normalizing you are doing based on max-distance is nonsensical. Makes the SDF unusable for rasterization.
In this they use normalized values, but now I realize that the rendering API (OpenGL, DirectX) can automatically normalize the values. So, just to fit into 2.14 fixed point, I am normalizing the values, which can easily be changed by a flag or removed entirely.
Right. Normalizing by a fixed factor is fine. Normalizing each glyph differently makes it unusable for anything.
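For illustration, a fixed-spread mapping to A8 would look something like this (the spread value is my assumption; the point is that it is the same constant for every glyph, so distances stay comparable across glyphs):

```c
/* Map a signed distance to an unsigned 8-bit value using a fixed
 * spread shared by all glyphs; 128 corresponds to the outline. */
static unsigned char
sdf_to_a8( double  dist, double  spread )
{
  double  v = dist / spread;    /* in [-1, 1] while |dist| <= spread */

  if ( v < -1.0 )
    v = -1.0;
  else if ( v > 1.0 )
    v = 1.0;

  return (unsigned char)( ( v * 0.5 + 0.5 ) * 255.0 + 0.5 );
}
```

A shader can then reconstruct the signed distance from the texel with the same constant spread, which per-glyph normalization would make impossible.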
Cheers,