Compare commits

29 Commits

Author SHA1 Message Date
Recep Aslantas
b4efcefe7f drop glm__memcpy, glm__memset and glm__memzero
* implement mat3_zero and mat4_zero functions
* copy matrix items manually in ucopy functions
2019-02-13 10:14:53 +03:00
Recep Aslantas
0d2e5a996a docs: add SSE3 and SSE4 dot product options 2019-02-13 10:13:06 +03:00
Recep Aslantas
2b1eece9ac mat3: add rmc for mat3 2019-02-13 10:12:49 +03:00
Recep Aslantas
c8b8f4f6f0 now working on v0.5.3 2019-02-13 10:00:57 +03:00
Recep Aslantas
1a34ffcf4b Merge pull request #72 from recp/simd-update
SIMD update (NEON, SSE3, SSE4) + Features
2019-02-03 17:18:54 +03:00
Recep Aslantas
af088a1059 Merge branch 'master' into simd-update 2019-02-02 15:58:57 +03:00
Recep Aslantas
18f06743ed build: make automake build slient (less-verbose) 2019-02-02 15:54:09 +03:00
Recep Aslantas
60cfc87009 remove bezier_solve for now 2019-02-02 15:30:05 +03:00
Recep Aslantas
4e5879497e update docs 2019-02-02 15:29:48 +03:00
Recep Aslantas
7848dda1dd curve: cubic hermite intrpolation 2019-01-29 22:17:44 +03:00
Recep Aslantas
1e121a4855 mat4: fix rmc multiplication 2019-01-29 22:11:04 +03:00
Recep Aslantas
0f223db7d3 Merge pull request #74 from ccworld1000/patch-1
Update cglm.podspec
2019-01-29 14:48:46 +03:00
CC
a4e2c39c1d Update cglm.podspec
update pod version
2019-01-29 16:54:02 +08:00
Recep Aslantas
c22231f296 curve: de casteljau implementation for solving cubic bezier 2019-01-28 15:52:42 +03:00
Recep Aslantas
730cb1e9f7 add bezier helpers 2019-01-28 15:32:24 +03:00
Recep Aslantas
b0e48a56ca test: rename test_rand_angle() to test_rand() 2019-01-28 15:31:03 +03:00
Recep Aslantas
11a6e4471e fix vec4_cubic function 2019-01-28 14:26:02 +03:00
Recep Aslantas
60cb4beb0a curve: helper for calculate result of SMC multiplication 2019-01-26 18:06:26 +03:00
Recep Aslantas
32ddf49756 mat4: helper for row * matrix * column 2019-01-26 18:05:05 +03:00
Recep Aslantas
807d5589b4 call: add missing end guard to call headers 2019-01-26 16:05:11 +03:00
Recep Aslantas
59b9e54879 vec4: helper to fill vec4 as [S^3, S^2, S, 1] 2019-01-26 15:54:10 +03:00
Recep Aslantas
fc7f958167 simd: remove re-load in SSE4 and SSE3 2019-01-25 21:56:17 +03:00
Recep Aslantas
31bb303c55 simd: organise SIMD-functions
* optimize dot product
2019-01-24 10:17:49 +03:00
Recep Aslantas
be6aa9a89a simd: optimize some mat4 operations with neon 2019-01-22 09:39:57 +03:00
Recep Aslantas
f65f1d491b simd: optimize vec4_distance with sse and neon 2019-01-22 09:23:51 +03:00
Recep Aslantas
f0c2a2984e simd, neon: add missing neon support for vec4 2019-01-22 09:05:38 +03:00
Recep Aslantas
b117f3bf80 neon: add neon support for most vec4 operations 2019-01-21 23:14:04 +03:00
Recep Aslantas
07e60bd098 cam: extend frustum's far distance helper (#71)
* this will help to implement zoom easily
2019-01-16 14:59:58 +03:00
Recep Aslantas
e3d3cd8ab8 now working on v0.5.2 2019-01-15 12:08:54 +03:00
50 changed files with 1169 additions and 174 deletions

1
.gitignore vendored

@@ -69,3 +69,4 @@ win/cglm_test_*
win/x64 win/x64
win/x85 win/x85
win/Debug win/Debug
cglm-test-ios*


@@ -52,3 +52,12 @@ https://gamedev.stackexchange.com/questions/28395/rotating-vector3-by-a-quaterni
9. Sphere AABB intersect 9. Sphere AABB intersect
https://github.com/erich666/GraphicsGems/blob/master/gems/BoxSphere.c https://github.com/erich666/GraphicsGems/blob/master/gems/BoxSphere.c
10. Horizontal add
https://stackoverflow.com/questions/6996764/fastest-way-to-do-horizontal-float-vector-sum-on-x86
11. de casteljau implementation and comments
https://forums.khronos.org/showthread.php/10264-Animations-in-1-4-1-release-notes-revision-A/page2?highlight=bezier
https://forums.khronos.org/showthread.php/10644-Animation-Bezier-interpolation
https://forums.khronos.org/showthread.php/10387-2D-Tangents-in-Bezier-Splines?p=34164&viewfull=1#post34164
https://forums.khronos.org/showthread.php/10651-Animation-TCB-Spline-Interpolation-in-COLLADA?highlight=bezier


@@ -82,7 +82,11 @@ Currently *cglm* uses default clip space configuration (-1, 1) for camera functi
- inline or pre-compiled function call - inline or pre-compiled function call
- frustum (extract view frustum planes, corners...) - frustum (extract view frustum planes, corners...)
- bounding box (AABB in Frustum (culling), crop, merge...) - bounding box (AABB in Frustum (culling), crop, merge...)
- bounding sphere
- project, unproject - project, unproject
- easing functions
- curves
- curve interpolation helpers (S*M*C, deCasteljau...)
- and other... - and other...
<hr /> <hr />


@@ -2,7 +2,7 @@ Pod::Spec.new do |s|
# Description # Description
s.name = "cglm" s.name = "cglm"
s.version = "0.4.6" s.version = "0.5.1"
s.summary = "📽 Optimized OpenGL/Graphics Math (glm) for C" s.summary = "📽 Optimized OpenGL/Graphics Math (glm) for C"
s.description = <<-DESC s.description = <<-DESC
cglm is math library for graphics programming for C. It is similar to original glm but it is written for C instead of C++ (you can use here too). See the documentation or README for all features. cglm is math library for graphics programming for C. It is similar to original glm but it is written for C instead of C++ (you can use here too). See the documentation or README for all features.


@@ -7,7 +7,7 @@
#***************************************************************************** #*****************************************************************************
AC_PREREQ([2.69]) AC_PREREQ([2.69])
AC_INIT([cglm], [0.5.1], [info@recp.me]) AC_INIT([cglm], [0.5.3], [info@recp.me])
AM_INIT_AUTOMAKE([-Wall -Werror foreign subdir-objects]) AM_INIT_AUTOMAKE([-Wall -Werror foreign subdir-objects])
AC_CONFIG_MACRO_DIR([m4]) AC_CONFIG_MACRO_DIR([m4])
@@ -29,6 +29,7 @@ LT_INIT
# Checks for libraries. # Checks for libraries.
AC_CHECK_LIB([m], [floor]) AC_CHECK_LIB([m], [floor])
m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES([yes])])
AC_SYS_LARGEFILE AC_SYS_LARGEFILE
# Checks for header files. # Checks for header files.


@@ -46,3 +46,5 @@ Follow the :doc:`build` documentation for this
io io
call call
sphere sphere
curve
bezier

89
docs/source/bezier.rst Normal file

@@ -0,0 +1,89 @@
.. default-domain:: C
Bezier
================================================================================
Header: cglm/bezier.h
Common helpers for cubic bezier and similar curves.
Table of contents (click to go):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Functions:
1. :c:func:`glm_bezier`
2. :c:func:`glm_hermite`
3. :c:func:`glm_decasteljau`
Functions documentation
~~~~~~~~~~~~~~~~~~~~~~~
.. c:function:: float glm_bezier(float s, float p0, float c0, float c1, float p1)
| cubic bezier interpolation
| formula:
.. code-block:: text
B(s) = P0*(1-s)^3 + 3*C0*s*(1-s)^2 + 3*C1*s^2*(1-s) + P1*s^3
| similar result using matrix:
.. code-block:: text
B(s) = glm_smc(s, GLM_BEZIER_MAT, (vec4){p0, c0, c1, p1})
| glm_eq(glm_smc(...), glm_bezier(...)) should return TRUE
Parameters:
| *[in]* **s** parameter between 0 and 1
| *[in]* **p0** begin point
| *[in]* **c0** control point 1
| *[in]* **c1** control point 2
| *[in]* **p1** end point
Returns:
B(s)
.. c:function:: float glm_hermite(float s, float p0, float t0, float t1, float p1)
| cubic hermite interpolation
| formula:
.. code-block:: text
H(s) = P0*(2*s^3 - 3*s^2 + 1) + T0*(s^3 - 2*s^2 + s) + P1*(-2*s^3 + 3*s^2) + T1*(s^3 - s^2)
| similar result using matrix:
.. code-block:: text
H(s) = glm_smc(s, GLM_HERMITE_MAT, (vec4){p0, p1, t0, t1})
| glm_eq(glm_smc(...), glm_hermite(...)) should return TRUE
Parameters:
| *[in]* **s** parameter between 0 and 1
| *[in]* **p0** begin point
| *[in]* **t0** tangent 1
| *[in]* **t1** tangent 2
| *[in]* **p1** end point
Returns:
H(s)
.. c:function:: float glm_decasteljau(float prm, float p0, float c0, float c1, float p1)
| iterative way to solve cubic equation
Parameters:
| *[in]* **prm** parameter between 0 and 1
| *[in]* **p0** begin point
| *[in]* **c0** control point 1
| *[in]* **c1** control point 2
| *[in]* **p1** end point
Returns:
parameter to use in cubic equation
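
The equivalence claimed in the glm_bezier doc (direct formula vs. S*M*C) can be sketched without cglm. This is an illustrative sketch only: `bezier_direct`, `bezier_smc` and `BEZIER_M` are hypothetical names that are not part of the library, the matrix values are copied from GLM_BEZIER_MAT_INIT, and the `m[col][row]` indexing is an assumption about cglm's column-major layout.

```c
#include <assert.h>
#include <math.h>

/* Values copied from GLM_BEZIER_MAT_INIT; indexed as m[col][row]
 * (assumption: same column-major layout as cglm's mat4). */
const float BEZIER_M[4][4] = {
  {-1.0f,  3.0f, -3.0f, 1.0f},
  { 3.0f, -6.0f,  3.0f, 0.0f},
  {-3.0f,  3.0f,  0.0f, 0.0f},
  { 1.0f,  0.0f,  0.0f, 0.0f}
};

/* B(s) = P0*(1-s)^3 + 3*C0*s*(1-s)^2 + 3*C1*s^2*(1-s) + P1*s^3 */
float
bezier_direct(float s, float p0, float c0, float c1, float p1) {
  float x = 1.0f - s;
  return p0 * x * x * x
       + 3.0f * c0 * s * x * x
       + 3.0f * c1 * s * s * x
       + p1 * s * s * s;
}

/* Same value via [s^3, s^2, s, 1] * M * [p0, c0, c1, p1]^T */
float
bezier_smc(float s, float p0, float c0, float c1, float p1) {
  float S[4] = { s * s * s, s * s, s, 1.0f };
  float C[4] = { p0, c0, c1, p1 };
  float sum  = 0.0f;
  int   row, col;

  for (row = 0; row < 4; row++) {
    float mc = 0.0f; /* row-th component of M * C */
    for (col = 0; col < 4; col++)
      mc += BEZIER_M[col][row] * C[col];
    sum += S[row] * mc;
  }
  return sum;
}
```

Both functions should agree up to floating-point rounding, which is why the doc suggests comparing them with glm_eq rather than with `==`.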


@@ -36,6 +36,7 @@ Functions:
#. :c:func:`glm_ortho_default` #. :c:func:`glm_ortho_default`
#. :c:func:`glm_ortho_default_s` #. :c:func:`glm_ortho_default_s`
#. :c:func:`glm_perspective` #. :c:func:`glm_perspective`
#. :c:func:`glm_persp_move_far`
#. :c:func:`glm_perspective_default` #. :c:func:`glm_perspective_default`
#. :c:func:`glm_perspective_resize` #. :c:func:`glm_perspective_resize`
#. :c:func:`glm_lookat` #. :c:func:`glm_lookat`
@@ -145,6 +146,16 @@ Functions documentation
| *[in]* **farVal** far clipping planes | *[in]* **farVal** far clipping planes
| *[out]* **dest** result matrix | *[out]* **dest** result matrix
.. c:function:: void glm_persp_move_far(mat4 proj, float deltaFar)
| extend perspective projection matrix's far distance
| this function does not guarantee far >= near, be aware of that!
Parameters:
| *[in, out]* **proj** projection matrix to extend
| *[in]* **deltaFar** distance from existing far (negative to shrink)
.. c:function:: void glm_perspective_default(float aspect, mat4 dest) .. c:function:: void glm_perspective_default(float aspect, mat4 dest)
| set up perspective projection matrix with default near/far | set up perspective projection matrix with default near/far


@@ -62,9 +62,9 @@ author = u'Recep Aslantas'
# built documents. # built documents.
# #
# The short X.Y version. # The short X.Y version.
version = u'0.5.1' version = u'0.5.3'
# The full version, including alpha/beta/rc tags. # The full version, including alpha/beta/rc tags.
release = u'0.5.1' release = u'0.5.3'
# The language for content autogenerated by Sphinx. Refer to documentation # The language for content autogenerated by Sphinx. Refer to documentation
# for a list of supported languages. # for a list of supported languages.

41
docs/source/curve.rst Normal file

@@ -0,0 +1,41 @@
.. default-domain:: C
Curve
================================================================================
Header: cglm/curve.h
Common helpers for common curves. For a specific curve, see its own header/doc,
e.g. bezier
Table of contents (click to go):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Functions:
1. :c:func:`glm_smc`
Functions documentation
~~~~~~~~~~~~~~~~~~~~~~~
.. c:function:: float glm_smc(float s, mat4 m, vec4 c)
| helper function to calculate **S** * **M** * **C** multiplication for curves
| this function does not encourage you to use SMC, instead it is a helper if you use SMC.
| if you want to specify S as vector then use more generic glm_mat4_rmc() func.
| Example usage:
.. code-block:: c
Bs = glm_smc(s, GLM_BEZIER_MAT, (vec4){p0, c0, c1, p1})
Parameters:
| *[in]* **s** parameter between 0 and 1 (this will be [s3, s2, s, 1])
| *[in]* **m** basis matrix
| *[in]* **c** position/control vector
Returns:
scalar value e.g. Bs
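
The S*M*C product described above works with any 4x4 basis matrix. The sketch below applies it to the Hermite basis and is independent of cglm: `HERMITE_M`, `hermite_direct` and `hermite_smc` are hypothetical names, the values are copied from GLM_HERMITE_MAT_INIT, and the `m[col][row]` indexing is an assumption about cglm's layout. With that matrix, the column vector has to be ordered {p0, p1, t0, t1} for the product to match the Hermite formula.

```c
#include <assert.h>
#include <math.h>

/* Values copied from GLM_HERMITE_MAT_INIT; indexed as m[col][row]. */
const float HERMITE_M[4][4] = {
  { 2.0f, -3.0f, 0.0f, 1.0f},
  {-2.0f,  3.0f, 0.0f, 0.0f},
  { 1.0f, -2.0f, 1.0f, 0.0f},
  { 1.0f, -1.0f, 0.0f, 0.0f}
};

/* H(s) = P0*(2s^3 - 3s^2 + 1) + T0*(s^3 - 2s^2 + s)
 *      + P1*(-2s^3 + 3s^2)    + T1*(s^3 - s^2) */
float
hermite_direct(float s, float p0, float t0, float t1, float p1) {
  float s2 = s * s, s3 = s2 * s;
  return p0 * (2.0f * s3 - 3.0f * s2 + 1.0f)
       + t0 * (s3 - 2.0f * s2 + s)
       + p1 * (-2.0f * s3 + 3.0f * s2)
       + t1 * (s3 - s2);
}

/* [s^3, s^2, s, 1] * M * [p0, p1, t0, t1]^T; note the vector order */
float
hermite_smc(float s, float p0, float t0, float t1, float p1) {
  float S[4] = { s * s * s, s * s, s, 1.0f };
  float C[4] = { p0, p1, t0, t1 };
  float sum  = 0.0f;
  int   row, col;

  for (row = 0; row < 4; row++) {
    float mc = 0.0f; /* row-th component of M * C */
    for (col = 0; col < 4; col++)
      mc += HERMITE_M[col][row] * C[col];
    sum += S[row] * mc;
  }
  return sum;
}
```

At s = 0 only the P0 term survives and at s = 1 only the P1 term, so the curve interpolates both end points regardless of the tangents.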


@@ -21,6 +21,7 @@ Functions:
1. :c:func:`glm_mat3_copy` 1. :c:func:`glm_mat3_copy`
#. :c:func:`glm_mat3_identity` #. :c:func:`glm_mat3_identity`
#. :c:func:`glm_mat3_identity_array` #. :c:func:`glm_mat3_identity_array`
#. :c:func:`glm_mat3_zero`
#. :c:func:`glm_mat3_mul` #. :c:func:`glm_mat3_mul`
#. :c:func:`glm_mat3_transpose_to` #. :c:func:`glm_mat3_transpose_to`
#. :c:func:`glm_mat3_transpose` #. :c:func:`glm_mat3_transpose`
@@ -32,6 +33,7 @@ Functions:
#. :c:func:`glm_mat3_trace` #. :c:func:`glm_mat3_trace`
#. :c:func:`glm_mat3_swap_col` #. :c:func:`glm_mat3_swap_col`
#. :c:func:`glm_mat3_swap_row` #. :c:func:`glm_mat3_swap_row`
#. :c:func:`glm_mat3_rmc`
Functions documentation Functions documentation
~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~
@@ -59,6 +61,13 @@ Functions documentation
| *[in,out]* **mat** matrix array (must be aligned (16/32) if alignment is not disabled) | *[in,out]* **mat** matrix array (must be aligned (16/32) if alignment is not disabled)
| *[in]* **count** count of matrices | *[in]* **count** count of matrices
.. c:function:: void glm_mat3_zero(mat3 mat)
make given matrix zero
Parameters:
| *[in,out]* **mat** matrix to set to zero
.. c:function:: void glm_mat3_mul(mat3 m1, mat3 m2, mat3 dest) .. c:function:: void glm_mat3_mul(mat3 m1, mat3 m2, mat3 dest)
multiply m1 and m2 to dest multiply m1 and m2 to dest
@@ -161,3 +170,20 @@ Functions documentation
| *[in, out]* **mat** matrix | *[in, out]* **mat** matrix
| *[in]* **row1** row1 | *[in]* **row1** row1
| *[in]* **row2** row2 | *[in]* **row2** row2
.. c:function:: float glm_mat3_rmc(vec3 r, mat3 m, vec3 c)
| **rmc** stands for **Row** * **Matrix** * **Column**
| helper for R (row vector) * M (matrix) * C (column vector)
| the result is scalar because R * M = Matrix1x3 (row vector),
| then Matrix1x3 * Vec3 (column vector) = Matrix1x1 (Scalar)
Parameters:
| *[in]* **r** row vector or matrix1x3
| *[in]* **m** matrix3x3
| *[in]* **c** column vector or matrix3x1
Returns:
scalar value e.g. Matrix1x1
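
To make the Row * Matrix * Column reduction concrete, here is a small standalone sketch. It does not call cglm: `vec3f`, `mat3f`, `rmc3` and `rmc3_example` are hypothetical names, and the `m[col][row]` indexing is an assumption about cglm's column-major storage.

```c
#include <assert.h>
#include <math.h>

typedef float vec3f[3];     /* hypothetical stand-in for vec3 */
typedef float mat3f[3][3];  /* hypothetical stand-in for mat3, m[col][row] */

/* r (1x3) * m (3x3) * c (3x1) -> scalar */
float
rmc3(vec3f r, mat3f m, vec3f c) {
  float sum = 0.0f;
  int   row, col;

  for (row = 0; row < 3; row++) {
    float mc = 0.0f; /* row-th component of M * c */
    for (col = 0; col < 3; col++)
      mc += m[col][row] * c[col];
    sum += r[row] * mc;
  }
  return sum;
}

/* Only the entry in column 1, row 0 is non-zero, so the result is
 * r[0] * m[1][0] * c[1] = 1 * 2 * 5. */
float
rmc3_example(void) {
  vec3f r = {1.0f, 2.0f, 3.0f};
  vec3f c = {4.0f, 5.0f, 6.0f};
  mat3f m = {{0.0f}};
  m[1][0] = 2.0f;
  return rmc3(r, m, c);
}
```

The single off-diagonal entry in the example makes the storage order visible: reading the same array as row-major would pair r[1] with c[0] instead.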


@@ -26,6 +26,7 @@ Functions:
#. :c:func:`glm_mat4_copy` #. :c:func:`glm_mat4_copy`
#. :c:func:`glm_mat4_identity` #. :c:func:`glm_mat4_identity`
#. :c:func:`glm_mat4_identity_array` #. :c:func:`glm_mat4_identity_array`
#. :c:func:`glm_mat4_zero`
#. :c:func:`glm_mat4_pick3` #. :c:func:`glm_mat4_pick3`
#. :c:func:`glm_mat4_pick3t` #. :c:func:`glm_mat4_pick3t`
#. :c:func:`glm_mat4_ins3` #. :c:func:`glm_mat4_ins3`
@@ -45,6 +46,7 @@ Functions:
#. :c:func:`glm_mat4_inv_fast` #. :c:func:`glm_mat4_inv_fast`
#. :c:func:`glm_mat4_swap_col` #. :c:func:`glm_mat4_swap_col`
#. :c:func:`glm_mat4_swap_row` #. :c:func:`glm_mat4_swap_row`
#. :c:func:`glm_mat4_rmc`
Functions documentation Functions documentation
~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~
@@ -80,6 +82,13 @@ Functions documentation
| *[in,out]* **mat** matrix array (must be aligned (16/32) if alignment is not disabled) | *[in,out]* **mat** matrix array (must be aligned (16/32) if alignment is not disabled)
| *[in]* **count** count of matrices | *[in]* **count** count of matrices
.. c:function:: void glm_mat4_zero(mat4 mat)
make given matrix zero
Parameters:
| *[in,out]* **mat** matrix to set to zero
.. c:function:: void glm_mat4_pick3(mat4 mat, mat3 dest) .. c:function:: void glm_mat4_pick3(mat4 mat, mat3 dest)
copy upper-left of mat4 to mat3 copy upper-left of mat4 to mat3
@@ -270,3 +279,20 @@ Functions documentation
| *[in, out]* **mat** matrix | *[in, out]* **mat** matrix
| *[in]* **row1** row1 | *[in]* **row1** row1
| *[in]* **row2** row2 | *[in]* **row2** row2
.. c:function:: float glm_mat4_rmc(vec4 r, mat4 m, vec4 c)
| **rmc** stands for **Row** * **Matrix** * **Column**
| helper for R (row vector) * M (matrix) * C (column vector)
| the result is scalar because R * M = Matrix1x4 (row vector),
| then Matrix1x4 * Vec4 (column vector) = Matrix1x1 (Scalar)
Parameters:
| *[in]* **r** row vector or matrix1x4
| *[in]* **m** matrix4x4
| *[in]* **c** column vector or matrix4x1
Returns:
scalar value e.g. Matrix1x1


@@ -40,3 +40,13 @@ SSE and SSE2 Shuffle Option
**_mm_shuffle_ps** generates **shufps** instruction even if registers are same. **_mm_shuffle_ps** generates **shufps** instruction even if registers are same.
You can force it to generate **pshufd** instruction by defining You can force it to generate **pshufd** instruction by defining
**CGLM_USE_INT_DOMAIN** macro. As default it is not defined. **CGLM_USE_INT_DOMAIN** macro. As default it is not defined.
SSE3 and SSE4 Dot Product Options
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
You have two extra options for dot product: **CGLM_SSE4_DOT** and **CGLM_SSE3_DOT**.
- If **SSE4** is enabled then you can define **CGLM_SSE4_DOT** to force cglm to use the **_mm_dp_ps** instruction.
- If **SSE3** is enabled then you can define **CGLM_SSE3_DOT** to force cglm to use **_mm_hadd_ps** instructions.
Otherwise cglm will use its own custom hadd functions, which are optimized too.


@@ -58,11 +58,7 @@ Functions:
#. :c:func:`glm_vec4_minv` #. :c:func:`glm_vec4_minv`
#. :c:func:`glm_vec4_clamp` #. :c:func:`glm_vec4_clamp`
#. :c:func:`glm_vec4_lerp` #. :c:func:`glm_vec4_lerp`
#. :c:func:`glm_vec4_isnan` #. :c:func:`glm_vec4_cubic`
#. :c:func:`glm_vec4_isinf`
#. :c:func:`glm_vec4_isvalid`
#. :c:func:`glm_vec4_sign`
#. :c:func:`glm_vec4_sqrt`
Functions documentation Functions documentation
~~~~~~~~~~~~~~~~~~~~~~~ ~~~~~~~~~~~~~~~~~~~~~~~
@@ -401,3 +397,11 @@ Functions documentation
| *[in]* **to** to value | *[in]* **to** to value
| *[in]* **t** interpolant (amount) clamped between 0 and 1 | *[in]* **t** interpolant (amount) clamped between 0 and 1
| *[out]* **dest** destination | *[out]* **dest** destination
.. c:function:: void glm_vec4_cubic(float s, vec4 dest)
helper to fill vec4 as [S^3, S^2, S, 1]
Parameters:
| *[in]* **s** parameter
| *[out]* **dest** destination

152
include/cglm/bezier.h Normal file

@@ -0,0 +1,152 @@
/*
* Copyright (c), Recep Aslantas.
*
* MIT License (MIT), http://opensource.org/licenses/MIT
* Full license can be found in the LICENSE file
*/
#ifndef cglm_bezier_h
#define cglm_bezier_h
#define GLM_BEZIER_MAT_INIT {{-1.0f, 3.0f, -3.0f, 1.0f}, \
{ 3.0f, -6.0f, 3.0f, 0.0f}, \
{-3.0f, 3.0f, 0.0f, 0.0f}, \
{ 1.0f, 0.0f, 0.0f, 0.0f}}
#define GLM_HERMITE_MAT_INIT {{ 2.0f, -3.0f, 0.0f, 1.0f}, \
{-2.0f, 3.0f, 0.0f, 0.0f}, \
{ 1.0f, -2.0f, 1.0f, 0.0f}, \
{ 1.0f, -1.0f, 0.0f, 0.0f}}
/* for C only */
#define GLM_BEZIER_MAT ((mat4)GLM_BEZIER_MAT_INIT)
#define GLM_HERMITE_MAT ((mat4)GLM_HERMITE_MAT_INIT)
#define CGLM_DECASTEL_EPS 1e-9
#define CGLM_DECASTEL_MAX 1000
#define CGLM_DECASTEL_SMALL 1e-20
/*!
* @brief cubic bezier interpolation
*
* Formula:
* B(s) = P0*(1-s)^3 + 3*C0*s*(1-s)^2 + 3*C1*s^2*(1-s) + P1*s^3
*
* similar result using matrix:
* B(s) = glm_smc(s, GLM_BEZIER_MAT, (vec4){p0, c0, c1, p1})
*
* glm_eq(glm_smc(...), glm_bezier(...)) should return TRUE
*
* @param[in] s parameter between 0 and 1
* @param[in] p0 begin point
* @param[in] c0 control point 1
* @param[in] c1 control point 2
* @param[in] p1 end point
*
* @return B(s)
*/
CGLM_INLINE
float
glm_bezier(float s, float p0, float c0, float c1, float p1) {
float x, xx, ss, xs3, a;
x = 1.0f - s;
xx = x * x;
ss = s * s;
xs3 = (s - ss) * 3.0f;
a = p0 * xx + c0 * xs3;
return a + s * (c1 * xs3 + p1 * ss - a);
}
/*!
* @brief cubic hermite interpolation
*
* Formula:
* H(s) = P0*(2*s^3 - 3*s^2 + 1) + T0*(s^3 - 2*s^2 + s)
* + P1*(-2*s^3 + 3*s^2) + T1*(s^3 - s^2)
*
* similar result using matrix:
* H(s) = glm_smc(s, GLM_HERMITE_MAT, (vec4){p0, p1, t0, t1})
*
* glm_eq(glm_smc(...), glm_hermite(...)) should return TRUE
*
* @param[in] s parameter between 0 and 1
* @param[in] p0 begin point
* @param[in] t0 tangent 1
* @param[in] t1 tangent 2
* @param[in] p1 end point
*
* @return H(s)
*/
CGLM_INLINE
float
glm_hermite(float s, float p0, float t0, float t1, float p1) {
float ss, d, a, b, c, e, f;
ss = s * s;
a = ss + ss;
c = a + ss;
b = a * s;
d = s * ss;
f = d - ss;
e = b - c;
return p0 * (e + 1.0f) + t0 * (f - ss + s) + t1 * f - p1 * e;
}
/*!
* @brief iterative way to solve cubic equation
*
* @param[in] prm parameter between 0 and 1
* @param[in] p0 begin point
* @param[in] c0 control point 1
* @param[in] c1 control point 2
* @param[in] p1 end point
*
* @return parameter to use in cubic equation
*/
CGLM_INLINE
float
glm_decasteljau(float prm, float p0, float c0, float c1, float p1) {
float u, v, a, b, c, d, e, f;
int i;
if (prm - p0 < CGLM_DECASTEL_SMALL)
return 0.0f;
if (p1 - prm < CGLM_DECASTEL_SMALL)
return 1.0f;
u = 0.0f;
v = 1.0f;
for (i = 0; i < CGLM_DECASTEL_MAX; i++) {
/* de Casteljau Subdivision */
a = (p0 + c0) * 0.5f;
b = (c0 + c1) * 0.5f;
c = (c1 + p1) * 0.5f;
d = (a + b) * 0.5f;
e = (b + c) * 0.5f;
f = (d + e) * 0.5f; /* this one is on the curve! */
/* The curve point is close enough to our wanted t */
if (fabsf(f - prm) < CGLM_DECASTEL_EPS)
return glm_clamp_zo((u + v) * 0.5f);
/* dichotomy */
if (f < prm) {
p0 = f;
c0 = e;
c1 = c;
u = (u + v) * 0.5f;
} else {
c0 = a;
c1 = d;
p1 = f;
v = (u + v) * 0.5f;
}
}
return glm_clamp_zo((u + v) * 0.5f);
}
#endif /* cglm_bezier_h */
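
For readers who want to experiment with the subdivision idea outside the library, the following is a simplified standalone sketch of the same approach: repeated de Casteljau midpoint subdivision combined with bisection on the parameter. It is not the cglm implementation. `decasteljau_solve`, `bezier_eval`, `DECASTEL_EPS` and `DECASTEL_MAX_ITER` are hypothetical names, the tolerance and iteration cap are arbitrary choices rather than the values defined in bezier.h, and the curve is assumed to be increasing on [0, 1].

```c
#include <assert.h>
#include <math.h>

#define DECASTEL_EPS      1e-6f /* hypothetical tolerance (not cglm's value) */
#define DECASTEL_MAX_ITER 64    /* hypothetical iteration cap */

/* Scalar cubic Bezier at t; used here only to check the solver. */
float
bezier_eval(float t, float p0, float c0, float c1, float p1) {
  float x = 1.0f - t;
  return p0 * x * x * x + 3.0f * c0 * t * x * x
       + 3.0f * c1 * t * t * x + p1 * t * t * t;
}

/* Find t in [0, 1] such that B(t) is close to value.
 * Assumes B is increasing on [0, 1]. */
float
decasteljau_solve(float value, float p0, float c0, float c1, float p1) {
  float lo = 0.0f, hi = 1.0f;
  float a, b, c, d, e, f;
  int   i;

  if (value <= p0) return 0.0f;
  if (value >= p1) return 1.0f;

  for (i = 0; i < DECASTEL_MAX_ITER; i++) {
    /* split the current segment at its midpoint */
    a = (p0 + c0) * 0.5f;
    b = (c0 + c1) * 0.5f;
    c = (c1 + p1) * 0.5f;
    d = (a + b) * 0.5f;
    e = (b + c) * 0.5f;
    f = (d + e) * 0.5f; /* curve value at the middle of [lo, hi] */

    if (fabsf(f - value) < DECASTEL_EPS)
      break;

    if (f < value) {            /* keep the right half */
      p0 = f; c0 = e; c1 = c;
      lo = (lo + hi) * 0.5f;
    } else {                    /* keep the left half */
      c0 = a; c1 = d; p1 = f;
      hi = (lo + hi) * 0.5f;
    }
  }
  return (lo + hi) * 0.5f;
}
```

A typical use is animation timing: given a time value on the curve's first axis, solve for the parameter and then evaluate the second axis at that parameter.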


@@ -27,6 +27,8 @@ extern "C" {
#include "call/project.h" #include "call/project.h"
#include "call/sphere.h" #include "call/sphere.h"
#include "call/ease.h" #include "call/ease.h"
#include "call/curve.h"
#include "call/bezier.h"
#ifdef __cplusplus #ifdef __cplusplus
} }


@@ -0,0 +1,31 @@
/*
* Copyright (c), Recep Aslantas.
*
* MIT License (MIT), http://opensource.org/licenses/MIT
* Full license can be found in the LICENSE file
*/
#ifndef cglmc_bezier_h
#define cglmc_bezier_h
#ifdef __cplusplus
extern "C" {
#endif
#include "../cglm.h"
CGLM_EXPORT
float
glmc_bezier(float s, float p0, float c0, float c1, float p1);
CGLM_EXPORT
float
glmc_hermite(float s, float p0, float t0, float t1, float p1);
CGLM_EXPORT
float
glmc_decasteljau(float prm, float p0, float c0, float c1, float p1);
#ifdef __cplusplus
}
#endif
#endif /* cglmc_bezier_h */


@@ -61,6 +61,10 @@ glmc_perspective(float fovy,
float farVal, float farVal,
mat4 dest); mat4 dest);
CGLM_EXPORT
void
glmc_persp_move_far(mat4 proj, float deltaFar);
CGLM_EXPORT CGLM_EXPORT
void void
glmc_perspective_default(float aspect, mat4 dest); glmc_perspective_default(float aspect, mat4 dest);

23
include/cglm/call/curve.h Normal file

@@ -0,0 +1,23 @@
/*
* Copyright (c), Recep Aslantas.
*
* MIT License (MIT), http://opensource.org/licenses/MIT
* Full license can be found in the LICENSE file
*/
#ifndef cglmc_curve_h
#define cglmc_curve_h
#ifdef __cplusplus
extern "C" {
#endif
#include "../cglm.h"
CGLM_EXPORT
float
glmc_smc(float s, mat4 m, vec4 c);
#ifdef __cplusplus
}
#endif
#endif /* cglmc_curve_h */


@@ -137,4 +137,7 @@ CGLM_EXPORT
float float
glmc_ease_bounce_inout(float t); glmc_ease_bounce_inout(float t);
#ifdef __cplusplus
}
#endif
#endif /* cglmc_ease_h */ #endif /* cglmc_ease_h */


@@ -72,6 +72,10 @@ CGLM_EXPORT
void void
glmc_mat3_swap_row(mat3 mat, int row1, int row2); glmc_mat3_swap_row(mat3 mat, int row1, int row2);
CGLM_EXPORT
float
glmc_mat3_rmc(vec3 r, mat3 m, vec3 c);
#ifdef __cplusplus #ifdef __cplusplus
} }
#endif #endif


@@ -113,6 +113,10 @@ CGLM_EXPORT
void void
glmc_mat4_swap_row(mat4 mat, int row1, int row2); glmc_mat4_swap_row(mat4 mat, int row1, int row2);
CGLM_EXPORT
float
glmc_mat4_rmc(vec4 r, mat4 m, vec4 c);
#ifdef __cplusplus #ifdef __cplusplus
} }
#endif #endif


@@ -33,4 +33,7 @@ CGLM_EXPORT
bool bool
glmc_sphere_point(vec4 s, vec3 point); glmc_sphere_point(vec4 s, vec3 point);
#ifdef __cplusplus
}
#endif
#endif /* cglmc_sphere_h */ #endif /* cglmc_sphere_h */


@@ -153,6 +153,10 @@ CGLM_EXPORT
void void
glmc_vec4_lerp(vec4 from, vec4 to, float t, vec4 dest); glmc_vec4_lerp(vec4 from, vec4 to, float t, vec4 dest);
CGLM_EXPORT
void
glmc_vec4_cubic(float s, vec4 dest);
/* ext */ /* ext */
CGLM_EXPORT CGLM_EXPORT


@@ -84,7 +84,7 @@ glm_frustum(float left,
mat4 dest) { mat4 dest) {
float rl, tb, fn, nv; float rl, tb, fn, nv;
glm__memzero(float, dest, sizeof(mat4)); glm_mat4_zero(dest);
rl = 1.0f / (right - left); rl = 1.0f / (right - left);
tb = 1.0f / (top - bottom); tb = 1.0f / (top - bottom);
@@ -122,7 +122,7 @@ glm_ortho(float left,
mat4 dest) { mat4 dest) {
float rl, tb, fn; float rl, tb, fn;
glm__memzero(float, dest, sizeof(mat4)); glm_mat4_zero(dest);
rl = 1.0f / (right - left); rl = 1.0f / (right - left);
tb = 1.0f / (top - bottom); tb = 1.0f / (top - bottom);
@@ -259,7 +259,7 @@ glm_perspective(float fovy,
mat4 dest) { mat4 dest) {
float f, fn; float f, fn;
glm__memzero(float, dest, sizeof(mat4)); glm_mat4_zero(dest);
f = 1.0f / tanf(fovy * 0.5f); f = 1.0f / tanf(fovy * 0.5f);
fn = 1.0f / (nearVal - farVal); fn = 1.0f / (nearVal - farVal);
@@ -271,6 +271,30 @@ glm_perspective(float fovy,
dest[3][2] = 2.0f * nearVal * farVal * fn; dest[3][2] = 2.0f * nearVal * farVal * fn;
} }
/*!
* @brief extend perspective projection matrix's far distance
*
* this function does not guarantee far >= near, be aware of that!
*
* @param[in, out] proj projection matrix to extend
* @param[in] deltaFar distance from existing far (negative to shrink)
*/
CGLM_INLINE
void
glm_persp_move_far(mat4 proj, float deltaFar) {
float fn, farVal, nearVal, p22, p32;
p22 = proj[2][2];
p32 = proj[3][2];
nearVal = p32 / (p22 - 1.0f);
farVal = p32 / (p22 + 1.0f) + deltaFar;
fn = 1.0f / (nearVal - farVal);
proj[2][2] = (nearVal + farVal) * fn;
proj[3][2] = 2.0f * nearVal * farVal * fn;
}
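
The near/far recovery used by glm_persp_move_far follows from the two matrix entries it rewrites: with p22 = (near + far) / (near - far) and p32 = 2 * near * far / (near - far), dividing p32 by (p22 - 1) gives near and dividing by (p22 + 1) gives far. The standalone sketch below checks that round trip. `persp_nf` and the `persp_*` helpers are hypothetical names introduced only for illustration and are not part of cglm.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical holder for the two entries that depend on near/far
 * (proj[2][2] and proj[3][2] in the diff above). */
typedef struct { float p22, p32; } persp_nf;

persp_nf
persp_pack(float nearVal, float farVal) {
  persp_nf p;
  float    fn = 1.0f / (nearVal - farVal);
  p.p22 = (nearVal + farVal) * fn;
  p.p32 = 2.0f * nearVal * farVal * fn;
  return p;
}

float persp_near(persp_nf p) { return p.p32 / (p.p22 - 1.0f); }
float persp_far(persp_nf p)  { return p.p32 / (p.p22 + 1.0f); }

/* Same idea as glm_persp_move_far: recover near/far, shift far, re-pack.
 * Like the original, this does not check that far stays >= near. */
persp_nf
persp_move_far(persp_nf p, float deltaFar) {
  return persp_pack(persp_near(p), persp_far(p) + deltaFar);
}
```

One caveat worth keeping in mind: when far is much larger than near, p22 is very close to -1, so `p22 + 1.0f` loses precision in single-precision arithmetic and the recovered far value becomes less accurate.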
/*! /*!
* @brief set up perspective projection matrix with default near/far * @brief set up perspective projection matrix with default near/far
* and angle values * and angle values


@@ -26,5 +26,7 @@
#include "project.h" #include "project.h"
#include "sphere.h" #include "sphere.h"
#include "ease.h" #include "ease.h"
#include "curve.h"
#include "bezier.h"
#endif /* cglm_h */ #endif /* cglm_h */


@@ -26,34 +26,6 @@
# define CGLM_INLINE static inline __attribute((always_inline)) # define CGLM_INLINE static inline __attribute((always_inline))
#endif #endif
#define glm__memcpy(type, dest, src, size) \
do { \
type *srci; \
type *srci_end; \
type *desti; \
\
srci = (type *)src; \
srci_end = (type *)((char *)srci + size); \
desti = (type *)dest; \
\
while (srci != srci_end) \
*desti++ = *srci++; \
} while (0)
#define glm__memset(type, dest, size, val) \
do { \
type *desti; \
type *desti_end; \
\
desti = (type *)dest; \
desti_end = (type *)((char *)desti + size); \
\
while (desti != desti_end) \
*desti++ = val; \
} while (0)
#define glm__memzero(type, dest, size) glm__memset(type, dest, size, 0)
#include "types.h" #include "types.h"
#include "simd/intrin.h" #include "simd/intrin.h"

40
include/cglm/curve.h Normal file

@@ -0,0 +1,40 @@
/*
* Copyright (c), Recep Aslantas.
*
* MIT License (MIT), http://opensource.org/licenses/MIT
* Full license can be found in the LICENSE file
*/
#ifndef cglm_curve_h
#define cglm_curve_h
#include "common.h"
#include "vec4.h"
#include "mat4.h"
/*!
* @brief helper function to calculate S*M*C multiplication for curves
*
* This function does not encourage you to use SMC,
* instead it is a helper if you use SMC.
*
* if you want to specify S as vector then use more generic glm_mat4_rmc() func.
*
* Example usage:
* B(s) = glm_smc(s, GLM_BEZIER_MAT, (vec4){p0, c0, c1, p1})
*
* @param[in] s parameter between 0 and 1 (this will be [s3, s2, s, 1])
* @param[in] m basis matrix
* @param[in] c position/control vector
*
* @return B(s)
*/
CGLM_INLINE
float
glm_smc(float s, mat4 m, vec4 c) {
vec4 vs;
glm_vec4_cubic(s, vs);
return glm_mat4_rmc(vs, m, c);
}
#endif /* cglm_curve_h */


@@ -17,16 +17,19 @@
CGLM_INLINE void glm_mat3_copy(mat3 mat, mat3 dest); CGLM_INLINE void glm_mat3_copy(mat3 mat, mat3 dest);
CGLM_INLINE void glm_mat3_identity(mat3 mat); CGLM_INLINE void glm_mat3_identity(mat3 mat);
CGLM_INLINE void glm_mat3_identity_array(mat3 * restrict mat, size_t count); CGLM_INLINE void glm_mat3_identity_array(mat3 * restrict mat, size_t count);
CGLM_INLINE void glm_mat3_zero(mat3 mat);
CGLM_INLINE void glm_mat3_mul(mat3 m1, mat3 m2, mat3 dest); CGLM_INLINE void glm_mat3_mul(mat3 m1, mat3 m2, mat3 dest);
CGLM_INLINE void glm_mat3_transpose_to(mat3 m, mat3 dest); CGLM_INLINE void glm_mat3_transpose_to(mat3 m, mat3 dest);
CGLM_INLINE void glm_mat3_transpose(mat3 m); CGLM_INLINE void glm_mat3_transpose(mat3 m);
CGLM_INLINE void glm_mat3_mulv(mat3 m, vec3 v, vec3 dest); CGLM_INLINE void glm_mat3_mulv(mat3 m, vec3 v, vec3 dest);
CGLM_INLINE float glm_mat3_trace(mat3 m); CGLM_INLINE float glm_mat3_trace(mat3 m);
CGLM_INLINE void glm_mat3_quat(mat3 m, versor dest);
CGLM_INLINE void glm_mat3_scale(mat3 m, float s); CGLM_INLINE void glm_mat3_scale(mat3 m, float s);
CGLM_INLINE float glm_mat3_det(mat3 mat); CGLM_INLINE float glm_mat3_det(mat3 mat);
CGLM_INLINE void glm_mat3_inv(mat3 mat, mat3 dest); CGLM_INLINE void glm_mat3_inv(mat3 mat, mat3 dest);
CGLM_INLINE void glm_mat3_swap_col(mat3 mat, int col1, int col2); CGLM_INLINE void glm_mat3_swap_col(mat3 mat, int col1, int col2);
CGLM_INLINE void glm_mat3_swap_row(mat3 mat, int row1, int row2); CGLM_INLINE void glm_mat3_swap_row(mat3 mat, int row1, int row2);
CGLM_INLINE float glm_mat3_rmc(vec3 r, mat3 m, vec3 c);
*/ */
#ifndef cglm_mat3_h #ifndef cglm_mat3_h
@@ -63,7 +66,17 @@
CGLM_INLINE CGLM_INLINE
void void
glm_mat3_copy(mat3 mat, mat3 dest) { glm_mat3_copy(mat3 mat, mat3 dest) {
glm__memcpy(float, dest, mat, sizeof(mat3)); dest[0][0] = mat[0][0];
dest[0][1] = mat[0][1];
dest[0][2] = mat[0][2];
dest[1][0] = mat[1][0];
dest[1][1] = mat[1][1];
dest[1][2] = mat[1][2];
dest[2][0] = mat[2][0];
dest[2][1] = mat[2][1];
dest[2][2] = mat[2][2];
} }
/*! /*!
@@ -106,6 +119,18 @@ glm_mat3_identity_array(mat3 * __restrict mat, size_t count) {
} }
} }
/*!
* @brief make given matrix zero.
*
* @param[in, out] mat matrix
*/
CGLM_INLINE
void
glm_mat3_zero(mat3 mat) {
CGLM_ALIGN_MAT mat3 t = GLM_MAT3_ZERO_INIT;
glm_mat3_copy(t, mat);
}
/*! /*!
* @brief multiply m1 and m2 to dest * @brief multiply m1 and m2 to dest
* *
@@ -372,4 +397,26 @@ glm_mat3_swap_row(mat3 mat, int row1, int row2) {
mat[2][row2] = tmp[2]; mat[2][row2] = tmp[2];
} }
/*!
* @brief helper for R (row vector) * M (matrix) * C (column vector)
*
* rmc stands for Row * Matrix * Column
*
* the result is scalar because R * M = Matrix1x3 (row vector),
* then Matrix1x3 * Vec3 (column vector) = Matrix1x1 (Scalar)
*
* @param[in] r row vector or matrix1x3
* @param[in] m matrix3x3
* @param[in] c column vector or matrix3x1
*
* @return scalar value e.g. Matrix1x1
*/
CGLM_INLINE
float
glm_mat3_rmc(vec3 r, mat3 m, vec3 c) {
vec3 tmp;
glm_mat3_mulv(m, c, tmp);
return glm_vec3_dot(r, tmp);
}
#endif /* cglm_mat3_h */ #endif /* cglm_mat3_h */


@@ -22,6 +22,7 @@
CGLM_INLINE void glm_mat4_copy(mat4 mat, mat4 dest); CGLM_INLINE void glm_mat4_copy(mat4 mat, mat4 dest);
CGLM_INLINE void glm_mat4_identity(mat4 mat); CGLM_INLINE void glm_mat4_identity(mat4 mat);
CGLM_INLINE void glm_mat4_identity_array(mat4 * restrict mat, size_t count); CGLM_INLINE void glm_mat4_identity_array(mat4 * restrict mat, size_t count);
CGLM_INLINE void glm_mat4_zero(mat4 mat);
CGLM_INLINE void glm_mat4_pick3(mat4 mat, mat3 dest); CGLM_INLINE void glm_mat4_pick3(mat4 mat, mat3 dest);
CGLM_INLINE void glm_mat4_pick3t(mat4 mat, mat3 dest); CGLM_INLINE void glm_mat4_pick3t(mat4 mat, mat3 dest);
CGLM_INLINE void glm_mat4_ins3(mat3 mat, mat4 dest); CGLM_INLINE void glm_mat4_ins3(mat3 mat, mat4 dest);
@@ -31,6 +32,7 @@
CGLM_INLINE void glm_mat4_mulv3(mat4 m, vec3 v, vec3 dest); CGLM_INLINE void glm_mat4_mulv3(mat4 m, vec3 v, vec3 dest);
CGLM_INLINE float glm_mat4_trace(mat4 m); CGLM_INLINE float glm_mat4_trace(mat4 m);
CGLM_INLINE float glm_mat4_trace3(mat4 m); CGLM_INLINE float glm_mat4_trace3(mat4 m);
CGLM_INLINE void glm_mat4_quat(mat4 m, versor dest) ;
CGLM_INLINE void glm_mat4_transpose_to(mat4 m, mat4 dest); CGLM_INLINE void glm_mat4_transpose_to(mat4 m, mat4 dest);
CGLM_INLINE void glm_mat4_transpose(mat4 m); CGLM_INLINE void glm_mat4_transpose(mat4 m);
CGLM_INLINE void glm_mat4_scale_p(mat4 m, float s); CGLM_INLINE void glm_mat4_scale_p(mat4 m, float s);
@@ -40,6 +42,7 @@
CGLM_INLINE void glm_mat4_inv_fast(mat4 mat, mat4 dest); CGLM_INLINE void glm_mat4_inv_fast(mat4 mat, mat4 dest);
CGLM_INLINE void glm_mat4_swap_col(mat4 mat, int col1, int col2); CGLM_INLINE void glm_mat4_swap_col(mat4 mat, int col1, int col2);
CGLM_INLINE void glm_mat4_swap_row(mat4 mat, int row1, int row2); CGLM_INLINE void glm_mat4_swap_row(mat4 mat, int row1, int row2);
CGLM_INLINE float glm_mat4_rmc(vec4 r, mat4 m, vec4 c);
*/ */
#ifndef cglm_mat_h #ifndef cglm_mat_h
@@ -98,7 +101,15 @@
CGLM_INLINE CGLM_INLINE
void void
glm_mat4_ucopy(mat4 mat, mat4 dest) { glm_mat4_ucopy(mat4 mat, mat4 dest) {
glm__memcpy(float, dest, mat, sizeof(mat4)); dest[0][0] = mat[0][0]; dest[1][0] = mat[1][0];
dest[0][1] = mat[0][1]; dest[1][1] = mat[1][1];
dest[0][2] = mat[0][2]; dest[1][2] = mat[1][2];
dest[0][3] = mat[0][3]; dest[1][3] = mat[1][3];
dest[2][0] = mat[2][0]; dest[3][0] = mat[3][0];
dest[2][1] = mat[2][1]; dest[3][1] = mat[3][1];
dest[2][2] = mat[2][2]; dest[3][2] = mat[3][2];
dest[2][3] = mat[2][3]; dest[3][3] = mat[3][3];
} }
/*! /*!
@@ -118,6 +129,11 @@ glm_mat4_copy(mat4 mat, mat4 dest) {
glmm_store(dest[1], glmm_load(mat[1])); glmm_store(dest[1], glmm_load(mat[1]));
glmm_store(dest[2], glmm_load(mat[2])); glmm_store(dest[2], glmm_load(mat[2]));
glmm_store(dest[3], glmm_load(mat[3])); glmm_store(dest[3], glmm_load(mat[3]));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest[0], vld1q_f32(mat[0]));
vst1q_f32(dest[1], vld1q_f32(mat[1]));
vst1q_f32(dest[2], vld1q_f32(mat[2]));
vst1q_f32(dest[3], vld1q_f32(mat[3]));
#else #else
glm_mat4_ucopy(mat, dest); glm_mat4_ucopy(mat, dest);
#endif #endif
@@ -163,6 +179,18 @@ glm_mat4_identity_array(mat4 * __restrict mat, size_t count) {
} }
} }
/*!
* @brief make given matrix zero.
*
* @param[in, out] mat matrix
*/
CGLM_INLINE
void
glm_mat4_zero(mat4 mat) {
CGLM_ALIGN_MAT mat4 t = GLM_MAT4_ZERO_INIT;
glm_mat4_copy(t, mat);
}
/*! /*!
* @brief copy upper-left of mat4 to mat3 * @brief copy upper-left of mat4 to mat3
* *
@@ -252,7 +280,7 @@ glm_mat4_mul(mat4 m1, mat4 m2, mat4 dest) {
glm_mat4_mul_avx(m1, m2, dest); glm_mat4_mul_avx(m1, m2, dest);
#elif defined( __SSE__ ) || defined( __SSE2__ ) #elif defined( __SSE__ ) || defined( __SSE2__ )
glm_mat4_mul_sse2(m1, m2, dest); glm_mat4_mul_sse2(m1, m2, dest);
#elif defined( __ARM_NEON_FP ) #elif defined(CGLM_NEON_FP)
glm_mat4_mul_neon(m1, m2, dest); glm_mat4_mul_neon(m1, m2, dest);
#else #else
float a00 = m1[0][0], a01 = m1[0][1], a02 = m1[0][2], a03 = m1[0][3], float a00 = m1[0][0], a01 = m1[0][1], a02 = m1[0][2], a03 = m1[0][3],
@@ -469,10 +497,8 @@ glm_mat4_transpose(mat4 m) {
glm_mat4_transp_sse2(m, m); glm_mat4_transp_sse2(m, m);
#else #else
mat4 d; mat4 d;
glm_mat4_transpose_to(m, d); glm_mat4_transpose_to(m, d);
glm_mat4_ucopy(d, m);
glm__memcpy(float, m, d, sizeof(mat4));
#endif #endif
} }
@@ -506,6 +532,13 @@ void
glm_mat4_scale(mat4 m, float s) { glm_mat4_scale(mat4 m, float s) {
#if defined( __SSE__ ) || defined( __SSE2__ ) #if defined( __SSE__ ) || defined( __SSE2__ )
glm_mat4_scale_sse2(m, s); glm_mat4_scale_sse2(m, s);
#elif defined(CGLM_NEON_FP)
float32x4_t v0;
v0 = vdupq_n_f32(s);
vst1q_f32(m[0], vmulq_f32(vld1q_f32(m[0]), v0));
vst1q_f32(m[1], vmulq_f32(vld1q_f32(m[1]), v0));
vst1q_f32(m[2], vmulq_f32(vld1q_f32(m[2]), v0));
vst1q_f32(m[3], vmulq_f32(vld1q_f32(m[3]), v0));
#else #else
glm_mat4_scale_p(m, s); glm_mat4_scale_p(m, s);
#endif #endif
@@ -665,4 +698,26 @@ glm_mat4_swap_row(mat4 mat, int row1, int row2) {
mat[3][row2] = tmp[3]; mat[3][row2] = tmp[3];
} }
/*!
* @brief helper for R (row vector) * M (matrix) * C (column vector)
*
* rmc stands for Row * Matrix * Column
*
* the result is scalar because R * M = Matrix1x4 (row vector),
* then Matrix1x4 * Vec4 (column vector) = Matrix1x1 (Scalar)
*
* @param[in] r row vector or matrix1x4
* @param[in] m matrix4x4
* @param[in] c column vector or matrix4x1
*
* @return scalar value e.g. B(s)
*/
CGLM_INLINE
float
glm_mat4_rmc(vec4 r, mat4 m, vec4 c) {
vec4 tmp;
glm_mat4_mulv(m, c, tmp);
return glm_vec4_dot(r, tmp);
}
#endif /* cglm_mat_h */ #endif /* cglm_mat_h */
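
Note on glm_mat4_rmc: the helper above evaluates R * M * C as dot(R, M * C). The sketch below restates that with plain arrays so the order of operations is visible. The names ex_vec4, ex_mat4 and ex_mat4_rmc are hypothetical stand-ins, not cglm API, and the column-major indexing m[col][row] is an assumption about the layout.

```c
#include <stddef.h>

/* Hypothetical stand-ins for cglm's vec4/mat4; layout assumed column-major,
   i.e. m[col][row]. */
typedef float ex_vec4[4];
typedef float ex_mat4[4][4];

/* Row * Matrix * Column, computed as dot(r, m * c). */
static float
ex_mat4_rmc(ex_vec4 r, ex_mat4 m, ex_vec4 c) {
  float  tmp[4];
  size_t row, col;

  /* tmp = m * c (matrix times column vector) */
  for (row = 0; row < 4; row++) {
    tmp[row] = 0.0f;
    for (col = 0; col < 4; col++)
      tmp[row] += m[col][row] * c[col];
  }

  /* scalar result = r . tmp (1x4 times 4x1) */
  return r[0] * tmp[0] + r[1] * tmp[1] + r[2] * tmp[2] + r[3] * tmp[3];
}
```

With an identity matrix the result reduces to the plain dot product of r and c, which is a quick sanity check for the layout assumption.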


@@ -218,7 +218,7 @@ glm_quat_normalize_to(versor q, versor dest) {
float dot; float dot;
x0 = glmm_load(q); x0 = glmm_load(q);
xdot = glmm_dot(x0, x0); xdot = glmm_vdot(x0, x0);
dot = _mm_cvtss_f32(xdot); dot = _mm_cvtss_f32(xdot);
if (dot <= 0.0f) { if (dot <= 0.0f) {

include/cglm/simd/arm.h (new file, 41 lines)

@@ -0,0 +1,41 @@
/*
* Copyright (c), Recep Aslantas.
*
* MIT License (MIT), http://opensource.org/licenses/MIT
* Full license can be found in the LICENSE file
*/
#ifndef cglm_simd_arm_h
#define cglm_simd_arm_h
#include "intrin.h"
#ifdef CGLM_SIMD_ARM
#define glmm_load(p) vld1q_f32(p)
#define glmm_store(p, a) vst1q_f32(p, a)
static inline
float
glmm_hadd(float32x4_t v) {
#if defined(__aarch64__)
return vaddvq_f32(v);
#else
v = vaddq_f32(v, vrev64q_f32(v));
v = vaddq_f32(v, vcombine_f32(vget_high_f32(v), vget_low_f32(v)));
return vgetq_lane_f32(v, 0);
#endif
}
static inline
float
glmm_dot(float32x4_t a, float32x4_t b) {
return glmm_hadd(vmulq_f32(a, b));
}
static inline
float
glmm_norm(float32x4_t a) {
return sqrtf(glmm_dot(a, a));
}
#endif
#endif /* cglm_simd_arm_h */
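
Note on glmm_hadd: the non-AArch64 branch above reduces four lanes in two add steps. The scalar model below follows the same lane movement without NEON, so it can be read and run on any target. ex_hadd_model is a hypothetical name, and the lane behaviour of vrev64q_f32 and vcombine_f32 is modelled here by hand rather than taken from this diff.

```c
/* Scalar model of the ARMv7 horizontal add (no NEON required). */
static float
ex_hadd_model(const float v[4]) {
  float a[4], b[4];

  /* step 1: v + vrev64q_f32(v)
     vrev64q_f32 swaps the two floats inside each 64-bit half,
     so every lane ends up holding the sum of its pair. */
  a[0] = v[0] + v[1];
  a[1] = v[1] + v[0];
  a[2] = v[2] + v[3];
  a[3] = v[3] + v[2];

  /* step 2: a + vcombine_f32(high(a), low(a))
     swapping the halves adds the other pair sum to every lane. */
  b[0] = a[0] + a[2];
  b[1] = a[1] + a[3];
  b[2] = a[2] + a[0];
  b[3] = a[3] + a[1];

  /* vgetq_lane_f32(v, 0): every lane now holds the total, lane 0 is read */
  return b[0];
}
```

On AArch64 the whole reduction collapses to a single vaddvq_f32, which is why the header branches on __aarch64__.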


@@ -27,90 +27,64 @@
#if defined( __SSE__ ) || defined( __SSE2__ ) #if defined( __SSE__ ) || defined( __SSE2__ )
# include <xmmintrin.h> # include <xmmintrin.h>
# include <emmintrin.h> # include <emmintrin.h>
/* OPTIONAL: may save some instructions, but the effect on latency is unverified */
#ifdef CGLM_USE_INT_DOMAIN
# define glmm_shuff1(xmm, z, y, x, w) \
_mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(xmm), \
_MM_SHUFFLE(z, y, x, w)))
#else
# define glmm_shuff1(xmm, z, y, x, w) \
_mm_shuffle_ps(xmm, xmm, _MM_SHUFFLE(z, y, x, w))
#endif
#define glmm_shuff1x(xmm, x) glmm_shuff1(xmm, x, x, x, x)
#define glmm_shuff2(a, b, z0, y0, x0, w0, z1, y1, x1, w1) \
glmm_shuff1(_mm_shuffle_ps(a, b, _MM_SHUFFLE(z0, y0, x0, w0)), \
z1, y1, x1, w1)
static inline
__m128
glmm_dot(__m128 a, __m128 b) {
__m128 x0;
x0 = _mm_mul_ps(a, b);
x0 = _mm_add_ps(x0, glmm_shuff1(x0, 1, 0, 3, 2));
return _mm_add_ps(x0, glmm_shuff1(x0, 0, 1, 0, 1));
}
static inline
__m128
glmm_norm(__m128 a) {
return _mm_sqrt_ps(glmm_dot(a, a));
}
static inline
__m128
glmm_load3(float v[3]) {
__m128i xy;
__m128 z;
xy = _mm_loadl_epi64((const __m128i *)v);
z = _mm_load_ss(&v[2]);
return _mm_movelh_ps(_mm_castsi128_ps(xy), z);
}
static inline
void
glmm_store3(__m128 vx, float v[3]) {
_mm_storel_pi((__m64 *)&v[0], vx);
_mm_store_ss(&v[2], glmm_shuff1(vx, 2, 2, 2, 2));
}
#ifdef CGLM_ALL_UNALIGNED
# define glmm_load(p) _mm_loadu_ps(p)
# define glmm_store(p, a) _mm_storeu_ps(p, a)
#else
# define glmm_load(p) _mm_load_ps(p)
# define glmm_store(p, a) _mm_store_ps(p, a)
#endif
#endif
/* x86, x64 */
#if defined( __SSE__ ) || defined( __SSE2__ )
# define CGLM_SSE_FP 1 # define CGLM_SSE_FP 1
# ifndef CGLM_SIMD_x86
# define CGLM_SIMD_x86
# endif
#endif
#if defined(__SSE3__)
# include <x86intrin.h>
# ifndef CGLM_SIMD_x86
# define CGLM_SIMD_x86
# endif
#endif
#if defined(__SSE4_1__)
# include <smmintrin.h>
# ifndef CGLM_SIMD_x86
# define CGLM_SIMD_x86
# endif
#endif
#if defined(__SSE4_2__)
# include <nmmintrin.h>
# ifndef CGLM_SIMD_x86
# define CGLM_SIMD_x86
# endif
#endif #endif
#ifdef __AVX__ #ifdef __AVX__
# include <immintrin.h>
# define CGLM_AVX_FP 1 # define CGLM_AVX_FP 1
# ifndef CGLM_SIMD_x86
#ifdef CGLM_ALL_UNALIGNED # define CGLM_SIMD_x86
# define glmm_load256(p) _mm256_loadu_ps(p) # endif
# define glmm_store256(p, a) _mm256_storeu_ps(p, a)
#else
# define glmm_load256(p) _mm256_load_ps(p)
# define glmm_store256(p, a) _mm256_store_ps(p, a)
#endif
#endif #endif
/* ARM Neon */ /* ARM Neon */
#if defined(__ARM_NEON) && defined(__ARM_NEON_FP) #if defined(__ARM_NEON)
# include <arm_neon.h> # include <arm_neon.h>
# define CGLM_NEON_FP 1 # if defined(__ARM_NEON_FP)
#else # define CGLM_NEON_FP 1
# undef CGLM_NEON_FP # ifndef CGLM_SIMD_ARM
# define CGLM_SIMD_ARM
# endif
# endif
#endif
#if defined(CGLM_SIMD_x86) || defined(CGLM_NEON_FP)
# ifndef CGLM_SIMD
# define CGLM_SIMD
# endif
#endif
#if defined(CGLM_SIMD_x86)
# include "x86.h"
#endif
#if defined(CGLM_SIMD_ARM)
# include "arm.h"
#endif #endif
#endif /* cglm_intrin_h */ #endif /* cglm_intrin_h */

include/cglm/simd/x86.h (new file, 136 lines)

@@ -0,0 +1,136 @@
/*
* Copyright (c), Recep Aslantas.
*
* MIT License (MIT), http://opensource.org/licenses/MIT
* Full license can be found in the LICENSE file
*/
#ifndef cglm_simd_x86_h
#define cglm_simd_x86_h
#include "intrin.h"
#ifdef CGLM_SIMD_x86
#ifdef CGLM_ALL_UNALIGNED
# define glmm_load(p) _mm_loadu_ps(p)
# define glmm_store(p, a) _mm_storeu_ps(p, a)
#else
# define glmm_load(p) _mm_load_ps(p)
# define glmm_store(p, a) _mm_store_ps(p, a)
#endif
#ifdef CGLM_USE_INT_DOMAIN
# define glmm_shuff1(xmm, z, y, x, w) \
_mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(xmm), \
_MM_SHUFFLE(z, y, x, w)))
#else
# define glmm_shuff1(xmm, z, y, x, w) \
_mm_shuffle_ps(xmm, xmm, _MM_SHUFFLE(z, y, x, w))
#endif
#define glmm_shuff1x(xmm, x) glmm_shuff1(xmm, x, x, x, x)
#define glmm_shuff2(a, b, z0, y0, x0, w0, z1, y1, x1, w1) \
glmm_shuff1(_mm_shuffle_ps(a, b, _MM_SHUFFLE(z0, y0, x0, w0)), \
z1, y1, x1, w1)
#ifdef __AVX__
# ifdef CGLM_ALL_UNALIGNED
# define glmm_load256(p) _mm256_loadu_ps(p)
# define glmm_store256(p, a) _mm256_storeu_ps(p, a)
# else
# define glmm_load256(p) _mm256_load_ps(p)
# define glmm_store256(p, a) _mm256_store_ps(p, a)
# endif
#endif
static inline
__m128
glmm_vhadds(__m128 v) {
#if defined(__SSE3__)
__m128 shuf, sums;
shuf = _mm_movehdup_ps(v);
sums = _mm_add_ps(v, shuf);
shuf = _mm_movehl_ps(shuf, sums);
sums = _mm_add_ss(sums, shuf);
return sums;
#else
__m128 shuf, sums;
shuf = glmm_shuff1(v, 2, 3, 0, 1);
sums = _mm_add_ps(v, shuf);
shuf = _mm_movehl_ps(shuf, sums);
sums = _mm_add_ss(sums, shuf);
return sums;
#endif
}
static inline
float
glmm_hadd(__m128 v) {
return _mm_cvtss_f32(glmm_vhadds(v));
}
static inline
__m128
glmm_vdots(__m128 a, __m128 b) {
#if (defined(__SSE4_1__) || defined(__SSE4_2__)) && defined(CGLM_SSE4_DOT)
return _mm_dp_ps(a, b, 0xFF);
#elif defined(__SSE3__) && defined(CGLM_SSE3_DOT)
__m128 x0, x1;
x0 = _mm_mul_ps(a, b);
x1 = _mm_hadd_ps(x0, x0);
return _mm_hadd_ps(x1, x1);
#else
return glmm_vhadds(_mm_mul_ps(a, b));
#endif
}
static inline
__m128
glmm_vdot(__m128 a, __m128 b) {
#if (defined(__SSE4_1__) || defined(__SSE4_2__)) && defined(CGLM_SSE4_DOT)
return _mm_dp_ps(a, b, 0xFF);
#elif defined(__SSE3__) && defined(CGLM_SSE3_DOT)
__m128 x0, x1;
x0 = _mm_mul_ps(a, b);
x1 = _mm_hadd_ps(x0, x0);
return _mm_hadd_ps(x1, x1);
#else
__m128 x0;
x0 = _mm_mul_ps(a, b);
x0 = _mm_add_ps(x0, glmm_shuff1(x0, 1, 0, 3, 2));
return _mm_add_ps(x0, glmm_shuff1(x0, 0, 1, 0, 1));
#endif
}
static inline
float
glmm_dot(__m128 a, __m128 b) {
return _mm_cvtss_f32(glmm_vdots(a, b));
}
static inline
float
glmm_norm(__m128 a) {
return _mm_cvtss_f32(_mm_sqrt_ss(glmm_vhadds(_mm_mul_ps(a, a))));
}
static inline
__m128
glmm_load3(float v[3]) {
__m128i xy;
__m128 z;
xy = _mm_loadl_epi64((const __m128i *)v);
z = _mm_load_ss(&v[2]);
return _mm_movelh_ps(_mm_castsi128_ps(xy), z);
}
static inline
void
glmm_store3(__m128 vx, float v[3]) {
_mm_storel_pi((__m64 *)&v[0], vx);
_mm_store_ss(&v[2], glmm_shuff1(vx, 2, 2, 2, 2));
}
#endif
#endif /* cglm_simd_x86_h */
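
Note on glmm_vhadds / glmm_dot: the default (non-SSE3, non-SSE4) path above is a shuffle, an add, a movehl and a scalar add. The sketch below writes that path out with raw intrinsics and adds a scalar fallback so it still builds where SSE is not detected. ex_dot4 is a hypothetical name, and unaligned loads are used here only to keep the example independent of cglm's alignment macros.

```c
#if defined(__SSE__) || defined(__SSE2__)
#  include <xmmintrin.h>
#endif

/* 4-float dot product following the shuffle/add reduction shown above. */
static float
ex_dot4(const float a[4], const float b[4]) {
#if defined(__SSE__) || defined(__SSE2__)
  __m128 prod, shuf, sums;

  prod = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));

  /* swap neighbours: [p1, p0, p3, p2] */
  shuf = _mm_shuffle_ps(prod, prod, _MM_SHUFFLE(2, 3, 0, 1));
  /* pair sums: [p0+p1, p0+p1, p2+p3, p2+p3] */
  sums = _mm_add_ps(prod, shuf);
  /* bring the upper pair sum down to lane 0 */
  shuf = _mm_movehl_ps(shuf, sums);
  /* lane 0 = (p0+p1) + (p2+p3); upper lanes are not meaningful */
  sums = _mm_add_ss(sums, shuf);

  return _mm_cvtss_f32(sums);
#else
  /* scalar fallback when SSE is not available */
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
#endif
}
```

Only lane 0 of the result is valid on this path, which matches the split in the header between glmm_vdots (scalar in lane 0) and glmm_vdot (sum broadcast to all lanes).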


@@ -122,6 +122,8 @@ void
glm_vec4_copy(vec4 v, vec4 dest) { glm_vec4_copy(vec4 v, vec4 dest) {
#if defined( __SSE__ ) || defined( __SSE2__ ) #if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(dest, glmm_load(v)); glmm_store(dest, glmm_load(v));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vld1q_f32(v));
#else #else
dest[0] = v[0]; dest[0] = v[0];
dest[1] = v[1]; dest[1] = v[1];
@@ -157,6 +159,8 @@ void
glm_vec4_zero(vec4 v) { glm_vec4_zero(vec4 v) {
#if defined( __SSE__ ) || defined( __SSE2__ ) #if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(v, _mm_setzero_ps()); glmm_store(v, _mm_setzero_ps());
#elif defined(CGLM_NEON_FP)
vst1q_f32(v, vdupq_n_f32(0.0f));
#else #else
v[0] = 0.0f; v[0] = 0.0f;
v[1] = 0.0f; v[1] = 0.0f;
@@ -175,6 +179,8 @@ void
glm_vec4_one(vec4 v) { glm_vec4_one(vec4 v) {
#if defined( __SSE__ ) || defined( __SSE2__ ) #if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(v, _mm_set1_ps(1.0f)); glmm_store(v, _mm_set1_ps(1.0f));
#elif defined(CGLM_NEON_FP)
vst1q_f32(v, vdupq_n_f32(1.0f));
#else #else
v[0] = 1.0f; v[0] = 1.0f;
v[1] = 1.0f; v[1] = 1.0f;
@@ -194,11 +200,8 @@ glm_vec4_one(vec4 v) {
CGLM_INLINE CGLM_INLINE
float float
glm_vec4_dot(vec4 a, vec4 b) { glm_vec4_dot(vec4 a, vec4 b) {
#if defined( __SSE__ ) || defined( __SSE2__ ) #if defined(CGLM_SIMD)
__m128 x0; return glmm_dot(glmm_load(a), glmm_load(b));
x0 = _mm_mul_ps(glmm_load(a), glmm_load(b));
x0 = _mm_add_ps(x0, glmm_shuff1(x0, 1, 0, 3, 2));
return _mm_cvtss_f32(_mm_add_ss(x0, glmm_shuff1(x0, 0, 1, 0, 1)));
#else #else
return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3]; return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
#endif #endif
@@ -218,15 +221,7 @@ glm_vec4_dot(vec4 a, vec4 b) {
CGLM_INLINE CGLM_INLINE
float float
glm_vec4_norm2(vec4 v) { glm_vec4_norm2(vec4 v) {
#if defined( __SSE__ ) || defined( __SSE2__ ) return glm_vec4_dot(v, v);
__m128 x0;
x0 = glmm_load(v);
x0 = _mm_mul_ps(x0, x0);
x0 = _mm_add_ps(x0, glmm_shuff1(x0, 1, 0, 3, 2));
return _mm_cvtss_f32(_mm_add_ss(x0, glmm_shuff1(x0, 0, 1, 0, 1)));
#else
return v[0] * v[0] + v[1] * v[1] + v[2] * v[2] + v[3] * v[3];
#endif
} }
/*! /*!
@@ -239,12 +234,10 @@ glm_vec4_norm2(vec4 v) {
CGLM_INLINE CGLM_INLINE
float float
glm_vec4_norm(vec4 v) { glm_vec4_norm(vec4 v) {
#if defined( __SSE__ ) || defined( __SSE2__ ) #if defined(CGLM_SIMD)
__m128 x0; return glmm_norm(glmm_load(v));
x0 = glmm_load(v);
return _mm_cvtss_f32(_mm_sqrt_ss(glmm_dot(x0, x0)));
#else #else
return sqrtf(glm_vec4_norm2(v)); return sqrtf(glm_vec4_dot(v, v));
#endif #endif
} }
@@ -260,6 +253,8 @@ void
glm_vec4_add(vec4 a, vec4 b, vec4 dest) { glm_vec4_add(vec4 a, vec4 b, vec4 dest) {
#if defined( __SSE__ ) || defined( __SSE2__ ) #if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(dest, _mm_add_ps(glmm_load(a), glmm_load(b))); glmm_store(dest, _mm_add_ps(glmm_load(a), glmm_load(b)));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vaddq_f32(vld1q_f32(a), vld1q_f32(b)));
#else #else
dest[0] = a[0] + b[0]; dest[0] = a[0] + b[0];
dest[1] = a[1] + b[1]; dest[1] = a[1] + b[1];
@@ -280,6 +275,8 @@ void
glm_vec4_adds(vec4 v, float s, vec4 dest) { glm_vec4_adds(vec4 v, float s, vec4 dest) {
#if defined( __SSE__ ) || defined( __SSE2__ ) #if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(dest, _mm_add_ps(glmm_load(v), _mm_set1_ps(s))); glmm_store(dest, _mm_add_ps(glmm_load(v), _mm_set1_ps(s)));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vaddq_f32(vld1q_f32(v), vdupq_n_f32(s)));
#else #else
dest[0] = v[0] + s; dest[0] = v[0] + s;
dest[1] = v[1] + s; dest[1] = v[1] + s;
@@ -300,6 +297,8 @@ void
glm_vec4_sub(vec4 a, vec4 b, vec4 dest) { glm_vec4_sub(vec4 a, vec4 b, vec4 dest) {
#if defined( __SSE__ ) || defined( __SSE2__ ) #if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(dest, _mm_sub_ps(glmm_load(a), glmm_load(b))); glmm_store(dest, _mm_sub_ps(glmm_load(a), glmm_load(b)));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vsubq_f32(vld1q_f32(a), vld1q_f32(b)));
#else #else
dest[0] = a[0] - b[0]; dest[0] = a[0] - b[0];
dest[1] = a[1] - b[1]; dest[1] = a[1] - b[1];
@@ -320,6 +319,8 @@ void
glm_vec4_subs(vec4 v, float s, vec4 dest) { glm_vec4_subs(vec4 v, float s, vec4 dest) {
#if defined( __SSE__ ) || defined( __SSE2__ ) #if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(dest, _mm_sub_ps(glmm_load(v), _mm_set1_ps(s))); glmm_store(dest, _mm_sub_ps(glmm_load(v), _mm_set1_ps(s)));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vsubq_f32(vld1q_f32(v), vdupq_n_f32(s)));
#else #else
dest[0] = v[0] - s; dest[0] = v[0] - s;
dest[1] = v[1] - s; dest[1] = v[1] - s;
@@ -340,6 +341,8 @@ void
glm_vec4_mul(vec4 a, vec4 b, vec4 dest) { glm_vec4_mul(vec4 a, vec4 b, vec4 dest) {
#if defined( __SSE__ ) || defined( __SSE2__ ) #if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(dest, _mm_mul_ps(glmm_load(a), glmm_load(b))); glmm_store(dest, _mm_mul_ps(glmm_load(a), glmm_load(b)));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vmulq_f32(vld1q_f32(a), vld1q_f32(b)));
#else #else
dest[0] = a[0] * b[0]; dest[0] = a[0] * b[0];
dest[1] = a[1] * b[1]; dest[1] = a[1] * b[1];
@@ -360,6 +363,8 @@ void
glm_vec4_scale(vec4 v, float s, vec4 dest) { glm_vec4_scale(vec4 v, float s, vec4 dest) {
#if defined( __SSE__ ) || defined( __SSE2__ ) #if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(dest, _mm_mul_ps(glmm_load(v), _mm_set1_ps(s))); glmm_store(dest, _mm_mul_ps(glmm_load(v), _mm_set1_ps(s)));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vmulq_f32(vld1q_f32(v), vdupq_n_f32(s)));
#else #else
dest[0] = v[0] * s; dest[0] = v[0] * s;
dest[1] = v[1] * s; dest[1] = v[1] * s;
@@ -442,6 +447,10 @@ glm_vec4_addadd(vec4 a, vec4 b, vec4 dest) {
glmm_store(dest, _mm_add_ps(glmm_load(dest), glmm_store(dest, _mm_add_ps(glmm_load(dest),
_mm_add_ps(glmm_load(a), _mm_add_ps(glmm_load(a),
glmm_load(b)))); glmm_load(b))));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vaddq_f32(vld1q_f32(dest),
vaddq_f32(vld1q_f32(a),
vld1q_f32(b))));
#else #else
dest[0] += a[0] + b[0]; dest[0] += a[0] + b[0];
dest[1] += a[1] + b[1]; dest[1] += a[1] + b[1];
@@ -466,6 +475,10 @@ glm_vec4_subadd(vec4 a, vec4 b, vec4 dest) {
glmm_store(dest, _mm_add_ps(glmm_load(dest), glmm_store(dest, _mm_add_ps(glmm_load(dest),
_mm_sub_ps(glmm_load(a), _mm_sub_ps(glmm_load(a),
glmm_load(b)))); glmm_load(b))));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vaddq_f32(vld1q_f32(dest),
vsubq_f32(vld1q_f32(a),
vld1q_f32(b))));
#else #else
dest[0] += a[0] - b[0]; dest[0] += a[0] - b[0];
dest[1] += a[1] - b[1]; dest[1] += a[1] - b[1];
@@ -490,6 +503,10 @@ glm_vec4_muladd(vec4 a, vec4 b, vec4 dest) {
glmm_store(dest, _mm_add_ps(glmm_load(dest), glmm_store(dest, _mm_add_ps(glmm_load(dest),
_mm_mul_ps(glmm_load(a), _mm_mul_ps(glmm_load(a),
glmm_load(b)))); glmm_load(b))));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vaddq_f32(vld1q_f32(dest),
vmulq_f32(vld1q_f32(a),
vld1q_f32(b))));
#else #else
dest[0] += a[0] * b[0]; dest[0] += a[0] * b[0];
dest[1] += a[1] * b[1]; dest[1] += a[1] * b[1];
@@ -514,6 +531,10 @@ glm_vec4_muladds(vec4 a, float s, vec4 dest) {
glmm_store(dest, _mm_add_ps(glmm_load(dest), glmm_store(dest, _mm_add_ps(glmm_load(dest),
_mm_mul_ps(glmm_load(a), _mm_mul_ps(glmm_load(a),
_mm_set1_ps(s)))); _mm_set1_ps(s))));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vaddq_f32(vld1q_f32(dest),
vmulq_f32(vld1q_f32(a),
vdupq_n_f32(s))));
#else #else
dest[0] += a[0] * s; dest[0] += a[0] * s;
dest[1] += a[1] * s; dest[1] += a[1] * s;
@@ -538,6 +559,10 @@ glm_vec4_maxadd(vec4 a, vec4 b, vec4 dest) {
glmm_store(dest, _mm_add_ps(glmm_load(dest), glmm_store(dest, _mm_add_ps(glmm_load(dest),
_mm_max_ps(glmm_load(a), _mm_max_ps(glmm_load(a),
glmm_load(b)))); glmm_load(b))));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vaddq_f32(vld1q_f32(dest),
vmaxq_f32(vld1q_f32(a),
vld1q_f32(b))));
#else #else
dest[0] += glm_max(a[0], b[0]); dest[0] += glm_max(a[0], b[0]);
dest[1] += glm_max(a[1], b[1]); dest[1] += glm_max(a[1], b[1]);
@@ -562,6 +587,10 @@ glm_vec4_minadd(vec4 a, vec4 b, vec4 dest) {
glmm_store(dest, _mm_add_ps(glmm_load(dest), glmm_store(dest, _mm_add_ps(glmm_load(dest),
_mm_min_ps(glmm_load(a), _mm_min_ps(glmm_load(a),
glmm_load(b)))); glmm_load(b))));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vaddq_f32(vld1q_f32(dest),
vminq_f32(vld1q_f32(a),
vld1q_f32(b))));
#else #else
dest[0] += glm_min(a[0], b[0]); dest[0] += glm_min(a[0], b[0]);
dest[1] += glm_min(a[1], b[1]); dest[1] += glm_min(a[1], b[1]);
@@ -581,6 +610,8 @@ void
glm_vec4_negate_to(vec4 v, vec4 dest) { glm_vec4_negate_to(vec4 v, vec4 dest) {
#if defined( __SSE__ ) || defined( __SSE2__ ) #if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(dest, _mm_xor_ps(glmm_load(v), _mm_set1_ps(-0.0f))); glmm_store(dest, _mm_xor_ps(glmm_load(v), _mm_set1_ps(-0.0f)));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vnegq_f32(vld1q_f32(v)));
#else #else
dest[0] = -v[0]; dest[0] = -v[0];
dest[1] = -v[1]; dest[1] = -v[1];
@@ -614,7 +645,7 @@ glm_vec4_normalize_to(vec4 v, vec4 dest) {
float dot; float dot;
x0 = glmm_load(v); x0 = glmm_load(v);
xdot = glmm_dot(x0, x0); xdot = glmm_vdot(x0, x0);
dot = _mm_cvtss_f32(xdot); dot = _mm_cvtss_f32(xdot);
if (dot == 0.0f) { if (dot == 0.0f) {
@@ -658,10 +689,25 @@ glm_vec4_normalize(vec4 v) {
CGLM_INLINE CGLM_INLINE
float float
glm_vec4_distance(vec4 a, vec4 b) { glm_vec4_distance(vec4 a, vec4 b) {
#if defined( __SSE__ ) || defined( __SSE2__ )
__m128 x0;
x0 = _mm_sub_ps(glmm_load(b), glmm_load(a));
x0 = _mm_mul_ps(x0, x0);
x0 = _mm_add_ps(x0, glmm_shuff1(x0, 1, 0, 3, 2));
return _mm_cvtss_f32(_mm_sqrt_ss(_mm_add_ss(x0,
glmm_shuff1(x0, 0, 1, 0, 1))));
#elif defined(CGLM_NEON_FP)
float32x4_t v0;
float32_t r;
v0 = vsubq_f32(vld1q_f32(a), vld1q_f32(b));
r = glmm_hadd(vmulq_f32(v0, v0)); /* vaddvq_f32 is AArch64-only; glmm_hadd also covers ARMv7 */
return sqrtf(r);
#else
return sqrtf(glm_pow2(b[0] - a[0]) return sqrtf(glm_pow2(b[0] - a[0])
+ glm_pow2(b[1] - a[1]) + glm_pow2(b[1] - a[1])
+ glm_pow2(b[2] - a[2]) + glm_pow2(b[2] - a[2])
+ glm_pow2(b[3] - a[3])); + glm_pow2(b[3] - a[3]));
#endif
} }
/*! /*!
@@ -676,6 +722,8 @@ void
glm_vec4_maxv(vec4 a, vec4 b, vec4 dest) { glm_vec4_maxv(vec4 a, vec4 b, vec4 dest) {
#if defined( __SSE__ ) || defined( __SSE2__ ) #if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(dest, _mm_max_ps(glmm_load(a), glmm_load(b))); glmm_store(dest, _mm_max_ps(glmm_load(a), glmm_load(b)));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vmaxq_f32(vld1q_f32(a), vld1q_f32(b)));
#else #else
dest[0] = glm_max(a[0], b[0]); dest[0] = glm_max(a[0], b[0]);
dest[1] = glm_max(a[1], b[1]); dest[1] = glm_max(a[1], b[1]);
@@ -696,6 +744,8 @@ void
glm_vec4_minv(vec4 a, vec4 b, vec4 dest) { glm_vec4_minv(vec4 a, vec4 b, vec4 dest) {
#if defined( __SSE__ ) || defined( __SSE2__ ) #if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(dest, _mm_min_ps(glmm_load(a), glmm_load(b))); glmm_store(dest, _mm_min_ps(glmm_load(a), glmm_load(b)));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vminq_f32(vld1q_f32(a), vld1q_f32(b)));
#else #else
dest[0] = glm_min(a[0], b[0]); dest[0] = glm_min(a[0], b[0]);
dest[1] = glm_min(a[1], b[1]); dest[1] = glm_min(a[1], b[1]);
@@ -717,6 +767,9 @@ glm_vec4_clamp(vec4 v, float minVal, float maxVal) {
#if defined( __SSE__ ) || defined( __SSE2__ ) #if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(v, _mm_min_ps(_mm_max_ps(glmm_load(v), _mm_set1_ps(minVal)), glmm_store(v, _mm_min_ps(_mm_max_ps(glmm_load(v), _mm_set1_ps(minVal)),
_mm_set1_ps(maxVal))); _mm_set1_ps(maxVal)));
#elif defined(CGLM_NEON_FP)
vst1q_f32(v, vminq_f32(vmaxq_f32(vld1q_f32(v), vdupq_n_f32(minVal)),
vdupq_n_f32(maxVal)));
#else #else
v[0] = glm_clamp(v[0], minVal, maxVal); v[0] = glm_clamp(v[0], minVal, maxVal);
v[1] = glm_clamp(v[1], minVal, maxVal); v[1] = glm_clamp(v[1], minVal, maxVal);
@@ -747,4 +800,23 @@ glm_vec4_lerp(vec4 from, vec4 to, float t, vec4 dest) {
glm_vec4_add(from, v, dest); glm_vec4_add(from, v, dest);
} }
/*!
* @brief helper to fill vec4 as [S^3, S^2, S, 1]
*
* @param[in] s parameter
* @param[out] dest destination
*/
CGLM_INLINE
void
glm_vec4_cubic(float s, vec4 dest) {
float ss;
ss = s * s;
dest[0] = ss * s;
dest[1] = ss;
dest[2] = s;
dest[3] = 1.0f;
}
#endif /* cglm_vec4_h */ #endif /* cglm_vec4_h */
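
Note on glm_vec4_cubic: filling a vector with [s^3, s^2, s, 1] turns cubic polynomial evaluation into a dot product with the coefficient vector. A minimal sketch, assuming plain float arrays; ex_vec4_cubic and ex_cubic_eval are hypothetical names that only mirror the helper added above.

```c
/* Same fill order as the helper above: [s^3, s^2, s, 1]. */
static void
ex_vec4_cubic(float s, float dest[4]) {
  float ss;

  ss = s * s;

  dest[0] = ss * s;
  dest[1] = ss;
  dest[2] = s;
  dest[3] = 1.0f;
}

/* Evaluate coef[0]*s^3 + coef[1]*s^2 + coef[2]*s + coef[3]. */
static float
ex_cubic_eval(float s, const float coef[4]) {
  float p[4];

  ex_vec4_cubic(s, p);

  return p[0] * coef[0] + p[1] * coef[1] + p[2] * coef[2] + p[3] * coef[3];
}
```

The same power vector is what the curve helpers multiply against a basis matrix, so it only has to be built once per parameter value.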


@@ -10,6 +10,6 @@
#define CGLM_VERSION_MAJOR 0 #define CGLM_VERSION_MAJOR 0
#define CGLM_VERSION_MINOR 5 #define CGLM_VERSION_MINOR 5
#define CGLM_VERSION_PATCH 1 #define CGLM_VERSION_PATCH 3
#endif /* cglm_version_h */ #endif /* cglm_version_h */


@@ -34,30 +34,32 @@ test_tests_CFLAGS = $(checkCFLAGS)
cglmdir=$(includedir)/cglm cglmdir=$(includedir)/cglm
cglm_HEADERS = include/cglm/version.h \ cglm_HEADERS = include/cglm/version.h \
include/cglm/cglm.h \ include/cglm/cglm.h \
include/cglm/call.h \ include/cglm/call.h \
include/cglm/cam.h \ include/cglm/cam.h \
include/cglm/io.h \ include/cglm/io.h \
include/cglm/mat4.h \ include/cglm/mat4.h \
include/cglm/mat3.h \ include/cglm/mat3.h \
include/cglm/types.h \ include/cglm/types.h \
include/cglm/common.h \ include/cglm/common.h \
include/cglm/affine.h \ include/cglm/affine.h \
include/cglm/vec3.h \ include/cglm/vec3.h \
include/cglm/vec3-ext.h \ include/cglm/vec3-ext.h \
include/cglm/vec4.h \ include/cglm/vec4.h \
include/cglm/vec4-ext.h \ include/cglm/vec4-ext.h \
include/cglm/euler.h \ include/cglm/euler.h \
include/cglm/util.h \ include/cglm/util.h \
include/cglm/quat.h \ include/cglm/quat.h \
include/cglm/affine-mat.h \ include/cglm/affine-mat.h \
include/cglm/plane.h \ include/cglm/plane.h \
include/cglm/frustum.h \ include/cglm/frustum.h \
include/cglm/box.h \ include/cglm/box.h \
include/cglm/color.h \ include/cglm/color.h \
include/cglm/project.h \ include/cglm/project.h \
include/cglm/sphere.h \ include/cglm/sphere.h \
include/cglm/ease.h include/cglm/ease.h \
include/cglm/curve.h \
include/cglm/bezier.h
cglm_calldir=$(includedir)/cglm/call cglm_calldir=$(includedir)/cglm/call
cglm_call_HEADERS = include/cglm/call/mat4.h \ cglm_call_HEADERS = include/cglm/call/mat4.h \
@@ -74,10 +76,14 @@ cglm_call_HEADERS = include/cglm/call/mat4.h \
include/cglm/call/box.h \ include/cglm/call/box.h \
include/cglm/call/project.h \ include/cglm/call/project.h \
include/cglm/call/sphere.h \ include/cglm/call/sphere.h \
include/cglm/call/ease.h include/cglm/call/ease.h \
include/cglm/call/curve.h \
include/cglm/call/bezier.h
cglm_simddir=$(includedir)/cglm/simd cglm_simddir=$(includedir)/cglm/simd
cglm_simd_HEADERS = include/cglm/simd/intrin.h cglm_simd_HEADERS = include/cglm/simd/intrin.h \
include/cglm/simd/x86.h \
include/cglm/simd/arm.h
cglm_simd_sse2dir=$(includedir)/cglm/simd/sse2 cglm_simd_sse2dir=$(includedir)/cglm/simd/sse2
cglm_simd_sse2_HEADERS = include/cglm/simd/sse2/affine.h \ cglm_simd_sse2_HEADERS = include/cglm/simd/sse2/affine.h \
@@ -107,7 +113,9 @@ libcglm_la_SOURCES=\
src/box.c \ src/box.c \
src/project.c \ src/project.c \
src/sphere.c \ src/sphere.c \
src/ease.c src/ease.c \
src/curve.c \
src/bezier.c
test_tests_SOURCES=\ test_tests_SOURCES=\
test/src/test_common.c \ test/src/test_common.c \
@@ -121,7 +129,8 @@ test_tests_SOURCES=\
test/src/test_vec4.c \ test/src/test_vec4.c \
test/src/test_vec3.c \ test/src/test_vec3.c \
test/src/test_mat3.c \ test/src/test_mat3.c \
test/src/test_affine.c test/src/test_affine.c \
test/src/test_bezier.c
all-local: all-local:
sh ./post-build.sh sh ./post-build.sh

src/bezier.c (new file, 27 lines)

@@ -0,0 +1,27 @@
/*
* Copyright (c), Recep Aslantas.
*
* MIT License (MIT), http://opensource.org/licenses/MIT
* Full license can be found in the LICENSE file
*/
#include "../include/cglm/cglm.h"
#include "../include/cglm/call.h"
CGLM_EXPORT
float
glmc_bezier(float s, float p0, float c0, float c1, float p1) {
return glm_bezier(s, p0, c0, c1, p1);
}
CGLM_EXPORT
float
glmc_hermite(float s, float p0, float t0, float t1, float p1) {
return glm_hermite(s, p0, t0, t1, p1);
}
CGLM_EXPORT
float
glmc_decasteljau(float prm, float p0, float c0, float c1, float p1) {
return glm_decasteljau(prm, p0, c0, c1, p1);
}


@@ -88,6 +88,12 @@ glmc_perspective(float fovy,
dest); dest);
} }
CGLM_EXPORT
void
glmc_persp_move_far(mat4 proj, float deltaFar) {
glm_persp_move_far(proj, deltaFar);
}
CGLM_EXPORT CGLM_EXPORT
void void
glmc_perspective_default(float aspect, mat4 dest) { glmc_perspective_default(float aspect, mat4 dest) {

src/curve.c (new file, 15 lines)

@@ -0,0 +1,15 @@
/*
* Copyright (c), Recep Aslantas.
*
* MIT License (MIT), http://opensource.org/licenses/MIT
* Full license can be found in the LICENSE file
*/
#include "../include/cglm/cglm.h"
#include "../include/cglm/call.h"
CGLM_EXPORT
float
glmc_smc(float s, mat4 m, vec4 c) {
return glm_smc(s, m, c);
}


@@ -91,3 +91,9 @@ void
glmc_mat3_swap_row(mat3 mat, int row1, int row2) { glmc_mat3_swap_row(mat3 mat, int row1, int row2) {
glm_mat3_swap_row(mat, row1, row2); glm_mat3_swap_row(mat, row1, row2);
} }
CGLM_EXPORT
float
glmc_mat3_rmc(vec3 r, mat3 m, vec3 c) {
return glm_mat3_rmc(r, m, c);
}


@@ -151,3 +151,9 @@ void
glmc_mat4_swap_row(mat4 mat, int row1, int row2) { glmc_mat4_swap_row(mat4 mat, int row1, int row2) {
glm_mat4_swap_row(mat, row1, row2); glm_mat4_swap_row(mat, row1, row2);
} }
CGLM_EXPORT
float
glmc_mat4_rmc(vec4 r, mat4 m, vec4 c) {
return glm_mat4_rmc(r, m, c);
}


@@ -206,6 +206,12 @@ glmc_vec4_lerp(vec4 from, vec4 to, float t, vec4 dest) {
glm_vec4_lerp(from, to, t, dest); glm_vec4_lerp(from, to, t, dest);
} }
CGLM_EXPORT
void
glmc_vec4_cubic(float s, vec4 dest) {
glm_vec4_cubic(s, dest);
}
/* ext */ /* ext */
CGLM_EXPORT CGLM_EXPORT

test/src/test_bezier.c (new file, 65 lines)

@@ -0,0 +1,65 @@
/*
* Copyright (c), Recep Aslantas.
*
* MIT License (MIT), http://opensource.org/licenses/MIT
* Full license can be found in the LICENSE file
*/
#include "test_common.h"
CGLM_INLINE
float
test_bezier_plain(float s, float p0, float c0, float c1, float p1) {
float x, xx, xxx, ss, sss;
x = 1.0f - s;
xx = x * x;
xxx = xx * x;
ss = s * s;
sss = ss * s;
return p0 * xxx + 3.0f * (c0 * s * xx + c1 * ss * x) + p1 * sss;
}
CGLM_INLINE
float
test_hermite_plain(float s, float p0, float t0, float t1, float p1) {
float ss, sss;
ss = s * s;
sss = ss * s;
return p0 * (2.0f * sss - 3.0f * ss + 1.0f)
+ t0 * (sss - 2.0f * ss + s)
+ p1 * (-2.0f * sss + 3.0f * ss)
+ t1 * (sss - ss);
}
void
test_bezier(void **state) {
float s, p0, p1, c0, c1, smc, Bs, Bs_plain;
s = test_rand();
p0 = test_rand();
p1 = test_rand();
c0 = test_rand();
c1 = test_rand();
/* test cubic bezier */
smc = glm_smc(s, GLM_BEZIER_MAT, (vec4){p0, c0, c1, p1});
Bs = glm_bezier(s, p0, c0, c1, p1);
Bs_plain = test_bezier_plain(s, p0, c0, c1, p1);
assert_true(glm_eq(Bs, Bs_plain));
assert_true(glm_eq(smc, Bs_plain));
assert_true(glm_eq(Bs, smc));
/* test cubic hermite */
smc = glm_smc(s, GLM_HERMITE_MAT, (vec4){p0, p1, c0, c1});
Bs = glm_hermite(s, p0, c0, c1, p1);
Bs_plain = test_hermite_plain(s, p0, c0, c1, p1);
assert_true(glm_eq(Bs, Bs_plain));
assert_true(glm_eq(smc, Bs_plain));
assert_true(glm_eq(Bs, smc));
}
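
Note on the bezier test: it checks that the matrix form S * M * C agrees with the expanded Bernstein form. The sketch below shows why the two agree, using the textbook cubic Bezier basis matrix. ex_bezier_mat, ex_bezier_smc and ex_bezier_plain are hypothetical names; the matrix values are the standard ones and are an assumption here, since GLM_BEZIER_MAT itself is not shown in this diff.

```c
/* Textbook cubic Bezier basis for S = [s^3, s^2, s, 1] and
   C = [p0, c0, c1, p1]. The matrix is symmetric, so row-major and
   column-major readings give the same result. */
static float ex_bezier_mat[4][4] = {
  {-1.0f,  3.0f, -3.0f, 1.0f},
  { 3.0f, -6.0f,  3.0f, 0.0f},
  {-3.0f,  3.0f,  0.0f, 0.0f},
  { 1.0f,  0.0f,  0.0f, 0.0f}
};

/* Matrix form: B(s) = S * M * C, evaluated as dot(S, M * C). */
static float
ex_bezier_smc(float s, float p0, float c0, float c1, float p1) {
  float S[4], C[4], tmp[4];
  int   i, j;

  S[0] = s * s * s; S[1] = s * s; S[2] = s; S[3] = 1.0f;
  C[0] = p0;        C[1] = c0;    C[2] = c1; C[3] = p1;

  for (i = 0; i < 4; i++) {
    tmp[i] = 0.0f;
    for (j = 0; j < 4; j++)
      tmp[i] += ex_bezier_mat[i][j] * C[j];
  }

  return S[0] * tmp[0] + S[1] * tmp[1] + S[2] * tmp[2] + S[3] * tmp[3];
}

/* Bernstein form, same expression as test_bezier_plain above. */
static float
ex_bezier_plain(float s, float p0, float c0, float c1, float p1) {
  float x, xx, ss;

  x  = 1.0f - s;
  xx = x * x;
  ss = s * s;

  return p0 * xx * x + 3.0f * (c0 * s * xx + c1 * ss * x) + p1 * ss * s;
}
```

Both forms return p0 at s = 0 and p1 at s = 1. For other parameter values they can differ by rounding, which is presumably why the test compares with glm_eq rather than ==.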


@@ -58,7 +58,7 @@ test_rand_vec4(vec4 dest) {
} }
float float
test_rand_angle(void) { test_rand(void) {
srand((unsigned int)time(NULL)); srand((unsigned int)time(NULL));
return drand48(); return drand48();
} }


@@ -59,7 +59,7 @@ void
test_rand_vec4(vec4 dest) ; test_rand_vec4(vec4 dest) ;
float float
test_rand_angle(void); test_rand(void);
void void
test_rand_quat(versor q); test_rand_quat(versor q);


@@ -38,7 +38,10 @@ main(int argc, const char * argv[]) {
cmocka_unit_test(test_vec3), cmocka_unit_test(test_vec3),
/* affine */ /* affine */
cmocka_unit_test(test_affine) cmocka_unit_test(test_affine),
/* bezier */
cmocka_unit_test(test_bezier)
}; };
return cmocka_run_group_tests(tests, NULL, NULL); return cmocka_run_group_tests(tests, NULL, NULL);


@@ -40,4 +40,7 @@ test_vec3(void **state);
void void
test_affine(void **state); test_affine(void **state);
void
test_bezier(void **state);
#endif /* test_tests_h */ #endif /* test_tests_h */


@@ -20,8 +20,10 @@
 </ItemGroup>
 <ItemGroup>
 <ClCompile Include="..\src\affine.c" />
+<ClCompile Include="..\src\bezier.c" />
 <ClCompile Include="..\src\box.c" />
 <ClCompile Include="..\src\cam.c" />
+<ClCompile Include="..\src\curve.c" />
 <ClCompile Include="..\src\dllmain.c" />
 <ClCompile Include="..\src\ease.c" />
 <ClCompile Include="..\src\euler.c" />
@@ -39,11 +41,14 @@
 <ItemGroup>
 <ClInclude Include="..\include\cglm\affine-mat.h" />
 <ClInclude Include="..\include\cglm\affine.h" />
+<ClInclude Include="..\include\cglm\bezier.h" />
 <ClInclude Include="..\include\cglm\box.h" />
 <ClInclude Include="..\include\cglm\call.h" />
 <ClInclude Include="..\include\cglm\call\affine.h" />
+<ClInclude Include="..\include\cglm\call\bezier.h" />
 <ClInclude Include="..\include\cglm\call\box.h" />
 <ClInclude Include="..\include\cglm\call\cam.h" />
+<ClInclude Include="..\include\cglm\call\curve.h" />
 <ClInclude Include="..\include\cglm\call\ease.h" />
 <ClInclude Include="..\include\cglm\call\euler.h" />
 <ClInclude Include="..\include\cglm\call\frustum.h" />
@@ -60,6 +65,7 @@
 <ClInclude Include="..\include\cglm\cglm.h" />
 <ClInclude Include="..\include\cglm\color.h" />
 <ClInclude Include="..\include\cglm\common.h" />
+<ClInclude Include="..\include\cglm\curve.h" />
 <ClInclude Include="..\include\cglm\ease.h" />
 <ClInclude Include="..\include\cglm\euler.h" />
 <ClInclude Include="..\include\cglm\frustum.h" />
@@ -69,6 +75,7 @@
 <ClInclude Include="..\include\cglm\plane.h" />
 <ClInclude Include="..\include\cglm\project.h" />
 <ClInclude Include="..\include\cglm\quat.h" />
+<ClInclude Include="..\include\cglm\simd\arm.h" />
 <ClInclude Include="..\include\cglm\simd\avx\affine.h" />
 <ClInclude Include="..\include\cglm\simd\avx\mat4.h" />
 <ClInclude Include="..\include\cglm\simd\intrin.h" />
@@ -77,6 +84,7 @@
 <ClInclude Include="..\include\cglm\simd\sse2\mat3.h" />
 <ClInclude Include="..\include\cglm\simd\sse2\mat4.h" />
 <ClInclude Include="..\include\cglm\simd\sse2\quat.h" />
+<ClInclude Include="..\include\cglm\simd\x86.h" />
 <ClInclude Include="..\include\cglm\sphere.h" />
 <ClInclude Include="..\include\cglm\types.h" />
 <ClInclude Include="..\include\cglm\util.h" />

@@ -84,6 +84,12 @@
 <ClCompile Include="..\src\ease.c">
 <Filter>src</Filter>
 </ClCompile>
+<ClCompile Include="..\src\curve.c">
+<Filter>src</Filter>
+</ClCompile>
+<ClCompile Include="..\src\bezier.c">
+<Filter>src</Filter>
+</ClCompile>
 </ItemGroup>
 <ItemGroup>
 <ClInclude Include="..\src\config.h">
@@ -233,5 +239,23 @@
 <ClInclude Include="..\include\cglm\ease.h">
 <Filter>include\cglm</Filter>
 </ClInclude>
+<ClInclude Include="..\include\cglm\simd\arm.h">
+<Filter>include\cglm\simd</Filter>
+</ClInclude>
+<ClInclude Include="..\include\cglm\simd\x86.h">
+<Filter>include\cglm\simd</Filter>
+</ClInclude>
+<ClInclude Include="..\include\cglm\call\curve.h">
+<Filter>include\cglm\call</Filter>
+</ClInclude>
+<ClInclude Include="..\include\cglm\curve.h">
+<Filter>include\cglm</Filter>
+</ClInclude>
+<ClInclude Include="..\include\cglm\bezier.h">
+<Filter>include\cglm</Filter>
+</ClInclude>
+<ClInclude Include="..\include\cglm\call\bezier.h">
+<Filter>include\cglm\call</Filter>
+</ClInclude>
 </ItemGroup>
 </Project>