Compare commits


23 Commits
simd ... v0.5.2

Author SHA1 Message Date
Recep Aslantas
1a34ffcf4b Merge pull request #72 from recp/simd-update
SIMD update (NEON, SSE3, SSE4) + Features
2019-02-03 17:18:54 +03:00
Recep Aslantas
af088a1059 Merge branch 'master' into simd-update 2019-02-02 15:58:57 +03:00
Recep Aslantas
18f06743ed build: make automake build silent (less-verbose) 2019-02-02 15:54:09 +03:00
Recep Aslantas
60cfc87009 remove bezier_solve for now 2019-02-02 15:30:05 +03:00
Recep Aslantas
4e5879497e update docs 2019-02-02 15:29:48 +03:00
Recep Aslantas
7848dda1dd curve: cubic hermite interpolation 2019-01-29 22:17:44 +03:00
Recep Aslantas
1e121a4855 mat4: fix rmc multiplication 2019-01-29 22:11:04 +03:00
Recep Aslantas
0f223db7d3 Merge pull request #74 from ccworld1000/patch-1
Update cglm.podspec
2019-01-29 14:48:46 +03:00
CC
a4e2c39c1d Update cglm.podspec
update pod version
2019-01-29 16:54:02 +08:00
Recep Aslantas
c22231f296 curve: de casteljau implementation for solving cubic bezier 2019-01-28 15:52:42 +03:00
Recep Aslantas
730cb1e9f7 add bezier helpers 2019-01-28 15:32:24 +03:00
Recep Aslantas
b0e48a56ca test: rename test_rand_angle() to test_rand() 2019-01-28 15:31:03 +03:00
Recep Aslantas
11a6e4471e fix vec4_cubic function 2019-01-28 14:26:02 +03:00
Recep Aslantas
60cb4beb0a curve: helper to calculate the result of SMC multiplication 2019-01-26 18:06:26 +03:00
Recep Aslantas
32ddf49756 mat4: helper for row * matrix * column 2019-01-26 18:05:05 +03:00
Recep Aslantas
807d5589b4 call: add missing end guard to call headers 2019-01-26 16:05:11 +03:00
Recep Aslantas
59b9e54879 vec4: helper to fill vec4 as [S^3, S^2, S, 1] 2019-01-26 15:54:10 +03:00
Recep Aslantas
fc7f958167 simd: remove re-load in SSE4 and SSE3 2019-01-25 21:56:17 +03:00
Recep Aslantas
31bb303c55 simd: organise SIMD-functions
* optimize dot product
2019-01-24 10:17:49 +03:00
Recep Aslantas
be6aa9a89a simd: optimize some mat4 operations with neon 2019-01-22 09:39:57 +03:00
Recep Aslantas
f65f1d491b simd: optimize vec4_distance with sse and neon 2019-01-22 09:23:51 +03:00
Recep Aslantas
f0c2a2984e simd, neon: add missing neon support for vec4 2019-01-22 09:05:38 +03:00
Recep Aslantas
b117f3bf80 neon: add neon support for most vec4 operations 2019-01-21 23:14:04 +03:00
39 changed files with 1008 additions and 534 deletions

1
.gitignore vendored
View File

@@ -69,3 +69,4 @@ win/cglm_test_*
win/x64
win/x85
win/Debug
cglm-test-ios*

View File

@@ -52,3 +52,12 @@ https://gamedev.stackexchange.com/questions/28395/rotating-vector3-by-a-quaterni
9. Sphere AABB intersect
https://github.com/erich666/GraphicsGems/blob/master/gems/BoxSphere.c
10. Horizontal add
https://stackoverflow.com/questions/6996764/fastest-way-to-do-horizontal-float-vector-sum-on-x86
11. de casteljau implementation and comments
https://forums.khronos.org/showthread.php/10264-Animations-in-1-4-1-release-notes-revision-A/page2?highlight=bezier
https://forums.khronos.org/showthread.php/10644-Animation-Bezier-interpolation
https://forums.khronos.org/showthread.php/10387-2D-Tangents-in-Bezier-Splines?p=34164&viewfull=1#post34164
https://forums.khronos.org/showthread.php/10651-Animation-TCB-Spline-Interpolation-in-COLLADA?highlight=bezier

View File

@@ -82,7 +82,11 @@ Currently *cglm* uses default clip space configuration (-1, 1) for camera functi
- inline or pre-compiled function call
- frustum (extract view frustum planes, corners...)
- bounding box (AABB in Frustum (culling), crop, merge...)
- bounding sphere
- project, unproject
- easing functions
- curves
- curve interpolation helpers (S*M*C, deCasteljau...)
- and other...
<hr />

View File

@@ -2,7 +2,7 @@ Pod::Spec.new do |s|
# Description
s.name = "cglm"
s.version = "0.4.6" s.version = "0.5.1"
s.summary = "📽 Optimized OpenGL/Graphics Math (glm) for C"
s.description = <<-DESC
cglm is math library for graphics programming for C. It is similar to original glm but it is written for C instead of C++ (you can use here too). See the documentation or README for all features. cglm is math library for graphics programming for C. It is similar to original glm but it is written for C instead of C++ (you can use here too). See the documentation or README for all features.

View File

@@ -29,6 +29,7 @@ LT_INIT
# Checks for libraries.
AC_CHECK_LIB([m], [floor])
m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES([yes])])
AC_SYS_LARGEFILE
# Checks for header files.

View File

@@ -46,3 +46,5 @@ Follow the :doc:`build` documentation for this
io
call
sphere
curve
bezier

89
docs/source/bezier.rst Normal file
View File

@@ -0,0 +1,89 @@
.. default-domain:: C
Bezier
================================================================================
Header: cglm/bezier.h
Common helpers for cubic bezier and similar curves.
Table of contents (click to go):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Functions:
1. :c:func:`glm_bezier`
2. :c:func:`glm_hermite`
3. :c:func:`glm_decasteljau`
Functions documentation
~~~~~~~~~~~~~~~~~~~~~~~
.. c:function:: float glm_bezier(float s, float p0, float c0, float c1, float p1)
| cubic bezier interpolation
| formula:
.. code-block:: text
B(s) = P0*(1-s)^3 + 3*C0*s*(1-s)^2 + 3*C1*s^2*(1-s) + P1*s^3
| similar result using matrix:
.. code-block:: text
B(s) = glm_smc(t, GLM_BEZIER_MAT, (vec4){p0, c0, c1, p1})
| glm_eq(glm_smc(...), glm_bezier(...)) should return TRUE
Parameters:
| *[in]* **s** parameter between 0 and 1
| *[in]* **p0** begin point
| *[in]* **c0** control point 1
| *[in]* **c1** control point 2
| *[in]* **p1** end point
Returns:
B(s)
.. c:function:: float glm_hermite(float s, float p0, float t0, float t1, float p1)
| cubic hermite interpolation
| formula:
.. code-block:: text
H(s) = P0*(2*s^3 - 3*s^2 + 1) + T0*(s^3 - 2*s^2 + s) + P1*(-2*s^3 + 3*s^2) + T1*(s^3 - s^2)
| similar result using matrix:
.. code-block:: text
H(s) = glm_smc(t, GLM_HERMITE_MAT, (vec4){p0, p1, c0, c1})
| glm_eq(glm_smc(...), glm_hermite(...)) should return TRUE
Parameters:
| *[in]* **s** parameter between 0 and 1
| *[in]* **p0** begin point
| *[in]* **t0** tangent 1
| *[in]* **t1** tangent 2
| *[in]* **p1** end point
Returns:
B(s)
.. c:function:: float glm_decasteljau(float prm, float p0, float c0, float c1, float p1)
| iterative way to solve cubic equation
Parameters:
| *[in]* **prm** parameter between 0 and 1
| *[in]* **p0** begin point
| *[in]* **c0** control point 1
| *[in]* **c1** control point 2
| *[in]* **p1** end point
Returns:
parameter to use in cubic equation
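A quick way to sanity-check the polynomial and matrix forms documented above (illustrative sketch only, not part of this changeset) is to evaluate the same cubic both ways and compare, which mirrors the glm_eq(glm_smc(...), glm_bezier(...)) note:

#include <stdio.h>
#include <cglm/cglm.h>   /* pulls in bezier.h and curve.h from this change */

int main(void) {
  float p0 = 0.0f, c0 = 0.25f, c1 = 0.75f, p1 = 1.0f, s = 0.3f;

  /* polynomial form: P0*(1-s)^3 + 3*C0*s*(1-s)^2 + 3*C1*s^2*(1-s) + P1*s^3 */
  float b_poly = glm_bezier(s, p0, c0, c1, p1);

  /* matrix form: [s^3 s^2 s 1] * GLM_BEZIER_MAT * [p0 c0 c1 p1]^T */
  float b_smc  = glm_smc(s, GLM_BEZIER_MAT, (vec4){p0, c0, c1, p1});

  printf("%f %f\n", b_poly, b_smc); /* should agree within float tolerance */
  return 0;
}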

41
docs/source/curve.rst Normal file
View File

@@ -0,0 +1,41 @@
.. default-domain:: C
Curve
================================================================================
Header: cglm/curve.h
Common helpers for common curves. For specific curve see its header/doc
e.g bezier
Table of contents (click to go):
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Functions:
1. :c:func:`glm_smc`
Functions documentation
~~~~~~~~~~~~~~~~~~~~~~~
.. c:function:: float glm_smc(float s, mat4 m, vec4 c)
| helper function to calculate **S** * **M** * **C** multiplication for curves
| this function does not encourage you to use SMC, instead it is a helper if you use SMC.
| if you want to specify S as vector then use more generic glm_mat4_rmc() func.
| Example usage:
.. code-block:: c
Bs = glm_smc(s, GLM_BEZIER_MAT, (vec4){p0, c0, c1, p1})
Parameters:
| *[in]* **s** parameter between 0 and 1 (this will be [s3, s2, s, 1])
| *[in]* **m** basis matrix
| *[out]* **c** position/control vector
Returns:
scalar value e.g. Bs

View File

@@ -45,6 +45,7 @@ Functions:
#. :c:func:`glm_mat4_inv_fast`
#. :c:func:`glm_mat4_swap_col`
#. :c:func:`glm_mat4_swap_row`
#. :c:func:`glm_mat4_rmc`
Functions documentation
~~~~~~~~~~~~~~~~~~~~~~~
@@ -270,3 +271,20 @@ Functions documentation
| *[in, out]* **mat** matrix
| *[in]* **row1** row1
| *[in]* **row2** row2
.. c:function:: float glm_mat4_rmc(vec4 r, mat4 m, vec4 c)
| **rmc** stands for **Row** * **Matrix** * **Column**
| helper for R (row vector) * M (matrix) * C (column vector)
| the result is scalar because S * M = Matrix1x4 (row vector),
| then Matrix1x4 * Vec4 (column vector) = Matrix1x1 (Scalar)
Parameters:
| *[in]* **r** row vector or matrix1x4
| *[in]* **m** matrix4x4
| *[in]* **c** column vector or matrix4x1
Returns:
scalar value e.g. Matrix1x1
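To make the RMC helper concrete: glm_smc() in this changeset is just glm_vec4_cubic() followed by glm_mat4_rmc(). A minimal sketch (the basis/ctrl names are placeholders, not from the diff):

#include <cglm/cglm.h>

/* Evaluate a curve against an arbitrary basis matrix by composing the two
 * helpers documented above -- exactly what glm_smc() does internally. */
float eval_with_basis(float s, mat4 basis, vec4 ctrl) {
  vec4 r;
  glm_vec4_cubic(s, r);                /* r = [s^3, s^2, s, 1]            */
  return glm_mat4_rmc(r, basis, ctrl); /* row * matrix * column -> scalar */
}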

View File

@@ -58,11 +58,7 @@ Functions:
#. :c:func:`glm_vec4_minv`
#. :c:func:`glm_vec4_clamp`
#. :c:func:`glm_vec4_lerp`
#. :c:func:`glm_vec4_isnan` #. :c:func:`glm_vec4_cubic`
#. :c:func:`glm_vec4_isinf`
#. :c:func:`glm_vec4_isvalid`
#. :c:func:`glm_vec4_sign`
#. :c:func:`glm_vec4_sqrt`
Functions documentation
~~~~~~~~~~~~~~~~~~~~~~~
@@ -401,3 +397,11 @@ Functions documentation
| *[in]* **to** to value
| *[in]* **t** interpolant (amount) clamped between 0 and 1
| *[out]* **dest** destination
.. c:function:: void glm_vec4_cubic(float s, vec4 dest)
helper to fill vec4 as [S^3, S^2, S, 1]
Parameters:
| *[in]* **s** parameter
| *[out]* **dest** destination

152
include/cglm/bezier.h Normal file
View File

@@ -0,0 +1,152 @@
/*
* Copyright (c), Recep Aslantas.
*
* MIT License (MIT), http://opensource.org/licenses/MIT
* Full license can be found in the LICENSE file
*/
#ifndef cglm_bezier_h
#define cglm_bezier_h
#define GLM_BEZIER_MAT_INIT {{-1.0f, 3.0f, -3.0f, 1.0f}, \
{ 3.0f, -6.0f, 3.0f, 0.0f}, \
{-3.0f, 3.0f, 0.0f, 0.0f}, \
{ 1.0f, 0.0f, 0.0f, 0.0f}}
#define GLM_HERMITE_MAT_INIT {{ 2.0f, -3.0f, 0.0f, 1.0f}, \
{-2.0f, 3.0f, 0.0f, 0.0f}, \
{ 1.0f, -2.0f, 1.0f, 0.0f}, \
{ 1.0f, -1.0f, 0.0f, 0.0f}}
/* for C only */
#define GLM_BEZIER_MAT ((mat4)GLM_BEZIER_MAT_INIT)
#define GLM_HERMITE_MAT ((mat4)GLM_HERMITE_MAT_INIT)
#define CGLM_DECASTEL_EPS 1e-9
#define CGLM_DECASTEL_MAX 1000
#define CGLM_DECASTEL_SMALL 1e-20
/*!
* @brief cubic bezier interpolation
*
* Formula:
* B(s) = P0*(1-s)^3 + 3*C0*s*(1-s)^2 + 3*C1*s^2*(1-s) + P1*s^3
*
* similar result using matrix:
* B(s) = glm_smc(t, GLM_BEZIER_MAT, (vec4){p0, c0, c1, p1})
*
* glm_eq(glm_smc(...), glm_bezier(...)) should return TRUE
*
* @param[in] s parameter between 0 and 1
* @param[in] p0 begin point
* @param[in] c0 control point 1
* @param[in] c1 control point 2
* @param[in] p1 end point
*
* @return B(s)
*/
CGLM_INLINE
float
glm_bezier(float s, float p0, float c0, float c1, float p1) {
float x, xx, ss, xs3, a;
x = 1.0f - s;
xx = x * x;
ss = s * s;
xs3 = (s - ss) * 3.0f;
a = p0 * xx + c0 * xs3;
return a + s * (c1 * xs3 + p1 * ss - a);
}
/*!
* @brief cubic hermite interpolation
*
* Formula:
* H(s) = P0*(2*s^3 - 3*s^2 + 1) + T0*(s^3 - 2*s^2 + s)
* + P1*(-2*s^3 + 3*s^2) + T1*(s^3 - s^2)
*
* similar result using matrix:
* H(s) = glm_smc(t, GLM_HERMITE_MAT, (vec4){p0, p1, c0, c1})
*
* glm_eq(glm_smc(...), glm_hermite(...)) should return TRUE
*
* @param[in] s parameter between 0 and 1
* @param[in] p0 begin point
* @param[in] t0 tangent 1
* @param[in] t1 tangent 2
* @param[in] p1 end point
*
* @return H(s)
*/
CGLM_INLINE
float
glm_hermite(float s, float p0, float t0, float t1, float p1) {
float ss, d, a, b, c, e, f;
ss = s * s;
a = ss + ss;
c = a + ss;
b = a * s;
d = s * ss;
f = d - ss;
e = b - c;
return p0 * (e + 1.0f) + t0 * (f - ss + s) + t1 * f - p1 * e;
}
/*!
* @brief iterative way to solve cubic equation
*
* @param[in] prm parameter between 0 and 1
* @param[in] p0 begin point
* @param[in] c0 control point 1
* @param[in] c1 control point 2
* @param[in] p1 end point
*
* @return parameter to use in cubic equation
*/
CGLM_INLINE
float
glm_decasteljau(float prm, float p0, float c0, float c1, float p1) {
float u, v, a, b, c, d, e, f;
int i;
if (prm - p0 < CGLM_DECASTEL_SMALL)
return 0.0f;
if (p1 - prm < CGLM_DECASTEL_SMALL)
return 1.0f;
u = 0.0f;
v = 1.0f;
for (i = 0; i < CGLM_DECASTEL_MAX; i++) {
/* de Casteljau Subdivision */
a = (p0 + c0) * 0.5f;
b = (c0 + c1) * 0.5f;
c = (c1 + p1) * 0.5f;
d = (a + b) * 0.5f;
e = (b + c) * 0.5f;
f = (d + e) * 0.5f; /* this one is on the curve! */
/* The curve point is close enough to our wanted t */
if (fabsf(f - prm) < CGLM_DECASTEL_EPS)
return glm_clamp_zo((u + v) * 0.5f);
/* dichotomy */
if (f < prm) {
p0 = f;
c0 = e;
c1 = c;
u = (u + v) * 0.5f;
} else {
c0 = a;
c1 = d;
p1 = f;
v = (u + v) * 0.5f;
}
}
return glm_clamp_zo((u + v) * 0.5f);
}
#endif /* cglm_bezier_h */
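One common use of glm_decasteljau (an assumed usage pattern, not something this diff prescribes) is a CSS-style timing curve: solve the x component for its parameter, then evaluate y at that parameter:

#include <cglm/cglm.h>

/* Hypothetical helper: y = f(x) for a cubic timing curve whose x endpoints
 * are 0 and 1, the range glm_decasteljau expects for begin/end. */
float timing_curve(float x, float x1, float y1, float x2, float y2) {
  float t;
  t = glm_decasteljau(x, 0.0f, x1, x2, 1.0f); /* solve x(t) == x iteratively */
  return glm_bezier(t, 0.0f, y1, y2, 1.0f);   /* evaluate y at that t        */
}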

View File

@@ -27,6 +27,8 @@ extern "C" {
#include "call/project.h" #include "call/project.h"
#include "call/sphere.h" #include "call/sphere.h"
#include "call/ease.h" #include "call/ease.h"
#include "call/curve.h"
#include "call/bezier.h"
#ifdef __cplusplus #ifdef __cplusplus
} }

View File

@@ -0,0 +1,31 @@
/*
* Copyright (c), Recep Aslantas.
*
* MIT License (MIT), http://opensource.org/licenses/MIT
* Full license can be found in the LICENSE file
*/
#ifndef cglmc_bezier_h
#define cglmc_bezier_h
#ifdef __cplusplus
extern "C" {
#endif
#include "../cglm.h"
CGLM_EXPORT
float
glmc_bezier(float s, float p0, float c0, float c1, float p1);
CGLM_EXPORT
float
glmc_hermite(float s, float p0, float t0, float t1, float p1);
CGLM_EXPORT
float
glmc_decasteljau(float prm, float p0, float c0, float c1, float p1);
#ifdef __cplusplus
}
#endif
#endif /* cglmc_bezier_h */

23
include/cglm/call/curve.h Normal file
View File

@@ -0,0 +1,23 @@
/*
* Copyright (c), Recep Aslantas.
*
* MIT License (MIT), http://opensource.org/licenses/MIT
* Full license can be found in the LICENSE file
*/
#ifndef cglmc_curve_h
#define cglmc_curve_h
#ifdef __cplusplus
extern "C" {
#endif
#include "../cglm.h"
CGLM_EXPORT
float
glmc_smc(float s, mat4 m, vec4 c);
#ifdef __cplusplus
}
#endif
#endif /* cglmc_curve_h */

View File

@@ -137,4 +137,7 @@ CGLM_EXPORT
float
glmc_ease_bounce_inout(float t);
#ifdef __cplusplus
}
#endif
#endif /* cglmc_ease_h */

View File

@@ -113,6 +113,10 @@ CGLM_EXPORT
void
glmc_mat4_swap_row(mat4 mat, int row1, int row2);
CGLM_EXPORT
float
glmc_mat4_rmc(vec4 r, mat4 m, vec4 c);
#ifdef __cplusplus
}
#endif

View File

@@ -33,4 +33,7 @@ CGLM_EXPORT
bool
glmc_sphere_point(vec4 s, vec3 point);
#ifdef __cplusplus
}
#endif
#endif /* cglmc_sphere_h */

View File

@@ -153,6 +153,10 @@ CGLM_EXPORT
void
glmc_vec4_lerp(vec4 from, vec4 to, float t, vec4 dest);
CGLM_EXPORT
void
glmc_vec4_cubic(float s, vec4 dest);
/* ext */
CGLM_EXPORT

View File

@@ -26,5 +26,7 @@
#include "project.h" #include "project.h"
#include "sphere.h" #include "sphere.h"
#include "ease.h" #include "ease.h"
#include "curve.h"
#include "bezier.h"
#endif /* cglm_h */ #endif /* cglm_h */

40
include/cglm/curve.h Normal file
View File

@@ -0,0 +1,40 @@
/*
* Copyright (c), Recep Aslantas.
*
* MIT License (MIT), http://opensource.org/licenses/MIT
* Full license can be found in the LICENSE file
*/
#ifndef cglm_curve_h
#define cglm_curve_h
#include "common.h"
#include "vec4.h"
#include "mat4.h"
/*!
* @brief helper function to calculate S*M*C multiplication for curves
*
* This function does not encourage you to use SMC,
* instead it is a helper if you use SMC.
*
* if you want to specify S as vector then use more generic glm_mat4_rmc() func.
*
* Example usage:
* B(s) = glm_smc(s, GLM_BEZIER_MAT, (vec4){p0, c0, c1, p1})
*
* @param[in] s parameter between 0 and 1 (this will be [s3, s2, s, 1])
* @param[in] m basis matrix
* @param[in] c position/control vector
*
* @return B(s)
*/
CGLM_INLINE
float
glm_smc(float s, mat4 m, vec4 c) {
vec4 vs;
glm_vec4_cubic(s, vs);
return glm_mat4_rmc(vs, m, c);
}
#endif /* cglm_curve_h */

View File

@@ -118,6 +118,11 @@ glm_mat4_copy(mat4 mat, mat4 dest) {
glmm_store(dest[1], glmm_load(mat[1]));
glmm_store(dest[2], glmm_load(mat[2]));
glmm_store(dest[3], glmm_load(mat[3]));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest[0], vld1q_f32(mat[0]));
vst1q_f32(dest[1], vld1q_f32(mat[1]));
vst1q_f32(dest[2], vld1q_f32(mat[2]));
vst1q_f32(dest[3], vld1q_f32(mat[3]));
#else
glm_mat4_ucopy(mat, dest);
#endif
@@ -252,7 +257,7 @@ glm_mat4_mul(mat4 m1, mat4 m2, mat4 dest) {
glm_mat4_mul_avx(m1, m2, dest);
#elif defined( __SSE__ ) || defined( __SSE2__ )
glm_mat4_mul_sse2(m1, m2, dest);
#elif defined( __ARM_NEON_FP ) #elif defined(CGLM_NEON_FP)
glm_mat4_mul_neon(m1, m2, dest);
#else
float a00 = m1[0][0], a01 = m1[0][1], a02 = m1[0][2], a03 = m1[0][3],
@@ -504,10 +509,15 @@ glm_mat4_scale_p(mat4 m, float s) {
CGLM_INLINE
void
glm_mat4_scale(mat4 m, float s) {
#ifdef __AVX__ #if defined( __SSE__ ) || defined( __SSE2__ )
glm_mat4_scale_avx(m, s);
#elif defined( __SSE__ ) || defined( __SSE2__ )
glm_mat4_scale_sse2(m, s);
#elif defined(CGLM_NEON_FP)
float32x4_t v0;
v0 = vdupq_n_f32(s);
vst1q_f32(m[0], vmulq_f32(vld1q_f32(m[0]), v0));
vst1q_f32(m[1], vmulq_f32(vld1q_f32(m[1]), v0));
vst1q_f32(m[2], vmulq_f32(vld1q_f32(m[2]), v0));
vst1q_f32(m[3], vmulq_f32(vld1q_f32(m[3]), v0));
#else
glm_mat4_scale_p(m, s);
#endif
@@ -556,9 +566,7 @@ glm_mat4_det(mat4 mat) {
CGLM_INLINE
void
glm_mat4_inv(mat4 mat, mat4 dest) {
#ifdef __AVX__ #if defined( __SSE__ ) || defined( __SSE2__ )
glm_mat4_inv_avx(mat, dest);
#elif defined( __SSE__ ) || defined( __SSE2__ )
glm_mat4_inv_sse2(mat, dest);
#else
float t[6];
@@ -619,9 +627,7 @@ glm_mat4_inv(mat4 mat, mat4 dest) {
CGLM_INLINE
void
glm_mat4_inv_fast(mat4 mat, mat4 dest) {
#ifdef __AVX__ #if defined( __SSE__ ) || defined( __SSE2__ )
glm_mat4_inv_fast_avx(mat, dest);
#elif defined( __SSE__ ) || defined( __SSE2__ )
glm_mat4_inv_fast_sse2(mat, dest);
#else
glm_mat4_inv(mat, dest);
@@ -671,4 +677,26 @@ glm_mat4_swap_row(mat4 mat, int row1, int row2) {
mat[3][row2] = tmp[3];
}
/*!
* @brief helper for R (row vector) * M (matrix) * C (column vector)
*
* rmc stands for Row * Matrix * Column
*
* the result is scalar because S * M = Matrix1x4 (row vector),
* then Matrix1x4 * Vec4 (column vector) = Matrix1x1 (Scalar)
*
* @param[in] r row vector or matrix1x4
* @param[in] m matrix4x4
* @param[in] c column vector or matrix4x1
*
* @return scalar value e.g. B(s)
*/
CGLM_INLINE
float
glm_mat4_rmc(vec4 r, mat4 m, vec4 c) {
vec4 tmp;
glm_mat4_mulv(m, c, tmp);
return glm_vec4_dot(r, tmp);
}
#endif /* cglm_mat_h */

View File

@@ -218,7 +218,7 @@ glm_quat_normalize_to(versor q, versor dest) {
float dot;
x0 = glmm_load(q);
xdot = glmm_dot(x0, x0); xdot = glmm_vdot(x0, x0);
dot = _mm_cvtss_f32(xdot);
if (dot <= 0.0f) {

41
include/cglm/simd/arm.h Normal file
View File

@@ -0,0 +1,41 @@
/*
* Copyright (c), Recep Aslantas.
*
* MIT License (MIT), http://opensource.org/licenses/MIT
* Full license can be found in the LICENSE file
*/
#ifndef cglm_simd_arm_h
#define cglm_simd_arm_h
#include "intrin.h"
#ifdef CGLM_SIMD_ARM
#define glmm_load(p) vld1q_f32(p)
#define glmm_store(p, a) vst1q_f32(p, a)
static inline
float
glmm_hadd(float32x4_t v) {
#if defined(__aarch64__)
return vaddvq_f32(v);
#else
v = vaddq_f32(v, vrev64q_f32(v));
v = vaddq_f32(v, vcombine_f32(vget_high_f32(v), vget_low_f32(v)));
return vgetq_lane_f32(v, 0);
#endif
}
static inline
float
glmm_dot(float32x4_t a, float32x4_t b) {
return glmm_hadd(vmulq_f32(a, b));
}
static inline
float
glmm_norm(float32x4_t a) {
return sqrtf(glmm_dot(a, a));
}
#endif
#endif /* cglm_simd_arm_h */
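For readers less familiar with NEON, the non-AArch64 branch of glmm_hadd above is just a horizontal sum; a scalar sketch of what the vrev64q/vcombine add pair computes (assuming lanes {x, y, z, w}):

/* Scalar equivalent of the two vector adds above (sketch only). */
static float hadd_scalar(const float v[4]) {
  /* step 1: add each 64-bit half to its swapped self -> {x+y, x+y, z+w, z+w} */
  /* step 2: add the swapped halves of that result    -> x+y+z+w in every lane */
  return (v[0] + v[1]) + (v[2] + v[3]);
}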

View File

@@ -14,23 +14,12 @@
#include <immintrin.h>
CGLM_INLINE
void
glm_mat4_scale_avx(mat4 m, float s) {
__m256 y0;
y0 = _mm256_set1_ps(s);
glmm_store256(m[0], _mm256_mul_ps(y0, glmm_load256(m[0])));
glmm_store256(m[2], _mm256_mul_ps(y0, glmm_load256(m[2])));
}
CGLM_INLINE
void
glm_mat4_mul_avx(mat4 m1, mat4 m2, mat4 dest) {
/* D = R * L (Column-Major) */
__m256 y0, y1, y2, y3, y4, y5, y6, y7, y8, y9;
__m256i yi0, yi1, yi2, yi3;
y0 = glmm_load256(m2[0]); /* h g f e d c b a */
y1 = glmm_load256(m2[2]); /* p o n m l k j i */
@@ -42,19 +31,14 @@ glm_mat4_mul_avx(mat4 m1, mat4 m2, mat4 dest) {
y4 = _mm256_permute2f128_ps(y2, y2, 0x03); /* d c b a h g f e */
y5 = _mm256_permute2f128_ps(y3, y3, 0x03); /* l k j i p o n m */
yi0 = _mm256_set_epi32(1, 1, 1, 1, 0, 0, 0, 0);
yi1 = _mm256_set_epi32(3, 3, 3, 3, 2, 2, 2, 2);
yi2 = _mm256_set_epi32(0, 0, 0, 0, 1, 1, 1, 1);
yi3 = _mm256_set_epi32(2, 2, 2, 2, 3, 3, 3, 3);
/* f f f f a a a a */
/* h h h h c c c c */
/* e e e e b b b b */
/* g g g g d d d d */
y6 = _mm256_permutevar_ps(y0, yi0); y6 = _mm256_permutevar_ps(y0, _mm256_set_epi32(1, 1, 1, 1, 0, 0, 0, 0));
y7 = _mm256_permutevar_ps(y0, yi1); y7 = _mm256_permutevar_ps(y0, _mm256_set_epi32(3, 3, 3, 3, 2, 2, 2, 2));
y8 = _mm256_permutevar_ps(y0, yi2); y8 = _mm256_permutevar_ps(y0, _mm256_set_epi32(0, 0, 0, 0, 1, 1, 1, 1));
y9 = _mm256_permutevar_ps(y0, yi3); y9 = _mm256_permutevar_ps(y0, _mm256_set_epi32(2, 2, 2, 2, 3, 3, 3, 3));
glmm_store256(dest[0],
_mm256_add_ps(_mm256_add_ps(_mm256_mul_ps(y2, y6),
@@ -66,10 +50,10 @@ glm_mat4_mul_avx(mat4 m1, mat4 m2, mat4 dest) {
/* p p p p k k k k */
/* m m m m j j j j */
/* o o o o l l l l */
y6 = _mm256_permutevar_ps(y1, yi0); y6 = _mm256_permutevar_ps(y1, _mm256_set_epi32(1, 1, 1, 1, 0, 0, 0, 0));
y7 = _mm256_permutevar_ps(y1, yi1); y7 = _mm256_permutevar_ps(y1, _mm256_set_epi32(3, 3, 3, 3, 2, 2, 2, 2));
y8 = _mm256_permutevar_ps(y1, yi2); y8 = _mm256_permutevar_ps(y1, _mm256_set_epi32(0, 0, 0, 0, 1, 1, 1, 1));
y9 = _mm256_permutevar_ps(y1, yi3); y9 = _mm256_permutevar_ps(y1, _mm256_set_epi32(2, 2, 2, 2, 3, 3, 3, 3));
glmm_store256(dest[2],
_mm256_add_ps(_mm256_add_ps(_mm256_mul_ps(y2, y6),
@@ -78,365 +62,5 @@ glm_mat4_mul_avx(mat4 m1, mat4 m2, mat4 dest) {
_mm256_mul_ps(y5, y9))));
}
CGLM_INLINE
void
glm_mat4_inv_avx(mat4 mat, mat4 dest) {
__m256 y0, y1, y2, y3, y4, y5, y6, y7, y8, y9, y10, y11, y12, y13;
__m256 yt0, yt1, yt2;
__m256 t0, t1, t2;
__m256 r1, r2;
__m256 flpsign;
__m256i yi1, yi2, yi3;
y0 = glmm_load256(mat[0]); /* h g f e d c b a */
y1 = glmm_load256(mat[2]); /* p o n m l k j i */
y2 = _mm256_permute2f128_ps(y1, y1, 0x00); /* l k j i l k j i */
y3 = _mm256_permute2f128_ps(y1, y1, 0x11); /* p o n m p o n m */
y4 = _mm256_permute2f128_ps(y0, y0, 0x03); /* d c b a h g f e */
y13 = _mm256_permute2f128_ps(y4, y4, 0x00); /* h g f e h g f e */
yi1 = _mm256_set_epi32(0, 0, 0, 0, 0, 1, 1, 2);
yi2 = _mm256_set_epi32(1, 1, 1, 2, 3, 2, 3, 3);
flpsign = _mm256_set_ps(0.f, -0.f, 0.f, -0.f, -0.f, 0.f, -0.f, 0.f);
/* i i i i i j j k */
/* n n n o p o p p */
/* m m m m m n n o */
/* j j j k l k l l */
/* e e e e e f f g */
/* f f f g h g h h */
y5 = _mm256_permutevar_ps(y2, yi1);
y6 = _mm256_permutevar_ps(y3, yi2);
y7 = _mm256_permutevar_ps(y3, yi1);
y8 = _mm256_permutevar_ps(y2, yi2);
y2 = _mm256_permutevar_ps(y13, yi1);
y3 = _mm256_permutevar_ps(y13, yi2);
yi1 = _mm256_set_epi32(2, 1, 0, 0, 2, 1, 0, 0);
yi2 = _mm256_set_epi32(2, 1, 1, 0, 2, 1, 1, 0);
yi3 = _mm256_set_epi32(3, 3, 2, 0, 3, 3, 2, 0);
/*
t0[0] = k * p - o * l; t1[0] = g * p - o * h; t2[0] = g * l - k * h;
t0[1] = j * p - n * l; t1[1] = f * p - n * h; t2[1] = f * l - j * h;
t0[2] = j * o - n * k; t1[2] = f * o - n * g; t2[2] = f * k - j * g;
t0[3] = i * p - m * l; t1[3] = e * p - m * h; t2[3] = e * l - i * h;
t0[4] = i * o - m * k; t1[4] = e * o - m * g; t2[4] = e * k - i * g;
t0[5] = i * n - m * j; t1[5] = e * n - m * f; t2[5] = e * j - i * f;
*/
yt0 = _mm256_sub_ps(_mm256_mul_ps(y5, y6), _mm256_mul_ps(y7, y8));
yt1 = _mm256_sub_ps(_mm256_mul_ps(y2, y6), _mm256_mul_ps(y7, y3));
yt2 = _mm256_sub_ps(_mm256_mul_ps(y2, y8), _mm256_mul_ps(y5, y3));
/* t3 t2 t1 t0 t3 t2 t1 t0 */
/* t5 t5 t5 t4 t5 t5 t5 t4 */
y9 = _mm256_permute2f128_ps(yt0, yt0, 0x00);
y10 = _mm256_permute2f128_ps(yt0, yt0, 0x11);
//
/* t2 t1 t0 t0 t2 t1 t0 t0 */
t0 = _mm256_permutevar_ps(y9, yi1);
/* t4 t3 t3 t1 t4 t3 t3 t1 */
y11 = _mm256_shuffle_ps(y9, y10, 0x4D);
y12 = _mm256_permutevar_ps(y11, yi2);
t1 = _mm256_permute2f128_ps(y12, y9, 0x00);
/* t5 t5 t4 t2 t5 t5 t4 t2 */
y11 = _mm256_shuffle_ps(y9, y10, 0x4A);
y12 = _mm256_permutevar_ps(y11, yi3);
t2 = _mm256_permute2f128_ps(y12, y12, 0x00);
/* a a a b e e e f */
/* b b c c f f g g */
/* c d d d g h h h */
y9 = _mm256_permute_ps(y4, 0x01);
y10 = _mm256_permute_ps(y4, 0x5A);
y11 = _mm256_permute_ps(y4, 0xBF);
/*
dest[0][0] = f * t[0] - g * t[1] + h * t[2];
dest[1][0] =-(e * t[0] - g * t[3] + h * t[4]);
dest[2][0] = e * t[1] - f * t[3] + h * t[5];
dest[3][0] =-(e * t[2] - f * t[4] + g * t[5]);
dest[0][1] =-(b * t[0] - c * t[1] + d * t[2]);
dest[1][1] = a * t[0] - c * t[3] + d * t[4];
dest[2][1] =-(a * t[1] - b * t[3] + d * t[5]);
dest[3][1] = a * t[2] - b * t[4] + c * t[5];
*/
r1 = _mm256_xor_ps(_mm256_add_ps(_mm256_sub_ps(_mm256_mul_ps(y9, t0),
_mm256_mul_ps(y10, t1)),
_mm256_mul_ps(y11, t2)),
flpsign);
/* d c b a d c b a */
y2 = _mm256_permute2f128_ps(y0, y0, 0x0);
/* a a a b a a a b */
/* b b c c b b c c */
/* c d d d c d d d */
y3 = _mm256_permutevar_ps(y2, _mm256_set_epi32(0, 0, 0, 1, 0, 0, 0, 1));
y4 = _mm256_permutevar_ps(y2, _mm256_set_epi32(1, 1, 2, 2, 1, 1, 2, 2));
y5 = _mm256_permutevar_ps(y2, _mm256_set_epi32(2, 3, 3, 3, 2, 3, 3, 3));
/* t2[3] t2[2] t2[1] t2[0] t1[3] t1[2] t1[1] t1[0] */
/* t2[5] t2[5] t2[5] t2[4] t1[5] t1[5] t1[5] t1[4] */
y6 = _mm256_permute2f128_ps(yt1, yt2, 0x20);
y7 = _mm256_permute2f128_ps(yt1, yt2, 0x31);
/* t2[2] t2[1] t2[0] t2[0] t1[2] t1[1] t1[0] t1[0] */
t0 = _mm256_permutevar_ps(y6, yi1);
/* t1[4] t1[3] t1[3] t1[1] t1[4] t1[3] t1[3] t1[1] */
/* t1[4] t1[3] t1[3] t1[1] t1[4] t1[3] t1[3] t1[1] */
y11 = _mm256_shuffle_ps(y6, y7, 0x4D);
t1 = _mm256_permutevar_ps(y11, yi2);
/* t2[5] t2[5] t2[4] t2[2] t1[5] t1[5] t1[4] t1[2] */
y11 = _mm256_shuffle_ps(y6, y7, 0x4A);
t2 = _mm256_permutevar_ps(y11, yi3);
/*
dest[0][2] = b * t1[0] - c * t1[1] + d * t1[2];
dest[1][2] =-(a * t1[0] - c * t1[3] + d * t1[4]);
dest[2][2] = a * t1[1] - b * t1[3] + d * t1[5];
dest[3][2] =-(a * t1[2] - b * t1[4] + c * t1[5]);
dest[0][3] =-(b * t2[0] - c * t2[1] + d * t2[2]);
dest[1][3] = a * t2[0] - c * t2[3] + d * t2[4];
dest[2][3] =-(a * t2[1] - b * t2[3] + d * t2[5]);
dest[3][3] = a * t2[2] - b * t2[4] + c * t2[5];
*/
r2 = _mm256_xor_ps(_mm256_add_ps(_mm256_sub_ps(_mm256_mul_ps(y3, t0),
_mm256_mul_ps(y4, t1)),
_mm256_mul_ps(y5, t2)),
flpsign);
/* determinant */
y4 = _mm256_mul_ps(y0, r1);
y4 = _mm256_permute2f128_ps(y4, y4, 0x30);
y4 = _mm256_dp_ps(y0, r1, 0xff);
y5 = _mm256_div_ps(_mm256_set1_ps(1.0f), y4);
r1 = _mm256_mul_ps(r1, y5);
r2 = _mm256_mul_ps(r2, y5);
/* transpose */
/* d c b a h g f e */
/* l k j i p o n m */
y0 = _mm256_permute2f128_ps(r1, r1, 0x03);
y1 = _mm256_permute2f128_ps(r2, r2, 0x03);
/* b a f e f e b a */
/* j i n m n m j i */
/* i m a e m i e a */
/* j n b f n j f b */
/* n j f b m i e a */
y2 = _mm256_shuffle_ps(r1, y0, 0x44);
y3 = _mm256_shuffle_ps(r2, y1, 0x44);
y4 = _mm256_shuffle_ps(y2, y3, 0x88);
y5 = _mm256_shuffle_ps(y2, y3, 0xDD);
y6 = _mm256_permute2f128_ps(y4, y5, 0x20);
/* d c h g h g d c */
/* l k p o p o l k */
/* k o c g o k g c */
/* l p d h p l h d */
/* p l h d o k g c */
y2 = _mm256_shuffle_ps(r1, y0, 0xEE);
y3 = _mm256_shuffle_ps(r2, y1, 0xEE);
y4 = _mm256_shuffle_ps(y2, y3, 0x88);
y5 = _mm256_shuffle_ps(y2, y3, 0xDD);
y7 = _mm256_permute2f128_ps(y4, y5, 0x20);
glmm_store256(dest[0], y6);
glmm_store256(dest[2], y7);
}
CGLM_INLINE
void
glm_mat4_inv_fast_avx(mat4 mat, mat4 dest) {
__m256 y0, y1, y2, y3, y4, y5, y6, y7, y8, y9, y10, y11, y12, y13;
__m256 yt0, yt1, yt2;
__m256 t0, t1, t2;
__m256 r1, r2;
__m256 flpsign;
__m256i yi1, yi2, yi3;
y0 = glmm_load256(mat[0]); /* h g f e d c b a */
y1 = glmm_load256(mat[2]); /* p o n m l k j i */
y2 = _mm256_permute2f128_ps(y1, y1, 0x00); /* l k j i l k j i */
y3 = _mm256_permute2f128_ps(y1, y1, 0x11); /* p o n m p o n m */
y4 = _mm256_permute2f128_ps(y0, y0, 0x03); /* d c b a h g f e */
y13 = _mm256_permute2f128_ps(y4, y4, 0x00); /* h g f e h g f e */
yi1 = _mm256_set_epi32(0, 0, 0, 0, 0, 1, 1, 2);
yi2 = _mm256_set_epi32(1, 1, 1, 2, 3, 2, 3, 3);
flpsign = _mm256_set_ps(0.f, -0.f, 0.f, -0.f, -0.f, 0.f, -0.f, 0.f);
/* i i i i i j j k */
/* n n n o p o p p */
/* m m m m m n n o */
/* j j j k l k l l */
/* e e e e e f f g */
/* f f f g h g h h */
y5 = _mm256_permutevar_ps(y2, yi1);
y6 = _mm256_permutevar_ps(y3, yi2);
y7 = _mm256_permutevar_ps(y3, yi1);
y8 = _mm256_permutevar_ps(y2, yi2);
y2 = _mm256_permutevar_ps(y13, yi1);
y3 = _mm256_permutevar_ps(y13, yi2);
yi1 = _mm256_set_epi32(2, 1, 0, 0, 2, 1, 0, 0);
yi2 = _mm256_set_epi32(2, 1, 1, 0, 2, 1, 1, 0);
yi3 = _mm256_set_epi32(3, 3, 2, 0, 3, 3, 2, 0);
/*
t0[0] = k * p - o * l; t1[0] = g * p - o * h; t2[0] = g * l - k * h;
t0[1] = j * p - n * l; t1[1] = f * p - n * h; t2[1] = f * l - j * h;
t0[2] = j * o - n * k; t1[2] = f * o - n * g; t2[2] = f * k - j * g;
t0[3] = i * p - m * l; t1[3] = e * p - m * h; t2[3] = e * l - i * h;
t0[4] = i * o - m * k; t1[4] = e * o - m * g; t2[4] = e * k - i * g;
t0[5] = i * n - m * j; t1[5] = e * n - m * f; t2[5] = e * j - i * f;
*/
yt0 = _mm256_sub_ps(_mm256_mul_ps(y5, y6), _mm256_mul_ps(y7, y8));
yt1 = _mm256_sub_ps(_mm256_mul_ps(y2, y6), _mm256_mul_ps(y7, y3));
yt2 = _mm256_sub_ps(_mm256_mul_ps(y2, y8), _mm256_mul_ps(y5, y3));
/* t3 t2 t1 t0 t3 t2 t1 t0 */
/* t5 t5 t5 t4 t5 t5 t5 t4 */
y9 = _mm256_permute2f128_ps(yt0, yt0, 0x00);
y10 = _mm256_permute2f128_ps(yt0, yt0, 0x11);
/* t2 t1 t0 t0 t2 t1 t0 t0 */
t0 = _mm256_permutevar_ps(y9, yi1);
/* t4 t3 t3 t1 t4 t3 t3 t1 */
y11 = _mm256_shuffle_ps(y9, y10, 0x4D);
y12 = _mm256_permutevar_ps(y11, yi2);
t1 = _mm256_permute2f128_ps(y12, y9, 0x00);
/* t5 t5 t4 t2 t5 t5 t4 t2 */
y11 = _mm256_shuffle_ps(y9, y10, 0x4A);
y12 = _mm256_permutevar_ps(y11, yi3);
t2 = _mm256_permute2f128_ps(y12, y12, 0x00);
/* a a a b e e e f */
/* b b c c f f g g */
/* c d d d g h h h */
y9 = _mm256_permute_ps(y4, 0x01);
y10 = _mm256_permute_ps(y4, 0x5A);
y11 = _mm256_permute_ps(y4, 0xBF);
/*
dest[0][0] = f * t[0] - g * t[1] + h * t[2];
dest[1][0] =-(e * t[0] - g * t[3] + h * t[4]);
dest[2][0] = e * t[1] - f * t[3] + h * t[5];
dest[3][0] =-(e * t[2] - f * t[4] + g * t[5]);
dest[0][1] =-(b * t[0] - c * t[1] + d * t[2]);
dest[1][1] = a * t[0] - c * t[3] + d * t[4];
dest[2][1] =-(a * t[1] - b * t[3] + d * t[5]);
dest[3][1] = a * t[2] - b * t[4] + c * t[5];
*/
r1 = _mm256_xor_ps(_mm256_add_ps(_mm256_sub_ps(_mm256_mul_ps(y9, t0),
_mm256_mul_ps(y10, t1)),
_mm256_mul_ps(y11, t2)),
flpsign);
/* d c b a d c b a */
y2 = _mm256_permute2f128_ps(y0, y0, 0x0);
/* a a a b a a a b */
/* b b c c b b c c */
/* c d d d c d d d */
y3 = _mm256_permutevar_ps(y2, _mm256_set_epi32(0, 0, 0, 1, 0, 0, 0, 1));
y4 = _mm256_permutevar_ps(y2, _mm256_set_epi32(1, 1, 2, 2, 1, 1, 2, 2));
y5 = _mm256_permutevar_ps(y2, _mm256_set_epi32(2, 3, 3, 3, 2, 3, 3, 3));
/* t2[3] t2[2] t2[1] t2[0] t1[3] t1[2] t1[1] t1[0] */
/* t2[5] t2[5] t2[5] t2[4] t1[5] t1[5] t1[5] t1[4] */
y6 = _mm256_permute2f128_ps(yt1, yt2, 0x20);
y7 = _mm256_permute2f128_ps(yt1, yt2, 0x31);
/* t2[2] t2[1] t2[0] t2[0] t1[2] t1[1] t1[0] t1[0] */
t0 = _mm256_permutevar_ps(y6, yi1);
/* t1[4] t1[3] t1[3] t1[1] t1[4] t1[3] t1[3] t1[1] */
/* t1[4] t1[3] t1[3] t1[1] t1[4] t1[3] t1[3] t1[1] */
y11 = _mm256_shuffle_ps(y6, y7, 0x4D);
t1 = _mm256_permutevar_ps(y11, yi2);
/* t2[5] t2[5] t2[4] t2[2] t1[5] t1[5] t1[4] t1[2] */
y11 = _mm256_shuffle_ps(y6, y7, 0x4A);
t2 = _mm256_permutevar_ps(y11, yi3);
/*
dest[0][2] = b * t1[0] - c * t1[1] + d * t1[2];
dest[1][2] =-(a * t1[0] - c * t1[3] + d * t1[4]);
dest[2][2] = a * t1[1] - b * t1[3] + d * t1[5];
dest[3][2] =-(a * t1[2] - b * t1[4] + c * t1[5]);
dest[0][3] =-(b * t2[0] - c * t2[1] + d * t2[2]);
dest[1][3] = a * t2[0] - c * t2[3] + d * t2[4];
dest[2][3] =-(a * t2[1] - b * t2[3] + d * t2[5]);
dest[3][3] = a * t2[2] - b * t2[4] + c * t2[5];
*/
r2 = _mm256_xor_ps(_mm256_add_ps(_mm256_sub_ps(_mm256_mul_ps(y3, t0),
_mm256_mul_ps(y4, t1)),
_mm256_mul_ps(y5, t2)),
flpsign);
/* determinant */
y4 = _mm256_mul_ps(y0, r1);
y4 = _mm256_permute2f128_ps(y4, y4, 0x30);
y4 = _mm256_dp_ps(y0, r1, 0xff);
y5 = _mm256_rcp_ps(y4);
r1 = _mm256_mul_ps(r1, y5);
r2 = _mm256_mul_ps(r2, y5);
/* transpose */
/* d c b a h g f e */
/* l k j i p o n m */
y0 = _mm256_permute2f128_ps(r1, r1, 0x03);
y1 = _mm256_permute2f128_ps(r2, r2, 0x03);
/* b a f e f e b a */
/* j i n m n m j i */
/* i m a e m i e a */
/* j n b f n j f b */
/* n j f b m i e a */
y2 = _mm256_shuffle_ps(r1, y0, 0x44);
y3 = _mm256_shuffle_ps(r2, y1, 0x44);
y4 = _mm256_shuffle_ps(y2, y3, 0x88);
y5 = _mm256_shuffle_ps(y2, y3, 0xDD);
y6 = _mm256_permute2f128_ps(y4, y5, 0x20);
/* d c h g h g d c */
/* l k p o p o l k */
/* k o c g o k g c */
/* l p d h p l h d */
/* p l h d o k g c */
y2 = _mm256_shuffle_ps(r1, y0, 0xEE);
y3 = _mm256_shuffle_ps(r2, y1, 0xEE);
y4 = _mm256_shuffle_ps(y2, y3, 0x88);
y5 = _mm256_shuffle_ps(y2, y3, 0xDD);
y7 = _mm256_permute2f128_ps(y4, y5, 0x20);
glmm_store256(dest[0], y6);
glmm_store256(dest[2], y7);
}
#endif
#endif /* cglm_mat_simd_avx_h */

View File

@@ -27,90 +27,64 @@
#if defined( __SSE__ ) || defined( __SSE2__ )
# include <xmmintrin.h>
# include <emmintrin.h>
/* OPTIONAL: You may save some instructions but latency (not sure) */
#ifdef CGLM_USE_INT_DOMAIN
# define glmm_shuff1(xmm, z, y, x, w) \
_mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(xmm), \
_MM_SHUFFLE(z, y, x, w)))
#else
# define glmm_shuff1(xmm, z, y, x, w) \
_mm_shuffle_ps(xmm, xmm, _MM_SHUFFLE(z, y, x, w))
#endif
#define glmm_shuff1x(xmm, x) glmm_shuff1(xmm, x, x, x, x)
#define glmm_shuff2(a, b, z0, y0, x0, w0, z1, y1, x1, w1) \
glmm_shuff1(_mm_shuffle_ps(a, b, _MM_SHUFFLE(z0, y0, x0, w0)), \
z1, y1, x1, w1)
static inline
__m128
glmm_dot(__m128 a, __m128 b) {
__m128 x0;
x0 = _mm_mul_ps(a, b);
x0 = _mm_add_ps(x0, glmm_shuff1(x0, 1, 0, 3, 2));
return _mm_add_ps(x0, glmm_shuff1(x0, 0, 1, 0, 1));
}
static inline
__m128
glmm_norm(__m128 a) {
return _mm_sqrt_ps(glmm_dot(a, a));
}
static inline
__m128
glmm_load3(float v[3]) {
__m128i xy;
__m128 z;
xy = _mm_loadl_epi64((const __m128i *)v);
z = _mm_load_ss(&v[2]);
return _mm_movelh_ps(_mm_castsi128_ps(xy), z);
}
static inline
void
glmm_store3(__m128 vx, float v[3]) {
_mm_storel_pi((__m64 *)&v[0], vx);
_mm_store_ss(&v[2], glmm_shuff1(vx, 2, 2, 2, 2));
}
#ifdef CGLM_ALL_UNALIGNED
# define glmm_load(p) _mm_loadu_ps(p)
# define glmm_store(p, a) _mm_storeu_ps(p, a)
#else
# define glmm_load(p) _mm_load_ps(p)
# define glmm_store(p, a) _mm_store_ps(p, a)
#endif
#endif
/* x86, x64 */
#if defined( __SSE__ ) || defined( __SSE2__ )
# define CGLM_SSE_FP 1
# ifndef CGLM_SIMD_x86
# define CGLM_SIMD_x86
# endif
#endif
#if defined(__SSE3__)
# include <x86intrin.h>
# ifndef CGLM_SIMD_x86
# define CGLM_SIMD_x86
# endif
#endif
#if defined(__SSE4_1__)
# include <smmintrin.h>
# ifndef CGLM_SIMD_x86
# define CGLM_SIMD_x86
# endif
#endif
#if defined(__SSE4_2__)
# include <nmmintrin.h>
# ifndef CGLM_SIMD_x86
# define CGLM_SIMD_x86
# endif
#endif
#ifdef __AVX__
# include <immintrin.h>
# define CGLM_AVX_FP 1
# ifndef CGLM_SIMD_x86
#ifdef CGLM_ALL_UNALIGNED # define CGLM_SIMD_x86
# define glmm_load256(p) _mm256_loadu_ps(p) # endif
# define glmm_store256(p, a) _mm256_storeu_ps(p, a)
#else
# define glmm_load256(p) _mm256_load_ps(p)
# define glmm_store256(p, a) _mm256_store_ps(p, a)
#endif
#endif
/* ARM Neon */
#if defined(__ARM_NEON) && defined(__ARM_NEON_FP) #if defined(__ARM_NEON)
# include <arm_neon.h>
# if defined(__ARM_NEON_FP)
# define CGLM_NEON_FP 1
#else # ifndef CGLM_SIMD_ARM
# undef CGLM_NEON_FP # define CGLM_SIMD_ARM
# endif
# endif
#endif
#if defined(CGLM_SIMD_x86) || defined(CGLM_NEON_FP)
# ifndef CGLM_SIMD
# define CGLM_SIMD
# endif
#endif
#if defined(CGLM_SIMD_x86)
# include "x86.h"
#endif
#if defined(CGLM_SIMD_ARM)
# include "arm.h"
#endif
#endif /* cglm_intrin_h */
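The net effect of this intrin.h reshuffle is a set of umbrella macros (CGLM_SIMD, CGLM_SIMD_x86, CGLM_SIMD_ARM, plus the existing CGLM_SSE_FP/CGLM_NEON_FP) that code can branch on instead of raw compiler flags. A minimal consumer-side sketch (illustrative, not part of the diff):

#include <cglm/cglm.h>

float dot4(vec4 a, vec4 b) {
#ifdef CGLM_SIMD
  /* glmm_load/glmm_dot now exist on both x86 (x86.h) and ARM (arm.h) */
  return glmm_dot(glmm_load(a), glmm_load(b));
#else
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
#endif
}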

136
include/cglm/simd/x86.h Normal file
View File

@@ -0,0 +1,136 @@
/*
* Copyright (c), Recep Aslantas.
*
* MIT License (MIT), http://opensource.org/licenses/MIT
* Full license can be found in the LICENSE file
*/
#ifndef cglm_simd_x86_h
#define cglm_simd_x86_h
#include "intrin.h"
#ifdef CGLM_SIMD_x86
#ifdef CGLM_ALL_UNALIGNED
# define glmm_load(p) _mm_loadu_ps(p)
# define glmm_store(p, a) _mm_storeu_ps(p, a)
#else
# define glmm_load(p) _mm_load_ps(p)
# define glmm_store(p, a) _mm_store_ps(p, a)
#endif
#ifdef CGLM_USE_INT_DOMAIN
# define glmm_shuff1(xmm, z, y, x, w) \
_mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(xmm), \
_MM_SHUFFLE(z, y, x, w)))
#else
# define glmm_shuff1(xmm, z, y, x, w) \
_mm_shuffle_ps(xmm, xmm, _MM_SHUFFLE(z, y, x, w))
#endif
#define glmm_shuff1x(xmm, x) glmm_shuff1(xmm, x, x, x, x)
#define glmm_shuff2(a, b, z0, y0, x0, w0, z1, y1, x1, w1) \
glmm_shuff1(_mm_shuffle_ps(a, b, _MM_SHUFFLE(z0, y0, x0, w0)), \
z1, y1, x1, w1)
#ifdef __AVX__
# ifdef CGLM_ALL_UNALIGNED
# define glmm_load256(p) _mm256_loadu_ps(p)
# define glmm_store256(p, a) _mm256_storeu_ps(p, a)
# else
# define glmm_load256(p) _mm256_load_ps(p)
# define glmm_store256(p, a) _mm256_store_ps(p, a)
# endif
#endif
static inline
__m128
glmm_vhadds(__m128 v) {
#if defined(__SSE3__)
__m128 shuf, sums;
shuf = _mm_movehdup_ps(v);
sums = _mm_add_ps(v, shuf);
shuf = _mm_movehl_ps(shuf, sums);
sums = _mm_add_ss(sums, shuf);
return sums;
#else
__m128 shuf, sums;
shuf = glmm_shuff1(v, 2, 3, 0, 1);
sums = _mm_add_ps(v, shuf);
shuf = _mm_movehl_ps(shuf, sums);
sums = _mm_add_ss(sums, shuf);
return sums;
#endif
}
static inline
float
glmm_hadd(__m128 v) {
return _mm_cvtss_f32(glmm_vhadds(v));
}
static inline
__m128
glmm_vdots(__m128 a, __m128 b) {
#if (defined(__SSE4_1__) || defined(__SSE4_2__)) && defined(CGLM_SSE4_DOT)
return _mm_dp_ps(a, b, 0xFF);
#elif defined(__SSE3__) && defined(CGLM_SSE3_DOT)
__m128 x0, x1;
x0 = _mm_mul_ps(a, b);
x1 = _mm_hadd_ps(x0, x0);
return _mm_hadd_ps(x1, x1);
#else
return glmm_vhadds(_mm_mul_ps(a, b));
#endif
}
static inline
__m128
glmm_vdot(__m128 a, __m128 b) {
#if (defined(__SSE4_1__) || defined(__SSE4_2__)) && defined(CGLM_SSE4_DOT)
return _mm_dp_ps(a, b, 0xFF);
#elif defined(__SSE3__) && defined(CGLM_SSE3_DOT)
__m128 x0, x1;
x0 = _mm_mul_ps(a, b);
x1 = _mm_hadd_ps(x0, x0);
return _mm_hadd_ps(x1, x1);
#else
__m128 x0;
x0 = _mm_mul_ps(a, b);
x0 = _mm_add_ps(x0, glmm_shuff1(x0, 1, 0, 3, 2));
return _mm_add_ps(x0, glmm_shuff1(x0, 0, 1, 0, 1));
#endif
}
static inline
float
glmm_dot(__m128 a, __m128 b) {
return _mm_cvtss_f32(glmm_vdots(a, b));
}
static inline
float
glmm_norm(__m128 a) {
return _mm_cvtss_f32(_mm_sqrt_ss(glmm_vhadds(_mm_mul_ps(a, a))));
}
static inline
__m128
glmm_load3(float v[3]) {
__m128i xy;
__m128 z;
xy = _mm_loadl_epi64((const __m128i *)v);
z = _mm_load_ss(&v[2]);
return _mm_movelh_ps(_mm_castsi128_ps(xy), z);
}
static inline
void
glmm_store3(__m128 vx, float v[3]) {
_mm_storel_pi((__m64 *)&v[0], vx);
_mm_store_ss(&v[2], glmm_shuff1(vx, 2, 2, 2, 2));
}
#endif
#endif /* cglm_simd_x86_h */
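Note that glmm_vdot/glmm_vdots above only take the _mm_dp_ps or _mm_hadd_ps paths when an opt-in macro is defined in addition to the compiler's SSE feature flags; a hedged example of opting in (assumes the build also enables -msse4.1 or equivalent):

/* Define before including cglm (or pass -DCGLM_SSE4_DOT) to route glmm_vdot
 * through _mm_dp_ps whenever __SSE4_1__/__SSE4_2__ are available. */
#define CGLM_SSE4_DOT
#include <cglm/cglm.h>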

View File

@@ -122,6 +122,8 @@ void
glm_vec4_copy(vec4 v, vec4 dest) {
#if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(dest, glmm_load(v));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vld1q_f32(v));
#else
dest[0] = v[0];
dest[1] = v[1];
@@ -157,6 +159,8 @@ void
glm_vec4_zero(vec4 v) {
#if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(v, _mm_setzero_ps());
#elif defined(CGLM_NEON_FP)
vst1q_f32(v, vdupq_n_f32(0.0f));
#else
v[0] = 0.0f;
v[1] = 0.0f;
@@ -175,6 +179,8 @@ void
glm_vec4_one(vec4 v) {
#if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(v, _mm_set1_ps(1.0f));
#elif defined(CGLM_NEON_FP)
vst1q_f32(v, vdupq_n_f32(1.0f));
#else
v[0] = 1.0f;
v[1] = 1.0f;
@@ -194,11 +200,8 @@ glm_vec4_one(vec4 v) {
CGLM_INLINE
float
glm_vec4_dot(vec4 a, vec4 b) {
#if defined( __SSE__ ) || defined( __SSE2__ ) #if defined(CGLM_SIMD)
__m128 x0; return glmm_dot(glmm_load(a), glmm_load(b));
x0 = _mm_mul_ps(glmm_load(a), glmm_load(b));
x0 = _mm_add_ps(x0, glmm_shuff1(x0, 1, 0, 3, 2));
return _mm_cvtss_f32(_mm_add_ss(x0, glmm_shuff1(x0, 0, 1, 0, 1)));
#else
return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
#endif
@@ -218,15 +221,7 @@ glm_vec4_dot(vec4 a, vec4 b) {
CGLM_INLINE
float
glm_vec4_norm2(vec4 v) {
#if defined( __SSE__ ) || defined( __SSE2__ ) return glm_vec4_dot(v, v);
__m128 x0;
x0 = glmm_load(v);
x0 = _mm_mul_ps(x0, x0);
x0 = _mm_add_ps(x0, glmm_shuff1(x0, 1, 0, 3, 2));
return _mm_cvtss_f32(_mm_add_ss(x0, glmm_shuff1(x0, 0, 1, 0, 1)));
#else
return v[0] * v[0] + v[1] * v[1] + v[2] * v[2] + v[3] * v[3];
#endif
}
/*!
@@ -239,12 +234,10 @@ glm_vec4_norm2(vec4 v) {
CGLM_INLINE
float
glm_vec4_norm(vec4 v) {
#if defined( __SSE__ ) || defined( __SSE2__ ) #if defined(CGLM_SIMD)
__m128 x0; return glmm_norm(glmm_load(v));
x0 = glmm_load(v);
return _mm_cvtss_f32(_mm_sqrt_ss(glmm_dot(x0, x0)));
#else
return sqrtf(glm_vec4_norm2(v)); return sqrtf(glm_vec4_dot(v, v));
#endif
}
@@ -260,6 +253,8 @@ void
glm_vec4_add(vec4 a, vec4 b, vec4 dest) {
#if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(dest, _mm_add_ps(glmm_load(a), glmm_load(b)));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vaddq_f32(vld1q_f32(a), vld1q_f32(b)));
#else
dest[0] = a[0] + b[0];
dest[1] = a[1] + b[1];
@@ -280,6 +275,8 @@ void
glm_vec4_adds(vec4 v, float s, vec4 dest) {
#if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(dest, _mm_add_ps(glmm_load(v), _mm_set1_ps(s)));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vaddq_f32(vld1q_f32(v), vdupq_n_f32(s)));
#else
dest[0] = v[0] + s;
dest[1] = v[1] + s;
@@ -300,6 +297,8 @@ void
glm_vec4_sub(vec4 a, vec4 b, vec4 dest) {
#if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(dest, _mm_sub_ps(glmm_load(a), glmm_load(b)));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vsubq_f32(vld1q_f32(a), vld1q_f32(b)));
#else
dest[0] = a[0] - b[0];
dest[1] = a[1] - b[1];
@@ -320,6 +319,8 @@ void
glm_vec4_subs(vec4 v, float s, vec4 dest) {
#if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(dest, _mm_sub_ps(glmm_load(v), _mm_set1_ps(s)));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vsubq_f32(vld1q_f32(v), vdupq_n_f32(s)));
#else
dest[0] = v[0] - s;
dest[1] = v[1] - s;
@@ -340,6 +341,8 @@ void
glm_vec4_mul(vec4 a, vec4 b, vec4 dest) {
#if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(dest, _mm_mul_ps(glmm_load(a), glmm_load(b)));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vmulq_f32(vld1q_f32(a), vld1q_f32(b)));
#else
dest[0] = a[0] * b[0];
dest[1] = a[1] * b[1];
@@ -360,6 +363,8 @@ void
glm_vec4_scale(vec4 v, float s, vec4 dest) {
#if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(dest, _mm_mul_ps(glmm_load(v), _mm_set1_ps(s)));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vmulq_f32(vld1q_f32(v), vdupq_n_f32(s)));
#else
dest[0] = v[0] * s;
dest[1] = v[1] * s;
@@ -442,6 +447,10 @@ glm_vec4_addadd(vec4 a, vec4 b, vec4 dest) {
glmm_store(dest, _mm_add_ps(glmm_load(dest),
_mm_add_ps(glmm_load(a),
glmm_load(b))));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vaddq_f32(vld1q_f32(dest),
vaddq_f32(vld1q_f32(a),
vld1q_f32(b))));
#else
dest[0] += a[0] + b[0];
dest[1] += a[1] + b[1];
@@ -466,6 +475,10 @@ glm_vec4_subadd(vec4 a, vec4 b, vec4 dest) {
glmm_store(dest, _mm_add_ps(glmm_load(dest),
_mm_sub_ps(glmm_load(a),
glmm_load(b))));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vaddq_f32(vld1q_f32(dest),
vsubq_f32(vld1q_f32(a),
vld1q_f32(b))));
#else
dest[0] += a[0] - b[0];
dest[1] += a[1] - b[1];
@@ -490,6 +503,10 @@ glm_vec4_muladd(vec4 a, vec4 b, vec4 dest) {
glmm_store(dest, _mm_add_ps(glmm_load(dest),
_mm_mul_ps(glmm_load(a),
glmm_load(b))));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vaddq_f32(vld1q_f32(dest),
vmulq_f32(vld1q_f32(a),
vld1q_f32(b))));
#else
dest[0] += a[0] * b[0];
dest[1] += a[1] * b[1];
@@ -514,6 +531,10 @@ glm_vec4_muladds(vec4 a, float s, vec4 dest) {
glmm_store(dest, _mm_add_ps(glmm_load(dest),
_mm_mul_ps(glmm_load(a),
_mm_set1_ps(s))));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vaddq_f32(vld1q_f32(dest),
vsubq_f32(vld1q_f32(a),
vdupq_n_f32(s))));
#else
dest[0] += a[0] * s;
dest[1] += a[1] * s;
@@ -538,6 +559,10 @@ glm_vec4_maxadd(vec4 a, vec4 b, vec4 dest) {
glmm_store(dest, _mm_add_ps(glmm_load(dest),
_mm_max_ps(glmm_load(a),
glmm_load(b))));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vaddq_f32(vld1q_f32(dest),
vmaxq_f32(vld1q_f32(a),
vld1q_f32(b))));
#else
dest[0] += glm_max(a[0], b[0]);
dest[1] += glm_max(a[1], b[1]);
@@ -562,6 +587,10 @@ glm_vec4_minadd(vec4 a, vec4 b, vec4 dest) {
glmm_store(dest, _mm_add_ps(glmm_load(dest),
_mm_min_ps(glmm_load(a),
glmm_load(b))));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vaddq_f32(vld1q_f32(dest),
vminq_f32(vld1q_f32(a),
vld1q_f32(b))));
#else
dest[0] += glm_min(a[0], b[0]);
dest[1] += glm_min(a[1], b[1]);
@@ -581,6 +610,8 @@ void
glm_vec4_negate_to(vec4 v, vec4 dest) {
#if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(dest, _mm_xor_ps(glmm_load(v), _mm_set1_ps(-0.0f)));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, veorq_s32(vld1q_f32(v), vdupq_n_f32(-0.0f)));
#else
dest[0] = -v[0];
dest[1] = -v[1];
@@ -614,7 +645,7 @@ glm_vec4_normalize_to(vec4 v, vec4 dest) {
float dot;
x0 = glmm_load(v);
xdot = glmm_dot(x0, x0); xdot = glmm_vdot(x0, x0);
dot = _mm_cvtss_f32(xdot);
if (dot == 0.0f) {
@@ -658,10 +689,25 @@ glm_vec4_normalize(vec4 v) {
CGLM_INLINE
float
glm_vec4_distance(vec4 a, vec4 b) {
#if defined( __SSE__ ) || defined( __SSE2__ )
__m128 x0;
x0 = _mm_sub_ps(glmm_load(b), glmm_load(a));
x0 = _mm_mul_ps(x0, x0);
x0 = _mm_add_ps(x0, glmm_shuff1(x0, 1, 0, 3, 2));
return _mm_cvtss_f32(_mm_sqrt_ss(_mm_add_ss(x0,
glmm_shuff1(x0, 0, 1, 0, 1))));
#elif defined(CGLM_NEON_FP)
float32x4_t v0;
float32_t r;
v0 = vsubq_f32(vld1q_f32(a), vld1q_f32(b));
r = vaddvq_f32(vmulq_f32(v0, v0));
return sqrtf(r);
#else
return sqrtf(glm_pow2(b[0] - a[0])
+ glm_pow2(b[1] - a[1])
+ glm_pow2(b[2] - a[2])
+ glm_pow2(b[3] - a[3]));
#endif
}
/*!
@@ -676,6 +722,8 @@ void
glm_vec4_maxv(vec4 a, vec4 b, vec4 dest) {
#if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(dest, _mm_max_ps(glmm_load(a), glmm_load(b)));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vmaxq_f32(vld1q_f32(a), vld1q_f32(b)));
#else
dest[0] = glm_max(a[0], b[0]);
dest[1] = glm_max(a[1], b[1]);
@@ -696,6 +744,8 @@ void
glm_vec4_minv(vec4 a, vec4 b, vec4 dest) {
#if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(dest, _mm_min_ps(glmm_load(a), glmm_load(b)));
#elif defined(CGLM_NEON_FP)
vst1q_f32(dest, vminq_f32(vld1q_f32(a), vld1q_f32(b)));
#else
dest[0] = glm_min(a[0], b[0]);
dest[1] = glm_min(a[1], b[1]);
@@ -717,6 +767,9 @@ glm_vec4_clamp(vec4 v, float minVal, float maxVal) {
#if defined( __SSE__ ) || defined( __SSE2__ )
glmm_store(v, _mm_min_ps(_mm_max_ps(glmm_load(v), _mm_set1_ps(minVal)),
_mm_set1_ps(maxVal)));
#elif defined(CGLM_NEON_FP)
vst1q_f32(v, vminq_f32(vmaxq_f32(vld1q_f32(v), vdupq_n_f32(minVal)),
vdupq_n_f32(maxVal)));
#else
v[0] = glm_clamp(v[0], minVal, maxVal);
v[1] = glm_clamp(v[1], minVal, maxVal);
@@ -747,4 +800,23 @@ glm_vec4_lerp(vec4 from, vec4 to, float t, vec4 dest) {
glm_vec4_add(from, v, dest);
}
/*!
* @brief helper to fill vec4 as [S^3, S^2, S, 1]
*
* @param[in] s parameter
* @param[out] dest destination
*/
CGLM_INLINE
void
glm_vec4_cubic(float s, vec4 dest) {
float ss;
ss = s * s;
dest[0] = ss * s;
dest[1] = ss;
dest[2] = s;
dest[3] = 1.0f;
}
#endif /* cglm_vec4_h */

View File

@@ -57,7 +57,9 @@ cglm_HEADERS = include/cglm/version.h \
include/cglm/color.h \
include/cglm/project.h \
include/cglm/sphere.h \
include/cglm/ease.h include/cglm/ease.h \
include/cglm/curve.h \
include/cglm/bezier.h
cglm_calldir=$(includedir)/cglm/call
cglm_call_HEADERS = include/cglm/call/mat4.h \
@@ -74,10 +76,14 @@ cglm_call_HEADERS = include/cglm/call/mat4.h \
include/cglm/call/box.h \
include/cglm/call/project.h \
include/cglm/call/sphere.h \
include/cglm/call/ease.h include/cglm/call/ease.h \
include/cglm/call/curve.h \
include/cglm/call/bezier.h
cglm_simddir=$(includedir)/cglm/simd
cglm_simd_HEADERS = include/cglm/simd/intrin.h cglm_simd_HEADERS = include/cglm/simd/intrin.h \
include/cglm/simd/x86.h \
include/cglm/simd/arm.h
cglm_simd_sse2dir=$(includedir)/cglm/simd/sse2
cglm_simd_sse2_HEADERS = include/cglm/simd/sse2/affine.h \
@@ -107,7 +113,9 @@ libcglm_la_SOURCES=\
src/box.c \
src/project.c \
src/sphere.c \
src/ease.c src/ease.c \
src/curve.c \
src/bezier.c
test_tests_SOURCES=\
test/src/test_common.c \
@@ -121,7 +129,8 @@ test_tests_SOURCES=\
test/src/test_vec4.c \
test/src/test_vec3.c \
test/src/test_mat3.c \
test/src/test_affine.c test/src/test_affine.c \
test/src/test_bezier.c
all-local:
sh ./post-build.sh

27
src/bezier.c Normal file
View File

@@ -0,0 +1,27 @@
/*
* Copyright (c), Recep Aslantas.
*
* MIT License (MIT), http://opensource.org/licenses/MIT
* Full license can be found in the LICENSE file
*/
#include "../include/cglm/cglm.h"
#include "../include/cglm/call.h"
CGLM_EXPORT
float
glmc_bezier(float s, float p0, float c0, float c1, float p1) {
return glm_bezier(s, p0, c0, c1, p1);
}
CGLM_EXPORT
float
glmc_hermite(float s, float p0, float t0, float t1, float p1) {
return glm_hermite(s, p0, t0, t1, p1);
}
CGLM_EXPORT
float
glmc_decasteljau(float prm, float p0, float c0, float c1, float p1) {
return glm_decasteljau(prm, p0, c0, c1, p1);
}

15
src/curve.c Normal file
View File

@@ -0,0 +1,15 @@
/*
* Copyright (c), Recep Aslantas.
*
* MIT License (MIT), http://opensource.org/licenses/MIT
* Full license can be found in the LICENSE file
*/
#include "../include/cglm/cglm.h"
#include "../include/cglm/call.h"
CGLM_EXPORT
float
glmc_smc(float s, mat4 m, vec4 c) {
return glm_smc(s, m, c);
}

View File

@@ -151,3 +151,9 @@ void
glmc_mat4_swap_row(mat4 mat, int row1, int row2) {
glm_mat4_swap_row(mat, row1, row2);
}
CGLM_EXPORT
float
glmc_mat4_rmc(vec4 r, mat4 m, vec4 c) {
return glm_mat4_rmc(r, m, c);
}

View File

@@ -206,6 +206,12 @@ glmc_vec4_lerp(vec4 from, vec4 to, float t, vec4 dest) {
glm_vec4_lerp(from, to, t, dest);
}
CGLM_EXPORT
void
glmc_vec4_cubic(float s, vec4 dest) {
glm_vec4_cubic(s, dest);
}
/* ext */
CGLM_EXPORT

test/src/test_bezier.c (new file, 65 lines)
@@ -0,0 +1,65 @@
/*
 * Copyright (c), Recep Aslantas.
 *
 * MIT License (MIT), http://opensource.org/licenses/MIT
 * Full license can be found in the LICENSE file
 */

#include "test_common.h"

CGLM_INLINE
float
test_bezier_plain(float s, float p0, float c0, float c1, float p1) {
  float x, xx, xxx, ss, sss;

  x   = 1.0f - s;
  xx  = x * x;
  xxx = xx * x;
  ss  = s * s;
  sss = ss * s;

  return p0 * xxx + 3.0f * (c0 * s * xx + c1 * ss * x) + p1 * sss;
}

CGLM_INLINE
float
test_hermite_plain(float s, float p0, float t0, float t1, float p1) {
  float ss, sss;

  ss  = s * s;
  sss = ss * s;

  return p0 * (2.0f * sss - 3.0f * ss + 1.0f)
       + t0 * (sss - 2.0f * ss + s)
       + p1 * (-2.0f * sss + 3.0f * ss)
       + t1 * (sss - ss);
}

void
test_bezier(void **state) {
  float s, p0, p1, c0, c1, smc, Bs, Bs_plain;

  s  = test_rand();
  p0 = test_rand();
  p1 = test_rand();
  c0 = test_rand();
  c1 = test_rand();

  /* test cubic bezier */
  smc      = glm_smc(s, GLM_BEZIER_MAT, (vec4){p0, c0, c1, p1});
  Bs       = glm_bezier(s, p0, c0, c1, p1);
  Bs_plain = test_bezier_plain(s, p0, c0, c1, p1);

  assert_true(glm_eq(Bs, Bs_plain));
  assert_true(glm_eq(smc, Bs_plain));
  assert_true(glm_eq(Bs, smc));

  /* test cubic hermite */
  smc      = glm_smc(s, GLM_HERMITE_MAT, (vec4){p0, p1, c0, c1});
  Bs       = glm_hermite(s, p0, c0, c1, p1);
  Bs_plain = test_hermite_plain(s, p0, c0, c1, p1);

  assert_true(glm_eq(Bs, Bs_plain));
  assert_true(glm_eq(smc, Bs_plain));
  assert_true(glm_eq(Bs, smc));
}

@@ -58,7 +58,7 @@ test_rand_vec4(vec4 dest) {
}

float
-test_rand_angle(void) {
+test_rand(void) {
  srand((unsigned int)time(NULL));
  return drand48();
}

@@ -59,7 +59,7 @@ void
test_rand_vec4(vec4 dest);

float
-test_rand_angle(void);
+test_rand(void);

void
test_rand_quat(versor q);

@@ -38,7 +38,10 @@ main(int argc, const char * argv[]) {
    cmocka_unit_test(test_vec3),

    /* affine */
-   cmocka_unit_test(test_affine)
+   cmocka_unit_test(test_affine),
+
+   /* bezier */
+   cmocka_unit_test(test_bezier)
  };

  return cmocka_run_group_tests(tests, NULL, NULL);

@@ -40,4 +40,7 @@ test_vec3(void **state);
void
test_affine(void **state);

+void
+test_bezier(void **state);
+
#endif /* test_tests_h */

@@ -20,8 +20,10 @@
  </ItemGroup>
  <ItemGroup>
    <ClCompile Include="..\src\affine.c" />
+   <ClCompile Include="..\src\bezier.c" />
    <ClCompile Include="..\src\box.c" />
    <ClCompile Include="..\src\cam.c" />
+   <ClCompile Include="..\src\curve.c" />
    <ClCompile Include="..\src\dllmain.c" />
    <ClCompile Include="..\src\ease.c" />
    <ClCompile Include="..\src\euler.c" />
@@ -39,11 +41,14 @@
  <ItemGroup>
    <ClInclude Include="..\include\cglm\affine-mat.h" />
    <ClInclude Include="..\include\cglm\affine.h" />
+   <ClInclude Include="..\include\cglm\bezier.h" />
    <ClInclude Include="..\include\cglm\box.h" />
    <ClInclude Include="..\include\cglm\call.h" />
    <ClInclude Include="..\include\cglm\call\affine.h" />
+   <ClInclude Include="..\include\cglm\call\bezier.h" />
    <ClInclude Include="..\include\cglm\call\box.h" />
    <ClInclude Include="..\include\cglm\call\cam.h" />
+   <ClInclude Include="..\include\cglm\call\curve.h" />
    <ClInclude Include="..\include\cglm\call\ease.h" />
    <ClInclude Include="..\include\cglm\call\euler.h" />
    <ClInclude Include="..\include\cglm\call\frustum.h" />
@@ -60,6 +65,7 @@
    <ClInclude Include="..\include\cglm\cglm.h" />
    <ClInclude Include="..\include\cglm\color.h" />
    <ClInclude Include="..\include\cglm\common.h" />
+   <ClInclude Include="..\include\cglm\curve.h" />
    <ClInclude Include="..\include\cglm\ease.h" />
    <ClInclude Include="..\include\cglm\euler.h" />
    <ClInclude Include="..\include\cglm\frustum.h" />
@@ -69,6 +75,7 @@
    <ClInclude Include="..\include\cglm\plane.h" />
    <ClInclude Include="..\include\cglm\project.h" />
    <ClInclude Include="..\include\cglm\quat.h" />
+   <ClInclude Include="..\include\cglm\simd\arm.h" />
    <ClInclude Include="..\include\cglm\simd\avx\affine.h" />
    <ClInclude Include="..\include\cglm\simd\avx\mat4.h" />
    <ClInclude Include="..\include\cglm\simd\intrin.h" />
@@ -77,6 +84,7 @@
    <ClInclude Include="..\include\cglm\simd\sse2\mat3.h" />
    <ClInclude Include="..\include\cglm\simd\sse2\mat4.h" />
    <ClInclude Include="..\include\cglm\simd\sse2\quat.h" />
+   <ClInclude Include="..\include\cglm\simd\x86.h" />
    <ClInclude Include="..\include\cglm\sphere.h" />
    <ClInclude Include="..\include\cglm\types.h" />
    <ClInclude Include="..\include\cglm\util.h" />

@@ -84,6 +84,12 @@
    <ClCompile Include="..\src\ease.c">
      <Filter>src</Filter>
    </ClCompile>
+   <ClCompile Include="..\src\curve.c">
+     <Filter>src</Filter>
+   </ClCompile>
+   <ClCompile Include="..\src\bezier.c">
+     <Filter>src</Filter>
+   </ClCompile>
  </ItemGroup>
  <ItemGroup>
    <ClInclude Include="..\src\config.h">
@@ -233,5 +239,23 @@
    <ClInclude Include="..\include\cglm\ease.h">
      <Filter>include\cglm</Filter>
    </ClInclude>
+   <ClInclude Include="..\include\cglm\simd\arm.h">
+     <Filter>include\cglm\simd</Filter>
+   </ClInclude>
+   <ClInclude Include="..\include\cglm\simd\x86.h">
+     <Filter>include\cglm\simd</Filter>
+   </ClInclude>
+   <ClInclude Include="..\include\cglm\call\curve.h">
+     <Filter>include\cglm\call</Filter>
+   </ClInclude>
+   <ClInclude Include="..\include\cglm\curve.h">
+     <Filter>include\cglm</Filter>
+   </ClInclude>
+   <ClInclude Include="..\include\cglm\bezier.h">
+     <Filter>include\cglm</Filter>
+   </ClInclude>
+   <ClInclude Include="..\include\cglm\call\bezier.h">
+     <Filter>include\cglm\call</Filter>
+   </ClInclude>
  </ItemGroup>
</Project>