Revert "mark readonly parameters as const"

Merge pull request #86 from recp/const
mark readonly parameters as const
2026-02-17 03:39:05 +00:00 · 2019-04-30 08:19:07 +03:00 · 2019-04-29 17:58:51 +03:00 · 2019-04-28 21:55:23 +03:00 · 2019-04-28 21:48:19 +03:00 · 2019-04-28 19:43:58 +03:00
71 changed files with 1472 additions and 292 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -69,3 +69,4 @@ win/cglm_test_*
 win/x64
 win/x85
 win/Debug
 cglm-test-ios*
--- a/.travis.yml
+++ b/.travis.yml
@@ -57,3 +57,6 @@ after_success:
        --gcov-options '\-lp'
        --verbose;
    fi
 after_failure:
  - cat ./test-suite.log
--- a/9
+++ b/9
@@ -52,3 +52,12 @@ https://gamedev.stackexchange.com/questions/28395/rotating-vector3-by-a-quaterni
 9. Sphere AABB intersect
 https://github.com/erich666/GraphicsGems/blob/master/gems/BoxSphere.c
 10. Horizontal add
 https://stackoverflow.com/questions/6996764/fastest-way-to-do-horizontal-float-vector-sum-on-x86
 11. de casteljau implementation and comments
 https://forums.khronos.org/showthread.php/10264-Animations-in-1-4-1-release-notes-revision-A/page2?highlight=bezier
 https://forums.khronos.org/showthread.php/10644-Animation-Bezier-interpolation
 https://forums.khronos.org/showthread.php/10387-2D-Tangents-in-Bezier-Splines?p=34164&viewfull=1#post34164
 https://forums.khronos.org/showthread.php/10651-Animation-TCB-Spline-Interpolation-in-COLLADA?highlight=bezier
--- a/README.md
+++ b/README.md
@@ -25,6 +25,7 @@ you have the latest version
 - **[api rename]** by starting v0.4.5, **glm_simd** functions are renamed to **glmm_**  
 - **[new option]** by starting v0.4.5, you can disable alignment requirement, check options in docs.  
 - **[major change]** by starting v0.5.0, vec3 functions use **glm_vec3_** namespace, it was **glm_vec_** until v0.5.0
 - **[major change]** by starting v0.5.1, built-in alignment is removed from **vec3** and **mat3** types
 #### Note for C++ developers:
 If you aren't yet aware of the original GLM library, you may also want to look at:
@@ -81,7 +82,11 @@ Currently *cglm* uses default clip space configuration (-1, 1) for camera functi
 - inline or pre-compiled function call
 - frustum (extract view frustum planes, corners...)
 - bounding box  (AABB in Frustum (culling), crop, merge...)
 - bounding sphere
 - project, unproject
 - easing functions
 - curves
 - curve interpolation helpers (S*M*C, deCasteljau...)
 - and other...
 <hr />
--- a/cglm.podspec
+++ b/cglm.podspec
@@ -2,7 +2,7 @@ Pod::Spec.new do |s|
  # Description
  s.name         = "cglm"
-  s.version      = "0.4.6"
+  s.version      = "0.5.1"
  s.summary      = "📽 Optimized OpenGL/Graphics Math (glm) for C"
  s.description  = <<-DESC
 cglm is math library for graphics programming for C. It is similar to original glm but it is written for C instead of C++ (you can use here too). See the documentation or README for all features.
--- a/configure.ac
+++ b/configure.ac
@@ -7,7 +7,7 @@
 #*****************************************************************************
 AC_PREREQ([2.69])
-AC_INIT([cglm], [0.5.0], [info@recp.me])
+AC_INIT([cglm], [0.5.4], [info@recp.me])
 AM_INIT_AUTOMAKE([-Wall -Werror foreign subdir-objects])
 AC_CONFIG_MACRO_DIR([m4])
@@ -29,6 +29,7 @@ LT_INIT
 # Checks for libraries.
 AC_CHECK_LIB([m], [floor])
 m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES([yes])])
 AC_SYS_LARGEFILE
 # Checks for header files.
--- a/docs/source/api.rst
+++ b/docs/source/api.rst
@@ -46,3 +46,5 @@ Follow the :doc:`build` documentation for this
   io
   call
   sphere
   curve
   bezier
--- a/docs/source/bezier.rst
+++ b/docs/source/bezier.rst
@@ -0,0 +1,89 @@
 .. default-domain:: C
 Bezier
 ================================================================================
 Header: cglm/bezier.h
 Common helpers for cubic bezier and similar curves.
 Table of contents (click to go):
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Functions:
 1. :c:func:`glm_bezier`
 2. :c:func:`glm_hermite`
 3. :c:func:`glm_decasteljau`
 Functions documentation
 ~~~~~~~~~~~~~~~~~~~~~~~
 .. c:function:: float glm_bezier(float s, float p0, float c0, float c1, float p1)
    | cubic bezier interpolation
    | formula:
    .. code-block:: text
      B(s) = P0*(1-s)^3 + 3*C0*s*(1-s)^2 + 3*C1*s^2*(1-s) + P1*s^3
    | similar result using matrix:
    .. code-block:: text
      B(s) = glm_smc(s, GLM_BEZIER_MAT, (vec4){p0, c0, c1, p1})
    | glm_eq(glm_smc(...), glm_bezier(...)) should return TRUE
    Parameters:
      | *[in]*  **s**   parameter between 0 and 1
      | *[in]*  **p0**  begin point
      | *[in]*  **c0**  control point 1
      | *[in]*  **c1**  control point 2
      | *[in]*  **p1**  end point
    Returns:
        B(s)
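The Bernstein formula above and the factored evaluation used by `glm_bezier` in `include/cglm/bezier.h` (later in this changeset) should produce the same value. The following is a minimal standalone sketch of that equivalence; it does not include cglm, and the function names `bezier_bernstein`, `bezier_factored` and `bezier_self_check` are hypothetical, chosen only for illustration.

```c
#include <assert.h>

/* Direct evaluation of the Bernstein form quoted in the docs. */
static float bezier_bernstein(float s, float p0, float c0, float c1, float p1) {
  float x = 1.0f - s;
  return p0 * x * x * x
       + 3.0f * c0 * s * x * x
       + 3.0f * c1 * s * s * x
       + p1 * s * s * s;
}

/* Factored form, mirroring the body of glm_bezier in this changeset. */
static float bezier_factored(float s, float p0, float c0, float c1, float p1) {
  float x   = 1.0f - s;
  float xx  = x * x;
  float ss  = s * s;
  float xs3 = (s - ss) * 3.0f;        /* 3 * s * (1 - s) */
  float a   = p0 * xx + c0 * xs3;
  return a + s * (c1 * xs3 + p1 * ss - a);
}

/* Both forms should agree up to float rounding (tolerance is an assumption). */
static void bezier_self_check(void) {
  float d = bezier_bernstein(0.25f, 0.0f, 1.0f, 2.0f, 3.0f)
          - bezier_factored(0.25f, 0.0f, 1.0f, 2.0f, 3.0f);
  if (d < 0.0f) d = -d;
  assert(d < 1e-5f);
}
```

The factored form trades a few multiplications for reuse of intermediate terms; whether that matters in practice depends on the compiler and target.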
 .. c:function:: float glm_hermite(float s, float p0, float t0, float t1, float p1)
    | cubic hermite interpolation
    | formula:
    .. code-block:: text
      H(s) = P0*(2*s^3 - 3*s^2 + 1) + T0*(s^3 - 2*s^2 + s) + P1*(-2*s^3 + 3*s^2) + T1*(s^3 - s^2)
    | similar result using matrix:
    .. code-block:: text
      H(s) = glm_smc(s, GLM_HERMITE_MAT, (vec4){p0, p1, t0, t1})
    | glm_eq(glm_smc(...), glm_hermite(...)) should return TRUE
    Parameters:
      | *[in]*  **s**   parameter between 0 and 1
      | *[in]*  **p0**  begin point
      | *[in]*  **t0**  tangent 1
      | *[in]*  **t1**  tangent 2
      | *[in]*  **p1**  end point
    Returns:
        H(s)
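As a sanity check on the Hermite formula, the basis polynomials can be evaluated directly: at s = 0 only the P0 term survives, and at s = 1 only the P1 term does. The sketch below is standalone (no cglm dependency), and `hermite_basis` and `hermite_self_check` are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Basis-function form of cubic Hermite interpolation, as in the formula. */
static float hermite_basis(float s, float p0, float t0, float t1, float p1) {
  float s2 = s * s;
  float s3 = s2 * s;
  return p0 * ( 2.0f * s3 - 3.0f * s2 + 1.0f)
       + t0 * (        s3 - 2.0f * s2 + s)
       + p1 * (-2.0f * s3 + 3.0f * s2)
       + t1 * (        s3 -        s2);
}

/* The curve should start at p0 and end at p1 regardless of the tangents. */
static void hermite_self_check(void) {
  float h0 = hermite_basis(0.0f, 2.0f, 5.0f, -1.0f, 7.0f);
  float h1 = hermite_basis(1.0f, 2.0f, 5.0f, -1.0f, 7.0f);
  assert(h0 > 1.9999f && h0 < 2.0001f);
  assert(h1 > 6.9999f && h1 < 7.0001f);
}
```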
 .. c:function:: float glm_decasteljau(float prm, float p0, float c0, float c1, float p1)
    | iterative way to solve cubic equation
    Parameters:
      | *[in]*  **prm** parameter between 0 and 1
      | *[in]*  **p0**  begin point
      | *[in]*  **c0**  control point 1
      | *[in]*  **c1**  control point 2
      | *[in]*  **p1**  end point
    Returns:
        parameter to use in cubic equation
--- a/docs/source/build.rst
+++ b/docs/source/build.rst
@@ -1,9 +1,7 @@
-Building cglm
+Build cglm
 ================================
-| **cglm** does not have external dependencies except for unit testing.
+| **cglm** does not have external dependencies except for unit testing. When you pulled **cglm** repo with submodules all dependencies will be pulled too. `build-deps.sh` will pull all dependencies/submodules and build for you.
 | When you pulled cglm repo with submodules all dependencies will be pulled too.
 | `build-deps.sh` will pull all dependencies/submodules and build for you.
 External dependencies:
  * cmocka - for unit testing
@@ -12,7 +10,8 @@ External dependencies:
 If you only need the inline versions, you don't need to build **cglm** or link it to your program.
 Just add cglm to your project as a dependency / external lib by copying it in, then use it as usual.
-**Unix (Autotools):**
+Unix (Autotools):
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 .. code-block:: bash
  :linenos:
@@ -26,11 +25,12 @@ Just import cglm to your project as dependency / external lib by copy-paste then
  $ [sudo] make install   # install to system (optional)
 **make** will build cglm to **.libs** sub folder in project folder.
-If you don't want to install cglm to your system's folder you can get static and dynamic libs in this folder.
+If you don't want to install **cglm** to your system's folder you can get static and dynamic libs in this folder.
-**Build dependencies (windows):**
+Windows (MSBuild):
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-Windows related build files, project files are located in win folder,
+Windows related build files, project files are located in `win` folder,
 make sure you are inside the cglm/win folder.
 Code Analysis is enabled, so it may take a while to build.
@@ -50,3 +50,19 @@ then try to build with *devenv*:
  $ devenv cglm.sln /Build Release
 Currently tests are not available on Windows.
 Documentation (Sphinx):
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 **cglm** uses the Sphinx framework for documentation, which supports many output formats. For all options, see the sphinx-build page:
 https://www.sphinx-doc.org/en/master/man/sphinx-build.html
 Example build:
 .. code-block:: bash
  :linenos:
  $ cd cglm/docs
  $ sphinx-build source build
--- a/docs/source/cam.rst
+++ b/docs/source/cam.rst
@@ -36,6 +36,7 @@ Functions:
 #. :c:func:`glm_ortho_default`
 #. :c:func:`glm_ortho_default_s`
 #. :c:func:`glm_perspective`
 #. :c:func:`glm_persp_move_far`
 #. :c:func:`glm_perspective_default`
 #. :c:func:`glm_perspective_resize`
 #. :c:func:`glm_lookat`
@@ -145,6 +146,16 @@ Functions documentation
      | *[in]*  **farVal**  far clipping planes
      | *[out]* **dest**    result matrix
 .. c:function:: void  glm_persp_move_far(mat4 proj, float deltaFar)
    | extend perspective projection matrix's far distance
    | this function does not guarantee far >= near, be aware of that!
    Parameters:
      | *[in, out]*  **proj**      projection matrix to extend
       | *[in]*       **deltaFar**  distance from existing far (negative to shrink)
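Moving the far plane relies on recovering near and far from two matrix entries. Assuming those entries are written as in `glm_perspective` in this changeset, that is p22 = (n + f) / (n - f) and p32 = 2nf / (n - f), it follows that p22 - 1 = 2f / (n - f) and p22 + 1 = 2n / (n - f), so p32 / (p22 - 1) = n and p32 / (p22 + 1) = f. The standalone sketch below checks that algebra without depending on cglm; the `persp_terms` type and all helper names are hypothetical.

```c
#include <assert.h>

/* The two projection entries that depend on near/far (hypothetical type). */
typedef struct { float p22, p32; } persp_terms;

/* Same expressions glm_perspective uses for dest[2][2] and dest[3][2]. */
static persp_terms persp_terms_from(float nearVal, float farVal) {
  persp_terms t;
  float fn = 1.0f / (nearVal - farVal);
  t.p22 = (nearVal + farVal) * fn;
  t.p32 = 2.0f * nearVal * farVal * fn;
  return t;
}

/* Invert the two expressions to get the clipping distances back. */
static float persp_near(persp_terms t) { return t.p32 / (t.p22 - 1.0f); }
static float persp_far(persp_terms t)  { return t.p32 / (t.p22 + 1.0f); }

/* Rebuild the entries with far shifted by deltaFar; near is unchanged. */
static persp_terms persp_move_far(persp_terms t, float deltaFar) {
  return persp_terms_from(persp_near(t), persp_far(t) + deltaFar);
}

static void persp_self_check(void) {
  persp_terms t = persp_terms_from(1.0f, 10.0f);
  assert(persp_near(t) > 0.999f && persp_near(t) < 1.001f);
  assert(persp_far(t)  > 9.99f  && persp_far(t)  < 10.01f);
}
```

Note that `p22 + 1` gets close to zero when far is much larger than near, so the recovered far value can lose precision in single-precision floats; the sketch uses a modest near/far ratio for that reason.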
 .. c:function:: void glm_perspective_default(float aspect, mat4 dest)
     | set up perspective projection matrix with default near/far
--- a/docs/source/conf.py
+++ b/docs/source/conf.py
@@ -62,9 +62,9 @@ author = u'Recep Aslantas'
 # built documents.
 #
 # The short X.Y version.
-version = u'0.5.0'
+version = u'0.5.4'
 # The full version, including alpha/beta/rc tags.
-release = u'0.5.0'
+release = u'0.5.4'
 # The language for content autogenerated by Sphinx. Refer to documentation
 # for a list of supported languages.
@@ -90,7 +90,7 @@ todo_include_todos = False
 # The theme to use for HTML and HTML Help pages.  See the documentation for
 # a list of builtin themes.
 #
-html_theme = 'alabaster'
+html_theme = 'sphinx_rtd_theme'
 # Theme options are theme-specific and customize the look and feel of a theme
 # further.  For a list of options available for each theme, see the
@@ -99,13 +99,13 @@ html_theme = 'alabaster'
 # html_theme_options = {}
 html_theme_options = {
-    'github_banner': 'true',
+    # 'github_banner': 'true',
-    'github_button': 'true',
+    # 'github_button': 'true',
-    'github_user': 'recp',
+    # 'github_user': 'recp',
-    'github_repo': 'cglm',
+    # 'github_repo': 'cglm',
-    'travis_button': 'true',
+    # 'travis_button': 'true',
-    'show_related': 'true',
+    # 'show_related': 'true',
-    'fixed_sidebar': 'true'
+    # 'fixed_sidebar': 'true'
 }
 # Add any paths that contain custom static files (such as style sheets) here,
--- a/docs/source/curve.rst
+++ b/docs/source/curve.rst
@@ -0,0 +1,41 @@
 .. default-domain:: C
 Curve
 ================================================================================
 Header: cglm/curve.h
 Common helpers for common curves. For specific curve see its header/doc
 e.g bezier
 Table of contents (click to go):
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 Functions:
 1. :c:func:`glm_smc`
 Functions documentation
 ~~~~~~~~~~~~~~~~~~~~~~~
 .. c:function:: float  glm_smc(float s, mat4 m, vec4 c)
    | helper function to calculate **S** * **M** * **C** multiplication for curves
    | this function does not encourage you to use SMC, instead it is a helper if you use SMC.
    | if you want to specify S as vector then use more generic glm_mat4_rmc() func.
    | Example usage:
    .. code-block:: c
       Bs = glm_smc(s, GLM_BEZIER_MAT, (vec4){p0, c0, c1, p1})
    Parameters:
      | *[in]*  **s**  parameter between 0 and 1 (this will be [s3, s2, s, 1])
      | *[in]*  **m**  basis matrix
       | *[in]*  **c**  position/control vector
    Returns:
        scalar value e.g. Bs
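To make the S * M * C product concrete, here is a minimal standalone sketch that does not use cglm. It assumes S = [s^3, s^2, s, 1] as described for the `s` parameter, reuses the values of `GLM_BEZIER_MAT_INIT` from this changeset, and indexes the matrix as `m[col][row]`. The Bezier basis is symmetric, so the indexing choice does not change this particular result. The names `BEZIER_BASIS`, `smc` and `smc_self_check` are hypothetical.

```c
#include <assert.h>

/* Same values as GLM_BEZIER_MAT_INIT in this changeset. */
static const float BEZIER_BASIS[4][4] = {
  {-1.0f,  3.0f, -3.0f, 1.0f},
  { 3.0f, -6.0f,  3.0f, 0.0f},
  {-3.0f,  3.0f,  0.0f, 0.0f},
  { 1.0f,  0.0f,  0.0f, 0.0f}
};

/* Scalar result of [s^3, s^2, s, 1] * M * c. */
static float smc(float s, const float m[4][4], const float c[4]) {
  float S[4];
  float acc = 0.0f;
  int   row, col;

  S[0] = s * s * s;
  S[1] = s * s;
  S[2] = s;
  S[3] = 1.0f;

  for (row = 0; row < 4; row++) {
    float mc = 0.0f;                 /* element `row` of M * c */
    for (col = 0; col < 4; col++)
      mc += m[col][row] * c[col];
    acc += S[row] * mc;
  }
  return acc;
}

/* Evenly spaced control values give the straight line B(s) = 3 * s. */
static void smc_self_check(void) {
  float c[4] = {0.0f, 1.0f, 2.0f, 3.0f};
  float v    = smc(0.5f, BEZIER_BASIS, c);
  assert(v > 1.4999f && v < 1.5001f);
}
```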
--- a/docs/source/features.rst
+++ b/docs/source/features.rst
@@ -0,0 +1,23 @@
 Features
 ================================================================================
 * general purpose matrix operations (mat4, mat3)
 * chain matrix multiplication (square only)
 * general purpose vector operations (cross, dot, rotate, proj, angle...)
 * affine transforms
 * matrix decomposition (extract rotation, scaling factor)
 * optimized affine transform matrices (mul, rigid-body inverse)
 * camera (lookat)
 * projections (ortho, perspective)
 * quaternions
 * euler angles / yaw-pitch-roll to matrix
 * extract euler angles
 * inline or pre-compiled function call
 * frustum (extract view frustum planes, corners...)
 * bounding box (AABB in Frustum (culling), crop, merge...)
 * bounding sphere
 * project, unproject
 * easing functions
 * curves
 * curve interpolation helpers (SMC, deCasteljau...)
 * and other...
--- a/docs/source/getting_started.rst
+++ b/docs/source/getting_started.rst
@@ -9,23 +9,26 @@ Types:
 .. code-block:: c
  :linenos:
-   typedef float vec3[3];
+  typedef float                   vec2[2];
-   typedef int  ivec3[3];
+  typedef float                   vec3[3];
-   typedef CGLM_ALIGN(16) float vec4[4];
+  typedef int                    ivec3[3];
  typedef CGLM_ALIGN_IF(16) float vec4[4];
  typedef vec4                    versor;
  typedef vec3                    mat3[3];
-   typedef vec3 mat3[3];
+  #ifdef __AVX__
-   typedef vec4 mat4[4];
+  typedef CGLM_ALIGN_IF(32) vec4  mat4[4];
-
+  #else
-   typedef vec4 versor;
+  typedef CGLM_ALIGN_IF(16) vec4  mat4[4];
  #endif
 As you can see, types don't store extra information, in favor of space.
 You can send these values e.g. matrix to OpenGL directly without casting or calling a function like *value_ptr*
-Alignment is Required:
+Alignment Is Required:
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-**vec4** and **mat4** requires 16 byte alignment because vec4 and mat4 operations are
+**vec4** and **mat4** require 16-byte alignment (32-byte for **mat4** if AVX is enabled) because **vec4** and **mat4** operations are vectorized by SIMD instructions (SSE/AVX/NEON).
 vectorized by SIMD instructions (SSE/AVX).
 **UPDATE:**
  By starting v0.4.5 cglm provides an option to disable alignment requirement, it is enabled as default
@@ -37,10 +40,9 @@ vectorized by SIMD instructions (SSE/AVX).
 Allocations:
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 *cglm* doesn't alloc any memory on heap. So it doesn't provide any allocator.
-You must allocate memory yourself. You should alloc memory for out parameters too if you pass pointer of memory location.
+You must allocate memory yourself. You should alloc memory for out parameters too if you pass pointer of memory location. When allocating memory, don't forget that **vec4** and **mat4** require alignment.
 When allocating memory don't forget that **vec4** and **mat4** requires alignment.
-**NOTE:** Unaligned vec4 and unaligned mat4 operations will be supported in the future. Check todo list.
+**NOTE:** Unaligned **vec4** and unaligned **mat4** operations will be supported in the future. Check todo list.
 Because you may want to multiply a CGLM matrix with external matrix.
 There is no guarantee that non-CGLM matrix is aligned. Unaligned types will have *u* prefix e.g. **umat4**
--- a/docs/source/index.rst
+++ b/docs/source/index.rst
@@ -3,7 +3,7 @@
   You can adapt this file completely to your liking, but it should at least
   contain the root `toctree` directive.
-Welcome to cglm's documentation!
+cglm Documentation
 ================================
 **cglm** is optimized 3D math library written in C99 (compatible with C89).
@@ -14,33 +14,36 @@ is considered to be supported as optional.
 Also currently only **float** type is supported for most operations.
 **Features**
 * general purpose matrix operations (mat4, mat3)
 * chain matrix multiplication (square only)
 * general purpose vector operations (cross, dot, rotate, proj, angle...)
 * affine transforms
 * matrix decomposition (extract rotation, scaling factor)
 * optimized affine transform matrices (mul, rigid-body inverse)
 * camera (lookat)
 * projections (ortho, perspective)
 * quaternions
 * euler angles / yaw-pitch-roll to matrix
 * extract euler angles
 * inline or pre-compiled function call
 * frustum (extract view frustum planes, corners...)
 * bounding box (AABB in Frustum (culling), crop, merge...)
 .. toctree::
-   :maxdepth: 1
+   :maxdepth: 2
-   :caption: Table Of Contents:
+   :caption: Getting Started:
   features
   build
   getting_started
 .. toctree::
   :maxdepth: 2
   :caption: How To:
   opengl
 .. toctree::
   :maxdepth: 2
   :caption: API:
   api
 .. toctree::
   :maxdepth: 2
   :caption: Options:
   opt
 .. toctree::
   :maxdepth: 2
   :caption: Troubleshooting:
   troubleshooting
 Indices and tables
--- a/docs/source/mat3.rst
+++ b/docs/source/mat3.rst
@@ -21,6 +21,7 @@ Functions:
 1. :c:func:`glm_mat3_copy`
 #. :c:func:`glm_mat3_identity`
 #. :c:func:`glm_mat3_identity_array`
 #. :c:func:`glm_mat3_zero`
 #. :c:func:`glm_mat3_mul`
 #. :c:func:`glm_mat3_transpose_to`
 #. :c:func:`glm_mat3_transpose`
@@ -29,8 +30,10 @@ Functions:
 #. :c:func:`glm_mat3_scale`
 #. :c:func:`glm_mat3_det`
 #. :c:func:`glm_mat3_inv`
 #. :c:func:`glm_mat3_trace`
 #. :c:func:`glm_mat3_swap_col`
 #. :c:func:`glm_mat3_swap_row`
 #. :c:func:`glm_mat3_rmc`
 Functions documentation
 ~~~~~~~~~~~~~~~~~~~~~~~
@@ -58,6 +61,13 @@ Functions documentation
      | *[in,out]* **mat**  matrix array (must be aligned (16/32) if alignment is not disabled)
      | *[in]* **count**  count of matrices
 .. c:function:: void  glm_mat3_zero(mat3 mat)
    make given matrix zero
    Parameters:
       | *[in,out]* **mat**  matrix to zero
 .. c:function:: void  glm_mat3_mul(mat3 m1, mat3 m2, mat3 dest)
    multiply m1 and m2 to dest
@@ -133,6 +143,16 @@ Functions documentation
      | *[in]*  **mat**  matrix
      | *[out]* **dest** destination (inverse matrix)
 .. c:function:: float glm_mat3_trace(mat3 m)
    | sum of the elements on the main diagonal from upper left to the lower right
    Parameters:
      | *[in]*  **m**  matrix
    Returns:
        trace of matrix
 .. c:function:: void  glm_mat3_swap_col(mat3 mat, int col1, int col2)
    swap two matrix columns
@@ -150,3 +170,20 @@ Functions documentation
      | *[in, out]*  **mat**   matrix
      | *[in]*       **row1**  row1
      | *[in]*       **row2**  row2
 .. c:function:: float  glm_mat3_rmc(vec3 r, mat3 m, vec3 c)
    | **rmc** stands for **Row** * **Matrix** * **Column**
    | helper for  R (row vector) * M (matrix) * C (column vector)
    | the result is scalar because R * M = Matrix1x3 (row vector),
    | then Matrix1x3 * Vec3 (column vector) = Matrix1x1 (Scalar)
    Parameters:
      | *[in]*  **r**  row vector or matrix1x3
      | *[in]*  **m**  matrix3x3
      | *[in]*  **c**  column vector or matrix3x1
    Returns:
        scalar value e.g. Matrix1x1
--- a/docs/source/mat4.rst
+++ b/docs/source/mat4.rst
@@ -26,6 +26,7 @@ Functions:
 #. :c:func:`glm_mat4_copy`
 #. :c:func:`glm_mat4_identity`
 #. :c:func:`glm_mat4_identity_array`
 #. :c:func:`glm_mat4_zero`
 #. :c:func:`glm_mat4_pick3`
 #. :c:func:`glm_mat4_pick3t`
 #. :c:func:`glm_mat4_ins3`
@@ -33,6 +34,8 @@ Functions:
 #. :c:func:`glm_mat4_mulN`
 #. :c:func:`glm_mat4_mulv`
 #. :c:func:`glm_mat4_mulv3`
 #. :c:func:`glm_mat4_trace`
 #. :c:func:`glm_mat4_trace3`
 #. :c:func:`glm_mat4_quat`
 #. :c:func:`glm_mat4_transpose_to`
 #. :c:func:`glm_mat4_transpose`
@@ -43,6 +46,7 @@ Functions:
 #. :c:func:`glm_mat4_inv_fast`
 #. :c:func:`glm_mat4_swap_col`
 #. :c:func:`glm_mat4_swap_row`
 #. :c:func:`glm_mat4_rmc`
 Functions documentation
 ~~~~~~~~~~~~~~~~~~~~~~~
@@ -78,6 +82,13 @@ Functions documentation
      | *[in,out]* **mat**  matrix array (must be aligned (16/32) if alignment is not disabled)
      | *[in]* **count**  count of matrices
 .. c:function:: void  glm_mat4_zero(mat4 mat)
    make given matrix zero
    Parameters:
       | *[in,out]* **mat**  matrix to zero
 .. c:function:: void  glm_mat4_pick3(mat4 mat, mat3 dest)
    copy upper-left of mat4 to mat3
@@ -156,6 +167,27 @@ Functions documentation
    | *[in]*  **v**     vec3 (right, column vector)
    | *[out]* **dest**  vec3 (result, column vector)
 .. c:function:: float  glm_mat4_trace(mat4 m)
    | sum of the elements on the main diagonal from upper left to the lower right
    Parameters:
      | *[in]*  **m**  matrix
    Returns:
        trace of matrix
 .. c:function:: float  glm_mat4_trace3(mat4 m)
    | trace of matrix (rotation part)
    | sum of the elements on the main diagonal from upper left to the lower right
    Parameters:
      | *[in]*  **m**  matrix
    Returns:
        trace of matrix
 .. c:function:: void  glm_mat4_quat(mat4 m, versor dest)
    convert mat4's rotation part to quaternion
@@ -247,3 +279,20 @@ Functions documentation
      | *[in, out]*  **mat**   matrix
      | *[in]*       **row1**  row1
      | *[in]*       **row2**  row2
 .. c:function:: float  glm_mat4_rmc(vec4 r, mat4 m, vec4 c)
    | **rmc** stands for **Row** * **Matrix** * **Column**
    | helper for  R (row vector) * M (matrix) * C (column vector)
    | the result is scalar because R * M = Matrix1x4 (row vector),
    | then Matrix1x4 * Vec4 (column vector) = Matrix1x1 (Scalar)
    Parameters:
      | *[in]*  **r**  row vector or matrix1x4
      | *[in]*  **m**  matrix4x4
      | *[in]*  **c**  column vector or matrix4x1
    Returns:
        scalar value e.g. Matrix1x1
--- a/docs/source/opengl.rst
+++ b/docs/source/opengl.rst
@@ -43,9 +43,9 @@ array of matrices:
   /* ... */
   glUniformMatrix4fv(location, count, GL_FALSE, (float *)matrix);
-in this way, passing aray of matrices is same 
+in this way, passing array of matrices is same
-Passing / Uniforming Vectors to OpenGL:¶
+Passing / Uniforming Vectors to OpenGL:
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 You don't need to do anything extra when passing cglm vectors to OpenGL or other APIs.
--- a/docs/source/opt.rst
+++ b/docs/source/opt.rst
@@ -40,3 +40,13 @@ SSE and SSE2 Shuffle Option
 **_mm_shuffle_ps** generates **shufps** instruction even if registers are same.
 You can force it to generate **pshufd** instruction by defining
 **CGLM_USE_INT_DOMAIN** macro. As default it is not defined.
 SSE3 and SSE4 Dot Product Options
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 You have two extra options for dot product: **CGLM_SSE4_DOT** and **CGLM_SSE3_DOT**.
 - If **SSE4** is enabled then you can define **CGLM_SSE4_DOT** to force cglm to use **_mm_dp_ps** instruction.
 - If **SSE3** is enabled then you can define **CGLM_SSE3_DOT** to force cglm to use **_mm_hadd_ps** instructions.
 Otherwise, cglm will use its own hadd functions, which are also optimized.
--- a/docs/source/vec3.rst
+++ b/docs/source/vec3.rst
@@ -39,7 +39,6 @@ Functions:
 #. :c:func:`glm_vec3_zero`
 #. :c:func:`glm_vec3_one`
 #. :c:func:`glm_vec3_dot`
 #. :c:func:`glm_vec3_cross`
 #. :c:func:`glm_vec3_norm2`
 #. :c:func:`glm_vec3_norm`
 #. :c:func:`glm_vec3_add`
@@ -65,6 +64,8 @@ Functions:
 #. :c:func:`glm_vec3_negate_to`
 #. :c:func:`glm_vec3_normalize`
 #. :c:func:`glm_vec3_normalize_to`
 #. :c:func:`glm_vec3_cross`
 #. :c:func:`glm_vec3_crossn`
 #. :c:func:`glm_vec3_distance2`
 #. :c:func:`glm_vec3_distance`
 #. :c:func:`glm_vec3_angle`
@@ -125,12 +126,21 @@ Functions documentation
 .. c:function:: void  glm_vec3_cross(vec3 a, vec3 b, vec3 d)
-    cross product
+    cross product of two vectors (RH)
    Parameters:
-      | *[in]*  **a**  source 1
+      | *[in]*  **a**     vector 1
-      | *[in]*  **b**  source 2
+      | *[in]*  **b**     vector 2
-      | *[out]* **d**  destination
+      | *[out]* **dest**  destination
 .. c:function:: void  glm_vec3_crossn(vec3 a, vec3 b, vec3 dest)
    cross product of two vectors (RH) and normalize the result
    Parameters:
      | *[in]*  **a**     vector 1
      | *[in]*  **b**     vector 2
      | *[out]* **dest**  destination
 .. c:function:: float  glm_vec3_norm2(vec3 v)
--- a/docs/source/vec4.rst
+++ b/docs/source/vec4.rst
@@ -58,11 +58,7 @@ Functions:
 #. :c:func:`glm_vec4_minv`
 #. :c:func:`glm_vec4_clamp`
 #. :c:func:`glm_vec4_lerp`
-#. :c:func:`glm_vec4_isnan`
+#. :c:func:`glm_vec4_cubic`
 #. :c:func:`glm_vec4_isinf`
 #. :c:func:`glm_vec4_isvalid`
 #. :c:func:`glm_vec4_sign`
 #. :c:func:`glm_vec4_sqrt`
 Functions documentation
 ~~~~~~~~~~~~~~~~~~~~~~~
@@ -401,3 +397,11 @@ Functions documentation
      | *[in]*  **to**     to value
      | *[in]*  **t**      interpolant (amount) clamped between 0 and 1
      | *[out]* **dest**   destination
 .. c:function:: void  glm_vec4_cubic(float s, vec4 dest)
    helper to fill vec4 as [S^3, S^2, S, 1]
    Parameters:
      | *[in]*  **s**      parameter
      | *[out]* **dest**   destination
--- a/include/cglm/affine-mat.h
+++ b/include/cglm/affine-mat.h
@@ -152,7 +152,7 @@ glm_inv_tr(mat4 mat) {
  glm_inv_tr_sse2(mat);
 #else
  CGLM_ALIGN_MAT mat3 r;
-  CGLM_ALIGN(16) vec3 t;
+  CGLM_ALIGN(8)  vec3 t;
  /* rotate */
  glm_mat4_pick3t(mat, r);
--- a/include/cglm/bezier.h
+++ b/include/cglm/bezier.h
@@ -0,0 +1,154 @@
 /*
 * Copyright (c), Recep Aslantas.
 *
 * MIT License (MIT), http://opensource.org/licenses/MIT
 * Full license can be found in the LICENSE file
 */
 #ifndef cglm_bezier_h
 #define cglm_bezier_h
 #include "common.h"
 #define GLM_BEZIER_MAT_INIT  {{-1.0f,  3.0f, -3.0f,  1.0f},                   \
                              { 3.0f, -6.0f,  3.0f,  0.0f},                   \
                              {-3.0f,  3.0f,  0.0f,  0.0f},                   \
                              { 1.0f,  0.0f,  0.0f,  0.0f}}
 #define GLM_HERMITE_MAT_INIT {{ 2.0f, -3.0f,  0.0f,  1.0f},                   \
                              {-2.0f,  3.0f,  0.0f,  0.0f},                   \
                              { 1.0f, -2.0f,  1.0f,  0.0f},                   \
                              { 1.0f, -1.0f,  0.0f,  0.0f}}
 /* for C only */
 #define GLM_BEZIER_MAT  ((mat4)GLM_BEZIER_MAT_INIT)
 #define GLM_HERMITE_MAT ((mat4)GLM_HERMITE_MAT_INIT)
 #define CGLM_DECASTEL_EPS   1e-9
 #define CGLM_DECASTEL_MAX   1000
 #define CGLM_DECASTEL_SMALL 1e-20
 /*!
 * @brief cubic bezier interpolation
 *
 * Formula:
 *  B(s) = P0*(1-s)^3 + 3*C0*s*(1-s)^2 + 3*C1*s^2*(1-s) + P1*s^3
 *
 * similar result using matrix:
 *  B(s) = glm_smc(s, GLM_BEZIER_MAT, (vec4){p0, c0, c1, p1})
 *
 * glm_eq(glm_smc(...), glm_bezier(...)) should return TRUE
 *
 * @param[in]  s    parameter between 0 and 1
 * @param[in]  p0   begin point
 * @param[in]  c0   control point 1
 * @param[in]  c1   control point 2
 * @param[in]  p1   end point
 *
 * @return B(s)
 */
 CGLM_INLINE
 float
 glm_bezier(float s, float p0, float c0, float c1, float p1) {
  float x, xx, ss, xs3, a;
  x   = 1.0f - s;
  xx  = x * x;
  ss  = s * s;
  xs3 = (s - ss) * 3.0f;
  a   = p0 * xx + c0 * xs3;
  return a + s * (c1 * xs3 + p1 * ss - a);
 }
 /*!
 * @brief cubic hermite interpolation
 *
 * Formula:
 *  H(s) = P0*(2*s^3 - 3*s^2 + 1) + T0*(s^3 - 2*s^2 + s)
 *            + P1*(-2*s^3 + 3*s^2) + T1*(s^3 - s^2)
 *
 * similar result using matrix:
 *  H(s) = glm_smc(s, GLM_HERMITE_MAT, (vec4){p0, p1, t0, t1})
 *
 * glm_eq(glm_smc(...), glm_hermite(...)) should return TRUE
 *
 * @param[in]  s    parameter between 0 and 1
 * @param[in]  p0   begin point
 * @param[in]  t0   tangent 1
 * @param[in]  t1   tangent 2
 * @param[in]  p1   end point
 *
 * @return H(s)
 */
 CGLM_INLINE
 float
 glm_hermite(float s, float p0, float t0, float t1, float p1) {
  float ss, d, a, b, c, e, f;
  ss = s  * s;
  a  = ss + ss;
  c  = a  + ss;
  b  = a  * s;
  d  = s  * ss;
  f  = d  - ss;
  e  = b  - c;
  return p0 * (e + 1.0f) + t0 * (f - ss + s) + t1 * f - p1 * e;
 }
 /*!
 * @brief iterative way to solve cubic equation
 *
 * @param[in]  prm  parameter between 0 and 1
 * @param[in]  p0   begin point
 * @param[in]  c0   control point 1
 * @param[in]  c1   control point 2
 * @param[in]  p1   end point
 *
 * @return parameter to use in cubic equation
 */
 CGLM_INLINE
 float
 glm_decasteljau(float prm, float p0, float c0, float c1, float p1) {
  float u, v, a, b, c, d, e, f;
  int   i;
  if (prm - p0 < CGLM_DECASTEL_SMALL)
    return 0.0f;
  if (p1 - prm < CGLM_DECASTEL_SMALL)
    return 1.0f;
  u  = 0.0f;
  v  = 1.0f;
  for (i = 0; i < CGLM_DECASTEL_MAX; i++) {
    /* de Casteljau Subdivision */
    a  = (p0 + c0) * 0.5f;
    b  = (c0 + c1) * 0.5f;
    c  = (c1 + p1) * 0.5f;
    d  = (a  + b)  * 0.5f;
    e  = (b  + c)  * 0.5f;
    f  = (d  + e)  * 0.5f; /* this one is on the curve! */
    /* The curve point is close enough to our wanted t */
    if (fabsf(f - prm) < CGLM_DECASTEL_EPS)
      return glm_clamp_zo((u  + v) * 0.5f);
    /* dichotomy */
    if (f < prm) {
      p0 = f;
      c0 = e;
      c1 = c;
      u  = (u  + v) * 0.5f;
    } else {
      c0 = a;
      c1 = d;
      p1 = f;
      v  = (u  + v) * 0.5f;
    }
  }
  return glm_clamp_zo((u  + v) * 0.5f);
 }
 #endif /* cglm_bezier_h */
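The subdivision loop in `glm_decasteljau` can be read as a bisection: each step splits the cubic at its midpoint using de Casteljau averages, then keeps the half whose value range contains the target. The following standalone sketch illustrates that idea under a few assumptions: the control values are monotonically increasing, the tolerance `DECASTEL_EPS` is set at float scale (1e-6), and the iteration cap is 64. Those constants and the names `decasteljau_param` and `decasteljau_self_check` are hypothetical and differ from the header's own constants.

```c
#include <assert.h>

#define DECASTEL_EPS 1e-6f   /* assumed float-scale tolerance for this sketch */
#define DECASTEL_MAX 64      /* assumed iteration cap for this sketch */

/* Find t in [0, 1] such that the cubic Bezier through (p0, c0, c1, p1)
 * evaluates to `target`; assumes the control values are increasing. */
static float decasteljau_param(float target,
                               float p0, float c0, float c1, float p1) {
  float u = 0.0f, v = 1.0f;
  float a, b, c, d, e, f, diff;
  int   i;

  if (target <= p0) return 0.0f;
  if (target >= p1) return 1.0f;

  for (i = 0; i < DECASTEL_MAX; i++) {
    /* de Casteljau subdivision at the midpoint of the current piece */
    a = (p0 + c0) * 0.5f;
    b = (c0 + c1) * 0.5f;
    c = (c1 + p1) * 0.5f;
    d = (a  + b)  * 0.5f;
    e = (b  + c)  * 0.5f;
    f = (d  + e)  * 0.5f;            /* point on the curve */

    diff = f - target;
    if (diff < 0.0f) diff = -diff;
    if (diff < DECASTEL_EPS) break;

    if (f < target) {                /* keep right half: f, e, c, p1 */
      p0 = f; c0 = e; c1 = c;
      u  = (u + v) * 0.5f;
    } else {                         /* keep left half: p0, a, d, f */
      c0 = a; c1 = d; p1 = f;
      v  = (u + v) * 0.5f;
    }
  }
  return (u + v) * 0.5f;
}

/* Evenly spaced control values give B(t) = t, so the parameter equals
 * the target. */
static void decasteljau_self_check(void) {
  float t = decasteljau_param(0.25f, 0.0f, 1.0f / 3.0f, 2.0f / 3.0f, 1.0f);
  assert(t > 0.249f && t < 0.251f);
}
```

A typical use would be animation curves, where the target is a time value along the x axis and the returned parameter is then fed into the cubic for the y axis.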
--- a/include/cglm/call.h
+++ b/include/cglm/call.h
@@ -27,6 +27,8 @@ extern "C" {
 #include "call/project.h"
 #include "call/sphere.h"
 #include "call/ease.h"
 #include "call/curve.h"
 #include "call/bezier.h"
 #ifdef __cplusplus
 }
--- a/include/cglm/call/bezier.h
+++ b/include/cglm/call/bezier.h
@@ -0,0 +1,31 @@
 /*
 * Copyright (c), Recep Aslantas.
 *
 * MIT License (MIT), http://opensource.org/licenses/MIT
 * Full license can be found in the LICENSE file
 */
 #ifndef cglmc_bezier_h
 #define cglmc_bezier_h
 #ifdef __cplusplus
 extern "C" {
 #endif
 #include "../cglm.h"
 CGLM_EXPORT
 float
 glmc_bezier(float s, float p0, float c0, float c1, float p1);
 CGLM_EXPORT
 float
 glmc_hermite(float s, float p0, float t0, float t1, float p1);
 CGLM_EXPORT
 float
 glmc_decasteljau(float prm, float p0, float c0, float c1, float p1);
 #ifdef __cplusplus
 }
 #endif
 #endif /* cglmc_bezier_h */
--- a/include/cglm/call/cam.h
+++ b/include/cglm/call/cam.h
@@ -61,6 +61,10 @@ glmc_perspective(float fovy,
                 float farVal,
                 mat4 dest);
 CGLM_EXPORT
 void
 glmc_persp_move_far(mat4 proj, float deltaFar);
 CGLM_EXPORT
 void
 glmc_perspective_default(float aspect, mat4 dest);
--- a/include/cglm/call/curve.h
+++ b/include/cglm/call/curve.h
@@ -0,0 +1,23 @@
 /*
 * Copyright (c), Recep Aslantas.
 *
 * MIT License (MIT), http://opensource.org/licenses/MIT
 * Full license can be found in the LICENSE file
 */
 #ifndef cglmc_curve_h
 #define cglmc_curve_h
 #ifdef __cplusplus
 extern "C" {
 #endif
 #include "../cglm.h"
 CGLM_EXPORT
 float
 glmc_smc(float s, mat4 m, vec4 c);
 #ifdef __cplusplus
 }
 #endif
 #endif /* cglmc_curve_h */
--- a/include/cglm/call/ease.h
+++ b/include/cglm/call/ease.h
@@ -137,4 +137,7 @@ CGLM_EXPORT
 float
 glmc_ease_bounce_inout(float t);
 #ifdef __cplusplus
 }
 #endif
 #endif /* cglmc_ease_h */
--- a/include/cglm/call/mat3.h
+++ b/include/cglm/call/mat3.h
@@ -44,6 +44,10 @@ CGLM_EXPORT
 void
 glmc_mat3_mulv(mat3 m, vec3 v, vec3 dest);
 CGLM_EXPORT
 float
 glmc_mat3_trace(mat3 m);
 CGLM_EXPORT
 void
 glmc_mat3_quat(mat3 m, versor dest);
@@ -68,6 +72,10 @@ CGLM_EXPORT
 void
 glmc_mat3_swap_row(mat3 mat, int row1, int row2);
 CGLM_EXPORT
 float
 glmc_mat3_rmc(vec3 r, mat3 m, vec3 c);
 #ifdef __cplusplus
 }
 #endif
--- a/include/cglm/call/mat4.h
+++ b/include/cglm/call/mat4.h
@@ -61,6 +61,14 @@ CGLM_EXPORT
 void
 glmc_mat4_mulv3(mat4 m, vec3 v, float last, vec3 dest);
 CGLM_EXPORT
 float
 glmc_mat4_trace(mat4 m);
 CGLM_EXPORT
 float
 glmc_mat4_trace3(mat4 m);
 CGLM_EXPORT
 void
 glmc_mat4_quat(mat4 m, versor dest);
@@ -105,6 +113,10 @@ CGLM_EXPORT
 void
 glmc_mat4_swap_row(mat4 mat, int row1, int row2);
 CGLM_EXPORT
 float
 glmc_mat4_rmc(vec4 r, mat4 m, vec4 c);
 #ifdef __cplusplus
 }
 #endif
--- a/include/cglm/call/sphere.h
+++ b/include/cglm/call/sphere.h
@@ -33,4 +33,7 @@ CGLM_EXPORT
 bool
 glmc_sphere_point(vec4 s, vec3 point);
 #ifdef __cplusplus
 }
 #endif
 #endif /* cglmc_sphere_h */
--- a/include/cglm/call/vec3.h
+++ b/include/cglm/call/vec3.h
@@ -42,7 +42,11 @@ glmc_vec3_dot(vec3 a, vec3 b);
 CGLM_EXPORT
 void
-glmc_vec3_cross(vec3 a, vec3 b, vec3 d);
+glmc_vec3_cross(vec3 a, vec3 b, vec3 dest);
 CGLM_EXPORT
 void
 glmc_vec3_crossn(vec3 a, vec3 b, vec3 dest);
 CGLM_EXPORT
 float
--- a/include/cglm/call/vec4.h
+++ b/include/cglm/call/vec4.h
@@ -153,6 +153,10 @@ CGLM_EXPORT
 void
 glmc_vec4_lerp(vec4 from, vec4 to, float t, vec4 dest);
 CGLM_EXPORT
 void
 glmc_vec4_cubic(float s, vec4 dest);
 /* ext */
 CGLM_EXPORT
--- a/include/cglm/cam.h
+++ b/include/cglm/cam.h
@@ -84,7 +84,7 @@ glm_frustum(float left,
            mat4  dest) {
  float rl, tb, fn, nv;
-  glm__memzero(float, dest, sizeof(mat4));
+  glm_mat4_zero(dest);
  rl = 1.0f / (right  - left);
  tb = 1.0f / (top    - bottom);
@@ -122,7 +122,7 @@ glm_ortho(float left,
          mat4  dest) {
  float rl, tb, fn;
-  glm__memzero(float, dest, sizeof(mat4));
+  glm_mat4_zero(dest);
  rl = 1.0f / (right  - left);
  tb = 1.0f / (top    - bottom);
@@ -259,7 +259,7 @@ glm_perspective(float fovy,
                mat4  dest) {
  float f, fn;
-  glm__memzero(float, dest, sizeof(mat4));
+  glm_mat4_zero(dest);
  f  = 1.0f / tanf(fovy * 0.5f);
  fn = 1.0f / (nearVal - farVal);
@@ -271,6 +271,30 @@ glm_perspective(float fovy,
  dest[3][2] = 2.0f * nearVal * farVal * fn;
 }
 /*!
 * @brief extend perspective projection matrix's far distance
 *
 * this function does not guarantee far >= near, be aware of that!
 *
 * @param[in, out] proj      projection matrix to extend
 * @param[in]      deltaFar  distance from existing far (negative to shrink)
 */
 CGLM_INLINE
 void
 glm_persp_move_far(mat4 proj, float deltaFar) {
  float fn, farVal, nearVal, p22, p32;
  p22        = proj[2][2];
  p32        = proj[3][2];
  nearVal    = p32 / (p22 - 1.0f);
  farVal     = p32 / (p22 + 1.0f) + deltaFar;
  fn         = 1.0f / (nearVal - farVal);
  proj[2][2] = (nearVal + farVal) * fn;
  proj[3][2] = 2.0f * nearVal * farVal * fn;
 }
 /*!
 * @brief set up perspective projection matrix with default near/far
 *        and angle values
@@ -323,9 +347,7 @@ glm_lookat(vec3 eye,
  glm_vec3_sub(center, eye, f);
  glm_vec3_normalize(f);
-  glm_vec3_cross(f, up, s);
-  glm_vec3_normalize(s);
+  glm_vec3_crossn(f, up, s);
  glm_vec3_cross(s, f, u);
  dest[0][0] = s[0];
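The near/far recovery used by glm_persp_move_far in the hunk above can be checked in isolation. What follows is a minimal standalone sketch, not cglm code: `depth_range`, `proj_depth_terms` and `depth_range_from_proj` are hypothetical names, and it assumes the same two depth terms that glm_persp_move_far writes back to proj[2][2] and proj[3][2].

```c
#include <assert.h>

/* Hypothetical helper type, not part of cglm. */
typedef struct {
  float near_val;
  float far_val;
} depth_range;

/* Build the two depth terms the same way the hunk above does:
   p22 = (near + far) / (near - far), p32 = 2 * near * far / (near - far). */
static void
proj_depth_terms(float near_val, float far_val, float *p22, float *p32) {
  float fn;

  assert(near_val != far_val); /* otherwise 1 / (near - far) is undefined */

  fn   = 1.0f / (near_val - far_val);
  *p22 = (near_val + far_val) * fn;
  *p32 = 2.0f * near_val * far_val * fn;
}

/* Invert the terms: p22 - 1 = 2 * far / (near - far) and
   p22 + 1 = 2 * near / (near - far), so dividing p32 by each
   yields near and far respectively. */
static depth_range
depth_range_from_proj(float p22, float p32) {
  depth_range r;

  r.near_val = p32 / (p22 - 1.0f);
  r.far_val  = p32 / (p22 + 1.0f);
  return r;
}
```

Note that when far is much larger than near, p22 is close to -1 and `p22 + 1.0f` loses precision in single-precision floats, so the recovered far value is only approximate.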
--- a/include/cglm/cglm.h
+++ b/include/cglm/cglm.h
@@ -26,5 +26,7 @@
 #include "project.h"
 #include "sphere.h"
 #include "ease.h"
 #include "curve.h"
 #include "bezier.h"
 #endif /* cglm_h */
--- a/include/cglm/common.h
+++ b/include/cglm/common.h
@@ -11,8 +11,10 @@
 #define _USE_MATH_DEFINES /* for windows */
 #include <stdint.h>
 #include <stddef.h>
 #include <math.h>
 #include <float.h>
 #include <stdbool.h>
 #if defined(_MSC_VER)
 #  ifdef CGLM_DLL
@@ -26,34 +28,6 @@
 #  define CGLM_INLINE static inline __attribute((always_inline))
 #endif
 #define glm__memcpy(type, dest, src, size)                                    \
  do {                                                                        \
    type *srci;                                                               \
    type *srci_end;                                                           \
    type *desti;                                                              \
                                                                              \
    srci     = (type *)src;                                                   \
    srci_end = (type *)((char *)srci + size);                                 \
    desti    = (type *)dest;                                                  \
                                                                              \
    while (srci != srci_end)                                                  \
      *desti++ = *srci++;                                                     \
  } while (0)
 #define glm__memset(type, dest, size, val)                                    \
  do {                                                                        \
    type *desti;                                                              \
    type *desti_end;                                                          \
                                                                              \
    desti     = (type *)dest;                                                 \
    desti_end = (type *)((char *)desti + size);                               \
                                                                              \
    while (desti != desti_end)                                                \
      *desti++ = val;                                                         \
  } while (0)
 #define glm__memzero(type, dest, size) glm__memset(type, dest, size, 0)
 #include "types.h"
 #include "simd/intrin.h"
--- a/include/cglm/curve.h
+++ b/include/cglm/curve.h
@@ -0,0 +1,40 @@
 /*
 * Copyright (c), Recep Aslantas.
 *
 * MIT License (MIT), http://opensource.org/licenses/MIT
 * Full license can be found in the LICENSE file
 */
 #ifndef cglm_curve_h
 #define cglm_curve_h
 #include "common.h"
 #include "vec4.h"
 #include "mat4.h"
 /*!
 * @brief helper function to calculate S*M*C multiplication for curves
 *
 * This function is not a recommendation to use SMC;
 * it is only a helper for code that already uses SMC.
 *
 * If you want to specify S as a vector, use the more generic glm_mat4_rmc() function.
 *
 * Example usage:
 *  B(s) = glm_smc(s, GLM_BEZIER_MAT, (vec4){p0, c0, c1, p1})
 *
 * @param[in]  s  parameter between 0 and 1 (this will be [s^3, s^2, s, 1])
 * @param[in]  m  basis matrix
 * @param[in]  c  position/control vector
 *
 * @return B(s)
 */
 CGLM_INLINE
 float
 glm_smc(float s, mat4 m, vec4 c) {
  vec4 vs;
  glm_vec4_cubic(s, vs);
  return glm_mat4_rmc(vs, m, c);
 }
 #endif /* cglm_curve_h */
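As a rough illustration of what the S*M*C product computes, here is a standalone sketch that does not use cglm types. The names `smc_eval` and `bezier_basis` are hypothetical, the matrix is stored row-major for readability (cglm's mat4 layout and its GLM_BEZIER_MAT constant may differ), and the basis shown is the standard cubic Bezier (Bernstein) basis.

```c
#include <assert.h>

/* Standard cubic Bezier basis, rows ordered to match S = [s^3, s^2, s, 1]
   and columns ordered to match C = [p0, c0, c1, p1].
   Row-major here; this layout is an assumption of the sketch. */
static const float bezier_basis[4][4] = {
  { -1.0f,  3.0f, -3.0f, 1.0f },
  {  3.0f, -6.0f,  3.0f, 0.0f },
  { -3.0f,  3.0f,  0.0f, 0.0f },
  {  1.0f,  0.0f,  0.0f, 0.0f }
};

/* Evaluate S * M * C for one scalar component of the curve. */
static float
smc_eval(float s, const float m[4][4], const float c[4]) {
  float row[4];
  float sum;
  int   i, j;

  assert(s >= 0.0f && s <= 1.0f); /* parameter is expected in [0, 1] */

  row[0] = s * s * s;
  row[1] = s * s;
  row[2] = s;
  row[3] = 1.0f;

  sum = 0.0f;
  for (i = 0; i < 4; i++) {
    float mc = 0.0f; /* i-th entry of M * C */
    for (j = 0; j < 4; j++)
      mc += m[i][j] * c[j];
    sum += row[i] * mc;
  }

  return sum;
}
```

For a curve with vector-valued points, the same evaluation would be repeated once per component (x, y, z), each time with that component's four values in `c`.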
--- a/include/cglm/mat3.h
+++ b/include/cglm/mat3.h
@@ -17,15 +17,19 @@
   CGLM_INLINE void  glm_mat3_copy(mat3 mat, mat3 dest);
   CGLM_INLINE void  glm_mat3_identity(mat3 mat);
   CGLM_INLINE void  glm_mat3_identity_array(mat3 * restrict mat, size_t count);
   CGLM_INLINE void  glm_mat3_zero(mat3 mat);
   CGLM_INLINE void  glm_mat3_mul(mat3 m1, mat3 m2, mat3 dest);
   CGLM_INLINE void  glm_mat3_transpose_to(mat3 m, mat3 dest);
   CGLM_INLINE void  glm_mat3_transpose(mat3 m);
   CGLM_INLINE void  glm_mat3_mulv(mat3 m, vec3 v, vec3 dest);
   CGLM_INLINE float glm_mat3_trace(mat3 m);
   CGLM_INLINE void  glm_mat3_quat(mat3 m, versor dest);
   CGLM_INLINE void  glm_mat3_scale(mat3 m, float s);
   CGLM_INLINE float glm_mat3_det(mat3 mat);
   CGLM_INLINE void  glm_mat3_inv(mat3 mat, mat3 dest);
   CGLM_INLINE void  glm_mat3_swap_col(mat3 mat, int col1, int col2);
   CGLM_INLINE void  glm_mat3_swap_row(mat3 mat, int row1, int row2);
   CGLM_INLINE float glm_mat3_rmc(vec3 r, mat3 m, vec3 c);
 */
 #ifndef cglm_mat3_h
@@ -62,7 +66,17 @@
 CGLM_INLINE
 void
 glm_mat3_copy(mat3 mat, mat3 dest) {
-  glm__memcpy(float, dest, mat, sizeof(mat3));
+  dest[0][0] = mat[0][0];
  dest[0][1] = mat[0][1];
  dest[0][2] = mat[0][2];
  dest[1][0] = mat[1][0];
  dest[1][1] = mat[1][1];
  dest[1][2] = mat[1][2];
  dest[2][0] = mat[2][0];
  dest[2][1] = mat[2][1];
  dest[2][2] = mat[2][2];
 }
 /*!
@@ -105,6 +119,18 @@ glm_mat3_identity_array(mat3 * __restrict mat, size_t count) {
  }
 }
 /*!
 * @brief make given matrix zero.
 *
 * @param[in, out]  mat  matrix
 */
 CGLM_INLINE
 void
 glm_mat3_zero(mat3 mat) {
  CGLM_ALIGN_MAT mat3 t = GLM_MAT3_ZERO_INIT;
  glm_mat3_copy(t, mat);
 }
 /*!
 * @brief multiply m1 and m2 to dest
 *
@@ -207,6 +233,18 @@ glm_mat3_mulv(mat3 m, vec3 v, vec3 dest) {
  dest[2] = m[0][2] * v[0] + m[1][2] * v[1] + m[2][2] * v[2];
 }
 /*!
 * @brief trace of matrix
 *
 * sum of the elements on the main diagonal from upper left to the lower right
 *
 * @param[in]  m matrix
 */
 CGLM_INLINE
 float
 glm_mat3_trace(mat3 m) {
  return m[0][0] + m[1][1] + m[2][2];
 }
 /*!
 * @brief convert mat3 to quaternion
@@ -359,4 +397,26 @@ glm_mat3_swap_row(mat3 mat, int row1, int row2) {
  mat[2][row2] = tmp[2];
 }
 /*!
 * @brief helper for  R (row vector) * M (matrix) * C (column vector)
 *
 * rmc stands for Row * Matrix * Column
 *
 * the result is scalar because R * M = Matrix1x3 (row vector),
 * then Matrix1x3 * Vec3 (column vector) = Matrix1x1 (Scalar)
 *
 * @param[in]  r   row vector or matrix1x3
 * @param[in]  m   matrix3x3
 * @param[in]  c   column vector or matrix3x1
 *
 * @return scalar value e.g. Matrix1x1
 */
 CGLM_INLINE
 float
 glm_mat3_rmc(vec3 r, mat3 m, vec3 c) {
  vec3 tmp;
  glm_mat3_mulv(m, c, tmp);
  return glm_vec3_dot(r, tmp);
 }
 #endif /* cglm_mat3_h */
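The row * matrix * column helper and the trace added above are small enough to restate without cglm. The sketch below is standalone and uses hypothetical names (`v3`, `m3`, `m3_rmc`, `m3_trace`); it assumes the m[column][row] indexing visible in glm_mat3_mulv above.

```c
#include <assert.h>

typedef float v3[3];
typedef v3    m3[3]; /* indexed m[column][row], an assumption of this sketch */

/* r (as a row vector) * m * c (as a column vector): first M * c, then dot with r. */
static float
m3_rmc(v3 r, m3 m, v3 c) {
  float tmp[3];
  int   row;

  assert(r && m && c);

  for (row = 0; row < 3; row++)
    tmp[row] = m[0][row] * c[0] + m[1][row] * c[1] + m[2][row] * c[2];

  return r[0] * tmp[0] + r[1] * tmp[1] + r[2] * tmp[2];
}

/* Sum of the main diagonal. */
static float
m3_trace(m3 m) {
  assert(m);
  return m[0][0] + m[1][1] + m[2][2];
}
```

With unit vectors for r and c the product picks out a single element: r = e_i and c = e_j yields m[j][i], which is a quick way to sanity-check the indexing.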
--- a/include/cglm/mat4.h
+++ b/include/cglm/mat4.h
@@ -22,6 +22,7 @@
   CGLM_INLINE void  glm_mat4_copy(mat4 mat, mat4 dest);
   CGLM_INLINE void  glm_mat4_identity(mat4 mat);
   CGLM_INLINE void  glm_mat4_identity_array(mat4 * restrict mat, size_t count);
   CGLM_INLINE void  glm_mat4_zero(mat4 mat);
   CGLM_INLINE void  glm_mat4_pick3(mat4 mat, mat3 dest);
   CGLM_INLINE void  glm_mat4_pick3t(mat4 mat, mat3 dest);
   CGLM_INLINE void  glm_mat4_ins3(mat3 mat, mat4 dest);
@@ -29,6 +30,9 @@
   CGLM_INLINE void  glm_mat4_mulN(mat4 *matrices[], int len, mat4 dest);
   CGLM_INLINE void  glm_mat4_mulv(mat4 m, vec4 v, vec4 dest);
   CGLM_INLINE void  glm_mat4_mulv3(mat4 m, vec3 v, float last, vec3 dest);
   CGLM_INLINE float glm_mat4_trace(mat4 m);
   CGLM_INLINE float glm_mat4_trace3(mat4 m);
   CGLM_INLINE void  glm_mat4_quat(mat4 m, versor dest) ;
   CGLM_INLINE void  glm_mat4_transpose_to(mat4 m, mat4 dest);
   CGLM_INLINE void  glm_mat4_transpose(mat4 m);
   CGLM_INLINE void  glm_mat4_scale_p(mat4 m, float s);
@@ -38,6 +42,7 @@
   CGLM_INLINE void  glm_mat4_inv_fast(mat4 mat, mat4 dest);
   CGLM_INLINE void  glm_mat4_swap_col(mat4 mat, int col1, int col2);
   CGLM_INLINE void  glm_mat4_swap_row(mat4 mat, int row1, int row2);
   CGLM_INLINE float glm_mat4_rmc(vec4 r, mat4 m, vec4 c);
 */
 #ifndef cglm_mat_h
@@ -96,7 +101,15 @@
 CGLM_INLINE
 void
 glm_mat4_ucopy(mat4 mat, mat4 dest) {
-  glm__memcpy(float, dest, mat, sizeof(mat4));
+  dest[0][0] = mat[0][0];  dest[1][0] = mat[1][0];
  dest[0][1] = mat[0][1];  dest[1][1] = mat[1][1];
  dest[0][2] = mat[0][2];  dest[1][2] = mat[1][2];
  dest[0][3] = mat[0][3];  dest[1][3] = mat[1][3];
  dest[2][0] = mat[2][0];  dest[3][0] = mat[3][0];
  dest[2][1] = mat[2][1];  dest[3][1] = mat[3][1];
  dest[2][2] = mat[2][2];  dest[3][2] = mat[3][2];
  dest[2][3] = mat[2][3];  dest[3][3] = mat[3][3];
 }
 /*!
@@ -116,6 +129,11 @@ glm_mat4_copy(mat4 mat, mat4 dest) {
  glmm_store(dest[1], glmm_load(mat[1]));
  glmm_store(dest[2], glmm_load(mat[2]));
  glmm_store(dest[3], glmm_load(mat[3]));
 #elif defined(CGLM_NEON_FP)
  vst1q_f32(dest[0], vld1q_f32(mat[0]));
  vst1q_f32(dest[1], vld1q_f32(mat[1]));
  vst1q_f32(dest[2], vld1q_f32(mat[2]));
  vst1q_f32(dest[3], vld1q_f32(mat[3]));
 #else
  glm_mat4_ucopy(mat, dest);
 #endif
@@ -161,6 +179,18 @@ glm_mat4_identity_array(mat4 * __restrict mat, size_t count) {
  }
 }
 /*!
 * @brief make given matrix zero.
 *
 * @param[in, out]  mat  matrix
 */
 CGLM_INLINE
 void
 glm_mat4_zero(mat4 mat) {
  CGLM_ALIGN_MAT mat4 t = GLM_MAT4_ZERO_INIT;
  glm_mat4_copy(t, mat);
 }
 /*!
 * @brief copy upper-left of mat4 to mat3
 *
@@ -250,7 +280,7 @@ glm_mat4_mul(mat4 m1, mat4 m2, mat4 dest) {
  glm_mat4_mul_avx(m1, m2, dest);
 #elif defined( __SSE__ ) || defined( __SSE2__ )
  glm_mat4_mul_sse2(m1, m2, dest);
-#elif defined( __ARM_NEON_FP )
+#elif defined(CGLM_NEON_FP)
  glm_mat4_mul_neon(m1, m2, dest);
 #else
  float a00 = m1[0][0], a01 = m1[0][1], a02 = m1[0][2], a03 = m1[0][3],
@@ -338,6 +368,32 @@ glm_mat4_mulv(mat4 m, vec4 v, vec4 dest) {
 #endif
 }
 /*!
 * @brief trace of matrix
 *
 * sum of the elements on the main diagonal from upper left to the lower right
 *
 * @param[in]  m matrix
 */
 CGLM_INLINE
 float
 glm_mat4_trace(mat4 m) {
  return m[0][0] + m[1][1] + m[2][2] + m[3][3];
 }
 /*!
 * @brief trace of matrix (rotation part)
 *
 * sum of the elements on the main diagonal from upper left to the lower right
 *
 * @param[in]  m matrix
 */
 CGLM_INLINE
 float
 glm_mat4_trace3(mat4 m) {
  return m[0][0] + m[1][1] + m[2][2];
 }
 /*!
 * @brief convert mat4's rotation part to quaternion
 *
@@ -441,10 +497,8 @@ glm_mat4_transpose(mat4 m) {
  glm_mat4_transp_sse2(m, m);
 #else
  mat4 d;
  glm_mat4_transpose_to(m, d);
-
-  glm__memcpy(float, m, d, sizeof(mat4));
+  glm_mat4_ucopy(d, m);
 #endif
 }
@@ -478,6 +532,13 @@ void
 glm_mat4_scale(mat4 m, float s) {
 #if defined( __SSE__ ) || defined( __SSE2__ )
  glm_mat4_scale_sse2(m, s);
 #elif defined(CGLM_NEON_FP)
  float32x4_t v0;
  v0 = vdupq_n_f32(s);
  vst1q_f32(m[0], vmulq_f32(vld1q_f32(m[0]), v0));
  vst1q_f32(m[1], vmulq_f32(vld1q_f32(m[1]), v0));
  vst1q_f32(m[2], vmulq_f32(vld1q_f32(m[2]), v0));
  vst1q_f32(m[3], vmulq_f32(vld1q_f32(m[3]), v0));
 #else
  glm_mat4_scale_p(m, s);
 #endif
@@ -637,4 +698,26 @@ glm_mat4_swap_row(mat4 mat, int row1, int row2) {
  mat[3][row2] = tmp[3];
 }
 /*!
 * @brief helper for  R (row vector) * M (matrix) * C (column vector)
 *
 * rmc stands for Row * Matrix * Column
 *
 * the result is scalar because R * M = Matrix1x4 (row vector),
 * then Matrix1x4 * Vec4 (column vector) = Matrix1x1 (Scalar)
 *
 * @param[in]  r   row vector or matrix1x4
 * @param[in]  m   matrix4x4
 * @param[in]  c   column vector or matrix4x1
 *
 * @return scalar value e.g. B(s)
 */
 CGLM_INLINE
 float
 glm_mat4_rmc(vec4 r, mat4 m, vec4 c) {
  vec4 tmp;
  glm_mat4_mulv(m, c, tmp);
  return glm_vec4_dot(r, tmp);
 }
 #endif /* cglm_mat_h */
--- a/include/cglm/project.h
+++ b/include/cglm/project.h
@@ -8,6 +8,7 @@
 #ifndef cglm_project_h
 #define cglm_project_h
 #include "common.h"
 #include "vec3.h"
 #include "vec4.h"
 #include "mat4.h"
--- a/include/cglm/quat.h
+++ b/include/cglm/quat.h
@@ -218,7 +218,7 @@ glm_quat_normalize_to(versor q, versor dest) {
  float  dot;
  x0   = glmm_load(q);
-  xdot = glmm_dot(x0, x0);
+  xdot = glmm_vdot(x0, x0);
  dot  = _mm_cvtss_f32(xdot);
  if (dot <= 0.0f) {
--- a/include/cglm/simd/arm.h
+++ b/include/cglm/simd/arm.h
@@ -0,0 +1,41 @@
 /*
 * Copyright (c), Recep Aslantas.
 *
 * MIT License (MIT), http://opensource.org/licenses/MIT
 * Full license can be found in the LICENSE file
 */
 #ifndef cglm_simd_arm_h
 #define cglm_simd_arm_h
 #include "intrin.h"
 #ifdef CGLM_SIMD_ARM
 #define glmm_load(p)      vld1q_f32(p)
 #define glmm_store(p, a)  vst1q_f32(p, a)
 static inline
 float
 glmm_hadd(float32x4_t v) {
 #if defined(__aarch64__)
  return vaddvq_f32(v);
 #else
  v = vaddq_f32(v, vrev64q_f32(v));
  v = vaddq_f32(v, vcombine_f32(vget_high_f32(v), vget_low_f32(v)));
  return vgetq_lane_f32(v, 0);
 #endif
 }
 static inline
 float
 glmm_dot(float32x4_t a, float32x4_t b) {
  return glmm_hadd(vmulq_f32(a, b));
 }
 static inline
 float
 glmm_norm(float32x4_t a) {
  return sqrtf(glmm_dot(a, a));
 }
 #endif
 #endif /* cglm_simd_arm_h */
--- a/include/cglm/simd/intrin.h
+++ b/include/cglm/simd/intrin.h
@@ -27,90 +27,64 @@
 #if defined( __SSE__ ) || defined( __SSE2__ )
 #  include <xmmintrin.h>
 #  include <emmintrin.h>
 /* OPTIONAL: You may save some instructions but latency (not sure) */
 #ifdef CGLM_USE_INT_DOMAIN
 #  define glmm_shuff1(xmm, z, y, x, w)                                        \
     _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(xmm),                \
                                        _MM_SHUFFLE(z, y, x, w)))
 #else
 #  define glmm_shuff1(xmm, z, y, x, w)                                        \
     _mm_shuffle_ps(xmm, xmm, _MM_SHUFFLE(z, y, x, w))
 #endif
 #define glmm_shuff1x(xmm, x) glmm_shuff1(xmm, x, x, x, x)
 #define glmm_shuff2(a, b, z0, y0, x0, w0, z1, y1, x1, w1)                     \
     glmm_shuff1(_mm_shuffle_ps(a, b, _MM_SHUFFLE(z0, y0, x0, w0)),           \
                 z1, y1, x1, w1)
 static inline
 __m128
 glmm_dot(__m128 a, __m128 b) {
  __m128 x0;
  x0 = _mm_mul_ps(a, b);
  x0 = _mm_add_ps(x0, glmm_shuff1(x0, 1, 0, 3, 2));
  return _mm_add_ps(x0, glmm_shuff1(x0, 0, 1, 0, 1));
 }
 static inline
 __m128
 glmm_norm(__m128 a) {
  return _mm_sqrt_ps(glmm_dot(a, a));
 }
 static inline
 __m128
 glmm_load3(float v[3]) {
  __m128i xy;
  __m128  z;
  xy = _mm_loadl_epi64((const __m128i *)v);
  z  = _mm_load_ss(&v[2]);
  return _mm_movelh_ps(_mm_castsi128_ps(xy), z);
 }
 static inline
 void
 glmm_store3(__m128 vx, float v[3]) {
  _mm_storel_pi((__m64 *)&v[0], vx);
  _mm_store_ss(&v[2], glmm_shuff1(vx, 2, 2, 2, 2));
 }
 #ifdef CGLM_ALL_UNALIGNED
 #  define glmm_load(p)      _mm_loadu_ps(p)
 #  define glmm_store(p, a)  _mm_storeu_ps(p, a)
 #else
 #  define glmm_load(p)      _mm_load_ps(p)
 #  define glmm_store(p, a)  _mm_store_ps(p, a)
 #endif
 #endif
 /* x86, x64 */
 #if defined( __SSE__ ) || defined( __SSE2__ )
 #  define CGLM_SSE_FP 1
 #  ifndef CGLM_SIMD_x86
 #    define CGLM_SIMD_x86
 #  endif
 #endif
 #if defined(__SSE3__)
 #  include <x86intrin.h>
 #  ifndef CGLM_SIMD_x86
 #    define CGLM_SIMD_x86
 #  endif
 #endif
 #if defined(__SSE4_1__)
 #  include <smmintrin.h>
 #  ifndef CGLM_SIMD_x86
 #    define CGLM_SIMD_x86
 #  endif
 #endif
 #if defined(__SSE4_2__)
 #  include <nmmintrin.h>
 #  ifndef CGLM_SIMD_x86
 #    define CGLM_SIMD_x86
 #  endif
 #endif
 #ifdef __AVX__
 #  include <immintrin.h>
 #  define CGLM_AVX_FP 1
-
-#ifdef CGLM_ALL_UNALIGNED
-#  define glmm_load256(p)      _mm256_loadu_ps(p)
-#  define glmm_store256(p, a)  _mm256_storeu_ps(p, a)
-#else
-#  define glmm_load256(p)      _mm256_load_ps(p)
-#  define glmm_store256(p, a)  _mm256_store_ps(p, a)
-#endif
+#  ifndef CGLM_SIMD_x86
+#    define CGLM_SIMD_x86
+#  endif
 #endif
 /* ARM Neon */
-#if defined(__ARM_NEON) && defined(__ARM_NEON_FP)
+#if defined(__ARM_NEON)
 #  include <arm_neon.h>
-#  define CGLM_NEON_FP 1
-#else
-#  undef  CGLM_NEON_FP
+#  if defined(__ARM_NEON_FP)
+#    define CGLM_NEON_FP 1
+#    ifndef CGLM_SIMD_ARM
+#      define CGLM_SIMD_ARM
+#    endif
+#  endif
 #endif
 #if defined(CGLM_SIMD_x86) || defined(CGLM_NEON_FP)
 #  ifndef CGLM_SIMD
 #    define CGLM_SIMD
 #  endif
 #endif
 #if defined(CGLM_SIMD_x86)
 #  include "x86.h"
 #endif
 #if defined(CGLM_SIMD_ARM)
 #  include "arm.h"
 #endif
 #endif /* cglm_intrin_h */
--- a/include/cglm/simd/x86.h
+++ b/include/cglm/simd/x86.h
@@ -0,0 +1,136 @@
 /*
 * Copyright (c), Recep Aslantas.
 *
 * MIT License (MIT), http://opensource.org/licenses/MIT
 * Full license can be found in the LICENSE file
 */
 #ifndef cglm_simd_x86_h
 #define cglm_simd_x86_h
 #include "intrin.h"
 #ifdef CGLM_SIMD_x86
 #ifdef CGLM_ALL_UNALIGNED
 #  define glmm_load(p)      _mm_loadu_ps(p)
 #  define glmm_store(p, a)  _mm_storeu_ps(p, a)
 #else
 #  define glmm_load(p)      _mm_load_ps(p)
 #  define glmm_store(p, a)  _mm_store_ps(p, a)
 #endif
 #ifdef CGLM_USE_INT_DOMAIN
 #  define glmm_shuff1(xmm, z, y, x, w)                                        \
     _mm_castsi128_ps(_mm_shuffle_epi32(_mm_castps_si128(xmm),                \
                                        _MM_SHUFFLE(z, y, x, w)))
 #else
 #  define glmm_shuff1(xmm, z, y, x, w)                                        \
       _mm_shuffle_ps(xmm, xmm, _MM_SHUFFLE(z, y, x, w))
 #endif
 #define glmm_shuff1x(xmm, x) glmm_shuff1(xmm, x, x, x, x)
 #define glmm_shuff2(a, b, z0, y0, x0, w0, z1, y1, x1, w1)                     \
     glmm_shuff1(_mm_shuffle_ps(a, b, _MM_SHUFFLE(z0, y0, x0, w0)),           \
                 z1, y1, x1, w1)
 #ifdef __AVX__
 #  ifdef CGLM_ALL_UNALIGNED
 #    define glmm_load256(p)      _mm256_loadu_ps(p)
 #    define glmm_store256(p, a)  _mm256_storeu_ps(p, a)
 #  else
 #    define glmm_load256(p)      _mm256_load_ps(p)
 #    define glmm_store256(p, a)  _mm256_store_ps(p, a)
 #  endif
 #endif
 static inline
 __m128
 glmm_vhadds(__m128 v) {
 #if defined(__SSE3__)
  __m128 shuf, sums;
  shuf = _mm_movehdup_ps(v);
  sums = _mm_add_ps(v, shuf);
  shuf = _mm_movehl_ps(shuf, sums);
  sums = _mm_add_ss(sums, shuf);
  return sums;
 #else
  __m128 shuf, sums;
  shuf = glmm_shuff1(v, 2, 3, 0, 1);
  sums = _mm_add_ps(v, shuf);
  shuf = _mm_movehl_ps(shuf, sums);
  sums = _mm_add_ss(sums, shuf);
  return sums;
 #endif
 }
 static inline
 float
 glmm_hadd(__m128 v) {
  return _mm_cvtss_f32(glmm_vhadds(v));
 }
 static inline
 __m128
 glmm_vdots(__m128 a, __m128 b) {
 #if (defined(__SSE4_1__) || defined(__SSE4_2__)) && defined(CGLM_SSE4_DOT)
  return _mm_dp_ps(a, b, 0xFF);
 #elif defined(__SSE3__) && defined(CGLM_SSE3_DOT)
  __m128 x0, x1;
  x0 = _mm_mul_ps(a, b);
  x1 = _mm_hadd_ps(x0, x0);
  return _mm_hadd_ps(x1, x1);
 #else
  return glmm_vhadds(_mm_mul_ps(a, b));
 #endif
 }
 static inline
 __m128
 glmm_vdot(__m128 a, __m128 b) {
 #if (defined(__SSE4_1__) || defined(__SSE4_2__)) && defined(CGLM_SSE4_DOT)
  return _mm_dp_ps(a, b, 0xFF);
 #elif defined(__SSE3__) && defined(CGLM_SSE3_DOT)
  __m128 x0, x1;
  x0 = _mm_mul_ps(a, b);
  x1 = _mm_hadd_ps(x0, x0);
  return _mm_hadd_ps(x1, x1);
 #else
  __m128 x0;
  x0 = _mm_mul_ps(a, b);
  x0 = _mm_add_ps(x0, glmm_shuff1(x0, 1, 0, 3, 2));
  return _mm_add_ps(x0, glmm_shuff1(x0, 0, 1, 0, 1));
 #endif
 }
 static inline
 float
 glmm_dot(__m128 a, __m128 b) {
  return _mm_cvtss_f32(glmm_vdots(a, b));
 }
 static inline
 float
 glmm_norm(__m128 a) {
  return _mm_cvtss_f32(_mm_sqrt_ss(glmm_vhadds(_mm_mul_ps(a, a))));
 }
 static inline
 __m128
 glmm_load3(float v[3]) {
  __m128i xy;
  __m128  z;
  xy = _mm_loadl_epi64((const __m128i *)v);
  z  = _mm_load_ss(&v[2]);
  return _mm_movelh_ps(_mm_castsi128_ps(xy), z);
 }
 static inline
 void
 glmm_store3(__m128 vx, float v[3]) {
  _mm_storel_pi((__m64 *)&v[0], vx);
  _mm_store_ss(&v[2], glmm_shuff1(vx, 2, 2, 2, 2));
 }
 #endif
 #endif /* cglm_simd_x86_h */
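The shuffle-and-add reduction in glmm_vhadds can be hard to follow from intrinsics alone. Below is a plain scalar sketch of the same two-step pairing as the non-SSE3 branch, assuming nothing beyond standard C. The names `hadd4` and `dot4` are hypothetical and no SIMD is involved; it only mirrors the data flow.

```c
#include <assert.h>

/* Scalar model of the pairwise horizontal add:
   step 1 adds each lane to its neighbour (0<->1, 2<->3),
   step 2 adds the two partial sums. */
static float
hadd4(const float v[4]) {
  float shuf[4];
  float sums[4];
  int   i;

  assert(v);

  /* same lane order as glmm_shuff1(v, 2, 3, 0, 1): (v1, v0, v3, v2) */
  shuf[0] = v[1];
  shuf[1] = v[0];
  shuf[2] = v[3];
  shuf[3] = v[2];

  for (i = 0; i < 4; i++)
    sums[i] = v[i] + shuf[i];

  /* lane 0 holds v0 + v1, lane 2 holds v2 + v3 */
  return sums[0] + sums[2];
}

/* Dot product as multiply followed by horizontal add. */
static float
dot4(const float a[4], const float b[4]) {
  float prod[4];
  int   i;

  for (i = 0; i < 4; i++)
    prod[i] = a[i] * b[i];

  return hadd4(prod);
}
```

The pairing order matters for floating-point results: this sums (v0 + v1) + (v2 + v3), which can differ in the last bits from a left-to-right sum.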
--- a/include/cglm/types.h
+++ b/include/cglm/types.h
@@ -10,12 +10,12 @@
 #if defined(_MSC_VER)
 /* do not use alignment for older visual studio versions */
-#if _MSC_VER < 1913 /*  Visual Studio 2017 version 15.6  */
+#  if _MSC_VER < 1913 /*  Visual Studio 2017 version 15.6  */
-#  define CGLM_ALL_UNALIGNED
+#    define CGLM_ALL_UNALIGNED
-#  define CGLM_ALIGN(X) /* no alignment */
+#    define CGLM_ALIGN(X) /* no alignment */
-#else
+#  else
-#  define CGLM_ALIGN(X) __declspec(align(X))
+#    define CGLM_ALIGN(X) __declspec(align(X))
-#endif
+#  endif
 #else
 #  define CGLM_ALIGN(X) __attribute((aligned(X)))
 #endif
@@ -33,20 +33,18 @@
 #endif
 typedef float                   vec2[2];
-typedef CGLM_ALIGN_IF(8)  float vec3[3];
+typedef float                   vec3[3];
 typedef int                    ivec3[3];
 typedef CGLM_ALIGN_IF(16) float vec4[4];
+typedef vec4                    versor;
+typedef vec3                    mat3[3];
 #ifdef __AVX__
-typedef CGLM_ALIGN_IF(32) vec3  mat3[3];
 typedef CGLM_ALIGN_IF(32) vec4  mat4[4];
 #else
-typedef                   vec3  mat3[3];
 typedef CGLM_ALIGN_IF(16) vec4  mat4[4];
 #endif
-typedef vec4                    versor;
 #define GLM_E         2.71828182845904523536028747135266250   /* e           */
 #define GLM_LOG2E     1.44269504088896340735992468100189214   /* log2(e)     */
 #define GLM_LOG10E    0.434294481903251827651128918916605082  /* log10(e)    */
--- a/include/cglm/util.h
+++ b/include/cglm/util.h
@@ -19,7 +19,6 @@
 #define cglm_util_h
 #include "common.h"
 #include <stdbool.h>
 #define GLM_MIN(X, Y) (((X) < (Y)) ? (X) : (Y))
 #define GLM_MAX(X, Y) (((X) > (Y)) ? (X) : (Y))
--- a/include/cglm/vec3-ext.h
+++ b/include/cglm/vec3-ext.h
@@ -31,9 +31,6 @@
 #include "common.h"
 #include "util.h"
 #include <stdbool.h>
 #include <math.h>
 #include <float.h>
 /*!
 * @brief fill a vector with specified value
--- a/include/cglm/vec3.h
+++ b/include/cglm/vec3.h
@@ -21,7 +21,6 @@
   CGLM_INLINE void  glm_vec3_zero(vec3 v);
   CGLM_INLINE void  glm_vec3_one(vec3 v);
   CGLM_INLINE float glm_vec3_dot(vec3 a, vec3 b);
   CGLM_INLINE void  glm_vec3_cross(vec3 a, vec3 b, vec3 d);
   CGLM_INLINE float glm_vec3_norm2(vec3 v);
   CGLM_INLINE float glm_vec3_norm(vec3 v);
   CGLM_INLINE void  glm_vec3_add(vec3 a, vec3 b, vec3 dest);
@@ -47,6 +46,8 @@
   CGLM_INLINE void  glm_vec3_inv_to(vec3 v, vec3 dest);
   CGLM_INLINE void  glm_vec3_normalize(vec3 v);
   CGLM_INLINE void  glm_vec3_normalize_to(vec3 v, vec3 dest);
   CGLM_INLINE void  glm_vec3_cross(vec3 a, vec3 b, vec3 d);
   CGLM_INLINE void  glm_vec3_crossn(vec3 a, vec3 b, vec3 dest);
   CGLM_INLINE float glm_vec3_distance(vec3 a, vec3 b);
   CGLM_INLINE float glm_vec3_angle(vec3 a, vec3 b);
   CGLM_INLINE void  glm_vec3_rotate(vec3 v, float angle, vec3 axis);
@@ -166,22 +167,6 @@ glm_vec3_dot(vec3 a, vec3 b) {
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
 }
 /*!
 * @brief vec3 cross product
 *
 * @param[in]  a source 1
 * @param[in]  b source 2
 * @param[out] d destination
 */
 CGLM_INLINE
 void
 glm_vec3_cross(vec3 a, vec3 b, vec3 d) {
  /* (u2.v3 - u3.v2, u3.v1 - u1.v3, u1.v2 - u2.v1) */
  d[0] = a[1] * b[2] - a[2] * b[1];
  d[1] = a[2] * b[0] - a[0] * b[2];
  d[2] = a[0] * b[1] - a[1] * b[0];
 }
 /*!
 * @brief norm * norm (magnitude) of vec
 *
@@ -443,8 +428,8 @@ glm_vec3_maxadd(vec3 a, vec3 b, vec3 dest) {
 *
 * it applies += operator so dest must be initialized
 *
- * @param[in]  a    vector
+ * @param[in]  a    vector 1
- * @param[in]  s    scalar
+ * @param[in]  b    vector 2
 * @param[out] dest dest += min(a, b)
 */
 CGLM_INLINE
@@ -521,6 +506,36 @@ glm_vec3_normalize_to(vec3 v, vec3 dest) {
  glm_vec3_scale(v, 1.0f / norm, dest);
 }
 /*!
 * @brief cross product of two vectors (RH)
 *
 * @param[in]  a    vector 1
 * @param[in]  b    vector 2
 * @param[out] dest destination
 */
 CGLM_INLINE
 void
 glm_vec3_cross(vec3 a, vec3 b, vec3 dest) {
  /* (u2.v3 - u3.v2, u3.v1 - u1.v3, u1.v2 - u2.v1) */
  dest[0] = a[1] * b[2] - a[2] * b[1];
  dest[1] = a[2] * b[0] - a[0] * b[2];
  dest[2] = a[0] * b[1] - a[1] * b[0];
 }
 /*!
 * @brief cross product of two vectors (RH), with the result normalized
 *
 * @param[in]  a    vector 1
 * @param[in]  b    vector 2
 * @param[out] dest destination
 */
 CGLM_INLINE
 void
 glm_vec3_crossn(vec3 a, vec3 b, vec3 dest) {
  glm_vec3_cross(a, b, dest);
  glm_vec3_normalize(dest);
 }
 /*!
 * @brief angle between two vectors
 *
--- a/include/cglm/vec4-ext.h
+++ b/include/cglm/vec4-ext.h
@@ -31,9 +31,6 @@
 #include "common.h"
 #include "vec3-ext.h"
 #include <stdbool.h>
 #include <math.h>
 #include <float.h>
 /*!
 * @brief fill a vector with specified value
--- a/include/cglm/vec4.h
+++ b/include/cglm/vec4.h
@@ -122,6 +122,8 @@ void
 glm_vec4_copy(vec4 v, vec4 dest) {
 #if defined( __SSE__ ) || defined( __SSE2__ )
  glmm_store(dest, glmm_load(v));
 #elif defined(CGLM_NEON_FP)
  vst1q_f32(dest, vld1q_f32(v));
 #else
  dest[0] = v[0];
  dest[1] = v[1];
@@ -157,6 +159,8 @@ void
 glm_vec4_zero(vec4 v) {
 #if defined( __SSE__ ) || defined( __SSE2__ )
  glmm_store(v, _mm_setzero_ps());
 #elif defined(CGLM_NEON_FP)
  vst1q_f32(v, vdupq_n_f32(0.0f));
 #else
  v[0] = 0.0f;
  v[1] = 0.0f;
@@ -175,6 +179,8 @@ void
 glm_vec4_one(vec4 v) {
 #if defined( __SSE__ ) || defined( __SSE2__ )
  glmm_store(v, _mm_set1_ps(1.0f));
 #elif defined(CGLM_NEON_FP)
  vst1q_f32(v, vdupq_n_f32(1.0f));
 #else
  v[0] = 1.0f;
  v[1] = 1.0f;
@@ -194,11 +200,8 @@ glm_vec4_one(vec4 v) {
 CGLM_INLINE
 float
 glm_vec4_dot(vec4 a, vec4 b) {
-#if defined( __SSE__ ) || defined( __SSE2__ )
-  __m128 x0;
-  x0 = _mm_mul_ps(glmm_load(a), glmm_load(b));
-  x0 = _mm_add_ps(x0, glmm_shuff1(x0, 1, 0, 3, 2));
-  return _mm_cvtss_f32(_mm_add_ss(x0, glmm_shuff1(x0, 0, 1, 0, 1)));
+#if defined(CGLM_SIMD)
+  return glmm_dot(glmm_load(a), glmm_load(b));
 #else
  return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
 #endif
@@ -218,15 +221,7 @@ glm_vec4_dot(vec4 a, vec4 b) {
 CGLM_INLINE
 float
 glm_vec4_norm2(vec4 v) {
-#if defined( __SSE__ ) || defined( __SSE2__ )
-  __m128 x0;
-  x0 = glmm_load(v);
-  x0 = _mm_mul_ps(x0, x0);
-  x0 = _mm_add_ps(x0, glmm_shuff1(x0, 1, 0, 3, 2));
-  return _mm_cvtss_f32(_mm_add_ss(x0, glmm_shuff1(x0, 0, 1, 0, 1)));
-#else
-  return v[0] * v[0] + v[1] * v[1] + v[2] * v[2] + v[3] * v[3];
-#endif
+  return glm_vec4_dot(v, v);
 }
 /*!
@@ -239,12 +234,10 @@ glm_vec4_norm2(vec4 v) {
 CGLM_INLINE
 float
 glm_vec4_norm(vec4 v) {
-#if defined( __SSE__ ) || defined( __SSE2__ )
-  __m128 x0;
-  x0 = glmm_load(v);
-  return _mm_cvtss_f32(_mm_sqrt_ss(glmm_dot(x0, x0)));
+#if defined(CGLM_SIMD)
+  return glmm_norm(glmm_load(v));
 #else
-  return sqrtf(glm_vec4_norm2(v));
+  return sqrtf(glm_vec4_dot(v, v));
 #endif
 }
@@ -260,6 +253,8 @@ void
 glm_vec4_add(vec4 a, vec4 b, vec4 dest) {
 #if defined( __SSE__ ) || defined( __SSE2__ )
  glmm_store(dest, _mm_add_ps(glmm_load(a), glmm_load(b)));
 #elif defined(CGLM_NEON_FP)
  vst1q_f32(dest, vaddq_f32(vld1q_f32(a), vld1q_f32(b)));
 #else
  dest[0] = a[0] + b[0];
  dest[1] = a[1] + b[1];
@@ -280,6 +275,8 @@ void
 glm_vec4_adds(vec4 v, float s, vec4 dest) {
 #if defined( __SSE__ ) || defined( __SSE2__ )
  glmm_store(dest, _mm_add_ps(glmm_load(v), _mm_set1_ps(s)));
 #elif defined(CGLM_NEON_FP)
  vst1q_f32(dest, vaddq_f32(vld1q_f32(v), vdupq_n_f32(s)));
 #else
  dest[0] = v[0] + s;
  dest[1] = v[1] + s;
@@ -300,6 +297,8 @@ void
 glm_vec4_sub(vec4 a, vec4 b, vec4 dest) {
 #if defined( __SSE__ ) || defined( __SSE2__ )
  glmm_store(dest, _mm_sub_ps(glmm_load(a), glmm_load(b)));
 #elif defined(CGLM_NEON_FP)
  vst1q_f32(dest, vsubq_f32(vld1q_f32(a), vld1q_f32(b)));
 #else
  dest[0] = a[0] - b[0];
  dest[1] = a[1] - b[1];
@@ -320,6 +319,8 @@ void
 glm_vec4_subs(vec4 v, float s, vec4 dest) {
 #if defined( __SSE__ ) || defined( __SSE2__ )
  glmm_store(dest, _mm_sub_ps(glmm_load(v), _mm_set1_ps(s)));
 #elif defined(CGLM_NEON_FP)
  vst1q_f32(dest, vsubq_f32(vld1q_f32(v), vdupq_n_f32(s)));
 #else
  dest[0] = v[0] - s;
  dest[1] = v[1] - s;
@@ -331,15 +332,17 @@ glm_vec4_subs(vec4 v, float s, vec4 dest) {
 /*!
 * @brief multiply two vectors (component-wise multiplication)
 *
- * @param a vector1
+ * @param a    vector1
- * @param b vector2
+ * @param b    vector2
- * @param d dest = (a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3])
+ * @param dest dest = (a[0] * b[0], a[1] * b[1], a[2] * b[2], a[3] * b[3])
 */
 CGLM_INLINE
 void
 glm_vec4_mul(vec4 a, vec4 b, vec4 dest) {
 #if defined( __SSE__ ) || defined( __SSE2__ )
  glmm_store(dest, _mm_mul_ps(glmm_load(a), glmm_load(b)));
 #elif defined(CGLM_NEON_FP)
  vst1q_f32(dest, vmulq_f32(vld1q_f32(a), vld1q_f32(b)));
 #else
  dest[0] = a[0] * b[0];
  dest[1] = a[1] * b[1];
@@ -360,6 +363,8 @@ void
 glm_vec4_scale(vec4 v, float s, vec4 dest) {
 #if defined( __SSE__ ) || defined( __SSE2__ )
  glmm_store(dest, _mm_mul_ps(glmm_load(v), _mm_set1_ps(s)));
 #elif defined(CGLM_NEON_FP)
  vst1q_f32(dest, vmulq_f32(vld1q_f32(v), vdupq_n_f32(s)));
 #else
  dest[0] = v[0] * s;
  dest[1] = v[1] * s;
@@ -426,7 +431,6 @@ glm_vec4_divs(vec4 v, float s, vec4 dest) {
 #endif
 }
 /*!
 * @brief add two vectors and add result to sum
 *
@@ -443,6 +447,10 @@ glm_vec4_addadd(vec4 a, vec4 b, vec4 dest) {
  glmm_store(dest, _mm_add_ps(glmm_load(dest),
                              _mm_add_ps(glmm_load(a),
                                         glmm_load(b))));
 #elif defined(CGLM_NEON_FP)
  vst1q_f32(dest, vaddq_f32(vld1q_f32(dest),
                            vaddq_f32(vld1q_f32(a),
                                      vld1q_f32(b))));
 #else
  dest[0] += a[0] + b[0];
  dest[1] += a[1] + b[1];
@@ -467,6 +475,10 @@ glm_vec4_subadd(vec4 a, vec4 b, vec4 dest) {
  glmm_store(dest, _mm_add_ps(glmm_load(dest),
                              _mm_sub_ps(glmm_load(a),
                                         glmm_load(b))));
 #elif defined(CGLM_NEON_FP)
  vst1q_f32(dest, vaddq_f32(vld1q_f32(dest),
                            vsubq_f32(vld1q_f32(a),
                                      vld1q_f32(b))));
 #else
  dest[0] += a[0] - b[0];
  dest[1] += a[1] - b[1];
@@ -491,6 +503,10 @@ glm_vec4_muladd(vec4 a, vec4 b, vec4 dest) {
  glmm_store(dest, _mm_add_ps(glmm_load(dest),
                              _mm_mul_ps(glmm_load(a),
                                         glmm_load(b))));
 #elif defined(CGLM_NEON_FP)
  vst1q_f32(dest, vaddq_f32(vld1q_f32(dest),
                            vmulq_f32(vld1q_f32(a),
                                      vld1q_f32(b))));
 #else
  dest[0] += a[0] * b[0];
  dest[1] += a[1] * b[1];
@@ -515,6 +531,10 @@ glm_vec4_muladds(vec4 a, float s, vec4 dest) {
  glmm_store(dest, _mm_add_ps(glmm_load(dest),
                              _mm_mul_ps(glmm_load(a),
                                         _mm_set1_ps(s))));
 #elif defined(CGLM_NEON_FP)
  vst1q_f32(dest, vaddq_f32(vld1q_f32(dest),
                            vmulq_f32(vld1q_f32(a),
                                      vdupq_n_f32(s))));
 #else
  dest[0] += a[0] * s;
  dest[1] += a[1] * s;
@@ -539,6 +559,10 @@ glm_vec4_maxadd(vec4 a, vec4 b, vec4 dest) {
  glmm_store(dest, _mm_add_ps(glmm_load(dest),
                              _mm_max_ps(glmm_load(a),
                                         glmm_load(b))));
 #elif defined(CGLM_NEON_FP)
  vst1q_f32(dest, vaddq_f32(vld1q_f32(dest),
                            vmaxq_f32(vld1q_f32(a),
                                      vld1q_f32(b))));
 #else
  dest[0] += glm_max(a[0], b[0]);
  dest[1] += glm_max(a[1], b[1]);
@@ -552,8 +576,8 @@ glm_vec4_maxadd(vec4 a, vec4 b, vec4 dest) {
 *
 * it applies += operator so dest must be initialized
 *
- * @param[in]  a    vector
+ * @param[in]  a    vector 1
- * @param[in]  s    scalar
+ * @param[in]  b    vector 2
 * @param[out] dest dest += min(a, b)
 */
 CGLM_INLINE
@@ -563,6 +587,10 @@ glm_vec4_minadd(vec4 a, vec4 b, vec4 dest) {
  glmm_store(dest, _mm_add_ps(glmm_load(dest),
                              _mm_min_ps(glmm_load(a),
                                         glmm_load(b))));
 #elif defined(CGLM_NEON_FP)
  vst1q_f32(dest, vaddq_f32(vld1q_f32(dest),
                            vminq_f32(vld1q_f32(a),
                                      vld1q_f32(b))));
 #else
  dest[0] += glm_min(a[0], b[0]);
  dest[1] += glm_min(a[1], b[1]);
@@ -582,6 +610,8 @@ void
 glm_vec4_negate_to(vec4 v, vec4 dest) {
 #if defined( __SSE__ ) || defined( __SSE2__ )
  glmm_store(dest, _mm_xor_ps(glmm_load(v), _mm_set1_ps(-0.0f)));
 #elif defined(CGLM_NEON_FP)
  vst1q_f32(dest, vnegq_f32(vld1q_f32(v)));
 #else
  dest[0] = -v[0];
  dest[1] = -v[1];
@@ -615,7 +645,7 @@ glm_vec4_normalize_to(vec4 v, vec4 dest) {
  float  dot;
  x0   = glmm_load(v);
-  xdot = glmm_dot(x0, x0);
+  xdot = glmm_vdot(x0, x0);
  dot  = _mm_cvtss_f32(xdot);
  if (dot == 0.0f) {
@@ -659,10 +689,16 @@ glm_vec4_normalize(vec4 v) {
 CGLM_INLINE
 float
 glm_vec4_distance(vec4 a, vec4 b) {
 #if defined( __SSE__ ) || defined( __SSE2__ )
  return glmm_norm(_mm_sub_ps(glmm_load(b), glmm_load(a)));
 #elif defined(CGLM_NEON_FP)
  return glmm_norm(vsubq_f32(glmm_load(a), glmm_load(b)));
 #else
  return sqrtf(glm_pow2(b[0] - a[0])
             + glm_pow2(b[1] - a[1])
             + glm_pow2(b[2] - a[2])
             + glm_pow2(b[3] - a[3]));
 #endif
 }
 /*!
@@ -677,6 +713,8 @@ void
 glm_vec4_maxv(vec4 a, vec4 b, vec4 dest) {
 #if defined( __SSE__ ) || defined( __SSE2__ )
  glmm_store(dest, _mm_max_ps(glmm_load(a), glmm_load(b)));
 #elif defined(CGLM_NEON_FP)
  vst1q_f32(dest, vmaxq_f32(vld1q_f32(a), vld1q_f32(b)));
 #else
  dest[0] = glm_max(a[0], b[0]);
  dest[1] = glm_max(a[1], b[1]);
@@ -697,6 +735,8 @@ void
 glm_vec4_minv(vec4 a, vec4 b, vec4 dest) {
 #if defined( __SSE__ ) || defined( __SSE2__ )
  glmm_store(dest, _mm_min_ps(glmm_load(a), glmm_load(b)));
 #elif defined(CGLM_NEON_FP)
  vst1q_f32(dest, vminq_f32(vld1q_f32(a), vld1q_f32(b)));
 #else
  dest[0] = glm_min(a[0], b[0]);
  dest[1] = glm_min(a[1], b[1]);
@@ -718,6 +758,9 @@ glm_vec4_clamp(vec4 v, float minVal, float maxVal) {
 #if defined( __SSE__ ) || defined( __SSE2__ )
  glmm_store(v, _mm_min_ps(_mm_max_ps(glmm_load(v), _mm_set1_ps(minVal)),
                           _mm_set1_ps(maxVal)));
 #elif defined(CGLM_NEON_FP)
  vst1q_f32(v, vminq_f32(vmaxq_f32(vld1q_f32(v), vdupq_n_f32(minVal)),
                         vdupq_n_f32(maxVal)));
 #else
  v[0] = glm_clamp(v[0], minVal, maxVal);
  v[1] = glm_clamp(v[1], minVal, maxVal);
@@ -748,4 +791,23 @@ glm_vec4_lerp(vec4 from, vec4 to, float t, vec4 dest) {
  glm_vec4_add(from, v, dest);
 }
 /*!
 * @brief helper to fill vec4 as [S^3, S^2, S, 1]
 *
 * @param[in]   s    parameter
 * @param[out]  dest destination
 */
 CGLM_INLINE
 void
 glm_vec4_cubic(float s, vec4 dest) {
  float ss;
  ss = s * s;
  dest[0] = ss * s;
  dest[1] = ss;
  dest[2] = s;
  dest[3] = 1.0f;
 }
 #endif /* cglm_vec4_h */
--- a/include/cglm/version.h
+++ b/include/cglm/version.h
@@ -10,6 +10,6 @@
 #define CGLM_VERSION_MAJOR 0
 #define CGLM_VERSION_MINOR 5
-#define CGLM_VERSION_PATCH 0
+#define CGLM_VERSION_PATCH 4
 #endif /* cglm_version_h */
--- a/makefile.am
+++ b/makefile.am
@@ -34,30 +34,32 @@ test_tests_CFLAGS  = $(checkCFLAGS)
 cglmdir=$(includedir)/cglm
 cglm_HEADERS = include/cglm/version.h \
-                  include/cglm/cglm.h \
+               include/cglm/cglm.h \
-                  include/cglm/call.h \
+               include/cglm/call.h \
-                  include/cglm/cam.h \
+               include/cglm/cam.h \
-                  include/cglm/io.h \
+               include/cglm/io.h \
-                  include/cglm/mat4.h \
+               include/cglm/mat4.h \
-                  include/cglm/mat3.h \
+               include/cglm/mat3.h \
-                  include/cglm/types.h \
+               include/cglm/types.h \
-                  include/cglm/common.h \
+               include/cglm/common.h \
-                  include/cglm/affine.h \
+               include/cglm/affine.h \
-                  include/cglm/vec3.h \
+               include/cglm/vec3.h \
-                  include/cglm/vec3-ext.h \
+               include/cglm/vec3-ext.h \
-                  include/cglm/vec4.h \
+               include/cglm/vec4.h \
-                  include/cglm/vec4-ext.h \
+               include/cglm/vec4-ext.h \
-                  include/cglm/euler.h \
+               include/cglm/euler.h \
-                  include/cglm/util.h \
+               include/cglm/util.h \
-                  include/cglm/quat.h \
+               include/cglm/quat.h \
-                  include/cglm/affine-mat.h \
+               include/cglm/affine-mat.h \
-                  include/cglm/plane.h \
+               include/cglm/plane.h \
-                  include/cglm/frustum.h \
+               include/cglm/frustum.h \
-                  include/cglm/box.h \
+               include/cglm/box.h \
-                  include/cglm/color.h \
+               include/cglm/color.h \
-                  include/cglm/project.h \
+               include/cglm/project.h \
-                  include/cglm/sphere.h \
+               include/cglm/sphere.h \
-                  include/cglm/ease.h
+               include/cglm/ease.h \
               include/cglm/curve.h \
               include/cglm/bezier.h
 cglm_calldir=$(includedir)/cglm/call
 cglm_call_HEADERS = include/cglm/call/mat4.h \
@@ -74,10 +76,14 @@ cglm_call_HEADERS = include/cglm/call/mat4.h \
                    include/cglm/call/box.h \
                    include/cglm/call/project.h \
                    include/cglm/call/sphere.h \
-                    include/cglm/call/ease.h
+                    include/cglm/call/ease.h \
                    include/cglm/call/curve.h \
                    include/cglm/call/bezier.h
 cglm_simddir=$(includedir)/cglm/simd
-cglm_simd_HEADERS = include/cglm/simd/intrin.h
+cglm_simd_HEADERS = include/cglm/simd/intrin.h \
                    include/cglm/simd/x86.h \
                    include/cglm/simd/arm.h
 cglm_simd_sse2dir=$(includedir)/cglm/simd/sse2
 cglm_simd_sse2_HEADERS = include/cglm/simd/sse2/affine.h \
@@ -107,7 +113,9 @@ libcglm_la_SOURCES=\
    src/box.c \
    src/project.c \
    src/sphere.c \
-    src/ease.c
+    src/ease.c \
    src/curve.c \
    src/bezier.c
 test_tests_SOURCES=\
    test/src/test_common.c \
@@ -121,7 +129,8 @@ test_tests_SOURCES=\
    test/src/test_vec4.c \
    test/src/test_vec3.c \
    test/src/test_mat3.c \
-    test/src/test_affine.c
+    test/src/test_affine.c \
    test/src/test_bezier.c
 all-local:
 	sh ./post-build.sh
--- a/post-build.sh
+++ b/post-build.sh
@@ -8,12 +8,17 @@
 cd $(dirname "$0")
-mkdir -p .libs
+mkdir -p "$(pwd)/.libs"
 libmocka_folder=$(pwd)/test/lib/cmocka/build/src/
 if [ "$(uname)" = "Darwin" ]; then
-  ln -sf $(pwd)/test/lib/cmocka/build/src/libcmocka.0.dylib \
+  libcmocka=libcmocka.0.dylib
-      .libs/libcmocka.0.dylib;
 else
-  ln -sf $(pwd)/test/lib/cmocka/build/src/libcmocka.so.0 \
+  libcmocka=libcmocka.so.0
-      .libs/libcmocka.so.0;
+fi
 libcmocka_path="$libmocka_folder$libcmocka"
 if [ -f "$libcmocka_path" ]; then
  ln -sf "$libcmocka_path" "$(pwd)/.libs/$libcmocka";
 fi
--- a/src/bezier.c
+++ b/src/bezier.c
@@ -0,0 +1,27 @@
 /*
 * Copyright (c), Recep Aslantas.
 *
 * MIT License (MIT), http://opensource.org/licenses/MIT
 * Full license can be found in the LICENSE file
 */
 #include "../include/cglm/cglm.h"
 #include "../include/cglm/call.h"
 CGLM_EXPORT
 float
 glmc_bezier(float s, float p0, float c0, float c1, float p1) {
  return glm_bezier(s, p0, c0, c1, p1);
 }
 CGLM_EXPORT
 float
 glmc_hermite(float s, float p0, float t0, float t1, float p1) {
  return glm_hermite(s, p0, t0, t1, p1);
 }
 CGLM_EXPORT
 float
 glmc_decasteljau(float prm, float p0, float c0, float c1, float p1) {
  return glm_decasteljau(prm, p0, c0, c1, p1);
 }
--- a/src/cam.c
+++ b/src/cam.c
@@ -88,6 +88,12 @@ glmc_perspective(float fovy,
                  dest);
 }
 CGLM_EXPORT
 void
 glmc_persp_move_far(mat4 proj, float deltaFar) {
  glm_persp_move_far(proj, deltaFar);
 }
 CGLM_EXPORT
 void
 glmc_perspective_default(float aspect, mat4 dest) {
--- a/src/curve.c
+++ b/src/curve.c
@@ -0,0 +1,15 @@
 /*
 * Copyright (c), Recep Aslantas.
 *
 * MIT License (MIT), http://opensource.org/licenses/MIT
 * Full license can be found in the LICENSE file
 */
 #include "../include/cglm/cglm.h"
 #include "../include/cglm/call.h"
 CGLM_EXPORT
 float
 glmc_smc(float s, mat4 m, vec4 c) {
  return glm_smc(s, m, c);
 }
--- a/src/mat3.c
+++ b/src/mat3.c
@@ -50,6 +50,12 @@ glmc_mat3_mulv(mat3 m, vec3 v, vec3 dest) {
  glm_mat3_mulv(m, v, dest);
 }
 CGLM_EXPORT
 float
 glmc_mat3_trace(mat3 m) {
  return glm_mat3_trace(m);
 }
 CGLM_EXPORT
 void
 glmc_mat3_quat(mat3 m, versor dest) {
@@ -85,3 +91,9 @@ void
 glmc_mat3_swap_row(mat3 mat, int row1, int row2) {
  glm_mat3_swap_row(mat, row1, row2);
 }
 CGLM_EXPORT
 float
 glmc_mat3_rmc(vec3 r, mat3 m, vec3 c) {
  return glm_mat3_rmc(r, m, c);
 }
--- a/src/mat4.c
+++ b/src/mat4.c
@@ -74,6 +74,18 @@ glmc_mat4_mulv3(mat4 m, vec3 v, float last, vec3 dest) {
  glm_mat4_mulv3(m, v, last, dest);
 }
 CGLM_EXPORT
 float
 glmc_mat4_trace(mat4 m) {
  return glm_mat4_trace(m);
 }
 CGLM_EXPORT
 float
 glmc_mat4_trace3(mat4 m) {
  return glm_mat4_trace3(m);
 }
 CGLM_EXPORT
 void
 glmc_mat4_quat(mat4 m, versor dest) {
@@ -139,3 +151,9 @@ void
 glmc_mat4_swap_row(mat4 mat, int row1, int row2) {
  glm_mat4_swap_row(mat, row1, row2);
 }
 CGLM_EXPORT
 float
 glmc_mat4_rmc(vec4 r, mat4 m, vec4 c) {
  return glm_mat4_rmc(r, m, c);
 }
--- a/src/vec3.c
+++ b/src/vec3.c
@@ -40,8 +40,14 @@ glmc_vec3_dot(vec3 a, vec3 b) {
 CGLM_EXPORT
 void
-glmc_vec3_cross(vec3 a, vec3 b, vec3 d) {
+glmc_vec3_cross(vec3 a, vec3 b, vec3 dest) {
-  glm_vec3_cross(a, b, d);
+  glm_vec3_cross(a, b, dest);
 }
 CGLM_EXPORT
 void
 glmc_vec3_crossn(vec3 a, vec3 b, vec3 dest) {
  glm_vec3_crossn(a, b, dest);
 }
 CGLM_EXPORT
--- a/src/vec4.c
+++ b/src/vec4.c
@@ -206,6 +206,12 @@ glmc_vec4_lerp(vec4 from, vec4 to, float t, vec4 dest) {
  glm_vec4_lerp(from, to, t, dest);
 }
 CGLM_EXPORT
 void
 glmc_vec4_cubic(float s, vec4 dest) {
  glm_vec4_cubic(s, dest);
 }
 /* ext */
 CGLM_EXPORT
--- a/test/src/test_bezier.c
+++ b/test/src/test_bezier.c
@@ -0,0 +1,65 @@
 /*
 * Copyright (c), Recep Aslantas.
 *
 * MIT License (MIT), http://opensource.org/licenses/MIT
 * Full license can be found in the LICENSE file
 */
 #include "test_common.h"
 CGLM_INLINE
 float
 test_bezier_plain(float s, float p0, float c0, float c1, float p1) {
  float x, xx, xxx, ss, sss;
  x   = 1.0f - s;
  xx  = x * x;
  xxx = xx * x;
  ss  = s * s;
  sss = ss * s;
  return p0 * xxx + 3.0f * (c0 * s * xx + c1 * ss * x) + p1 * sss;
 }
 CGLM_INLINE
 float
 test_hermite_plain(float s, float p0, float t0, float t1, float p1) {
  float ss, sss;
  ss  = s  * s;
  sss = ss * s;
  return p0 * (2.0f * sss - 3.0f * ss + 1.0f)
       + t0 * (sss - 2.0f * ss + s)
       + p1 * (-2.0f * sss + 3.0f * ss)
       + t1 * (sss - ss);
 }
 void
 test_bezier(void **state) {
  float s, p0, p1, c0, c1, smc, Bs, Bs_plain;
  s        = test_rand();
  p0       = test_rand();
  p1       = test_rand();
  c0       = test_rand();
  c1       = test_rand();
  /* test cubic bezier */
  smc      = glm_smc(s, GLM_BEZIER_MAT, (vec4){p0, c0, c1, p1});
  Bs       = glm_bezier(s, p0, c0, c1, p1);
  Bs_plain = test_bezier_plain(s, p0, c0, c1, p1);
  assert_true(glm_eq(Bs,  Bs_plain));
  test_assert_eqf(smc, Bs_plain);
  test_assert_eqf(Bs,  smc);
  /* test cubic hermite */
  smc      = glm_smc(s, GLM_HERMITE_MAT, (vec4){p0, p1, c0, c1});
  Bs       = glm_hermite(s, p0, c0, c1, p1);
  Bs_plain = test_hermite_plain(s, p0, c0, c1, p1);
  assert_true(glm_eq(Bs,  Bs_plain));
  assert_true(glm_eq(smc, Bs_plain));
  assert_true(glm_eq(Bs,  smc));
 }
--- a/test/src/test_common.c
+++ b/test/src/test_common.c
@@ -5,6 +5,7 @@
 #include "test_common.h"
 #include <stdlib.h>
 #include <math.h>
 #define m 4
 #define n 4
@@ -58,7 +59,7 @@ test_rand_vec4(vec4 dest) {
 }
 float
-test_rand_angle(void) {
+test_rand(void) {
  srand((unsigned int)time(NULL));
  return drand48();
 }
--- a/test/src/test_common.h
+++ b/test/src/test_common.h
@@ -59,7 +59,7 @@ void
 test_rand_vec4(vec4 dest) ;
 float
-test_rand_angle(void);
+test_rand(void);
 void
 test_rand_quat(versor q);
--- a/test/src/test_main.c
+++ b/test/src/test_main.c
@@ -38,7 +38,10 @@ main(int argc, const char * argv[]) {
    cmocka_unit_test(test_vec3),
    /* affine */
-    cmocka_unit_test(test_affine)
+    cmocka_unit_test(test_affine),
    /* bezier */
    cmocka_unit_test(test_bezier)
  };
  return cmocka_run_group_tests(tests, NULL, NULL);
--- a/test/src/test_mat3.c
+++ b/test/src/test_mat3.c
@@ -24,9 +24,9 @@ test_mat3(void **state) {
  for (i = 0; i < m; i++) {
    for (j = 0; j < n; j++) {
      if (i == j)
-        assert_true(m3[i][j] == 1.0f);
+        assert_true(glm_eq(m3[i][j], 1.0f));
      else
-        assert_true(m3[i][j] == 0.0f);
+        assert_true(glm_eq(m3[i][j], 0.0f));
    }
  }
--- a/test/src/test_mat4.c
+++ b/test/src/test_mat4.c
@@ -24,9 +24,9 @@ test_mat4(void **state) {
  for (i = 0; i < m; i++) {
    for (j = 0; j < n; j++) {
      if (i == j)
-        assert_true(m3[i][j] == 1.0f);
+        assert_true(glm_eq(m3[i][j], 1.0f));
      else
-        assert_true(m3[i][j] == 0.0f);
+        assert_true(glm_eq(m3[i][j], 0.0f));
    }
  }
--- a/test/src/test_quat.c
+++ b/test/src/test_quat.c
@@ -25,7 +25,7 @@ test_quat(void **state) {
  /* 0. test identiy quat */
  glm_quat_identity(q4);
-  assert_true(glm_quat_real(q4) == cosf(glm_rad(0.0f) * 0.5f));
+  assert_true(glm_eq(glm_quat_real(q4), cosf(glm_rad(0.0f) * 0.5f)));
  glm_quat_mat4(q4, rot1);
  test_assert_mat4_eq2(rot1, GLM_MAT4_IDENTITY, 0.000009);
@@ -118,7 +118,7 @@ test_quat(void **state) {
  /* 9. test imag, real */
  /* 9.1 real */
-  assert_true(glm_quat_real(q4) == cosf(glm_rad(-90.0f) * 0.5f));
+  assert_true(glm_eq(glm_quat_real(q4), cosf(glm_rad(-90.0f) * 0.5f)));
  /* 9.1 imag */
  glm_quat_imag(q4, imag);
--- a/test/src/test_tests.h
+++ b/test/src/test_tests.h
@@ -40,4 +40,7 @@ test_vec3(void **state);
 void
 test_affine(void **state);
 void
 test_bezier(void **state);
 #endif /* test_tests_h */
--- a/test/src/test_vec4.c
+++ b/test/src/test_vec4.c
@@ -93,6 +93,13 @@ test_vec4(void **state) {
    /* 3. test SIMD norm2 */
    test_rand_vec4(v);
    test_assert_eqf(test_vec4_norm2(v), glm_vec4_norm2(v));
    /* 4. test SSE/SIMD distance */
    test_rand_vec4(v1);
    test_rand_vec4(v2);
    d1 = glm_vec4_distance(v1, v2);
    d2 = sqrtf(powf(v1[0]-v2[0], 2.0f) + pow(v1[1]-v2[1], 2.0f) + pow(v1[2]-v2[2], 2.0f) + pow(v1[3]-v2[3], 2.0f));
    assert_true(fabsf(d1 - d2) <= 0.000009);
  }
  /* test zero */
--- a/win/cglm.vcxproj
+++ b/win/cglm.vcxproj
@@ -20,8 +20,10 @@
  </ItemGroup>
  <ItemGroup>
    <ClCompile Include="..\src\affine.c" />
    <ClCompile Include="..\src\bezier.c" />
    <ClCompile Include="..\src\box.c" />
    <ClCompile Include="..\src\cam.c" />
    <ClCompile Include="..\src\curve.c" />
    <ClCompile Include="..\src\dllmain.c" />
    <ClCompile Include="..\src\ease.c" />
    <ClCompile Include="..\src\euler.c" />
@@ -39,11 +41,14 @@
  <ItemGroup>
    <ClInclude Include="..\include\cglm\affine-mat.h" />
    <ClInclude Include="..\include\cglm\affine.h" />
    <ClInclude Include="..\include\cglm\bezier.h" />
    <ClInclude Include="..\include\cglm\box.h" />
    <ClInclude Include="..\include\cglm\call.h" />
    <ClInclude Include="..\include\cglm\call\affine.h" />
    <ClInclude Include="..\include\cglm\call\bezier.h" />
    <ClInclude Include="..\include\cglm\call\box.h" />
    <ClInclude Include="..\include\cglm\call\cam.h" />
    <ClInclude Include="..\include\cglm\call\curve.h" />
    <ClInclude Include="..\include\cglm\call\ease.h" />
    <ClInclude Include="..\include\cglm\call\euler.h" />
    <ClInclude Include="..\include\cglm\call\frustum.h" />
@@ -60,6 +65,7 @@
    <ClInclude Include="..\include\cglm\cglm.h" />
    <ClInclude Include="..\include\cglm\color.h" />
    <ClInclude Include="..\include\cglm\common.h" />
    <ClInclude Include="..\include\cglm\curve.h" />
    <ClInclude Include="..\include\cglm\ease.h" />
    <ClInclude Include="..\include\cglm\euler.h" />
    <ClInclude Include="..\include\cglm\frustum.h" />
@@ -69,6 +75,7 @@
    <ClInclude Include="..\include\cglm\plane.h" />
    <ClInclude Include="..\include\cglm\project.h" />
    <ClInclude Include="..\include\cglm\quat.h" />
    <ClInclude Include="..\include\cglm\simd\arm.h" />
    <ClInclude Include="..\include\cglm\simd\avx\affine.h" />
    <ClInclude Include="..\include\cglm\simd\avx\mat4.h" />
    <ClInclude Include="..\include\cglm\simd\intrin.h" />
@@ -77,6 +84,7 @@
    <ClInclude Include="..\include\cglm\simd\sse2\mat3.h" />
    <ClInclude Include="..\include\cglm\simd\sse2\mat4.h" />
    <ClInclude Include="..\include\cglm\simd\sse2\quat.h" />
    <ClInclude Include="..\include\cglm\simd\x86.h" />
    <ClInclude Include="..\include\cglm\sphere.h" />
    <ClInclude Include="..\include\cglm\types.h" />
    <ClInclude Include="..\include\cglm\util.h" />
--- a/win/cglm.vcxproj.filters
+++ b/win/cglm.vcxproj.filters
@@ -84,6 +84,12 @@
    <ClCompile Include="..\src\ease.c">
      <Filter>src</Filter>
    </ClCompile>
    <ClCompile Include="..\src\curve.c">
      <Filter>src</Filter>
    </ClCompile>
    <ClCompile Include="..\src\bezier.c">
      <Filter>src</Filter>
    </ClCompile>
  </ItemGroup>
  <ItemGroup>
    <ClInclude Include="..\src\config.h">
@@ -233,5 +239,23 @@
    <ClInclude Include="..\include\cglm\ease.h">
      <Filter>include\cglm</Filter>
    </ClInclude>
    <ClInclude Include="..\include\cglm\simd\arm.h">
      <Filter>include\cglm\simd</Filter>
    </ClInclude>
    <ClInclude Include="..\include\cglm\simd\x86.h">
      <Filter>include\cglm\simd</Filter>
    </ClInclude>
    <ClInclude Include="..\include\cglm\call\curve.h">
      <Filter>include\cglm\call</Filter>
    </ClInclude>
    <ClInclude Include="..\include\cglm\curve.h">
      <Filter>include\cglm</Filter>
    </ClInclude>
    <ClInclude Include="..\include\cglm\bezier.h">
      <Filter>include\cglm</Filter>
    </ClInclude>
    <ClInclude Include="..\include\cglm\call\bezier.h">
      <Filter>include\cglm\call</Filter>
    </ClInclude>
  </ItemGroup>
 </Project>
Author	SHA1	Message	Date
Recep Aslantas	bb8ff25752	Revert "mark readonly parameters as const"	2019-04-30 08:19:07 +03:00
Recep Aslantas	98244da67f	Merge pull request #86 from recp/const mark readonly parameters as const	2019-04-29 17:58:51 +03:00
Recep Aslantas	14f06a262f	Merge branch 'master' into const	2019-04-28 21:55:23 +03:00
Recep Aslantas	392565f920	mark readonly parameters as const (continue)	2019-04-28 21:48:19 +03:00
Recep Aslantas	120ae9ae99	build: fix linking cmocka	2019-04-28 19:43:58 +03:00
Recep Aslantas	a5f1ed32af	build: don't link libcmocka if not exists	2019-04-28 19:24:09 +03:00
Recep Aslantas	010e887946	build: don't link libcmocka if not exists	2019-04-28 19:21:13 +03:00
Recep Aslantas	6e501ef1f6	build: don't link libcmocka if not exists	2019-04-28 19:15:55 +03:00
Recep Aslantas	6ed275734b	mark readonly parameters as const	2019-04-28 12:15:43 +03:00
Recep Aslantas	85ca81ce79	Merge pull request #84 from haxpor/fix_82 Resolve vec4 : glm_vec4_distance() to satisfy compiling on armv7	2019-04-27 10:09:44 +03:00
Recep Aslantas	e909c8268d	Merge branch 'master' into fix_82	2019-04-27 09:37:08 +03:00
Recep Aslantas	73e6b65da0	test: fix comparing floats in bezier tests	2019-04-27 09:36:15 +03:00
Recep Aslantas	ecbe36df6b	Merge branch 'master' into fix_82	2019-04-21 00:24:52 +03:00
Recep Aslantas	d85b5234a9	ci: print test logs after failure	2019-04-21 00:19:17 +03:00
Wasin Thonkaew	5b80e0e3c2	test cases for glm_vec4_distance	2019-04-19 03:04:00 +08:00
Wasin Thonkaew	461a4009ba	refactor vec4 : glm_vec4_distance for SSE/SSE2 According to suggestion by recp at https://github.com/recp/cglm/issues/82#issuecomment-483051704.	2019-04-19 02:07:57 +08:00
Wasin Thonkaew	8f2f2c5572	Fix to use armv7 compatible function for glm_vec4_distance Before it used armv8 only function thus it leads to build failed for Android with target of armv7 i.e. armeabi-v7a. This fixed that issue.	2019-04-19 01:47:50 +08:00
Recep Aslantas	81a74ba225	move 'stdbool.h' to common header, add missing common.h header to public headers	2019-03-31 18:58:20 +03:00
Recep Aslantas	6c0c5167b0	docs: fix some parameter docs	2019-03-31 18:53:31 +03:00
Alejandro Coto Gutiérrez	4c5451994f	Include `stddef.h` to ensure `size_t` and other dependent types (#79)	2019-03-29 08:54:09 +03:00
Wasin Thonkaew	73226bd2fd	Fulfill #76 (#77) * Fulfill #76	2019-03-20 09:32:31 +03:00
Recep Aslantas	8fa21a1837	docs: use sphinx_rtd_theme theme for documentation	2019-03-17 09:33:38 +03:00
Recep Aslantas	ee1937f28d	now working on v0.5.4	2019-03-17 09:29:36 +03:00
Recep Aslantas	b4efcefe7f	drop glm__memcpy, glm__memset and glm__memzero * implement mat3_zero and mat4_zero functions * copy matrix items manually in ucopy functions	2019-02-13 10:14:53 +03:00
Recep Aslantas	0d2e5a996a	docs: add SSE3 and SSE4 dot product options	2019-02-13 10:13:06 +03:00
Recep Aslantas	2b1eece9ac	mat3: add rmc for mat3	2019-02-13 10:12:49 +03:00
Recep Aslantas	c8b8f4f6f0	now working on v0.5.3	2019-02-13 10:00:57 +03:00
Recep Aslantas	1a34ffcf4b	Merge pull request #72 from recp/simd-update SIMD update (NEON, SSE3, SSE4) + Features	2019-02-03 17:18:54 +03:00
Recep Aslantas	af088a1059	Merge branch 'master' into simd-update	2019-02-02 15:58:57 +03:00
Recep Aslantas	18f06743ed	build: make automake build silent (less-verbose)	2019-02-02 15:54:09 +03:00
Recep Aslantas	60cfc87009	remove bezier_solve for now	2019-02-02 15:30:05 +03:00
Recep Aslantas	4e5879497e	update docs	2019-02-02 15:29:48 +03:00
Recep Aslantas	7848dda1dd	curve: cubic hermite interpolation	2019-01-29 22:17:44 +03:00
Recep Aslantas	1e121a4855	mat4: fix rmc multiplication	2019-01-29 22:11:04 +03:00
Recep Aslantas	0f223db7d3	Merge pull request #74 from ccworld1000/patch-1 Update cglm.podspec	2019-01-29 14:48:46 +03:00
CC	a4e2c39c1d	Update cglm.podspec update pod version	2019-01-29 16:54:02 +08:00
Recep Aslantas	c22231f296	curve: de casteljau implementation for solving cubic bezier	2019-01-28 15:52:42 +03:00
Recep Aslantas	730cb1e9f7	add bezier helpers	2019-01-28 15:32:24 +03:00
Recep Aslantas	b0e48a56ca	test: rename test_rand_angle() to test_rand()	2019-01-28 15:31:03 +03:00
Recep Aslantas	11a6e4471e	fix vec4_cubic function	2019-01-28 14:26:02 +03:00
Recep Aslantas	60cb4beb0a	curve: helper for calculate result of SMC multiplication	2019-01-26 18:06:26 +03:00
Recep Aslantas	32ddf49756	mat4: helper for row * matrix * column	2019-01-26 18:05:05 +03:00
Recep Aslantas	807d5589b4	call: add missing end guard to call headers	2019-01-26 16:05:11 +03:00
Recep Aslantas	59b9e54879	vec4: helper to fill vec4 as [S^3, S^2, S, 1]	2019-01-26 15:54:10 +03:00
Recep Aslantas	fc7f958167	simd: remove re-load in SSE4 and SSE3	2019-01-25 21:56:17 +03:00
Recep Aslantas	31bb303c55	simd: organise SIMD-functions * optimize dot product	2019-01-24 10:17:49 +03:00
Recep Aslantas	be6aa9a89a	simd: optimize some mat4 operations with neon	2019-01-22 09:39:57 +03:00
Recep Aslantas	f65f1d491b	simd: optimize vec4_distance with sse and neon	2019-01-22 09:23:51 +03:00
Recep Aslantas	f0c2a2984e	simd, neon: add missing neon support for vec4	2019-01-22 09:05:38 +03:00
Recep Aslantas	b117f3bf80	neon: add neon support for most vec4 operations	2019-01-21 23:14:04 +03:00
Recep Aslantas	07e60bd098	cam: extend frustum's far distance helper (#71) * this will help to implement zoom easily	2019-01-16 14:59:58 +03:00
Recep Aslantas	e3d3cd8ab8	now working on v0.5.2	2019-01-15 12:08:54 +03:00
Recep Aslantas	d17c99215d	Update README.md	2018-12-26 09:57:52 +03:00
Recep Aslantas	dc6eb492c1	Merge pull request #70 from recp/vec3-mat3 remove builtin alignment from vec3 and mat3 types	2018-12-26 09:54:48 +03:00
Recep Aslantas	7219b02d23	remove alignment from vec3 and mat3	2018-12-25 10:08:36 +03:00
Recep Aslantas	21834b4ffb	matrix: trace of matrix	2018-12-06 18:17:02 +03:00
Recep Aslantas	2ef9c23a6c	vec: normalize cross product helper	2018-12-06 18:01:52 +03:00
Recep Aslantas	92605f845a	test: fix comparing two float values in tests	2018-12-05 16:34:22 +03:00
Recep Aslantas	b23d65bef5	now working on v0.5.1	2018-12-05 16:32:13 +03:00
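
The bezier test in the diff above checks the closed-form cubic Bezier against a row * matrix * column ("SMC") product built on the `[S^3, S^2, S, 1]` helper. The following is a minimal standalone sketch of that idea, not cglm's implementation: the names `cubic_basis`, `BEZIER_M`, `bezier_smc` and `bezier_plain` are hypothetical, and the matrix is the standard cubic Bezier basis in plain row-major form, which may not match the layout of `GLM_BEZIER_MAT` (that definition is not shown in this part of the diff).

```c
#include <math.h>

/* Fill out[] with [s^3, s^2, s, 1], the same layout glm_vec4_cubic produces. */
void cubic_basis(float s, float out[4]) {
  float ss = s * s;
  out[0] = ss * s;
  out[1] = ss;
  out[2] = s;
  out[3] = 1.0f;
}

/* Standard cubic Bezier basis matrix, row-major.
   Assumption for this sketch only; cglm's own matrix layout may differ. */
const float BEZIER_M[4][4] = {
  {-1.0f,  3.0f, -3.0f, 1.0f},
  { 3.0f, -6.0f,  3.0f, 0.0f},
  {-3.0f,  3.0f,  0.0f, 0.0f},
  { 1.0f,  0.0f,  0.0f, 0.0f}
};

/* Evaluate S * M * C, where S = [s^3, s^2, s, 1] and C = [p0, c0, c1, p1]. */
float bezier_smc(float s, float p0, float c0, float c1, float p1) {
  float S[4];
  float C[4];
  float sum = 0.0f;
  int   i, j;

  C[0] = p0; C[1] = c0; C[2] = c1; C[3] = p1;
  cubic_basis(s, S);

  for (i = 0; i < 4; i++) {
    float w = 0.0f;               /* weight of control value i: (S * M)[i] */
    for (j = 0; j < 4; j++)
      w += S[j] * BEZIER_M[j][i];
    sum += w * C[i];
  }
  return sum;
}

/* Closed-form Bernstein evaluation, same formula as test_bezier_plain. */
float bezier_plain(float s, float p0, float c0, float c1, float p1) {
  float x = 1.0f - s;
  return p0 * x * x * x
       + 3.0f * (c0 * s * x * x + c1 * s * s * x)
       + p1 * s * s * s;
}
```

Both functions should agree within float tolerance for `s` in [0, 1], which is why the tests in this change set compare with `glm_eq` / `test_assert_eqf` rather than `==`.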