Fix style

Evgeny Gavrin
2015-06-26 20:05:36 +03:00
parent 9b2663a889
commit 772d72c073
6 changed files with 189 additions and 194 deletions
+23 -23
@@ -10,11 +10,11 @@ permalink: /internals/
# High-Level Design
![High-Level Design]({{ site.baseurl }}/img/engines_high_level_design.jpg){: class="thumbnail center-block img-responsive" }
On the diagram above is shown interaction of major components of software system: Parser and Runtime. Parser performs translation of input ECMAScript application into byte-code with specified format (refer to [Bytecode](/internals/#byte-code) and [Parser](/internals/#parser) page for details). Prepared bytecode is executed by Runtime engine that performs interpretation (refer to [Virtual Machine](/internals/#virtual-machine) and [ECMA](/internals/#ECMA) pages for details).
On the diagram above is shown interaction of major components of software system: Parser and Runtime. Parser performs translation of input ECMAScript application into the byte-code with the specified format (refer to [Bytecode](/internals/#byte-code) and [Parser](/internals/#parser) page for details). Prepared bytecode is executed by Runtime engine that performs interpretation (refer to [Virtual Machine](/internals/#virtual-machine) and [ECMA](/internals/#ECMA) pages for details).
# Parser
The parser is implemented as recursive descent parser. The parser does not build any type of Abstract Syntax Tree. It converts source JavaScript code directly into byte-code.
The parser is implemented as recursive descent parser. The parser does not build any type of Abstract Syntax Tree. It converts the source JavaScript code directly into the byte-code.
The parser consists of three major parts:
- lexer
@@ -25,7 +25,7 @@ The parser consists of three major parts:
These four (except the parser itself) components are initialized during `parser_init` call (jerry-core/parser/js/parser.cpp).
This initializer requires two following subsystems to be initialized: memory allocator and serializer. The need for allocator is clear. The serializer resets internal bytecode_data structure(jerry-core/parser/js/bytecode-data.h). Currently bytecode_data is singleton. During parsing it is filled by data which is needed for further execution:
This initializer requires two following subsystems to be initialized: memory allocator and serializer. The need for allocator is clear. The serializer resets internal bytecode_data structure(jerry-core/parser/js/bytecode-data.h). Currently bytecode_data is singleton. During parsing it is filled by the data which is needed for the further execution:
* Byte-code - array of opcodes (`bytecode_data.opcodes`).
* Literals - array of literals (`bytecode_data.literals`).
@@ -46,20 +46,20 @@ After initialization `parser_parse_program` (`./jerry-core/js/parser.cpp`) shoul
1. Initialize a scope.
2. Do pre-parser stage.
3. Parse scope code.
3. Parse the scope code.
After every scope is processed, parser merges all scopes into single byte-code array.
After every scope is processed, parser merges all scopes into the single byte-code array.
Two new entities were introduced - scopes and pre-parser.
* There are two types of scopes in the parser: global scope and function declaration scope. Notice that function expressions do not create a new scope in terms of the parser. A reason why is described below. Parsing process starts on global scope. If a function declaration occurs string the process, new scope is created, this new scope is pushed to a stack of current scopes; then steps 1-3 of parsing are performed. Note, that only global scope parsing shall merge all scopes into a byte-code. All scopes are stored in a tree to represent a hierarchy of them.
* There are two types of scopes in the parser: global scope and function declaration scope. Notice that function expressions do not create a new scope in terms of the parser. The reason why is described below. Parsing process starts on global scope. If a function declaration occurs during the process, a new scope is created and pushed to a stack of current scopes; then steps 1-3 of parsing are performed. Note that only global scope parsing shall merge all scopes into a byte-code. All scopes are stored in a tree to represent their hierarchy.
* Pre-parser. This step performs hoisting of variable declarations. First, it dumps `reg_var_decl` opcodes. Then it goes through the script and looks for variable declaration lists. For every found variable in the scope (not in a sub-scope or function expression) it dumps var_decl opcode. After this step byte-code in the scope starts with optional `'use strict'` marker, then `reg_var_decl` and several (optional) `var_decls`.
Due to some limitations of the parser, some parsing functions take `this_arg` and/or `prop` as parameters. They are further used to dump `prop_setter` opcode. During parsing all necessary data is stored in either stacks or scope trees. After parsing of whole program, the parser merges all scopes into a single byte-code, hoisting function declarations in process. This task, so-called post-parser, is performed by `scopes_tree_raw_data` (jerry-core/js/scopes-tree.c) function. For further information about post-parser, check opcodes dumper section.
Due to some limitations of the parser, some parsing functions take `this_arg` and/or `prop` as parameters. They are further used to dump `prop_setter` opcode. During parsing all necessary data is stored in either stacks or scope trees. After parsing of the whole program, the parser merges all scopes into a single byte-code, hoisting function declarations in process. This task, so-called post-parser, is performed by `scopes_tree_raw_data` (jerry-core/js/scopes-tree.c) function. For the further information about post-parser, check opcodes dumper section.
### Lexer
The lexer splits input string on set of tokens. The token structure (`./jerry-core/parser/js/lexer.h`) consists of three elements: token type, location of the token and optional data:
The lexer splits input string into the set of tokens. The token structure (`./jerry-core/parser/js/lexer.h`) consists of three elements: token type, location of the token and optional data:
{% highlight cpp %}
typedef struct
@@ -71,7 +71,7 @@ typedef struct
token;
{% endhighlight %}
Location of token (`locus`). It is just an index of first token's character at a string that represents the program. Token types are are listed in lexer.h header file (`token_type` enum). Depending on token type, token specific data (`uid` field) has the different meaning.
Location of token (`locus`). It is just an index of the first token's character at a string that represents the program. Token types are listed in lexer.h header file (`token_type` enum). Depending on token type, token specific data (`uid` field) has the different meaning.
<div class="CSSTableGenerator" markdown="block">
@@ -84,11 +84,11 @@ Other (punctuators) | Not used.
</div>
Token matching algorithm is straightforward - look at the first character of new token, recognize the type, and then just match the rest. Comments and space characters (except new line) are ignored, so they produce no token. The algorithm uses two pointers: buffer and token_start. The first one points to the next character of the input, the other one points to the first character of token, being matched, so-called current token.
Token matching algorithm is straightforward - look at the first character of the new token, recognize the type, and then just match the rest. Comments and space characters (except new line) are ignored, so they produce no token. The algorithm uses two pointers: buffer and token_start. The first one points to the next character of the input, the other one points to the first character of token, being matched, so-called current token.
The lexer remembers two tokens during scan: current and previously seen. It also allows buffering one token to be rescanned (`lexer_save_token`) and setting scan position to any location in the file (`lexer_seek`).
The parser uses lexer two scan file two times - during pre-parsing and parsing stages.
The parser uses lexer to scan file two times - during pre-parsing and parsing stages.
Currently the lexer does not support any encoding except ASCII. Also the lexer does not support regular expressions.
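The matching scheme described above can be sketched as follows. This is a minimal illustration, not the code from `lexer.h`: `MiniLexer`, `Tok` and `TokKind` are hypothetical names, indices stand in for the `buffer` and `token_start` pointers, the token carries its text instead of a `uid`, and comment skipping is omitted for brevity. Only `locus` and the save/seek behaviour mirror what the text describes.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical token kinds; the real list is the token_type enum in lexer.h.
enum class TokKind { Name, Number, Punct, Newline, Eof };

struct Tok {
  TokKind kind;
  std::size_t locus;  // index of the token's first character in the source
  std::string text;   // illustrative only; the real token stores a uid
};

class MiniLexer {
 public:
  explicit MiniLexer(std::string src) : src_(std::move(src)) {}

  // Returns the buffered token if one was saved, otherwise scans a new one.
  Tok next() {
    if (has_saved_) {
      has_saved_ = false;
      return saved_;
    }
    // Space characters other than new line produce no token.
    while (buffer_ < src_.size() && src_[buffer_] != '\n' &&
           std::isspace(static_cast<unsigned char>(src_[buffer_]))) {
      ++buffer_;
    }
    token_start_ = buffer_;  // first character of the current token
    if (buffer_ >= src_.size()) return {TokKind::Eof, token_start_, ""};

    // Look at the first character, recognize the type, then match the rest.
    const unsigned char c = static_cast<unsigned char>(src_[buffer_]);
    TokKind kind;
    if (c == '\n') {
      kind = TokKind::Newline;
      ++buffer_;
    } else if (std::isalpha(c) || c == '_' || c == '$') {
      kind = TokKind::Name;
      while (buffer_ < src_.size() && is_name_char(src_[buffer_])) ++buffer_;
    } else if (std::isdigit(c)) {
      kind = TokKind::Number;
      while (buffer_ < src_.size() &&
             std::isdigit(static_cast<unsigned char>(src_[buffer_]))) {
        ++buffer_;
      }
    } else {
      kind = TokKind::Punct;  // single-character punctuators only
      ++buffer_;
    }
    return {kind, token_start_, src_.substr(token_start_, buffer_ - token_start_)};
  }

  // Buffers one token so that the following next() call returns it again.
  void save(const Tok& t) {
    saved_ = t;
    has_saved_ = true;
  }

  // Moves the scan position to an arbitrary location in the source.
  void seek(std::size_t locus) {
    buffer_ = locus;
    has_saved_ = false;
  }

 private:
  static bool is_name_char(char ch) {
    const unsigned char u = static_cast<unsigned char>(ch);
    return std::isalnum(u) || u == '_' || u == '$';
  }

  std::string src_;
  std::size_t buffer_ = 0;       // next character of the input
  std::size_t token_start_ = 0;  // start of the token being matched
  Tok saved_{TokKind::Eof, 0, ""};
  bool has_saved_ = false;
};
```

Scanning the same source twice, as the pre-parsing and parsing stages do, then amounts to calling `seek(0)` between the two passes.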
@@ -110,7 +110,7 @@ The post-parser merges scopes into a single byte-code. For each scope it first d
### Serializer
Serializer dumps literals collected by the lexer to bytecode_data, is used by the dumper to dump or rewrite op_metas to a current scope. There is no much to say about this component.
Serializer dumps literals collected by the lexer to bytecode_data and is used by the dumper to dump or rewrite op_metas to a current scope.
### Syntax Errors Checker
@@ -212,7 +212,7 @@ where
## Function call/Constructor call
Function/constructor call are utilized to perform calls to functions and constructors. Destination operand is encoded in `dst` field. Operand `name_idx` specifies the name of the function to call. Arguments are encoded the same way as in native call instruction.
Function/constructor call are utilized to perform calls to functions and constructors. Destination operand is encoded in `dst` field. Operand `name_idx` specifies the name of the function to call. Arguments are encoded the same way as in the native call instruction.
<div class="CSSTableGeneratorByte" markdown="block">
@@ -227,7 +227,7 @@ where
## Function declaration
Function declarations are represented by special kind of instructions. Function name and number of arguments are located in `name_idx` and `arg_list` fields respectively.
Function declarations are represented by the special kind of instructions. Function name and number of arguments are located in `name_idx` and `arg_list` fields respectively.
<div class="CSSTableGeneratorByte" markdown="block">
@@ -237,7 +237,7 @@ Function declarations are represented by special kind of instructions. Function
where
`name_idx` - literal idx
`arg_list` - namber of arguments
`arg_list` - number of arguments
## Function expression
@@ -292,7 +292,7 @@ Meta instructions are usually utilized as continuations of other instructions. D
## Delete
JavaScript delete operator is modeled with delete instruction in the bytecode. There are two types of delete instruction, applied either to element of lexical environment or to object's property.
JavaScript delete operator is represented with delete instruction in the bytecode. There are two types of delete instruction, applied either to element of lexical environment or to object's property.
<div class="CSSTableGeneratorByte" markdown="block">
@@ -388,7 +388,7 @@ where
## Object declaration
Obect declaration instruction represents object literal in JavaScript specification. It consists of `op_obj_decl` instruction, followed by a list of `prop_data`, `prop_getter` and `prop_setter` meta instructions. A series of instructions which evaluate property values can precede meta instructions. Number of meta instructions, e.g. number of properties, is specified in the `prop_num` field.
Object declaration instruction represents object literal in JavaScript specification. It consists of `op_obj_decl` instruction, followed by the list of `prop_data`, `prop_getter` and `prop_setter` meta instructions. A series of instructions which evaluate property values can precede meta instructions. Number of meta instructions, i.e. number of properties, is specified in the `prop_num` field.
<div class="CSSTableGeneratorByte" markdown="block">
@@ -446,7 +446,7 @@ Instruction can have up to three operands which are represented by `idx` values.
- type of meta and corresponding arguments in `op_meta`
- idx pair may represent opcode position
During execution every function of the source code has associated
During the execution every function of the source code has associated
interpreter context, which consists of the following items:
- current position (byte-code instruction to execute)
@@ -467,7 +467,7 @@ Main routines of the virtual machine are:
# ECMA
ECMA component of the engine is responsible for following notions:
ECMA component of the engine is responsible for the following notions:
- Data representation
- Runtime representation
- GC
@@ -559,11 +559,11 @@ Header occupies 8 bytes and consists of:
- compressed pointer to the next chunk
- number of elements
- rest space, aligned down to byte, is for first chunk of data in collection
- rest space, aligned down to byte, is for the first chunk of data in collection
Chunk's layout is following:
- compressed pointer to next chunk
- compressed pointer to the next chunk
- rest space, aligned down to byte, is for data stored in corresponding part of the collection
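The header and chunk layouts above can be sketched as two fixed-size structures linked by compressed pointers. The widths here are assumptions made for illustration only: a 16-bit compressed pointer, a 16-bit element count, 8-byte chunks and byte-sized elements. `Pool`, `make_collection` and `read_collection` are hypothetical helpers, with a compressed pointer modelled as a 1-based index into a chunk pool.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

using cpointer_t = std::uint16_t;    // assumed compressed pointer width
constexpr cpointer_t kNullCp = 0;    // assumed "no next chunk" value
constexpr std::size_t kChunkSize = 8;

struct CollectionHeader {
  cpointer_t next_chunk_cp;   // compressed pointer to the next chunk
  std::uint16_t unit_number;  // number of elements
  std::uint8_t data[kChunkSize - sizeof(cpointer_t) - sizeof(std::uint16_t)];
};

struct CollectionChunk {
  cpointer_t next_chunk_cp;   // compressed pointer to the next chunk
  std::uint8_t data[kChunkSize - sizeof(cpointer_t)];
};

static_assert(sizeof(CollectionHeader) == kChunkSize, "header is 8 bytes");
static_assert(sizeof(CollectionChunk) == kChunkSize, "chunk is 8 bytes");

// Hypothetical pool: compressed pointer N refers to chunks[N - 1].
struct Pool {
  std::vector<CollectionChunk> chunks;
  cpointer_t alloc() {
    chunks.push_back(CollectionChunk{});
    return static_cast<cpointer_t>(chunks.size());
  }
};

// Stores bytes in the header first, then spills into linked chunks.
CollectionHeader make_collection(Pool& pool, const std::vector<std::uint8_t>& bytes) {
  CollectionHeader h{};
  h.unit_number = static_cast<std::uint16_t>(bytes.size());
  std::size_t pos = std::min(bytes.size(), sizeof h.data);
  std::copy_n(bytes.begin(), pos, h.data);

  cpointer_t prev = kNullCp;  // kNullCp means the link lives in the header
  while (pos < bytes.size()) {
    const cpointer_t cp = pool.alloc();
    if (prev == kNullCp) {
      h.next_chunk_cp = cp;
    } else {
      pool.chunks[prev - 1].next_chunk_cp = cp;
    }
    CollectionChunk& c = pool.chunks[cp - 1];
    const std::size_t n = std::min(bytes.size() - pos, sizeof c.data);
    std::copy_n(bytes.begin() + static_cast<std::ptrdiff_t>(pos), n, c.data);
    pos += n;
    prev = cp;
  }
  return h;
}

// Walks the chunk list and returns the stored elements in order.
std::vector<std::uint8_t> read_collection(const Pool& pool, const CollectionHeader& h) {
  std::vector<std::uint8_t> out;
  std::size_t remaining = h.unit_number;
  std::size_t n = std::min(remaining, sizeof h.data);
  out.insert(out.end(), h.data, h.data + n);
  remaining -= n;
  for (cpointer_t cp = h.next_chunk_cp; cp != kNullCp && remaining > 0;) {
    const CollectionChunk& c = pool.chunks[cp - 1];
    n = std::min(remaining, sizeof c.data);
    out.insert(out.end(), c.data, c.data + n);
    remaining -= n;
    cp = c.next_chunk_cp;
  }
  return out;
}
```

Under these assumptions the header keeps four bytes of data and every further chunk keeps six, so a twelve-element collection needs the header plus two chunks.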
### Internal properties:
@@ -576,7 +576,7 @@ Chunk's layout is following:
- [[Code]] - where to find bytecode of the function
- native code - where to find code of native function
- native handle - some uintptr_t associated with the object
- [[FormalParameters]] - collection of pointers to ecma_string_t (the listof formal parameters of the function)
- [[FormalParameters]] - collection of pointers to ecma_string_t (the list of formal parameters of the function)
- [[PrimitiveValue]] for String - for String object
- [[PrimitiveValue]] for Number - for Number object
- [[PrimitiveValue]] for Boolean - for Boolean object
@@ -597,7 +597,7 @@ Entry of LCache has the following layout:
- property name (pointer to string)
- property pointer
Caches's row is defined by string's hash. When a property access occurs, all row's entries are searched by comparing object pointer and property name to according entry's fields, full comparison is used for property name.
Cache's row is defined by the string's hash. When a property access occurs, all entries of the row are searched by comparing object pointer and property name to the corresponding entry's fields; full comparison is used for property name.
If corresponding entry was found, its property pointer is returned (may be NULL - in case when there is no property with specified name in given object).
Otherwise, object's property set is iterated fully and corresponding record is registered in LCache (with property pointer if it was found or NULL otherwise).
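The lookup procedure above can be sketched as follows. This is a simplified model rather than the engine's implementation: `Object`, `Property`, `LCache`, the row count, the row length and the replacement policy are all assumptions, and the entry stores a copy of the property name where the real entry holds a pointer to a string.

```cpp
#include <array>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Hypothetical stand-ins for the engine's object and property types.
struct Property {
  std::string name;
  int value;
};

struct Object {
  std::vector<Property> properties;
};

struct LCacheEntry {
  bool used = false;
  const Object* object = nullptr;  // object pointer
  std::string name;                // property name (copied in this sketch)
  Property* property = nullptr;    // property pointer, may be null
};

class LCache {
 public:
  static constexpr std::size_t kRows = 8;    // assumed
  static constexpr std::size_t kRowLen = 2;  // assumed

  Property* get(Object& obj, const std::string& name) {
    // The row is selected by the hash of the property name.
    auto& row = rows_[std::hash<std::string>{}(name) % kRows];

    // Search the row, comparing object pointer and full property name.
    for (LCacheEntry& e : row) {
      if (e.used && e.object == &obj && e.name == name) {
        ++hits_;
        return e.property;  // null means "known to be absent"
      }
    }

    // Miss: iterate the object's property set fully.
    Property* found = nullptr;
    for (Property& p : obj.properties) {
      if (p.name == name) {
        found = &p;
        break;
      }
    }

    // Register the result, reusing a free slot or evicting the first one.
    LCacheEntry* slot = &row[0];
    for (LCacheEntry& e : row) {
      if (!e.used) {
        slot = &e;
        break;
      }
    }
    slot->used = true;
    slot->object = &obj;
    slot->name = name;
    slot->property = found;
    return found;
  }

  std::size_t hits() const { return hits_; }

 private:
  std::array<std::array<LCacheEntry, kRowLen>, kRows> rows_{};
  std::size_t hits_ = 0;
};
```

Caching the negative result is what lets repeated lookups of a missing property skip the full scan. A real cache would also have to invalidate entries when an object's properties are added or removed, which this sketch does not attempt.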