[fix] null values now are pointing at the source buffer, but len==0

Joao Paulo Magalhaes
2022-01-16 20:03:15 +00:00
parent 2a1e39f80a
commit 2707b25d79
7 changed files with 281 additions and 117 deletions


@@ -940,7 +940,7 @@ status, as this is subject to ongoing work.
## Known limitations
ryml makes no effort to follow the standard in the following situations:
ryml deliberately makes no effort to follow the standard in the following situations:
* `%YAML` directives have no effect and are ignored.
* `%TAG` directives have no effect and are ignored. All schemas are assumed
@@ -962,9 +962,8 @@ ryml makes no effort to follow the standard in the following situations:
* Tabs after `:` or `-` are not supported. YAML test suite cases:
[6BCT](https://github.com/yaml/yaml-test-suite/tree/main/src/6BCT.yaml),
[J3BT](https://github.com/yaml/yaml-test-suite/tree/main/src/J3BT.yaml).
* Containers are not accepted as mapping keys. Keys must be
scalar strings and cannot be mappings or sequences. But mapping
values can be any of the above. YAML test suite cases:
* Containers are not accepted as mapping keys: keys must be
scalar strings. YAML test suite cases:
[4FJ6](https://github.com/yaml/yaml-test-suite/tree/main/src/4FJ6.yaml),
[6BFJ](https://github.com/yaml/yaml-test-suite/tree/main/src/6BFJ.yaml),
[6PBE](https://github.com/yaml/yaml-test-suite/tree/main/src/6PBE.yaml),
@@ -978,38 +977,70 @@ ryml makes no effort to follow the standard in the following situations:
[X38W](https://github.com/yaml/yaml-test-suite/tree/main/src/X38W.yaml),
[XW4D](https://github.com/yaml/yaml-test-suite/tree/main/src/XW4D.yaml).
### Known issues
These issues are in need of attention:
* Due to how the parser works with block sequences and mappings, null values from missing entries are pointing at the next value, and even straddle the comment after it:
```c++
csubstr yaml = R"(
seq:
- ~
- null
-
-
# a comment
-
map:
val0: ~
val1: null
val2:
val3:
# a comment
val4:
)";
Parser p;
Tree t = p.parse_in_arena("file.yml", yaml);
// as expected: (len is zero, str is pointing at the value where the node starts)
EXPECT_EQ(t["seq"][0].val(), nullptr);
EXPECT_EQ(t["seq"][1].val(), nullptr);
EXPECT_EQ(t["seq"][2].val(), nullptr);
EXPECT_EQ(t["seq"][3].val(), nullptr);
EXPECT_EQ(t["seq"][4].val(), nullptr);
EXPECT_EQ(t["map"][0].val(), nullptr);
EXPECT_EQ(t["map"][1].val(), nullptr);
EXPECT_EQ(t["map"][2].val(), nullptr);
EXPECT_EQ(t["map"][3].val(), nullptr);
EXPECT_EQ(t["map"][4].val(), nullptr);
// standard null values point at the expected location:
EXPECT_EQ(csubstr(t["seq"][0].val().str, 1), csubstr("~"));
EXPECT_EQ(csubstr(t["seq"][1].val().str, 4), csubstr("null"));
EXPECT_EQ(csubstr(t["map"]["val0"].val().str, 1), csubstr("~"));
EXPECT_EQ(csubstr(t["map"]["val1"].val().str, 4), csubstr("null"));
// but empty null values currently point at the NEXT location:
EXPECT_EQ(csubstr(t["seq"][2].val().str, 15), csubstr("-\n # a comment"));
EXPECT_EQ(csubstr(t["seq"][3].val().str, 6), csubstr("-\nmap:"));
EXPECT_EQ(csubstr(t["seq"][4].val().str, 5), csubstr("\nmap:"));
EXPECT_EQ(csubstr(t["map"]["val2"].val().str, 6), csubstr(" val3:"));
EXPECT_EQ(csubstr(t["map"]["val3"].val().str, 6), csubstr(" val4:"));
EXPECT_EQ(csubstr(t["map"]["val4"].val().str, 1), csubstr("val4:\n").sub(5));
```
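The representation these checks rely on (a null value whose `str` still points into the source buffer while its `len` is zero) can be sketched without ryml. This is only an illustration under stated assumptions: `View`, `is_null`, `make_null_at`, `offset_in` and `example_offset` are hypothetical names, not part of ryml's API, and the null test is assumed to be "no pointer or no characters".

```c++
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a non-owning string view; this is NOT ryml's csubstr.
struct View
{
    const char *str;
    std::size_t len;
};

// Assumption: a view counts as null when it has no pointer or no characters.
inline bool is_null(View v)
{
    return v.str == nullptr || v.len == 0;
}

// Build a zero-length (null) view anchored at a position inside `buf`.
// The position may be one past the last character, but never outside the buffer.
inline View make_null_at(View buf, const char *pos)
{
    assert(pos >= buf.str && pos <= buf.str + buf.len);
    return View{pos, 0};
}

// Offset of a view relative to the start of the buffer it points into.
inline std::size_t offset_in(View buf, View v)
{
    return static_cast<std::size_t>(v.str - buf.str);
}

// The value of `foo` in "{foo: , bar: 1}" is missing, so it is stored as a
// null view anchored right after the colon, where the value would have started.
inline std::size_t example_offset()
{
    static const char src[] = "{foo: , bar: 1}";
    View buf{src, sizeof(src) - 1};
    View val = make_null_at(buf, src + 5); // the text from here on is " , bar: 1}"
    return is_null(val) ? offset_in(buf, val) : buf.len;
}
```

With such a view, client code can still treat the value as null while recovering where in the source the missing value sits, for example to report a line and column in an error message.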
------
## Alternative libraries
Why this library? Because none of the existing libraries was quite what I
wanted. There are two C/C++ libraries that I know of:
Why this library? Because none of the existing libraries was quite
what I wanted. When I started this project, I was aware of these two
alternative C/C++ libraries:
* [libyaml](https://github.com/yaml/libyaml)
* [yaml-cpp](https://github.com/jbeder/yaml-cpp)
* [libyaml](https://github.com/yaml/libyaml). This is a bare C library. It does not create a representation of the data tree, so I don't see it as practical. My initial idea was to wrap parsing and emitting around libyaml's convenient event handling, but to my surprise I found out it makes heavy use of allocations and string duplications when parsing. I briefly considered sending PRs to reduce these allocation needs, but not having a permanent tree to store the parsed data was too much of a downside.
* [yaml-cpp](https://github.com/jbeder/yaml-cpp). This library may be full of functionality, but is heavy on the use of node-pointer-based structures like `std::map`, allocations, string copies, polymorphism and slow C++ stream serializations. This is generally a sure way of making your code slower, and strong evidence of this can be seen in the benchmark results above.
The standard [libyaml](https://github.com/yaml/libyaml) is a bare C
library. It does not create a representation of the data tree, so it can't
qualify as practical. My initial idea was to wrap parsing and emitting around
libyaml, but to my surprise I found out it makes heavy use of allocations and
string duplications when parsing. I briefly pondered on sending PRs to reduce
these allocation needs, but not having a permanent tree to store the parsed
data was too much of a downside.
When performance and low latency are important, using contiguous structures for better cache behavior and to prevent the library from trampling over the client's caches, parsing in place and using non-owning strings is of central importance. Hence this Rapid YAML library which, with minimal compromise, bridges the gap from efficiency to usability. This library takes inspiration from [RapidJSON](https://github.com/Tencent/rapidjson) and [RapidXML](http://rapidxml.sourceforge.net/).
[yaml-cpp](https://github.com/jbeder/yaml-cpp) is full of functionality, but
is heavy on the use of node-pointer-based structures like `std::map`,
allocations, string copies and slow C++ stream serializations. This is
generally a sure way of making your code slower, and strong evidence of this
can be seen in the benchmark results above.
When performance and low latency are important, using contiguous structures
for better cache behavior and to prevent the library from trampling over the
client's caches, parsing in place and using non-owning strings is of central
importance. Hence this Rapid YAML library which, with minimal compromise,
bridges the gap from efficiency to usability. This library takes inspiration
from [RapidJSON](https://github.com/Tencent/rapidjson)
and [RapidXML](http://rapidxml.sourceforge.net/).
Recently [libfyaml](https://github.com/pantoniou/libfyaml) appeared. This is a newer C library which does offer the tree as a data structure, and is still generally slower than ryml by a factor somewhere between 2x and 3x.
------


@@ -4,7 +4,6 @@
#ifndef _C4_YML_EMIT_HPP_
#include "c4/yml/emit.hpp"
#endif
#include "c4/yml/detail/parser_dbg.hpp"
namespace c4 {
namespace yml {
@@ -290,7 +289,6 @@ void Emitter<Writer>::_write_scalar_block(csubstr s, size_t ilevel, bool explici
RYML_ASSERT(s.find("\r") == csubstr::npos);
csubstr trimmed = s.trimr('\n');
size_t numnewlines_at_end = s.len - trimmed.len;
_c4dbgpf("numnl=%zu s=[%zu]~~~%.*s~~~ trimmed=[%zu]~~~%.*s~~~", numnewlines_at_end, s.len, _c4prsp(s), trimmed.len, _c4prsp(trimmed));
if(numnewlines_at_end == 0)
{
this->Writer::_do_write("|-\n");
@@ -344,16 +342,9 @@ void Emitter<Writer>::_write_scalar(csubstr s, bool was_quoted)
// this block of code needed to be moved to before the needs_quotes
// assignment to workaround a g++ optimizer bug where (s.str != nullptr)
// was evaluated as true even if s.str was actually a nullptr (!!!)
if(s.len == 0)
if(s == nullptr)
{
if(s.str != nullptr)
{
this->Writer::_do_write("''");
}
else
{
this->Writer::_do_write('~');
}
this->Writer::_do_write('~');
return;
}


@@ -533,7 +533,7 @@ bool Parser::_handle_unk()
_move_key_tag_to_val_tag();
_push_level();
_start_map(start_as_child);
_store_scalar("", false);
_store_scalar_null(rem.str);
addrem_flags(RVAL, RKEY);
_save_indentation();
_line_progressed(2);
@@ -546,7 +546,7 @@ bool Parser::_handle_unk()
_move_key_tag_to_val_tag();
_push_level();
_start_map(start_as_child);
_store_scalar("", false);
_store_scalar_null(rem.str);
addrem_flags(RVAL, RKEY);
_save_indentation();
_line_progressed(1);
@@ -837,14 +837,14 @@ bool Parser::_handle_seq_expl()
else if(rem.begins_with(", "))
{
_c4dbgp("found ',' -- the value was null");
_append_val_null();
_append_val_null(rem.str - 1);
_line_progressed(2);
return true;
}
else if(rem.begins_with(','))
{
_c4dbgp("found ',' -- the value was null");
_append_val_null();
_append_val_null(rem.str - 1);
_line_progressed(1);
return true;
}
@@ -1099,7 +1099,7 @@ bool Parser::_handle_seq_impl()
_move_val_anchor_to_key_anchor();
_push_level();
_start_map();
_store_scalar({}, /*is_quoted*/false);
_store_scalar_null(rem.str);
addrem_flags(RVAL, RKEY);
RYML_CHECK(_maybe_set_indentation_from_anchor_or_tag()); // one of them must exist
_line_progressed(rem.begins_with(": ") ? 2u : 1u);
@@ -1111,7 +1111,7 @@ bool Parser::_handle_seq_impl()
addrem_flags(RNXT, RVAL); // before _push_level!
_push_level();
_start_map();
_store_scalar({}, /*is_quoted*/false);
_store_scalar_null(rem.str);
addrem_flags(RVAL, RKEY);
_c4dbgpf("set indentation from map anchor: %zu", m_state->indref + 2);
_set_indentation(m_state->indref + 2); // this is the column where the map starts
@@ -1139,7 +1139,7 @@ bool Parser::_rval_dash_start_or_continue_seq()
{
_c4dbgp("prev val was empty");
addrem_flags(RNXT, RVAL);
_append_val_null();
_append_val_null(&m_state->line_contents.full[ind]);
return false;
}
_c4dbgp("val is a nested seq, indented");
@@ -1180,7 +1180,7 @@ bool Parser::_handle_map_expl()
if(has_all(SSCL))
{
_c4dbgp("the last val was null");
_append_key_val_null();
_append_key_val_null(rem.str - 1);
rem_flags(RVAL);
}
_pop_level();
@@ -1249,7 +1249,7 @@ bool Parser::_handle_map_expl()
if(!has_all(SSCL))
{
_c4dbgp("no key was found, defaulting to empty key ''");
_store_scalar("", false);
_store_scalar_null(rem.str);
}
return true;
}
@@ -1261,7 +1261,7 @@ bool Parser::_handle_map_expl()
if(!has_all(SSCL))
{
_c4dbgp("no key was found, defaulting to empty key ''");
_store_scalar("", false);
_store_scalar_null(rem.str);
}
return true;
}
@@ -1275,7 +1275,7 @@ bool Parser::_handle_map_expl()
else if(rem.begins_with(','))
{
_c4dbgp("prev scalar was a key with null value");
_append_key_val_null();
_append_key_val_null(rem.str - 1);
_line_progressed(1);
return true;
}
@@ -1284,7 +1284,7 @@ bool Parser::_handle_map_expl()
_c4dbgp("map terminates after a key...");
_RYML_CB_ASSERT(m_stack.m_callbacks, has_all(SSCL));
_c4dbgp("the last val was null");
_append_key_val_null();
_append_key_val_null(rem.str - 1);
rem_flags(RVAL);
if(has_all(RSEQIMAP))
{
@@ -1311,7 +1311,7 @@ bool Parser::_handle_map_expl()
else if(rem.begins_with('}'))
{
_c4dbgp("the last val was null");
_append_key_val_null();
_append_key_val_null(rem.str - 1);
_line_progressed(1);
return true;
}
@@ -1328,7 +1328,7 @@ bool Parser::_handle_map_expl()
if(!has_all(SSCL))
{
_c4dbgp("no key was found, defaulting to empty key ''");
_store_scalar("", false);
_store_scalar_null(rem.str);
}
return true;
}
@@ -1397,7 +1397,7 @@ bool Parser::_handle_map_expl()
else if(rem.begins_with(','))
{
_c4dbgp("appending empty val");
_append_key_val_null();
_append_key_val_null(rem.str - 1);
addrem_flags(RKEY, RVAL);
_line_progressed(1);
if(has_any(RSEQIMAP))
@@ -1413,7 +1413,7 @@ bool Parser::_handle_map_expl()
_c4dbgp("stopping implicitly nested 1x map");
if(has_any(SSCL))
{
_append_key_val_null();
_append_key_val_null(rem.str - 1);
}
_stop_seqimap();
_pop_level();
@@ -1473,12 +1473,12 @@ bool Parser::_handle_map_impl()
{
_c4dbgpf("it's a%s scalar", is_quoted ? " quoted" : "");
if(has_all(CPLX|SSCL))
_append_key_val_null();
_append_key_val_null(rem.str - 1);
_store_scalar(rem, is_quoted);
if(has_all(CPLX|RSET))
{
_c4dbgp("it's a complex key, so use null value '~'");
_append_key_val_null();
_append_key_val_null(rem.str);
}
rem = m_state->line_contents.rem;
@@ -1512,7 +1512,7 @@ bool Parser::_handle_map_impl()
add_flags(CPLX);
_line_progressed(2);
if(has_any(SSCL))
_append_key_val_null();
_append_key_val_null(rem.str - 1);
return true;
}
else if(has_all(CPLX) && rem.begins_with(':'))
@@ -1536,7 +1536,7 @@ bool Parser::_handle_map_impl()
if(!has_all(SSCL))
{
_c4dbgp("key was empty...");
_store_scalar("", false);
_store_scalar_null(rem.str);
}
addrem_flags(RVAL, RKEY);
_line_progressed(2);
@@ -1548,7 +1548,7 @@ bool Parser::_handle_map_impl()
if(!has_all(SSCL))
{
_c4dbgp("key was empty...");
_store_scalar("", false);
_store_scalar_null(rem.str);
}
addrem_flags(RVAL, RKEY);
_line_progressed(1);
@@ -1772,7 +1772,7 @@ bool Parser::_handle_key_anchors_and_refs()
{
_RYML_CB_ASSERT(m_stack.m_callbacks, has_any(RKEY));
_c4dbgp("there is a stored key, so this anchor is for the next element");
_append_key_val_null();
_append_key_val_null(rem.str - 1);
rem_flags(CPLX);
return true;
}
@@ -1962,10 +1962,11 @@ bool Parser::_handle_types()
{
_RYML_CB_ASSERT(m_stack.m_callbacks, has_any(RKEY));
_c4dbgp("there is a stored key, so this tag is for the next element");
_append_key_val_null();
_append_key_val_null(rem.str - 1);
rem_flags(CPLX);
}
const char *tag_beginning = rem.str;
size_t tag_indentation = m_state->line_contents.current_col(t);
_c4dbgpf("there was a tag: '%.*s', indentation=%zu", _c4prsp(t), tag_indentation);
_RYML_CB_ASSERT(m_stack.m_callbacks, t.end() > m_state->line_contents.rem.begin());
@@ -1994,8 +1995,8 @@ bool Parser::_handle_types()
if(rem == ':' || rem.begins_with(": "))
{
_c4dbgp("the last val was null, and this is a tag from a null key");
_append_key_val_null();
_store_scalar_null();
_append_key_val_null(tag_beginning - 1);
_store_scalar_null(rem.str - 1);
// do not change the flag to key, it is ~
_RYML_CB_ASSERT(m_stack.m_callbacks, rem.begin() > m_state->line_contents.rem.begin());
size_t token_len = rem == ':' ? 1 : 2;
@@ -2340,7 +2341,7 @@ bool Parser::_scan_scalar(csubstr *C4_RESTRICT scalar, bool *C4_RESTRICT quoted)
if(s == '~' || s == "null" || s == "Null" || s == "NULL")
{
_c4dbgpf("scalar was '%.*s', so use {}", _c4prsp(s));
s = {};
s.len = 0u;
}
*scalar = s;
@@ -2987,7 +2988,7 @@ void Parser::_end_stream()
else if(m_tree->is_map(m_state->node_id))
{
_c4dbgp("append null key val...");
added = _append_key_val_null();
added = _append_key_val_null(m_state->line_contents.rem.str);
if(has_any(RSEQIMAP))
{
_stop_seqimap();
@@ -3008,7 +3009,7 @@ void Parser::_end_stream()
}
else if(has_all(RSEQ|RVAL) && has_none(EXPL))
{
added = _append_val_null();
added = _append_val_null(m_state->line_contents.rem.str);
}
if(added)
@@ -3111,10 +3112,12 @@ void Parser::_start_map(bool as_child)
m_tree->to_map(m_state->node_id);
_c4dbgpf("start_map: id=%zd", m_state->node_id);
}
m_tree->_p(m_state->node_id)->m_val.scalar.str = m_state->line_contents.rem.str;
_write_val_anchor(m_state->node_id);
}
else
{
_RYML_CB_ASSERT(m_stack.m_callbacks, parent_id != NONE);
m_state->node_id = parent_id;
_c4dbgpf("start_map: id=%zd", m_state->node_id);
type_bits as_doc = 0;
@@ -3133,15 +3136,13 @@ void Parser::_start_map(bool as_child)
if(m_key_anchor.not_empty())
m_key_anchor_was_before = true;
_write_val_anchor(parent_id);
if(parent_id != NONE)
if(m_stack.size() >= 2)
{
if(m_stack.size() >= 2)
{
State const& parent_state = m_stack.top(1);
if(parent_state.flags & RSET)
add_flags(RSET);
}
State const& parent_state = m_stack.top(1);
if(parent_state.flags & RSET)
add_flags(RSET);
}
m_tree->_p(parent_id)->m_val.scalar.str = m_state->line_contents.rem.str;
}
if( ! m_val_tag.empty())
{
@@ -3188,13 +3189,13 @@ void Parser::_stop_map()
{
_c4dbgpf("stop_map[%zu]: RVAL", m_state->node_id);
if(!has_all(SSCL))
_store_scalar({}, /*is_quoted*/false);
_append_key_val_null();
_store_scalar_null(m_state->line_contents.rem.str);
_append_key_val_null(m_state->line_contents.rem.str);
}
else if(has_all(CPLX|RKEY))
{
_store_scalar({}, /*is_quoted*/false);
_append_key_val_null();
_store_scalar_null(m_state->line_contents.rem.str);
_append_key_val_null(m_state->line_contents.rem.str);
}
}
@@ -3240,6 +3241,7 @@ void Parser::_start_seq(bool as_child)
_c4dbgpf("start_seq: id=%zd%s", m_state->node_id, as_doc ? " as doc" : "");
}
_write_val_anchor(m_state->node_id);
m_tree->_p(m_state->node_id)->m_val.scalar.str = m_state->line_contents.rem.str;
}
else
{
@@ -3259,6 +3261,7 @@ void Parser::_start_seq(bool as_child)
_move_scalar_from_top();
_c4dbgpf("start_seq: id=%zd%s", m_state->node_id, as_doc ? " as_doc" : "");
_write_val_anchor(parent_id);
m_tree->_p(parent_id)->m_val.scalar.str = m_state->line_contents.rem.str;
}
if( ! m_val_tag.empty())
{
@@ -3301,7 +3304,7 @@ void Parser::_start_seqimap()
_c4dbgpf("node %zu has no children yet, using empty key", m_state->node_id);
_push_level();
_start_map();
_store_scalar("", false);
_store_scalar_null(m_state->line_contents.rem.str);
}
add_flags(RSEQIMAP|EXPL);
}
@@ -3426,7 +3429,7 @@ bool Parser::_handle_indentation()
{
if(has_all(RMAP))
{
_append_key_val_null();
_append_key_val_null(rem.sub(ind).str - 1);
addrem_flags(RKEY, RVAL);
}
else if(has_all(RSEQ))
@@ -3464,12 +3467,12 @@ bool Parser::_handle_indentation()
if(has_all(RMAP))
{
_RYML_CB_ASSERT(m_stack.m_callbacks, has_all(SSCL));
_append_key_val_null();
_append_key_val_null(rem.sub(ind).str - 1);
}
else if(has_all(RSEQ))
{
_RYML_CB_ASSERT(m_stack.m_callbacks, has_none(SSCL));
_append_val_null();
_append_val_null(rem.sub(ind).str - 1);
}
}
// search the stack frame to jump to based on its indentation


@@ -343,15 +343,16 @@ private:
NodeData* _append_val(csubstr val, bool quoted=false);
NodeData* _append_key_val(csubstr val, bool val_quoted=false);
inline NodeData* _append_val_null() { return _append_val({}/*"~"*/); }
inline NodeData* _append_key_val_null() { return _append_key_val({}/*"~"*/); }
bool _rval_dash_start_or_continue_seq();
void _store_scalar(csubstr const& s, bool is_quoted);
void _store_scalar_null() { _store_scalar({}/*"~"*/, false); }
csubstr _consume_scalar();
void _move_scalar_from_top();
inline NodeData* _append_val_null(const char *str) { _RYML_CB_ASSERT(m_stack.m_callbacks, str >= m_buf.begin() && str <= m_buf.end()); return _append_val({str, size_t(0)}); }
inline NodeData* _append_key_val_null(const char *str) { _RYML_CB_ASSERT(m_stack.m_callbacks, str >= m_buf.begin() && str <= m_buf.end()); return _append_key_val({str, size_t(0)}); }
inline void _store_scalar_null(const char *str) { _RYML_CB_ASSERT(m_stack.m_callbacks, str >= m_buf.begin() && str <= m_buf.end()); _store_scalar({str, size_t(0)}, false); }
void _set_indentation(size_t behind);
void _save_indentation(size_t behind=0);
bool _maybe_set_indentation_from_anchor_or_tag();


@@ -3,6 +3,7 @@
#define C4_RYML_TEST_GROUP_HPP_
#include "./test_case.hpp"
#include "c4/yml/detail/parser_dbg.hpp"
#include "c4/span.hpp"
#if defined(_MSC_VER)


@@ -1,64 +1,137 @@
#include "./test_group.hpp"
#include "c4/error.hpp"
namespace c4 {
namespace yml {
C4_SUPPRESS_WARNING_GCC_WITH_PUSH("-Wuseless-cast")
csubstr getafter(csubstr yaml, csubstr pattern)
{
size_t pos = yaml.find(pattern);
RYML_ASSERT(pos != npos);
RYML_ASSERT(yaml.sub(pos).begins_with(pattern));
return yaml.sub(pos + pattern.len);
}
#define _check_null_pointing_at(expr, pattern, arena) \
do \
{ \
EXPECT_EQ(expr, nullptr); \
EXPECT_EQ(expr.len, 0u); \
EXPECT_NE(expr.str, nullptr); \
EXPECT_GE(expr.str, arena.begin()); \
EXPECT_LT(expr.str, arena.end()); \
size_t exprpos = (expr.str - arena.begin()); \
EXPECT_TRUE(arena.sub(exprpos).begins_with(pattern)); \
ASSERT_GE(arena.sub(exprpos).len, csubstr(pattern).len); \
EXPECT_EQ(arena.sub(exprpos).first(csubstr(pattern).len), csubstr(pattern)); \
} while(0)
TEST(null_val, simple)
{
auto tree = parse_in_arena("{foo: , bar: '', baz: [,,,], bat: [ , , , ], two: [,,], one: [,], empty: []}");
EXPECT_EQ(tree["foo"].val(), nullptr);
EXPECT_EQ(tree["bar"].val(), "");
Tree tree = parse_in_arena("{foo: , bar: '', baz: [,,,], bat: [ , , , ], two: [,,], one: [,], empty: []}");
_check_null_pointing_at(tree["foo"].val(), " ,", tree.arena());
ASSERT_EQ(tree["baz"].num_children(), 3u);
EXPECT_EQ(tree["baz"][0].val(), nullptr);
EXPECT_EQ(tree["baz"][1].val(), nullptr);
EXPECT_EQ(tree["baz"][2].val(), nullptr);
_check_null_pointing_at(tree["baz"][0].val(), "[,,,]", tree.arena());
_check_null_pointing_at(tree["baz"][1].val(), ",,,]", tree.arena());
_check_null_pointing_at(tree["baz"][2].val(), ",,]", tree.arena());
ASSERT_EQ(tree["bat"].num_children(), 3u);
EXPECT_EQ(tree["bat"][0].val(), nullptr);
EXPECT_EQ(tree["bat"][1].val(), nullptr);
EXPECT_EQ(tree["bat"][2].val(), nullptr);
_check_null_pointing_at(tree["bat"][0].val(), " , , , ]", tree.arena());
_check_null_pointing_at(tree["bat"][1].val(), " , , ]", tree.arena());
_check_null_pointing_at(tree["bat"][2].val(), " , ]", tree.arena());
ASSERT_EQ(tree["two"].num_children(), 2u);
EXPECT_EQ(tree["two"][0].val(), nullptr);
EXPECT_EQ(tree["two"][1].val(), nullptr);
_check_null_pointing_at(tree["two"][0].val(), "[,,]", tree.arena());
_check_null_pointing_at(tree["two"][1].val(), ",,]", tree.arena());
ASSERT_EQ(tree["one"].num_children(), 1u);
EXPECT_EQ(tree["one"][0].val(), nullptr);
EXPECT_EQ(tree["empty"].num_children(), 0u);
_check_null_pointing_at(tree["one"][0].val(), "[,]", tree.arena());
ASSERT_EQ(tree["empty"].num_children(), 0u);
}
TEST(null_val, simple_seq)
TEST(null_val, block_seq)
{
auto tree = parse_in_arena(R"(
# these have no space after the dash
csubstr yaml = R"(
# nospace
-
-
-
# these have ONE space after the dash
# onespace
-
-
-
)");
ASSERT_EQ(tree.rootref().num_children(), 6u);
EXPECT_EQ(tree[0].val(), nullptr);
EXPECT_EQ(tree[1].val(), nullptr);
EXPECT_EQ(tree[2].val(), nullptr);
EXPECT_EQ(tree[3].val(), nullptr);
EXPECT_EQ(tree[4].val(), nullptr);
EXPECT_EQ(tree[5].val(), nullptr);
# null
- null
- null
- null
- ~
)";
ASSERT_EQ(yaml.count('\r'), 0u);
auto after = [yaml](csubstr pattern){ return getafter(yaml, pattern); };
Tree tree = parse_in_arena(yaml);
ASSERT_EQ(tree.rootref().num_children(), 10u);
// FIXME: empty vals in block seqs are pointing at the next item!
_check_null_pointing_at(tree[0].val(), after("nospace\n-\n"), tree.arena());
_check_null_pointing_at(tree[1].val(), after("nospace\n-\n-\n"), tree.arena());
_check_null_pointing_at(tree[2].val(), after("nospace\n-\n-\n-\n# onespace\n"), tree.arena());
_check_null_pointing_at(tree[3].val(), after("onespace\n- \n"), tree.arena());
_check_null_pointing_at(tree[4].val(), after("onespace\n- \n- \n"), tree.arena());
_check_null_pointing_at(tree[5].val(), after("onespace\n- \n- \n- \n# null\n"), tree.arena());
// but explicitly null vals are ok:
_check_null_pointing_at(tree[6].val(), "null\n- null\n- null\n- ~\n", tree.arena());
_check_null_pointing_at(tree[7].val(), "null\n- null\n- ~", tree.arena());
_check_null_pointing_at(tree[8].val(), "null\n- ~\n", tree.arena());
_check_null_pointing_at(tree[9].val(), "~\n", tree.arena());
}
TEST(null_val, block_map)
{
csubstr yaml = R"(
# nospace
val0:
val1:
val2:
# onespace
val3:
val4:
val5:
# null
val6: null
val7: null
val8: null
val9: ~
)";
ASSERT_EQ(yaml.count('\r'), 0u);
auto after = [yaml](csubstr pattern){ return getafter(yaml, pattern); };
Tree tree = parse_in_arena(yaml);
ASSERT_EQ(tree.rootref().num_children(), 10u);
// FIXME: empty vals in block maps are pointing at the next item!
_check_null_pointing_at(tree["val0"].val(), after("val0:"), tree.arena());
_check_null_pointing_at(tree["val1"].val(), after("val1:"), tree.arena());
_check_null_pointing_at(tree["val2"].val(), after("val2:\n# onespace"), tree.arena());
_check_null_pointing_at(tree["val3"].val(), after("val3: "), tree.arena());
_check_null_pointing_at(tree["val4"].val(), after("val4: "), tree.arena());
_check_null_pointing_at(tree["val5"].val(), after("val5: \n# null"), tree.arena());
// but explicitly null vals are ok:
_check_null_pointing_at(tree["val6"].val(), "null\nval7:", tree.arena());
_check_null_pointing_at(tree["val7"].val(), "null\nval8:", tree.arena());
_check_null_pointing_at(tree["val8"].val(), "null\nval9:", tree.arena());
_check_null_pointing_at(tree["val9"].val(), "~\n", tree.arena());
}
TEST(null_val, issue103)
{
C4_SUPPRESS_WARNING_GCC_WITH_PUSH("-Wuseless-cast")
auto tree = parse_in_arena(R"({test: null})");
csubstr yaml = R"({test: null})";
Tree tree = parse_in_arena(yaml);
ASSERT_EQ(tree.size(), 2u);
EXPECT_EQ(tree.root_id(), 0u);
EXPECT_EQ(tree.first_child(0), 1u);
EXPECT_EQ((type_bits)tree.type(1), (type_bits)(KEY|VAL));
EXPECT_EQ(tree.key(1), "test");
EXPECT_EQ(tree.val(1), nullptr);
_check_null_pointing_at(tree.val(1), "null", tree.arena());
tree = parse_in_arena(R"({test: Null})");
ASSERT_EQ(tree.size(), 2u);
@@ -67,6 +140,7 @@ TEST(null_val, issue103)
EXPECT_EQ((type_bits)tree.type(1), (type_bits)(KEY|VAL));
EXPECT_EQ(tree.key(1), "test");
EXPECT_EQ(tree.val(1), nullptr);
_check_null_pointing_at(tree.val(1), "Null", tree.arena());
tree = parse_in_arena(R"({test: NULL})");
ASSERT_EQ(tree.size(), 2u);
@@ -75,6 +149,7 @@ TEST(null_val, issue103)
EXPECT_EQ((type_bits)tree.type(1), (type_bits)(KEY|VAL));
EXPECT_EQ(tree.key(1), "test");
EXPECT_EQ(tree.val(1), nullptr);
_check_null_pointing_at(tree.val(1), "NULL", tree.arena());
tree = parse_in_arena(R"({test: })");
ASSERT_EQ(tree.size(), 2u);
@@ -83,6 +158,7 @@ TEST(null_val, issue103)
EXPECT_EQ((type_bits)tree.type(1), (type_bits)(KEY|VAL));
EXPECT_EQ(tree.key(1), "test");
EXPECT_EQ(tree.val(1), nullptr);
_check_null_pointing_at(tree.val(1), " }", tree.arena());
tree = parse_in_arena(R"({test: ~})");
ASSERT_EQ(tree.size(), 2u);
@@ -91,6 +167,16 @@ TEST(null_val, issue103)
EXPECT_EQ((type_bits)tree.type(1), (type_bits)(KEY|VAL));
EXPECT_EQ(tree.key(1), "test");
EXPECT_EQ(tree.val(1), nullptr);
_check_null_pointing_at(tree.val(1), "~", tree.arena());
tree = parse_in_arena(R"({test: "~"})");
ASSERT_EQ(tree.size(), 2u);
EXPECT_EQ(tree.root_id(), 0u);
EXPECT_EQ(tree.first_child(0), 1u);
EXPECT_EQ((type_bits)tree.type(1), (type_bits)(KEY|VAL));
EXPECT_EQ(tree.key(1), "test");
EXPECT_EQ(tree.val(1), "~");
EXPECT_NE(tree.val(1), nullptr);
tree = parse_in_arena(R"({test: "null"})");
ASSERT_EQ(tree.size(), 2u);
@@ -99,6 +185,7 @@ TEST(null_val, issue103)
EXPECT_EQ((type_bits)tree.type(1), (type_bits)(KEY|VAL));
EXPECT_EQ(tree.key(1), "test");
EXPECT_EQ(tree.val(1), "null");
EXPECT_NE(tree.val(1), nullptr);
tree = parse_in_arena(R"({test: "Null"})");
ASSERT_EQ(tree.size(), 2u);
@@ -107,6 +194,7 @@ TEST(null_val, issue103)
EXPECT_EQ((type_bits)tree.type(1), (type_bits)(KEY|VAL));
EXPECT_EQ(tree.key(1), "test");
EXPECT_EQ(tree.val(1), "Null");
EXPECT_NE(tree.val(1), nullptr);
tree = parse_in_arena(R"({test: "NULL"})");
ASSERT_EQ(tree.size(), 2u);
@@ -115,17 +203,64 @@ TEST(null_val, issue103)
EXPECT_EQ((type_bits)tree.type(1), (type_bits)(KEY|VAL));
EXPECT_EQ(tree.key(1), "test");
EXPECT_EQ(tree.val(1), "NULL");
EXPECT_NE(tree.val(1), nullptr);
C4_SUPPRESS_WARNING_GCC_POP
}
TEST(null_val, null_key)
{
auto tree = parse_in_arena(R"(null: null)");
auto tree = parse_in_arena(R"({null: null})");
ASSERT_EQ(tree.size(), 2u);
EXPECT_EQ(tree[0].key(), nullptr);
EXPECT_EQ(tree[0].val(), nullptr);
_check_null_pointing_at(tree[0].key(), "null: ", tree.arena());
_check_null_pointing_at(tree[0].val(), "null}", tree.arena());
}
TEST(null_val, readme_example)
{
csubstr yaml = R"(
seq:
- ~
- null
-
-
# a comment
-
map:
val0: ~
val1: null
val2:
val3:
# a comment
val4:
)";
Parser p;
Tree t = p.parse_in_arena("file.yml", yaml);
// as expected: (len is zero, str is pointing at the value where the node starts)
EXPECT_EQ(t["seq"][0].val(), nullptr);
EXPECT_EQ(t["seq"][1].val(), nullptr);
EXPECT_EQ(t["seq"][2].val(), nullptr);
EXPECT_EQ(t["seq"][3].val(), nullptr);
EXPECT_EQ(t["seq"][4].val(), nullptr);
EXPECT_EQ(t["map"][0].val(), nullptr);
EXPECT_EQ(t["map"][1].val(), nullptr);
EXPECT_EQ(t["map"][2].val(), nullptr);
EXPECT_EQ(t["map"][3].val(), nullptr);
EXPECT_EQ(t["map"][4].val(), nullptr);
// standard null values point at the expected location:
EXPECT_EQ(csubstr(t["seq"][0].val().str, 1), csubstr("~"));
EXPECT_EQ(csubstr(t["seq"][1].val().str, 4), csubstr("null"));
EXPECT_EQ(csubstr(t["map"]["val0"].val().str, 1), csubstr("~"));
EXPECT_EQ(csubstr(t["map"]["val1"].val().str, 4), csubstr("null"));
// but empty null values currently point at the NEXT location:
EXPECT_EQ(csubstr(t["seq"][2].val().str, 15), csubstr("-\n # a comment"));
EXPECT_EQ(csubstr(t["seq"][3].val().str, 6), csubstr("-\nmap:"));
EXPECT_EQ(csubstr(t["seq"][4].val().str, 5), csubstr("\nmap:"));
EXPECT_EQ(csubstr(t["map"]["val2"].val().str, 6), csubstr(" val3:"));
EXPECT_EQ(csubstr(t["map"]["val3"].val().str, 6), csubstr(" val4:"));
EXPECT_EQ(csubstr(t["map"]["val4"].val().str, 1), csubstr("val4:\n").sub(5));
}
@@ -364,3 +499,5 @@ INSTANTIATE_GROUP(NULL_VAL)
} // namespace yml
} // namespace c4
C4_SUPPRESS_WARNING_GCC_POP