[fix] preserve quote status of container keys

This commit is contained in:
Joao Paulo Magalhaes
2022-01-30 01:03:43 +00:00
parent 0adf33a33f
commit 24e5276819
7 changed files with 186 additions and 21 deletions

View File

@@ -5,7 +5,7 @@ This release improves compliance with the [YAML test suite](https://github.com/y
As part of the [new feature to track source locations](https://github.com/biojppm/rapidyaml/pull/168), opportunity was taken to address a number of pre-existing API issues. These changes consisted of:
- Deprecate `c4::yml::parse()` and `c4::yml::Parser::parse()` overloads; all these functions will be shortly removed. Until removal, any call from client code will trigger a compiler warning.
- Deprecate `c4::yml::parse()` and `c4::yml::Parser::parse()` overloads; all these functions will be removed in short order. Until removal, any call from client code will trigger a compiler warning.
- Add `parse()` alternatives, either `parse_in_place()` or `parse_in_arena()`:
- `parse_in_place()` receives only `substr` buffers, ie mutable YAML source buffers. Trying to pass a `csubstr` buffer to `parse_in_place()` will cause a compile error:
```c++
@@ -15,18 +15,18 @@ As part of the [new feature to track source locations](https://github.com/biojpp
csubstr readonly = /*...*/;
Tree tree = parse_in_place(readonly); // compile error
```
- `parse_in_arena()` receives only `csubstr` buffers, ie immutable YAML source buffers. Prior to parsing, the buffer is copied to the tree's arena, then the copy is parsed in place. Because `parse_in_arena()` is meant for immutable buffers, overloads receiving a `substr` YAML buffer are now declared and marked deprecated, and intentionally left undefined, such that calling `parse_in_arena()` with a `substr` will cause a linker error.
- `parse_in_arena()` receives only `csubstr` buffers, ie immutable YAML source buffers. Prior to parsing, the buffer is copied to the tree's arena, then the copy is parsed in place. Because `parse_in_arena()` is meant for immutable buffers, overloads receiving a `substr` YAML buffer are now declared but marked deprecated, and intentionally left undefined, such that calling `parse_in_arena()` with a `substr` will cause a linker error as well as a compiler warning.
```c++
substr readwrite = /*...*/;
Tree tree = parse_in_arena(readwrite); // compile warning+linker error
```
This is to prevent an accidental extra copy of the mutable source buffer to the tree's arena: `substr` is implicitly convertible to `csubstr`. If you really intend to parse an originally mutable buffer in the tree's arena, convert it first to immutable by assigning the `substr` to a `csubstr` prior to calling `parse_in_arena()`:
This is to prevent an accidental extra copy of the mutable source buffer to the tree's arena: `substr` is implicitly convertible to `csubstr`. If you really intend to parse an originally mutable buffer in the tree's arena, convert it first explicitly to immutable by assigning the `substr` to a `csubstr` prior to calling `parse_in_arena()`:
```c++
substr readwrite = /*...*/;
csubstr readonly = readwrite; // ok
Tree tree = parse_in_arena(readonly); // ok
```
This problem is not raised by `parse_in_place()` because `csubstr` is not implicitly convertible to `substr`.
This problem does not occur with `parse_in_place()` because `csubstr` is not implicitly convertible to `substr`.
- In the python API, `ryml.parse()` was removed and not just deprecated; the `parse_in_arena()` and `parse_in_place()` now replace this.
- `Callbacks`: changed behavior in `Parser` and `Tree`:
- When a tree is copy-constructed or move-constructed to another, the receiving tree will start with the callbacks of the original.
@@ -102,8 +102,7 @@ As part of the [new feature to track source locations](https://github.com/biojpp
### Fixes
- Ensure error messages do not wrap around the buffer when the YAML source line is too long ([PR#210](https://github.com/biojppm/rapidyaml/pulls/210)).
- Ensure error is emitted on unclosed flow sequence characters eg `[[[` ([PR#210](https://github.com/biojppm/rapidyaml/pulls/210)). Same thing for `[]]`.
- Ensure container keys preserve quote flags when the key is quoted ([PR#210](https://github.com/biojppm/rapidyaml/pulls/210)).
- Fix [#203](https://github.com/biojppm/rapidyaml/issues/203): when parsing, do not convert `null` or `~` to null scalar strings. Now the scalar strings contain the verbatim contents of the original scalar; to query whether a scalar value is null, use `Tree::key_is_null()/val_is_null()` and `NodeRef::key_is_null()/val_is_null()` which return true if it is empty or any of the unquoted strings `~`, `null`, `Null`, or `NULL`. ([PR#207](https://github.com/biojppm/rapidyaml/pulls/207)):
- Fix [#205](https://github.com/biojppm/rapidyaml/issues/205): fix parsing of escaped characters in double-quoted strings: `"\\\"\n\r\t\<TAB>\/\<SPC>\0\b\f\a\v\e\_\N\L\P"` ([PR#207](https://github.com/biojppm/rapidyaml/pulls/207)).
- Fix [#204](https://github.com/biojppm/rapidyaml/issues/204): add decoding of unicode codepoints `\x` `\u` `\U` in double-quoted scalars:
@@ -120,6 +119,8 @@ As part of the [new feature to track source locations](https://github.com/biojpp
- Fix [#185](https://github.com/biojppm/rapidyaml/issues/185): compilation failures in earlier Xcode versions ([PR #187](https://github.com/biojppm/rapidyaml/pull/187) and [PR c4core#61](https://github.com/biojppm/c4core/pull/61)):
- `c4/substr_fwd.hpp`: (failure in Xcode 12 and earlier) forward declaration for `std::allocator` is inside the `inline namespace __1`, unlike later versions.
- `c4/error.hpp`: (failure in debug mode in Xcode 11 and earlier) `__clang_major__` does not mean the same as in the common clang, and as a result the warning `-Wgnu-inline-cpp-without-extern` does not exist there.
- Ensure error messages do not wrap around the buffer when the YAML source line is too long ([PR#210](https://github.com/biojppm/rapidyaml/pulls/210)).
- Ensure error is emitted on unclosed flow sequence characters eg `[[[` ([PR#210](https://github.com/biojppm/rapidyaml/pulls/210)). Same thing for `[]]`.
### Improvements

View File

@@ -3091,8 +3091,11 @@ void Parser::_start_map(bool as_child)
m_state->node_id = m_tree->append_child(parent_id);
if(has_all(SSCL))
{
type_bits key_quoted = NOTYPE;
if(m_state->flags & SSCL_QUO) // before consuming the scalar
key_quoted |= KEYQUO;
csubstr key = _consume_scalar();
m_tree->to_map(m_state->node_id, key, has_all(SSCL_QUO) ? KEYQUO : NOTYPE);
m_tree->to_map(m_state->node_id, key, key_quoted);
_c4dbgpf("start_map: id=%zd key='%.*s'", m_state->node_id, _c4prsp(m_tree->key(m_state->node_id)));
_write_key_anchor(m_state->node_id);
if( ! m_key_tag.empty())
@@ -3216,8 +3219,11 @@ void Parser::_start_seq(bool as_child)
if(has_all(SSCL))
{
_RYML_CB_ASSERT(m_stack.m_callbacks, m_tree->is_map(parent_id));
csubstr name = _consume_scalar();
m_tree->to_seq(m_state->node_id, name);
type_bits key_quoted = 0;
if(m_state->flags & SSCL_QUO) // before consuming the scalar
key_quoted |= KEYQUO;
csubstr key = _consume_scalar();
m_tree->to_seq(m_state->node_id, key, key_quoted);
_c4dbgpf("start_seq: id=%zd name='%.*s'", m_state->node_id, _c4prsp(m_tree->key(m_state->node_id)));
_write_key_anchor(m_state->node_id);
if( ! m_key_tag.empty())
@@ -3279,18 +3285,22 @@ void Parser::_start_seqimap()
_c4dbgpf("start_seqimap at node=%zu. has_children=%d", m_state->node_id, m_tree->has_children(m_state->node_id));
_RYML_CB_ASSERT(m_stack.m_callbacks, has_all(RSEQ|EXPL));
// create a map, and turn the last scalar of this sequence
// into the key of the map's first child.
// into the key of the map's first child. This scalar was
// understood to be a value in the sequence, but it is
// actually a key of a map, implicitly opened here.
// Eg [val, key: val]
//
// Yep, YAML is crazy.
if(m_tree->has_children(m_state->node_id) && m_tree->has_val(m_tree->last_child(m_state->node_id)))
{
auto prev = m_tree->last_child(m_state->node_id);
_c4dbgpf("has children and last child=%zu has val. saving the scalars, val='%.*s', val='%.*s'", prev, _c4prsp(m_tree->val(prev)), _c4prsp(m_tree->valsc(prev).scalar));
size_t prev = m_tree->last_child(m_state->node_id);
NodeType ty = m_tree->_p(prev)->m_type; // don't use type() because it masks out the quotes
NodeScalar tmp = m_tree->valsc(prev);
_c4dbgpf("has children and last child=%zu has val. saving the scalars, val='%.*s' quoted=%d", prev, _c4prsp(tmp.scalar), ty.is_val_quoted());
m_tree->remove(prev);
_push_level();
_start_map();
_store_scalar(tmp.scalar, m_tree->is_val_quoted(prev));
_store_scalar(tmp.scalar, ty.is_val_quoted());
m_key_anchor = tmp.anchor;
m_key_tag = tmp.tag;
}

View File

@@ -266,12 +266,12 @@ R"('implicit block key' : [
'implicit flow key m' : {key1: val1, key2: val2},
'implicit flow key s' : [val1, val2],
])",
L{N("implicit block key", L{
N(L{N("implicit flow key 1", "value1")}),
N(L{N("implicit flow key 2", "value2")}),
N(L{N("implicit flow key 3", "value3")}),
N(L{N("implicit flow key m", L{N("key1", "val1"), N("key2", "val2")})}),
N(L{N("implicit flow key s", L{N("val1"), N("val2")})}),
L{N(KEYSEQ|KEYQUO, "implicit block key", L{
N(L{N(KEYVAL|KEYQUO, "implicit flow key 1", "value1")}),
N(L{N(KEYVAL|KEYQUO, "implicit flow key 2", "value2")}),
N(L{N(KEYVAL|KEYQUO, "implicit flow key 3", "value3")}),
N(L{N(KEYMAP|KEYQUO, "implicit flow key m", L{N("key1", "val1"), N("key2", "val2")})}),
N(L{N(KEYSEQ|KEYQUO, "implicit flow key s", L{N("val1"), N("val2")})}),
})}),
/* TODO JAVAI 209

View File

@@ -51,6 +51,128 @@ TEST(simple_map, many_unmatched_brackets)
}
}
TEST(simple_map, missing_quoted_key)
{
csubstr yaml = R"(
"top1" :
"key1" : scalar1
'top2' :
'key2' : scalar2
#---
#"top1" :
# "key1" : scalar1
#'top2' :
# 'key2' : scalar2
---
'x2': {'y': z}
---
'x3':
'y': z
---
x4:
'y': z
---
'x5':
'y': z
---
x6:
'y': z
#---
#'x7' : [
# 'y' : z,
# ]
#---
#"x8" :
# "y" : value,
# "x" : value
#"y" :
# "y" : value,
# "x" : value
)";
test_check_emit_check(yaml, [](Tree const &t){
size_t doc = 0;
EXPECT_TRUE(t.docref(doc)["top1"].is_key_quoted());
EXPECT_TRUE(t.docref(doc)["top2"].is_key_quoted());
EXPECT_TRUE(t.docref(doc)["top1"]["key1"].is_key_quoted());
EXPECT_TRUE(t.docref(doc)["top2"]["key2"].is_key_quoted());
//++doc;
//EXPECT_TRUE(t.docref(doc)["top1"].is_key_quoted());
//EXPECT_TRUE(t.docref(doc)["top2"].is_key_quoted());
//EXPECT_TRUE(t.docref(doc)["top1"]["key1"].is_key_quoted());
//EXPECT_TRUE(t.docref(doc)["top2"]["key2"].is_key_quoted());
++doc;
EXPECT_TRUE(t.docref(doc)["x2"].is_key_quoted());
EXPECT_TRUE(t.docref(doc)["x2"]["y"].is_key_quoted());
++doc;
EXPECT_TRUE(t.docref(doc)["x3"].is_key_quoted());
EXPECT_TRUE(t.docref(doc)["x3"]["y"].is_key_quoted());
++doc;
EXPECT_FALSE(t.docref(doc)["x4"].is_key_quoted());
EXPECT_TRUE(t.docref(doc)["x4"]["y"].is_key_quoted());
++doc;
EXPECT_TRUE(t.docref(doc)["x5"].is_key_quoted());
EXPECT_TRUE(t.docref(doc)["y"].is_key_quoted());
++doc;
EXPECT_FALSE(t.docref(doc)["x6"].is_key_quoted());
EXPECT_TRUE(t.docref(doc)["y"].is_key_quoted());
//++doc;
//EXPECT_TRUE(t.docref(doc)["x7"].is_key_quoted());
//EXPECT_TRUE(t.docref(doc)["x7"][0]["y"].is_key_quoted());
//++doc;
//EXPECT_TRUE(t.docref(doc)["x8"].is_key_quoted());
//EXPECT_TRUE(t.docref(doc)["x8"]["y"].is_key_quoted());
//EXPECT_TRUE(t.docref(doc)["x8"]["x"].is_key_quoted());
//EXPECT_TRUE(t.docref(doc)["y"].is_key_quoted());
//EXPECT_TRUE(t.docref(doc)["y"]["y"].is_key_quoted());
//EXPECT_TRUE(t.docref(doc)["y"]["x"].is_key_quoted());
});
}
#ifdef JAVAI
void verify_error_is_reported(csubstr case_name, csubstr yaml, size_t col={})
{
SCOPED_TRACE(case_name);
SCOPED_TRACE(yaml);
Tree tree;
Location loc = {};
loc.col = col;
ExpectError::do_check(&tree, [&](){
parse_in_arena(yaml, &tree);
}, loc);
}
TEST(simple_map, no_map_key_flow)
{
verify_error_is_reported("map key", R"({ first: Sammy, last: Sosa }: foo)", 28u);
}
TEST(simple_map, no_map_key_block)
{
verify_error_is_reported("map key", R"(?
first: Sammy
last: Sosa
:
foo
)");
}
TEST(simple_map, no_seq_key_flow)
{
verify_error_is_reported("seq key", R"([Sammy, Sosa]: foo)", 28u);
}
TEST(simple_map, no_seq_key_block)
{
verify_error_is_reported("map key", R"(?
- Sammy
- Sosa
:
foo
)");
}
#endif
//-----------------------------------------------------------------------------
//-----------------------------------------------------------------------------
//-----------------------------------------------------------------------------

View File

@@ -53,6 +53,39 @@ TEST(simple_seq, many_unmatched_brackets)
}
}
TEST(simple_seq, missing_quoted_key)
{
csubstr yaml = R"(
"top1" :
["0", "1", ]
'top2' :
["0", "1", ]
---
#"top1" :
# - "0"
# - "1"
#'top2' :
# - "0"
# - "1"
)";
test_check_emit_check(yaml, [](Tree const &t){
size_t doc = 0;
EXPECT_TRUE(t.docref(doc)["top1"].is_key_quoted());
EXPECT_TRUE(t.docref(doc)["top2"].is_key_quoted());
EXPECT_TRUE(t.docref(doc)["top1"][0].is_val_quoted());
EXPECT_TRUE(t.docref(doc)["top1"][1].is_val_quoted());
EXPECT_TRUE(t.docref(doc)["top2"][0].is_val_quoted());
EXPECT_TRUE(t.docref(doc)["top2"][1].is_val_quoted());
//++doc;
//EXPECT_TRUE(t.docref(doc)["top1"].is_key_quoted());
//EXPECT_TRUE(t.docref(doc)["top2"].is_key_quoted());
//EXPECT_TRUE(t.docref(doc)["top1"][0].is_val_quoted());
//EXPECT_TRUE(t.docref(doc)["top1"][1].is_val_quoted());
//EXPECT_TRUE(t.docref(doc)["top2"][0].is_val_quoted());
//EXPECT_TRUE(t.docref(doc)["top2"][1].is_val_quoted());
});
}
//-----------------------------------------------------------------------------
//-----------------------------------------------------------------------------

View File

@@ -3,7 +3,6 @@
namespace c4 {
namespace yml {
TEST(double_quoted, test_suite_KSS4)
{
csubstr yaml = R"(

View File

@@ -39,7 +39,7 @@ int main(int argc, const char *argv[])
emit_events(&evt, tree);
std::fwrite(evt.data(), 1, evt.size(), stdout);
}
catch(const std::exception& e)
catch(std::exception const&)
{
return 1;
}