
I need to search for fields and values within a text and turn them into an object.

Example of text

// condition 1
<@if VERSION = "A1" || VERSION = "A3">

<@assign CTA = "blue">
<@assign CTA2 = "green">
<@assign TEXT1 = "Hello<br/>World">

<@elseif VERSION = "A2">

<@assign CTA = "red">
<@assign CTA2 = "yellow">
<@assign CTA3 = "brown">
<@assign TEXT1 = "Click <a href='https://example.com' style='text-decoration:none;color:#000000;'>here</a>">

<@else>

<@assign CTA = "black">
<@assign CTA2 = "white">
<@assign CTA3 = "pink">

</@if>

// condition 2
<@if VERSION = "A4" || VERSION = "A5">

<@assign CTA = "purple">
<@assign CTA2 = "orange">
<@assign TEXT1 = "Hi <span style='font-weight:bold;'>John</span>">

</@if>


// condition 3
<@if LANG = "en_US">

<@assign TITLE = "English">

<@else>

<@assign TITLE = "French">

</@if>

If a condition block contains an <@assign>, an object must be constructed from it.

The code I'm trying:

jsonObj = [];

var hidden_text = html_c.replace(/<@IF[\s\S]*?<\/@IF>/gi, function(i) {
  i = i.replace(/<@IF[\s\S]*?>/gi, function(k) {
    var $ogg;
    item = {}

    k = k.replace(/(^(?!.*@IF)|(?<=@IF)).*?((?=\=))/gi, function(x) {
      x = x.replace(/^\s+|\s+$|\s+(?=\s)/g, "");
      item[x] = [];
      $ogg = x;
      return x;
    });
    jsonObj.push(item);

    item2 = {}
    k = k.replace(/"[\s\S]*?"/gi, function(y) {
      item2[y] = [];
      return y;
    });
    item[$ogg].push(item2);

    return k;
  });

  return i;
});

console.log(jsonObj);
<script>
  const html_c = `
<@if VERSION = "A1" || VERSION = "A3">

<@assign CTA = "blue">
<@assign CTA2 = "green">
<@assign TEXT1 = "Hello<br/>World">

<@elseif VERSION = "A2">

<@assign CTA = "red">
<@assign CTA2 = "yellow">
<@assign CTA3 = "brown">
<@assign TEXT1 = "Click <a href='https://example.com' 
style='text-decoration:none;color:#000000;'>here</a>">

</@if>

// condition 2
<@if VERSION = "A4" || VERSION = "A5">

<@assign CTA = "purple">
<@assign CTA2 = "orange">
<@assign TEXT1 = "Hi <span style='font- 
weight:bold;'>John</span>">

</@if>

// condition 3
<@if LANG = "en_US">

<@assign TITLE = "English">

<@else>

<@assign TITLE = "French">

</@if>
`;
</script>

With this code I can create the first part of the object, but I don't know how to continue. Moreover, if more than one condition uses the same field name (e.g. VERSION), a new object is created, whereas I would like the existing one to be updated.

The result I want to get is shown below, considering that:

  1. "VERSION" could have any other name; the script must use whatever name it finds.

  2. The values of the <@assign> variables may contain HTML code.

  3. The if and elseif conditions may also use the double operator ==.

Result for the first condition:

[{
  "VERSION": [{
    "A1": [{
      "CTA": "blue",
      "CTA2": "green",
      "TEXT1": "Hello<br/>World",
    }, ],
    "A3": [{
      "CTA": "blue",
      "CTA2": "green",
      "TEXT1": "Hello<br/>World",
    }, ],
    "A2": [{
      "CTA": "red",
      "CTA2": "yellow",
      "CTA3": "brown",
      "TEXT1": "Click <a href='https://example.com' style='text-decoration:none;color:#000000;'>here</a>",
    }, ],
    "ELSE": [{
      "CTA": "black",
      "CTA2": "white",
      "CTA3": "pink",
    }, ],
  }]
}, ]

Afterwards, if another condition contains '@assign', the object must be updated.

In the case of condition 2, the field 'VERSION' already exists within the object, so it has to be updated by adding the values found:

"A4": [{
  "CTA": "purple",
  "CTA2": "orange",
  "TEXT1": "Hi <span style='font-weight:bold;'>John</span>",
}, ],
"A5": [{
  "CTA": "purple",
  "CTA2": "orange",
  "TEXT1": "Hi <span style='font-weight:bold;'>John</span>",
}, ],

In the case of condition 3, the LANG field does not yet exist in the object and therefore has to be created:

"LANG": [{
  "en_US": [{
    "TITLE": "English",
  }, ],
}]

The variables declared in an optional <@else> update the already existing "ELSE" object of that field.

Final object:

[{
  "VERSION": [{
    "A1": [{
      "CTA": "blue",
      "CTA2": "green",
      "TEXT1": "Hello<br/>World",
    }, ],
    "A3": [{
      "CTA": "blue",
      "CTA2": "green",
      "TEXT1": "Hello<br/>World",
    }, ],
    "A2": [{
      "CTA": "red",
      "CTA2": "yellow",
      "CTA3": "brown",
      "TEXT1": "Click <a href='https://example.com' style='text-decoration:none;color:#000000;'>here</a>",
    }, ],
    "A4": [{
      "CTA": "purple",
      "CTA2": "orange",
      "TEXT1": "Hi <span style='font-weight:bold;'>John</span>",
    }, ],
    "A5": [{
      "CTA": "purple",
      "CTA2": "orange",
      "TEXT1": "Hi <span style='font-weight:bold;'>John</span>",
    }, ],
    "ELSE": [{
      "CTA": "black",
      "CTA2": "white",
      "CTA3": "pink",
    }, ],
  }],
  "LANG": [{
    "en_US": [{
      "TITLE": "English",
    }, ],
    "ELSE": [{
      "TITLE": "French",
    }, ],
  }],
}, ]

UPDATE

Condition 4

If a new condition calls up an existing field and value pair, the object must update. For example:

<@if VERSION = "A1">

<@assign CTA = "black">
<@assign TEXT2 = "Hello world">

</@if>

VERSION: "A1" has already been created in the object, so it needs to be updated:

  • CTA is already present within it, so the value will be replaced with "black"
  • TEXT2 was not yet present, so it will be added
  • Your top-level callback function doesn't return anything, so hidden_text will replace all the matches with undefined. Commented Feb 20, 2024 at 21:40
  • If you don't need to replace anything, use .match(/regexp/gi).forEach() to iterate over the matches. Commented Feb 20, 2024 at 21:41
  • Your repeated reuse of the variable i makes this code really hard to follow. Commented Feb 20, 2024 at 21:44
  • In addition, a viable solution should not be based on regex alone. I would choose an approach which uses regex in order to create a parsable custom markup (e.g. remove any @ from the custom tags and ensure always closed <assign></assign> tags) where one then can parse an html document from. Querying values from a parsed DOM with element and attribute nodes is easier and more reliable. Commented Feb 21, 2024 at 8:17
  • @Vale46 ... Regarding ... "by closing the </assign> tag there would be a conflict in case the value contains html code." ... of course not, because whoever provides the html-markup as an attribute value is responsible for escaping the markup properly. But again, please provide example code for the scenario you just described. Commented Feb 21, 2024 at 14:22

1 Answer


From one of my above comments ...

In addition, a viable solution should not be based on regex alone. I would choose an approach which uses regex in order to create a parsable custom markup (e.g. remove any @ from the custom tags and ensure always closed <assign></assign> tags) where one then can parse an html document from. Querying values from a parsed DOM with element and attribute nodes is easier and more reliable.

In order to create parsable markup one needs to ...

  • remove any @ character from the provided markup's custom tags.

    '<@assign CTA3="brown"></@if>'.replace(/(<\/?)@(?=if|elseif|assign)/g, '$1')
    
  • replace any custom <assign ...> tag with a closed version of itself.

    '<assign CTA="blue"><assign CTA2="green">'.replace(/<assign.*?>/g, '$&</assign>')
    
  • replace any VERSION attribute with a unique 'version' related name, here with a suffix which uses the index of each matched VERSION attribute.

    '<if VERSION = "A1" || VERSION = "A3">'.replace(
      /version\s*=\s*"([^"]+)"(?:\s*\|\|)?/gi,
      (match, capture, idx) => `version-${ idx }="${ capture }"`
    )
    

The result of the above described steps can be passed to a DOMParser's parseFromString method in order to create e.g. an HTML document.

Such a DOM can be queried as usual, for instance via querySelectorAll. If one spreads the retrieved node lists into arrays, one can programmatically create and aggregate the target data structure via nested reduce-based passes.

const markup = `
<@if VERSION = "A1" || VERSION = "A3">

<@assign CTA = "blue">
<@assign CTA2 = "green">

<@elseif VERSION = "A2">

<@assign CTA = "red">
<@assign CTA2 = "yellow">
<@assign CTA3 = "brown">

</@if>`;

const parsableMarkup = createParsableMarkup(markup);
const versionData = parseVersionData(parsableMarkup);

document.querySelector('textarea').value = parsableMarkup;
console.log(versionData);
body { margin: 0; }
.as-console-wrapper { left: auto!important; top: 0; width: 50%; min-height: 100%!important; }
<script>
function createParsableMarkup(markup) {
  return markup

    // see ... [https://regex101.com/r/rxpfho/1]
    .replace(/(<\/?)@(?=if|elseif|assign)/g, '$1')

    // see ... [https://regex101.com/r/rxpfho/2]
    .replace(/<assign.*?>/g, '$&</assign>')

    // see ... [https://regex101.com/r/rxpfho/3]
    .replace(
      /version\s*=\s*"([^"]+)"(?:\s*\|\|)?/gi,
      (match, capture, idx) => `version-${ idx }="${ capture }"`
    )
    .trim();
}

function aggregateAssignmentData(result, assignmentNode) {
  return assignmentNode
    .getAttributeNames()
    .reduce((data, name) => ({

      ...data,
      [ name.toUpperCase() ]: assignmentNode.getAttribute(name),

    }), result);
}

function parseVersionData(markup) {
  const htmlDoc = (new DOMParser)
    .parseFromString(markup, 'text/html');

  return {
    VERSION: [
      ...htmlDoc.querySelectorAll('if, elseif')
    ]
    .reduce((result, versionNode) => {

      const assignmentData = [
        ...versionNode.querySelectorAll(':scope > assign')
      ]
      .reduce(aggregateAssignmentData, {})

      versionNode
        .getAttributeNames()
        .filter(name => name.startsWith('version-'))
        .reduce((versionData, versionKey) => {

          const versionName = versionNode.getAttribute(versionKey);

          // - create the version specific data-entry.
          // - since different versions can initially
          //   share one and the same configuration,
          //   assign a copy of the before created
          //   assignment-data to each such version.
          versionData[versionName] = { ...assignmentData };

          return versionData;          

        }, result);

      return result;        

    }, {}),
  };
}
</script>

<textarea cols="34" rows="12"></textarea>

Edit ... which covers all the additional requirements the OP came up with at a later point.

The additional requirements are, on the one hand, more generic (unknown attribute names) and, on the other hand, more restrictive (attribute values can contain HTML markup). To fulfill them, the above approach has to change in terms of ...

  • 1) how a parsable html markup can be still achieved,
  • 2) how the meta target-structure can be created from unknown attribute names of the conditional custom tags <@if ...> and <@elseif ...>.

Regarding 1), the regex patterns from the first example code above not only need to be adapted; new ones have to be added as well.

One starts by targeting every attribute name/value pair of the originally provided custom markup. The regex used ... /(?<name>[\p{L}\p{N}_-]+)\s*=\s*"(?<value>.*?)(?<!\\)"/gus ... even matches line breaks within a value's content. The string replacement sanitizes each attribute's name-value assignment by removing unnecessary whitespace and line breaks but, most importantly, by escaping the value sequence via encodeURI, which is what enables further regex-based processing in the first place ...

// see ... [https://regex101.com/r/gI0egU/1]
markup.replace(
  /(?<name>[\p{L}\p{N}_-]+)\s*=\s*"(?<value>.*?)(?<!\\)"/gus,
  (match, name, value) =>
    `${ name }="${ encodeURI(value.replace(/\n+/g, '')) }"`
)
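
As a small illustration of what this first step produces, here is a sketch that applies the same pattern to a single tag. The helper name encodeAttributeValues is made up for this example, and the sketch puts a global flag on the inner line-break replacement so that every line break in a value is removed, not just the first run.

```javascript
// Hypothetical helper (not part of the original answer) wrapping the regex
// quoted above; the inner replacement uses /\n+/g to drop all line breaks.
function encodeAttributeValues(markup) {
  return markup.replace(
    /(?<name>[\p{L}\p{N}_-]+)\s*=\s*"(?<value>.*?)(?<!\\)"/gus,
    (match, name, value) =>
      `${name}="${encodeURI(value.replace(/\n+/g, ''))}"`
  );
}

const sample = '<@assign TEXT1 = "Hello<br/>World">';
const encoded = encodeAttributeValues(sample);

console.log(encoded);
// → <@assign TEXT1="Hello%3Cbr/%3EWorld">
```

Because encodeURI escapes "<", ">" and spaces, the encoded value no longer contains a ">" that could end the tag early for the later /<assign.*?>/g replacement.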

The next one is already known ... it's /(<\/?)@(?=if|elseif|assign)/g which, used with the correct replacement, removes any @ character from the provided markup's custom tags ...

// see ... [https://regex101.com/r/gI0egU/2]
markup.replace(/(<\/?)@(?=if|elseif|assign)/g, '$1')

Third, one ensures that every <assign ...> tag gets correctly closed ... /<assign.*?>/g ...

// see ... [https://regex101.com/r/gI0egU/3]
markup.replace(/<assign.*?>/g, '$&</assign>')

Last, one appends a number-based suffix to each attribute name of the conditional <if ...> and <elseif ...> tags ... /(?:(if\s+)|\|\|\s*)(?<attrName>[\p{L}\p{N}_-]+)(?==")/gu ...

// see ... [https://regex101.com/r/gI0egU/5]
markup.replace(
  /(?:(if\s+)|\|\|\s*)(?<attrName>[\p{L}\p{N}_-]+)(?==")/gu,
  (match, $1, attrName, idx) => `${ $1 ?? '' } ${ attrName }-${ idx }`
)

... which is necessary in order to guarantee unique attribute names, because an HTML parser keeps only the first of several equally named attributes on an element. These specially treated attribute names get transformed back into their initial form when the data structure is aggregated from the HTML document.
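
To make the effect of the suffix step visible, here is a sketch that runs the same replacement on one already sanitized conditional tag. The function name suffixConditionalAttrNames is hypothetical; the input assumes the earlier steps (value encoding and @ removal) have already been applied.

```javascript
// Hypothetical wrapper around the replacement quoted above.
// The fourth callback argument is the match offset within the string,
// which serves as the unique numeric suffix.
function suffixConditionalAttrNames(markup) {
  return markup.replace(
    /(?:(if\s+)|\|\|\s*)(?<attrName>[\p{L}\p{N}_-]+)(?==")/gu,
    (match, $1, attrName, idx) => `${$1 ?? ''} ${attrName}-${idx}`
  );
}

const input = '<if VERSION="A1" || VERSION="A3">';
const suffixed = suffixConditionalAttrNames(input);

console.log(suffixed);
// → <if  VERSION-1="A1"  VERSION-17="A3">
```

The "||" operators are consumed by the match, so the result is an ordinary tag with two distinctly named attributes.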

Regarding 2), one needs the help of yet another regex ... /^([\p{L}\p{N}_-]+)-\d+$/u ...

// see ... [https://regex101.com/r/gI0egU/6]
const regXInitialAttrName = /^([\p{L}\p{N}_-]+)-\d+$/u;

... in order to verify ...

regXInitialAttrName.test(attrName)

... and restore the mutated attribute names ...

const initialAttrName =
  (regXInitialAttrName.exec(attrName)?.at(1) ?? attrName).toUpperCase();

... of the conditional if and elseif DOM-nodes.
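
A quick way to check the restoration is to feed it a few attribute names as the HTML parser would report them (lower-cased). The wrapper name restoreAttrName is only an assumption for this sketch.

```javascript
const regXInitialAttrName = /^([\p{L}\p{N}_-]+)-\d+$/u;

// Hypothetical helper: strips a trailing "-<digits>" suffix if present
// and upper-cases the name, since the HTML parser lower-cases attributes.
function restoreAttrName(attrName) {
  return (regXInitialAttrName.exec(attrName)?.at(1) ?? attrName).toUpperCase();
}

console.log(restoreAttrName('version-17')); // → VERSION
console.log(restoreAttrName('lang-4'));     // → LANG
console.log(restoreAttrName('cta'));        // → CTA
```

One consequence of this design is that an original attribute name that itself ends in "-<digits>" would be shortened as well, so the approach assumes such names do not occur in the conditions.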

While aggregating the final data structure from the HTML document, one more restoration is needed ... each URI-encoded value has to be decoded via decodeURI.
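
The following sketch shows the encode/decode round trip on one of the question's values. It only relies on the built-in encodeURI and decodeURI functions; the variable names are chosen for the example.

```javascript
const original = "Click <a href='https://example.com'>here</a>";

// encodeURI escapes '<', '>', '"' and spaces but leaves "'" and "/" alone.
const encodedValue = encodeURI(original);
const decodedValue = decodeURI(encodedValue);

console.log(encodedValue);
console.log(decodedValue === original); // → true
```

Since the encoded form contains no angle brackets, double quotes or whitespace, it can sit inside an attribute of the intermediate markup without confusing the later regex replacements or the HTML parser.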

With that, the first solution's example code finally changes to the following one ...

const markup = `<@if VERSION = "A1" || VERSION = "A3">

<@assign CTA = "blue">
<@assign CTA2 = "green">
<@assign TEXT1 = "Hello<br/>World">

<@elseif VERSION = "A2">

<@assign CTA = "red">
<@assign CTA2 = "yellow">
<@assign CTA3 = "brown">
<@assign TEXT1 = "Click <a href='https://example.com' 
style='text-decoration:none;color:#000000;'>here</a>">

</@if>

// condition 2
<@if VERSION = "A4" || VERSION = "A5">

<@assign CTA = "purple">
<@assign CTA2 = "orange">
<@assign TEXT1 = "Hi <span style='font- 
weight:bold;'>John</span>">

</@if>

// condition 3
<@if LANG = "en_US">

<@assign TITLE = "English">

</@if>`;

const parsableHtmlMarkup = createParsableHtmlMarkup(markup);
const attributeData = parseAttributeData(parsableHtmlMarkup);

document.querySelector('textarea').value = parsableHtmlMarkup;
console.log(attributeData);
body { margin: 0; }
textarea { position: relative; z-index: 1; background-color: #eee; }
.as-console-row-code { background-color: #ddd; }
.as-console-wrapper { left: auto!important; top: 0; width: 41%; min-height: 100%!important; }
<script>
function createParsableHtmlMarkup(markup) {
  return markup

    // see ... [https://regex101.com/r/gI0egU/1]
    .replace(
      /(?<name>[\p{L}\p{N}_-]+)\s*=\s*"(?<value>.*?)(?<!\\)"/gus,
      (match, name, value) =>
        `${ name }="${ encodeURI(value.replace(/\n+/g, '')) }"`
    )
    // see ... [https://regex101.com/r/gI0egU/2]
    .replace(/(<\/?)@(?=if|elseif|assign)/g, '$1')

    // see ... [https://regex101.com/r/gI0egU/3]
    .replace(/<assign.*?>/g, '$&</assign>')

    // see ... [https://regex101.com/r/gI0egU/5]
    .replace(
      /(?:(if\s+)|\|\|\s*)(?<attrName>[\p{L}\p{N}_-]+)(?==")/gu,
      (match, $1, attrName, idx) => `${ $1 ?? '' } ${ attrName }-${ idx }`
    )
    .trim();
}

function aggregateAssignmentData(result, assignmentNode) {
  return assignmentNode
    .getAttributeNames()
    .reduce((data, name) => ({

      ...data,
      [ name.toUpperCase() ]: decodeURI(assignmentNode.getAttribute(name)),

    }), result);
}

function parseAttributeData(markup) {
  const htmlDoc = (new DOMParser)
    .parseFromString(markup, 'text/html');

  // see ... [https://regex101.com/r/gI0egU/6]
  const regXInitialAttrName = /^([\p{L}\p{N}_-]+)-\d+$/u;

  return [
    ...htmlDoc.querySelectorAll('if, elseif')
  ]
  .reduce((result, conditionalNode) => {

    const assignmentData = [
      ...conditionalNode.querySelectorAll(':scope > assign')
    ]
    .reduce(aggregateAssignmentData, {})

    conditionalNode
      .getAttributeNames()
      .filter(attrName => regXInitialAttrName.test(attrName))
      .reduce((attrData, attrName) => {

        const initialAttrName =
          (regXInitialAttrName.exec(attrName)?.at(1) ?? attrName).toUpperCase();

        const assignmentGroup = (attrData[initialAttrName] ??= {});

        const key = conditionalNode.getAttribute(attrName);

        // - create the attribute specific data-entry.
        // - since different conditional attribute entries
        //   can initially share one and the same configuration,
        //   assign a copy of the before created assignment-data
        //   to each such data-entry.
        assignmentGroup[key] = { ...assignmentData };

        return attrData;          

      }, result);

    return result;        

  }, {});
}
</script>

<textarea cols="43" rows="12"></textarea>


8 Comments

Great, that's the concept of the result. There are some problems though: "VERSION" could be named in other ways too, and the script should take the name it finds. The values of the "assign" variables can also contain html code, and this script will "break" if it finds any, because at the first html tag it does a replace with </assign>. How can it be modified to make it work? Thank you for your support.
@Vale46 ... You're running again into the issue of not providing all the necessary requirements with your initial question. Please do always declare any acceptance criteria upfront. And regarding the most crucial information ... "the values of the "assign" variables can also contain html code" ... How can one omit such a code-breaking (in terms of parsers) scenario at all? Please provide additional example code which reflects all the new requirements.
@Vale46 ... Regarding the code changes according to your changed requirements, are there any questions left?
@PietroSeliger Wow, great job! Thank you very much for your help. I've tested the code and it seems to work fine, but there are some adjustments to make. I think I made the mistake of forgetting something again... I'm trying to study your code to get there myself, but it's not easy.
What I should add is: 1. Make the field recognition work even if there is the double operator (e.g. VERSION == "A1"). 2. Also recognize <@else> if it is present. In this case the following would be added to the object: ELSE : { ... the @assign's }. 3. If an already created field and value pair is queried again in a subsequent condition, the object must update itself by adding any new variables created.
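
The three open points from the last comment (the == operator, <@else>, and updating existing entries) are not covered by the answer's code. What follows is only a rough sketch of one way to handle them, and it is not the answerer's DOM-based method: it walks the custom tags directly with matchAll and needs no DOMParser. The names parseConditions, tagPattern and pairPattern are made up for this example. It assumes that conditions are not nested, that values never contain a double quote, and that line breaks inside values are formatting residue. The output uses plain nested objects, like the answer's second example, rather than the array-wrapped shape from the question.

```javascript
// One custom tag per match; quoted values may contain ">" and line breaks.
const tagPattern =
  /<@(if|elseif|assign)\s+((?:"[^"]*"|[^">])*)>|<@(else)\s*>|<\/@if\s*>/gi;
// NAME = "value" or NAME == "value".
const pairPattern = /([\p{L}\p{N}_-]+)\s*={1,2}\s*"([^"]*)"/gu;

function parseConditions(text) {
  const result = {};
  let field = null;  // field of the current <@if>, e.g. "VERSION"
  let targets = [];  // objects that receive the following <@assign> values

  for (const [, tag, body, elseTag] of text.matchAll(tagPattern)) {
    const kind = (tag ?? elseTag ?? 'endif').toLowerCase();

    if (kind === 'if' || kind === 'elseif') {
      targets = [];
      for (const [, name, value] of body.matchAll(pairPattern)) {
        field = name;
        const group = (result[name] ??= {});
        // Reuse an existing entry so that later conditions update it.
        targets.push(group[value] ??= {});
      }
    } else if (kind === 'else') {
      targets = field === null ? [] : [(result[field].ELSE ??= {})];
    } else if (kind === 'assign') {
      const pair = body.matchAll(pairPattern).next().value;
      if (pair) {
        const [, name, value] = pair;
        // Assumption: line breaks inside a value are formatting residue.
        for (const target of targets) target[name] = value.replace(/\r?\n/g, '');
      }
    } else {
      field = null;  // </@if> closes the current condition
      targets = [];
    }
  }
  return result;
}

const sample = `
<@if VERSION == "A1" || VERSION = "A3">
<@assign CTA = "blue">
<@assign TEXT1 = "Hello<br/>World">
<@elseif VERSION = "A2">
<@assign CTA = "red">
<@else>
<@assign CTA = "black">
</@if>
<@if VERSION = "A1">
<@assign CTA = "black">
<@assign TEXT2 = "Hello world">
</@if>
<@if LANG = "en_US">
<@assign TITLE = "English">
<@else>
<@assign TITLE = "French">
</@if>`;

const data = parseConditions(sample);
console.log(JSON.stringify(data, null, 2));
```

In this sample, the second VERSION = "A1" block overwrites CTA and adds TEXT2 on the existing A1 entry, while A3 keeps its own copy of the values from the first block, because each value gets its own object when it is first seen.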