8

I'm building an entity highlighter so I can upload a text file, view the contents on the screen, then highlight words that are in an array. This array is populated by the user when they manually highlight a selection, e.g.

const entities = ['John Smith', 'Apple', 'some other word'];

This is my text document that is displayed on the screen. It contains a lot of text, and some of this text needs to be visually highlighted to the user once they manually highlight some text, like the name John Smith, Apple and some other word

Now I want to visually highlight all instances of the entity in the text by wrapping it in some markup, and doing something like this works perfectly:

getFormattedText() {
    const paragraphs = this.props.text.split(/\n/);
    const { entities } = this.props;

    return paragraphs.map((p) => {
        let entityWrapped = p;

        entities.forEach((text) => {
            const re = new RegExp(text, 'g');
            entityWrapped = entityWrapped.replace(re, `<em>${text}</em>`);
        });

        return `<p>${entityWrapped}</p>`;
    }).join('');
}
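One caveat with the snippet above: the entities come from user selections, so they can contain characters that are special in a regex (".", "(", "+" and so on). A minimal sketch of escaping them and scanning each paragraph once with a single alternation follows. The helper names `escapeRegExp`, `buildEntityRegex` and `wrapEntities` are hypothetical, not part of the original code.

```javascript
// Hypothetical helper: escape regex metacharacters in user-selected text.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one alternation so each paragraph is scanned once, not once per entity.
function buildEntityRegex(entities) {
  // Longest first, so 'John Smith' is preferred over a shorter 'John'.
  const sorted = [...entities].sort((a, b) => b.length - a.length);
  return new RegExp(sorted.map(escapeRegExp).join('|'), 'g');
}

// Wrap every entity occurrence in <em>; '$&' inserts the matched text.
function wrapEntities(paragraph, entities) {
  if (entities.length === 0) return paragraph; // an empty pattern would match everywhere
  return paragraph.replace(buildEntityRegex(entities), '<em>$&</em>');
}
```

Sorting longest-first is a design choice for overlapping entities; it is an assumption about the desired behaviour, not something the question specifies.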

...however(!), this just gives me a big string, so I have to dangerously set the inner HTML, and therefore I can't attach an onClick event 'the React way' to any of these highlighted entities, which is something I need to do.

The React way of doing this would be to return an array that looks something like this:

['This is my text document that is displayed on the screen. It contains a lot of text, and some of this text needs to be visually highlighted to the user, like the name', {}, {}, {}] Where the {} are the React Objects containing the JSX stuff.

I've had a stab at this with a few nested loops, but it's buggy as hell, difficult to read and as I'm incrementally adding more entities the performance takes a huge hit.

So, my question is... what's the best way to solve this issue? Ensuring code is simple and readable, and that we don't get huge performance issues, as I'm potentially dealing with documents which are very long. Is this the time that I let go of my React morals and dangerouslySetInnerHTML, along with events bound directly to the DOM?

Update

@AndriciCezar's answer below does a perfect job of formatting the array of Strings and Objects ready for React to render, however it's not very performant once the array of entities is large (>100) and the body of text is also large (>100kb). We're looking at about 10x longer to render this as an array versus a string.

Does anyone know a better way to do this that gives the speed of rendering a large string but the flexibility of being able to attach React events on the elements? Or is dangerouslySetInnerHTML the best solution in this scenario?
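For reference, one way the string approach could still handle clicks is a single delegated handler on the container instead of one handler per entity. This is only a sketch under assumptions: `entityFromClick` is a hypothetical helper, and it assumes highlighted entities are rendered as `<em>` elements as in the snippet above.

```javascript
// Hypothetical delegated handler: given a click event and the container element,
// walk up from the clicked node until an <em> is found or the container is reached.
function entityFromClick(event, container) {
  let node = event.target;
  while (node && node !== container) {
    if (node.tagName === 'EM') {
      return node.textContent; // the highlighted entity text
    }
    node = node.parentNode;
  }
  return null; // the click landed outside any highlighted entity
}
```

With React this would be attached once, as an onClick on the element that receives dangerouslySetInnerHTML, passing `event.currentTarget` as the container. The text would also need to be HTML-escaped before it is injected, which this sketch does not cover.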

4
  • It would help people answer the question if you added a runnable minimal reproducible example using Stack Snippets (the [<>] toolbar button) showing a structure you want to add the text to, where the text comes from, etc. Stack Snippets support React, including JSX; here's how to do one. Commented May 12, 2017 at 8:22
  • DanV, Do you need a better response to your problem? Maybe I have understood wrong what you have asked? Commented May 14, 2017 at 20:20
  • Hey @AndriciCezar your answer looks great, I've just not had the time to put it into action. Thanks btw! Commented May 15, 2017 at 7:05
  • DanV what do you think about my updated answer? Commented May 23, 2017 at 21:16

3 Answers

5

Here's a solution that uses a regex to split the string on each keyword. You could make this simpler if you don't need it to be case insensitive or highlight keywords that are multiple words.

import React from 'react';

const input = 'This is a test. And this is another test.';
const keywords = ['this', 'another test'];

export default class Highlighter extends React.PureComponent {
    highlight(input, regexes) {
        if (!regexes.length) {
            return input;
        }
        let split = input.split(regexes[0]);
        // Only needed if matches are case insensitive and we need to preserve the
        // case of the original match
        let replacements = input.match(regexes[0]);
        let result = [];
        for (let i = 0; i < split.length - 1; i++) {
            result.push(this.highlight(split[i], regexes.slice(1)));
            // Elements pushed into an array need a key to avoid React warnings
            result.push(<em key={i}>{replacements[i]}</em>);
        }
        result.push(this.highlight(split[split.length - 1], regexes.slice(1)));
        return result;
    }
    render() {
        let regexes = keywords.map(word => new RegExp(`\\b${word}\\b`, 'ig'));
        return (
            <div>
                { this.highlight(input, regexes) }
            </div>);
    }
}
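A variation on the same idea, as a rough sketch: if the regex has a capture group, split() keeps the matched text in the result, so one pass handles all keywords and the separate match() call is not needed. The names `escapeRegExp` and `splitOnKeywords` are hypothetical, and the `\b` boundaries assume keywords start and end with word characters.

```javascript
// Hypothetical helper: escape regex metacharacters in a keyword.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Split the input into ordered segments, flagging the ones that matched a keyword.
function splitOnKeywords(input, keywords) {
  if (keywords.length === 0) return [{ text: input, isMatch: false }];
  // Longest first, so multi-word keywords win over shorter overlapping ones.
  const sorted = [...keywords].sort((a, b) => b.length - a.length);
  const pattern = sorted.map(escapeRegExp).join('|');
  // The capture group makes split() keep each match, at the odd indexes.
  const regex = new RegExp(`\\b(${pattern})\\b`, 'ig');
  return input
    .split(regex)
    .map((text, i) => ({ text, isMatch: i % 2 === 1 }))
    .filter((segment) => segment.text !== '');
}
```

In render, each segment with isMatch set would become a keyed `<em>` element and the others would stay plain strings.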

5 Comments

Thanks for the code, and this looks to be a bit more efficient than @AndriciCezar's answer. However it causes an infinite loop for me: jsfiddle.net/69z2wepo/78947 and also doesn't address the rendering problem. Generating the Array even with 10,000s of items isn't overly expensive, it's the rendering that is.
@DanV whoops, fixed a mistake in an edit and updated the fiddle at jsfiddle.net/69z2wepo/78956
@DanV It would help if you included the sample input with which you are having performance issues.
This feels quite snappy, although I'm struggling to measure the rendering time as for some reason componentDidUpdate isn't firing in the fiddle. I've updated it with some example data jsfiddle.net/69z2wepo/79008
@DanV you had the time and timeEnd calls the wrong way round. I've switched them in jsfiddle.net/69z2wepo/79009. It gets between 675-927ms for me.
5
+50

Have you tried something like this?

The complexity is number of paragraphs * number of keywords. For a paragraph of 22,273 words (121,104 characters) and 3 keywords, it takes 44ms on my PC to generate the array.

UPDATE: I think this is the clearest and most efficient way to highlight the keywords. I used James Brierley's answer to optimize it.

I tested it on 320kb of data with 500 keywords and it loads pretty slowly. Another idea would be to render the paragraphs progressively: render the first 10 paragraphs, then render the rest on scroll or after some time.

And a JS Fiddle with your example: https://jsfiddle.net/69z2wepo/79047/

const Term = ({ children }) => (
  <em style={{backgroundColor: "red"}} onClick={() => alert(children)}>
    {children}
  </em>
);

const Paragraph = ({ paragraph, keywords }) => {
  let keyCount = 0;
  console.time("Measure paragraph");

  let myregex = keywords.join('\\b|\\b');
  let splits = paragraph.split(new RegExp(`\\b${myregex}\\b`, 'ig'));
  let matches = paragraph.match(new RegExp(`\\b${myregex}\\b`, 'ig'));
  let result = [];

  for (let i = 0; i < splits.length; ++i) {
    result.push(splits[i]);
    if (i < splits.length - 1)
      result.push(<Term key={++keyCount}>{matches[i]}</Term>);
  }

  console.timeEnd("Measure paragraph");

  return (
    <p>{result}</p>
  );
};


const FormattedText = ({ paragraphs, keywords }) => {
    console.time("Measure");

    const result = paragraphs.map((paragraph, index) =>
      <Paragraph key={index} paragraph={paragraph} keywords={keywords} /> );

    console.timeEnd("Measure");
    return (
      <div>
        {result}
      </div>
    );
};

const paragraphs = ["Lorem ipsum dolor sit amet, consectetur adipiscing elit. Nulla ornare tellus scelerisque nunc feugiat, sed posuere enim congue. Vestibulum efficitur, erat sit amet aliquam lacinia, urna lorem vehicula lectus, sit amet ullamcorper ex metus vitae mi. Sed ullamcorper varius congue. Morbi sollicitudin est magna. Pellentesque sodales interdum convallis. Vivamus urna lectus, porta eget elit in, laoreet feugiat augue. Quisque dignissim sed sapien quis sollicitudin. Curabitur vehicula, ex eu tincidunt condimentum, sapien elit consequat enim, at suscipit massa velit quis nibh. Suspendisse et ipsum in sem fermentum gravida. Nulla facilisi. Vestibulum nisl augue, efficitur sit amet dapibus nec, convallis nec velit. Nunc accumsan odio eu elit pretium, quis consectetur lacus varius"];
const keywords = ["Lorem Ipsum"];

class App extends React.Component {
  constructor(props) {
    super(props);

    this.state = {
      limitParagraphs: 10
    };
  }

  componentDidMount() {
    setTimeout(
      () =>
        this.setState({
          limitParagraphs: 200
        }),
      1000
    );
  }

  render() {
    return (
      <FormattedText paragraphs={paragraphs.slice(0, this.state.limitParagraphs)} keywords={keywords} />
    );
  }
}

ReactDOM.render(
  <App />, 
  document.getElementById("root"));
<script src="https://cdn.jsdelivr.net/lodash/4.17.4/lodash.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/react/15.1.0/react.min.js"></script>
<script src="https://cdnjs.cloudflare.com/ajax/libs/react/15.1.0/react-dom.min.js"></script>

<div id="root">
</div>
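The snippet above jumps from 10 paragraphs straight to 200 after one second. A rough sketch of a more gradual schedule is below; `paragraphLimits` and its default batch sizes are hypothetical, not taken from the fiddle.

```javascript
// Hypothetical helper: yield a growing paragraph limit until every paragraph is shown.
function* paragraphLimits(total, firstBatch = 10, batchSize = 50) {
  let limit = Math.min(firstBatch, total);
  yield limit; // small first render so something appears quickly
  while (limit < total) {
    limit = Math.min(limit + batchSize, total);
    yield limit; // each later step adds one more batch
  }
}
```

Each yielded value could be passed to setState as limitParagraphs, either on a timer or when the user scrolls near the end of the rendered text.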

8 Comments

This works well - thank you! Be great if it didn't depend on Lodash though ;)
You can create a polyfill for flatten and you will not depend on lodash. It was much simpler to create the snippet with lodash :)
Hey @AndriciCezar after testing this with real data I'm realising your answer is only half the solution. I'm using a 150kb text file with approx 100 matched entities, and although generating the array isn't overly expensive, the total time it takes to generate the array and update the DOM is coming in at about 1.5 seconds in Chrome on a half-decent MBP, compared to 200ms if I just generate a string and dangerouslySetInnerHTML.
Then it means that the rendering is taking the most time. After the first render, how long does it take if you add one more paragraph?
I want to know whether it is the creation of the virtual DOM that takes the time or just the rendering.
2

The first thing I did was split the paragraph into an array of words.

const words = paragraph.split( ' ' );

Then I mapped the words array to a bunch of <span> tags. This allows me to attach onDoubleClick events to each word.

return (
  <div>
    {
      words.map( ( word, index ) => {
        return (
          <span key={ index }
                onDoubleClick={ () => this.highlightSelected() }>
                {
                  this.checkHighlighted( word ) ?
                  <em>{ word } </em>
                  :
                  <span>{ word } </span>
                }
          </span>
        )
      })
    }
  </div>
);

So if a word is double-clicked, I fire the this.highlightSelected() function, and then I conditionally render the word based on whether or not it is highlighted.

highlightSelected() {

    const selected = window.getSelection();
    // `data` is the text of the node where the selection started
    const { data } = selected.baseNode;

    const formattedWord = this.formatWord( data );
    const { entities } = this.state;

    // Build a new array instead of mutating the one held in state
    const nextEntities = entities.indexOf( formattedWord ) !== -1
      ? entities.filter( ( entity ) => entity !== formattedWord )
      : entities.concat( formattedWord );

    this.setState({ entities: nextEntities });
}

All I am doing here is either removing the word from, or adding it to, an array in my component's state. checkHighlighted() will just check if the word being rendered exists in that array.

checkHighlighted( word ) {

    const formattedWord = this.formatWord( word );

    return this.state.entities.indexOf( formattedWord ) !== -1;
}

And finally, the formatWord() function is simply removing any periods or commas and making everything lower case.

formatWord( word ) {
    return word.replace(/([a-z]+)[.,]/ig, '$1').toLowerCase();
}
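If the list of entities grows, the indexOf lookups run once per rendered word. A small sketch of the same toggle and lookup logic using a Set is below, written as plain functions so it can be tried outside a component. `toggleEntity` and `isHighlighted` are hypothetical names, and keeping a Set in state is an assumption, not what the code above does.

```javascript
// Same normalisation as formatWord() above: strip a trailing period or comma, lower-case.
function formatWord(word) {
  return word.replace(/([a-z]+)[.,]/ig, '$1').toLowerCase();
}

// Return a new Set with the word toggled; the Set passed in is left untouched.
function toggleEntity(entities, word) {
  const key = formatWord(word);
  const next = new Set(entities);
  // delete() returns false when the key was absent, in which case we add it.
  if (!next.delete(key)) {
    next.add(key);
  }
  return next;
}

// Set lookup instead of scanning an array with indexOf.
function isHighlighted(entities, word) {
  return entities.has(formatWord(word));
}
```

Returning a new Set each time keeps the state update immutable, so the result can be passed straight to setState.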

Hope this helps!

1 Comment

Don't think this will work for entities that contain multiple words, i.e. 'John Smith'
