for loop string each word

Question

Each non-English character such as '這' takes up 2 units of space, and each English character takes up 1 unit. The maximum length is 10 units. How do I get the part of the string that fits in the first 10 units?
For the example below, how do I get the result This這 is?
I'm trying to use a for loop starting from the first character, but I don't know how to get each character in the string...

string = "This這 is是 English中文 …";

var NonEnglish = "[^\u0000-\u0080]+",
    Pattern = new RegExp(NonEnglish),
    MaxLength = 10,
    Ratio = 2;

If it's a mix of English and non-English, can't you just remove the non-English part since you don't need it? Then do a split after that.
– fedmich, Commented Feb 27, 2014 at 5:29
@Good.luck I need to get the first 10 symbols, but each non-English character counts as 2 symbols.
– user1775888, Commented Feb 27, 2014 at 5:30
@fedmich ?? The words are just an example; the string might be th中文isisiisi.
– user1775888, Commented Feb 27, 2014 at 5:32
@user1775888 Are we supposed to use the same regex you provided, or something of our own?
– HighBoots, Commented Feb 27, 2014 at 5:38

Community · Accepted Answer · 2017-05-23 12:03:41Z

8

If you mean you want to get the part of the string up to where its length reaches 10, here's the answer:

var string = "This這 is是 English中文 …";

function check(string){
  // A character in \u0000-\u0080 counts as 1; any other character counts as 2, as the OP wants
  var length = 0, i = 0, len = string.length, width;

  // you can iterate over strings just like arrays
  for(; i < len; i++){

    // if the character is outside \u0000-\u0080, it takes 2, else 1
    width = /[^\u0000-\u0080]/.test(string[i]) ? 2 : 1;

    // if adding this character would push the length past 10, come out of the loop
    if(length + width > 10) break;
    length += width;
  }

  // return the string from the first character up to, but not including, the index where the loop stopped
  return string.substr(0, i);
}

alert(check(string)); // "This這 is"

EDIT 1:

Replaced .match with .test. The former returns a whole array while the latter simply returns true or false.
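Simplified the RegEx. Since we are checking only one character at a time, there is no need for the + quantifier. The character class and the ^ inside it must stay, because the negated class [^\u0000-\u0080] is what matches characters outside that range.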
Replaced string.length in the loop condition with the cached len, so the length is not read again on every iteration.
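
A hedged sketch of the same idea, for strings that may contain characters outside the Basic Multilingual Plane: indexing with string[i] walks UTF-16 code units, so such a character would be seen as two separate units. The version below iterates by code point with for...of instead. The names truncateByWidth, maxWidth and wideWidth are made up for illustration and do not come from the question or the answer; the 1/2 weights and the \u0000-\u0080 range are the OP's.

```javascript
// Sketch: keep the longest prefix whose weighted width fits in maxWidth.
// truncateByWidth, maxWidth and wideWidth are hypothetical names.
function truncateByWidth(text, maxWidth, wideWidth) {
  var used = 0;
  var result = "";

  // for...of yields whole code points, so surrogate pairs stay together
  for (var ch of text) {
    // characters in \u0000-\u0080 weigh 1, everything else weighs wideWidth
    var width = ch.codePointAt(0) <= 0x80 ? 1 : wideWidth;

    // stop before the character that would overflow the limit
    if (used + width > maxWidth) break;

    used += width;
    result += ch;
  }
  return result;
}

console.log(truncateByWidth("This這 is是 English中文 …", 10, 2)); // "This這 is"
```

Passing the limit and the weight as arguments keeps the OP's MaxLength and Ratio values out of the function body, which may make it easier to reuse with other limits.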

edited May 23, 2017 at 12:03

CommunityBot

answered Feb 27, 2014 at 5:29

HighBoots

3 Comments

Mr_Green Over a year ago

Is it possible to use the variable i outside the scope of the for loop?

fedmich Over a year ago

Just be careful, as this can take a long time to process because you run a regex on each character of the string.

HighBoots Over a year ago

@user1775888 I agree with fedmich. Here's a video which shows how. And see my answer edit, I added some things which might increase the speed significantly with larger strings.

aroth · Accepted Answer · 2014-02-27 05:35:08Z

I'd suggest something along the following lines (assuming that you're trying to break the string up into snippets that are <= 10 bytes in length):

var string = "This這 is是 English中文 …";

function byteCount(text) {
    //get the number of UTF-8 bytes consumed by a string:
    //encodeURI emits each byte either as a single character or as a %XX escape
    return encodeURI(text).split(/%..|./).length - 1;
}

function tokenize(text, targetLen) {
    //break a string up into snippets that are <= our target length in bytes
    var result = [];

    var pos = 0;
    var current = "";
    while (pos < text.length) {
        var next = current + text.charAt(pos);

        if (byteCount(next) > targetLen) {
            result.push(current);
            current = "";
            pos--;
        }
        else if (byteCount(next) == targetLen) {
            result.push(next);
            current = "";
        }
        else {
            current = next;
        }

        pos++;
    }
    if (current != "") {
       result.push(current);
    }

    return result;
};

console.log(tokenize(string, 10));

http://jsfiddle.net/5pc6L/
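
One thing worth noting about the byte-based approach: byteCount measures UTF-8 bytes, and a character like 這 takes 3 bytes there rather than the 2 units the question assigns, so the snippets will not always line up with the OP's rule. A minimal sketch that makes the byte width explicit, assuming UTF-8; utf8Width and utf8Length are hypothetical helper names, not part of the answer:

```javascript
// Sketch: UTF-8 byte width of a single code point (hypothetical helper).
function utf8Width(codePoint) {
  if (codePoint < 0x80) return 1;     // ASCII
  if (codePoint < 0x800) return 2;    // e.g. Latin with diacritics, Greek
  if (codePoint < 0x10000) return 3;  // rest of the BMP, including most CJK
  return 4;                           // supplementary planes
}

// Total UTF-8 byte length of a string, walking it by code point.
function utf8Length(text) {
  var total = 0;
  for (var ch of text) {
    total += utf8Width(ch.codePointAt(0));
  }
  return total;
}

console.log(utf8Length("This"));   // 4
console.log(utf8Length("這"));     // 3
console.log(utf8Length("This這")); // 7
```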
