edited title
muru

What is wrong with using "\t" to grep for tab-separated values?

Became Hot Network Question
+env
TGar

I have a .tsv file (values separated by tabs) with four values. So each line should have only three tabs and some text around each tab like this:

value   value2  value3  value4

But it looks like some lines are broken (they contain more than three tabs). I need to find these lines.


I came up with the following grep command:

grep -v "^[^\t]+\t[^\t]+\t[^\t]+\t[^\t]+$"

My thinking:

  • the leading ^ matches the beginning of the line
  • [^\t]+ matches one or more non-tab characters
  • \t matches a single tab character
  • $ matches the end of the line

Then I just put these in the right order, repeated the correct number of times. That should match the correct lines, so I inverted it with the -v option to get the wrong lines.

But with the -v option it matches every line in the file, and also some random text I tried that doesn't contain any tabs.

What is my mistake please?

EDIT: I am using Debian and bash.
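
For reference, here is a minimal sketch of the intended check, assuming bash and GNU grep. The sample data and the names sample, tab and bad_lines are made up for illustration. It relies on two facts: inside ordinary double quotes bash passes \t to grep as a backslash followed by t, not as a tab, and without -E grep treats + as a literal character. The sketch produces a real tab with bash's $'\t' quoting and uses -E for extended regular expressions.

```shell
#!/usr/bin/env bash
# Hypothetical sample data: line 2 has an extra tab, line 4 has no tabs at all.
sample=$(mktemp)
printf 'a\tb\tc\td\n'    >  "$sample"
printf 'a\tb\tc\td\te\n' >> "$sample"
printf 'w\tx\ty\tz\n'    >> "$sample"
printf 'random text\n'   >> "$sample"

# $'\t' is bash ANSI-C quoting: the shell expands it to a real tab character
# before grep ever sees the pattern.
tab=$'\t'

# -E: extended regex, so + means "one or more"; -v: print non-matching lines.
# \$ keeps the end-of-line anchor literal inside the double quotes.
bad_lines=$(grep -Ev "^[^${tab}]+${tab}[^${tab}]+${tab}[^${tab}]+${tab}[^${tab}]+\$" "$sample")

printf '%s\n' "$bad_lines"
rm -f "$sample"
```

With this sample, the line with five fields and the line without any tabs are printed. If empty fields should count as valid, counting fields with awk -F'\t' 'NF != 4' would be another option.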


added 32 characters in body

