I have a list of about 50,000 words, and I want to remove certain number-and-letter annotations from them. Specifically, I want to remove any number from 0 to 99 followed by either an E or a Z, for example: 4E, 11Z, 11E, 20Z, etc.
The words I want to remove them from look like this:
- 6S,9,12S-trimethyl-2E,4E,8E,10E-tetradecatetraenoic acid
- 7Z,14Z-eicosadienoic acid
- 13,17,21,25-tetramethyl-5Z-hexacosenoic acid
- CDP-DG(18:1(11Z)/22:6(4Z,7Z,10Z,13Z,16Z,19Z))
- PC(20:4(5Z,8Z,11Z,14Z)/17:2(9Z,12Z))
As you can see, the annotations I want to remove appear in different positions within the words (inside brackets, after a hyphen, etc.). So far, I've done:
    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    public class EZConfig {
        public static void main(String[] args) throws IOException {
            BufferedReader br = new BufferedReader(new FileReader("C:/Users/colles-a-l-kxc127/Dropbox/PhD/Java/MetabolitesCompiled/src/commonNames"));
            try {
                StringBuilder sb = new StringBuilder();
                String line = br.readLine();
                while (line != null) {
                    // Intended to flag lines containing an annotation such as 4E or 11Z
                    if (line.contains("[0-99][E|Z]")) {
                        System.out.println(line + " TRUE");
                    } else {
                        System.out.println(line);
                    }
                    line = br.readLine();
                }
            } finally {
                br.close();
            }
        }
    }
This was just to see whether I could detect the number-plus-E-or-Z annotations, but I can't seem to get it to work. I basically need to script something that will remove all those annotations from my list of words. Does anyone know how I can achieve this?
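[0-99] doesn't match any number between 0 and 99. It is a character class, so it matches a single digit (the range 0-9, plus a redundant 9). The syntax you're looking for is [0-9]+, which will match one or more digits in a row. Two other things to watch for: String.contains does a literal substring search and does not interpret regular expressions, and [E|Z] matches a literal | as well as E or Z, so [EZ] is enough.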
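As a rough sketch of how detection and removal could look with java.util.regex, assuming the annotations are always one or two digits followed by E or Z and stand as separate tokens: the class name EZAnnotationStripper and the methods hasAnnotation and strip are made up for illustration and are not from the question. I used [0-9]{1,2} rather than [0-9]+ here to stick to the 0-99 range you mentioned.

```java
import java.util.regex.Pattern;

// Hypothetical helper class; the names here are illustrative only.
class EZAnnotationStripper {
    // One or two digits followed by E or Z, bounded on both sides so that
    // something like 100E or 6S is left alone.
    private static final Pattern EZ = Pattern.compile("\\b[0-9]{1,2}[EZ]\\b");

    // True if the name contains at least one annotation such as 4E or 11Z.
    static boolean hasAnnotation(String name) {
        return EZ.matcher(name).find();
    }

    // Removes every annotation; surrounding commas, hyphens and brackets are kept.
    static String strip(String name) {
        return EZ.matcher(name).replaceAll("");
    }

    public static void main(String[] args) {
        String[] names = {
            "7Z,14Z-eicosadienoic acid",
            "13,17,21,25-tetramethyl-5Z-hexacosenoic acid",
            "PC(20:4(5Z,8Z,11Z,14Z)/17:2(9Z,12Z))"
        };
        for (String name : names) {
            System.out.println(hasAnnotation(name) + "  " + strip(name));
        }
    }
}
```

Note that this only deletes the annotations themselves, so separators are left behind: "7Z,14Z-eicosadienoic acid" becomes ",-eicosadienoic acid". Whether to also strip the leftover commas, hyphens or empty brackets depends on what you want the cleaned names to look like, so I've left that out.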