
I'm having a little trouble figuring out what to do.

Using Java, I'm trying to:

  • Read in the HTML from a website
  • Find the content that follows a certain string, in this case

     title="
    
  • Store that content in a String.

The first and last steps are simple for me, but I'm having no luck with the middle one (I never have had any luck with regex).

I believe this is the beginning of what I need:

   String regex = "(?<=title=\")\\S+";
   Pattern name = Pattern.compile(regex);

After that I have no clue. Any help?

  • Use jsoup instead. (Here we go again...) Commented Jul 27, 2012 at 17:23
  • I suggest using some library for this (you'll get even XPath support): HttpUnit, JSoup, NekoHtml Commented Jul 27, 2012 at 17:23

2 Answers

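import java.util.regex.Matcher;
import java.util.regex.Pattern;

String EXAMPLE_TEST = "......"; // the HTML you read in
// Lookbehind: match the run of non-whitespace characters right after title="
Pattern pattern = Pattern.compile("(?<=title=\")(\\S+)");
Matcher matcher = pattern.matcher(EXAMPLE_TEST);
// find() advances to the next match; group() returns the matched text
while (matcher.find()) {
  System.out.println(matcher.group());
}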

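Note: Consider using the pattern (?<=title=\")([^\"]*) instead. \\S+ stops at the first whitespace and, when the value contains no spaces, also swallows the closing quote and whatever follows it up to the next whitespace. [^\"]* matches everything up to the closing quote.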
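
To make the difference between the two patterns concrete, here is a minimal sketch that runs both against a made-up snippet. The class name, the helper `firstMatch`, and the sample HTML are assumptions for illustration only, not part of the question.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitlePatternDemo {
    // Pattern from the question: run of non-whitespace characters after title="
    static final Pattern NON_SPACE = Pattern.compile("(?<=title=\")\\S+");
    // Pattern from the note: everything up to the next double quote
    static final Pattern UP_TO_QUOTE = Pattern.compile("(?<=title=\")[^\"]*");

    // Returns the first match of the pattern in the input, or null if there is none.
    static String firstMatch(Pattern pattern, String input) {
        Matcher matcher = pattern.matcher(input);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // Made-up sample input
        String html = "<a title=\"Hello World\" href=\"x\">link</a>";
        System.out.println(firstMatch(NON_SPACE, html));   // prints: Hello
        System.out.println(firstMatch(UP_TO_QUOTE, html)); // prints: Hello World
    }
}
```

Storing the result is then just a matter of assigning the return value to a String, for example `String title = firstMatch(UP_TO_QUOTE, html);`.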


2 Comments

Do we want the whole title or just until the first whitespace?
@maerics - I believe the OP didn't ask to change the regex pattern; (s)he wants help with the rest of the code to get the match into a variable, or so...
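import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<String> result_list = new ArrayList<String>();
// [^\"]* stops at the closing quote; a greedy .* would run on to the last quote in the input
Pattern p = Pattern.compile("title=\"([^\"]*)\"");
Matcher m = p.matcher("title=\"test\"");

while (m.find())
{
    // group(1) is the captured value; group(0) would be the whole match, including title="
    result_list.add(m.group(1));
}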
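
Putting the three steps from the question together might look roughly like the sketch below. The class and method names, the example.com address, and the UTF-8 charset are assumptions for illustration; swap in the page and encoding you actually need. As the comments on the question point out, a real HTML parser such as jsoup is usually more robust than a regex for this.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TitleFetcher {
    // Captures whatever sits between title=" and the next double quote
    private static final Pattern TITLE = Pattern.compile("title=\"([^\"]*)\"");

    // Step 1: read everything from the reader into one string.
    static String readAll(Reader source) throws IOException {
        StringBuilder html = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }

    // Step 2: return the first title value, or null when there is none.
    static String firstTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical address and assumed charset; replace with your own.
        URL url = new URL("http://example.com/");
        String html = readAll(new InputStreamReader(url.openStream(), StandardCharsets.UTF_8));
        // Step 3: store the match in a String.
        String title = firstTitle(html);
        System.out.println(title);
    }
}
```

`firstTitle` returns null rather than throwing when the page has no title attribute, so check for that before using the value.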
