I'm trying to extract the ProductValue from the following bit of JavaScript:

<script language="javascript" type="text/javascript">
lpAddVars('page','Section','womens');
lpAddVars('page','CartTotal','0.00');

    lpAddVars('page','ProductID','43577');
    lpAddVars('page','ProductValue','128.00');  

</script>

I don't think Beautiful Soup parses JavaScript, so I think the best way to do this may be a regular expression, but I'm very new to re and so far nothing I've tried seems to work. Any advice or help on how to accomplish this?

Thanks!

2 Answers

This should work:

import re

javascript_text = '''
    <script language="javascript" type="text/javascript">
    lpAddVars('page','Section','womens');
    lpAddVars('page','CartTotal','0.00');

        lpAddVars('page','ProductID','43577');
        lpAddVars('page','ProductValue','128.00');  

    </script>
'''

product_value = re.findall(r"ProductValue.*,['\"](.*)['\"]", javascript_text)

# at this point, product_value = ['128.00']

So what is "ProductValue.*,['\"](.*)['\"]" even doing?

ProductValue -- just a literal string that you're searching for

.* -- any run of characters other than a newline, so spaces, single quotes, whatever; it is greedy, so it takes as much of the line as it can

, -- a literal comma; because ".*" is greedy, this ends up being the last comma on the line that is followed by a quote

['\"] -- either a single quote or a double quote (no "|" is needed inside square brackets; there it would match a literal pipe character)

(.*) -- this is the bit we're actually interested in, which can be any characters; the parentheses make it the group that findall returns

['\"] -- a closing single or double quote; again ".*" is greedy, so this is the last quote on the line
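The greedy behaviour of ".*" only matters when more than one call shares a line. A minimal sketch of the difference, assuming a made-up input line (the Currency variable is hypothetical and not in the original page) and a non-greedy variant of the pattern:

```python
import re

# Hypothetical input: two lpAddVars calls squeezed onto one line.
line = "lpAddVars('page','ProductValue','128.00'); lpAddVars('page','Currency','USD');"

# Greedy: ".*" runs to the end of the line and backtracks to the LAST
# comma that is followed by a quote, so the wrong value is captured.
greedy = re.findall(r"ProductValue.*,['\"](.*)['\"]", line)

# Non-greedy: "(.*?)" stops at the first closing quote, and the pattern
# spells out the quote and comma that directly follow ProductValue.
lazy = re.findall(r"ProductValue'\s*,\s*['\"](.*?)['\"]", line)

print(greedy)  # ['USD']
print(lazy)    # ['128.00']
```

With the one-call-per-line layout from the question, both patterns return ['128.00']; the non-greedy form is just less sensitive to how the page is formatted.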

From this point on, I would do something like:

product_values = []
for value in product_value:
    value = value.strip() # get rid of any excess whitespace
    value = float(value) # ProductValue appears to be a float of some sort
    product_values.append(value) # store the value
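
If more than one of these variables is needed, the same idea extends to collecting every call at once. A rough sketch, assuming each lpAddVars call takes exactly three single-quoted arguments with no escaped quotes inside them; the names LP_CALL and parse_lp_vars are made up for this example:

```python
import re

html = '''
<script language="javascript" type="text/javascript">
lpAddVars('page','Section','womens');
lpAddVars('page','CartTotal','0.00');

    lpAddVars('page','ProductID','43577');
    lpAddVars('page','ProductValue','128.00');

</script>
'''

# Three single-quoted arguments: scope, name, value.
# [^']* matches anything up to the next single quote.
LP_CALL = re.compile(
    r"lpAddVars\(\s*'([^']*)'\s*,\s*'([^']*)'\s*,\s*'([^']*)'\s*\)"
)

def parse_lp_vars(text):
    """Return {name: value} for every lpAddVars call found in text."""
    return {name: value for _scope, name, value in LP_CALL.findall(text)}

page_vars = parse_lp_vars(html)
product_value = float(page_vars['ProductValue'])
print(product_value)  # 128.0
```

The values stay as strings in the dictionary, so each one can be converted (float, int) only where that makes sense.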

1 Comment

thanks, this works great! the quality of the answers on SO never ceases to amaze me :)
/'ProductValue'\s*,\s*(.*?)\s*\)/

1 Comment

thanks! i actually know so little about regular expressions that i'm not even sure how to implement this. re.search("/'ProductValue'\s*,\s*(.*?)\s\)/", html) ?
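
On the question in the comment: the leading and trailing slashes are regex delimiters from languages such as JavaScript or Perl, so they are left out in Python. A minimal sketch, assuming the value is always single-quoted; this variant moves the quotes outside the capture group and allows zero whitespace before the closing parenthesis, since the sample page has none there:

```python
import re

html = "lpAddVars('page','ProductValue','128.00');"

# Raw string (r"...") so the backslashes reach the regex engine unchanged.
# No surrounding slashes: in Python they would be matched literally.
match = re.search(r"'ProductValue'\s*,\s*'(.*?)'\s*\)", html)

# re.search returns None when nothing matches, so check before using it.
value = float(match.group(1)) if match else None
print(value)  # 128.0
```

group(1) is the text captured by the first pair of parentheses; group(0) would be the whole match.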
