
RegEx Expression:

[Height|Length|Width|Depth]:\D*\s*(\d*\.*-*\d*)-*\D*\s*  [Height|Length|Width|Depth]:\D*\s*(\d*\.*-*\d*)-*\D*\s*[Height|Length|Width|Depth]:\D*\s*(\d*\.*-*\d)*-*\D*\s*

Input Text - JSON TEXT

{"Product Type":["Printer Cartridges"],"Product Name":["Xerox - Yellow - toner cartridge ( equivalent to: HP CB382A ) - for HP Color LaserJet CM6030 CM6040 CP6015"],"Brand":["XEROX"],"Product Long Description":["<!-- CNET Content -->Toner cartridges for HP printers from Xerox deliver brilliant image quality and excellent reliability at a low cost. Compared to the original HP toner cartridge youll get better or equal page yield pay around 25% less. Get more pay less without risk.<br><br><h3 id=detailspecs>Specifications</h3><span class=font_size3bold>General</span><br>&nbsp;<img align=absmiddle src=http://images.highspeedbackbone.net/main/gfx-blkbullet.jpg>&nbsp;&nbsp;Compatible Cartridge: &nbsp;HP CB382A<br><br><span class=font_size3bold>Consumable</span><br>&nbsp;<img align=absmiddle src=http://images.highspeedbackbone.net/main/gfx-blkbullet.jpg>&nbsp;&nbsp;Consumable Type: &nbsp;Toner cartridge<br>&nbsp;<img align=absmiddle src=http://images.highspeedbackbone.net/main/gfx-blkbullet.jpg>&nbsp;&nbsp;Printing Technology: &nbsp;Laser<br>&nbsp;<img align=absmiddle src=http://images.highspeedbackbone.net/main/gfx-blkbullet.jpg>&nbsp;&nbsp;Color: &nbsp;Yellow<br>&nbsp;<img align=absmiddle src=http://images.highspeedbackbone.net/main/gfx-blkbullet.jpg>&nbsp;&nbsp;Included Qty: &nbsp;1-pack<br>&nbsp;<img align=absmiddle src=http://images.highspeedbackbone.net/main/gfx-blkbullet.jpg>&nbsp;&nbsp;Duty Cycle: &nbsp;Up to 23500 pages at 5% coverage<br><br><span class=font_size3bold>Compatibility Information</span><br>&nbsp;<img align=absmiddle src=http://images.highspeedbackbone.net/main/gfx-blkbullet.jpg>&nbsp;&nbsp;Compatible with: &nbsp;HP Color LaserJet CM6030 MFP CM6030f MFP CM6040 MFP CM6040f MFP CP6015de CP6015dn CP6015n CP6015x CP6015xh<br><!-- END CNET Content -->"],"Item ID":["41057188"],"Product Segment":["Electronics"],"UPC":["095205855838"]}

Problem:

The regex should check whether the JSON text contains any of these words (Height, Width, Length, or Depth) and, if so, fetch the value that follows.

Since the JSON text above contains none of these labels, the regex should not find anything, but it returns an undesirable value. I think I am missing something in the regex.

Edit:

For this input JSON, I should be able to extract Height, Length, Width, or Depth:

{"Brand":["Concord Fans"],"Energy Guide: Appliance Labeling Rule Required":["N"],"Country of Origin: Components":["USA and/or Imported"],"Product Short Description":["Height: 6.2."],"Actual Color":["Multicolor"],"Product Segment":["Clothing, Shoes & Accessories"],"Color":["Multicolor"],"Product Name":["Concord Fans RM-08 Remote & Wall Control Set"],"Product Type":["Televisions"],"Manufacturer Part Number":["RM-08"],"Manufacturer":["Concord Fans"],"Category":["TVs"],"Product Long Description":["Height: 6-2- Width: 8-8- Length: 8-8- Energy Star: No- Energy Saver: No- UL Classification: UL Certified- UL Application: Dry SKU: CNCD467"],"GTIN":["00014592213038"],"Number of Batteries":["0"],"E-Waste Recycling Compliance Required":["N"],"UPC":["014592213038"]}
  • Have you tried using the json library? docs.python.org/3/library/json.html Commented Dec 14, 2015 at 3:05
  • Can you give an example of the undesirable value, and the sample JSON that returned that value? Commented Dec 14, 2015 at 3:07
  • The undesired value I get when running this regex on the input is: ['1-', '', '23500'] Commented Dec 14, 2015 at 3:10
  • Edit your question with that info please. Commented Dec 14, 2015 at 5:21

2 Answers

It is not a good idea, in general, to parse JSON data with regular expressions, but you definitely have something wrong in this part of the regular expression:

[Height|Length|Width|Depth]

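Square brackets define a character class, so this matches any single character from the set (including "|"), not one of the four words. It would, for instance, match a single "H":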

>>> re.search("[Height|Length|Width|Depth]", "H").group()
'H'

It looks like you meant to use a non-capturing group here:

(?:Height|Length|Width|Depth)
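
To make the difference concrete, here is a minimal sketch comparing the two forms. The variable names and the shortened sample strings are illustrative only; they are cut down from the two inputs in the question and are not the full data.

```python
import re

# Character class: matches ONE character from the set H, e, i, g, h, t, |, L, ...
char_class = re.compile(r"[Height|Length|Width|Depth]:")
# Non-capturing group: matches one of the four whole words.
alternation = re.compile(r"(?:Height|Length|Width|Depth):")

# Shortened fragments of the two inputs (illustrative, not the full JSON).
no_dimensions = "Compatible Cartridge: HP CB382A Duty Cycle: Up to 23500 pages"
with_dimensions = "Height: 6-2- Width: 8-8- Length: 8-8- Energy Star: No-"

class_hit = char_class.search(no_dimensions)
print(class_hit.group())                  # last letter of "Cartridge" plus the colon
print(alternation.search(no_dimensions))  # None: no dimension label is present
print(alternation.findall(with_dimensions))
```

The character class accepts the final "e" of "Cartridge" as if it were a label, which is the kind of false match that lets the original pattern pick up unrelated numbers.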



1 Comment

I have updated my post with an example from which the information should be extractable. Can you please suggest a fix?

It looks like your data is JSON-compatible, so try the json module instead (see its documentation for details). After parsing the text, you can access fields with regular dictionary keys, such as d['Product Long Description'], and from there you can extract the information in many ways. Here is one way to go:

import json,re

s = """{"Brand":["Concord Fans"],"Energy Guide: Appliance Labeling Rule Required":["N"],"Country of Origin: Components":["USA and/or Imported"],"Product Short Description":["Height: 6.2."],"Actual Color":["Multicolor"],"Product Segment":["Clothing, Shoes & Accessories"],"Color":["Multicolor"],"Product Name":["Concord Fans RM-08 Remote & Wall Control Set"],"Product Type":["Televisions"],"Manufacturer Part Number":["RM-08"],"Manufacturer":["Concord Fans"],"Category":["TVs"],"Product Long Description":["Height: 6-2- Width: 8-8- Length: 8-8- Energy Star: No- Energy Saver: No- UL Classification: UL Certified- UL Application: Dry SKU: CNCD467"],"GTIN":["00014592213038"],"Number of Batteries":["0"],"E-Waste Recycling Compliance Required":["N"],"UPC":["014592213038"]}"""

# s is already a JSON string, so a single json.loads call is enough.
d = json.loads(s)
print(d['Product Long Description'])
print(''.join(d['Product Long Description']).split(":")[0:4])
# findall returns one tuple per match, with an empty string for each alternative
# that did not take part in the match; filter(len, ...) drops those empty strings.
print([tuple(filter(len, y)) for y in re.findall(r'Height:\s*([\d.]+-[\d.]+)|Width:\s*([\d.]+-[\d.]+)|Length:\s*([\d.]+-[\d.]+)', ''.join(d['Product Long Description']))])

Output (Python 3):

['Height: 6-2- Width: 8-8- Length: 8-8- Energy Star: No- Energy Saver: No- UL Classification: UL Certified- UL Application: Dry SKU: CNCD467']
['Height', ' 6-2- Width', ' 8-8- Length', ' 8-8- Energy Star']
[('6-2',), ('8-8',), ('8-8',)]
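
If the dimensions may appear in any field, not only in the long description, the same idea can be sketched as a loop over the parsed record. This is only an illustration under some assumptions: the function name extract_dimensions and the value pattern (digits optionally joined by "." or "-") are hypothetical choices based on the two sample values in the question, and every field is assumed to hold a list of strings, as in the sample data.

```python
import json
import re

# Group 1 captures the label, group 2 the value. The value pattern is an
# assumption: digits, optionally followed by "." or "-" and more digits.
DIMENSION = re.compile(r"\b(Height|Length|Width|Depth):\s*(\d+(?:[.-]\d+)?)")

def extract_dimensions(json_text):
    """Return (field, label, value) triples found anywhere in the record."""
    record = json.loads(json_text)
    found = []
    for field, values in record.items():
        for text in values:  # each field is assumed to be a list of strings
            for label, value in DIMENSION.findall(text):
                found.append((field, label, value))
    return found

# Shortened stand-in for the question's second input (illustrative only).
sample = json.dumps({
    "Product Short Description": ["Height: 6.2."],
    "Product Long Description": ["Height: 6-2- Width: 8-8- Length: 8-8- Energy Star: No-"],
    "Product Name": ["Concord Fans RM-08 Remote & Wall Control Set"],
})
for row in extract_dimensions(sample):
    print(row)
```

Capturing the label in its own group keeps each value paired with its name, and a record without any of the four labels simply yields an empty list.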

2 Comments

I didn't understand the last print statement. How did you extract the values? It would be helpful if you could explain briefly.
r'\w+:' matches one or more word characters (letters, digits, or underscore) followed by a colon, so splitting on it breaks the string at each such label.
