0

I'm using PyCharm to write Python code.

These strings are plain text, but each one should produce a different output:

word = "<p>Santa is fat</p>"
secondword = "Potato & Tomato"
thirdword = "Koala eats http://koala.org/ a lot</p>"

I want to replace each of "<", ">" and "&" with "&lt;", "&gt;" and "&amp;" respectively.

So the output should look like this:

outputword = "&lt;p&gt;Santa is fat&lt;/p&gt;"
outputsecondword = "Fish &amp; Chips"
outputthirdword = "&lt;p&gt;Koala eats <a href='http://koala.org/'>http://koala.org/</a> a lot&lt;/p&gt;"

Notice that the third string contains a URL, which should become a link. I don't want to use the html library. I'm a noob at Python, so please provide me with simple solutions. I considered using lists but whenever I replace a character in the list, it doesn't change.

2
  • 2
    Note that the HTML entities are '&gt;' and '&lt;'... Commented May 4, 2015 at 10:17
  • 2
    When you say "I considered using lists but whenever I replace a character in the list, it doesn't change", that doesn't explain what you tried well enough for anyone to explain what you did wrong. Maybe you were one typo away from getting it right; maybe you were totally on the wrong track—if you show us the code, we can tell you. Commented May 4, 2015 at 10:21
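
Since the list attempt isn't shown, the following is only a guess at the symptom described in the question. Strings in Python are immutable, and str.replace returns a new string instead of changing the original, so the result has to be assigned back. A minimal sketch (the variable names unchanged and words are illustrative, not from the question):

```python
# Strings are immutable: replace() returns a new string and leaves the original alone.
word = "<p>Santa is fat</p>"
word.replace("<", "&lt;")   # the result is discarded, so word is unchanged
unchanged = word

# Assign the result back to keep the change.
word = word.replace("<", "&lt;")

# The same applies to strings stored in a list: write the new string back by index.
words = ["<p>", "Potato & Tomato"]
for i in range(len(words)):
    words[i] = words[i].replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
```

If the original attempt looped over the list and called replace without storing the result, this would explain why nothing appeared to change.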

2 Answers

8

Python comes with batteries included:

import html

word = "<p>Santa is fat</p>"
print(html.escape(word))

Output:

&lt;p&gt;Santa is fat&lt;/p&gt;
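
As a quick check of how far html.escape gets with the strings from the question, here is a small sketch. It only escapes characters; it does not turn "Potato" into "Fish" or wrap the URL in a link. The names samples, escaped, and the 'say "hi"' string are illustrative.

```python
import html

# html.escape replaces &, < and > (and, by default, both quote characters).
samples = [
    "<p>Santa is fat</p>",
    "Potato & Tomato",
    "Koala eats http://koala.org/ a lot</p>",
]
escaped = [html.escape(s) for s in samples]
for line in escaped:
    print(line)

# quote=False leaves " and ' untouched and escapes only &, < and >.
with_quotes = html.escape('say "hi"')
without_quotes = html.escape('say "hi"', quote=False)
```

Because html.escape handles "&" before the other characters, there is no risk of escaping the ampersand of "&lt;" a second time.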

7 Comments

Upvote for not answering what he asks, but what he wants/needs.
How would you do it without using "import html"?
You wouldn't. Use the libraries Python provides. And just in case your next question is how to parse HTML with regular expressions: you don't either.
Basic replace method: s = 'abcdef'; s = s.replace('e', 'g'); s is now 'abcdgf'.
Would it be possible to ask for a solution using a basic replace?
2

Without using the html library, you can do the replacements like this:

replacewith = {'&': '&amp;', '<': '&lt;', '>': '&gt;'}
for w in replacewith:
    word = word.replace(w, replacewith[w])

In [407]: word
Out[407]: '&lt;p&gt;Santa is fat&lt;/p&gt;'

The '&' entry has to come first; otherwise the ampersands introduced by '&lt;' and '&gt;' would themselves be escaped to '&amp;lt;' and '&amp;gt;'. This relies on dictionaries keeping insertion order, which is guaranteed from Python 3.7.

Or, in one line:

word.replace('&', '&amp;').replace('<', '&lt;').replace('>', '&gt;')

Update:

You can move the code into a function and call it like this:

def replace_char(word, replacewith=replacewith):
    # Apply every replacement in the dictionary, in insertion order.
    for w in replacewith:
        word = word.replace(w, replacewith[w])
    return word

Calling it with word like below will give you:

replace_char("<p>Santa is fat</p>")
Out[457]: '&lt;p&gt;Santa is fat&lt;/p&gt;'

To get the second one to work, update the dictionary:

In [454]: replacewith.update({'Potato': 'Fish', 'Tomato': 'Chips'})
In [455]: replace_char("Potato & Tomato", replacewith)
Out[455]: 'Fish &amp; Chips'

You can handle any new substrings that appear in other strings in the same way. Note that your input thirdword is missing a <p> at the beginning.

In [461]: replacewith.update({'http://koala.org/': '<a href="http://koala.org/">http://koala.org/</a>'})
In [463]: replace_char("Koala eats http://koala.org/ a lot</p>", replacewith)
Out[463]: 'Koala eats <a href="http://koala.org/">http://koala.org/</a> a lot&lt;/p&gt;'

Because the URL entry is added last, it is applied after '<' and '>' have been escaped, so the inserted <a> tag is left intact.
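
Adding every URL to the dictionary by hand does not scale, so here is a hedged sketch of a more general variant. It finds URLs in the raw text first, escapes the pieces around them, and wraps each URL in a link. The function names escape_basic and escape_and_link and the URL pattern are assumptions made for illustration. The pattern is deliberately simple and will not cover every valid URL.

```python
import re

# Assumed, simplified pattern: "http://" or "https://" up to whitespace, "<" or ">".
URL_PATTERN = re.compile(r"https?://[^\s<>]+")


def escape_basic(text):
    # "&" must be replaced first so the "&" in "&lt;" and "&gt;" is not escaped again.
    return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")


def escape_and_link(text):
    parts = []
    last = 0
    for match in URL_PATTERN.finditer(text):
        # Escape the plain text that precedes this URL.
        parts.append(escape_basic(text[last:match.start()]))
        # Escape the URL itself, then wrap it in an anchor tag.
        # Limitation: a URL containing a single quote would break the href attribute.
        url = escape_basic(match.group(0))
        parts.append("<a href='{0}'>{0}</a>".format(url))
        last = match.end()
    # Escape whatever follows the last URL (or the whole text if there was none).
    parts.append(escape_basic(text[last:]))
    return "".join(parts)


print(escape_and_link("<p>Koala eats http://koala.org/ a lot</p>"))
```

Splitting on the URLs before escaping keeps the generated <a> tags out of reach of the "<" and ">" replacements, so the order of operations no longer depends on dictionary ordering.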

10 Comments

Why the if w in word: test? If that's supposed to be an optimization, all you're actually doing is forcing it to linearly search word twice instead of once…
Also, why replacewith.get(w) instead of just replacewith[w]?
Give me a moment, let me try this.
@abarnert Thanks for the comment. Fixed both. Unintentional "typos", both.
It is a dictionary @Manu.
