I have a list of words (equivalent to about two full sentences) and I want to split it into two parts: one part containing 90% of the words and another part containing 10% of them. After that, I want to print a list of the unique words within the 10% list, lexicographically sorted. This is what I have so far:

    pos_90 = (90*len(words)) // 100 #list with 90% of the words
    pos_90 = pos_90 + 1 #I incremented the number by 1 in order to use it as an index
    pos_10 = (10*len(words)) // 100 #list with 10% of the words
    list_90 = words[:pos_90] #Creation of the 90% list
    list_10 = words[pos_10:] #Creation of the 10% list
    uniq_10 = set(list_10) #List of unique words out of the 10% list
    split_10 = uniq_10.split()
    sorted_10 = split_10.sort()
    print(sorted_10)

I get an error saying that split cannot be applied to set, so I assume my mistake must be in the last lines of code. Any idea about what I'm missing here?

  • What do you expect uniq_10.split() to do? Commented Oct 30, 2018 at 18:46
  • Possible duplicate of Sorting a set of values Commented Oct 30, 2018 at 18:48
  • I was thinking of separating all the words to have them sorted later, though I understand it might be redundant. In any case, the error I get doesn't have to do with that, I think Commented Oct 30, 2018 at 18:49
  • uniq_10 is already a set; split is a method you call on a string to turn it into a list. Commented Oct 30, 2018 at 18:49
  • Note: As noted in this answer, ignoring your actual exception, your code has a logic error. pos_10 is an index ~10% of the way into words, so words[pos_10:] says "give me everything from 10% in through the end", which is ~90% of all the words (the last 90%). So list_90 ends up being the first ~90% of words, and list_10 ends up as the last ~90% of words. At no point do you take 10% of the words. Commented Oct 30, 2018 at 19:03

1 Answer

split only makes sense when converting from one long str to a list of the components of said str. If the input was in the form 'word1 word2 word3', yes, split would convert that str to ['word1', 'word2', 'word3'], but your input is a set, and there is no sane way to "split" a set like you seem to want; it's already a bag of separated items.
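
To illustrate the difference, here is a minimal sketch; the sample sentence is made up for illustration and is not the asker's data:

```python
# Hypothetical sample data, not from the original question.
sentence = "word1 word2 word3"

# str.split breaks one string into a list of its whitespace-separated parts.
tokens = sentence.split()

# A set is already a collection of separate items and has no split method.
unique = set(tokens)

try:
    unique.split()
except AttributeError as exc:
    # Calling split on a set raises AttributeError.
    error_message = str(exc)

print(tokens)
print(error_message)
```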

All you really need to do is convert your set back to a sorted list. Replace:

    split_10 = uniq_10.split()
    sorted_10 = split_10.sort()

with either:

    sorted_10 = list(uniq_10)
    sorted_10.sort()  # NEVER assign the result of .sort(); it's always going to be None

or the simpler one-liner that encompasses both listifying and sorting:

    sorted_10 = sorted(uniq_10)  # sorted, unlike list.sort, returns a new list

The final option is generally the most Pythonic way to convert an arbitrary iterable to a list, sort that new list, and return the result. It doesn't mutate the input, doesn't rely on the input being a specific type (set, tuple, list, it doesn't matter), and it's simpler to boot. You only use list.sort() when you already have a known list and don't mind mutating it.
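
A small side-by-side sketch of the two approaches, using a made-up set in place of the real uniq_10:

```python
# Hypothetical stand-in for the asker's uniq_10.
uniq_10 = {"pear", "apple", "fig"}

# Option 1: copy to a list, then sort that list in place.
in_place = list(uniq_10)
result = in_place.sort()      # list.sort mutates in_place and returns None

# Option 2: sorted accepts any iterable and returns a new sorted list.
new_list = sorted(uniq_10)    # uniq_10 itself is left unchanged

print(result)
print(in_place)
print(new_list)
```

Both in_place and new_list end up holding the words in lexicographic order; only result, the return value of list.sort(), is None.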

3 Comments

@Austin: It's not a dupe though, given that the OP is asking why their code doesn't work, not merely "How do I do this?" It's a relevant link, but not a dupe.
@ShadowRanger this works, though I'm getting more words than just the unique words. Should I modify anything else?
@MeAll: Did you see my note about the logic error? list_10 should be initialized with either words[:pos_10] or words[pos_90:] (depending on whether it should overlap the contents of list_90 or not).
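
Putting the slicing fix and the sorting fix together might look like the sketch below. It assumes the two parts should not overlap, so the 10% list starts where the 90% list ends; the words list is invented for illustration. It also drops the + 1 from the question, on the assumption that it was not intended, because a slice's end index is already exclusive.

```python
# Hypothetical input: 30 words, with a duplicate in the final 10%.
words = ["filler"] * 27 + ["pear", "apple", "pear"]

# Index that separates the first ~90% from the last ~10%.
pos_90 = (90 * len(words)) // 100

list_90 = words[:pos_90]   # first ~90% of the words
list_10 = words[pos_90:]   # remaining ~10%, no overlap with list_90

# Deduplicate with set, then sort into a new list.
sorted_10 = sorted(set(list_10))
print(sorted_10)
```

If the 10% should instead come from the start of the list, words[:pos_10] with pos_10 = (10 * len(words)) // 100 would be the equivalent slice.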
