4

I have three arrays as listed below:

  1. users — Contains the id of 50000 users ( all distinct )
  2. pusers — Contains the id of users who own some posts (contains repeated id's also, that is, one user can own many posts) [ 50000 values]
  3. score — Contains the score corresponding to each value in pusers.[ 50000 values]

Now I want to populate another array PScore based on the following calculation. For each value of users in pusers, I need to fetch the corresponding score and add it to the PScore array in the index corresponding to the user.

Example,

if users[5] = 23224
and pusers[6] = pusers[97] = 23224 
then PScore[5] += score[6]+score[97]

Items of note:

  • score is related to pusers (e.g., pusers[5] has score[5])
  • PScore is expected to be related to users (e.g., cumulative score of users[5] is Pscore[5])
  • The ultimate aim is to assign a cumulative score of posts to the user who owns it.
  • The users who don't own any posts are assigned a score of 0.

Can anyone help me in doing this? I tried a lot but once I run my different trials, the output screen remains blank until I Ctrl+Z and get out.

I went through all of the following posts but I couldn't use them effectively for my scenario.

I am new to this forum and I'm a beginner in Python too. Any help is going to be really useful to me.

Additional Information

  • I'm working on a small project using StackOverflow data.
  • I'm using Orange tool and I'm in the process of learning the tool and python.

Ok I understand that something is wrong with my approach. So shouldn't I use lists for this scenario? Can anyone please tell me how I should proceed with this?

Sample of the data that i have arrived at is as shown below.

PUsers  Score
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
-1  0
13  0
77  1
77  4
77  3
77  0
77  2
77  2
77  3
102     2
105     0
108     2
108     2
117     2

Users
-1
1
2
3
4
5
8
9
10
11
13
16
17
19
20
22
23
24
25
26
27
29
30
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
48
49
50

All that I want is the total score associated with each user. Once again, the pusers list contains repetition while users list contains unique values. I need the total score associated with each user stored in such a way that, if I say PScore[6], it should refer to the total score associated with User[6].

Hope I answered the queries.

Thanks in advance.

9
  • 2
    post your code so we can help Commented Nov 4, 2013 at 14:07
  • 4
    Note: you are most likely talking about lists; arrays are a different datatype in Python, located in the dedecated array module, which can only hold homogeneous primitive values. Commented Nov 4, 2013 at 14:12
  • 3
    @Anu145 it would help us if you could add some sample data to your post. Obviously you can't paste in data for 50000 users but you could make up a small amount of data (users, pusers, score) that represents, say, 10 users. If you could add the data as actual Python code (e.g. users = [123, 123, 456]) so that it has the same form as your real data that would be awesome. Commented Nov 4, 2013 at 14:39
  • 2
    Given the numbers (50K items per list) I assume this is not homework... But then using lists seems really really wrong, you should have some database to take care of your data. Commented Nov 4, 2013 at 14:41
  • 1
    These kinds of data are perfectly suited to a relational database. Finding the information you need from a rdb would be trivial. Take a look at sqlite Commented Nov 4, 2013 at 14:57

2 Answers 2

2

From how you described your arrays and since you're using python, this looks like a perfect candidate for dictionaries.

Instead of having one array for post owner and another array for post score, you should be able to make a dictionary that maps a user id to a score. When you're taking in data, look in the dictionary to see if the user already exists. If so, add the score to the current score. If not, make a new entry. When you've looped through all the data, you should have a dictionary that maps from user id to total score.

http://docs.python.org/2/tutorial/datastructures.html#dictionaries

Sign up to request clarification or add additional context in comments.

2 Comments

this sounds like exactly what i want.. I ll try and get back to you.. Thanks !!
Thank You sooooo much.. Ur idea worked.. Now my code looks simpler and readable. Less complex and serves the purpose too.. Thank u so much for suggesting dictionaries..!! Thanks
1

I think your algorithm is either wrong or broken. Try to compute it's complexity. If it's N^2 or more you are likely using an inefficient algorithm. O(N^2) with 50.000 elements should take a few seconds. O(N^3) will probably take minutes. If you're sure of your approach try running it with some small fake data to figure out if it does the right thing or if you accidentally added some infinite loop.

You can easily get it working in linear time with dictionaries.

2 Comments

ya my code takes minutes to run.. I ll try using dictionaries and get back to you.. Thanks !!
thanks for making me think in the view of complexity. Now i used dictionaries and i find my code simpler at the same time it takes less time to run.. Thanks !!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.