2

I have the following input table (df):

ColumnA ColumnB Blocks
A 12 1
B 32 1
C 44 1
D 76 2
E 99 2
F 123 2
G 65 2
H 87 3
I 76 3
J 231 3
k 80 4
l 55 4
m 27 5
n 67 5
o 34 5

I would like to perform block randomization such that, it pick one value from each blocks ( one value from 1,2,3,4,5) and create that as a separate table.

The output should look something like the following:

ColumnA ColumnB Blocks Groups
B 32 1 A1
E 99 2 A1
I 76 3 A1
l 55 4 A1
m 27 5 A1
A 12 1 A2
F 123 2 A2
k 80 3 A2
m 27 4 A2
n 67 5 A2
C 44 1 A3
H 87 2 A3
J 231 3 A3
n 67 4 A3
o 34 5 A4
D 76 1 A4
G 65 2 A4

Randomly selected rows such that each group has all the blocks (evenly distributed).

What I tried so far?


df = df.groupby('blocks').apply(lambda x: x.sample(frac=1,random_state=1234)).reset_index(drop=True)
treatment_groups = [f"A{i}" for i in range(1, n+1)]
df['Groups'] = (df.index // n).map(dict(zip(idx, treatment_groups)))

This doesn't randomize according to the blocks column. How do I do that?

3 Answers 3

2

Let us try by defining a function to generate random samples from each block:

def random_samples(n):
    for i in range(1, n+1):
        for _, g in df.groupby('Blocks'):
            yield g.sample(n=1).assign(Groups=f'A{i}')

sampled = pd.concat(random_samples(4), ignore_index=True)

>>> sampled

   ColumnA  ColumnB  Blocks Groups
0        A       12       1     A1
1        D       76       2     A1
2        I       76       3     A1
3        k       80       4     A1
4        n       67       5     A1
5        C       44       1     A2
6        G       65       2     A2
7        J      231       3     A2
8        l       55       4     A2
9        m       27       5     A2
10       B       32       1     A3
11       G       65       2     A3
12       H       87       3     A3
13       l       55       4     A3
14       m       27       5     A3
15       B       32       1     A4
16       F      123       2     A4
17       I       76       3     A4
18       l       55       4     A4
19       m       27       5     A4
Sign up to request clarification or add additional context in comments.

2 Comments

Thank you very much! But, I get the following error "ValueError: No objects to concatenate". How do I deal with that?
Not sure may be something related to pandas version. What is your pandas version? Can you please check sampled = pd.concat(list(random_samples(4)), ignore_index=True) ?
0

In your code .sample(frac = 1) returns 100% of samples which is all of them. You want the version with .sample(n=1), for example if you just want one group you would do

df.groupby('Blocks').apply(lambda x: x.sample(n=1, random_state=1234))

to get


        ColumnA ColumnB Blocks
Blocks              
1   0   A       12      1
2   3   D       76      2
3   7   H       87      3
4   10  k       80      4
5   12  m       27      5

to get, say, 5 groups you would use n=5 with replace=True (as some groups have fewer than 5 elements) and then some re-arranging:

df = df.groupby('Blocks').apply(lambda x: x.sample(n=5, replace = True, random_state=1234)).reset_index(drop = True)
df['Groups'] = 'A' + df.groupby('Blocks').cumcount().astype(str)
df.sort_values('Groups')

produces

    ColumnA      ColumnB    Blocks  Groups
--  ---------  ---------  --------  --------
 0  C                 44         1  A0
20  o                 34         5  A0
 5  G                 65         2  A0
15  l                 55         4  A0
10  J                231         3  A0
 1  B                 32         1  A1
21  n                 67         5  A1
 6  G                 65         2  A1
16  l                 55         4  A1
11  I                 76         3  A1
22  m                 27         5  A2
17  k                 80         4  A2
12  H                 87         3  A2
 7  F                123         2  A2
 2  A                 12         1  A2
13  H                 87         3  A3
 8  E                 99         2  A3
18  l                 55         4  A3
 3  A                 12         1  A3
23  m                 27         5  A3
14  H                 87         3  A4
 9  D                 76         2  A4
19  k                 80         4  A4
 4  A                 12         1  A4
24  m                 27         5  A4

8 Comments

Thank you very much! Is it possible to start the groups from A1 and not A0?
of course, replace the relevant line with df['Groups'] = 'A' + (df.groupby('Blocks').cumcount()+1).astype(str)
Not sure why I get duplicate rows in each groups. For me C is present in two groups A1 and A3.
there are only three rows in block 1 so some will have to be repeated if you want 5 groups?
Is it possible to avoid those duplicates or repeats?
|
0
  • it's truly not random, but will group as you want
  • order of source dataframe will influence order that blocks are assigned to a Group
df = pd.DataFrame({'ColumnA': ['A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'k', 'l', 'm', 'n', 'o'], 
              'ColumnB': [12, 32, 44, 76, 99, 123, 65, 87, 76, 231, 80, 55, 27, 67, 34], 
              'Blocks': [1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 4, 4, 5, 5, 5]})

# generate a series that is "A{n}" where "n" is occurrence# of "Block"
sg = df.groupby("Blocks").apply(lambda g: pd.Series([f"A{i+1}" 
                                                for i in range(len(g))],name="Group")).explode().reset_index(drop=True)

df.join(sg)

ColumnA ColumnB Blocks Group
0 A 12 1 A1
1 B 32 1 A2
2 C 44 1 A3
3 D 76 2 A1
4 E 99 2 A2
5 F 123 2 A3
6 G 65 2 A4
7 H 87 3 A1
8 I 76 3 A2
9 J 231 3 A3
10 k 80 4 A1
11 l 55 4 A2
12 m 27 5 A1
13 n 67 5 A2
14 o 34 5 A3

4 Comments

Thank you very much! But, I get the following error: 'TypeError: explode() missing 1 required positional argument: 'column''
what version of pandas are you using> pd.__version__
version '1.1.2'
mostly unto date, I'm using 1.2.1. pandas.pydata.org/docs/reference/api/pandas.Series.explode.html was changed in 1.1.0, haven't checked regression bugs

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.