1

By default, all columns are set to zero. I want to set the entry at (row, column) to 1 wherever the column name appears as a substring of that row's URL column.

L  # list of column names to search for in the URL column

Dataframe Image

def generate(statement,col):
    if statement.find(col) == -1:
      return 0
    else:
      return 1

for col in L:
  df3[col].apply(generate, args=(col))

I am a beginner, and it throws an error:

/usr/local/lib/python3.6/dist-packages/pandas/core/series.py in f(x)
   4195
   4196             def f(x):
-> 4197                 return func(x, *args, **kwds)
   4198
   4199         else:

TypeError: generate() takes 2 positional arguments but 9 were given

Any suggestions would be helpful

Edit 1:

After changing the call to

df3[col].apply(generate, args=(col,))

I got this error:

> ---------------------------------------------------------------------------
> AttributeError                            Traceback (most recent call last)
> <ipython-input-162-508036a6e51f> in <module>()
>       1 for col in L:
> ----> 2   df3[col].apply(generate, args=(col,))
> 
> 2 frames pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
> 
> <ipython-input-159-9380ffd36403> in generate(statement, col)
>       1 def generate(statement,col):
> ----> 2     if statement.find(col) == -1:
>       3         return 0
>       4     else:
>       5         return 1
> 
> AttributeError: 'int' object has no attribute 'find'

Edit 2: I forgot to use the URL column in the for loop; I will correct that.

Edit 3: Updated and fixed:

def generate(statement,col):
    if col in str(statement):
        return 1
    else:
        return 0

for col in L:
  df3[col] = df3['url'].apply(generate, col=col)
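
For reference, a self-contained sketch of the fixed version. The contents of df3 and L below are made up for illustration; only the 'url' column name and the generate function come from the code above.

```python
import pandas as pd

# Hypothetical data: the real df3 and L are not shown in the question.
L = ["login", "secure"]
df3 = pd.DataFrame({"url": ["example.com/login", "secure.example.com", "example.com"]})

def generate(statement, col):
    # str() guards against non-string cells such as NaN or integers.
    return 1 if col in str(statement) else 0

for col in L:
    # Search the url column and assign the result back to the flag column.
    df3[col] = df3["url"].apply(generate, col=col)

print(df3[L].values.tolist())  # [[1, 0], [0, 1], [0, 0]]
```

Assigning the result back with df3[col] = ... matters here, because apply returns a new Series and does not modify df3 in place.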

Thanks for all the support!

2
  • This error means that statement is an int, so it has no method .find(). Different columns in your dataframe have objects with different types, so you could either check that type(statement) == str, or convert statement to string with str(statement) (this could fail for some other types, so the first method is better). Commented Nov 20, 2020 at 15:32
  • Yes, because instead of the url column I used df[col], which is all zeros, so integers were passed to the function. I should have passed df['url'] instead. I will make changes accordingly. Thank you. Commented Nov 20, 2020 at 16:09

2 Answers

2

When creating a 1 element tuple, you need a comma after the element: args=(col,), otherwise the parentheses are just ignored.
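
A minimal sketch of the difference, in plain Python with no pandas involved. The helper count_args is hypothetical and only reports how many extra positional arguments it receives:

```python
def count_args(statement, *extra):
    # Report how many extra positional arguments arrived after `statement`.
    return len(extra)

col = "Country"

not_a_tuple = (col)   # parentheses only group: this is still the string "Country"
one_tuple = (col,)    # the trailing comma is what makes a tuple

print(type(not_a_tuple).__name__)  # str
print(type(one_tuple).__name__)    # tuple

# Unpacking a string with * yields one argument per character,
# while unpacking a 1-tuple yields the single intended argument.
print(count_args("some text", *not_a_tuple))  # 7
print(count_args("some text", *one_tuple))    # 1
```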


1 Comment

Thanks a lot for the explanation. I will update the question after trying it.
0

This is a problem with how the parameter is passed in args. The args parameter of apply expects a tuple, and its elements are passed to the function as extra positional arguments.

Let's look at an example:

import pandas as pd

df = pd.DataFrame([['xyz', 'US'], ['abc', 'MX'], ['xyz', 'CA']], columns=["Name", "Country"])

print(df)

  Name Country
0  xyz      US
1  abc      MX
2  xyz      CA

Create the function with an extra argument:

def generate(statement,col):
    if statement.find(col) == -1:
        return 0
    else:
        return 1

Consider L to be the list ['Name', 'Country'].

Now let's apply generate with the extra argument in a loop:

for col in L:
    print(df[col].apply(generate, args=(col)))


TypeError: generate() takes 2 positional arguments but 5 were given

The error occurs because (col) is not a tuple: the parentheses only group the expression, so args receives the string 'Name' itself. When apply unpacks it, each character becomes a separate argument, as if args=('N', 'a', 'm', 'e') had been given. Together with statement, the function receives 5 positional arguments instead of 2.

To avoid this, use either of the options below:

  1. Pass col directly as a keyword argument:
df[col].apply(generate, col=col)
  2. Pass the arguments as a tuple. Note that a single-element tuple needs a trailing comma:
df[col].apply(generate, args=(col,))
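
Both options can be sketched as follows, rebuilding the small df from the example. One assumption differs from the loop earlier in this answer: to get a meaningful result, the 'Name' column is searched for a hypothetical substring 'xy' rather than for its own column name.

```python
import pandas as pd

df = pd.DataFrame(
    [["xyz", "US"], ["abc", "MX"], ["xyz", "CA"]],
    columns=["Name", "Country"],
)

def generate(statement, col):
    # Return 1 if `col` occurs in `statement`, else 0.
    return 0 if statement.find(col) == -1 else 1

# Option 1: keyword argument, forwarded by Series.apply to generate.
by_keyword = df["Name"].apply(generate, col="xy")

# Option 2: one-element tuple (note the trailing comma).
by_args = df["Name"].apply(generate, args=("xy",))

print(by_keyword.tolist())  # [1, 0, 1]
print(by_args.tolist())     # [1, 0, 1]
```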

3 Comments

Really nice explanation, thanks a lot. Though now I get an attribute error. I will update it in the question.
@NarendranSNair, the error `AttributeError: 'int' object has no attribute 'find'` comes from the call statement.find(col) in the generate function. .find is a string method, so statement must be a string. When you loop over all columns, columns with a non-object dtype (int/float) raise this error because those types have no .find attribute. To avoid this, either keep only object-dtype columns in your list L, or add a line in the loop that converts the series to strings: df[col] = df[col].astype('str')
Yes, actually df[col] is type int, I should have used df['url']. I have corrected it now. Thanks a lot for taking the time to frame the answer.
