I am very new to Python. I am trying to write a function that I can reuse in later parts of my code. The function should:

  • find the cosine value between the elements of two lists
  • add the values to a list and calculate the mean
  • append the mean values to a list
  • return the list of means

I would then like to run calculations on the list returned by this function. However, the function (knearest_similarity(tfidf_datamatrix)) does not return anything, and the print statements in the second function (threshold_function()) do not show anything. Can someone please look at the code and tell me what I am doing wrong?

def knearest_similarity(tfidf_datamatrix):
    k_nearest_cosineMean = []
    for datavector in tfidf_datamatrix:
        cosineValueSet = []
        for trainingvector in tfidf_vectorizer_trainingset:
            cosineValue = cx(datavector, trainingvector)
            cosineValueSet.append(cosineValue)
        similarityMean_of_k_nearest_neighbours = np.mean(heapq.nlargest(k_nearest_neighbours, cosineValueSet))    #the cosine similarity score of top k nearest neighbours
        k_nearest_cosineMean.append(similarityMean_of_k_nearest_neighbours)        
    print k_nearest_cosineMean
    return k_nearest_cosineMean


def threshold_function():  
    mean_cosineScore_mean = np.mean(knearest_similarity(tfidf_matrix_testset))
    std_cosineScore_mean = np.std(knearest_similarity(tfidf_matrix_testset))
    threshold = mean_cosineScore_mean - (3*std_cosineScore_mean)
    print "The Mean of the mean of cosine similarity score for a normal Behaviour:", mean_cosineScore_mean #The mean will be used for finding the threshold
    print "The standard deviation of the mean of cosine similarity score:", std_cosineScore_mean  #The standard deviation is also used to find the threshold
    print "The threshold for normal behaviour should be (Mean - 3*standard deviation):", threshold
    return threshold

EDIT

I tried defining two global variables for the functions to use (i.e. tfidf_vectorizer_trainingset and tfidf_matrix_testset).

#fitting tfidf transform for training data
tfidf_vectorizer_trainingset = tfidf_vectorizer.fit_transform(readfile(trainingdataDir)).toarray()

#tfidf transform the test set based on the training set
tfidf_matrix_testset = tfidf_vectorizer.transform(readfile(testingdataDir)).toarray()

However, the print statements in threshold_function() now output the following:

The Mean of the mean of cosine similarity score for a normal Behaviour: nan
The standard deviation of the mean of cosine similarity score: nan
The threshold for normal behaviour should be (Mean - 3*standard deviation): nan

EDIT 2

I found that the first value in k_nearest_cosineMean was nan. After deleting that value I managed to get valid calculations.
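For anyone hitting the same problem, here is a minimal sketch of how a single nan can poison the final mean, and one way to skip it instead of deleting it by hand. It assumes the nan comes from a cosine similarity computed against an all-zero vector, which is only a guess about what cx() does (its definition is not shown above). The names cosine_similarity and mean_ignoring_nan and the toy data are hypothetical, and the sketch uses only the standard library rather than NumPy.

```python
import heapq
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; nan if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        # The similarity is undefined for a zero vector, so report nan
        # explicitly instead of dividing by zero.
        return float("nan")
    return dot / (norm_a * norm_b)


def mean_ignoring_nan(values):
    """Mean of the non-nan entries; nan if nothing is left."""
    valid = [v for v in values if not math.isnan(v)]
    return sum(valid) / len(valid) if valid else float("nan")


# Hypothetical toy data: the first test vector is all zeros.
training = [[1.0, 0.0], [0.0, 1.0]]
test = [[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]]
k = 1

means = []
for vector in test:
    scores = [cosine_similarity(vector, t) for t in training]
    means.append(sum(heapq.nlargest(k, scores)) / k)

plain_mean = sum(means) / len(means)    # nan, because means[0] is nan
robust_mean = mean_ignoring_nan(means)  # averages only the valid entries
```

With NumPy arrays, np.isnan and np.nanmean can be used for the same kind of filtering.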

  • When you say that "the print commands ... do not show anything" do you literally mean that nothing is printed at all, or just that what is printed doesn't contain the numbers you want? It would be easier for others to help you if you could provide a minimal reproducible example and specific information about what you expect to see versus what you actually see. Commented Oct 27, 2015 at 4:53

1 Answer

In the first line of threshold_function() you call knearest_similarity(tfidf_matrix_testset); however, you never define what tfidf_matrix_testset is. You do the same in the second line. In the third line you use the results of the first two lines. Give tfidf_matrix_testset a value before calling the function.
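One way to avoid depending on undefined globals is to pass the data in as arguments and to compute the list of means only once. The following is a rough sketch of that idea, not your exact code: knearest_means, threshold, and dot are hypothetical names, dot is only a stand-in for your cx(), and the toy vectors are made up. It uses the standard library so it runs on its own; the standard deviation is the population one, which matches the default of np.std.

```python
import heapq
import math


def knearest_means(data_vectors, training_vectors, similarity, k):
    """Mean of the k largest similarity scores for each data vector."""
    means = []
    for vector in data_vectors:
        scores = [similarity(vector, t) for t in training_vectors]
        top_k = heapq.nlargest(k, scores)
        means.append(sum(top_k) / len(top_k))
    return means


def threshold(means):
    """Mean minus three population standard deviations of the given scores."""
    mean = sum(means) / len(means)
    variance = sum((m - mean) ** 2 for m in means) / len(means)
    return mean - 3 * math.sqrt(variance)


def dot(a, b):
    # Hypothetical stand-in for cx(); any function of two vectors works here.
    return sum(x * y for x, y in zip(a, b))


training = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
test = [[1.0, 0.0], [2.0, 2.0]]

means = knearest_means(test, training, dot, k=2)  # computed once, then reused
limit = threshold(means)
```

Because every input is a parameter, calling the function with a name that has not been defined fails immediately with a NameError at the call site, which is easier to spot than a function that silently reads a missing global.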

2 Comments

Thanks. When I do give them values as global variables, threshold_function() still doesn't give me anything useful: I get 'nan' for mean_cosineScore_mean, std_cosineScore_mean and threshold.
Actually, you were correct. I had to give it a value, although there was also a 'nan' value in 'k_nearest_cosineMean' that was causing the invalid calculation results. It seems to work now. Thanks again.