I want to use an array of precomputed centers as the initial centers in my k-means algorithm. This is my array:

[[-5.158116189420494, -6.135869490272887, -7.112943870919113, -4.719408271488777, -8.652736411771516, -5.115898856180194, -9.444466710512513, -6.721183141827832, -8.187939363193856, -4.866007421496122, -4.498541424902005, -6.05955187591462], [2.4503788682948797, 4.136767712097715, 3.800113452319174, 1.7263996510061559, 6.204316437195861, 3.199580908124732, 5.4996984541468565, 3.504064521222991, 1.7285485126344595, 1.9327954130937557, 4.491668242286317, 2.4442089524354818], [8.91661735243092, 8.19164570547311, 7.28941813144091, 11.01087393409493, 9.666237508380636, 7.689372230181427, 10.796081659572991, 10.587480247869069, 12.490792204659163, 9.146059052365413, 4.077223320288767, 8.748918676524138], [-5.007715234440542, -5.201881954076602, -2.990066071487654, -6.50605352762039, -6.097032315522047, -4.81206434114537, -5.453803052692122, -5.968516137674577, -4.087403530804171, -4.9456413319696315, -3.748488710268994, -3.8879845624490703]]

And this is how I define the k-means model that uses it:

km = KMeans(n_clusters=4, init=cluster_centers, max_iter=30)
km.fit(Xnorm)
y_kmeans = km.predict(Xnorm)

I get this error:

ValueError: init should be either 'k-means++', 'random', a ndarray or a callable, got '[[-5.158116189420494, -6.135869490272887, -7.112943870919113, -4.719408271488777, -8.652736411771516, -5.115898856180194, -9.444466710512513, -6.721183141827832, -8.187939363193856, -4.866007421496122, -4.498541424902005, -6.05955187591462], [2.4503788682948797, 4.136767712097715, 3.800113452319174, 1.7263996510061559, 6.204316437195861, 3.199580908124732, 5.4996984541468565, 3.504064521222991, 1.7285485126344595, 1.9327954130937557, 4.491668242286317, 2.4442089524354818], [8.91661735243092, 8.19164570547311, 7.28941813144091, 11.01087393409493, 9.666237508380636, 7.689372230181427, 10.796081659572991, 10.587480247869069, 12.490792204659163, 9.146059052365413, 4.077223320288767, 8.748918676524138], [-5.007715234440542, -5.201881954076602, -2.990066071487654, -6.50605352762039, -6.097032315522047, -4.81206434114537, -5.453803052692122, -5.968516137674577, -4.087403530804171, -4.9456413319696315, -3.748488710268994, -3.8879845624490703]]' instead.

From the error message, I think the array needs to be in a specific format. How can I do the conversion?

Calculation of centers to be used

# One list per cluster, collecting that cluster's center from every run
center_cluster01 = []
center_cluster02 = []
center_cluster03 = []
center_cluster04 = []

for i in range(0,100):
    X=dataML
    # Shuffle the column order with a different seed on each run
    X = X[np.random.default_rng(seed=i).permutation(X.columns.values)]
    #X = X.sample(frac=1).reset_index(drop=True)
    Xnorm=mms.fit_transform(X)
    km=KMeans(n_clusters=4,n_init=10,max_iter=30,random_state=42)
    y_kmeans=km.fit_predict(Xnorm)
    print('aqui')
    print(km.cluster_centers_)
    print('aqui 2')
    center_cluster01.append(km.cluster_centers_[0])
    center_cluster02.append(km.cluster_centers_[1])
    center_cluster03.append(km.cluster_centers_[2])
    center_cluster04.append(km.cluster_centers_[3])

# Average each of the 12 center coordinates over the 100 runs
meanC01=[]
for i in range(0,12):
    total=0
    for j in range(0,100):
        total = total + center_cluster01[j][i]
    mean01 = total/100
    meanC01.append(mean01)

meanC02=[]
for i in range(0,12):
    total=0
    for j in range(0,100):
        total = total + center_cluster02[j][i]
    mean02 = total/100
    meanC02.append(mean02)

meanC03=[]
for i in range(0,12):
    total=0
    for j in range(0,100):
        total = total + center_cluster03[j][i]
    mean03 = total/100
    meanC03.append(mean03)

meanC04=[]
for i in range(0,12):
    total=0
    for j in range(0,100):
        total = total + center_cluster04[j][i]
    mean04 = total/100
    meanC04.append(mean04)
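
As a side note, the averaging loops above could probably be written more compactly with NumPy. The following is only a sketch: the random data is a hypothetical stand-in for the collected centers, and it assumes every run contributes one 12-value center per cluster.

```python
import numpy as np

# Hypothetical stand-in for center_cluster01: 100 runs, each
# contributing one 12-feature center as a 1-D ndarray.
rng = np.random.default_rng(0)
center_cluster01 = [rng.random(12) for _ in range(100)]

# Stack the runs into one array of shape (100, 12), then average
# over the runs (axis 0) to get one mean value per feature.
stacked = np.vstack(center_cluster01)
meanC01 = stacked.mean(axis=0)

print(stacked.shape, meanC01.shape)
```

The result is already an ndarray, so stacking the four mean vectors with np.vstack would give a 2-D array directly instead of a list of lists.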

Xnorm

Xnorm=mms.fit_transform(dataML)
cluster_centers = [meanC01, meanC02, meanC03, meanC04]
km = KMeans(n_clusters=4, init=cluster_centers, max_iter=30)
km.fit(Xnorm)
y_kmeans = km.predict(Xnorm)
  • Please add your entire code/context to make it easier to answer. For example, show where Xnorm is initialized. Commented Jun 18, 2022 at 22:52
  • Here's the solution duckduckgo.com/?q=How+to+convert+an+array+to+np.array. If it doesn't work, edit your question to add what you have tried. Commented Jun 18, 2022 at 22:54
  • What are the outputs of type(cluster_centers) and type(cluster_centers[0])? If the first one is list and the second one is ndarray, then try this: stackoverflow.com/questions/7200878/… Commented Jun 19, 2022 at 1:33

1 Answer

Can you check the type of the first array you have? I see two levels of opening brackets, which means it is multidimensional.

You might want to try flattening your array with numpy.ndarray.flatten().

I have had this issue with data not being passed in correctly because of dimension issues.

NumPy method description is here.
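
For reference, the error message in the question says init accepts an ndarray, so converting the nested list may be all that is needed. The sketch below uses synthetic data as a stand-in for Xnorm and the averaged centers (both hypothetical here), and it keeps the array two-dimensional, with one row per cluster and one column per feature, rather than flattening it.

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the scaled data: 200 samples, 12 features.
rng = np.random.default_rng(0)
Xnorm = rng.random((200, 12))

# Stand-in for [meanC01, meanC02, meanC03, meanC04]: a plain list
# of four lists with 12 floats each.
cluster_centers = Xnorm[:4].tolist()

# Convert the nested list to a 2-D ndarray of shape
# (n_clusters, n_features); it is not flattened.
init_centers = np.asarray(cluster_centers, dtype=float)
print(type(init_centers), init_centers.shape)

# n_init=1 because the starting centers are fixed, so repeated
# initialisations would all begin from the same place.
km = KMeans(n_clusters=4, init=init_centers, n_init=1, max_iter=30)
y_kmeans = km.fit_predict(Xnorm)
```

Whether a plain list is accepted for init seems to depend on the scikit-learn version, so passing an ndarray explicitly is the safer choice.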

5 Comments

How do I identify the type of the array?
Do print(type(Xnorm)), replacing Xnorm with the name of your array variable.
I just checked the scikit-learn documentation for MinMaxScaler.fit_transform(), which looks like what you are using. If you pass your array to numpy.ndarray.flatten(), the output should be that same array in one dimension.
I tried it this way, but I'm getting this error: TypeError: descriptor 'flatten' for 'numpy.ndarray' objects doesn't apply to a 'list' object
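
The TypeError in the comment above most likely comes from calling the ndarray method on a plain Python list. A small sketch with made-up values (the two-row centers list is purely illustrative) shows the failure and also what flattening does to the shape after a conversion:

```python
import numpy as np

# Made-up nested list standing in for the list of centers.
centers = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

# flatten is a method of ndarray, so calling it on a list fails.
error_message = None
try:
    np.ndarray.flatten(centers)
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# Converting first gives a 2-D array; flattening then works but
# collapses it to one dimension.
arr = np.asarray(centers)
flat = arr.flatten()
print(arr.shape, flat.shape)
```

Note that the flattened array no longer has one row per center, so for KMeans the unflattened 2-D array is probably the one to pass as init.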
As it’s currently written, your answer is unclear. Please edit to add additional details that will help others understand how this addresses the question asked. You can find more information on how to write good answers in the help center.
