Python construct a matrix iterating over arrays

Question

from numpy import genfromtxt, linalg, array, append, hstack, vstack

#Euclidean distance function
def euclidean(v1, v2):
    dist = linalg.norm(v1 - v2)
    return dist

#get the .csv files and eliminate heading and unused columns from test
BMUs = genfromtxt('BMU3.csv', delimiter=',')
data = genfromtxt('test.csv', delimiter=',')
data = data[1:, :-2]

i = 0
for obj in data:
    D = 0
    for BMU in BMUs:
        Dist = append(euclidean(obj, BMU[: -2]), BMU[-2:])
    D = hstack(Dist)

Map = vstack(D)

#iteration counter
i += 1
if not i % 1000:
    print (i, ' of ', len(data))

print (Map)

What I would like to do is:

Take an object from data.
Calculate its distance from each BMU (euclidean(obj, BMU[:-2])).
Append the last two items of the BMU array to the distance.
Create a 2D matrix that contains, for one data object, all the distances plus the last two items of every BMU (D = hstack(Dist)).
Create an array of those matrices with length equal to the number of objects in data (Map = vstack(D)).

The problem here, or at least what I think is the problem, is that hstack and vstack expect a tuple of arrays as input, not a single array. I'm trying to use them the way I use list.append() for lists. Sadly, I'm a beginner and have no idea how to do it differently.

Any help would be awesome, thank you in advance :)

hpaulj · Accepted Answer · 2016-12-12 21:12:44Z


First a usage note:

Instead of:

from numpy import genfromtxt, linalg, array, append, hstack, vstack

use

import numpy as np
....
data = np.genfromtxt(....)
....
     np.hstack...

Secondly, stay away from np.append. It is too easy to misuse. Use np.concatenate so you get the full flavor of what it is doing.
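A minimal sketch of why np.append is easy to misuse, using small made-up arrays (a and b are illustrative, not from the question): without an axis argument it flattens its inputs, while np.concatenate makes the tuple of inputs and the axis explicit.

```python
import numpy as np

# Small illustrative arrays (assumed, not taken from the question).
a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# With no axis given, np.append ravels both inputs, so the 2-D shape is lost.
flat = np.append(a, b)
print(flat.shape)      # (6,)

# np.concatenate takes a tuple of arrays and an explicit axis.
stacked = np.concatenate((a, b), axis=0)
print(stacked.shape)   # (3, 2)
```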

List append is better for incremental work:

alist = []
for ....
    alist.append(....)
arr = np.array(alist)

==================

Without sample arrays (or at least shapes) I'm guessing. But (n,2) arrays sound reasonable. Taking the distance of each pair of 'points' from each other, I can collect the values in a nested list comprehension:

In [121]: data = np.arange(6).reshape(3,2)
In [122]: [[euclidean(d,b) for b in data] for d in data]
Out[122]: 
[[0.0, 2.8284271247461903, 5.6568542494923806],
 [2.8284271247461903, 0.0, 2.8284271247461903],
 [5.6568542494923806, 2.8284271247461903, 0.0]]

and make that an array:

In [123]: np.array([[euclidean(d,b) for b in data] for d in data])
Out[123]: 
array([[ 0.        ,  2.82842712,  5.65685425],
       [ 2.82842712,  0.        ,  2.82842712],
       [ 5.65685425,  2.82842712,  0.        ]])

The equivalent with nested loops:

alist = []
for d in data:
    sublist=[]
    for b in data:
        sublist.append(euclidean(d,b))
    alist.append(sublist)
arr = np.array(alist)
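The same nested-loop pattern can also carry the last two BMU columns along with each distance, which is what the question asks for. This is only a sketch: the random arrays stand in for the CSV files, and the column counts (5 for data, 7 for BMUs) are assumptions based on the shapes reported in the comments.

```python
import numpy as np

# Stand-ins for the CSV contents (assumed shapes: data has 5 columns,
# BMUs has 7, of which the last two are not part of the distance).
rng = np.random.default_rng(0)
data = rng.random((4, 5))
BMUs = rng.random((3, 7))

def euclidean(v1, v2):
    return np.linalg.norm(v1 - v2)

rows = []
for obj in data:
    per_obj = []
    for bmu in BMUs:
        # One entry per BMU: [distance, second-to-last value, last value]
        per_obj.append([euclidean(obj, bmu[:-2]), bmu[-2], bmu[-1]])
    rows.append(per_obj)

# One (n_BMUs, 3) matrix per data object, stacked into a 3-D array.
Map = np.array(rows)
print(Map.shape)   # (4, 3, 3)
```

Building plain nested lists and converting once at the end avoids calling hstack or vstack inside the loop.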

There are ways of doing this without loops, but let's make sure the basic Python looping approach works first.

===============

If I want the difference (along the last axis) between every element (row) in data and every element in bmu (or here data), I can use array broadcasting. The result is a (3,3,2) array:

In [130]: data[None,:,:]-data[:,None,:]
Out[130]: 
array([[[ 0,  0],
        [ 2,  2],
        [ 4,  4]],

       [[-2, -2],
        [ 0,  0],
        [ 2,  2]],

       [[-4, -4],
        [-2, -2],
        [ 0,  0]]])

norm can handle higher-dimensional arrays and takes an axis parameter:

In [132]: np.linalg.norm(data[None,:,:]-data[:,None,:],axis=-1)
Out[132]: 
array([[ 0.        ,  2.82842712,  5.65685425],
       [ 2.82842712,  0.        ,  2.82842712],
       [ 5.65685425,  2.82842712,  0.        ]])
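Applied to two different arrays, the same broadcasting idea might look like the sketch below. The random arrays and their column counts (5 for data, 7 for BMUs) are assumptions based on the shapes given in the comments, and the last step attaches the final two BMU columns to each distance as the question describes.

```python
import numpy as np

# Stand-ins for the real arrays (assumed column counts: 5 and 7).
rng = np.random.default_rng(0)
data = rng.random((4, 5))
BMUs = rng.random((3, 7))

# (4, 1, 5) - (1, 3, 5) broadcasts to (4, 3, 5); the norm over the
# last axis leaves one distance per (data row, BMU) pair: (4, 3).
dist = np.linalg.norm(data[:, None, :] - BMUs[None, :, :-2], axis=-1)

# Repeat the last two BMU columns for every data row: (4, 3, 2).
tail = np.broadcast_to(BMUs[None, :, -2:], (len(data), len(BMUs), 2))

# Join distance and the two extra columns along the last axis: (4, 3, 3).
Map = np.concatenate((dist[:, :, None], tail), axis=-1)
print(Map.shape)   # (4, 3, 3)
```

With the real shapes, the intermediate difference array would hold 19219 × 243 × 5 elements, so memory use is worth keeping in mind.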

edited Dec 12, 2016 at 21:12

answered Dec 12, 2016 at 20:02

hpaulj


4 Comments

Bradipo Eremita Over a year ago

Thank you very much, will be waiting for your advice :)

hpaulj Over a year ago

What's the shape (and dtype) for BMU and data? It's easier to replicate and test your code with samples. Otherwise I have to guess and make up sample arrays (like data=np.arange(24).reshape(12,2)).

Bradipo Eremita Over a year ago

BMUs.shape is (243, 7) and data.shape is (19219, 5).

Bradipo Eremita Over a year ago

Type: they are both numpy arrays

Bradipo Eremita · Accepted Answer · 2016-12-13 12:53:15Z

Thanks to your help, I managed to implement the pseudocode. Here is the final program:

import numpy as np


def euclidean(v1, v2):
    dist = np.linalg.norm(v1 - v2)
    return dist


def makeKNN(dataSet, BMUSet, k, fileOut, test=False):
    # take input files
    BMUs = np.genfromtxt(BMUSet, delimiter=',')
    data = np.genfromtxt(dataSet, delimiter=',')

    final = data[1:, :]
    if not test:
        data = data[1:, :]
    else:
        data = data[1:, :-2]

    # Calculate all the distances between data and BMUs, then reorder the BMUs by distance

    dist = np.array([[euclidean(d, b[:-2]) for b in BMUs] for d in data])
    BMU_K = np.array([BMUs[np.argsort(d)] for d in dist])

    # mean over the closest k BMUs (column 5)
    Z = np.array([[np.sum(b[:k].T[5]) / k] for b in BMU_K])

    # error propagation
    Z_err = np.array([[np.sqrt(np.sum(np.power(b[:k].T[5], 2)))] for b in BMU_K])

    # Adding z estimates and errors to the data
    final = np.concatenate((final, Z, Z_err), axis=1)

    # print output file
    np.savetxt(fileOut, final, delimiter=',')
    print('So long, and thanks for all the fish')

Thank you very much and I hope that this code will help someone else in the future :)
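For anyone adapting this, the distance, sorting, and averaging steps can also be sketched without Python-level loops. The random arrays and k below are assumptions for illustration only. The sketch mirrors the arithmetic of the program above (mean of column 5 over the k nearest BMUs, and the square root of the sum of squares as the error term) and makes no claim about whether that error formula suits a given use case.

```python
import numpy as np

# Illustrative stand-ins (assumed shapes: data has 5 columns, BMUs has 7).
rng = np.random.default_rng(0)
data = rng.random((6, 5))
BMUs = rng.random((4, 7))
k = 2

# Distance from every data row to every BMU (last two BMU columns excluded).
dist = np.linalg.norm(data[:, None, :] - BMUs[None, :, :-2], axis=-1)  # (6, 4)

# Indices of the k closest BMUs for each data row.
nearest = np.argsort(dist, axis=1)[:, :k]                              # (6, k)

# Column 5 of those BMUs, one row of k values per data row.
vals = BMUs[nearest, 5]                                                # (6, k)

# Same quantities as Z and Z_err in the program above.
Z = vals.mean(axis=1, keepdims=True)                                   # (6, 1)
Z_err = np.sqrt((vals ** 2).sum(axis=1, keepdims=True))                # (6, 1)

final = np.concatenate((data, Z, Z_err), axis=1)
print(final.shape)   # (6, 7)
```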
