I am trying to read data from a CSV file into a NumPy array. Since the CSV file contains empty fields, I read all of the data into an array of dtype=str, and plan to convert rows/columns into the appropriate numerical types afterwards. The example below is my unsuccessful attempt at converting these array dtypes.
import numpy as np
x = np.array([
['name', 'property', 'value t0', 'value t1', 'value t2'],
['a', 0.5, 1, 2, 3],
['b', 0.2, 5, 10, 100],
['c', 0.7, 3, 6, 9],
], dtype=str)
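The array above stands in for a CSV read along these lines (a sketch; with dtype=str, np.genfromtxt leaves empty fields as empty strings, which is why I read everything as strings first):

```python
import io
import numpy as np

# Stand-in for the real file; note the empty field in the last row.
csv_text = "name,property,value t0\na,0.5,1\nb,,5\n"

x = np.genfromtxt(io.StringIO(csv_text), delimiter=',', dtype=str)
print(x.shape)   # (3, 3)
print(x[2, 1])   # '' -- the empty field survives as an empty string
```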
First, let's view the original array.
print("\n .. x (shape={}, dtype={}):\n{}\n".format(x.shape, x.dtype, x))
[['name' 'property' 'value t0' 'value t1' 'value t2']
 ['a' '0.5' '1' '2' '3']
 ['b' '0.2' '5' '10' '100']
 ['c' '0.7' '3' '6' '9']]
Then, let's make sure the numerical entries (everything from the second row down and the third column right, i.e. x[1:, 2:]) can be converted to type <int>.
print(x[1:, 2:].astype(int))
[[  1   2   3]
 [  5  10 100]
 [  3   6   9]]
So, I tried to put these concepts together.
# x[1:, 2:] = x[1:, 2:].astype(int)  # also tried; same result
x[1:, 2:] = np.array(x[1:, 2:], dtype=int)
print(x)
[['name' 'property' 'value t0' 'value t1' 'value t2']
 ['a' '0.5' '1' '2' '3']
 ['b' '0.2' '5' '10' '100']
 ['c' '0.7' '3' '6' '9']]
Why do the selected entries remain strings? I saw similar questions posted, for which the accepted solution appears to be using named fields. But I prefer numerical indexing to named fields for my use case.
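To illustrate what I mean on a smaller example: the assignment runs without error, but the values come back as strings anyway, because an ndarray's dtype is fixed at creation and anything assigned into it is cast to that dtype:

```python
import numpy as np

x = np.array([['a', '1', '2'], ['b', '3', '4']], dtype=str)

# The ints produced by astype(int) are cast straight back to the
# array's fixed str dtype on assignment, so nothing visibly changes.
x[:, 1:] = x[:, 1:].astype(int)
print(x.dtype.kind)  # 'U' -- still a unicode-string array
```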
You would need a [('name','U1'),('property',float), ...] dtype. An alternative is object dtype, where elements are stored in a list-like manner. Otherwise you can't have a mix of dtypes. A pandas DataFrame would also have named columns, and a separate Series for each column.
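A minimal sketch of the two alternatives mentioned above, using the question's data rows (field names chosen to mirror the header; this is illustrative, not the only layout):

```python
import numpy as np

rows = [('a', 0.5, 1, 2, 3), ('b', 0.2, 5, 10, 100), ('c', 0.7, 3, 6, 9)]

# Structured dtype: each field keeps its own type, accessed by name.
dt = np.dtype([('name', 'U1'), ('property', float),
               ('value t0', int), ('value t1', int), ('value t2', int)])
s = np.array(rows, dtype=dt)
print(s['property'])        # float64 column: [0.5 0.2 0.7]

# Object dtype: positional indexing still works, but each element is a
# plain Python object, so you lose fast vectorized numeric operations.
o = np.array([list(r) for r in rows], dtype=object)
print(type(o[0, 1]))        # <class 'float'>
```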