You can't change the dtype in-place.
In [59]: arr = np.array(list_of_lists)
In [60]: arr
Out[60]:
array([['Africa', '1990', '0', '', '32.6'],
       ['Asia', '2006', '32.4', '5.5', '46.6'],
       ['Europe', '2011', '5.4', '', '55.4']], dtype='<U6')
The common dtype of the inputs is a string ('<U6': unicode, up to 6 characters). Replacing the "" entries with np.nan just puts the string representation, 'nan', into the array:
In [62]: arr[arr == ""] = np.nan
In [63]: arr
Out[63]:
array([['Africa', '1990', '0', 'nan', '32.6'],
       ['Asia', '2006', '32.4', '5.5', '46.6'],
       ['Europe', '2011', '5.4', 'nan', '55.4']], dtype='<U6')
Look at a portion of the underlying data buffer:
In [64]: arr.tobytes()
Out[64]: b'A\x00\x00\x00f\x00\x00\x00r\x00\x00\x00i\x00\x00\x00c\x00\x00\x00a\x00\x00\x001\x00\x00\x009\x00\x00\x009\x00\x00\x00...'
You can see the actual characters; '<U6' stores each character as 4 bytes (little-endian UCS-4).
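A quick size check confirms the layout (a sketch, using the same arr):

arr.dtype.itemsize   # -> 24: 6 characters x 4 bytes each
arr.nbytes           # -> 24 * arr.size, the whole buffer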
A slice of the array is a view, but the astype conversion is a new array, with its own data buffer.
In [65]: arr[:,2:]
Out[65]:
array([['0', 'nan', '32.6'],
       ['32.4', '5.5', '46.6'],
       ['5.4', 'nan', '55.4']], dtype='<U6')
In [66]: arr[:,2:].astype(float)
Out[66]:
array([[ 0. ,  nan, 32.6],
       [32.4,  5.5, 46.6],
       [ 5.4,  nan, 55.4]])
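One way to verify the view/copy distinction, sketched on a scratch copy so it doesn't disturb arr:

a = arr.copy()
v = a[:, 2:]                 # slice: a view sharing a's data buffer
v.base is a                  # -> True
c = a[:, 2:].astype(float)   # astype: a new array with its own buffer
c.base is None               # -> True
v[0, 0] = '9'                # also changes a[0, 2]
c[0, 0] = 9.0                # a is untouched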
You can't write Out[66] back into arr without the floats being converted back to strings.
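A sketch of that round trip, starting from the string arr of Out[63]:

a = arr.copy()                      # still dtype '<U6'
a[:, 2:] = a[:, 2:].astype(float)   # floats are cast right back to str
a[0, 2]                             # -> '0.0', a string again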
You could make an object dtype array:
In [67]: arr = np.array(list_of_lists, dtype=object)
In [68]: arr
Out[68]:
array([['Africa', '1990', '0', '', '32.6'],
       ['Asia', '2006', '32.4', '5.5', '46.6'],
       ['Europe', '2011', '5.4', '', '55.4']], dtype=object)
In [70]: arr[arr == ""] = np.nan
In [71]: arr
Out[71]:
array([['Africa', '1990', '0', nan, '32.6'],
       ['Asia', '2006', '32.4', '5.5', '46.6'],
       ['Europe', '2011', '5.4', nan, '55.4']], dtype=object)
In [72]: arr[:,2:] = arr[:,2:].astype(float)
In [73]: arr
Out[73]:
array([['Africa', '1990', 0.0, nan, 32.6],
       ['Asia', '2006', 32.4, 5.5, 46.6],
       ['Europe', '2011', 5.4, nan, 55.4]], dtype=object)
The dtype remains object, but the type of the individual elements can change. That's because an object array stores references, like a glorified (or debased) list. You gain some flexibility, but lose most of numpy's numeric speed.
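Math on the numeric block does work now, since the elements are Python floats, but it runs element by element at interpreter speed. A sketch, using the object arr of Out[73]:

arr[:, 2:].sum(axis=1)   # works on object dtype, element by element
# -> roughly array([nan, 84.5, nan], dtype=object)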
A structured array (compound dtype), as shown in the other answer, is another possibility. It's easy to make this kind of array when loading a csv (with np.genfromtxt); see the sketch below. You still can't change dtypes in-place, and you can't do math across the fields of a structured array.
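A minimal sketch of that csv route (the field names, and StringIO standing in for a file, are just for illustration):

import numpy as np
from io import StringIO

csv = StringIO('Africa,1990,0,,32.6\nAsia,2006,32.4,5.5,46.6\nEurope,2011,5.4,,55.4')
data = np.genfromtxt(csv, delimiter=',', dtype=None, encoding=None,
                     names=['region', 'year', 'a', 'b', 'c'])
# dtype=None infers a compound dtype per column; the empty fields in
# the float column come through as nan
data['b']                # -> array([nan, 5.5, nan])
data['a'] + data['c']    # per-field math works; whole-row math doesn't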
pandas
In [153]: df = pd.DataFrame(list_of_lists)
In [154]: df
Out[154]:
        0     1     2    3     4
0  Africa  1990     0       32.6
1    Asia  2006  32.4  5.5  46.6
2  Europe  2011   5.4       55.4
In [156]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 3 entries, 0 to 2
Data columns (total 5 columns):
0 3 non-null object
1 3 non-null object
2 3 non-null object
3 3 non-null object
4 3 non-null object
dtypes: object(5)
memory usage: 248.0+ bytes
Convert column dtypes:
In [158]: df[2] = df[2].astype(float)
In [162]: df[4] = df[4].astype(float)
Column 3 needs the "" to nan conversion before it can be converted to float (see the sketch after the dtypes output below).
In [164]: df
Out[164]:
        0     1     2    3     4
0  Africa  1990   0.0       32.6
1    Asia  2006  32.4  5.5  46.6
2  Europe  2011   5.4       55.4
In [165]: df.dtypes
Out[165]:
0 object
1 object
2 float64
3 object
4 float64
dtype: object
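A sketch of that column-3 conversion; pd.to_numeric coerces the empty strings to NaN in one step:

df[3] = pd.to_numeric(df[3], errors='coerce')   # '' -> NaN, rest -> float
# or, mirroring the numpy approach above:
# df[3] = df[3].replace('', np.nan).astype(float)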
There are better pandas programmers here; I've focused more on numpy.
As for inserting a total in each row: you can't do that in-place either. Adding a new column to the numpy array means building a new array; in pandas it's just a new column assignment. A sketch of both:
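totals = arr[:, 2:].sum(axis=1)          # object arr from Out[73]
arr2 = np.column_stack([arr, totals])    # a new, wider array - not in-place

df['total'] = df[[2, 3, 4]].sum(axis=1)  # pandas: NaNs skipped by default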