Summary
Lists and arrays are data types used for storing data in rows and columns. However, plain Python lists cannot be used to perform column-wise operations directly, whereas numpy arrays can.
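As a minimal sketch of this difference (the variable names and values are illustrative and do not come from the original notebook), the same nested data can be held in a list and in an array:

```python
import numpy

rows = [[1, 2, 3], [4, 5, 6]]          # nested list: two rows, three columns
arr = numpy.array(rows)                # the same data as a numpy array

# A list has no notion of columns, so column sums need an explicit loop.
list_col_sums = [sum(col) for col in zip(*rows)]

# An array supports column-wise operations directly.
arr_col_sums = arr.sum(axis=0)         # sum down each column
second_col = arr[:, 1]                 # select the second column

print(list_col_sums)
print(arr_col_sums)
print(second_col)
```

Both approaches give the same column sums here; the array version simply avoids the manual loop.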
.shape - used to check the dimensions of the data (for a 2D array: rows, columns).
type() - used to check the type of the variable itself; .dtype gives the datatype of the elements stored in the array.
- If the array has both rows and columns (a 2D array), it is known as a matrix.
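A short sketch of these checks, assuming two small illustrative arrays named v and m (hypothetical names, not from the original notebook):

```python
import numpy

v = numpy.array([1, 2, 3])                 # 1D array
m = numpy.array([[1, 2, 3], [4, 5, 6]])    # 2D array: 2 rows, 3 columns

print(v.shape)    # (3,)
print(m.shape)    # (2, 3)
print(type(m))    # <class 'numpy.ndarray'>
print(m.dtype)    # an integer dtype; the exact width can depend on the platform
print(m.ndim)     # 2
```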
.astype() is used to change the datatype; it returns a new, converted array. unicode is the string datatype: when an array mixes strings and integers, numpy stores every element as a unicode string. Mathematical operations can only be performed on numeric types, so we convert from unicode to int first.
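The conversion can be sketched as follows; the array contents are made up for illustration:

```python
import numpy

mixed = numpy.array([1, "2", 3])     # mixing ints and strings gives a unicode array
print(mixed.dtype.kind)              # 'U' marks a unicode string dtype

as_int = mixed.astype(numpy.int64)   # returns a new array; mixed itself is unchanged
total = int(as_int.sum())            # arithmetic now works on the converted values
print(as_int.dtype, total)
```

Note that .astype() raises an error if any string cannot be parsed as an integer, so it only suits columns whose values really are numeric.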
- numpy does support field names (a common belief is that it does not, which is one reason people move to pandas). The example below demonstrates field names in numpy. Initially, a has no field names.
import numpy

a = numpy.array([[1, "sai", 10], [2, "sia", 20], [3, "ias", 30]])
a
type(a)
a.shape
a.dtype
# If you observe, the dtypes of a ('<U21'), c ('int64') and d ('<U1') are different. a holds a mixture of numbers and strings, c holds only numbers, and d holds only strings, so numpy picks a different dtype in each case.
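# In order to provide the field names, rebuild the array from row tuples with a structured dtype.
# Assigning to a.dtype only reinterprets the existing bytes; it does not convert the stored strings.
# 'U10' (strings of up to 10 characters) is an assumed width for the name field.
a = numpy.array([(1, "sai", 10), (2, "sia", 20), (3, "ias", 30)],
                dtype={'names': ['StudentRollNo', 'StudentName', 'StudentAttendance'], 'formats': [numpy.int64, 'U10', numpy.int64]})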
a
t = numpy.array([1, 2, 3], dtype=numpy.int64)  # fix the element type so it matches the 8-byte field assigned below
t
t.dtype = {'names' : ['ID'], 'formats' : [numpy.int64]}
t
The formats keyword specifies the datatype of each field (column) stored in the array, in the same order as names.
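Once field names are in place, each column can be selected by name. The following is a sketch under assumptions: the variable names student_dtype and students are hypothetical, and the 'U10' string width is chosen only for illustration.

```python
import numpy

# names and formats are paired by position: one format per field name.
student_dtype = numpy.dtype({
    'names': ['StudentRollNo', 'StudentName', 'StudentAttendance'],
    'formats': [numpy.int64, 'U10', numpy.int64],   # 'U10' is an assumed width
})

# Each row is a tuple, so numpy maps its items onto the fields.
students = numpy.array([(1, "sai", 10), (2, "sia", 20), (3, "ias", 30)],
                       dtype=student_dtype)

names = students['StudentName']                        # select one field by name
total_attendance = int(students['StudentAttendance'].sum())

print(students.dtype.names)
print(names)
print(total_attendance)
```

Because the attendance field keeps its integer format, it can be summed directly without an .astype() conversion.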
- In numpy, a 2D array is called a matrix and a 1D array is called a vector. pandas is a library built on top of numpy that reduces the complexity of writing code directly against numpy. numpy is the lower-level library, whereas pandas builds on it and adds labelled rows and columns.
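The relationship between the two libraries can be sketched by wrapping a numpy array in a pandas DataFrame and converting it back. The column names and values here are illustrative assumptions:

```python
import numpy
import pandas

arr = numpy.array([[1, 10], [2, 20], [3, 30]])

# Wrap the array in a DataFrame to get labelled columns.
df = pandas.DataFrame(arr, columns=["StudentRollNo", "StudentAttendance"])

# The underlying values can be recovered as a plain numpy array.
back = df.to_numpy()
mean_attendance = df["StudentAttendance"].mean()

print(type(back))
print(mean_attendance)
```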
pandas.read_csv() reads a dataset stored in a .csv file. CSV is one of the file formats most widely used with numpy and pandas for datasets stored in files. It stands for comma-separated values.
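A minimal sketch of reading CSV data follows. To keep it self-contained, the CSV content is an in-memory string invented for illustration; with a real dataset, a file path would be passed to pandas.read_csv() instead.

```python
import io
import pandas

# Illustrative CSV text: the first line is the header row.
csv_text = (
    "StudentRollNo,StudentName,StudentAttendance\n"
    "1,sai,10\n"
    "2,sia,20\n"
    "3,ias,30\n"
)

# read_csv accepts a path or a file-like object such as StringIO.
df = pandas.read_csv(io.StringIO(csv_text))

print(df.shape)
print(df)
```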
- We can use <variable name>.columns to retrieve the names of all column fields in a pandas DataFrame.
- The .iloc[] indexer retrieves the data at a particular integer position (row and column, both counted from zero) of a DataFrame.
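Both .columns and .iloc[] can be tried on a small DataFrame. The data below is invented for illustration and df is a hypothetical variable name:

```python
import pandas

df = pandas.DataFrame({
    "StudentRollNo": [1, 2, 3],
    "StudentName": ["sai", "sia", "ias"],
    "StudentAttendance": [10, 20, 30],
})

column_names = list(df.columns)   # names of all column fields

first_row = df.iloc[0]            # first row, returned as a Series
cell = df.iloc[1, 2]              # second row, third column
top_two = df.iloc[0:2, 0:2]       # first two rows and first two columns

print(column_names)
print(first_row)
print(cell)
print(top_two)
```

.iloc[] works purely by position, so it behaves the same way whatever the column names or row labels are.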