week11_readme.md


WEEK 11

Summary

  • Lists and arrays are both used for storing data in rows and columns. However, plain Python lists do not support column-wise operations directly, whereas NumPy arrays do.
  • .shape - attribute used to check the dimensions of the data. type() - built-in function used to check the type of a variable; .dtype gives the datatype of the elements stored in an array.
  • If the array has two dimensions (rows and more than one column), it is known as a matrix.
  • The .astype() method is used to change the datatype; it returns a converted copy of the array.
  • When an array holds a mixture of strings and integers, NumPy stores every element as a unicode string. Mathematical operations can only be performed on numeric types, so we convert from unicode to int first.
  • NumPy supports field names (many people believe it cannot, and move to pandas for that reason). The example below demonstrates field names in NumPy; the array a starts out with no field names.
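import numpy

# a mixes numbers and strings, so NumPy stores every element as a unicode string
a = numpy.array([[1, "sai", 10], [2, "sia", 20], [3, "ias", 30]])
a
type(a)
a.shape
a.dtype
# The dtypes of a ('<U21'), c ('int64') and d ('<U1') are different: a has a mixture of numbers and strings, c has only numbers and d has only strings (c and d are not defined in this snippet). Each mixture or individual datatype gets its own dtype value.
# In order to provide the field names, rebuild the array from one tuple per row with a structured dtype.
# Assigning this dtype to the existing '<U21' array would not convert the stored strings.
fields = {'names' : ['StudentRollNo', 'StudentName', 'StudentAttendance'], 'formats' : [numpy.int64, 'U21', numpy.int64]}
a = numpy.array([(1, "sai", 10), (2, "sia", 20), (3, "ias", 30)], dtype=fields)
a


# t already holds 64-bit integers, so the same memory can be viewed with a field name.
# view() is used because the NumPy documentation discourages assigning to t.dtype.
t = numpy.array([1, 2, 3], dtype=numpy.int64)
t
t = t.view(numpy.dtype({'names' : ['ID'], 'formats' : [numpy.int64]}))
t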
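The points summarised so far (column-wise operations, .shape, .astype() and field names) can be sketched in one short example. This is only an illustration: the variable names rows, arr, mixed, attendance and students are made up for this sketch and are not part of the original notes.

```python
import numpy

# A nested list has no column-wise arithmetic of its own; an array does.
rows = [[1, 10], [2, 20], [3, 30]]
arr = numpy.array(rows)

print(arr.shape)                   # (3, 2): 3 rows, 2 columns
column_totals = arr.sum(axis=0)    # sum down each column
print(column_totals)

# Mixing numbers and strings makes every element a unicode string ...
mixed = numpy.array([[1, "sai", 10], [2, "sia", 20], [3, "ias", 30]])

# ... so a numeric column must be converted before doing arithmetic on it.
attendance = mixed[:, 2].astype(numpy.int64)
print(attendance.mean())

# With a structured dtype, each field can be read by its name.
students = numpy.array(
    [(1, "sai", 10), (2, "sia", 20), (3, "ias", 30)],
    dtype=[("StudentRollNo", numpy.int64),
           ("StudentName", "U21"),
           ("StudentAttendance", numpy.int64)],
)
print(students["StudentName"])
```

Reading students["StudentAttendance"] gives an integer array directly, so no .astype() call is needed once the fields carry their own formats.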
  • The formats key is used to specify the datatype of each field (column) stored in a structured array.
  • In NumPy, a 2D array is called a matrix and a 1D array is called a vector. pandas is a library built on top of NumPy that reduces the complexity of writing the equivalent code in NumPy directly. NumPy is the lower-level library, whereas pandas is built on it.
  • .csv is one of the file formats most widely used with NumPy and pandas for datasets stored in files, and pandas.read_csv() reads such a file. CSV stands for comma-separated values.
  • We can use <variable name>.columns to retrieve the names of all columns in a pandas DataFrame.
  • The .iloc[] indexer retrieves the data present at a particular integer position (row and column) of a DataFrame.
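
A minimal sketch of the pandas points above, assuming a small in-memory CSV in place of a real file. The csv_text contents and the variable names are hypothetical and were chosen only to mirror the NumPy example.

```python
import io
import pandas

# Hypothetical CSV content standing in for a dataset file on disk.
csv_text = (
    "StudentRollNo,StudentName,StudentAttendance\n"
    "1,sai,10\n"
    "2,sia,20\n"
    "3,ias,30\n"
)

# read_csv accepts a file path or a file-like object such as StringIO.
df = pandas.read_csv(io.StringIO(csv_text))

# .columns holds the column names taken from the header row.
column_names = list(df.columns)
print(column_names)

# .iloc[] selects by integer position: [row] or [row, column].
first_row = df.iloc[0]        # the whole first row
cell = df.iloc[1, 2]          # second row, third column
attendance = df.iloc[:, 2]    # every row of the third column
print(first_row["StudentName"], cell)
```

With a real dataset, the io.StringIO(...) argument would be replaced by the path to the .csv file.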