
This is a question that usually appears in interviews.

I know how to read csv files using Pandas.

However, I am struggling to find a way to read CSV files without using external libraries.

Does Python come with any module that would help read csv files?

  • A dataframe can be seen as a collection of records or as a list of columns. NumPy (and pandas) are mainly C or Cython optimizations to speed up the processing of large data frames, but you can implement everything by hand. I'm only posting a comment because the current question is rather broad. Commented Mar 28, 2019 at 18:08
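The two views mentioned in this comment can be sketched with the standard library alone. This is a minimal illustration, not how pandas works internally: it inlines the data.csv sample from one of the answers below with io.StringIO, and the names as_records and as_columns are made up for the example.

```python
import csv
import io

# The data.csv sample used in one of the answers, inlined so no file is needed
SAMPLE = """Id,name,age,height,weight
1,Alice,20,62,120.6
2,Freddie,21,74,190.6
3,Bob,17,68,120.0
"""

rows = list(csv.reader(io.StringIO(SAMPLE)))
header, records = rows[0], rows[1:]

# Record view: one dict per row, keyed by column name
as_records = [dict(zip(header, record)) for record in records]

# Column view: zip(*records) transposes the rows into per-column tuples
as_columns = {name: list(values) for name, values in zip(header, zip(*records))}

print(as_records[0])
print(as_columns['name'])
```

The record view is convenient for row-by-row processing, while the column view makes per-column operations (such as summing a column) a single loop over one list.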

5 Answers


You will most likely want a library to read a CSV file. While you could open and parse the data yourself, this would be tedious and time-consuming. Luckily, Python ships with the standard csv module, so you won't have to pip install anything. You can read your file like this:

import csv

with open('file.csv', 'r', newline='') as file:
    my_reader = csv.reader(file, delimiter=',')
    for row in my_reader:
        print(row)

This shows that each row is read in as a list of strings, which you can then process by index. The csv documentation at https://docs.python.org/3/library/csv.html describes other ways to read data, including csv.DictReader, which gives you a dictionary for each row instead of a list.
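As a sketch of processing rows by index, assuming the column order of the sample data in the update below, the following skips the header with next() and converts the numeric columns with int(), since csv.reader returns every field as a string. The sample is inlined with io.StringIO so the snippet runs on its own; with a real file you would pass the open file object instead.

```python
import csv
import io

# Two rows of the sample data from the update below, inlined via io.StringIO;
# with a real file, pass open('file.csv', newline='') instead
SAMPLE = """product_id,product_name,aisle_id,department_id
9327,Garlic Powder,104,13
17461,Air Chilled Organic Boneless Skinless Chicken Breasts,35,12
"""

reader = csv.reader(io.StringIO(SAMPLE))
header = next(reader)  # the first row holds the column names

products = []
for row in reader:
    # Every field arrives as a string, so convert the numeric columns by index
    products.append((int(row[0]), row[1], int(row[2]), int(row[3])))

print(header)
print(products)
```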

Update

You linked the GitHub repository for your project, so I took this snippet from it:

product_id,product_name,aisle_id,department_id
9327,Garlic Powder,104,13
17461,Air Chilled Organic Boneless Skinless Chicken Breasts,35,12
17668,Unsweetened Chocolate Almond Breeze Almond Milk,91,16
28985,Michigan Organic Kale,83,4
32665,Organic Ezekiel 49 Bread Cinnamon Raisin,112,3
33120,Organic Egg Whites,86,16
45918,Coconut Butter,19,13
46667,Organic Ginger Root,83,4
46842,Plain Pre-Sliced Bagels,93,3

I saved it as file.csv and ran it with the code I posted above. Result:

['product_id', 'product_name', 'aisle_id', 'department_id']
['9327', 'Garlic Powder', '104', '13']
['17461', 'Air Chilled Organic Boneless Skinless Chicken Breasts', '35', '12']
['17668', 'Unsweetened Chocolate Almond Breeze Almond Milk', '91', '16']
['28985', 'Michigan Organic Kale', '83', '4']
['32665', 'Organic Ezekiel 49 Bread Cinnamon Raisin', '112', '3']
['33120', 'Organic Egg Whites', '86', '16']
['45918', 'Coconut Butter', '19', '13']
['46667', 'Organic Ginger Root', '83', '4']
['46842', 'Plain Pre-Sliced Bagels', '93', '3']

This does what you asked in your question. I am not going to do your project for you, but you should be able to take it from here.
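To illustrate why parsing by hand is tedious, as mentioned at the start of this answer, here is a minimal comparison between a naive str.split(',') and csv.reader on a single line. The line is hypothetical: it quotes a product name that contains a comma, which the naive split breaks apart.

```python
import csv
import io

# Hypothetical line: the quoted product name contains a comma
LINE = '9327,"Garlic Powder, Organic",104,13\n'

# Naive approach: splits inside the quotes and keeps the quote characters
naive = LINE.rstrip('\n').split(',')

# csv.reader understands quoting, so the product name stays one field
parsed = next(csv.reader(io.StringIO(LINE)))

print(naive)
print(parsed)
```

Handling quotes, escaped quotes, and newlines inside quoted fields is exactly the work the csv module already does for you.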


What if I am supposed to use only input and output libraries? Can I import the csv library?
@MosaliHarshaVardhanReddy What do you mean by "input and output libraries"? csv comes with csv.reader() and csv.writer() functions. Does that qualify it as an "input and output library"?
Instead of using the CSV reader, I may have to use file.reader("file.csv") and convert it into a DataFrame.
I am confused. You want a DataFrame, but you refuse to use numpy. I don't think you can have it both ways; as far as I'm aware, DataFrames are specific to pandas, which is built on numpy.
@MosaliHarshaVardhanReddy I would truly urge you to use the csv module unless told otherwise (your post says only numpy and pandas are excluded). Then you can either build an SQL database using sqlite3 or represent your data for analysis as a list of lists or a list of dictionaries. I see no reason you should be barred from importing anything at all. If that is the case, though, you're in for a helluva hard project: tedious, time-consuming, and neglecting the best part of Python, which is not having to reinvent the wheel with each program.
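The sqlite3 suggestion can be sketched with the standard library alone: load the rows into an in-memory database and query them with SQL. The table name products and the column types are assumptions based on the sample data in the answer above; only three of its rows are inlined here.

```python
import csv
import io
import sqlite3

# Three rows of the sample data from the answer above
SAMPLE = """product_id,product_name,aisle_id,department_id
9327,Garlic Powder,104,13
28985,Michigan Organic Kale,83,4
46667,Organic Ginger Root,83,4
"""

conn = sqlite3.connect(':memory:')  # throwaway database that lives in RAM
conn.execute(
    'CREATE TABLE products ('
    'product_id INTEGER PRIMARY KEY, product_name TEXT, '
    'aisle_id INTEGER, department_id INTEGER)'
)

reader = csv.reader(io.StringIO(SAMPLE))
next(reader)  # skip the header row
# INTEGER column affinity stores numeric strings such as '83' as integers
conn.executemany('INSERT INTO products VALUES (?, ?, ?, ?)', reader)

rows = conn.execute(
    'SELECT product_name FROM products WHERE aisle_id = ? ORDER BY product_id',
    (83,),
).fetchall()
print(rows)
conn.close()
```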

When one's production environment is limited by memory, being able to read and manage data without importing additional libraries may be helpful.

To achieve that, the built-in csv module does the work:

import csv

There are at least two ways to do that: using csv.reader() or using csv.DictReader().

csv.reader() lets you access CSV data by index and is ideal for simple CSV files (Source).

csv.DictReader(), on the other hand, lets you access fields by column name, which is friendlier and easier to use, especially when working with large CSV files (Source).

Here's how to do it with csv.reader():

>>> import csv
>>> with open('eggs.csv', newline='') as csvfile:
...     spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|')
...     for row in spamreader:
...         print(', '.join(row))
Spam, Spam, Spam, Spam, Spam, Baked Beans
Spam, Lovely Spam, Wonderful Spam
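The contents of eggs.csv are not shown. As a hedged reconstruction, the snippet below writes two rows with csv.writer using the same delimiter=' ' and quotechar='|' settings, then reads them back with csv.reader; fields that contain a space get wrapped in | characters. The file contents are an assumption, kept in memory with io.StringIO.

```python
import csv
import io

buf = io.StringIO()
# QUOTE_MINIMAL quotes only fields that need it, here those containing the delimiter
writer = csv.writer(buf, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL)
writer.writerow(['Spam'] * 5 + ['Baked Beans'])
writer.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam'])

print(buf.getvalue())  # raw text, with |...| around fields containing a space

buf.seek(0)  # rewind so the reader starts at the first row
lines = [', '.join(row) for row in csv.reader(buf, delimiter=' ', quotechar='|')]
for line in lines:
    print(line)
```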

Here's how to do it with csv.DictReader():

>>> import csv
>>> with open('names.csv', newline='') as csvfile:
...     reader = csv.DictReader(csvfile)
...     for row in reader:
...         print(row['first_name'], row['last_name'])
...
Eric Idle
John Cleese

>>> print(row)
{'first_name': 'John', 'last_name': 'Cleese'}
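By default, DictReader takes its keys from the first row. If a file has no header row, you can supply the keys through the fieldnames parameter. The header-less contents below are hypothetical, reusing the names from the example above.

```python
import csv
import io

# Hypothetical header-less version of names.csv
NAMES = 'Eric,Idle\nJohn,Cleese\n'

# fieldnames supplies the keys, so the first line is treated as data
reader = csv.DictReader(io.StringIO(NAMES), fieldnames=['first_name', 'last_name'])
people = list(reader)

for person in people:
    print(person['first_name'], person['last_name'])
```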

For another example, check Real Python's page here.


I was asked a similar but more complicated question about building a data structure without using pandas, and this is the only relevant question I found. Applied to this question's data, the task was: put the product IDs as keys in a dictionary, with a list of (aisle ID, department ID) tuples as the values (in Python). That dictionary is the required "dataframe". It is hard for me to think outside of numpy and pandas.

import csv

items = []          # the rows of the csv, as lists
aisle_dept_id = []  # (aisle_id, department_id) tuples
mydict = {}         # product id as keys, list of the above tuples as values

product_id, product_name, aisle_id, department_id = [], [], [], []

# 'with' closes the file when done; newline='' is what the csv docs recommend
with open('data.csv', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        items.append(row)

# Start at 1 to skip the header row. The indices assume the column order
# product_id,product_name,aisle_id,department_id of the sample data above.
for i in range(1, len(items)):
    product_id.append(items[i][0])
    product_name.append(items[i][1])
    aisle_id.append(items[i][2])
    department_id.append(items[i][3])

for item1, item2 in zip(aisle_id, department_id):
    aisle_dept_id.append((item1, item2))
for item1, item2 in zip(product_id, aisle_dept_id):
    mydict.update({item1: [item2]})

This produces the following output:

mydict:
{'9327': [('104', '13')],
 '17461': [('35', '12')],
 '17668': [('91', '16')],
 '28985': [('83', '4')],
 '32665': [('112', '3')],
 '33120': [('86', '16')],
 '45918': [('19', '13')],
 '46667': [('83', '4')],
 '46842': [('93', '3')]}
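If the same product ID could appear on more than one row, update() would replace the earlier list instead of growing it. Below is a more compact sketch that reads rows by column name with csv.DictReader and uses dict.setdefault() to append to each product's list. The function name build_index is hypothetical, and three rows of the sample data are inlined.

```python
import csv
import io

# Three rows of the sample data, inlined so the sketch runs without a file
SAMPLE = """product_id,product_name,aisle_id,department_id
9327,Garlic Powder,104,13
17461,Air Chilled Organic Boneless Skinless Chicken Breasts,35,12
28985,Michigan Organic Kale,83,4
"""


def build_index(lines):
    """Map each product_id to a list of (aisle_id, department_id) tuples."""
    index = {}
    for row in csv.DictReader(lines):
        # setdefault creates the list the first time an id is seen;
        # a repeated id appends instead of overwriting the earlier tuple
        index.setdefault(row['product_id'], []).append(
            (row['aisle_id'], row['department_id'])
        )
    return index


mydict = build_index(io.StringIO(SAMPLE))
print(mydict)
```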


I had a similar requirement and came up with this solution: a function that converts CSV to JSON (I needed JSON for readability and to make querying the data easier without access to pandas). If the function's headers argument is True, the first row of the CSV is used for the keys in the JSON; otherwise, the value indices are used as keys.

from csv import reader as csv_reader

def csv_to_json(csv_path: str, headers: bool) -> list:
  '''Convert data from a csv to json'''
  # store json data
  json_data = []
  
  try:
    with open(csv_path, 'r', newline='') as file:
      reader = csv_reader(file)
      # set column names using first row
      if headers:
        columns = next(reader)
      
      # convert csv to json
      for row in reader:
        row_data = {}
        for i in range(len(row)):
          # set key names
          if headers:
            row_key = columns[i].lower()
          else: 
            row_key = i
          # set key/value
          row_data[row_key] = row[i]
        # add data to json store 
        json_data.append(row_data)
        
  # error handling
  except Exception as e:
    print(repr(e))
    
  return json_data

Given a csv containing the following

+------+-------+------+
| Year | Month | Week |
+------+-------+------+
| 2020 |    11 |   11 |
| 2020 |    12 |   12 |
+------+-------+------+

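The output (serialized with json.dumps) with headers is

[
  {"year": "2020", "month": "11", "week": "11"},
  {"year": "2020", "month": "12", "week": "12"}
]

The output without headers is below. The header row is then treated as data, and every value stays a string because csv.reader does not convert types.

[
  {"0": "Year", "1": "Month", "2": "Week"},
  {"0": "2020", "1": "11", "2": "11"},
  {"0": "2020", "1": "12", "2": "12"}
]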
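csv_to_json returns a Python list of dicts whose values are all strings. As a sketch, assuming you want an actual JSON string with numeric fields converted, you could try int() and then float() on each value before passing the records to json.dumps. The helper names to_number and csv_text_to_json are hypothetical, and the CSV text is the Year/Month/Week table above written out as plain lines.

```python
import csv
import io
import json

# The Year/Month/Week table above, as raw CSV text
CSV_TEXT = 'Year,Month,Week\n2020,11,11\n2020,12,12\n'


def to_number(value):
    """Return value as an int or float when it parses as one, else unchanged."""
    for cast in (int, float):
        try:
            return cast(value)
        except (TypeError, ValueError):
            pass
    return value


def csv_text_to_json(text):
    # Lower-case the header names, as csv_to_json does
    reader = csv.DictReader(io.StringIO(text))
    records = [
        {key.lower(): to_number(value) for key, value in row.items()}
        for row in reader
    ]
    return json.dumps(records)


result = csv_text_to_json(CSV_TEXT)
print(result)
```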


The following solutions are inspired by this answer. The output content in the examples below is generated using the following input data:

data.csv

Id,name,age,height,weight
1,Alice,20,62,120.6
2,Freddie,21,74,190.6
3,Bob,17,68,120.0

If you would like to pretty-print the output of the examples below, you could use the following:

import json
print(json.dumps(data, indent=4, sort_keys=True, default=str))

Solution 1 - Use csv.reader() to get a list of list objects

import csv


def read_csv(filepath: str):
    data = []
    with open(filepath, 'r', newline='') as f:
        reader = csv.reader(f, delimiter=',')
        for row in reader:
            data.append(row)

    return data


data = read_csv('data.csv')
print(data)

Output

[['Id', 'name', 'age', 'height', 'weight'], ['1', 'Alice', '20', '62', '120.6'],
 ['2', 'Freddie', '21', '74', '190.6'], ['3', 'Bob', '17', '68', '120.0']]

To print the data line by line, you could also use the following:

print('\n'.join(', '.join(map(str,row)) for row in data))

Output:

Id, name, age, height, weight
1, Alice, 20, 62, 120.6
2, Freddie, 21, 74, 190.6
3, Bob, 17, 68, 120.0

Solution 2 - Use csv.DictReader() to get a list of dict objects

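import csv


def read_csv(filepath):
    # Text mode with an explicit encoding replaces the 'rb' + codecs.iterdecode
    # pattern; newline='' lets the csv module handle line endings itself
    with open(filepath, 'r', newline='', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        data = list(reader)

    return data

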
data = read_csv('data.csv')
print(data)

Output

[{'Id': '1', 'name': 'Alice', 'age': '20', 'height': '62', 'weight': '120.6'}, 
 {'Id': '2', 'name': 'Freddie', 'age': '21', 'height': '74', 'weight': '190.6'}, 
 {'Id': '3', 'name': 'Bob', 'age': '17', 'height': '68', 'weight': '120.0'}]

Solution 3 - Use csv.DictReader() to get a dictionary of dict objects based on a primary key

import csv


def read_csv(filepath):
    data = {}
    # Text mode with an explicit encoding replaces the 'rb' + codecs.iterdecode pattern
    with open(filepath, 'r', newline='', encoding='utf-8') as f:
        reader = csv.DictReader(f)
        for row in reader:
            key = row['Id']  # Assuming a column named 'Id' to be the primary key
            data[key] = row

    return data


data = read_csv('data.csv')
print(data)

Output

{'1': {'Id': '1', 'name': 'Alice', 'age': '20', 'height': '62', 'weight': '120.6'}, 
 '2': {'Id': '2', 'name': 'Freddie', 'age': '21', 'height': '74', 'weight': '190.6'}, 
 '3': {'Id': '3', 'name': 'Bob', 'age': '17', 'height': '68', 'weight': '120.0'}}

Pretty printed output (using the code mentioned at the top of this answer):

{
    "1": {
        "Id": "1",
        "age": "20",
        "height": "62",
        "name": "Alice",
        "weight": "120.6"
    },
    "2": {
        "Id": "2",
        "age": "21",
        "height": "74",
        "name": "Freddie",
        "weight": "190.6"
    },
    "3": {
        "Id": "3",
        "age": "17",
        "height": "68",
        "name": "Bob",
        "weight": "120.0"
    }
}
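Solution 3 silently keeps only the last row when two rows share the same Id. If the column really is meant to be a primary key, a variant that raises on duplicates instead might look like the sketch below; the function name index_by_key is hypothetical, and the data.csv sample is inlined.

```python
import csv
import io

# The data.csv sample from this answer, inlined
SAMPLE = """Id,name,age,height,weight
1,Alice,20,62,120.6
2,Freddie,21,74,190.6
3,Bob,17,68,120.0
"""


def index_by_key(lines, key):
    """Map each value of the key column to its row dict, rejecting duplicates."""
    data = {}
    for row in csv.DictReader(lines):
        k = row[key]
        if k in data:
            # Overwriting would silently drop the earlier row
            raise ValueError(f'duplicate value {k!r} in key column {key!r}')
        data[k] = row
    return data


data = index_by_key(io.StringIO(SAMPLE), 'Id')
print(data['2']['name'])
```

Failing loudly is a design choice: it surfaces dirty input early, at the cost of having to decide explicitly how duplicates should be merged.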
