How to fix this 'ValueError: too many values to unpack'


December 2018 · Viewed 1.1k times

1

I'm trying to parse a large .tsv file with 500k rows into a separate .txt file for each row. My script iterates up to id 11533, then it stops and prints the following error:

File "goldfish.py", line 18, in filename, text = prev_row

ValueError: too many values to unpack

My script looks like this:

import csv
import sys

csv.field_size_limit(sys.maxsize)

with open('id_descr.tsv', 'rb') as f:
    reader     = csv.reader(f, delimiter='\t')
    fieldnames = next(reader)

    prev_row = next(reader)

    for row in reader:
        if not row:
            continue
        if len(row) == 1 or not row[0].isdigit():
            prev_row[-1] += row[0]
        else:
            filename, text = prev_row
            filename = filename + ".txt"
            with open(filename, 'wb') as output:
                output.write(text)
                output.write('\n')
                prev_row = row

The following .tsv file contains the last successfully processed row (id=11533) and the next row, which isn't parsed (that's the point where the script stops): https://www.dropbox.com/s/8mizthp8n0kduax/sample.tsv?dl=0

So my question is: is there a way to ignore this kind of error, or how do I have to change the script to avoid it?

3 answers

1

If catching and possibly discarding/logging the anomaly is not an option, extract the data with slices rather than unpacking.

    else:
        filename = prev_row[0]
        text = '\t'.join(prev_row[1:])
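If catching the anomaly is acceptable after all, a minimal sketch of that approach could replace the unpacking in the else branch (the log message is only illustrative):

    else:
        try:
            filename, text = prev_row
        except ValueError:
            # More (or fewer) than two fields: log the offending row and move on.
            print('skipping malformed row: %r' % (prev_row,))
            prev_row = row
            continue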
0

I'm not sure I get your question fully. Why can't you just do something like this?

import csv
import sys

with open('sample.tsv', 'rb') as f:
    reader = csv.reader(f, delimiter='\t')
    fieldnames = next(reader)

    orig_stdout = sys.stdout
    stuff = []
    rowNUM = 0
    for row in reader:
        if len(row) == 0:  # some checking
            continue

        sys.stdout = open('file' + str(rowNUM), 'w')  # direct output here
        print row  # print to the file specified above
        sys.stdout.close()
        sys.stdout = orig_stdout  # restore normal printing
        rowNUM += 1  # next row goes to the next file

        stuff = stuff + row  # make an ongoing list?
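If redirecting sys.stdout feels fragile, the loop body could also write to each per-row file directly; a sketch that slots into the same script (the 'file' + number naming is reused from the code above):

    for row in reader:
        if len(row) == 0:  # some checking
            continue
        # Write straight to the per-row file instead of going through sys.stdout.
        with open('file' + str(rowNUM), 'w') as out:
            out.write('\t'.join(row) + '\n')
        rowNUM += 1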
2

Line 3 of your input file has three tab characters, delimiting four fields:

  • 11534
  • "The Shift[…]for the World"
  • "I don’t get[…]Great Flash of "
  • "2012. I was[…]free with lyrics "

I don’t know how best you’d work around it, since it seems to be a problem with your data.
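If it helps to gauge how widespread the problem is before deciding on a fix, a short diagnostic pass over the file might look like this (a sketch: the filename, delimiter, and the expectation of exactly two fields per record are taken from the question):

    import csv
    import sys

    csv.field_size_limit(sys.maxsize)

    with open('id_descr.tsv', 'rb') as f:
        reader = csv.reader(f, delimiter='\t')
        next(reader)  # skip the header row
        for row in reader:
            # Rows that start with a numeric id are "real" records;
            # anything else is a continuation of the previous text field.
            if row and row[0].isdigit() and len(row) != 2:
                print('line %d: id %s has %d fields' % (reader.line_num, row[0], len(row)))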