[SVCS-531] Separate csv and tsv function and remove use of sniff by AddisonSchiller · Pull Request #285 · CenterForOpenScience/modular-file-renderer

AddisonSchiller · 2017-10-11T13:50:56Z

refs: https://openscience.atlassian.net/browse/SVCS-531

Purpose

Dialect sniffing with `csv.Sniffer().sniff()` could pick arbitrary characters or spaces
as the delimiter. Splitting csv and tsv handling into separate functions and using a
hard-coded dialect for each fixes this display problem.

Summary of changes

Split the combined csv/tsv function into two new functions, with the bulk of the logic in a shared helper function.
Instead of sniffing the dialect with `csv.Sniffer().sniff()`, use `csv.excel` for csv and `csv.excel_tab` for tsv to set the delimiter. This stops spaces, letters, digits, and other arbitrary characters from being detected as the delimiter.
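
A minimal sketch of the split described above. `parse_stdlib` and `tsv_stdlib` are names that appear in the diff; `csv_stdlib` is an assumed name for the csv counterpart, and all three bodies are illustrative rather than the code in this PR.

```python
import csv
import io


def parse_stdlib(reader):
    """Shared helper (illustrative body): split rows into header and data."""
    rows = list(reader)
    if not rows:
        return [], []
    return rows[0], rows[1:]


def csv_stdlib(fp):
    # Hard-coded comma dialect; no csv.Sniffer guess is involved.
    fp.seek(0)
    return parse_stdlib(csv.reader(fp, dialect=csv.excel))


def tsv_stdlib(fp):
    # Hard-coded tab dialect.
    fp.seek(0)
    return parse_stdlib(csv.reader(fp, dialect=csv.excel_tab))


# Spaces inside fields stay inside fields because the delimiter is fixed.
csv_header, csv_rows = csv_stdlib(io.StringIO("first name,last name\nAda,Lovelace\n"))
tsv_header, tsv_rows = tsv_stdlib(io.StringIO("first name\tlast name\nAda\tLovelace\n"))
```

Because each entry point fixes its own dialect, a file with many spaces or a repeated character can no longer change how rows are split.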

QA/Testing notes

There is a zip file of csv/tsv files on the JIRA ticket that reproduce the error. Trying them on staging/prod will show you what the error looks like. These display errors should not be present with the changes.

cslzchen

As discussed, looks good. I will locally test it.
Double check tests.

coveralls · 2017-11-16T19:21:41Z

Coverage increased (+0.2%) to 68.227% when pulling 5704d33 on AddisonSchiller:feature/tsv-svcs-531 into 8bb2dd4 on CenterForOpenScience:develop.

AddisonSchiller · 2017-11-16T19:41:10Z

@cslzchen Added new test file to look over

coveralls · 2017-11-16T19:42:30Z

Coverage increased (+0.2%) to 68.227% when pulling 9554e49 on AddisonSchiller:feature/tsv-svcs-531 into 8bb2dd4 on CenterForOpenScience:develop.

cslzchen

As discussed, there are some error-handling issues (not from your code though) that are worth taking a look at. 🔥 🔥

coveralls · 2017-11-22T17:42:11Z

Coverage increased (+0.2%) to 68.245% when pulling 50fd471 on AddisonSchiller:feature/tsv-svcs-531 into 8bb2dd4 on CenterForOpenScience:develop.

coveralls · 2017-11-22T18:03:52Z

Coverage increased (+0.3%) to 68.336% when pulling 20405f6 on AddisonSchiller:feature/tsv-svcs-531 into 8bb2dd4 on CenterForOpenScience:develop.

coveralls · 2017-11-22T18:13:11Z

Coverage increased (+0.3%) to 68.336% when pulling 93235e1 on AddisonSchiller:feature/tsv-svcs-531 into 8bb2dd4 on CenterForOpenScience:develop.

coveralls · 2017-11-22T18:15:11Z

Coverage increased (+0.3%) to 68.336% when pulling 93235e1 on AddisonSchiller:feature/tsv-svcs-531 into 8bb2dd4 on CenterForOpenScience:develop.

cslzchen

Looks good and works as expected. 🎆 🎆 Move to PCR.

coveralls · 2017-11-22T19:02:49Z

Coverage increased (+0.3%) to 68.336% when pulling b1083c5 on AddisonSchiller:feature/tsv-svcs-531 into 8bb2dd4 on CenterForOpenScience:develop.

felliott

@TomBaxter, sending this one to you. Needs some fixups:

some .seek()s need to switch back to .read()s
what are we losing by removing sniffing from the csv detector?
remove quoting for tsv
minor error handling de-duplication

felliott · 2017-12-12T19:19:51Z

mfr/extensions/tabular/libs/stdlib_tools.py

-    :return: tuple of table headers and data
-    """
-    data = fp.read(2048)
+    data = fp.seek(2048)


seek just returns the offset the pointer was advanced to. This should probably be read.

Complete. PR308
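
To illustrate the seek/read point above, here is a small standalone sketch using an in-memory file. The sample content is made up; only the behaviour of `seek` and `read` is being shown.

```python
import io

fp = io.StringIO("a,b,c\n1,2,3\n")

# seek() moves the position and returns the new offset, not file content.
offset = fp.seek(4)

# read() returns the content itself (up to 2048 characters here).
fp.seek(0)
data = fp.read(2048)
```

With `seek`, `data` would have been an integer offset, so anything inspecting the sampled text afterwards would be working on the wrong type.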

felliott · 2017-12-12T19:26:37Z

mfr/extensions/tabular/libs/stdlib_tools.py

+    return parse_stdlib(reader)

+def tsv_stdlib(fp):
+    data = fp.seek(2048)


seek => read

Complete. PR308

felliott · 2017-12-12T19:28:42Z

mfr/extensions/tabular/libs/stdlib_tools.py

-    except csv.Error:
-        dialect = csv.excel
-    else:
        _set_dialect_quote_attrs(dialect, data)


I don't think I've ever seen a tsv with quoting in it. Has anyone else? Maybe we leave quoting alone until it's reported as an issue.

Complete. PR308
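
For reference on the quoting question, a short sketch of how the standard library treats quotes in tab-separated input. It does not reproduce `_set_dialect_quote_attrs`; it only compares the default `csv.excel_tab` behaviour with quoting switched off, using a made-up sample line.

```python
import csv
import io

line = 'id\t"free\ttext"\n'

# csv.excel_tab keeps the excel defaults: '"' is the quote character.
default_rows = list(csv.reader(io.StringIO(line), dialect=csv.excel_tab))

# With QUOTE_NONE the quote characters are ordinary data and every tab splits.
unquoted_rows = list(
    csv.reader(io.StringIO(line), dialect=csv.excel_tab, quoting=csv.QUOTE_NONE)
)
```

Under the default, the quoted field keeps its embedded tab as one value; with `QUOTE_NONE` the same line splits into three fields with the quote marks left in place.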

felliott · 2017-12-13T03:14:04Z

mfr/extensions/tabular/libs/stdlib_tools.py

+    # on certain exceptions
+    except Exception as e:
+        raise TabularRendererError('Cannot render file as csv/tsv. '
+                           'The file may be empty or corrupt',


Nitpick: indentation is weird here.

felliott · 2017-12-13T03:18:15Z

mfr/extensions/tabular/libs/stdlib_tools.py

+                                       'The file may be empty or corrupt',
+                                       code=HTTPStatus.BAD_REQUEST,
+                                       extension='csv') from e
+


Since this is identical to the error raised in the next stanza, could we just throw the error instead?

felliott · 2017-12-13T03:27:37Z

mfr/extensions/tabular/libs/stdlib_tools.py

                'Please download and view it locally.',
-                code=400,
+                code=HTTPStatus.BAD_REQUEST,
                extension='csv',


Since both the csv and tsv parser call this, can we make sure the correct extension is being passed?

Complete. PR308
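
One possible shape for the two error-handling requests above (a single raise site, and the correct extension per caller), sketched under assumptions: the `TabularRendererError` class below is a stand-in for the project's exception, and the `extension` parameter on `parse_stdlib` is hypothetical.

```python
import csv
import io
from http import HTTPStatus


class TabularRendererError(Exception):
    """Stand-in for the project's exception class (assumed signature)."""

    def __init__(self, message, code=HTTPStatus.BAD_REQUEST, extension=''):
        super().__init__(message)
        self.code = code
        self.extension = extension


def parse_stdlib(reader, extension):
    """Hypothetical helper: the caller states which extension it is parsing."""
    try:
        rows = list(reader)
        header, data = rows[0], rows[1:]  # IndexError when the file is empty
    except (csv.Error, IndexError) as e:
        # Single raise site shared by parse failures and empty files.
        raise TabularRendererError(
            'Cannot render file as csv/tsv. The file may be empty or corrupt',
            code=HTTPStatus.BAD_REQUEST,
            extension=extension,
        ) from e
    return header, data


caught = None
try:
    parse_stdlib(csv.reader(io.StringIO(""), dialect=csv.excel_tab), extension='tsv')
except TabularRendererError as err:
    caught = err
```

Passing the extension in from `csv_stdlib` or `tsv_stdlib` keeps the error metadata accurate without duplicating the raise in each caller.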

felliott · 2017-12-13T03:54:00Z

mfr/extensions/tabular/libs/stdlib_tools.py

    fp.seek(0)
+    # set the dialect instead of sniffing for it.
+    # sniffing can cause things like spaces or characters to be the delimiter
+    dialect = csv.excel


Hmmm. I'm not sure how I feel about this. I like that it solves the issue of really long lines, but it bugs me a bit that we're throwing out support for alternative delimiters. If we use the csv.excel dialect, do we still support tab- and pipe-delimited text? If not, can we document that in a comment, so we'll know what to fix if we encounter it?

Complete. PR308
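
A quick check of the delimiter question above, using only the standard library and made-up sample rows: with `csv.excel`, only commas split fields, so tab- and pipe-delimited lines come back as a single column unless the delimiter is overridden.

```python
import csv
import io

comma_row = next(csv.reader(io.StringIO("a,b,c\n"), dialect=csv.excel))
tab_row = next(csv.reader(io.StringIO("a\tb\tc\n"), dialect=csv.excel))
pipe_row = next(csv.reader(io.StringIO("a|b|c\n"), dialect=csv.excel))

# Overriding the delimiter on top of the excel dialect restores pipe support.
pipe_override_row = next(
    csv.reader(io.StringIO("a|b|c\n"), dialect=csv.excel, delimiter="|")
)
```

So a hard-coded `csv.excel` does drop pipe-delimited support for files routed through the csv path, which is the trade-off worth recording in a code comment.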

TomBaxter

This PR has been merged in. Complete. PR308

TomBaxter · 2017-12-21T16:59:11Z

mfr/extensions/tabular/libs/stdlib_tools.py

                'Please download and view it locally.',
-                code=400,
+                code=HTTPStatus.BAD_REQUEST,
                extension='csv',


Complete. PR308

TomBaxter · 2017-12-21T19:31:54Z

mfr/extensions/tabular/libs/stdlib_tools.py

-    :return: tuple of table headers and data
-    """
-    data = fp.read(2048)
+    data = fp.seek(2048)


Complete. PR308

TomBaxter · 2017-12-21T19:32:03Z

mfr/extensions/tabular/libs/stdlib_tools.py

    fp.seek(0)
+    # set the dialect instead of sniffing for it.
+    # sniffing can cause things like spaces or characters to be the delimiter
+    dialect = csv.excel


Complete. PR308

TomBaxter · 2017-12-21T19:32:08Z

mfr/extensions/tabular/libs/stdlib_tools.py

+    return parse_stdlib(reader)

+def tsv_stdlib(fp):
+    data = fp.seek(2048)


Complete. PR308

TomBaxter · 2017-12-21T19:32:14Z

mfr/extensions/tabular/libs/stdlib_tools.py

-    except csv.Error:
-        dialect = csv.excel
-    else:
        _set_dialect_quote_attrs(dialect, data)


Complete. PR308

cslzchen · 2018-01-05T15:10:11Z

PR closed and replaced by #308.

cslzchen added the Code Review label Nov 15, 2017

cslzchen reviewed Nov 15, 2017

cslzchen requested changes Nov 21, 2017

AddisonSchiller added 2 commits November 22, 2017 13:12

Separate csv and tsv function and remove use of sniff

617306f

Csv.sniff could cause random characters or spaces to be used as the delimiter. Separating these functions and using a hard coded dialect fixes this display problem.

Add tests

0ff8589

AddisonSchiller force-pushed the feature/tsv-svcs-531 branch from 94d40a2 to 93235e1 Compare November 22, 2017 18:12

cslzchen approved these changes Nov 22, 2017

cslzchen added Final Review and removed Code Review labels Nov 22, 2017

Better error handling

b1083c5

AddisonSchiller force-pushed the feature/tsv-svcs-531 branch from d9dc5f3 to b1083c5 Compare November 22, 2017 18:18

felliott requested changes Dec 13, 2017

TomBaxter reviewed Dec 21, 2017

cslzchen closed this Jan 5, 2018
