
Interesting how negative your reaction is. Also how far off target all that anger is.

I am not selecting for a tribe, I am selecting for a job. The questions are loaded, of course. Among the many duties, the jobs do require processing large files, sometimes with cut, Python or C. I want the candidate to use the most appropriate tool as needed. I'd rather not have people implement functionality that already exists in the 'comm' command.

Of course, I want the candidate to ask me what the column separator is. That's why the question is formulated that way.

The right answer will depend on the column separator. Proposing the UNIX cut if the file is CSV is not such a good answer, but for tab-separated files, it is just fine. If the file is CSV and they tell me about cut, my next question would be if that is a good universal solution for CSV files in general.

When someone knows about the pitfalls of using cut to parse CSV, it shows me they have indeed had experience with that.

Do you see why this question is the best? The possibilities are endless, and the rabbit hole is much deeper than it may seem.



> The right answer will depend on the column separator. Proposing the UNIX cut if the file is CSV is not such a good answer, but for tab-separated files, it is just fine. If the file is CSV and they tell me about cut, my next question would be if that is a good universal solution for CSV files in general.

TSV and CSV have the same limitations. A tab-separated file could still have tabs inside a field depending on the quoting convention. Either separator could be used with cut. I can't believe you are so confident in your partially truthful answers.
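
A minimal Python sketch of the tab case, assuming double-quote quoting (one convention among several; the sample line is made up for illustration). It compares charwise splitting, which is how cut treats its input, with the standard library's csv reader set to a tab delimiter:

```python
import csv
import io

# Hypothetical sample: one TSV record whose second field contains a literal
# tab, protected by double quotes (an assumed quoting convention).
line = 'a\t"b\tc"\td\n'

# Charwise splitting on the delimiter, which is what cut does.
naive = line.rstrip("\n").split("\t")

# A quote-aware parser using the same delimiter.
parsed = next(csv.reader(io.StringIO(line), delimiter="\t"))

print(len(naive), naive)
print(len(parsed), parsed)
```

Under this convention the naive split yields four pieces while the quote-aware reader yields three, the same failure mode as with commas.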


The only true Unix column format is ascii delimited text.

Oddly, no one has heard of it. The only reason I found out about it is that I had to read in punched tapes with 7 character ascii from an experiment done in the 80s during my undergrad.

https://en.wikipedia.org/wiki/Delimiter#ASCII_delimited_text
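
For illustration, a small Python sketch of the idea, using the ASCII unit separator (0x1F) between fields and the record separator (0x1E) between records. The helper names `dumps` and `loads` are made up here, and the sketch assumes the data itself never contains those two control characters:

```python
# ASCII reserves control characters for this purpose:
# 0x1F (unit separator) between fields, 0x1E (record separator) between records.
US = "\x1f"
RS = "\x1e"

def dumps(rows):
    """Encode rows of strings; assumes no field contains US or RS."""
    return RS.join(US.join(fields) for fields in rows)

def loads(text):
    """Decode text produced by dumps back into rows of fields."""
    return [record.split(US) for record in text.split(RS)]

# Commas, quotes and newlines inside a field need no escaping at all.
rows = [["name", "note"], ["Ann", 'says "hi", then\nleaves']]
encoded = dumps(rows)
print(loads(encoded) == rows)
```

Because the separators are characters that ordinary text does not use, no quoting rules are needed, which is the appeal of the format.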


Fascinating, thanks. In all of the large projects I've worked on, the CSV format variants and their inconsistencies have caused disasters. Who knew that Pandas and Spark handle CSV settings differently by default, and that Spark has a hard time with newlines in its CSV output? CSV's inconsistencies result in a lot of data corruption, doubly so in large teams.

Replacing CSV with flat JSON or parquet depending on the use case has been a good move for avoiding these issues. The risks of CSV are usually just too high.
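
To illustrate the newline problem, here is a hedged Python sketch using only the standard library (not Pandas or Spark, whose defaults are not shown here). It assumes "flat JSON" means one JSON object per line; the record is invented for the example:

```python
import csv
import io
import json

record = {"id": "1", "comment": "line one\nline two"}

# CSV: the embedded newline is kept literally inside a quoted field,
# so one logical record spans two physical lines.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["id", "comment"], lineterminator="\n")
writer.writerow(record)
csv_text = buf.getvalue()

# One JSON object per line: json.dumps escapes the newline,
# so one record stays on one physical line.
json_text = json.dumps(record) + "\n"

print(csv_text.count("\n"))   # physical line breaks in the CSV record
print(json_text.count("\n"))  # physical line breaks in the JSON record
```

Any reader that splits the CSV on line breaks before parsing quotes will see two broken records, whereas the JSON form can be split on line breaks safely.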


Quad-spaced facepalm.... How come these were dropped out of usage ?!?


Seems like you're completely missing the point of the interview question, which is to see how someone would approach a problem, investigate its requirements, propose a solution, examine its drawbacks, and take feedback on that solution and its possible advantages and disadvantages.


I did not miss that. I was pointing out that glofish said he asked questions like these repeatedly to candidates, but he doesn't even understand the basics of the subject matter. That would not make for a good interview.


Bingo!


As someone with a background in Mechanical Engineering, software interview questions like the above seem so wild and absurd to me. Why would you care about someone knowing the intricacies of CSV or bash? I would expect a good engineer to provide the best possible solution to your problem within an hour of googling / research. I really don’t see the point of asking such specific questions in interviews, as it has no correlation with finding a good engineer. I wish the software field would move closer to the interview process of other engineering disciplines, but it seems to be getting wilder each year.


Because it's used in the real world all the time?

The number of times I've had to write my own sorting algorithm in my career: 0.

The amount of grep/sed/awk I've used? Countless.

Someone who is familiar with how powerful and flexible these tools are is likely to accomplish something that can benefit from them quicker than someone who isn't aware of them.

Also, in my experience, software devs who shy away from the command line because they don't like it rarely pan out.


Come on, man: ` cut -d "," `. I like the question and how you think about the rabbit hole, but you need to sharpen your knife A LOT before being able to ask such questions. You need to be prepared for all kinds of answers, which might be right even if you don't have a clue what the candidate is talking about...


But that is exactly why you can't use cut in general for parsing CSVs - the usual CSV syntax includes quoted fields, in which commas don't separate the values. But cut only examines the input charwise:

    % echo 'a,"b,c",d' | cut -f 1-3 -d ","
    a,"b,c"

I don't think there's a good solution to this using standard tools, but I'm sure there are various CSV packages available. (Which I've never used! - I'm just familiar with this problem from seeing people try to work with CSV data in code using exactly the char-by-char approach taken by cut.)
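
As one example of such a package, Python ships a csv module in its standard library. A rough sketch of a quote-aware counterpart to cut -f follows; `cut_csv` is a hypothetical helper name, and it assumes the input is a single non-empty record using double-quote quoting:

```python
import csv
import io

def cut_csv(line, first, last):
    """Return fields first..last (1-based, inclusive), like cut -f first-last,
    but split with a quote-aware CSV parser instead of charwise."""
    fields = next(csv.reader(io.StringIO(line)))
    out = io.StringIO()
    # Re-emit the selected fields, quoting only where needed.
    csv.writer(out, lineterminator="").writerow(fields[first - 1:last])
    return out.getvalue()

print(cut_csv('a,"b,c",d', 1, 3))
print(cut_csv('a,"b,c",d', 2, 2))
```

With the same input as the cut example, fields 1-3 now include the trailing d, and field 2 comes back as the whole quoted value.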


I think the interviewer-escape-hatch for that is to only have delimiter commas in the file (so cut will work in this situation), and withhold that information to see if the candidate asks about it. If they do ask, that's points in their favor.


I had to read it a few times before figuring out what “column-oriented” meant. I may not have been able to do it under pressure in an interview. If you’d said “ordered by column” I’d have understood much more quickly.

i.e. Be careful with your phrasing. That is a bias in itself.


I would assume column oriented would mean the data would be formatted something like:

    R1C1,R2C1,R3C1
    R1C2,R2C2,R3C2

So values that have the same column are clustered together. Whereas normally in a file values that have the same row are clustered together. This is what column-oriented usually means when you are referring to databases.
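
A short Python sketch of the two layouts, using the same made-up cell names as above (three rows, two columns):

```python
# Row-oriented: each inner list is one row (the usual CSV layout).
rows = [
    ["R1C1", "R1C2"],
    ["R2C1", "R2C2"],
    ["R3C1", "R3C2"],
]

# Column-oriented: each inner list is one column, so values from the
# same column sit next to each other.
columns = [list(col) for col in zip(*rows)]

for col in columns:
    print(",".join(col))
```

In the column-oriented form, reading a whole column means reading one line (`columns[0]`) rather than picking one field out of every row.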

I guess this is not what the questioner meant, because they referred to using cut | sort | head as a solution. Though I don't understand why head would be at the end of either problem's solution, so maybe I'm missing something. head could be a useful way of peeling out the column you want in the column-oriented problem.


Me too, I thought a “column oriented” file was a file where the data for column #1 comes before #2; ie, structure of arrays rather than array of structures. “cut” doesn’t work with that afaik. I’m not sure I’d ask for clarification here (as to me, this is what “column oriented” means), and I’d probably fail the interview.


I don't see anger or negativity. I see valid feedback for identifying bias with passion. To paint it in a negative light is to introduce bias.


Well, the person was exposed as someone just validating themselves in interviews, so legitimate criticism would be painted as an attack.

All tech interviews start with a need to legitimize and reinforce the interviewers as successful and talented ....

Even if we're basically all terrible


Can I just use csvcut?



