You should add country of residence, otherwise you might get distorted data. Income varies greatly from country to country.
For example you might have someone with a lot of experience and education but low income in India, and then have someone with not as much education or experience but a lot higher income in the US.
It would be cool if Google had an option for strictly numeric fields, and then auto-calculated averages, distribution graphs, etc. Without that input restriction, there is a curation problem, e.g. ages entered as "19-30".
But Google doesn't make any money from this, so it's hard for them to iterate as quickly as (e.g.) one of us could. I guess the deeper problem is: is there a revenue model for simple survey generation?
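For what it's worth, the cleanup on the survey author's side could look roughly like this. It's only a sketch: `parse_numeric` is a name I made up, and mapping a range like "19-30" to its midpoint is my own assumption, not anything the form tool does.

```python
import re
from statistics import mean

def parse_numeric(raw):
    """Return a float for a free-text numeric answer, or None if unreadable.

    Assumption: a range such as "19-30" is replaced by its midpoint.
    """
    text = raw.strip().replace(",", "")
    # A range like "19-30" or "19 - 30"
    range_match = re.fullmatch(r"(\d+(?:\.\d+)?)\s*-\s*(\d+(?:\.\d+)?)", text)
    if range_match:
        low, high = map(float, range_match.groups())
        return (low + high) / 2
    # A plain number like "27" or "41.5"
    if re.fullmatch(r"\d+(?:\.\d+)?", text):
        return float(text)
    # Anything else ("thirty", "n/a") is dropped rather than guessed at
    return None

answers = ["27", "19-30", "thirty", " 41 "]
cleaned = [v for v in (parse_numeric(a) for a in answers) if v is not None]
average_age = mean(cleaned)
```

Whether to keep ranges at all is a judgment call; dropping them is just as defensible if you'd rather not invent a midpoint.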
There's Wufoo http://wufoo.com/gallery/templates/surveys/business-demograp... (a YCer, no less). I'm sure they'd do a better job, but they won't let you get started without signing up. Google's little form tool is disrupting these guys, albeit at the very low end.
Cool that we have some results, but the display of the age and wage values seems simply to be a list. Is that normal? Having read another response, I guess it is.
I assume you'll be doing your own analysis as well and will deal with the problem in the summary you post.
Yes, the data will be available to the general public on my website, and if anyone wants to mirror it they may.
I might hold on to the data overnight just to get a good story together, but other than that, I have every intention of sharing. I have no doubt there are people who could make a lot more of this data than I will. I'm excited to see the results.
I was planning to wait until the link drops off the front page since people are still taking the survey. I think I'll take the other poster's advice, however, and let everyone see the summary now.
Because the university signs my paycheck, not the state. Some of the money to fund me comes from federal grants, some from federal contracts, some from private contracts, some from private donations and some from state funds. (I work on several projects.) So while the institution is subject to state regulation, and can be claimed by the state, it is a separate entity. That's why claiming government is weird: it is not the only source of my income. Most people in academia have a similar mashup of funding.
If your paycheck comes from a government organization, then you work for the government.
If you were a software engineer for the government would you complain that you can't state that you work for a business?
I worked on it for a few hours to get it down to some simple things that might correlate to a person's income -- I only needed simple data to test on. I think it will still be interesting to analyze nonetheless. Perhaps in a few months we could put together a slightly better survey? My email is in my profile if you have suggestions, but with over 300 responses so far (WOW!) it'd be hard to change this current survey.
What's the difference between a corporation ("the term denotes a body corporate formed to conduct business" - Wikipedia) and a (large or small) business?
I'd say that a "corporation" is a "large business" that has offices in more than one country and is traded on an exchange/has investors in some other way.
I would answer "who knows" to both of those things (it's not like I'm in HR) but I'd still be able to get a sense of whether I was working at a business or a corporation from the culture.
The data being collected is purely demographic. And I don't think that the data set would be very big either. Around a thousand entries even if most of the HNers take interest?
I am not able to imagine anything significant that can be done by using this data as training set in a machine learning algorithm. Is it possible for you to elaborate a little bit on what you intend to train your algorithms to achieve?
This is the actual assignment. The data set doesn't need to be huge, it's more about the ideas than refining the algorithms (so we can cover more ground). The more entries the better, and so far (in one hour) I've gotten 400. I suspect this may work out for (some) interesting data.
It's a shame you didn't take the opportunity to gather ZIP codes. That would allow for some adjustment of your salary figures based on avg income modifiers for an area (hell, you could use the GS modifiers the Govt. uses even for ease-of-use).
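Roughly what I have in mind, as a sketch only: the factors below are invented for illustration and are not real GS locality rates, and `LOCALITY_FACTOR` and `adjusted_salary` are hypothetical names.

```python
# Hypothetical locality factors keyed by the first three digits of a ZIP code.
# These numbers are made up for illustration; they are NOT real GS locality rates.
LOCALITY_FACTOR = {
    "941": 1.35,
    "100": 1.30,
    "787": 1.10,
}
DEFAULT_FACTOR = 1.00  # assumption: areas not listed get no adjustment

def adjusted_salary(salary, zip_code):
    """Scale a reported salary down by the locality factor for its ZIP prefix."""
    factor = LOCALITY_FACTOR.get(zip_code[:3], DEFAULT_FACTOR)
    return salary / factor

# With the made-up 1.35 factor, a 135,000 salary normalizes to about 100,000
normalized = adjusted_salary(135_000, "94110")
```

Dividing by the factor puts everyone on a common baseline, so the comparison is between normalized figures rather than raw ones.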
But this is a great idea, I commend you on the experiment!
I think that to perform the learning you need some property of a hacker that you are trying to learn. Maybe you also need some non-hackers to fill in your survey so that you can perform classification.
Directly: How do you plan to use ML on this dataset? What will you learn?
Running a startup = small business rather than self employed, I guess, but possibly disproportionately high working hours and low income for the actual founders...
I had a play with the google survey (a nice viral touch of having a "Create your own form" link - simple and unobtrusive, but notable because there's so little other text).
It's a very nice little app, you can view the responses right from the creation page (either with cute little pie-charts, or as a spreadsheet).
But I agree there's a shameful bug in it for styling: when I click "Theme: Plain", I get a "Not Found Error 404"
> You know the answer to this question.
> Wow- it is so impressive that you run
> two businesses in two industries. You're
> my hero.
Well, my sarcasm detector just got triggered, but I'll answer seriously anyway.
How many times have you been trying to complete a form, only to discover that there aren't enough places for your phone number, or your name is too long to fit in the provided box, or the field for the address will only take 32 characters, etc.? It's a pitfall that so many web sites fall into again and again - not anticipating the wide range of inputs that they will have to deal with.
This is a major issue, and I don't think I've ever seen it covered properly in a college/school/university course. The OP is doing a project - it's maybe a good thing he sees this problem early and learns to deal with it.
And I really don't see why you had to be so obnoxiously childish. Perhaps you could explain what you resent about my running two businesses in two different industries. After all, if you're an employed programmer you probably take home more money than I do.