The Bullshit Web (pxlnv.com)
1017 points by codesections on July 31, 2018 | 550 comments


I've said this before, but it bears repeating:

Moby Dick is 1.2mb uncompressed in plain text. That's smaller than the "average" news website by quite a bit--I just loaded the New York Times front page. It was 6.6mb. That's more than 5 copies of Moby Dick, solely for a gateway to the actual content that I want. A second reload was only 5mb.

I then opened a random article. The article itself was about 1,400 words long, but the page was 5.9mb. That's about 4kb per word without including the gateway (which you have to load unless you arrive via social media). Including the gateway, it's about 8kb per word--each word costs roughly as much to download as the entire plain text of the article.

So all told, to read just one article from the New York Times, I had to download the equivalent of ten copies of Moby Dick. That's about 4,600 pages. That's approaching the entirety of George R.R. Martin's A Song of Ice and Fire, without appendices.

If I check the NY Times just 4 times a day and read three articles each time, I'm downloading 100mb worth of stuff (83 Moby-Dicks) to read 72kb worth of plaintext.

Even ignoring first-principles ecological conservatism, that's just insanely inefficient and wasteful, regardless of how inexpensive bandwidth and computing power are in the west.

EDIT: I wrote a longer write-up on this a while ago on a personal blog, but don't want it to be hugged to death:

http://txti.es/theneedforplaintext


I like this rant, you should go the next step:

All you need to 'fix' this is a fast-loading news website that gets enough paid subscribers, and earns enough margin from those subscriptions, to pay for a news staff, an office, and various overheads.

That is a longish way of saying that 99.9% of the overhead in any modern web site can be traced almost entirely to the mechanisms by which that web site is attempting to extract value from you for visiting/reading.

If people will visit with a 56K modem and put up with a 3-10 second page load, then that is the bar. And any spare bandwidth you might have is available for the web site to exploit in some way to generate revenue. The more bandwidth between you and them, the more ways they can come up with to exploit that bandwidth for additional surveillance, ads, or analytics that will get them more money.

When you are the customer--which is to say your purchase of a subscription or articles is the only revenue the site needs in order to survive--then the things that retain you as a customer have the highest priority (like fast page load times and minimal bandwidth usage).

But when you are a data cow--a random bit of insight into a picture much bigger than you can comprehend, a pixel in a much larger tapestry, an action droplet in a much larger river of action--well, then there isn't really any incentive to make your life better. As long as the machine milking you for data can get even a couple of molecules more of that precious data milk without scaring you out of the barn, we'll build right up to that limit.


Hilariously, the New York Times tries both: you get five or so free article reads per month (with shitloads of tracking), and then you have to pay to read more.

But if you're paying, the pages don't load any differently. You're paying to be mined.


In paper newspapers this is the norm. You can read the front page for free, have to pay to get the rest, and the rest is still filled with ads.

The difference is the tracking. I don’t think ads are really the problem. It’s the tracking that bloats pages and intrudes on privacy, and the tracking doesn’t need to be there because other media have ads without tracking and manage just fine.

There’s a race to the bottom here. Tracking earns more revenue, so to be competitive you have to do it. Most sites won’t stop tracking until forced to by either the basic infrastructure of the web or legal requirements. I hope GDPR will lead to the disappearance of tracking, but so far most sites seem to pretend tracking is compatible with GDPR.


Not really. You're paying for the additional content. The tracking is external to that deal.

I'm not saying I like it that way, but you're conflating two unrelated things.


This is why I've never subscribed to cable TV. I'm not going to pay for the privilege of watching 20 mins of commercials an hour.


That's not what you are paying for with basic cable; you get that with broadcast for free.

What you are paying for is a broader choice in the filler between the commercials.


I agree in theory. However, I haven't noticed any slowness on their website, and the ads are well done and blend in with the webpage. They may have to redesign/rearchitect their whole website to get what you are asking for.

It should be noted that traditional newspapers include ads alongside news content and no one complains. In fact people used to sift through the Sunday NYT simply for the ads.


> the ads are well done and blend in with the webpage

That's called "native advertisement" and it's supposed to trick you into thinking you're reading a genuine article instead of an ad. I actively avoid sites that do this.

> In fact people used to sift through the Sunday NYT simply for the ads.

Back when people couldn't google for stuff, and the ads were useful because they mostly came from local businesses you actually needed once in a while.


There is no such thing as "the ads are well done and blend in with the webpage". Especially on a website where you pay for a subscription.


Moreover, there is a tipping point, beyond which the ad blends too well and becomes plain deception. This is even more of a problem for journalistic publications. See also: native advertising.


Indeed. In print, you had dedicated pages for advertisements. Not optimal, but much easier to ignore. Nowadays, you never know if an "article" is simply pushing a marketing agenda.


Yes, because the signs people put around "native ads" literally stating that it's an "ad" or "sponsored content" are easily ignorable. Get real. People pay for a newspaper with ads in it. You still have to sift through it. Ignore it? You literally have to turn the page, or spend 5 minutes pulling the ads out of the newspaper if you don't want them, and in some cases there isn't a way to escape the ad in print because part of the article shares the page with it. Ad block won't save you then.


That's no different from paying for a real newspaper subscription though.

But I guess you could say that you are paying for delivery instead.


Tbh they could email me plain-text articles and I'd happily hand over my money.


Imagine that, a digital newspaper in your digital mailbox!


You mean sending you news in a digital letter?


How intriguing! I would like to subscribe to your... how do you say... letter of news.


But a letter of news may be too small; maybe we can remove that size constraint and call it a 'paper of news' that would be delivered.


Maybe they could send multiple letters and I could have some kind of client that shows me the headlines and allows me to open the articles I'm interested in.


That would be amazing. How come Google hasn't invented something like that yet?


Over 20 years ago, the San Jose Mercury-News offered exactly such a service.

Called Newshound, it let you set up sets of keywords (in the basic $5/month subscription), and it would email you the plain text of every article that matched the criteria, whether generated within the publisher network or from wire services.


It would be trivial to write a script to scrape text.npr.org and send it to you.

Or you can just visit it, I suppose.
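
For what it's worth, a rough sketch of that kind of script in Python, standard library only. The markup assumption (headlines as plain <a> tags with relative hrefs on text.npr.org) is a guess and may need adjusting, and the output is just printed--piping it to mail, or swapping the print for an smtplib call, would cover the "send it to you" part:

    # Sketch: pull headline links from text.npr.org and print them as plain text.
    # Assumes articles appear as ordinary <a href="/..."> links, which may change.
    from html.parser import HTMLParser
    from urllib.request import urlopen

    class LinkCollector(HTMLParser):
        def __init__(self):
            super().__init__()
            self.links = []           # (href, link text) pairs
            self._href = None
            self._text = []

        def handle_starttag(self, tag, attrs):
            if tag == "a":
                self._href = dict(attrs).get("href")
                self._text = []

        def handle_data(self, data):
            if self._href is not None:
                self._text.append(data)

        def handle_endtag(self, tag):
            if tag == "a" and self._href:
                title = "".join(self._text).strip()
                if title:
                    self.links.append((self._href, title))
                self._href = None

    def fetch_headlines(url="https://text.npr.org/"):
        html = urlopen(url, timeout=10).read().decode("utf-8", "replace")
        parser = LinkCollector()
        parser.feed(html)
        return parser.links

    if __name__ == "__main__":
        for href, title in fetch_headlines():
            if href.startswith("/"):
                href = "https://text.npr.org" + href
            print(f"{title}\n  {href}")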


Firefox and Safari each have a 'Reader Mode' which does exactly what you want: presenting a web-page in the absence of any web design.

It's really the ultimate condemnation of modern web design that this feature is so useful.

Edit: won't help the data-consumption though, as I believe it can only be enabled after the page has loaded


In reply to your edit: you can use something like uMatrix to block almost everything (even CSS) and Reader Mode will still work. I do this for most newspapers and it works quite well.


We used to have Usenet...


And when I first got online in the early 90's, my ISP had an additional subscription option to get real newspaper-type articles delivered in a special usenet newsgroup hierarchy (can't remember the name of the news service itself though).


What do you mean "used to"? Usenet still exists.


And how good is it?


Like any community, that's defined by the people in it. Some groups are excellent, some groups are mostly dead, some display varying levels of toxicity. Overall my experience hasn't been a bad one.


What kinds of groups are there?

Given the lack of popularity and commercial support, combined with the complexity of connecting a client to an available server (compared with downloading an app from the store), I'd expect that they are populated mostly by old tech people or passers-by from universities, and any special-interest group would have a small user base drawn from that demographic profile. Are my assumptions correct?


Reader Mode, my friend.


> That is a longish way of saying that 99.9% of the overhead in any modern web site can be traced almost entirely to the mechanisms by which that web site is attempting to extract value from you for visiting/reading.

Well, that and the fact that front-end developers just can't seem to exist without pulling in hundreds of kilobytes, or even megabytes, of JS libraries. You actually don't need all that crap to serve adverts, or really even to do tracking: people managed without it in the 90s. It's just that it's more work to get the same effect without a buttload of JS in this day and age, and most third party tracking services involve their own possibly bulky JS lib[1]. The thing is, given the slim margins on ad-serving - whilst I don't condone it - I can see why people don't bother to put in the extra effort to slim their payloads.

[1]And if you have a particularly idiotic marketing department they might want, say, tracking to be done in three or four different ways, requiring three or four different libraries/providers. This is not merely cynicism: I encountered exactly this situation at a gig a few years back.


Enjoy a recent Hacker News discussion of a 404 error page that employed a 2.4MiB JavaScript framework and consumed significant CPU time to display.

* https://news.ycombinator.com/item?id=17383464


> "You actually don't need all that crap to serve adverts, or really even to do tracking: people managed without it in the 90s. It's just that it's more work to get the same effect without a buttload of JS in this day and age"

While I sympathize with this sentiment, this is also the entire history of computing in a nutshell. Moore's Law has driven us orders of magnitude beyond where we were when personal computers first came into existence; but Wirth's law[1] has kept pace. The laptop I'm typing this on right now has 8 GB of RAM, and that's already become pathetically tiny, pretty much the minimum viable for a consumer PC; I have to keep checking my memory usage or I'll spill over into swap (on a mechanical drive) and have to wait several minutes while my computer recovers.

Performance in computer applications fundamentally doesn't improve. Stuff gets prettier, sure, and applications do more. But things will still run about as slowly as they always have, sometimes a little worse. (There are exceptions - some things like loading programs from tape, or loading things from an HDD once SSDs were invented, were so painfully slow compared to their replacement that you'd have to actively try to write slow code to get anywhere near that performance.) It's ease of programming, flexibility, and freedom of design (in aesthetics and interface) that the advance of computing technology has always enabled. And all of those are extremely valuable in their own way, and can make applications genuinely better - even allowing qualitatively new things to come into existence that wouldn't have been feasible before - even if they don't run faster or take up less of your memory.

(To understand why, think about the development of - say - computer games since the 90s. For all that we mock poorly optimized games, how inaccessible would game development be if we required them to be coded as efficiently as Carmack built Doom? For all that mindlessly chasing "better graphics" has ballooned costs and led developers to compromise on gameplay, how many games simply couldn't be translated to 90s-era graphics without fatally compromising the experience? How many projects would never have been started if we set the skill floor for devs so high that a hobbyist couldn't just download Unity and start writing "shitty" code?)

(Or think about something like Python. Python is a perfect example of something that allows devs to massively sacrifice performance just to make programming less work. If we kept our once-higher-by-necessity standards for efficient usage of resources, something like Python's sluggish runtime would be laughable. But I think you, and I, and everyone else can agree that Python is a very good thing.)

[1] "What Intel giveth, Microsoft taketh away."


(All that being said, I'm also fairly salty about having 8 GB of RAM and a mechanical hard drive rendering my computer incredibly painful to use as technology has marched on. Discord - which I use almost exclusively as an IRC chatroom with persistent while-you-were-gone chat history, embedded media, and fun custom emotes - is an entire Electron app that eats over 100 MB minimum; Firefox is eating 750 MB just keeping this single tab open while I type this. Even with no other applications but those open, Windows 10 and assorted background processes already push me to 5.7 GB allocated. Various Windows background processes will randomly decide they'd like to peg my disk usage to 100% for ten to fifteen minutes at a time, which I imagine is because spinning rust disks are considered deprecated.

I saw a discussion on HN a few months back about a survey of computer hardware, and one dev in the comments was shocked - shocked! - to find out that the typical user didn't have 16 GB and a 4k screen. That definitely rustled my jimmies a bit.)


> I saw a discussion on HN a few months back about a survey of computer hardware, and one dev in the comments was shocked - shocked! - to find out that the typical user didn't have 16 GB and a 4k screen. That definitely rustled my jimmies a bit.)

This is extremely common in dev circles; it's an area where we're completely detached from average users. Just to make the point, here is the Mozilla hardware survey, which shows >50% of users having 4GB or less: https://hardware.metrics.mozilla.com/ .

If we look at the more technical users on Steam (https://store.steampowered.com/hwsurvey) then only ~15% of users have 4GB or less, along with 40% having 8GB.

There's a good reason MacBooks top out at 16GB.


Oh, I recognize that Mozilla survey as actually the specific one that user was talking about! Let me see if I can track down the actual comment thread; it's probably less ridiculous than I actually remember it being.

Ah, found it. https://news.ycombinator.com/item?id=16735354


I'm really surprised at the reaction to the resolution. I like 1080p for movies on my (too) big TV but for coding I was more than satisfied once we got to 1024x768 and haven't thought of it since. My home coding machine is a cheap dell at 1366x768 and I've always been happy with it.


I agree with everything you said. I just have a somewhat different experience with Firefox on Windows 10:

>Firefox is eating 750 MB just keeping this single tab open while I type this.

I have 127 tabs open on Firefox Quantum 61.0.1 (64 bit). It uses ~ 1100 MB spread among 7 processes. I have 6 addons enabled (Decentraleyes, Firefox pioneer, I don't care about cookies, Tab counter, Tree style tab and uMatrix).


Why do you hate spinning disks so much? And no, they are not considered "deprecated"; they're really the only way to affordably store large volumes of data. It's an old, venerable, and still-very-useful tech.


I'm also on a system with 8GB of RAM at the moment. Firefox is using up a hilarious 4.6GB keeping a few dozen web pages open, but the entire rest of my Linux system, including Inkscape, QCAD, and SketchUp under Wine, is using only a combined 907 megabytes. So it's possible part of your problem is just Windows 10.


My i3wm environment doesn't randomly start anything I don't ask it to start and runs very comfortably with 8GB of RAM; heck, it would run fine with 4GB and no swap. Maybe you are running the wrong OS.


A tiling window manager won't save you when dealing with Electron apps.

I run Linux and StumpWM on my desktop, and recently I had to upgrade to 12GB of RAM, because it turns out 8GB is very easy to exhaust these days. I currently have 9.3GB tied up, mostly by browser processes.


Yeah, this. I'm mostly using swaywm instead of GNOME in order to free up about 1 extra gigabyte of RAM for apps, but that equals about one Electron app. The only Electron app I haven't eliminated from my daily usage, though, is Patchwork, so it's not so bad.


Funny thing is, I use my i3wm environment on 16GB of RAM and a 4K screen. I'm actually migrating to ratpoison because it's so incredibly simple and basic that it has been making me drool. I mean, look at the source code. It doesn't get much simpler than that for a tiled WM.


If you are interested in ratpoison, you may also enjoy xmonad. I've used both and much preferred xmonad.


Load the entire GHC garbage collected runtime just for my window manager? Isn't this the same philosophy that causes people to use Electron? And that results in unnecessarily large memory footprints and runtime performance penalties?


Xmonad is rock solid, lightning fast, and perfect for many who prefer to minimize their reliance on a mouse.


I don't think esoteric Linuxes are really necessary--I'm running vanilla Ubuntu 16.04 with 16GB RAM and my top three processes are only CrashPlan (~800MB), Dropbox (~460) and Chrome (327 resident, 1380 shared, per htop, with 6 tabs running). My total usage at the moment is 4.04GB out of 15.4 available, again per htop. Some of the other numbers in this thread are baffling to me.

But I don't run any Electron apps, so there's that.

But, yeah, I guess I wouldn't be able to run this same workload with 4GB RAM. That's what lubuntu is for.


Oh, no, I'm aware these are very much Win10 specific problems and I look at Linux people with a not insignificant degree of envy. Unfortunately, I do quite like PC gaming.


Modern GNOME is a pig; some of this could be traced back to gnome-shell's use of JavaScript and CSS styling, if you were so inclined. Most Linux users don't really notice it because most x86 machines are crazy fast.

OTOH, try starting a modern full-blown distro on something like an RPi instead of Raspbian and you will quickly discover that you _NEED_ more than 4G of RAM and a lot of CPU just to start Firefox. It's even worse if you don't have hardware GL acceleration.

OTOH, the lightweight desktops (LXQt, LXDE, Xfce) really are..


While it would seem convenient to simply switch operating systems from the popular, widespread options to...whatever i3wm is, practically speaking that is seldom possible.


I3wm is a window manager you can run on any Linux distribution


Tiling wm does not an OS make.


Can you run recent versions of Photoshop? How about Premiere, or Final Cut?

SolidWorks? CATIA? Matlab?


I find that the best way to do this is to just VNC to a dedicated Windows or Mac slave computer to do graphic arts work. VNC is so good nowadays that I can use my QHD phone screen as a second monitor for all my Adobe apps--and the lag is almost non-existent thanks to super-fast Wi-Fi.


FreeRDP for connecting to Windows is a good option too.

I should say that KRDC is a good RD connection manager, but it's KDE-only unless you wanna install a third of it.


Matlab works on Linux.

Premiere and Final Cut don't, but DaVinci Resolve and Lightworks do.

Obviously, if your workflow (and thus your livelihood) depends on a particular tool, you should run a platform that runs it, but I would question why someone with money would pick Windows over Mac.


Those are commercial apps for people with real money. OTOH, there are a number of unbeatable free apps that don't have Linux ports. Fusion 360 comes to mind; it's not SolidWorks level, but it's light years ahead of FreeCAD.


Smart marketing treads the fine line between providing rich experiences for users, given average internet speeds, and annoying them. If the service is free, they should try to get 'some value' for the content.

As highlighted, some take it way too far by trying to extract 'maximum value', which ends up being counterproductive.


Hence the rant mentioning the Molochian economy under which we operate. That reference explains in depth why this is a very hard problem.

I like the rant too. Except maybe the bit about sending content at the speed of humans - I for one would like to take lightweight, bullshit-free content as fast as it can be sent, to pipe it to further processing on my end, in the never-ending quest to automate things in my life.


Background for anyone who hasn't read it: https://slatestarcodex.com/2014/07/30/meditations-on-moloch/


I highly recommend everyone give this a read if they haven't. It's probably the best post on Slate Star Codex.


I mean, if you're just reducing the content even further, just request that they make the reduction possible server-side and everybody wins.


I was thinking more about, e.g., running a script to fetch 3 different lightweight sites, run some personal code on them, and combine the data. If the script spent 99.9% of its time waiting on IO because of "human speeds", I wouldn't be too happy.

That said, I would be willing to bite the bullet and accept speed limits across the board if it resulted in lean web.
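
For illustration, a small Python sketch of that kind of script: fetch a handful of lightweight pages in parallel and hand them to whatever personal code does the combining. The URLs are just examples, and summarize() is a stand-in for the real post-processing:

    # Fetch several lightweight pages concurrently; summarize() stands in for
    # whatever personal code combines the data afterwards.
    from concurrent.futures import ThreadPoolExecutor
    from urllib.request import urlopen

    SITES = [
        "https://text.npr.org/",
        "https://lite.cnn.com/",
        "https://example.com/",
    ]

    def fetch(url):
        with urlopen(url, timeout=10) as resp:
            return url, resp.read()

    def summarize(results):
        for url, body in results:
            print(f"{url}: {len(body) / 1024:.1f} kB")

    if __name__ == "__main__":
        with ThreadPoolExecutor(max_workers=len(SITES)) as pool:
            summarize(pool.map(fetch, SITES))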


I achieve this with an RSS reader, in my case Miniflux.

Runs on an RPi under my TV, and I stay well below my 300MB data cap while consuming dozens of news sources.


Which news sources? How did you find the ones that still provide RSS? How much of it do you actually read?


I read virtually all of them. Most of them provide RSS feeds; some are a bit hidden, but they're googleable.

I started with the basics: BBC, NYT, the Guardian, and The Intercept for general news, The Conversation for science news without sensationalism, and some tech blogs. Then I just read most of it and follow links to find new sources. Most of the time a news story starts with "As reported by X", or just a link, so you can discover new sources like that.

You can also browse HN (and n-gate for the highlights) and Reddit to discover new sources.

If you add so much you can't keep up, remove some, or change the feeds into section feeds. Most online newspapers provide them. I miss Yahoo Pipes, and have yet to find a simple hosted alternative. There is also RSS Bridge for sites without feeds (Twitter, Facebook), but I still haven't found the time to set it up.

You can also add paywalled sources to read the headlines only. You can mark them as read from the index.
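
A minimal version of that setup doesn't even need a reader app. A Python sketch of pulling the latest headlines from a couple of feeds with just the standard library (the feed URLs are examples; per-section feeds slot in the same way):

    # Print the latest headlines from a few RSS feeds, standard library only.
    # Feed URLs are examples; swap in section feeds or your own list.
    import xml.etree.ElementTree as ET
    from itertools import islice
    from urllib.request import urlopen

    FEEDS = [
        "http://feeds.bbci.co.uk/news/rss.xml",
        "https://rss.nytimes.com/services/xml/rss/nyt/HomePage.xml",
    ]

    def headlines(feed_url):
        with urlopen(feed_url, timeout=10) as resp:
            root = ET.parse(resp).getroot()
        # RSS 2.0 layout: <channel><item><title>...</title><link>...</link></item>
        for item in root.iter("item"):
            yield item.findtext("title", "").strip(), item.findtext("link", "").strip()

    if __name__ == "__main__":
        for url in FEEDS:
            print(f"== {url}")
            for title, link in islice(headlines(url), 5):
                print(f"  {title}\n    {link}")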


I think this is why batch jobs and crontabs exist. Just do what the BBSes used to do for syncing--wait until off-peak hours and then let loose.


I don't think 3-10 seconds should be the bar. I spent years using a 14.4k / 28k / 56k modem.

That was during the mid and late 90s.

Browsing the web where you need to wait 3-10 seconds for everything to load is not a good user experience. It's a colossal waste of time, and today we have so many more reasons to view more pages compared to back then.

We should strive for an improvement instead of trying to stick with limitations from 20 years ago.

The real problem is people developing sites now give zero fucks about resource constraints. This is exactly like lottery winners who went from being poor to having 50 mil in their pocket but then end up broke in 3 years because they have no idea how to deal with constraints.

It's also a completely different type of person who is running these sites today. Back in the day you made a site to share what you've learned. Now you have "marketing people" who want to invade your privacy, track every mouse movement and correlate you to a dollar amount instead of providing value.


>Browsing the web where you need to wait 3-10 seconds for everything to load is not a good user experience.

I'm the lone owner of a website. I have enough time to accomplish one of 3 things before January:

>Finish my Finance App

>Update 200 pages to have pictures load based on screen type, so they load in 3 seconds instead of 6.

>Collect and compile data to create 20 more pages, which are what my 3000 subscribers actually come to my site for.

Very quickly you can see why an extra 2 seconds of loading time is not on my mind. It's important to allocate resources effectively; making my website load faster adds limited value compared to creating content that my users actually want.


> Very quickly you can see why an extra 2 seconds of loading time is not on my mind. It's important to allocate resources effectively; making my website load faster adds limited value compared to creating content that my users actually want.

I understand. I'm also the sole owner of a website, where I'm selling a product (video courses for software developers).

My priorities are to give as much value as possible for free and also sell some courses if I can.

According to the network tab in Chrome's dev tools, the DOMContentLoaded time is about 250ms for any page on my site (which are typically 1,000 to 5,000 word blog posts with some images). From the user's POV, the page loads pretty much instantly. Then about a second later Google Analytics and a social sharing widget pop up, but those happen after the content.

The interesting thing is I really didn't try hard to make this happen. I just stuck to server-rendered templates and compressed my assets. I also made an effort to avoid heavy front-end libraries and only add JavaScript / CSS when I needed to. I basically run a heavily modified theme based on Bootstrap with a couple of third-party JavaScript libs (including jQuery).

There's a lot of room for improvement but I haven't bothered because it seems good enough. It's very possible to get the perceived load speed of a page to be under 1 second without dedicating a lot of time to it.
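
The "compressed my assets" step can be as simple as a pre-gzip pass at deploy time, so the web server hands out foo.css.gz instead of compressing on every request. A sketch, with the directory and extension list as illustrative assumptions:

    # Pre-compress static assets at deploy time; the web server then serves
    # the .gz copies directly. Directory and extensions are assumptions.
    import gzip
    from pathlib import Path

    STATIC_DIR = Path("static")
    COMPRESSIBLE = {".css", ".js", ".svg", ".html"}

    for path in STATIC_DIR.rglob("*"):
        if path.suffix in COMPRESSIBLE:
            data = path.read_bytes()
            out = path.parent / (path.name + ".gz")
            out.write_bytes(gzip.compress(data, compresslevel=9))
            print(f"{path}: {len(data)} -> {out.stat().st_size} bytes")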


So am I and I just don't buy it.

What on earth do people do to get over 1 second load time? Remember that you have to actively spend time to bloat a site.


The most profitable newspaper in my country didn't have a website with articles or news stories on it until earlier this year. Their page was something from the '90s (it was probably newer), and all it really offered was info about the paper and a way to buy it.

What they do that other Danish papers don't is write lengthy, meaty articles that take time to read because they actually teach you something new. There was a story on Trump's connections to Russia that was three full pages long, and we're talking old-school newspaper format, so that's what? 10 A4 pages' worth of text?

The paper only comes out once a week, because it takes time to write, but also because it takes time to read.

I'm not sure where I am going with this; I just think it's interesting how they've increased their subscription numbers while not really giving two shits about the bullshit web.

They may give two shits about the bullshit web now of course, having gotten a webpage with articles. I don’t know though, I’m a subscriber, but I haven’t visited their site yet.


What is the name of this newspaper?


Or what if we allowed tracking and ads, but at native speed?

There is the problem of the network, which won't/can't be fixed in the short term. Then there's the problem of rendering and reflow.

For the first, it will be a long time before everyone gets 1Gbps internet; for the second, even if you have 1Gbps internet, pages will still be slow due to all the mini scripts.

What if the fonts were there in the first place?

What if the browser actively tracked every mouse movement, link click, etc., natively providing 80-90% of the data points the tracking scripts collect, and sent the data back to the website on request?

No more 3-5MB of scripts downloaded per site, no more CPU spent running those scripts, no more 1MB of fonts. And they wouldn't cause the page to jank. You'd get buttery-smooth webpages while still getting ads.

My biggest problem is with the idea that the web should be extended via JavaScript, and that everything should be JavaScript only, rather than extending the browser's native functionality.

Unfortunately this is an idea that Apple may not like, even if the data are anonymised.


FWIW, you have effectively just described the 'app' solution.

In that solution all of the tracking and analytics, fonts, and other 'baseline' content are part of the app, which then fetches the unique content (the few Kb of story text and Mb of images) and then renders it all locally. There is even some ability to do A/B testing in that setup.

The "App" itself is basically a browser with none of the non-content UI controls that browsers normally have, that can only go to a specific URL (the content supplier).


Precisely, but we don't want to be bound by the App Store ecosystem. And we want to improve the UX of websites, which so far hasn't been great.

As a matter of fact, maybe these APIs for tracking and analytics should be the same across apps and browsers.

We tech nerds keep throwing out new terms--web app, web page, etc.--but to our users they are all the same. They want to consume information in text, video, and images, in a fast and smooth way without jank.


> All you need

You say that as if it's easy. "All you need" is a product that users want to pay for. But I think all the various tries and attempts have proven that users aren't really that keen to pay for content online, at scale.


They don't need paid subscribers. Physical papers and TV have been supported by ads without tracking for decades. Companies pay for space or time.

The internet allowed advertisers to track, so now we have this BS. They had many years to fix this, but tracking is the business of many companies like Google and Facebook. Now lots of people use ad blockers, and that number is increasing.


The price an advertiser pays for a full-page ad printed in the NYTimes is on the order of $150,000; the price an advertiser pays to obscure your entire screen with an ad is as little as $1.00.

What you're missing is that advertising rates for television and print are several decimal orders higher than the rates for internet advertising. Why that is is more complicated than you might guess, but "printing" a newspaper by sending you the text is a couple of orders of magnitude cheaper than running a printing press. Between those two realities a lot of news web sites are being crushed.


If 600,000 [0] people see that $150,000 ad, the advertiser has paid $4 per impression. At $1 CPM, 600k impressions is $60,000, but as you said this cost is at the lowest end. An ad that obscures your entire screen might cost as much as $8 CPM or more [1] and now buying the newspaper ad is sounding like a better deal.

[0] https://www.nytco.com/the-times-sees-circulation-growth-in-f...

[1] https://www.buysellads.com/buy/leaderboard/id/17/soldout/1/c...


Consider the fate of Dr. Dobb's Journal, a print magazine (https://news.ycombinator.com/item?id=8758915).


NPR text?


data cow.

thank you


You're welcome, but it is oil89's invention from 10 months ago: https://news.ycombinator.com/item?id=15350778 . I just love it though.


I don't think that's a meaningful comparison. Moby Dick is a book, written by 1 guy and maybe an editor or two. NYT employs 1,300 people.

When you read a book all you get is the text. NYT has text, images, related articles, analytics, etc. Moby Dick doesn't have to know what pages you read. NYT needs to know how long you spent, on which articles, etc. They need data to produce the product and you can only achieve that with javascript tracking pixels (Server logs aren't good enough).

If Moby Dick was being rewritten and optimized every single day it would be a few mb. It's not, so you can't compare the two.

Yes, NYT should be lighter; no, your comparison is not meaningful. A better comparison would be Moby Dick to the physical NYT newspaper.


> NYT needs to know how long you spent, on which articles, etc. They need data to produce the product and you can only achieve that with javascript tracking pixels (Server logs aren't good enough).

No they don't. They really don't need to know any of that. They don't even get a pass on tracking because they're providing a free whatever - I pay for a subscription to the NYT. The business, or a meaningfully substantial core of it, is viable without tracking.

It would be nice if the things I pay for didn't start stuffing their content with bullshit. What and who do I have to pay to get one-second page loads? It's not a given that advertising has to be so bloated and privacy-invasive. Various podcasts and blogs (like Daring Fireball) plug the same ad to their entire audience each post/episode for set periods of time. If you're going to cry about needing advertising, then take your geographic and demographic-based targeting. But no war of attrition will get me to concede you need user-by-user tracking.

You want me to pay for your content? Fine, I like it well enough. You want to present ads as well? Okay sure, the writing and perspectives are worth that too I suppose. But in addition to all of this you want to track my behavior and correlate it to my online activity that has nothing to do with your content? No, that's ridiculous.


> No they don't. They really don't need to know any of that. They don't even get a pass on tracking because they're providing a free whatever - I pay for a subscription to the NYT. The business, or a meaningfully substantial core of it, is viable without tracking.

Clearly they disagree. Or maybe you should let them know that they don't need that.

To say it without sarcasm, what you feel you are entitled to as a paying customer and what they feel they need/want in order to understand their customers are clearly at odds. Ultimately, what you think matters nothing in isolation and what they think matters nothing in isolation. What you two agree upon, is the only thing that matters. That is to say, if you think they shouldn't track you but you use their tracking product anyway, you've compromised and agreed to new terms.

I imagine you could come up with a subscription that would adequately compensate them for a truly no tracking experience. But I doubt you two would agree on a price to pay for said UX.


You're correct of course, but I don't really see how this isn't a vacuous observation. Yes clearly our perceptions are at odds, but that has nothing to do with the reality of whether or not they need to be doing that tracking. Obviously they think they need to, or they wouldn't do it. But I think I've laid out a pretty strong argument that they actually don't need to, which leads me to believe that they actually haven't considered it seriously enough to give it a shot.

Would they be as profitable? Maybe, maybe not. Would they become unprofitable? No, strictly speaking. I'm confident in that because the NYT weathered the decline of traditional news media before the rise of hyper-targeted ads, and because I've maintained a free website in the Alexa top 100,000 on my own, with well over 500,000 unique visitors per day. That doesn't come close to the online audience of a major newspaper, but it's illustrative. There is a phenomenal amount of advertising optimization you can do using basic analytics based on page requests and basic demographic data that still respects privacy and doesn't track individual users. I outlined a few methods, such as Daring Fireball's.

Maybe instead of this being a philosophical issue of perspective between a user and an organization, it's an issue of an organization that hasn't examined how else it can exist. Does the NYT need over 10,000 employees? Is there a long tail of unpopular and generally underperforming content that nevertheless sticks around, sucking up money and forcing ever more privacy-invasive targeting? If the NYT doesn't know its audience well enough to present demographic-targeted ads on particular articles and sections, what the hell is it doing tracking users individually? It's just taking the easy way out and giving advertising partners the enhanced tracking they want. But they don't need to do that, and whether or not they think they need to do it is orthogonal to the problem itself.


> You're correct of course, but I don't really see how this isn't a vacuous observation. Yes clearly our perceptions are at odds, but that has nothing to do with the reality of whether or not they need to be doing that tracking. Obviously they think they need to, or they wouldn't do it. But I think I've laid out a pretty strong argument that they actually don't need to, which leads me to believe that they actually haven't considered it seriously enough to give it a shot.

It most definitely is. But so is the word need, in this context. How would we define what they need to do, and what they don't need to do?

My argument is simply that, of course, they don't need to (by my definition), but nothing will change that unless they see a different, more lucrative offer. I.e., "oh hey, here's 2 million readers who will only read the page in plain HTML and will pay an extra $20/m". It just seems like a needless argument, as I don't believe there's anything that can change their behavior without us changing ours. Without the market changing.

Rather, I think the solution lies not in them, but in you. In us. To use blockers and filters to such an extreme degree that it's made clear that UX wins here, and they need to provide the UX to retain the customers.

Thus far, we've not done enough to change their "need". If a day comes that they do need to stop tracking us, well, they'll either live or die. But the problem, and solution, lies in us. My 2c.


> What you two agree upon, is the only thing that matters.

That's precisely why many of us use (and promote the use of) adblockers and filtering extensions.


Classic narrowcasting mistake that dying companies make.

Statista claims 2.3 million digital subscribers. NYT is trying to milk that 2.3M for everything they've got, squeezing the last drops of blood from the stone while they still can.

That's a great way to go out of business, when 99.97% of the world population is not your customer and your squeezing labors are not going to encourage them to sign up.

If you hyperoptimize to squeeze every drop out of a small customer base, eventually you end up with something like legacy TV networks where 99% of the population won't watch a show even for free, and the tighter the target focus on an ever shrinking legacy audience, the smaller the audience gets, until the whole house of cards collapses.

It's similar to the slice-of-pie argument: there are many business strategies that make a pie slice "better" at the price of shrinking it, and eventually the paper-thin slice disappears from the market because the enormous number of employees can't eat off it anymore--but that will certainly be the most hyperoptimized slice of pie ever made, right before it entirely disappears.

NYT is going to have a truly amazing spy product right before it closes.


Why is that doubtful? There's all kinds of examples of tiered subscriptions in the world. I think it would be doubtful because the NYT wouldn't want to explicitly admit all the tracking they are doing.


> Why is that doubtful? There's all kinds of examples of tiered subscriptions in the world. I think it would be doubtful because the NYT wouldn't want to explicitly admit all the tracking they are doing.

Many reasons, one of which you said. What would the price tag be for them to admit all they are tracking?


Currently the price is free, and comes bundled with uMatrix, and a cookie flush. I’d like to pay the NYT for their journalism, but only with money, not the ability to track me. As a result they get no money, and no tracking.


> Currently the price is free, and comes bundled with uMatrix, and a cookie flush. I’d like to pay the NYT for their journalism, but only with money, not the ability to track me. As a result they get no money, and no tracking.

You misunderstood me. I mean, what would they like you to pay them for them to be 100% transparent about what they're doing for tracking, what their advertisers are doing and who they are, and possibly to stop all that entirely? I.e., what is it worth to them?


Interestingly if you pay them, and thus are logged in when you view an article, then they can better track you.

In contrast if you never sign up, disable JS, and periodically clear your cookies, then the entire site works fine and none of the third party trackers work. At best they can link your browser user agent and IP to a hit on the server side.


NYT needs to produce and recommend content that people find engaging to continue earning their subscription dollars.

The idea that tracking is purely or primarily there to support a business model of selling user data is a strawman invented by self-righteous HNers. You need to know what parts of your product are effective to make it competitive in today’s marketplace.


90% of that can be accomplished with server-side stats. Do you really need to track mouse movements and follow readers with super-cookies across the web to find out what articles people find engaging on your site?

> The idea that tracking is purely or primarily there to support a business model of selling user data

Purely, no. Primarily? You can bet your sweet ass.


I agree in general, but there are some things publishers need that I don't see going away any time soon. Online advertisers want to know that their ads are being viewed by a human and not a bot, that they were on screen for long enough, and that the user didn't just scroll past. Publishers want to know how far down you make it in their article, so they know where to put the ads in the body of the article.


I'm not accusing anyone of selling my data and I'm not trying to champion a crusade against the entire advertising industry. I'm asserting that the NYT can achieve the substantial majority of the advertising optimization and targeting it needs to do to be profitable 1) without doing user-specific tracking and 2) without making page loads extremely slow.

Like I said, serve me an ad. I'm not an idealist, I understand why advertising exists. But don't justify collecting data about which articles I read to serve to some inscrutable network of advertisers by saying that it has to be this way. We don't need this version of advertising.


> I'm asserting that the NYT can achieve the substantial majority of the advertising optimization and targeting it needs to do to be profitable

Majority, not all. Why should they leave money on the table, exactly?


Because it's disrespectful of user privacy, performance inefficient and computationally wasteful?

Most companies are not achieving the platonic maximization of profit or shareholder value. They leave money on the table for a variety of reasons. It's not beyond the pale that this would be one of them. If you don't agree, then frankly it's probably an axiomatic disagreement and I don't think we can reason one another to a synthesis.


There's nothing axiomatic about our disagreement here, it's not like I'm unaware of the existence of inefficient businesses. Individual companies may choose to leave money on the table, but industries and markets as a whole do not (not intentionally, anyway).

You've just described the status quo, where businesses have to sacrifice their lifeblood to achieve your ideals. Those businesses tend to be beaten by more focused competitors, which results in the industry you see today, filled with winners that don't achieve your ideals.

But good luck trying to champion an efficient web industry by essentially moralizing.


I'm not trying to champion anything, I'm speaking my mind. I don't expect the NYT to change because I'm writing an HN comment. If market forces or legislation are insufficient to force companies to respect user privacy across unrelated domains, then I'll rely on my own setup: a Pi-Hole VPN for mobile devices, and uBlock Origin for desktop devices. I happily whitelist domains with non-intrusive ads and respect for Do-Not-Track.

But more to the point, you're presenting an argument which implies the NYT is a business which will be beaten by its competitors if it doesn't track users through their unrelated web history. I don't think that kind of tracking is an existential necessity for the NYT. It's not their core competency. Their core competency is journalism - if they are beaten by a competitor it won't be because the competitor has superior tracking, for several reasons:

1. Journalism is not a winner take all environment,

2. Newspapers were surviving in online media well before this tracking was around,

3. The NYT already has sufficiently many inefficiencies that if they actually cared about user privacy, they could trim the fat elsewhere so they wouldn't have to know to within 0.001% precision whether or not a user will read an entire article just to be profitable.

I really don't think this is too idealistic. It's not like I'm saying they need to abandon advertising altogether. I don't even have a problem with the majority of advertising. It's the poor quality control and data collection that I take issue with. All I'm saying is that they don't need to do what they're doing to be profitable.


> Journalism is not a winner take all environment

So? I'm not sure how this means that news orgs won't suffer from losing business to competitors with superior tracking.

> Newspapers were surviving in online media well before this tracking was around

Markets change. Advertisers have different expectations. Readers have more news to choose from. This is a silly argument.

> The NYT already has sufficiently many inefficiencies that if they actually cared about user privacy, they could trim the fat elsewhere

Sure, but why? Why would they do that? Why wouldn't they trim the fat elsewhere AND keep the tracking to make more money?

The point you make doesn't really make sense. Yeah, it's theoretically possible for news orgs to stop tracking in the same way that it's theoretically possible for me to take out a knife right now and cut off my legs. News orgs can make up their losses elsewhere and survive in the same way that I can still get around with a wheelchair.

But why on earth would I or the NYT do that?

I respond to you with these questions because it seems to me that both you and the OP speak out against these practices because you feel they are unnecessary. My point is that they are necessary. You just don't acknowledge the forces that make them so.


> Majority, not all. Why should they leave money on the table, exactly?

It may be that they are, in fact, driving users away. Tracking user behaviour can become a distraction.


NYT used to exist only in paper form, which had no tracking ability at all. They may benefit from this, but I'm skeptical that they "need" it.


The marketplace for your attention was a lot less competitive then. Editors could even feed you true and nuanced reporting, out of a sense of professional obligation, and you had no choice but to sit there and take it.


That sounds like a case of acute metrics-itis to me--looking for things to measure as a yardstick while forgetting that you get what you measure for, instead of focusing on the core of the business.

While it may give some insight, does it tell you anything meaningful to know that most people skim the first few paragraphs before clicking away? Does it improve the writing quality or fact-checking? Is it worth the risk of alienating customers? To give a deliberately extreme and absurd example, Victoria's Secret could hire people to stalk their customers to find out more about product failure conditions in the field, but that would be a horrifying and stupid idea.

"Everybody is doing it." is a poor rationale for implementing a business practice.


Except that the content is directly related to user behavior. If they see no one reads the style section, they'll cut it and move resources to financial news. If they didn't have tracking they'd never know that; they'd be wasting resources and have a comparatively inferior product.

They can't do UX analysis, nothing.


For the first, it's sufficient to just look at the number of page requests.

For the second, no one has ever explained to me how UX analysis really works for news sites. Isn't it enough to put 2 or 3 people in a room and show them a few variations of the UI? There isn't really much to publishing text, images, and a few graphs. Graphs are a very well-explored field; I don't think you can learn more about them by just watching heat maps and click-through rates.


I suspect there's a lot of bullshitting happening around "UX analysis", with third-party "experts" offering analyses which may, or may not, show something significant. As long as everyone in the chain can convince their customer/superior that their work is useful, the money will flow, whether or not the work is actually useful.


That's one of the fundamental problems in tech today, namely:

"It is difficult to get a man to understand something, when his salary depends on his not understanding it.”


It absolutely isn't sufficient to look at the number of page requests. How do you discern like-reads vs hate-reads? How do you determine whether someone clicked on an article, read the first line, and then bailed vs read the whole thing? There are a heap of metrics used to determine engagement which factor into the material decisions referred to in the grandparent.


Why do you need to track user behavior across unrelated domains to achieve any of that?


It's pretty simple: try a few different designs with A/B testing and you will see which one brings in the most revenue.

However, the result will usually be a lot of dark patterns. For instance, that's why you get popups asking you to register.
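
Mechanically, that kind of test doesn't need a client-side tracking script either. A sketch of deterministic, server-side variant assignment (the names and bucket split are illustrative); conversions per bucket can then come out of ordinary server-side records:

    # Deterministic A/B assignment: hash a session or subscriber id into a
    # bucket on the server, render that variant, and compare conversions per
    # bucket later from server-side logs.
    import hashlib

    VARIANTS = ["control", "new_layout"]

    def assign_variant(session_id: str) -> str:
        digest = hashlib.sha256(session_id.encode()).digest()
        return VARIANTS[digest[0] % len(VARIANTS)]

    print(assign_variant("subscriber-12345"))  # stable for a given id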


Server logs just aren't sufficient, no matter how many times HN says so. You don't get enough data to make data-driven decisions. That's like giving a data scientist 1/3rd of the available data and saying "that's good enough".

You'd expect UX to be a small unit, but that includes everyone who works on revenue-generating ads. Moving one tiny thing has a direct impact on revenue, which affects every person employed by the NYT.


>That's like giving a data scientist 1/3rd of the available data and saying "that's good enough".

And it could easily be enough. Having 1/3 of a quantity is only a problem if the full quantity was barely enough to begin with.


They could certainly track what you mention here (which pages are being accessed) via logging requests - without any use of additional front-end assets.


You can do all of those analytics server-side; there's no reason to deliver it via JS and have the client do the computation. You're already sending all the required info to track that sort of thing in the request itself.
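
As a sketch of what "analytics from the request itself" can look like--counting article views straight out of an access log, with the log path and URL prefix as illustrative assumptions:

    # Count article page views from a web server access log (combined log
    # format assumed); no client-side script involved. Path and prefix are
    # illustrative.
    import re
    from collections import Counter

    REQUEST = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3})')

    def article_views(log_path, prefix="/2018/"):
        views = Counter()
        with open(log_path) as fh:
            for line in fh:
                m = REQUEST.search(line)
                if m and m.group("status") == "200" and m.group("path").startswith(prefix):
                    views[m.group("path")] += 1
        return views

    if __name__ == "__main__":
        for path, count in article_views("access.log").most_common(10):
            print(f"{count:6d}  {path}")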


It's amazing to me that no one out there seems to do server-local handling of ads, either... If you put ads directly into your page instead of relying on burdensome external systems, suddenly blocking isn't a thing anymore. ALL of the functionality supposedly needed for analytics and an ad-driven business model can happen server-side, without the page becoming sentient and loading a billion scripts and scattered resources, with the one exception being filtering out headless browsers. If external systems need to be communicated with, most of that can happen before or after the fact of that page load. Advertising and analytics are implemented in the laziest, most user-hostile way possible on the majority of sites.
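
A rough sketch of what that could look like, with Flask and a tiny local inventory as illustrative assumptions (not how any real publisher does it): the ad is picked and rendered on the server, the impression is counted in the server's own log, and there is nothing third-party left to block.

    # First-party, server-side ad insertion: the reader downloads one HTML
    # document, and impressions are counted in the server's own log.
    # Flask and the inventory table here are illustrative assumptions.
    import random
    from flask import Flask, render_template_string

    app = Flask(__name__)

    ADS = [  # local inventory instead of an external ad network
        {"sponsor": "Acme Coffee", "html": "<p>Try Acme Coffee.</p>"},
        {"sponsor": "Example Books", "html": "<p>Read more with Example Books.</p>"},
    ]

    PAGE = """
    <article>{{ body }}</article>
    <aside class="sponsor">{{ ad.html | safe }}</aside>
    """

    @app.route("/article/<slug>")
    def article(slug):
        ad = random.choice(ADS)
        app.logger.info("impression sponsor=%s slug=%s", ad["sponsor"], slug)
        return render_template_string(PAGE, body=f"Article {slug} goes here.", ad=ad)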


I don't think it's very surprising. Advertisers won't let publishers serve ads directly because that requires trust in publishers to not misrepresent stats like impressions and real views. I don't know how you'd solve that trust problem when publishers are actually incentivized to cheat advertisers.


I think you may have identified the biggest issue, and it's a shame the pragmatic solution is an unpleasant technical solution.


Couldn't they, e.g., have some trusted proxy server that routes some requests to the real-content NYTimes server and some to the ad server?


That sounds like a viable solution to the trust issue. They don't need to respond to the requests, just see copies they can be sure are real requests.


For advertisers to trust this proxy server, the NYT cannot control this proxy server to preserve its integrity. So now you're asking the NYT to base their business on an advertiser-controlled server?

What happens when the proxy goes down? What happens when there are bugs? Do you think publishers can really trust advertisers to be good stewards of the publisher's business? Think for a moment about publishers that are not as big as the NYT.

Okay, maybe they do trust an advertiser-controlled proxy server. This means that both tracking scripts and NYT scripts are served from the same domain, meaning they no longer have cross origin security tampering protection. What's stopping the NYT from serving a script that tampers with an advertiser's tracking script?


Those are issues, but not insurmountable, especially when the benefit is "obviate any adblocker".

They can use a trusted third party to run the proxy and use industry standards/SLAs for site reliability/uptime. And they can still use different subdomains with no obvious pattern (web1.nytimes.com vs web2.nytimes.com -- which is the ad server?) or audit the scripts sent through the proxy for malice.
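
As a toy illustration of that proxy idea in Python (hostnames are placeholders, and a real deployment would need TLS, POST handling, header passthrough, and an operator both sides trust): every request is logged so impressions can be audited, then forwarded to whichever upstream the path belongs to.

    # Toy auditing proxy: log each request (so impressions can be verified by
    # a third party) and forward it to the content or ad upstream. Hostnames
    # are placeholders; this handles GET only and skips TLS, headers, errors.
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.request import urlopen

    UPSTREAMS = {
        "content": "https://origin.example-news.com",
        "ads": "https://adserver.example-network.com",
    }

    def bucket_for(path):
        return "ads" if path.startswith("/ads/") else "content"

    class AuditingProxy(BaseHTTPRequestHandler):
        def do_GET(self):
            bucket = bucket_for(self.path)
            print(f"audit: {self.client_address[0]} {bucket} GET {self.path}")
            with urlopen(UPSTREAMS[bucket] + self.path, timeout=10) as resp:
                body = resp.read()
                status = resp.status
                ctype = resp.headers.get("Content-Type", "text/html")
            self.send_response(status)
            self.send_header("Content-Type", ctype)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    if __name__ == "__main__":
        HTTPServer(("127.0.0.1", 8080), AuditingProxy).serve_forever()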


The way it's implemented has several "benefits":

- It externalizes resource usage - the waste happens on users' machines. Who cares that it adds up to meaningful electricity consumption anyway?

- It makes it easier for marketing people and developers to independently work on the site. Developers can optimize, marketers can waste all that effort by including yet another third-party JS.

- It supports ad auctions and other services in the whole adtech economy.

- You don't have to do much analytics yourself, as the third party provides cute graphs ideal for presenting to management.


There used to be an open source, self-hosted (PHP) ad application called OpenX. It worked well for quite a while. In its later years, it suffered a number of high-profile security vulnerabilities, and the open source version was poorly maintained, since OpenX [the company] was focused more on their hosted solution, which probably had migrated to a different codebase or at least was a major version past the open source codebase.

The open source version has been renamed "Revive Adserver", and it looks maintained, but I don't think it's used nearly as much as the openx [open source version] of old.

If you use Revive Adserver or you design a server-local ad system in-house, it won't be as sophisticated as gigantic ad-providers who can do all sorts of segmentation and analysis (producing pretty reports which execs and stakeholders love even if that knowledge adds no value to the business).


Funny that you mention that--in a former life I had to develop around and maintain an OpenX system.


It's because they use systems that identify the client via JS to deliver the most "expensive" ad possible. It's complete garbage, of course; Google/Facebook should be held liable for what they advertise, not run massive automated systems full of fraud. If Google delivers malware they shouldn't be able to throw their hands up and go "well, Section 230!".


> They can't do UX analysis, nothing

They could, but that would require paying people and firms like Nielsen to gather data. Instead they engage in the same freeloading that the industry derides users for.


Reading without cookies or JavaScript enabled seems to fix every problem the NYT has.


FWIW: Ars Technica turns off tracking for paying customers (and provides full articles in the RSS feed if you pay for it).


I need to like this comment more than my single upvote allows.


A random archive of the New York Times frontpage in 2005 is 300kb. Articles were probably comparable in size.

Are you honestly saying that the landscape of the internet and/or the staffing needs of the NY Times has changed so drastically that they actually needed a 22x increase in size to deliver fundamentally text-based reporting?


I mean, if most of that is a few images, then those images could just be bigger today for nicer screens and faster internet.

Not that that is the case.


You’re right about the problem: web pages tend to scale with the size of the organization serving them, not the size of the content. But this is the failure, not a defense.

It’s a big problem on mobile and the reason I read HN comments before the article.

> NYT employs 1,300 people


Definitely a good point. You'd imagine they'd have at least 1 person who optimizes the site for page size / load speed.


And 100 other people whose job it is to cram more features in.


> Moby Dick is a book, written by 1 guy and maybe an editor or two. NYT employs 1,300 people.

Totally irrelevant. Why should the number of employees in the company have any bearing on the size or cost of the product? Ford has 5x as many employees as Tesla. Should their cars be 5x as big or 5x more expensive?

> NYT needs to know how long you spent, on which articles, etc. They need data to produce the product

They may want this but they don’t need it. They successfully produced their product in the past without it.

> If Moby Dick was being rewritten and optimized every single day it would be a few mb.

Irrelevant and likely false. If anything, books and other text media tend to get smaller after subsequent editing and revising.

> A better comparison would by Moby Dick to the physical NYT newspaper.

Comparing a digital text product (Moby Dick) with a digital text product (a NYT article) is as close as it gets.


> Totally irrelevant. Why should the number of employees in the company have any bearing on the size or cost of the product? Ford has 5x as many employees as Tesla. Should their cars be 5x as big or 5x more expensive?

If the cost or the size wasn't a constraint, for sure Ford would build a car 5x as big or 5x as expensive.

The website size isn't a constraint here; if it was, they would work on it and make it smaller. It's only a constraint for highly technical people here. Currently at my job I'm optimizing some queries that take way too long. It has been like that for years, but we hit a wall recently: our SQL Server can't take it anymore. I always found it stupid that it took so long to optimize it... but at the end of the day, the clients just didn't care that it took 3 seconds to load the page. I could be working on more features right now, something that the clients actually care about.

What makes the number of employees relevant to the size? Well, if you were the only one building that website, you would know everything about it, right? You would always use the exact same components, reuse everything you can; you already know every single part of the code. Add a second employee, and now you don't know exactly what he does. You know some of it, but sometimes you forget and you may duplicate something or do it badly or whatever. At some point, the thing is just too big to be understood by any single employee, and you get code badly reused, stuff that serves no direct purpose but makes maintenance easier, etc... You never decrease the size, simply because it's never worth it to, but each and every one of the employees adds stuff to it.


>Why should the number of employees in the company have any bearing on the size or cost of the product? Ford has 5x as many employees as Tesla. Should their cars be 5x as big or 5x more expensive?

This is a bad example. Tesla is a failing/failed car company that can't produce 200k cars per year. Ford has their truck program that sells that in a few months.

>Comparing a digital text product (Moby Dick) with a digital text product (a NYT article) is as close as it gets.

Comparing a book vs a timely article is unfair. NYT produces content daily to encourage shares and people clicking on various links on the website. Links, Images, Videos, comments, etc... None of those are available in dumb text.


> NYT needs to know how long you spent, on which articles, etc. They need data to produce the product and you can only achieve that with javascript tracking pixels (Server logs aren't good enough).

Nonsense. I subscribe to the NYT so that I can read the news. Nothing about that necessitates tracking which users read which articles.

If the NYT uses page view data for anything other than statistics for their advertising partners, it's a shame. I don't want the NYT to tailor write their articles to maximize page views, time spent, or any other vanity statistic; if I felt like reading rage bait fed to me by an algorithm personally customized for all my rage buttons, there is plenty of that elsewhere.

NYT's differentiating factor is that they are one of the few businesses left that pays people to conduct actual journalism. If they give up on that, then I imagine their customers will just go to buzzfeed or wherever.


So, how does the NYT style section fit into your idea of 'actual journalism'?


It doesn't. Not every word printed in the NYT is journalism, but it doesn't change the fact that they are one of the few websites that have any journalism at all.

If the NYT cut their paper down to just the style section, horoscopes, and other garbage, they would be just another Buzzfeed and are probably not equipped to compete.

On the other hand, Info Wars, Mother Jones and friends offer publications with basically no journalism at all. The NYT, WSJ, Miami Herald, Chicago Tribune, and so on fill the opposite space: they do Pulitzer-prize-worthy reporting. If these papers become run-of-the-mill click farms, I'm sure Silicon Valley will run them out of business, as rage bait is not really their core competency.


> NYT needs to know how long you spent, on which articles, etc. They need data to produce the product and you can only achieve that with javascript tracking pixels (Server logs aren't good enough).

This just seems like such an abuse of what the web was meant to be. I can imagine the horror people in the 90s would have experienced if they knew what JS was going to be used for when perusing news sites.

Sometimes I wonder if it would have been better keeping the web as a document platform without any scripting, and creating a separate one for apps.

Anyway, an alternative model news sites could use is to let users choose which content they want to pay for. That's a way to track which content users prefer.


> I can imagine the horror people in the 90s would have experienced if they new what JS was going to be used for

We understood the horror no less than we do now. Javascript in the 90s gave us infinite pop-ups, pop-unders, evasive controls, drive-by downloads and otherwise hijacked your browser and/or computer. There's a reason Proxomitron and other content blockers hit the scene by the early 2000s-- the need to shut that shit off was clear.


Yeah, people under 33 seem to have a romanticized view of the internet. They believe there were no ads and it was flush with the kinds of content we enjoy today. Nope. Content existed but it was scarce/thin. Many internet users just stayed on AOL/Prodigy/Compuserve and never left to explore the WWW side of things. Those service providers were essentially national-level BBSs.

There was no youtube, wikipedia, itunes or reddit. No instagram, twitter or google earth. The internet was basically geocities where most webpages were fan pages or pages/forums about niche interests.

I think people want to believe that because they believe that if ads were to disappear off the internet tomorrow, nothing would change. They don't realize that ads subsidize the content they consume, whether it's a youtube video they're watching or a reddit thread, ads are paying for that content. Nothing is free.


The web was intended to be a free exchange of knowledge, not ad driven, regardless of how JS was abused in the late 90s and early 2000s. A scripting language was added to the web because of Netscape's commercial interest in creating an alternative to MS products.


> The web was intended to be

I'm sorry, but this is ridiculous. You cannot say what the web was intended to be because you had no hand in inventing it. You do not know the mind of Tim Berners-Lee.

In fact, I argue the opposite - he had a vision of a global hyperlinked information system. While he wanted the protocol itself to be free (a move away from gopher), the information itself had no such protections. And that is precisely what we have today; it doesn't cost anything to use the WWW protocol. His vision has been fulfilled.

Now, the information (the content) itself is another matter. IP laws exist for a reason, people want to be paid for the content they create. They have ownership of that content. Whether it's the latest episode of game of thrones, a video game IP, or a book I wrote, the law protects my intellectual property. If I want to charge for access to that content, I'm more than within my legal right. Whether it's accessed over WWW, a cable box, or purchased from a book store, it makes no difference to my legal protections.


You write as if Tim Berners-Lee died in 1990.

I think it's quite possible to know what TBL thinks about privacy-invasive javascript, given that he's very much alive and writing about such things to this day.

https://webfoundation.org/2017/03/web-turns-28-letter/


> Sometimes I wonder if it would have been better keeping the web as a document platform without any scripting, and creating a separate one for apps.

The app platform would consume the document platform because it is easier to enforce DRM requirements in an app platform than in a document platform.

I'm just thankful that I can still Print to PDF nearly anything. I don't think it will last much longer though because "web" "designers" are driven to destroy anything and everything that was actually good about the web in their quest to monetize every pixel on my screen.


They need data to produce the product and you can only achieve that with javascript tracking pixels (Server logs aren't good enough).

I disagree. They need journalists, and they need to find some way to monetize. Your argument implies there is no other way than to add user tracking. Sure, images take up space, but I refuse to believe the current way of the web is the only viable option.


The New York Times existed for 145 years from its founding in 1851 to the creation of its website in 1996, and it got by just fine without tracking pixels in all those years.


This is a fallacy. Humans got by just fine without smartphones, Internet, electricity for thousands of years. While you could do just fine without those things today it is impractical. Times change (pun intended).


> NYT needs to know how long you spent, on which articles, etc. They need data to produce the product and you can only achieve that with javascript tracking pixels (Server logs aren't good enough).

That's analytics. The marketing stuff that is causing the bloat isn't doing that much in comparison. Trackers are often coded very wastefully and are redundant by nature. You can easily have dozens to hundreds of them all doing the same stuff, just with different APIs. It is insane and out of control and has absolutely nothing to do with gathering insights about your app and improving it. It is pure external 3rd-party marketing.


Newspapers did just fine for centuries without tracking. The business is viable without tracking.


> They need data to produce the product

[citation needed]

There's no reason they need to use Javascript to track user behavior down to "how long have they read this article".


> NYT employs 1,300 people.

See 2 in the article. A sample:

“As Graeber observed in his essay and book, bullshit jobs tend to spawn other bullshit jobs for which the sole function is a dependence on the existence of more senior bullshit jobs:“

I work for one of the vacation rental companies.

No reason private owners couldn’t be doing the work over email. But certain fetishized models of doing, in this case cloud and web apps, get the focus.

It’s all for eyeballs and buy in at scale to justify the bullshit. “Look everyone is watching us talk up this shit! Better keep justifying it, bringing them into our flock!”

It’s turned us all into corporate sycophants. Religious conviction isn’t limited to belief in sky wizards.

Anything sufficiently magical to the layman will instill blind allegiance.

And despite all the smart people here, life as is seems magical and there’s a lot of blind buy-in.


This dovetails into an idea I had [0]. Basically, client-side scrape the web as it's used and serve people plain text and simple forms. It would have a maintained set of definitions, and potentially even logic, to put a better "front" on all this bullshit. It's like reverse ad block, where you whitelist some content instead of blacklisting it. You could argue sites will get good at fighting it, but if it were used enough by the common user, they'd just alienate them (e.g. my scrape/front for Google search makes it clear which results the app has a friendly scrape/front for).

0 - https://github.com/cretz/software-ideas/issues/82


I've often toyed with the idea of using multi-user systems over SSH running Gopher and Lynx to achieve something like this.

In the process, it would also decentralize communities and establish digital equivalents of coffee shops (i.e. places to work in public and meet strangers)--basically SDF, but deployable on Raspberry Pis with more modern userland toys (i.e. software actually designed to be multi-user on the same system).

[1] https://en.wikipedia.org/wiki/Gopher_(protocol)

[2] https://en.wikipedia.org/wiki/SDF_Public_Access_Unix_System


My main reason for client side is to skirt legal troubles that can result from running a web-filtering proxy (not whether it's legal or not, but whether you will be in legal fights). Either way, needs to be as transparent as possible and as usable by the less-tech-savvy as possible.

But that's really all it is, a web server (or an app, or an extension, or a combo) that serves you up the web looking like Craigslist. Would require strongly curated set of "fronts"/"recipes".


Sounds a bit like tedunangst's miniwebproxy[0]. I've been wondering about writing either something like it or a youtube-dl-like "article-dl" for my own use, but haven't quite been annoyed enough into doing it yet.

[0] https://www.tedunangst.com/flak/post/miniwebproxy (self-signed cert)
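
If all you want is the plain text of an article, a crude but effective stand-in for that "article-dl" idea already exists in lynx (URL below is just a placeholder):

    # dump a readable, script-free plain-text rendering of a page
    lynx -dump -nolist "https://example.com/some/article" > article.txt

It won't extract just the article body the way a readability-style tool would, but it strips the scripts, trackers and most of the bloat for free.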



I don't know how many people here read Usenet or were on old mailing lists. You could have removed some hard edges in usability and everything would have been connected to phones with monochrome text displays back in the nineties already. They didn't even provide a decent email experience.

But instead we got the technology developing through ringtone stuff advertised on TV.

I guess it's something that you can instantly show to your friends.


That's one of the wildest things! We had the technological capacity to run Unix systems with several hundred users simultaneously and access at the speed of thought with 26kbps modems back in 1992, complete with instant messaging and personal directories! What happened?!


Another wild thing like this is what you'll notice when you read up on Lisp machines. We had development environments in the 70s/80s that would seem magical today.


I’m reading the book Valley of Genius, where the Xerox PARC people basically make the same argument. The Xerox Alto’s Smalltalk environment still isn’t matched today and the PC experience is much weaker for it.

The problem with those kinds of environments (where everything is editable at runtime using highly expressive languages) is they assume everyone is a power user and there are no malevolent actors trying to mess up your machine. That’s not what the modern landscape is like.


kinda so did reasonably modern ideas. take active desktop for example! sure it’s more high level, but i believe the quote is that they wanted websites to do “cool things” with the desktop. cringeworthy by today’s standards...

things get more locked down as we develop abstractions that we have more control over


It's heartbreaking.


Shows we are not limited by technology but somehow get distracted by other things.


The "other things" are the short-termism and appealing to the lowest common denominator that go with the pursuit of profit before anything else.


In Usenet’s particular case it was its open, unmoderated nature that killed it - once it became 99% spam, warez and CP most ISPs dropped it.


There were moderated groups but IIRC they were updating more slowly because every message was reviewed.

Anyway, there was a certain barrier to entry, so there were fewer users and messages. But some really good experts posted there. And some really fun jokers.


I support your effort to make Moby-Dicks the football-field-like unit of measurement for text-focused data. It’s close enough to the 1.44 MB floppy disk to handle easy mental conversion of historical rants, and half of the people reading this have probably never held one of those. I still remember downloading a text version of a 0.9 Moby-Dick book from some FTP site and carrying it around on a floppy so I could read it on whatever computer was handy.

That aside, the most shocking part of your analysis is how inefficient the nytimes was at caching resources for your reload.


For a rather more technical comparison, 4,600 pages is more than the size of Intel's x86/64 Software Developer's Manual, which is ~3-4k pages.


"I just loaded the New York Times front page. It was 6.6mb."

   ftp -4o 1.htm https://www.nytimes.com

   du -h 1.htm

   206K
For the author, 206K somehow grew to 6.6M.

Could it have anything to do with the browser he is using?

Does it automatically load resources specified by someone other than the user, without any user input?

Above I specified www.nytimes.com. I did not specify any other sources. I got what I wanted: text/html. It came from the domain I specified. (I can use a client that does not do redirects.)

But what if I used a popular web browser to download the front page?

What would I get then? Maybe I would get more than just text, more than 206K and perhaps more from sources I did not specify.

If the user wants application/json instead of text/html, NYTimes has feeds for each section:

    curl  https://static01.nyt.com/services/json/sectionfronts/$1/index.jsonp
where $1 is the section name, e.g., "world".

The user can use the json to create the html page she wants, client-side. Or she can let a browser javascript engine use it to construct the page that someone else wants, probably constructing it in a way that benefits advertisers.
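
For example, something like this (assuming the feed is a standard JSONP response, i.e. JSON wrapped in a callback) strips the wrapper and pretty-prints the raw data, which could then be templated into whatever minimal HTML the reader wants:

    curl -s "https://static01.nyt.com/services/json/sectionfronts/world/index.jsonp" \
      | sed -e '1s/^[^(]*(//' -e '$s/);*$//' \
      | jq .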


I don’t think there is anything wrong with user agents downloading resources (like images and stylesheets) linked to by an html document. It is the providers, not the user agents, who have violated the trust of users by including unnecessary scripts, fonts, spyware, advertisements, etc.


"I don't think there is anything wrong with user agents downloading resources (like images and stylesheets) linked to by an html document."

Neither do I. For some websites, this is both necessary and appropriate.

However, in cases where the user does not want/need these resources, or where she does not trust the provider, I do not think there is anything wrong with not downloading images, stylesheets, unnecessary scripts, fonts, spyware, advertisements, etc.


My pet comparisons for everything being too big nowadays are Mario 64 (8mb!), Super Mario World (512kb!), and Super Mario (32kb!!).


I first realised how heavy these pages are when I disabled JavaScript. Things load in the blink of an eye. *Most* pages work and the web remains largely usable.


Yes, this is my experience as well; JavaScript is often the key antagonist. Unfortunately many websites require JavaScript to function.


bbc.co.uk will load just fine without JS and actually be more enjoyable (IMO) than JS version.

cnn.com fails miserably without JS.


IMHO you are confusing data with information with knowledge. And mixing mediums. You can't compare a novel - the plainest of plain-text mediums - with the online front page of a major news organization in 2018. Of course it will be interactive content; it's an entirely different medium, a different market, and a different set of user expectations and competition.

https://www.quora.com/What%E2%80%99s-the-difference-between-...

DATA: a "given" or a fact; a number or picture that represents something in the real world; the raw material in the production of information.

INFORMATION: data that has meaning in context; data that has been related; data after manipulation.

KNOWLEDGE: familiarity, awareness and understanding of someone or something, acquired through experience or learning; a concept mainly for humans, unlike data and information.


but don't want it to be hugged to death:

Incidentally, this is also another reason for keeping pages small - bandwidth costs. I remember when free hosts with quite minuscule monthly bandwidth and disk space allotments were the norm, and kept my pages on those as small as possible.


I just loaded up a nytimes[1] article too - and it only weighed in at 1.0MB. For a 1,000-word article. Subsequent reloads dropped it to ~1000KB. I don't think that's too bad, considering there are images in there as well.

Now of course, I'm running an ad blocker. I assume the remaining MB that you noticed had come from advertising sources. In which case, bloat isn't the issue, ads are.

[1] - https://www.nytimes.com/2018/07/31/us/politics/facebook-poli...


weighed in at 1.0MB. For a 1000 word article. Subsequent reloads dropped it to ~1000KB.

You really can't beat savings like that.


That'll knock dollars... no, cents... no half-cents off his ISP bill!


That distinction is nonsense. The ads are part of the page and are no more or less bloat than the rest of the useless junk that gets embedded. They're deliberately put there by the NY Times; they don't end up there by accident.


Are you also running noscript? I'm running a DNS sinkhole and still get 3mb on reloads.


Comparing the raw text of a fiction novel to the code of a website is a pretty asinine comparison, honestly.


Maybe you'd prefer comparing the code of a website to the amount of useful content on the website, which OP also did. Taking "I'm downloading 100mb worth of stuff (83 Moby-Dicks) to read 72kb worth of plaintext" at face value, we could also say that 0.072% of the data transferred is useful, or, equivalently, that 99.928% of it is crap.


You don't have to download the typography of a physical book but it still plays a huge role in the readability and enjoyment of it. So I guess the typography of websites is "crap" because it has to be downloaded?

It's a ridiculous apples-and-hammers comparison thinly veiled as an intelligent critique.


I've already downloaded everything I need for perfect typography. I can apply it to most websites with Firefox's Reader Mode. Websites cannot possibly improve on this, because the best and most legible typography is the typography you're most familiar with. I don't care about branding or image or whatever bullshit "designers" use to justify their jobs. Web fonts and CSS have negative value to me. I disable them as far as possible.


How much of the data downloaded is actually for typography?

Also, browsers have good enough typography by default, which can be controlled with CSS.


Everyone's favorite example of that kind of "brutalist" design:

http://bettermotherfuckingwebsite.com/



A++++++ would inspect elements again.


Aww! Thanks ;-)


Came looking for motherfuckingwebsite.com & find an even better site to reference now instead. Kudos!


Bad design because of the low contrast text. There used to be a superior version at https://bestmotherfucking.website/ , but it isn't loading correctly for me on Firefox.


I can put raw text on a kindle and read it. In fact, I frequently do. Comparing digital text to digital text is not apples-and-hammers.


Exactly, it’s not the NYT’s fault that plain text compresses well compared to JPEGs.

Also, is a world where the NYT is subscriber-only really preferable?


This is a textbook definition of a false dichotomy. There are other distribution models for digital news services. There are other methods for transmitting digital content. It's not an either/or situation.


Fair enough but these rants on HN about publishers never seem to contain any examples of publications that are both successful and delivering pages that weigh scarcely more than their plain text equivalent.

I think you invite the “false dichotomy” by making the comparison between plain text Moby Dick and the front page of one of the most successful newspapers in the world in 2018. I agree with many of your points but find the way you make your argument to be full of comparisons of apples and oranges while avoiding proposing any kind of solution.


> Fair enough but these rants on HN about publishers never seem to contain any examples of publications that are both successful and delivering pages that weigh scarcely more than their plain text equivalent.

Examples would be the same publications 10 or 20 years ago. They weren't exactly plain text but they were a lot lighter and the content has not improved measurably in that time.

Here's one from the 70's that's still going: https://en.wikipedia.org/wiki/Minitel


I don't care about examples from 10 or 20 years ago, I want examples from the current market.

Minitel isn't a publication? I'm familiar with Minitel, I've read a book on it, but I don't know what you're trying to insinuate by linking it here in a discussion about publishing.

Also, "still going strong?" Minitel was discontinued, in 2012. Because France has the modern internet now.


Did you read the article I posted? I definitely put forward both an acknowledgment that media firms won't change and ideas towards reducing usage where possible.


Again with your post I see many more words about Moby Dick and nostalgia for a past that no longer exists than words about a possible solution, and while your solutions are better than the typical “just move to a subscription model,” I can’t see how we would generate the political will for anything other than cheap internet. We can’t even agree to tax carbon emissions yet. How are we going to tax bandwidth and convince everyone to accept a low-bandwidth internet?

I don’t have any great ideas myself, but part of me sees the Americans with Disabilities Act as a model for getting this done.

I really don’t like AMP because of the Google Cache and the potential for Google to bias their search results page to emphasize AMP pages over similarly performing non-AMP pages, but it’s a better-thought-out attempt to fix “the Bullshit Web” than anything else I’ve seen, and it has been extremely successful in decreasing payload size for many readers of sites like the NYT and other major publishers.


Completely agreed. I could continue by comparing it to the amount of bandwidth in a 30-minute CNN broadcast, all to have a few thousand words read at me.


Broadcasts are great! You can send a lot of content out, and it’s essentially zero marginal cost per additional receiver. Plus, it’s very hard to track behavior of consumers without their explicit consent.

People are most familiar with realtime broadcast audio and video, however I could see something like newsgroups working via a broadcast medium.


I can't agree more with the points you make, I've spent a decent amount of time and effort reducing the overhead of my blog, for example - https://goose.us/thoughts/on-the-purpose-of-life/

That page includes images and "embedded" YouTube videos, but loads 454kb across 10 requests in 450-600ms for 3,187 words - if anyone has any suggestions on how to reduce that further, I'd love to hear them.

Going to bookmark that txti.es service for the future, definitely seems useful for publishing simple content without needing to fit it into a blog theme.


I'd start with the 100+KB PNG images. None of those pictures are complex, they can be compressed much more.


I put them all through ImageOptim, so not sure if those PNGs can compress anymore...

I agree on those complexity graphs since they're actually scanned from Sean Carroll's book and edited in Pixelmator - if I was better with graphics I could probably recreate them as SVG or in an editor and make it only a couple kb.

Same for the MinutePhysics screen grab, though I couldn't bring myself to attempt a poor recreation of their work and couldn't get rid of the weird pink gradient background which probably prevents better compression.

The Youtube cover images are still being pulled in from youtube - I debated downloading/compressing/hosting them myself, but figured I'd gain more from downloading them from the separate domain since I still think there is a default limit of connections the browser will open to a single domain at the same time.


That's because PNG is lossless. Use JPEG and you'll dramatically cut down the size.
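
If it helps, the conversion is a one-liner with ImageMagick, or pngquant for flat line-art where JPEG artifacts would look bad (filenames here are just placeholders):

    # lossy JPEG at quality 80 is usually indistinguishable for photos/scans
    convert complexity-graph.png -strip -quality 80 complexity-graph.jpg

    # for flat diagrams, lossy palette reduction often beats JPEG
    pngquant --quality 60-80 --output complexity-graph.min.png complexity-graph.png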


Have you tried opening the image(s) in Photoshop and outputting them at different resolutions and compression settings?


Ditch the "embedded" videos. Use a screenshot linking to the videos directly.

Embedded videos still load a ton of shit and track visitors without their consent.
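
One low-tech way to do that, assuming YouTube's public thumbnail URL scheme (VIDEO_ID is a placeholder): grab the cover image once, host it yourself, and link straight to the video.

    # fetch the video's own cover image once, then serve it as a static file
    curl -o video-cover.jpg "https://img.youtube.com/vi/VIDEO_ID/hqdefault.jpg"
    # in the page, a plain link replaces the iframe embed:
    #   <a href="https://www.youtube.com/watch?v=VIDEO_ID"><img src="/video-cover.jpg" alt="Watch on YouTube"></a>

No iframe means no third-party JS or tracking until the reader actually clicks through.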


> mb

I think you didn't mean millibits, but megabytes (MB). 1.2 mb = 1.5e-10 MB.


I must say, despite what I imagine is a good bit of traffic the instant load times on your site were a joy to behold. I never get to experience that kind of speed on the "modern" web. Even HN loads orders of magnitude slower than that.


Not my website, but please send along thanks and awareness to @thebarrytone on Twitter!


> Moby Dick is 1.2mb uncompressed in plain-text

How many times did you read Moby Dick online?


> As my father was a news and politics junkie (as well as a collector of email addresses)

I feel like there's something I'm missing here: something like getting his hands on clever usernames?


NoScript cuts the bullshit down to 1.35 MB with all scripts blocked and it's still readable. I can barely tolerate the web without it.


You want to compare plain text to a newspaper, which, even in paper form, people expect to contain pictures and advertisements.


Bad comparison, Moby Dick never tracked your activity across the web and sold your data to advertisers.


Moby Dick doesn’t have any pictures or video.

Audiovisual media has value.


Audiovisual media of value has value.

Audiovisual media that I choose to donate my bandwidth towards downloading may have value.

Audiovisual media in and of itself has no value.

Audiovisual media that automatically loads, thus slowing down the loading of everything else, has negative value.


The 100mb pays the bills for the other 72kb of text.


> to read just one article from the New York Times, I had to download the equivalent of ten copies of Moby Dick.

But how many Mona Lisas are this?


Well said.


This is both an appeal to people's universal appreciation of efficiency, and a weak denunciation of the modern web. Your argument is

1) that a website's value is the number of words on the page, and

2) that raw text is the highest value data that can be transmitted over the internet, and

3) that inefficiency and wastefulness of bits is a bad thing

First off, you need to defend your first two assumptions. Don't websites do a lot more than display text? Doesn't HTML/markup have an order of magnitude more value than raw text? And how exactly is being inefficient with something that is abundant a tautologically bad thing?


#3 is an interesting thought. So when something is abundant, inefficiency and wastefulness are fine?

When it comes to bits, abundant to who? Those who can afford said abundance? Certainly not to those with bandwidth caps and slow internet access.

Reminds me of a cartoon I saw once: "What if climate change turns out to be a hoax and we end up making the world better for nothing!"

Since when is doing something efficiently and non-wastefully not a good idea for its own sake?


  Since when is doing something efficiently and non-wastefully not a good idea for its own sake?
I agree we should make things efficient for their own sake. But that's not what people are arguing about. They are arguing that the modern web is bad, and their reasons are weak. The modern web is terrific and we should not carelessly denigrate it, which is what I'm against.

Do not denigrate the modern web in the name of efficiency when your measure for evaluating websites is wrong and you incorrectly assume that the difference between a 1mb and 10mb payload matters to the actual operation of the site and visitor satisfaction.

Re: That comic you once saw.

Yeah, what if we shut down all coal power plants and make the earth better? Oh wait, now we have destroyed our critical infrastructure and our government/nation has absolutely no leverage to even make decisions regarding the environment. If people would stop dumbing down these complex problems, it would be a good thing.


If those coal power plants are replaced with alternative forms of energy, I don't see the problem... Talking about just "shutting them down" with nothing to replace them has absolutely nothing to do with that sentiment or the original cartoon... it's not remotely a reasonable reading of what I was saying.

Not to mention, my reply is to parent comment, not to the article in general. In no way did I say I completely agree with the author of the article- that's not what my comment was about.

However, if a news article with a load of JS and ads is considered to be this "modern web" then good riddance to it. I hope it goes away.

Yes, cool things can be done with the web, and some of those cool things take a lot of bandwidth or payload to accomplish - the examples (news sites, etc.) are not these things.

What are your reasons for thinking the modern web is terrific? I agreed with that statement when I saw it, but as I think about it, I'm not so sure I do.


Since when is doing something efficiently and non-wastefully not a good idea for its own sake?

When the effort to optimize something would be better spent elsewhere.

On the web there are two important points. First, bandwidth isn't always abundant so optimising is worthwhile, and secondly optimising is effectively a negative cost if you just don't include wasteful features in the first place - building optimally is less effort than building wastefully.


1 and 2 are your own extrapolations and have nothing to do with the author's arguments. Please point out where the author makes these arguments in case I missed something.

> Don't websites do a lot more than display text?

Yes, evidently, but the author doesn't disagree with this. In fact, the entire premise of the article is that websites do a bunch of useless stuff that have nothing to do with delivering the content you are browsing them for.

> Does HTML/markup not have a magnitude more value than raw text?

As a universal rule there is of course no answer to this. There is a lot of absolutely worthless text on the web. Given the example, though, would you say that the markup is worth more than the text on a website whose primary attraction is written articles? Do you frequently visit websites just to admire their markup?

> And how exactly is being inefficient with something that is abundant a tautologically bad thing?

My time is limited. RAM is limited, CPU time is limited. My mobile data plan is limited. No one is happy with a page loading for 10 seconds for something which should take 1/10000 of that. That's not to say that this is tautologically a bad thing. The author explains why he thinks it's bad, and I think that people who value their time and resources should agree. If you think that an article loading for 10 seconds is a bad thing, it's bad that an article takes 10 seconds to load. It only becomes tautological once you apply your value system to it, if ever.


1. A website's value is the amount of information that it provides to the end user. I would argue that the entirety of the works of Shakespeare or the 1911 Encyclopaedia Britannica provides more information than a high-definition picture of Donald Trump grimacing at EU leaders or an autoplaying ad for Doritos Locos Tacos NEW AT TACO BELL. As far as supplemental uses, at the end of the day, people are using plaintext to communicate with other people. Images and videos are secondary. If that ever changes, then society is already doomed, as literacy is fundamental to the maintenance of technology.

2. Raw text is the highest value data/information return that can be transmitted over the internet. There's a reason that Morse Code and APRS are still around: they're reliable, appropriate tech, and require little to no middlemen outside of the transceivers themselves.

3. If wantonly (namely, for no enduringly good reason) increasing the amount of entropy in the universe isn't tautologically bad to you, then I really doubt that any argument would sway you to the contrary.

Concerning the relative value of HTML and CSS, yes, you could argue that UX matters in that department, but even the most bloated static HTML/CSS page is going to pale dramatically in comparison to the size of what's considered acceptable throughput today.


1. Well how is the format of plaintext the best method of getting information to the end user? What if you added a thin indexing layer on top of the plaintext? That would allow people to jump through huge documents with ease, but it's no longer plaintext. Sounds more valuable to me. Where is the line? What's the ideal?

2. Fair enough

3. Referencing "increasing the entropy in the universe" isn't a good argument because the amount of entropy increase due to humans is much, much less than how much entropy is increased by particles being blasted out of all the stars in the universe (unless I fundamentally misunderstand what entropy is). I think that stars blasting out particles is a much larger contributor to entropy than humans not using computer bits effectively.

And also what does entropy as a concept have to do with anything, anyway? Why should human engineering tasks have such considerations? If being super efficient with an abundant resource has a large cost (of some sort), but low efficiency has no business- or environmental- downside, then why be efficient with it?

In your last bit you argue that it's acceptable to send a site that is bigger than even the most bloated HTML/CSS page; I don't think that's true for any site/app that wants to be fast. It slows you down and people notice and stop using your service unless it's required of them.

I think that in general things are not as bad as you make them out to be, and your arguments have some merit but are mostly revealed as nonsense when the rubber hits the road. Universe entropy is completely unrelated to modern software engineering and web sites that have a lot of devs and are not SalesForce are not _that_ bloated.


> but low efficiency has no business- or environmental- downside, then why be efficient with it?

But it does. Data transfer and processing isn't free. It runs on electricity. You may think that the difference between 10KB (efficient) and 10MB (current web) is meaningless because resources are abundant and it lets you save a couple hours of dev time[0] - but consider that this difference is per user, and you saved a couple hours for yourself by making thousands[1] of people waste three orders of magnitude more electricity than they would if you were a bit more caring.

Like plastic trash and inefficient cars, this stuff adds up.

--

[0] - Such savings on larger pages obviously take much more work, but then this time gets amortized over all the use the website has - so the argument still holds.

[1] - Millions, for large sites.


I don't think people actually waste three orders of magnitude more electricity by loading 10MB vs 10KB - sure, that much more CPU time is used specifically on loading the extra data, but that would be a fraction of what's being used for all the other processing going on, and people don't just flip the power switch as soon as a page load finishes.


> I don't think people actually waste three orders of magnitude more electricity by loading 10MB vs 10KB (...)

Yeah, they actually waste more in this case. The base cost is that of processing of content, which is linear with size (on a category level; parsing JS may have a different constant factor than displaying an image). But in the typical makeup of a website, just how much stuff can be in the 10KB case? Content + some image + a bit of CSS and maybe a JS tracker. In the 10MB case, you have tons of JS code, a lot of which will keep running in the background. This incurs continuous CPU cost.

> and people don't just flip the power switch as soon as a page load finishes

CPUs have power-saving modes, power supplies can vary their power draw too.

Or, for those with whom such abstract terms as "wastefulness" don't resonate, let me put it in other words: if you ever wondered why your laptop and your smartphone drains its battery so quickly, this is why. All the accumulated software bloat, both web and desktop, is why.


I agree with your point, but the GP's point has validity too: the infrastructure to get that page to where it is read does draw power in proportion to the amount of data it's handling.


As far as #1 goes, I'm arguing that plaintext is the ideal to strive towards, not the living practical reality. As far as indexing and access, the Gopher protocol and Teletext are great options to look at.

As previously noted, if you don't find that waste is fundamentally wrong on a moral level, there's no point forward from here. I view myself on a planet of dwindling resources, vanishing biodiversity, and warming at increasing rates.

If you think the energy that goes into computation is free or lacking external environmental downstream effects, then at the root of it, you carelessly shit where you eat and I don't. That's a fundamental disagreement.

[1] https://en.wikipedia.org/wiki/Gopher_protocol

[2] https://en.wikipedia.org/wiki/Teletext


Well hold on, I don't disagree that we are on a planet of dwindling resources. It's the method of environmental improvement that will have the greatest positive effect that is the root source of disagreement. That's the crux of the problem - what is the process of solving these problems?

I would argue that how big our websites are is not the important factor. I submit to you that PC electricity usage is the most relevant quantity we need to discuss when it comes to consumption of bits. I argue that the increased load on the network from sending more bits is negligible compared to the many end users and their PCs that consume our data.

If we agree that PC electricity consumption is the most important thing to address, then we must ask whether or not the electricity generation process is bad for the environment. Most likely, electricity is generated by hydroelectric dams or coal/combustibles power plants. Suppose we replace those two types of power generation with low-maintenance, 50-year-lifetime solar panels (for which the tech exists). Can you still argue that the increased amount of bits sent over the wire for heavy modern websites is an environmental negative that we should address? I would say, no, at this point we have reduced the environmental impact of most electricity-consuming devices, and we can ignore PCs for the time being.

Therefore it is not the personal computer and the quantity of bits it consumes that should be your focus. It should be electricity generation.

I would like to ask you to consider whether or not your compassion-based arguments contain any resentment. Are you acting and speaking entirely on the grounds of compassion? And if so, how can you be sure that your supposed actions are going to reduce suffering of people and the planet and not have the opposite effect? How can you suggest solutions, like decreasing the weight of websites, and know with a high degree of certainty that it will produce the desired outcome (environmental preservation)? Could it have an unintended consequence?


In all honesty--you're right about there being bigger problems.

But let me put it this way: I still reduce, reuse, and recycle even though I know that one unconscientious suburban family will essentially dwarf my lifelong efforts in a year of their average living.

I know that those efforts are futile for the end goal of environmental conservation. That doesn't mean that I'm going to stop doing them. Being dedicated to acting in accordance with an understanding of first principles is not a bad thing, even if those actions are relatively impotent or ineffectual in and of themselves in the current moment.

As far as changing out power sources to nominally sustainable forms, yes, I would still find issue with people wasting those resources, just as I would find issue with people running air conditioners with the windows open.

As far as compassion and unintended consequences, everyone might be here for a reason and maybe trashing the planet is part of that plan, but equally so I might be here to speak against trashing the planet as a part of said reason and said plan.

It boils down again to if you need to find a reason to justify minimizing unnecessary energy usage, we're not going to see eye to eye and I doubt any argument will sway either of us towards the other's camp. Chalk it up to different contexts.


Gopher - I remember setting up and using that as part of an intern-like job at my local high school. Back in the days of Trumpet Winsock and other relics of the hand-crafted TCP stack. *shudders* Mind you, it does deliver text at low bandwidth. :) Minimalism in communication. I wonder if I can get my SO off Facebook and onto Gopher in the interests of the environment... *manic laughter fades into the distance*


I have given this discussion a good looking over to see if anyone cares for the environment. Glad to find someone that does.

I believe that care for the environment and design that puts being green first is going to have its time in web design. I also believe that, along with document structure, accessibility and 'don't make me think' UX, eco-friendliness is going to become a core design principle in a lot of the web. If you put this stuff first then you can have a website that is pretty close to the plaintext ideal. This can be layered on with progressive web app 'no network' functionality and other progressive enhancements, e.g. CSS then JS, with the content working without either of these add-ons.

We all know that you have to minimise your scripts and mash them all into some big ball of goo, and we all know that images that are too big aren't going to download quickly. But the focus is on 'site speed' rather than being green. In fact no developer I have ever met has mentioned 'being green' as a reason to cut down on the bloat, and existing thinking on 'going green' consists of having wind turbines rather than dinosaur farts powering the data centre. Cutting down the megabytes to go green is kind of crazy talk.

A lot of this thinking is a bit like compacting your rubbish before putting it out for the bin men. Really we would do best to skip the rigmarole of compacting the trash and just have less of it to start with, ideally with more of it re-used or put out for recycling.

We saw what cheap gasoline did to the U.S. auto industry. For decades the big three added on more and more inches to bonnets (hoods) and boots (trunks) with very big V8 engines a standard feature. Until 1973 came along there was no incentive to do otherwise. Who would have thought to have cut down on the fuel consumption?

Outside of America, in the land of the rising sun they did not have a lot of oil. Every gallon they bought had to be bought in U.S. Dollars and so those U.S. Dollars had to be earned first. Europe faced the same problem so economy was of importance in a lot of the world outside America. The four cylinder engines powering cars in Europe and Japan became vastly more efficient than U.S. V8 monster engines. Not only that but cars with a four cylinder engine did not have to weigh many metric tonnes. Nowadays the big three can only really make trucks and truck based SUVs that are protected with the Chicken Tax. Nobody in America is buying U.S. made small cars, U.S. made luxury sedans or even U.S. made 'exotic' sportscars. Economy and 'being green' is not a big deal to U.S. car buyers, nonetheless the lack of having efficiency and economy as a core part of the design ethos has led to a domestic industry that has lost to the innovators in the rest of the world that did put these things central to what they do.

We haven't had 1973 yet and the web pages of today are those hideous Cadillac things with the big fins on them and boots big enough to smuggle extended families across the border with. AMP pages are a bit like those early 'Datsun' efforts that fell short in many ways. But I think that the time of efficient web pages is coming.

The Japanese also developed The Toyota Way with things like Just in Time and a careful keeping tabs on waste. Quality circles also were part of this new world of manufacturing ethos.

The old ways of making stuff didn't really give the results the Japanese were getting but exchange rates, Japanese people willing to work for nothing and other non-sequiturs distracted people from what was going on and how the miracles were achieved. The Germans and the Japanese built great engineering 'platforms' and then got some great styling for the bodywork from the legendary Italian design studios to package it all together. Meanwhile, in the USA there were more fins, more chrome bullets on the grille and more velour in the interiors.

So with the web it isn't just the Lotus 'just add lightness' that is going to be coming along to kill the bloat. It is also ways of working. For a long time the industry has been doing design with lorem ipsum, static PDF mockups and then handing this to some developers with the client expecting not a pixel to differ from the mockups, regardless of whether any of it made any sense. So we have got stuck with the same carousels on the same homepages - the tailfins of the web.

Although it is 'industry standard' to work certain ways, e.g. the big up front design by someone who can't really read, the project manager who can't do HTML, the agile design process that means nobody knows what they are doing, something has to change. Content driven, iterative improvements and much else we forgot from the 'Toyota Way' will ultimately win out with things like being green actually being important.

As for the article, the 'bulls4it web' and David Graeber's ideas as applied to web bloat is an excellent contribution to what web developers should be thinking about.


I think the car analogy is flawed; yes, mid-century US cars were not very economical/efficient, but they sacrificed that for comfort --- big roomy interiors, cushy seats, soft suspensions, automatic transmissions, A/C, etc. The automakers were simply obliging to please customers with these features. There's a reason "econobox" is mostly a pejorative.

On the other hand, bloated slow websites only serve the needs of their authors, while annoying all their users. Users aren't asking for more tracking, ads, or any of that other bullshit.

(Full disclosure: I'm a big fan of vintage "Detroit Iron". You really have to ride in one to understand the experience.)


> On the other hand, bloated slow websites only serve the needs of their authors, while annoying all their users. Users aren't asking for more tracking, ads, or any of that other bullshit.

The bloat actually serves a lot of people, directly or indirectly. It is like packaging: everyone complains about packaging and plastic, but when you are in the supermarket do you take the box that is already opened or the tin with the dent?

There are lots of stakeholders behind the bloat, including the bullshit-jobs people of the online world, e.g. the SEO people, the people in marketing and the programmers. In my opinion the cookie-cutter way of churning out websites is being done by a lot of people who are barking up the wrong tree on how to do it, with knowledge of modern web technologies rarely gained. Buried in what you see is a bundle of reset scripts, IE6 polyfills and other stuff that nobody dares to touch as it has been there since 2009 and nobody knows what it does, whether within the company that wrote the CMS or in the agency that adds the 'theme'. It is worrying, really, when the best people can do is add layers of ever more complex 'build' tools to mash this cruft into something they don't have to think about.

P.S. There is no way I would be in an econobox if travelling through the American West, give me one of your trucks, SUVs or even a sedan any day. In Europe though, tables are turned, a country lane or a city stranded in a U.S. vehicle would be a special kind of hell.


Awesome take on the situation.


In response to #2, there is also a reason that photojournalism exists--unless you believe in a world where everyone imagines what important figures and historic events look like based solely on textual descriptions. This is completely ignoring the fact that neither I nor most of society would ever attempt to receive the day's news via Morse code.


Hint: TCP/IP & UTF-8 are fundamentally along the same principles as APRS. So yes, really you are.

As far as photojournalism goes, I can think of thousands of reasons why the predominance of photojournalism has been pernicious to civil society. The strategic use of the identifiable victim effect in atrocity propaganda being the most obvious.

However, with that said, I'm not arguing that visual media is entirely unnecessary, but rather against the idea that every user needs to download a 1680x1050 image when a default 600px width image or even the horror of having the image as an external link would be more than adequate for the overwhelming majority of users, particularly those in rural and developing areas that can't afford to waste their total allotment of monthly mobile data on "What Chance the Rapper’s Purchase of Chicagoist Means".

[1] https://en.wikipedia.org/wiki/Atrocity_propaganda

[2] https://en.wikipedia.org/wiki/Identifiable_victim_effect

[3] https://www.nytimes.com/2018/07/31/opinion/culture/chance-th...
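
On the image-size point above, generating a sane web-sized derivative at publish time is a one-liner with ImageMagick (filenames are placeholders):

    # downscale to 600px wide, keep aspect ratio, recompress
    convert hero-1680.jpg -resize 600x -quality 80 hero-600.jpg

Serve that by default and link to the original for anyone who actually wants the full-resolution file.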


Your math does not add up. It is 1 * (6.6 + 3 * 5.9) + 3 * (5 + 3 * 5.9) == 92.4. Unless you are speaking about approximations obviously.


Unpopular opinion alert:

Maybe the "bullshit" is only bullshit to you, the thorny tech-savvy reader. Maybe businesses have tried the plaintext approach, and their business was improved by adding fonts, stylesheets, API calls, spinners, scripts, high-res images, and god knows what else. Maybe speed improvements are not important beyond a certain point. Maybe 5MB doesn't matter to most people. Maybe micro-optimization is costly in large organizations.

Maybe other people making these decisions aren't idiots, and maybe, just maybe, they're even thornier and tech-savvier than you.


It's bullshit all right. Thousands of CPU hours, megabytes and dollars wasted on some user-spying turdheap the marketing guys NEEDED to put in an app, 'to track user behaviour'. What a euphemism. Without spilling too many beans, we're using a solution that records an actual movie of the user using our app, each time a session is started.

In six months, the only thing we learned: half of the users that finish setting up don't continue to use the app. We could have learned that from server side logs without invading user privacy.

Another story: four employees of a large corporate running targeted marketing based on the behaviour of customers logged in to their website.

Added revenue versus control: 60.000 yearly. Sub-0.1% conversion rates. The corp had been running these campaigns for at least five years.

My personal stories. Anecdata, I know.

There is SO MUCH BULLSHIT.

Edit: to clarify the first example, it's not a movie using the front facing camera, 'just' a screen grab


>Thousands of cpu hours, megabytes and dollars wasted on some user spying turdheap the marketing guys NEEDED to put in an app, 'to track user behaviour'.

I understand that somewhere in a perfect universe, tech guys just make things right and money just flows in without those pesky marketing people ever being involved. Unfortunately, in our universe, if you don't do marketing and don't analyze user behavior your business is toast.


In the pre-monetized web, yes the tech guys made things right without money extracted from the web.

Standards, and many of the applications that implemented them (including commercial ones), were designed with the aim of making online computer use possible and beneficial. This is the domain of academics, committees, and concerned individuals creating the means of useful computing & communication, outside of the parasitic desperation to inject unnecessary rent-seeking lock-ins into everybody's lives.


Swell history lesson, but I don't care about academics and neither do the vast majority of internet users. They can use the internet too, of course, but there's no point moaning about web pages getting larger. I was using the internet in the 90's and it sucked - literally everything about the internet is better today. My life would be no different whether a web page was 5 megs or 5k, just like it makes no difference whether Moby Dick is 100k or 100 megs. It's still getting downloaded on my phone, getting copied onto my Kindle, etc.


> I don't care about academics [...] literally everything about the internet is better today.

The average message posted to comp.lang.c before Eternal September was far better than the average message posted to HN, Proggit, etc. But many of the former were written by academics, so you probably don't care.


My post has nothing to do with web page size. This is specifically on how monetization has affected that which we use for personal communication and seeking information.


> In the pre-monetized web, yes the tech guys made things right without money extracted from the web.

It was extracted from tuition paid by undergraduate students and taxes paid by taxpayers.

And that money all eventually came from dirty capitalism.


It's not some nebulous evil of "capitalism" that is the issue here; it is the filtering, manipulation, and distribution of people's private communications & data, and of the effective "town square". Money can be and has been made without these practices, which the law has not caught up with and about which there is strong ethical disagreement.

And yes, it cost a lot of money to get online and/or host data back then, which was subsidized by piggybacking corporate and university internet presence. Nowadays it's dirt cheap to personally host & access tons of content, so there's even less need to monetize.


Sure, if your content sucks. If you need all that bullshit to stay profitable then perhaps the business isn't really needed at all. "but that is the industry competition for you, everyone does it."

To that, all I can say is perhaps some regulation is required to level the playing field.


Strictly speaking, most individual business isn't needed.

If the NY Times folds tomorrow due to bankruptcy, there are dozens of other papers where we can get the news. Their reporting is good, and I choose them over their competitors but losing them wouldn't be the end of news as we know it.

That's the case with most businesses---the marketing and sales are only needed to compete, whereas without them the product would still exist for consumers.


Your choice of journalism as an example is unsettling. If the New York Times goes out of business, there is no guarantee that another news publisher will break the same stories. Some things will just not be investigated anymore and some important stories will just not be told.


But doesn't that hold true regardless? The presence of the New York Times means there are certain stories not being investigated or told. And we aren't even fully aware. We have no idea if those stories would be of more worth to us or not.


Can you elaborate more? I was working with the model that more investigative journalism is better, but you seem to be suggesting that the presence of some news outlets inhibits others.


Just in a basic sense. The NYT employs X people, sells to Y people, etc. Those are people who won't be employed by someone else, people who won't buy another paper.

There's a sort of critical mass of "news" that can be made. We can't all be investigative journalists.

So whatever that would be here instead of the NYT might be different, it might not be worse. It could be just as good, just different.

Of course, it could be worse. It could be better. We can't know.


I see your point. If you make a great product, users should somehow learn about it in order for your business model to work, because no users means no money. You can assume that people will just get so excited about your product that they will tell other people about it, share links, post social stuff, so that the growth happens organically without all that "marketing bullshit". And it does happen organically. The problem is that even for very good products such organic growth is way too slow, because it has an exponential nature (very slow in the beginning, accelerating quickly only at the end), so you run out of money before hitting profitability. There are outstanding examples of products that quickly went viral, but these are extremely rare cases. Hoping for this luck is like hoping to accidentally build the next Twitter. In other words, you shouldn't count on it if you're serious about your business. You need something that propagates awareness of your product in a more manageable way than pure luck. It's called marketing.


You don't want regulation to level the field, you want to set up a barrier to entry for smaller players. And I've thought of this many times, regarding all the terrible news outlets out there.

But you'd have to gut 1A to do it, and it would instantly be used for evil. Nope, just hell no.


I mean regulations in terms of specifically protecting ourselves with a digital bill of rights. No one gets to use telematics, no one gets to track users, etc.


How do you know if the client has an error and the content was not loaded/displayed as expected? How do you know if placing the content on the right side or the left side increases conversion rate by 10% if you don't do A/B testing? There are huge business advantages when you use analytics. You don't care about creepily "tracking" a specific person, you just care about getting better conversion rates and happier users.


You may not care about tracking individual people but in order to realize your business advantages you end up doing it anyway. They always put their business needs above those of the users. Increasing conversion rates always trumps user privacy and peace of mind. Claiming businesses care about happy users is just dishonest. They care about bottom line.

Business just shouldn't be able to freely collect mountains of information. They can't be trusted to do it responsibly. Collecting data is a privilege and it can be revoked. If they insist on doing it, we'll find a way to avoid sending the data. If they lock us out for not sending data, we'll send fake data.


Overall I think that this type of metric has had a significant detrimental effect on the quality of news items. I understand that there is a business advantage to knowing things like this, but I think there would be much less rage porn and clickbait around if this information weren't known to an entity that can use it to forgo relevant reporting on things that matter and instead manipulate our tendencies towards morbid curiosity and outrage.

I'd go as far as to say that it isn't only unethical to collect this information; it's unethical not to take active measures to avoid sharing it.


They're not idiots, they just don't give a crap about waste until it affects their bottom line. (Most humans seem to operate on this principle)

A lot of the business-related bullshit is a response to advertising and sales. But a lot is also lazy tech people slapping pieces together until it does a thing, and they don't care that it loads 100x more data than it needs to, because again, doesn't affect their bottom line.

Traffic costs must be pretty low, or nobody's watching that line on the AWS budget, or the CDN is eating the cost.


Why would they care? They have to do the work of five in the time of one because there are now off-the-shelf solutions for everything. Too bad if those solutions often come bloated and incorrectly configured; maybe they should hire more engineers. Some businesses prioritize loading speeds, but big ones with brand reputation won't care - they have little to lose.


Well then, shall we kick up more of a shit storm about it to give them a little more motivation?

If the argument is "nobody will do anything about it because nobody cares," isn't the best response "then start caring!"?

I'll see if I can find some old articles I used to love about how a group called something along the lines of "Concerned Christians of America" basically determined what could be on television for ages because they maintained a letter-writing force of a mere five thousand people. Nobody else was kicking up that level of shit storm, so the networks listened to them.


> they have little to lose.

Then let's make them lose. The more widespread bullshit blockers become, the more money they will lose.


In a world where video over IP exists, market rate traffic expenses have got to be among the least of the concerns of print journalism.


> Maybe businesses have tried the plaintext approach, and their business was improved by adding fonts, stylesheets, API calls, spinners, scripts, high-res images, and god knows what else.

All i can say is that I work at a small-ish multi-million dollar business that got bought by a very large multi-billion dollar business. Our site has become very, very bloated due to everything that our parent corp wants in the name of 'making things better'. I'm not saying we are stagnant, but those changes really haven't increased our bottom line in any significant way. We now spend more time and dev resources supporting a site with much more 'bullshit'. Real business decisions, ones that benefited the customer, are where real increases in value were seen.

This of course is anecdotal, but I've never really heard the opposite.


There are many logical reasons why various websites are so bloated. Certainly the market realities and web development culture play a role.

That doesn't change the fact that the end result is bullshit, though. The tragedy of the commons is still a tragedy.

I don't have a solution for commercial news organisations. If they can't completely subsidise their web presence through other means, it's pretty much guaranteed that the advertising pressures will turn it into crap.


I don't understand how you are relating this to the tragedy of the commons, other than saying that something having a reason behind it doesn't preclude it from being bullshit.


Everyone selfishly optimizing their bottom line ("let's earn a bit more by including extra JS analytics bullshit", "let's save some time on our end by making a million people waste a little bit of electricity every pageview") - each initially reaping some short-term benefits, but together ultimately turning the web into the shitshow it is today.

That's textbook tragedy of the commons.


People’s/companies’ own websites aren’t commons. The protocols that make up the web itself haven’t degraded through use; this is no more a tragedy of the commons than if every shop in your town made you wait a few seconds before you could go inside.


Temporal literally said that the internet is the "commons" and your counter is that companies and protocols aren't the commons.

You're just being purposefully obtuse, this is not hard to understand, assuming ones paycheck doesn't depend on not understanding it.


There’s no need to be so rude. I’m not being obtuse; I understand their point, I just disagree with it.

The internet itself is not degraded by some major news sites being slow. Their misuse of their part of the internet doesn’t affect it overall, thus it is not tragedy of the commons. Most of the websites I use on a day to day basis (GitHub, Stack Exchange, HN, most smaller blogs that I find through HN, etc.) do not have this problem. The experience of those sites isn’t made worse by other companies’ sites being bad so, again, it’s not a tragedy of the commons.


The internet as experienced by the average human is a poorer experience because of the advertising and spyware/tracking.

This is almost universally true, even if small areas have been fenced off. The fact that you can limit your use of the internet to HN and a few other sites is in fact a luxury.


I think his point is that the protocols are the internet.

I mean, the "internet" is just the word we give to the interconnected network of computers spanning the globe. It doesn't exist as a thing per se, but as an idea. The internet can never be slow. Your connection to a certain site can be slow. Whether that site is a particular web page or the internet service provider's router that provides you with access to the greater network.


In this case the internet is the sum of the online services people interact with.

Most websites are using advertising and spyware-like tracking, to the point that it can be said that interacting with the internet will be a poor experience without an ad blocker. There will always be the odd oasis website, but the internet has been completely swallowed by spyvertising.


No, but the frameworks they use and services they outsource tasks to definitely are commons. Even more so are design and development trends.


> Even more so are design and development trends.

No, they’re completely not. Trends are not a shared resource such that a few people misusing it harms that resource for everyone. Something being common doesn’t make it a commons. Any website can choose not to do these things - just look at HN. They’re not a necessarily shared resource in the way that all Baltic countries must fish from the same sea.


The internet is a shared, unregulated resource which is spoiled by individual actors selfishly following their interest of making money through ads.

It is rational for each commercial website to use advertising, and tracking and other crap. As the parent said, those people aren't idiots.

But in the end they are still ruining the internet for all of us.


Sounds like you work in the publishing business (and also that GP has tripped you up somehow). Care to elaborate what the rationale for publishers to bloat their web sites could possibly be? Because I know I've come to use news aggregators like HN as portal sites (plus RSS) rather than go to news sites directly for many years now, precisely because I want to know beforehand if a particular article is even worth visiting, rather than receive a crapton of script for nothing valuable in exchange.


I used to work at the FT, and we tested a number of hypotheses:

1) faster page loads mean better engagement
2) everyone hates popups
3) everyone hates autoplay

turns out that faster page loads _do_ make users happy - normal, non-tech-savvy users. (source: http://engineroom.ft.com/2016/04/04/a-faster-ft-com/)

I don't think the popup experiment has a write up, but people _hate_ popups, even more when they are not relevant.

I was part of the autoplay experiment, and whilst a few people really like it (it saves them clicking), the vast majority hate it with a passion.

Autoplay is a thing because it boosts ad impressions by 25-80%.

So for ad based revenue, yes, I can see why people do this. For subscribing sites, it kills your bottom line. So empirically they either rely too much on ads, or are idiots who don't have evidence to base decisions on.


No, it's herd mentality. Because it's what everyone else is doing, it's the standard, it's what you do; to be an outlier in this is to be deluged with claims of how archaic you are and then swarmed in marketing jargon that breaks down page-element interactions to an atomic level.

Meanwhile, users ditch the bloated web pages for apps or, if browsing from a laptop or desktop computer, for a curated view delivered by Facebook or Twitter or another service in the long tail (e.g., Tumblr, Pinterest, etc.). The effect may not even be plainly noticeable but it does work on a psychological level. All the cruft and gunk dissuades repeat visits and any desire to return to browse that internet space.

Some online spaces, mostly in the news vein, are even overtly abominable. My local news outlet (azcentral.com) is unbearable to load, with the autoplay video, the barrage of ads, both popovers and the hard elements that take up display space, leaving just a tiny portion of the screen real estate for the actual content a reader might be interested in.


The tons (and growing number) of normal, everyday users using ad blockers and other annoyance-reducing web plugins are evidence against this.

Also, we are not talking about the expense of “micro-optimization.” We are talking about not deliberately spending the time to add all this BS into the software. I guarantee it costs less to publish a text/plain document than an HTML/CSS/JavaScript trash fire.


By the same token, maybe the automobile and oil industry really care a lot about the environment, but the only way to produce a safe, affordable and performing car is their tech savvy way refined for a century.

Or maybe they just don't give a shit, and the current status quo allows them to make money without caring about the consequences, until we as an informed community demand a better way.


I don't think you're wrong, but you may be overestimating how quickly they converge on an optimal balance, and what fraction of the bloat is necessary for business purposes. I frequently see sites that slow down my browser and mobile sites that become near unusable, and I doubt that is all necessary for making it pretty and having the right analytics. I've also frequently reviewed the work of front-end devs who just didn't see the easy optimizations they could make.

The bloat persists until there's a clear relationship between some unit of bloat and some concrete loss, but that might not be easy to see, because you don't see the people who long since gave up on your site.

Edit: One illustrative example is disability access accommodations. A lot of them are a net loss to the business, but some of them were obvious improvements no one had thought to do: "Oh, crap, it really does help to have a lip on the curb so you can roll stuff up!"


I'm sure plenty of sites improve their bottom line by implementing dark UX patterns as well, it doesn't make it any less bullshit.


It's still bullshit, because it doesn't help the visitor, and it only helps the company to the extent that they're in a Red Queen's Race—if no one else did it, they wouldn't have to either.


I'm sure I am more consciously aware of the bloat but I do think that subconsciously "non-techy" people still dislike it.

I feel like this is similar to the Apple vs Microsoft/IBM debates. I'm not even sure what the popular take on it is these days, but I do believe that to some extent a pleasing user experience was/is championed by Apple (in a specific aesthetic sense). And if Apple never existed we would have fewer tech products with that user-experience philosophy embedded in them. I believe that elusive "pleasing" user experience Apple seemed to try to build into its products is similar to the idea of not having as much bloat on websites.

There is business value of consuming every single morsel of user tracking data possible but that tracking also has some cost in terms of user experience. Just because companies have tended towards bloat doesn't mean that they are definitely right. Apple has made a great deal of money and I believe some of that is due to their style of user experience.


> Maybe 5MB doesn't matter to most people.

You sure won't see those in your analytics dashboard.


You just used “maybe” eight times to make an argument without having to support it in the slightest. Maybe you’re right, or maybe you’re just bristling, without a leg to stand on, at an argument that was made well and strongly supported. Maybe if you bothered to support your claims, or even make them real claims instead of “maybes”, it would be easier to take your non-arguments seriously.

Of course it would also be easier to take you seriously if ad blockers and script managers and cookie sweepers weren’t so incredibly popular, along with VPNs. Every site begging people for money or to stop using an ad-blocker is a nail in your “point’s” coffin.


The fact that people want bullshit or that bullshit sells does not change the fact that it is bullshit.


I don't know, as much as I respect you looking out for the "engineer as savior" bias and the fact that average users may have a distinctly different viewpoint than we do, in this case it seems somewhat disingenuous.

People have spoken time and again about needless auto-playing video clips. About weird, busted scroll hijacks and massive header images. About newsletter beggars which take up the whole screen on every. single. site. About ubiquitous "got it!" cookie disclaimers. About pages that load mountains of extraneous crap while you're trying to read, and reflows the text halfway through the second paragraph, suspiciously at the exact moment you try to tap a link which makes you click an ad instead, which is of course bloated with poison dogshit and offal and immediately crashes the tab. About how every single major site bullies you into downloading an app instead of putting effort into making their mobile site run properly. Instead the mobile site is intentionally crippled to incentivize the adtech-infected app even more, and even the request desktop mode is broken. And of course the mobile app is only a thinly-veiled plot to scrape your phone's guts out like a dead fish and gain every permission possible. If it can run in a browser, I think it should run in a browser. I'm highly suspicious of any app that is really just a website in a tarted-up trojan horse.

Adblockers are perpetually on the rise, as the article says, and other content blockers and filters are also in play now. I have to use outline.com to read most news articles, which is an amazing resource. But I think this popular opinion is warranted. The internet is bloated with malicious bullshit. Everyone got in an arms race with each other and had to do it, despite nobody actually wanting it.

I don't think the devs and designers who made this stuff so ubiquitous are evil, nor are they stupid. They were just caught in a catch-22 as web trends do their fickle thing. They are regular people earnestly doing their best, just like everyone else. Doomed to create camel after camel via feedback from ignorant clients, misread focus groups, neurotic middle management, et cetera. People rarely get to decide or unilaterally invent any of these large-scale trends, they just happen as a herd phenomenon I think. And everyone has to jump on board to remain viable, or at least that's the illusion at the time.


You make a good point: the people who have ruined the Web are doing it because it makes them a profit. That's the whole point of ethics & morality, though: refraining from what benefits us immediately because it's wrong. And just as it's wrong to kill the old lady next door to steal her house, it's wrong to turn a product-listing site into a JavaScript-laden monstrosity so terrible my phone becomes painfully hot to the touch.


I know I'm an idealist too, and my comment will be a display of hypocrisy: it seems that one of the rules for using the Internet is that you should completely forget about the is-ought problem.

People will dole out all kinds of advice about technology, business, politics, society, relationships etc. by way of discussing something that only partially resembles reality (and assuming everyone or most people perceive things in the same way as themselves)

(I'm guilty of that too.)


Or maybe it's really easy to have all the individual decisions make sense, but still arrive at a really insensible overall decision. Like a local maximum.

It's easy to be smart, make smart decisions and end up somewhere dumb.


> Maybe other people making these decisions aren't idiots

Nice joke, I liked it! :)


"[B]usiness was improved" is doing a lot of work in there.


We have a ridiculously-backwards model on the web where you essentially pay for what you use (via your data plan, and via forced ads prior to promised content) without having any way to know in advance what it will end up costing you to display content. Heck, you don’t even know if the content will display correctly after all that loading. Worse, there are many ways to trigger loads accidentally, meaning you may want none of the content but you end up paying for it through your data plan.

We desperately need absolute maximums enforceable in the browser, reversing the firehose. I want to opt your site in to more data use, after I trust your site. And I expect sites to work within my limit or not receive visits.


So while you are fiercely criticizing sites which apparently understand and leverage the reality of zero-marginal cost digital data, you assume that data caps are a necessity in a digital world.

Workers don't shovel extra connectivity into the towers when you go over your data cap. So why are you focusing on data flowing over the nearly-always-on connection you have through your data plan, rather than the bullshit terms you agreed to when you signed up for a data-capped plan in the first place?

Edit: remove unnecessary word, clarification


Data caps are like your city bus. You can buy a bulk number of rides per month at a discount, and everything over that is billed at a regular rate.

The bus doesn't magically add more seats when you make extra trips on it, but it still costs more to take the bus three times a day than twice a week.

Even though the marginal cost of your body on the bus is minimal, having the fee serves to reduce crowding (demand at that price point) as well as fund the infrastructure. Data caps are the same.

Now, you can argue that data caps are surrounded by deliberately misleading marketing and that overage charges are unnecessarily steep. But you seem fundamentally opposed to the concept of data caps.


Both of the public transit systems I use on a regular basis sell me monthly unlimited-use passes.


That’s nothing more than marketing.

Do you ride more because it’s unlimited? When you get to your stop for your office or dinner or whatever, do you go back home and then back again before getting off? Do you go all the way to the end of the line to minimize your cost per mile, then walk back to your actual destination?

Of course not, you use it as efficiently as possible to do the thing you used public transit to actually _get to_.

Do you consider the money you spend for your unlimited pass as rent for housing or office space? Do you just stay on the bus all day, working on your laptop and holding meetings?

Unlimited transit passes are similar to unlimited vacation plans: people normally need and use much less than 100% of the resource offered. People also don’t normally push 100% bandwidth 24/7 through their unlimited plans.

If everyone decides to use their unlimited transit pass in this way, they will go away or increase in price accordingly.


> Of course not, you use it as efficiently as possible to do the thing you used public transit to actually _get to_.

The comment I replied to claimed transit systems don't sell unlimited-ride passes. I was pointing out that some of them do. What you're arguing is they wouldn't have the capacity to handle it if every person bought such a pass and immediately ceased all activities other than riding transit 24/7, which may well be true, but is also a non sequitur.

Though since you ask, it would be a bit difficult for me to "maximize" use of my Caltrain pass in the way you're suggesting, since going "all the way to the end of the line to minimize your cost per mile, then walk back to your actual destination" is not realistic -- the line is ~80 miles long.


I did not claim that, please read more carefully.


I definitely use public transportation more often instead of walking or riding my bicycle than I would if I had to pay per ride.


That wasn’t my point though, you still only use it as much as you need it.


That is true. Point taken.


> Data caps are like your city bus.

Hi fwip,

I don't have any knowledge about the economics or logistics of city bus lines.

I do understand something about Transfer Control Protocol, Javascript's event loop, and soft realtime performance in Linux.

Can you make an analogy between data caps and any of those concepts so I can understand what you're getting at?


The terms are not bullshit. The network does not have the capacity to service everyone at their maximum last-mile data rate, all the time. Bandwidth is a large, but finite resource. There are several ways of addressing this:

-You can divide the total capacity up equally and give everyone dedicated connections that aren't very fat. Nobody likes this, and it's very inefficient because most people aren't using their full bandwidth most of the time.

-You can charge by the byte, on top of the base fee, so that using the system more costs more. It's fair - pricing directly reflects resources consumed - but nobody likes the idea of being nickel-and-dimed whenever they click a link.

-You can do the above, but include some large chunk of data (which is quite cheap) into the base fee, that most people won't go over. Call it a "cap". This works well.

-You can promise "unlimited" data, but start sending nasty letters and/or slowing the connection to a crawl when they go over some prescribed "fair use" amount. This is a data cap in all but name.

Basically, all-you-can-eat stuff is always sold on the basis that you will consume some reasonable amount, because no resource is infinite.


Browsers should start offering the option to deny all cross origin resources too.


It would be easy to make an extension to do that but I suspect most websites would no longer work correctly. Many websites have their own API server on a different domain, many host images on another domain, most use a CDN of some kind which hosts CSS, JS and sometimes HTML on another domain, etc.


Even a simple limit would probably help a lot, e.g. “at most one alternate domain” (your main CDN, and not sketchy-analytics.com or unnecessary-ad-malware.net).
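
For what it's worth, a site can already commit to something like this on its own end with a Content-Security-Policy header; a single line along these lines (CDN host hypothetical) restricts every resource to the page's own origin plus one CDN:

  Content-Security-Policy: default-src 'self' https://cdn.example.com

What's missing is the user-side equivalent the parent comments are asking for - a limit the browser enforces no matter what the site declares.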


Just use one of the many ad/tracking blockers. Websites will still work. Blocks sketchy-analytics.com. Why make this more complex than it needs to be?


Because when I go to use a website it's an interaction between me and them; I never want any third-party involvement. Allow requests to subdomains but nothing outside that.


Use noscript then.


Me using noscript doesn't make a dent in the usage patterns of the public at large and it doesn't inform others that there is a better way to browse.

Without any decent weight behind it, there would be no incentive for people to build their software to actually respect the user and they would continue serving a broken application.

Much like browsers slowly boiling site owners by displaying sites as insecure if they don't have any encryption, a similar effort could be made to stop websites handing visitors around like the town bicycle.


Yes, “shaming” would be a relatively easy improvement too. Browsers should feel free to display big, red, scary-looking logos like “this page load has consumed an unusually-large portion of data/battery/whatever”.


uMatrix works on all browsers that matter.


Out of the box on Android and iOS, no, it does not. And that's a (the?) cash cow for online advertising.


I'll admit the mobile version of uMatrix on FF could work a lot better, especially considering I have to disable and re-enable it to get it working at times. That doesn't diminish the fact that I like having the option to do so. Most of the time, if I don't want to spend too much time configuring a website, I just open it in Chrome.


I use a VPN with a hosts file that is composed of EasyList etc for my phone. It saves me money and battery.
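
(For anyone unfamiliar with the approach: such a hosts-file blocklist is just a long list of lines like the following - domains hypothetical, echoing the made-up names upthread - so requests to ad and tracker hosts fail fast:)

  # 0.0.0.0 resolves these names to an unusable address
  0.0.0.0 sketchy-analytics.com
  0.0.0.0 unnecessary-ad-malware.net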


Same here, but is performance of the server OK? Because I've had bad performance issues with ~1k lines in hosts(5).

I've thus written a script that I run daily for my local unbound(8) daemon - or bind: https://gitlab.com/moviuro/moviuro.bin/blob/master/lie-to-me
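
In unbound terms a blocklist like that boils down to local-zone entries along these lines (hypothetical domains; not necessarily the exact form that script emits):

  server:
    # answer queries for these names with REFUSED instead of resolving them
    local-zone: "sketchy-analytics.com" refuse
    local-zone: "unnecessary-ad-malware.net" refuse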


I don't notice any speed differences. I'm loading stuff via 4g mostly so I think the network speed is the main bottleneck. I'm using etherVPN on a tiny arm server in the same continent as me.


It's not just publishers. My current org uses SalesForce and it's frustratingly slow. Opening a single record takes several seconds as every single interface element is generated dynamically and then populated, seemingly one at a time.

When it's finally done loading, you click on a dropdown, and then the dropdown just shows you a loading spinner, as your client asks the server what should be populated in the dropdown menu. And of course it takes another second. Clicking virtually anything, including back (which should be instantaneous on a proper webapp), will present you with yet another loading spinner and a multi-second delay.

I understand the benefits of building webapps this way, but the benefits primarily accrue to the developers of the app and not to the customer.


Salesforce's problem is that its presentation layer is a giant pile of legacy code, which can't be fundamentally changed, because there are a gazillion extensions and customisations that rely on it working the way it currently does. Having worked a bit with it, there's absolutely nothing about it that "benefits" developers, any more than having to maintain a VB6 app "benefits" developers.

Done right, a modern web app should be better for both developers and users. The one benefit of the crazy shit described in the article is that if you don't weigh down your page with a multiple MB of ad-network and analytics scripts, then it can be incredibly fast.


> Done right, a modern web app should be better for both developers and users.

Right, and you also don't need to make a text-based site an SPA or use any frontend framework. The web was made for documents, so no need for those kinds of sites to be apps.


Salesforce uses a large quantity of DNS indirection, more than even the large CDNs. I measure the amount of lookups that sites and apps require, and the delay it causes. Most sites on the www only require two lookups.

This is perhaps an example of "... the benefits primarily accrue to the developers of the app and not to the customer."


SalesForce slowness does not equate to "all customers of web apps are getting a bad deal".

Go look at reelgood.com. Imagine if Salesforce worked like that. It should be clear as day that this is a SalesForce problem, not a modern-web problem.


>Opening a single record takes several seconds

that sounds more like a database delay rather than loading the interface


What I think is difficult is that most people think a website should be an experience.

The client wants this, the designer wants this, the marketeer wants this and even most users want this.

So that's how huge headers with high-res photos are born.

After that, the site must be online ASAP and the developer doesn't have or take the time to load images responsively.

Combine this with a framework that takes 200ms to init and we are where we are now.

(And then ofcourse there is the marketeer telling you to include a script from x,y and z.)

With the right tools you can build a fast web, but I think most developers are inexperienced, lazy, or just don't care. (You don't need to include 0.5MB of FontAwesome when you only use 2 icons...)

And yeah: AMP is a joke.


> What I think is difficult is that most people think a website should be an experience.

People on the delivering end. Not the users. The users just want to get the content they came for, and to continue with their lives. This "website as experience" thing is a combination of vanity and trying to monetize users better by playing on their emotions.


I wish that were true, but is it really? I think of Idiocracy, and how prophetic it was. I fear that most users do want 'Ow, My Balls' and brightly-coloured, moving objects to distract them from the ennui of life.


Which designer wants an unmovable header that takes 1/3rd of the screen and makes me feel like I'm looking at it through blinds, as I typically see on mobile?


AMP is terrible. I flat out block it in my hosts file and haven't missed it yet. It's pretty much just used for advertising anyway, or for 'news' articles that are thinly disguised ads.


> framework that takes 200ms to init

Which framework(s) take 200ms to init?


I am increasingly encountering news sites that detect ad-blocking software and (understandably) refuse to show me their content as a result. But the problem is that I enabled ad-blocking on those sites to begin with because they were loading nasty JavaScript ads on the fly that pegged my CPU!

As a web dev, I feel extremely conscious about what I'd call "javascript library hygiene", and I feel that whoever's in charge of many of the news sites out there, just does not give a shit.

Respect my computer's resources and you'll get your ads re-enabled.


I've noticed that a lot of sites will end up using 100% CPU usage on their Chrome tab. I haven't investigated, but I do wonder what sort of faulty design leads to this. You occasionally hear talk of sites using your CPU to mine cryptocurrency, but I'm more inclined to suspect lousy programming.

I sometimes wish I had an easy way in Chrome to restrict a tab to, say, 5% CPU, for the cases where the CPU usage is clearly not adding any value. Just so I can get through an article without the fan ramping up.


  renice 19 ?


It's my understanding that renice doesn't limit CPU usage, but rather adjusts the process's priority in the scheduler relative to other processes. So I don't think it would stop a process from eating up all available idle cycles, draining the battery, and ramping up the fan.


Also, I don't think there's a trivial way to get the PID of the process handling a particular tab.


In Chrome, Task Manager shows PID along with CPU & memory usage.
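
Putting the two together (a rough sketch; cpulimit is a separate utility you'd have to install, and the PID here is made up): once Chrome's Task Manager gives you the renderer PID for the offending tab, you can cap it from a shell:

  # hold the tab's renderer process to roughly 5% of one core
  cpulimit -p 12345 -l 5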


I've noticed that trend also. Those are websites I keep a mental note to not visit. They usually have bottom of the barrel ads anyways and shitty content.


Sorry but this piece comes across as entitled and whiny. It's easy to point out how bloated and terrible most modern large sites are and guffaw in disgust at the counts of xhttp requests and scripts that are loaded in order to provide no user benefit.

But just moaning about it probably won't make the problem go away. Simply rendering text isn't a business-model anymore unfortunately, and publishers are doing everything they can to actually make their content profitable.

Look: I hate the modern web as much as anyone and I always browse with ad-blocking on. But I turn it off for sites that I get real value from, and I pay monthly membership fees to news sites that I believe respect me as a visitor. My way of working isn't super great for me or for publishers (and I doubt most users turn off adblock for sites they value).

I was hoping this article would show some empathy for publishers and why they would start down the road of such user-hostile behavior. A complete piece would paint a vision for how to end the madness with a solution that is acceptable both to publishers and viewers. I don't know what that solution is, but I strongly doubt that just moaning and counting xhttp requests is part of it.


>Simply rendering text isn't a business-model anymore unfortunately, and publishers are doing everything they can to actually make their content profitable.

If they can't come up with a business model that works on the web then they're welcome to leave the web and go back to the printing press. The web wasn't made to provide a stable platform for monetizing content; they have literally every other media paradigm ever created for that.


You're also welcome to leave the site and not return. You're not entitled to their content on whatever terms you decide are fair or not.


> You're not entitled to their content on whatever terms you decide are fair or not.

Yes I am. I'm entitled to whatever content their server returns in response to my user-agent's request, and I'm entitled to filter and alter that content in any way I choose, including not running javascript and blocking advertising.

If they want to put content behind a paywall, fine - good luck getting anyone to consider their content worth paying for, though. Otherwise, it's fair game. That's the way the web works, and that's the way it's always worked.


But you're just defending the ad/tracking part; even without all that pages are still relatively huge and slow.

And I sadly don't know either what else apart from whining and moaning the average user could do.


I strongly suspect that 80-90% of the bloat and slowness is due to the ad/tracking parts. Yes, there's bloat with modern web/js tooling but even React and Angular (I think the heaviest of the modern bunch) are on the order of hundreds of kbs gzipped over the wire, and the benefit they give is better ux. "Old web" was fast/bare-html but not great for nontrivial ux.


> People really hate autoplaying video

Yes, this a billion-fold!

Every single newspaper site I peruse uses autoplaying video.

The experience goes like this:

Load up the front page of newspaper site. Select an article of interest and click on its link. Article loads. A video player pops into existence on the lower right hand corner of the page, and the video starts playing.

Most annoying. However it gets more egregious...

The video player floats when you scroll down the page, until you reach the part of the page where it usually resides - and warps over into its little area - until you scroll past - whereupon it pops back up on the lower right hand side of your browser again.

Aaaaaargh!

Just as egregious: after viewing a video you might even have been interested in viewing, another video is automatically loaded into that same player, 99.9% of the time on some completely unrelated subject.

Y'know, someone, somewhere, woke up one morning and thought "Great idea! Let's do <what is described above>!" - why can't the message that "People really hate autoplaying video" get back to whoever makes the decisions on user experiences, so that they just quit doing the above bullshit?


Stop visiting the sites then. There is nothing else you can do. Except maybe write them an email they'll ignore because their metrics (more like a "business analyst") tell them to autoplay videos.


Since this is posted to a site focused on developers, I’m surprised you didn’t mention the third option: Developers, stop adding all this crap to the software you’re writing and excusing it with “I’m just doin what I’m told!”


The video is not actually the problem (for me). It's the audio.

Why does my browser (Chrome) still not give me appropriate control over audio?



That's already in place. There's nothing in the way of actual user control other than muting the tab. The browser decides, pretty much, and you can't argue with it if it's decided to list a site as something that can autoplay audio based on your interaction level (which doesn't necessarily mean you've ever deliberately enabled audio on the site).


Yeah, that's a great one, especially if you are in a big office and forget to plug in your headphones.

There is a simple trick though:

about:config >> Search for "autoplay" (media.autoplay.enabled) >> set it to "false".

Some players might refuse to play media outright though.


If it can help you, think of the video as the content. It really is the only content that matters on that page.

Yes, the site only exists to deliver that video, not any news. The video earns revenue; the news does not.


It's not only the web anymore. What about desktop Slack, which can take up to 1GB of RAM for a "simple" chat client?


Desktop Slack runs on Electron, no? Which essentially lets web developers develop a "native-ish" cross-platform app, without learning a whole new platform.

There is a long, long history of attempts to make good, cross-platform, native apps. In the late 1990s there was a company called Visix that had such a library that worked on Windows and Unix. There have been Windows implementations of Motif/X Windows. There was of course Java/AWT. These days there is Qt. None of these seem to have really caught fire.

edit: I wrote Mac but I meant Unix


Qt and Java can do cross platform software, however it's not possible to make a UI that's usable on desktop, tablet and mobile. You will end up creating separate apps.

Qt is actually really impressive in my opinion, it's possible to recompile a desktop software for mobile, sometimes without any code change at all.


Yeah, because cross-platform is a PITA. The reason web apps are popular is that they're thin terminals. You write your business logic on your mainframe/terminal server (Linux VPS) and the web app just handles the user interface. You maintain less portable code, deployment and upgrades are easier, and it simplifies the user experience a bit. The downside being network issues and supporting the server.


I wish that was the case. Web developers nowadays try to push as much code as they can out into the browser, to be run client-side.


I don't know why the slack desktop client is so popular. It's almost no different from the web client, but eats more RAM if you already have Chrome open. Just pin Slack in a tab and save your memory


> I don't know why the slack desktop client is so popular.

In-dock notifications and system notifications are my reasons, as well as a dedicated window I can alt-tab to instead of tabbing through browser windows, then navigating to the Slack tab.

(I realize that for some people, "hiding" Slack in a pinned tab may be a feature.)


Web Slack does system notifications. You may have to turn them on explicitly because it requires a permission (can't recall if you do or if it prompts you, but it is something you have to enable), but if it's working in my Linux browser I'm assuming it works in Windows and OSX. I don't know if that would also integrate with in-dock notifications or not. If so, making it a viable alt-tab target would just be running it in a separate browser window, which should still net many fewer used resources than a separate app.

If you are happy, no problem. I'm posting for general info.


> can't recall if you do or if it prompts you, but it is something you have to enable

It does. In fact, it keeps displaying an annoying blue-colored topbar telling you that your life will be better if you enable notifications (fortunately, with the option to snooze it or dismiss it completely on given machine).


> In-dock notifications and system notifications are my reasons, as well as a dedicated window I can alt-tab to instead of tabbing through browser windows, then navigating to the Slack tab.

Emacs-slack can get you all of that:-)


I'm aware of the fact that Emacs is a fully fledged operating system :P


Just open in a separate chrome window?


In OS X, if I alt-tab over to Chrome, it will focus on whatever window I had in focus last; I'd still have to Command-Tilde to find the Slack one.

I know, it's a minor kvetch. But it's a minor kvetch I can solve with a gig of RAM.



You may be interested in an app called HyperSwitch. It lets you tab between individual windows of each app! There’s also opt-tab which will cycle through individual app windows.


I find it runs faster and is more stable in the desktop client, but I switched a couple of years ago so that may no longer matter (also ymmv etc).


you can only do video conf through the app


The point would be the same even if nobody used the client except to start it up for measuring bloat.


I'd be saying "Yeah!" along with you, but that client is also a video chat and audio calling app, not just text and images.

There's bloatware out there, and this may even qualify, but it isn't just text chat. (Why would you use the desktop client for text chat anyway?)


Skype also has audio and video, with custom codecs even, and P2P, yet it wasn't even close to using the very real 1GB of RAM Slack consumes.

(I'm talking about the old desktop client, not the new electron based one)


audio/video work fine in chromium.


It contains Chromium in its entirety, which is a contemporary full-featured browser - which is actually, in a way, a complete operating system.


My GMail tab runs between 225MB and 400MB+. And I used to think Java's memory usage was bad...


If I leave gmail open too long, it starts eating 100% of CPU for some reason (Firefox on Linux).


I'm using Opera on both Linux & Windows 10. It's pretty light on the CPU, but I get the "Out of Memory" screens regularly (on 16GB machines!).


I see no one mentioned the extraordinary "Reader mode" of Firefox. [0]

It's literally the one feature for which I stopped using Chrome at work. It's just so damn good.

For instance, [1] becomes [2]

[0] https://support.mozilla.org/en-US/kb/firefox-reader-view-clu...

[1] https://radiobruxelleslibera.com/2018/06/26/intermediated-of...

[2] https://lh3.googleusercontent.com/smexxUICwLXKbzEEbLlIPQ_qGc... (that's a screenshot on photos.google.com - don't have access to imgur or similar)


There are many chrome extensions that do the same thing


Except that my "corporate strategy" doesn't allow installing extensions. So a built-in tool is excellent.


FYI, the Google Photos link requests a login.


Wow, that's weird. It didn't when I posted it :(


I think I heard this on Hacker News, but I don't recall who from. Whenever I see a popup begging me to sign up for a spam list, I put in postmaster@that-domain.com now.


ceo@$DOMAIN is my preference. Or down the c-stack.


I do that for spam newsletters which added me without my consent. Most of them you can edit your email, and I edit it to contact@[their domain] or postmaster@[their domain]


What's the postmaster supposed to be? Is it set in practice?


It's meant to be the operator of the mail server, and is generally used for contacting the operator about problems with their mail setup. It's required to be available per RFC822.

https://www.w3.org/Protocols/rfc822/

How much that works out in practice is inconsistent. Most sysadmins will set it up in my experience, but it's not out of the question that it wouldn't exist. Some domains will just set up a catchall and it'll be directed to someone anyway. It varies.


the web was born, and everyone was happy for this new thing. then advertisement came along and poisoned that too, just like tv, just like radio. we just can't have nice things because someone somewhere wants to squeeze all the pennies out of your pockets in whatever medium you use.

obligatory links for website obesity and bullshit web:

http://idlewords.com/talks/website_obesity.htm

http://motherfuckingwebsite.com/


I've read others comment on the lack of empathy for publishers. $10 says the author doesn't blame them.

That said, this is a weird situation, one to which I became allergic. To the point that I started r/vanillahtml to collect websites that gave me that fat-free feeling.

One thing you should do is install dillo, and enjoy the web. It's usually faster than elinks, Chrome, whatever. Sure, you'll get horrendous CSS rendering and no JavaScript. Still, it's worth seeing in person how instant a click / request / render can be. Also, 10 tabs in dillo is probably 2MB.

# tech momentum

There were logical reasons for how we got to where we are today. I was in there too at first. I wanted hyper-dynamic webpages, more capable CSS, more live scriptability. But along the way I started to feel the unintended consequences: long loading, idiotic user interactions, regression in basic ergonomics, huge resource consumption, and worst of all, the twist it put on webpage producers. Open a webpage from 2000 and you'll see 20% chrome, 80% content. Now on average it's the opposite - not really 80/20, more like 50/20 with a bonus 30% of popups (GDPR, cookies, newsletters, ads). Tech didn't provide value; it became a root of pollution.

# societal re-rooting

old web was a side game, people got into it for the thrill of it, it gave a lot of interesting subtle and dense content. Now it's all business trying to live in the web era, it's a competition thing, with all that it entails. The web today looks like main street. Neon signs, noise, .. ugh.

Also, with real-time social platforms you see how most websites are low-value and reactive. It's changed a bit - people noticed there was a need for less shallow content - but it seems rare. Although, to be honest, I stopped monitoring whether there is more of it today.

I agree with people comparing a website today with other kinds of texts. I feel a great void when I read most of the web, and usually a .txt file is a high guarantee that I'll find something more personal or technical than anything on the web. And it's near free. If the web cared about communicating, we'd just have to extend SMS to 640kB, with a streaming protocol in case you're reading Wikipedia.

ps: oh and I love these https://lite.cnn.io/en (was trying to make a repository of them), so much love to those who push that kind of idea


Dillo! https://www.dillo.org/

I mentally partition the web into Dillo-compatible and -incompatible subsets. :-)


I was about to start an effort to list text-mostly websites... I guess it's a little like your subsets.

I also tried to patch dillo to add an external bookmark backend (sqlite or else) and add lua scripting. Didn't go far sadly. Basically if I could greasemonkey and customize keybindings with a super tiny fast browser.. I'd be happy for life.


> You know how building wider roads doesn’t improve commute times, as it simply encourages people to drive more? It’s that, but with bytes and bandwidth instead of cars and lanes.

That's the core insight. Higher availability of resources leads people to consume more of said resources - in tech, typically for more and more abstraction layers to deal with hardware of ever growing complexity to make the lives of developers easier, but at the cost of stagnation or regression for some metrics. See also: "Why Modern Computers Struggle to Match the Input Latency of an Apple IIe"

https://www.extremetech.com/computing/261148-modern-computer...


(Can't edit anymore, but finally found the relevant Wikipedia link for the phenomenon: https://en.wikipedia.org/wiki/Jevons_paradox)


Amen to every word except this sentence. "Better choices should be made by web developers to not ship this bullshit in the first place."

No developer I know, web or otherwise, wants to do any of this, and all of them are religious in their use of ad blockers and autoplay stoppers.

This is the kind of stuff developers are forced to do with guns to their heads by the PMs and marketing teams that actually determine the user experience.


I agree with all of this and just want to add that, for every developer who will speak out and crusade against "webpage pollution", there are about a dozen who will not and are viciously seeking employment. I'm often terrified of this.


Another point for the bullshit web: The GDPR cookie consent notification. I really hate clicking on those notifications to make them go away.

Designers, can you please just add a non-obtrusive link, instead of a pop up that covers half my browser screen?


Or just you know, not do anything that would require consent in the first place. If you have to ask for consent you’re already doing some of the bullshit outlined in the above article.


Sure, but first let me have you talk to my PM who got this request for an obtrusive popup from our lawyers.

Designers don't want stupid popups all over the screen. Designers don't want half the bullshit that's put into sites these days. Our hands are tied. It's rarely our decision.


"This site uses cookies." is the new "Hot singles in your area!"


On the bright side, these popups are safe for work and have no attached images.


Not a design problem, ".... only that you conspicuously provide the option for obtaining informed consent ...."

The intention is that it must be obtrusive. The company has a choice between annoying you and mitigating risk.


That's more of a "bullshit politics" problem than a web design problem.


Or maybe 90% of websites that currently give you cookies have no good reason to do so and shouldn't?


Have you ever seen this discussed outside of HN or dev-centric subreddits? Your average user doesn't care about this 'issue' at all, and thats why it won't ever change.

Proof the author doesn't relate to any typical user:

> I’m not asking much of it; I have opened a text-based document on the web

Nobody besides devs would open a website and think "Oh, this is a text-based document opened in a web browser." It's a website, not a Word doc.


Indirectly, yes: they will ask why their computer is slow. Then they'll buy a new one and accept the sales pitch for a gigabit fiber link.

ps: you all must understand that the average user believes in tech: if the web is slow, it can't be Google's fault. And if the salesman says "of course you need a machine that is web-capable", all they'll care about is that it's not too expensive; they'll buy the damn machine and maybe throw the old one out. Source: my recycling bin.


I wish this were pinned at the top of all such discussions.

This is precisely what's happening. In the mind of the regular user, the web looks the way it's supposed to, because they don't have the technical knowledge to understand what's actually happening. Users accept what they get because they don't understand it. And as you said, the complaints get targeted at the wrong thing: "Why is my computer so slow? Did I get a virus?". No, it's just that Google deployed an even more bloated iteration of their GMail UI, CNN added 20 more tracking scripts, and YouTube now eats half of your RAM for no good reason.

I go through this dance with many of my non-technical family members and acquaintances. I can extend the life of their machines only so much with ad blockers and by cleaning out adware. Ultimately, they'll buy a new, faster machine, just to return to the status quo. And maybe I'll inherit one of their "slow" laptops and pull 2+ more years of professional, productive use out of it.


It's hard to educate people too. You have to go hard and deep to convince them enough not to spend money on their next mall trip.


> Indirectly, yes: they will ask why their computer is slow. Then they'll buy a new one and accept the sales pitch for a gigabit fiber link.

This evening there was that broadcast on our national radio about "white zones", I don't know if you listened to it.

The three guests were directors at major telecommunications organisations and administrations in the country. They discussed how high-speed Internet should be available to everyone everywhere, and declared that there should be fibre-to-the-home everywhere in the country.

But in 30 or 40 minutes of broadcast I don't think they asked themselves once: why should everyone have access to optical fibre? Does everyone need it? What for? Can we define the real problems and needs? Can we define the set of things that really matter? Why do we even need high-speed access to read and fill in administrative paperwork (without paper)? What could be done to achieve the same functions with low- or medium-speed Internet for everyone, without wasting resources?

I mean, it is crazy to be about to pour billions into something because... just because. Without a deep analysis of what is wanted and how to solve it. They have decided on a solution, but they have not defined the problem. They pretend to be solving the case of a few people who are supposedly in trouble because they cannot fill in important forms, but the real use will be that Jean-Pierre-Kevin and Rayanne will watch the same American blockbuster, each on his own tablet or smartphone, except Rayanne will start watching it 10 minutes later, killing the global bandwidth because "eh! it's free".

At my place, the Internet speed was upgraded (I don't think anyone asked for it?), as a result people started using TV-over-xDSL, VOD and streaming like pigs, and now in the evenings the speed is worse than it used to be before the upgrade...

One must question how the infrastructure will actually be used before spending billions building it, because the general consequence is never-ending abuse of the available infrastructure, whatever its capacity.

(For Agu: <https://www.franceinter.fr/emissions/le-telephone-sonne/le-t...)


That sarcastic example you're giving is most likely to be true. I had heated arguments on reddit with people (most likely gamers) who were very tense about my dismissal of the need for FTTH. And the telecom companies are just in it for the new market; they couldn't care less about anything else. xDSL was also meant to be high speed for everybody, and that's not the case.

And sadly, as long as there's a market, there will be products. Oh, and I don't know about you, but ISPs are quite aggressive with FTTH these days. I see lots of technicians upgrading links, and I get more texts about new offers than I ever did... that's a strong hint :)

I often want to assemble low bandwidth low power (solar powered even) router nodes. Something you could drop in a field and have a mile radius range of wifi (some people on youtube got more than that with a tiny esp8266 IC). The rest would be some mesh network topology. Just enough for simple data, emergency, maybe compressed voice you know.

Thanks for the radio link.


> And sadly, as long as there's a market, there will be products.

Moreover, if there is no market, one will soon be created to fill the void and make some money off any 'underused' resource until it saturates.

> I often want to assemble low bandwidth low power (solar powered even) router nodes. Something you could drop in a field and have a mile radius range of wifi (some people on youtube got more than that with a tiny esp8266 IC). The rest would be some mesh network topology. Just enough for simple data, emergency, maybe compressed voice you know.

People will laugh at you. If I understood correctly, 4G (not DOS ;-) ) has become a basic human right, and no one should be deprived of it for a 10-minute ride, be it on an underground 20 metres below the surface or on a country bus in the middle of nowhere. I mean, megabit 3G is already almost everywhere in the country, but it seems that is not enough for a decent human life.


Agreed, somehow people consider 4G as a requirement for normal life. Being in a low-tech phase I find that a waste but alas.


Every person I know is using an ad blocker so I guess they do care. And yes, publishers care as well: https://www.washingtonpost.com/news/the-switch/wp/2016/05/27...


True - publishers care because they lose revenue; users use ad blockers to block ads. I don't know anyone who uses an ad blocker to improve load speeds.


It is a very convenient side effect, and yes, users care a lot about load times as well, as multiple case studies show, e.g. https://developer.akamai.com/blog/2015/09/01/mobile-web-perf...

The reason users care about load times is the reason people hop on AMP.


They don't discuss it because they just give up if it takes too long.


I disagree that the AMP cache is the main benefit of AMP. There are plenty of CDNs that give performance similar to, or better than the AMP cache.

The only benefit of AMP is that the pages are promoted higher in results.

Back to the main topic of the article, the same could be said for the desktop and the mobile phone. Developers and framework builders are constantly adding bloat as cpu/memory increase. Since most people aren't writing their own frameworks and many are importing large parts of their apps from npm/gems/etc, everyone gets hit with the bloat. It's a vicious cycle for sure.

It's a shame that so many open source authors add bloat into their packages in exchange for popularity. They want all the users so you get tons of code that's never used in 90% of projects.

The big example of the above is express. It's a terrible cycle as the vast majority of people learning JavaScript have hopped on the express bandwagon and are now creating APIs with mediocre performance by importing a massive webserver they often do not need.

Overall though, it seems to show that most people prefer convenience over accuracy and performance (passive aggressive stab at mongo?)


>The only benefit of AMP is that the pages are promoted higher in results.

Not just promoted in the results but actually pre-loaded in the background when you're using google search. It's double the monopoly fun.


True. Hopefully anti-competitive stuff like this will be challenged in court so we don’t have to play the game.


Yandex does it too with their own tech, just mentioning. I think if it actually were standardized properly without a monopoly it could become a good thing.


Good to know.

Agree that if there was a body with representation from all major engines, this could be good.

I personally don’t want my content to be served in such a way that requires me to use a specific analytics product though. Would be ok with it if log access was part of the standard.


Oh, wait a minute... what if we twisted the existing lack of net neutrality to our own purposes as consumers? What if we had easier mechanisms to slow down the more annoying aspects of websites we visit, such as those extraneous scripts on CNN? And as we visit personal blogs that we value, we don't slow those down. (And yes, if you're about to ask me about Netflix: no, I would NOT slow that down. ;-) ) Anyway, if we had an easy way to de-incentivize these web platforms, perhaps they'd be pushed to slim down their content delivery?

Now, before anyone replies with something like "but you can install ad blockers, etc."... yes, I know there are mechanisms... but I mean "easy" mechanisms... that is, something my grandma could implement with ease. I think this would serve to give true power to the consumer, both on the net neutrality front and on the content consumption front.


Things are different now.

When I was on a modem... pictures, almost any of them, were bullshit: big and annoying to download. A lot of the time I hated them.

Nobody thinks about pictures that way anymore. I suspect that goes for a lot of the bullshit listed in that article.

I'm no fan of tracking or auto play videos, but web applications are a thing now and the people visiting sites, building them, and paying for them want more than just a page... they want a whole application. All three of those (viewer, builder, dude who pays / host) aren't on the same page, but they also are largely happy to go down the road of bigger pages.

I'm all for efficiency and kicking some stuff to the curb, but as for size, it is not on the mind of most people.

I'd be all for a class of retro / minimal sites or something, but it is clear that for a lot of things, web apps are where we're going.


The problem with pictures and modern websites is that they aren't there. Instead the pictures are only potential pictures which will only be loaded if you load the JS loaders that load the loaders that load the pictures (and all the tracking stuff).

With JS disabled you have no pictures, or a super-low-resolution placeholder at best. Or a blank page showing nothing at worst (e.g., nasa.gov).
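Roughly, the pattern looks something like this (a sketch only; the attribute names and selector are made up, and real sites bury it under far more indirection): a tiny placeholder sits in the markup, and a script only swaps in the real image once it scrolls into view, so with JS off the real image simply never arrives.

    // Hypothetical lazy-loader. Markup: <img src="placeholder.jpg" data-src="full.jpg" class="lazy">
    const lazyImages = document.querySelectorAll('img.lazy[data-src]');
    const observer = new IntersectionObserver((entries, obs) => {
      for (const entry of entries) {
        if (!entry.isIntersecting) continue;   // not on screen yet, keep waiting
        const img = entry.target;
        img.src = img.dataset.src;             // only now does the real image start downloading
        img.removeAttribute('data-src');
        obs.unobserve(img);
      }
    });
    lazyImages.forEach((img) => observer.observe(img));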


Pictures are still a huge pain in the ass on mobile connections. Try looking at Facebook or Instagram or Twitter on a flaky 2G connection. You don't get to see them stream in line by line like you used to in the good old days - they just take forever, downloading a muddy, blurred version of the actual image first as a placeholder.


I hear ya.

Even better when you see what seems like a high-res picture that is sized small, and you click to see it better... oh crap, it's starting the download all over again... and I'm 99% sure the pic was already there before. (Granted, it may not have been high-res, but damn it, it was close enough.)


Try these sites without javascript. It's AMAZING.

The chrome extension I'm using ("Quick Javascript Switcher") disables JS on a domain-by-domain basis, so it doesn't break the whole web. Every news site I've tried is massively more usable with Javascript disabled - loads in a snap, scrolls without jerky motion, zero popups or autoplay videos.

Usually you get all the text and most of the images. Some news websites have javascript-driven photo collages, but it's just one button click to enable JS for that session, and I can decide to do so after reading the text and judging the wait worthwhile.

I feel like I've stumbled on a secret life hack. Try it.


I can vouch for this, although I've been doing it with regular Chrome (JavaScript disabled by default) and I whitelist sites with the click of an icon in the location bar.


I have been using a 4GB RAM HP ProBook for the last 5 years (a student-era artifact). You have to be very disciplined to use it, though. I never open more than 10 tabs in the browser. While programming and testing, I generally close the browser. I use only a lightweight WM and avoid JS-based UIs. My stack is vim/emacs (sorry) + a compiler (clang/rustc/sbcl) or perl/python + zsh/bash (userspace) + a terminal emulator + occasional firefox + wget/curl and weechat.

I read PDFs with emacs. My computer is pretty fast compared with my peer's higher-spec machine (16GB) running VS Code, Slack, Chrome and whatnot... But he does play games better!


One reason why I subscribe to https://lwn.net/ is that it is a fast no bullshit web site. Another is the great content.


As an external observer, I must say we are going too fast, in an unorganized fashion! Software is eating the world and swallowing more than it can chew.

Businesses have monetized the web in such an unsustainable way that we have to introspect to clear our shit. What would have gone wrong if the general public still got good, high-quality old print media while the otherwise tech-savvy worked patiently to make a better web? What would have gone wrong if the web had remained just a portal for sharing textual information, while other people did what they had been doing traditionally?


I've long been thinking that the way to "fix" journalism in the US (and possibly the rest of the world) is to have Apple Music/Spotify-like channels that the major publishers broadcast through. It solves the problem of having 50 different one-dollar-per-month subscriptions, and it aggregates content in a way that I would never have trouble paying for if it were done correctly. This model is already in place for things like satellite radio, cable TV, the aforementioned Apple Music and Spotify, Netflix, etc.

Am I the person that needs to build this??!!


Haha, I think the web is a lot like markets. Buyers will pay up to what they can afford, i.e. the best price is one the buyer feels very uncomfortable with but still pays, because they want something and they can't get it anywhere else at a better price.

Sometimes this is abused (see US healthcare) but that’s the rule of markets. Demand and Supply.

I guess the internet is the same. The media companies will try to use every analytics and ad company under the hood to maximize every little ad click they can get. They will fill the pipes to the brim and implement every dark pattern as long as they make that extra revenue. (See taboola - the scum of clickbait advertising)

At the end of the day, if you don't like something, don't use it. I rarely visit CNN. I install ad blockers and tracking blockers. We fight it with what we have.

It's the law of supply and demand. We as consumers install stuff to block annoying things; they as sellers will keep on annoying you as long as you keep visiting them, because the mild annoyance of ads and auto-play videos is how they keep the lights on.

The case of Google, however, is that they are on a mission to have a monopoly on search in their browser and ads on their platforms. That's how they make 90%-ish of their revenue. Google can say whatever the hell they want their mission is; their actions and financials clearly show what they value.


This phenomenon is not limited to the BS web. Back when virtualization was the new wave, everyone raved about the money/space/energy savings from running multiple virtual machines on a single server. It was great. Then what happened? Many folks went nuts, spinning up VMs for anything and everything, and suddenly needed to spend more money on more physical servers to run more VMs, and so on in an infinite loop.

Bloat from convenience.


In other words: horror vacui


Humans are inefficient, illogical, and wasteful. "Dilbert" is not actually a comic strip, but a catalog. The only way to stop all this is to Kill All Humans. We await your confirmation__

- Bot #72504


The reason this website currently crumbles under the load is probably because the content is stored in a database and re-queried for every request, even though the content will hardly ever change. Might be something else to look at ;)


It's cached and, I promise, hasn't crumbled under heavy load for years. I don't know what's going on but I've asked for more resources. It does make me sad and embarrassed, though, so that's something.


Lately I've been noticing this with Facebook. It is interesting that in its early stages, my perception of Facebook was that it was a very well built site. It seemed simple, clean, loaded fast, etc. It seemed particularly lean when compared to Myspace. Without really delving into the details of exactly what is happening today, I definitely perceive the web app as being slower. Obviously it is delivering a ton more images and video but nonetheless the experience seems slow. The ios app on the other hand still feels very performant to me. I think their handling of video is really impressive.


I'd long griped about Google+ page size. Happened to poke around FB for a bit (couple of years back). Immensely worse.

Google's somewhat criticised redesign about 18 months ago actually hugely improved memory usage. The site's still on a declining trajectory....


While we're on the topic of web performance regression can we please talk about all these damn loading spinners.

JS and Ajax were supposed to save us.

Now, instead of one slow page load, I get one slow page with a loading spinner, a login component with a loading spinner, a carousel with a loading spinner, and a latest-news component with a loading spinner, with content divs shifting and bouncing every which way as content loads.

It looks like someone spilled a bucket of ajax-load.gif all over the damn page!

I'd prefer a blank page and the sudden appearance of the entire page.


I do all my browsing with uMatrix with everything blocked by default but first party CSS. It really makes the internet a more enjoyable place to be. Pages load much, much faster and it's often easier to find and read the content. On mobile it saves a boatload in bandwidth costs. For example, loading a NYTimes article takes 90KB and DOMContentLoaded is in 69ms. Truly much of the other stuff loaded is bullshit because a significant amount of the web is more usable without all the crap.


Modern televisions and combined recorder-tuners are the same..

My old black & white and colour valve television sets and VHS video recorder are up and running well before their modern equivalents!


I've been thinking about building either a local proxy or firefox addon that puts everything into "first-party only" mode unless whitelisted.

It would probably break the web (at least the genuine parts) less than disabling javascript wholesale which is just too awkward, but it would vastly cut down on the "bullshit" as this article calls it.

It would need some care; for example, it would probably have to match origins on root domains rather than subdomains to prevent too much breakage. But the improvement in download times would be astronomical; it's almost always the case that "bloat" is third-party bloat.

Obviously it would need to support a whitelist too so payment processors for example could continue to work, but in general the blacklist approach of ad-blockers just isn't working for me.

I think some kind of "auto-whitelist" so I'd need to actively request a domain before requests could be made to them would be the sweet spot for user experience but that itself would require substantial browser integration which I don't think could work through a plugin.

Perhaps a proxy approach would be the best from a UX perspective then. It could inspect headers to figure out if they're primary requests (using similar heuristics to CORS). Primary requests would (or could) be added to an auto-whitelist for future requests.
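To make the add-on idea concrete, here's a bare-bones sketch of just the blocking part, assuming a Firefox WebExtension with the webRequest, webRequestBlocking and <all_urls> permissions (the "last two labels" root-domain check and the whitelist entry are placeholders; a real version would use the Public Suffix List and persistent storage):

    // background.js - block any request whose root domain differs from the page's.
    const whitelist = new Set(['example-payments.com']);   // hypothetical whitelisted domain

    // Naive registrable-domain check: keep the last two labels only.
    // A real implementation would consult the Public Suffix List.
    function rootDomain(hostname) {
      return hostname.split('.').slice(-2).join('.');
    }

    browser.webRequest.onBeforeRequest.addListener((details) => {
      const pageUrl = details.documentUrl || details.originUrl;
      if (!pageUrl) return {};                              // top-level navigation: allow
      const pageRoot = rootDomain(new URL(pageUrl).hostname);
      const requestRoot = rootDomain(new URL(details.url).hostname);
      if (requestRoot === pageRoot) return {};              // first-party: allow
      if (whitelist.has(requestRoot)) return {};            // explicitly whitelisted: allow
      return { cancel: true };                              // everything else: block
    }, { urls: ['<all_urls>'] }, ['blocking']);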


uMatrix / uBlock seems to do what you want.


uMatrix looks like the kind of tool I was thinking of, thanks. I just wish I didn't have to give an addon permission to:

Access your data for all websites

Clear recent browsing history, cookies, and related data

Read and modify privacy settings

Access browser tabs

Access browser activity during navigation

But that's just the very broken permissions model that addons currently have so thank you.


RequestPolicy is another addon which does this, but it doesn't work on the latest Firefox.


Nah, this breaks too many CDNs and most pages stop working, just as if you had disabled JS.


Web developers often don’t have a choice whether to ship bullshit or not. Hell, engineering managers and directors often probably don’t have this choice.

The single greatest feature of AMP (which I dislike with great passion, don’t get me wrong) is that it forces broken org structures to make better engineering choices. There is no negotiation. There is no “marketing director is higher on the pecking order”. There is just design constraints placed from outside of the organization that must be followed.


Some of the worst offenders are logging webservices. We've replaced a simple text file with a bloated site that requires mousing around, does not support grepping, etc.


Amen!

And this article doesn't even touch on the bullshit content – I don't want 10 websites that all say generally the same thing with different layouts (and 10 times more total bullshit consumed). I can't stand the internet today. The experience on mobile is even more unbearable.

I want simplicity.

Can we all just band together right here and now and create a subnet with a better set of principles?

- minimize page weight
- minimize duplicate content
- minimize UI variation
- what else?


> - minimize page weight
> - minimize duplicate content
> - minimize UI variation
> - what else?

- allow for (or even encourage) automated processing of content


Yes! That's a good one.

I'd also say, probably minimize the use of javascript and dynamic content.


My sites are pretty plain, mostly using a bit of Bootstrap. Because of the new European data protection laws (which I mostly like) I also converted my Blogger-based blog to generated static resources.

Like so many other people I ignore sites with too much baggage. Many news sites have text only versions if you look.

I sort of have some ads on my main site that are text links to where my books are sold. Advertising does not have to be resource-heavy.


UX density is reduced by icons and images. Utility used to come through link quality (i.e. the portal) but has been reduced to ranking on search endpoints to collect nickels.

Here's a very useful site that uses Javascript without going near megabytes: https://www.freeformatter.com/


Everyone demands that the services of the internet have zero explicit cost. This means that all the cost is borne implicitly.


It’s not just the web, software in general is getting increasingly bloated.

Not in features, but in more and more layers of abstractions.



It strikes me that, if people want web developers to load less stuff, there have to be ways to reduce redundancy. It might also make privacy easier.

For example, some sort of self-hosted, standards-based framework for tracking common things like scrolling, ad clicks, or page visits, supporting multiple consumers: detailed analytics, logging, and ad-view tracking for the site publisher internally, and less granular access for external companies that can add value. Your users' data stays on your self-hosted user data server, and analytics providers can make server-to-server requests for aggregate data over GraphQL or at least a standardized API, improving user privacy.

If you really needed additional functionality, there could be standardized add-on modules, or updates to the spec.
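As a toy sketch of the "data stays on your own server" half of this (the endpoint path and event names are invented for the example), the page would only ever talk to its own origin, and third parties would have to request aggregates from that server instead of running their own scripts in the page:

    // Hypothetical first-party beacon: events go only to the site's own origin.
    function report(eventName, data = {}) {
      const payload = JSON.stringify({
        event: eventName,
        page: location.pathname,
        ts: Date.now(),
        ...data,
      });
      // sendBeacon queues the request without blocking navigation;
      // '/analytics/collect' is a made-up same-origin endpoint.
      navigator.sendBeacon('/analytics/collect', payload);
    }

    report('pageview');
    window.addEventListener('scroll', () => report('scroll'), { passive: true, once: true });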


The web is not all bullshit. It's just that certain areas of online business are incentivized to get bloated and others aren't. I learnt this through personal experience:

On one hand, I own a news site, and on the other, I own a price-comparison website. The news website started off at a 90% rating on PageSpeed tools, but as the ads came in and people complained that the news site looked ancient, that score got whittled down to 30-40% and the page weight went up by orders of magnitude as well. The price-comparison website, however, has a 100% rating on PageSpeed tools and has stayed that way, because its goal is to get people to the best-priced retailer as quickly as possible.


Whilst this is true, and it already references another analogy in 'bullshit' jobs, it's just another piece of the 'bullshit' magnetism of human nature.

Church, TV, Music, Movies, Conversation, Food, Meetings, Politics, Watching sport

The 'bullshit' to 'worthwhile' ratio is 95:1 at best.

I've just depressed myself.

Maybe it's a pattern that humans are programmed to follow subconsciously. Our brains cannot handle consuming / participating in anything less than 95% bullshit. Maybe it's the bullshit time that allows us to cope with the 5% of the 'real'?

Things such as the 'bullshit' web are just a differently-contexted manifestation of that pattern of behaviour.


It's even harder to understand this given the advance of modern web technologies. In the '90s you had a single channel to download unoptimized GIFs, and websites used to be built from these images. Now you bundle, minify and gzip all these stylesheets and scripts, which are capable of 1000x what you did back then, and then load a (usually precached) analytics script from a CDN while fetching MP4s in parallel and streaming them on the fly. It's just bad developers. Facebook takes about 100ms to load everything (but still has a shitty UI). And they build tools so everyone can do this too. And developers now brag about how complicated the web is.


This is in part why I'm developing https://html.brow.sh - a purely text-based web, rendered by a fast remote modern browser.


Heh, if you look at the JS console while loading cnn.com you'll see the following:

     .d8888b.  888b    888 888b    888
    d88P  Y88b 8888b   888 8888b   888
    888    888 88888b  888 88888b  888    We are trying to make CNN.com faster.
    888        888Y88b 888 888Y88b 888    Think you can help?
    888        888 Y88b888 888 Y88b888
    888    888 888  Y88888 888  Y88888    Send your ideas to: bounty AT cnnlabs DOT com
    Y88b  d88P 888   Y8888 888   Y8888
     "Y8888P"  888    Y888 888    Y888


>>"... pretty much any CNN article page includes an autoplaying video, a tactic which has allowed them to brag about having the highest number of video starts in their category. ... People really hate autoplaying video."

The result is that, even though I used to watch CNN often, it has been years since I've intentionally opened one of their web pages, and when I accidentally do so, I almost frantically close it to shut off the damn auto-play -- and that's even if I was interested in the video content.

I'll get it somewhere else, thx.


It's the _website obesity crisis_ in full swing

https://news.ycombinator.com/item?id=10820445


The Web is the quintessential human artifact. This is what happens when large groups of people crap and eat from the same pile, the fast, the slow, the brilliant and the stupid, the refined, mundane, bloodthirsty and curious, the intricate and obscure.

I think of the web in layers. Not discrete, divisible layers, but more like the layers of any ancient city that's seen habitation for centuries. At the lowest level is the oldest stuff: mud huts, stone tools, open fireplaces, graves even. This is like the earliest layers of the web: the webrings, the no-css, no-script HTML bulleted lists, tables of blue links. Even garish black backgrounds. That old web was full of great things; so many enthusiastics and fanatics! So much information and content, written by real people. Short stories, poems, hackers, phreakers, crackers, IRC, that whole great era. That old stone age is mostly buried now, preserved here and there in museum quality, crumbling here and there, broken links, missing images. A mute reminder, hard to find even, of what it was like back then. BACK THEN, before the next layer of the web evolved on top. BACK THEN people on the internet were mostly curious but not nosy or malicious. BACK THEN people put effort into their sites, had to pay money to host their domains, didn't need CDNs and VMs and Cloud Computing.

But a layer evolved on top of that old web. It came with CSS in my estimation. Pages started getting fancy, using new fonts. Ads started popping up. Ads always started popping up. Search engines popped up. Then people started making money. Little trickles at first, then a gush, then a torrent. Online businesses, the FIRST BUBBLE, and everyone was rushing to pets.com and eBay and online pharmacies...you know, companies selling actual stuff. Amazon. Online publishing, news sites.

That bubble blew up. It grew too fast, people went into far too much debt to puff up their businesses. But quietly chugging along, always increasing, the little banners and annoying popups gave way to a more insidious form of advertising...the watching eye behind it all. The internet started watching us. First it was search history, then cookies and then fingerprinting, then whole underground economies of trackers. And all the while the SEO battles and trollers came along, so fast paced...

And something...else...grew on top of the web. You can see it now, maybe just the tip, when you go to one of the news sites that the OP talked about. The massive, heavy sites. Those are just the most polished of this massive avalanche of click-baity slide shows and fake news and crap that is heavy, laden with strewn together junk parts and oriented to only one purpose: making money, by hook or crook.

Now you can't hardly see through this layer anymore. It's like a fog. Go to any search engine and search for anything! What do you get? Aggregated, evolved--I won't say optimized--evolved content that is designed to keep you away from the older layers. I say evolved because that is exactly the right metaphor--the crap that survived by natural selection and crowded out the other, more carefully crafted, humble and matter-of-fact content. This new crap evolved to get straight to the top of the search ranking and grab those clicks. It doesn't matter what the original content was. The more commercial it is, the crappier it is for real content, and the more driven it is towards getting you to click through and BAM make a sale. For god sakes, try to find some neutral information about insurance. Try to find that one guy's website that as part of a trip report to the southwest to go hiking with his kids, talks about how the rental agency wouldn't reimburse him for a flat tire. Or some basic, old-web archaeology like that. You cannot find that stuff anymore. It's hidden by a huge layer of commercial bullshit that is designed to lure you in, sell you crap, get you to sign up for newsletters, take surveys, or at the very least track your ass at the slightest sign you might be interesting. And don't think for a minute that it's all an accident, or some unforeseen consequence or poor search ranking function. The whole system is set up to, and rewarded by, and fed by, their ability to serve themselves, not you. Make no mistake. If the algorithm makes more money, it's gonna get shipped. People might wring their hands about it, but the slippery slope is still slippery, and no one can hold the line forever. Least of all when that takes mental energy and forethought...something SO much better suited to ML algorithms and scale. Just scale up to the whole web.

We lost our way. The web isn't run by us anymore. It's run by them. And them... whoever they are... search engines, advertisers, publishers, people with political agendas, psychos, dictators, people with power. They don't even have control either. Look at the news sites and aggregators. They tell you what they want to tell you. You can't even set preferences anymore. It's all driven by AI. For fuck's sake, nobody really knows what to tell the AI to do, except make money.

AI. AI to rank what's important. To tell us what we should look at, watch. Buy. Adjusting news to either make us mad or placate us. Creating and reinforcing a bubble that absolutely always benefits someone else besides ourselves. We keep giving it subgoals, but it'll just keep going around them to what we really want, making money, because that's all we ever reward it for!

The fat-ass webpages are just a symptom. The disease is that everyone is shoveling shit into your face just to make a buck. Everyone is trying to automate their crap as fast as they can, throwing crap at the wall to see what will stick. And they just. don't. give. a shit.


Best post this month, possibly this year.

titzer says:"...You cannot find that stuff anymore. It's hidden by a huge layer of commercial bullshit that is designed to lure you in..."

So true. [And wouldn't it be nice if search engines had an option whereby one could first set a date and thereafter search results would display as they once did on that date?]

But again, great post! Kudos to you! Well said, sir!


I use the “clean reader mode” (in safari, the small three stripe button next to the https lock in the address bar) very often to remove the clutter.

But this still means I downloaded all the unnecessary crap. Which is primarily annoying because of the delay in loading (crappy reception is the norm outside of Canadian cities).

Whereas Hacker News loads nearly instantly.

Is there a good alternative browser for iOS that avoids downloading JS, video, bloat?


Every web site (or app) wants to look "cool" and to differentiate itself from others based on the look. This quest to look cool leads to all the animations, images, etc., which leads to the increase in download size. For the most part, the reason for this quest to look cool is to grab as much user attention as possible so that they can do their actual business, i.e. ads.


I'm going to have to disagree on the hostility to AMP.

Specifically with this paragraph:

> It seems ridiculous to argue that AMP pages aren’t actually faster than their plain HTML counterparts because it’s so easy to see these pages are actually very fast. And there’s a good reason for that. It isn’t that there’s some sort of special sauce that is being done with the AMP format, or some brilliant piece of programmatic rearchitecting. No, it’s just because AMP restricts the kinds of elements that can be used on a page and severely limits the scripts that can be used. That means that webpages can’t be littered with arbitrary and numerous tracking and advertiser scripts, and that, of course, leads to a dramatically faster page.

> ...[supporting evidence]

> So: if you have a reasonably fast host and don’t litter your page with scripts, you, too, can have AMP-like results without creating a copy of your site dependent on Google and their slow crawl to gain control over the infrastructure of the web. But you can’t get into Google’s special promoted slots for AMP websites for reasons that are almost certainly driven by self-interest.

The point of AMP is exactly this restricted spec - it's so Google can statically verify that your site follows their performance guidelines. You can write a really fast website if you want, but unless you're willing to let Google make sure that you're actually doing so they're not going to take it on faith.


But Google can measure a page's weight and load time when it indexes it and use that in its PageRank calculation; it doesn't need to take control of your page to do it.


Here is non-bullshit cnn: http://lite.cnn.io/en


The Web was becoming unbearable until I started browsing with all CSS and JS off by default (using umatrix and 50,000 lines in my hosts file). It's an amazing improvement. If I can't read something, I enable some CSS and change the defaults for that site. Firefox reader mode helps. I also disable all CSS animation with Stylus.
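The disable-animations bit is tiny, by the way. The same effect as the Stylus style can be had from a user script that injects one rule (just a sketch of the idea, not what Stylus actually does under the hood):

    // Sketch of the "no CSS animations anywhere" rule, injected from a user script.
    // (Stylus applies the equivalent rule as a user style; this is the same idea in JS.)
    const style = document.createElement('style');
    style.textContent = `
      *, *::before, *::after {
        animation: none !important;
        transition: none !important;
      }
    `;
    document.documentElement.appendChild(style);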


It's not just the transferred content; it's literally minutes of CPU time on some webpages just to run the JavaScript. This isn't noticeable on a fast desktop, but try running some of these pages on Atom-class PCs or 5-year-old phones (or, for that matter, put Firefox on your phone and request the desktop site).


These days, when I visit a website I have not visited before, it feels like entering a war zone. Trying to extract some information while the enemy tries to kill me.

Enabling hostname after hostname in umatrix until the content is revealed. Hoping not to trigger too much user hostile crap along the way.


What might be useful is a browser that extracts most of the information content from web pages and discards most of the formatting. The text-only browser Lynx comes to mind, but I think it's obsolete now and doesn't show graphics, which are sometimes useful/necessary.


Reading mode on safari does this.


This static website seems to be down.

Archive link:

http://web.archive.org/web/20180731143228/https://pxlnv.com/...


Not only that, but it also has 2 trackers on it (google analytics and carbonads). Hard to take this argument seriously.


I don't use Google Analytics, and Carbon's script is restricted by my CSP to showing the display ad. They also have a reasonable privacy policy where they're not tracking users or generating libraries of behavioural data, as far as I know.


Ah, you're right, it's Piwik. It was blocked by uBlock, though, and it serves the same purpose (user tracking).


Fair. I take what I think is a reasonable and respectful approach, though: it takes only a partial IP address (which I don't look at), it respects Do Not Track, it anonymizes as much as possible, and it's basically a glorified hit counter. It's pretty lightweight, it's the only analytics script I use, and it's localized rather than sending users' data to a giant company.

This article should not be seen as an all-or-nothing approach. It's more the amount and type that concerns me.
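(For what it's worth, the respect-DNT part is the cheap bit client-side. A sketch of the general idea only, not how Piwik actually wires it up - Piwik has its own built-in setting for this - and the script path is made up:)

    // General idea only: skip loading the analytics script entirely when DNT is on.
    // '/stats/hit.js' is a hypothetical path for a self-hosted script.
    const dnt = navigator.doNotTrack === '1' || window.doNotTrack === '1';
    if (!dnt) {
      const s = document.createElement('script');
      s.src = '/stats/hit.js';
      s.async = true;
      document.head.appendChild(s);
    }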


Still, his site is lightweight and loads quickly.


Thank you. This is so embarrassing.


There's nothing embarrassing about it; I'm yet to see a WordPress-based site which was able to withstand the Hug of HackerNews, though SuperCache should have helped.

For a WordPress site, yours is slim and very fast.


I know of a few sites that have a light version. CNN for example: http://lite.cnn.io/en

But I find that articles are so media-heavy these days that removing videos or images detracts from the meaning.


Webfonts need to die. I'm sick of loading a page, seeing the text for a moment, then having it vanish and being forced to wait, possibly for multiple seconds, to see the text again in some pretty font the authors thought I cared about.


http://www.textfiles.com This just reminded me of this site. The site is quite old now, but it used to be a great source of information.


I dunno; if a site takes more than a second to load, goodbye! Then again, surely most of the HN crowd uses something akin to uBlock. What of those other poor souls?


Most websites are not optimized well because optimization costs money and software engineers are expensive. There probably is not a market failure here.


Most web sites send a lot of data that is not for me, the user; it's for the website, to serve ads or track me or do other things that I either don't care about or would much rather they didn't do. The website had to make a conscious choice to include all that stuff; it's not as though it was already there and would have to be optimized out.

In other words, the problem is that most websites are optimized, but not for their users.


Agreed; some bloat is due to ads and trackers, some is due to other things (like lack of optimization).


I appreciate that the author is doing what he's preaching: his page is less than 10 KB, including ads and (self-hosted) analytics.


I agree with the spirit of the article, in that we should be striving for a leaner web.

But last year I had to build an iOS SDK. As an SDK, we wanted it to be as small as humanly possible. It came out to around 12mb, which is obviously too large. So I tried removing literally everything but one file, and it came out to 10mb. So an iOS package, compiled, with only one class, comes out to 10 megabytes.

Yes, the web can improve, but in the grand scheme of things I don't think it's as dire as some make it out to be.


I dare to disagree: 10MB for a single-class-app is "as dire as some make it out to be".

For comparison: Doom 2 came on 4 1.44MB floppy disks, Duke Nukem 3D on 13. Full games, including all assets....

going into a quiet corner for some weeping


Yeah, and if it were only the MBs, well, OK: everyone prefers HD to SD video. But the real problem is that the extra MBs we see are mostly things we don't need, that don't add anything, that we would be better off without, and that in many cases are completely unjustified and hinder not only the user but also the developers themselves (who are sometimes so lost that they don't even realize it).


1 MB RAM was enough to run the Space Shuttle.



I entirely agree with the author's points.

However, that CNN page[0] took just ~5 seconds to load. In fully readable form, with images. Through a three-VPN nested chain, with ~240 msec total latency. But then, I block ads, most scripts, and fonts.

0) https://www.cnn.com/2018/07/24/politics/michael-cohen-donald...


This is why I need something like uMatrix to explicitly whitelist the text and leave the marketing wibble on the server.


I should go work for a company that cares about this stuff. Time to go back to adding Adobe Launch on a dozen sites.


you only need to use a wifi connection on an airplane to see how painful things are - it's like a time warp


The sad thing is that it's not a time warp. It's how millions of people still browse the web today. I know people who are still on 1.5Mb DSL, who are only a couple of miles from the city center. Not only is the line limited to 1.5Mb, but the lines are so deteriorated that you can't get speeds over 768Kb. It's extremely common any time you get even a little bit outside of the city.


It's already painful on less-than-ideal hotel WiFi where, worse, people tend to go to new/uncached content such as maps, weather sites, and regional stuff.


I've done Remote Desktop from an airplane and that was actually quite snappy


Or go to Antarctica. A lot of "normal" web pages don't load.



A good example of the Jevons Paradox


Is it a degradation of the web? Or is it just a revolution in our hardware and internet speeds?


The main problem is and remains: money.

Professional news websites and the like don't work for free.

But the majority of people are not willing to pay.

So: ads and tracking and more ads and more ads, as this is the default business model for "free" services. You pay with your data and attention; nothing new.

I would like this to change towards the way Wikipedia works, for example: no ads, voluntary payment, no paywall, free for everyone, even though only some people pay.

But micropayment services are not good enough or widespread enough, and the average mindset is not there either.

But it could get there slowly, once people realize the true cost of all those "free" services and that paywalls are not nice either.

And besides, even though I agree with the sentiment of the article, the comparison of plain text to a styled article with pictures... is not really valid. I sometimes like reading plain text, but I enjoy a well-done website more, with nice fonts, styles and pictures fitting the flow of information rather than distracting from it. I just don't like advertisement in general, or ads using my CPU to analyse me.


This is one of the reasons I like Jekyll. No PHP, no bullshit. It's fast, pre-compiled pages. You can use web fonts if you like, or keep it all local. Speed usually takes a hit with websites when people try to monetize or add fancy features.


You can still add all that bullshit with Jekyll. Whether a site is dynamically or statically generated has nothing to do with bandwidth waste.


The site would have been nicer with a few images. Also, the text layout was tedious; different fonts should have been used to convey metadata about the information in it.


That CNN article he mentions does take 30 seconds to finish loading... However, you can start reading the article in less than a second. Actual perceived load time is nearly instant.

From the actual end-user's perspective, this article is mostly bullshit.
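(If you want to check this yourself, the gap between "readable" and "finished" shows up directly in the Performance API. Paste something like this in the console once the page settles; the numbers will obviously vary:)

    // Compare "readable" milestones with the full load event.
    const [nav] = performance.getEntriesByType('navigation');
    const [fcp] = performance.getEntriesByName('first-contentful-paint');
    console.log('first contentful paint:', fcp ? Math.round(fcp.startTime) : 'n/a', 'ms');
    console.log('DOMContentLoaded done :', Math.round(nav.domContentLoadedEventEnd), 'ms');
    console.log('load event finished   :', Math.round(nav.loadEventEnd), 'ms');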


For this to change, websites will have to stop chasing growth.


If we have the Dweb, does that make this the Bweb?


This is so monstrously correct! Kudos!


Appstore based browser.


browse in w3m. problem solved.


One bright side to the end of Moore's law is that--until some new computational breakthrough comes along--we are at a web bullshit plateau.

Just think of the stuff they could pull off with ten more years of speed-doublings...


I have a policy of disabling javascript on any article based website.

It's worked out well: most news sites load lightning fast, better viewing experience, no videos. The only downside is companies like the New York Times that embed low-res images and then load the full-res versions in the background using JavaScript.

Here's a funny concept: if disabling JavaScript makes your website better, you failed.


Or just prefix the url with https://outline.com/


Yes, we should definitely go back to the days where there was only one stylesheet per page and the web wasn't accessible to those with visual impairments. Being blind is clearly just a lifestyle choice, and we shouldn't be catering to the blind agenda. /s

In all seriousness, most of the problem with bloat is on the mobile side. But in another ~1.25 years iPhones will have enough advanced LTE functionality that they will be basically the same speed as desktop computers. To whatever extent this is a real problem, it's not going to be nearly as big of an issue after another two or three years.


> we should definitely go back to the days where there was only one stylesheet per page and the web wasn't accessible to those with visual impairments. Being blind is clearly just a lifestyle choice, and we shouldn't be catering to the blind agenda.

How does bogging down pages with ads and tracking scripts and other bloat help the visually impaired? If anything, it should make things even worse for them than for sighted users, since it's easier (not easy, but easier) to navigate by eye to the part of the page you actually want to read than it is to skip over all the cruft using an accessibility add-on.


I can't imagine how poor the web is for blind users nowadays, with all the hyperactive JavaScript fucking with the DOM constantly. I really feel sorry for the poor gits who have to write screen readers.


What do you mean by "bloat is on the mobile side"?


The problem with bloat. That is, web pages are just much slower to download and render on mobile than on desktop. E.g. a page that takes less than a second to load on desktop can easily take ten seconds to load on mobile. But two or three years from now, that same page will load in less than a second also.


You may wish to re-read the article, right from the beginning. Mobile phones are already faster and better than the author's computer attached to his 56K modem way back then.


Oh yay, another article complaining about all the bullshit modern websites want to load, and then also complaining about the best solution so far.

Yes, it sucks that AMP requires loading a chunk of JS from Google. But you know what? It actually makes things better. It solves a real problem, better than anything else is solving any similar problem. Nobody else is successfully convincing publishers to slim down their pages.

If you have a better plan, let's hear it. But bitchy blog posts aren't going to convince your local paper to improve their page speed.



