> It would be really nice to compose objects hierarchically: for example, if I had a widget class composed of several different renderable Rectangles, I want to be able to automatically generate the rendering code for my widget based on the existing rendering logic for its constituent shapes.
So one thing that I've learned about programming: Don't overthink it. How many different object types do you want to serialize? Maybe a hundred? So just write those 100 lines of boilerplate. It's not a huge amount of work, even with the occasional modification. An additional benefit is that you're not painting yourself into a corner - for example, you can leave out that additional bit of redundant information that you hacked in as an optimization.
What's really huge for maintainability is this thing called aspect-oriented programming: Don't litter each of your classes with a line of boilerplate for this and another line for that. Make a file "serialize.cpp" that contains the serialization code for all the types, and nothing else.
In other words, object-oriented is the wrong tool, for almost everything (sorry if that's old news).
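Something like this, sketched with hypothetical Rectangle/Widget types and an Archive stand-in (in real code the types would come in from headers):

    // serialize.cpp - all the serialization logic for every type, nothing else.
    #include <vector>

    struct Archive { void write(int v); /* ... */ };   // hypothetical sink
    struct Rectangle { int x, y, w, h; };
    struct Widget { std::vector<Rectangle> rects; };

    void serialize(Archive& ar, const Rectangle& r) {
        ar.write(r.x); ar.write(r.y); ar.write(r.w); ar.write(r.h);
    }

    void serialize(Archive& ar, const Widget& w) {
        for (const Rectangle& r : w.rects)
            serialize(ar, r);  // the widget's code is just the composition of its parts'
    }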
C++ has "built in" serialization the same way that C does.. Except its not acceptable to use anymore, even though its orders of magnitude faster than the alternatives.. AKA you write your structures out directly as binary data. Bonus if you align them. The limitation is that you can't have classes/structures with actual pointers in them (rather using offsets) and you have to encourage your compiler to pack/align them consistently. Bonus if you override the new/placement allocator so that you can serialize gigabytes of data (including pointers) with a simple map/unmap call.
Frankly, the idea that data should be human readable is complete crap, especially for things which are going to be machine processed far more often than human processed. AKA use XML for your config file, not for the gigabytes of data you send over the wire/store on disk/etc. that will never be read by a human. It's funny that HTML/etc. has everyone in this strange mindset that somehow it's more efficient to build human-readable markup, which 99.9999% of the time is never read by a human and has to be compressed with algorithms that burn up tons of CPU time just to squash the resulting data back down. The untold tons of carbon wasted transferring more data than necessary and/or compressing/decompressing it is insane, all for the possibility that someone will want to read it as it flies over the wire rather than decoding it at the endpoint and viewing it in a debugger...
Dumping raw binary is a problem if you want your data to be readable across platforms or versions of the same code.
It is also a bad idea if reading a complete structure into memory is not possible, or a poor use of time. For example, if you need to read a single part of a gigabyte data set.
None of this is a show stopper, especially with a little planning. AKA pad the structures a bit initially, or put version numbers in them. Even if you fail to do any of that early on, it can be bolted on with a bit of preprocessing as part of a version upgrade. It's really no different than a schema change between version numbers. Plus, if you're really paranoid, you can verify field sizes and offsets at startup time against a previously generated bit of metadata.
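A sketch of the "pad it and verify the layout" idea; the field names and numbers here are made up, the point is that a schema change fails loudly at build time:

    #include <cstddef>
    #include <cstdint>

    struct RecordV2 {
        uint32_t version;      // bumped on every schema change
        uint32_t id;
        float    position[3];
        uint8_t  pad[12];      // headroom for fields added later
    };
    static_assert(sizeof(RecordV2) == 32, "on-disk layout changed");
    static_assert(offsetof(RecordV2, id) == 4, "field moved");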
Also, if you happen to need to pick out a tiny bit of data from some huge tree/whatever data structure, then map it with an appropriate madvise() and simply let the kernel page in the portions you need to access. The whole dataset doesn't need to be read to pick out a couple of bytes here and there. Further, if this is a common use case you probably shouldn't be using this method anyway; store the index separately from the data in a manner that allows them to be accessed independently. This mechanism holds up for multi-TB datasets, and can be tuned by playing with the amount of physical RAM in the machine vs. the speed of the disk. Then when storage-class memory takes off, you don't even have to rewrite your application.
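A POSIX sketch of that, error handling omitted, with RecordV2 being the hypothetical type from above:

    #include <fcntl.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    // Map the file and let the kernel page in only what actually gets touched.
    const RecordV2* map_dataset(const char* path, std::size_t* count) {
        int fd = open(path, O_RDONLY);
        struct stat st;
        fstat(fd, &st);
        void* base = mmap(nullptr, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
        madvise(base, st.st_size, MADV_RANDOM);  // sparse, random access expected
        close(fd);                               // the mapping outlives the descriptor
        *count = st.st_size / sizeof(RecordV2);
        return static_cast<const RecordV2*>(base);
    }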
Finally, for a single-GB dataset, a bulk read is likely well under a one-second operation on most reasonable storage available for server use. Compared with picking off a couple hundred thousand fields from an object storage mechanism, that is going to seem pretty speedy.
Yes, binary data is good. It's better to explicitly serialize/deserialize it, though. That way endianness won't ever become an issue, and the programmer does not need to worry about the runtime memory layout of the structs.
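For instance, a minimal sketch of the field-by-field style (names made up): write each field in an explicit byte order instead of dumping memory, so the format is independent of both host endianness and whatever layout the compiler picked.

    #include <cstdint>
    #include <vector>

    void put_u32_le(std::vector<uint8_t>& out, uint32_t v) {
        out.push_back(static_cast<uint8_t>(v));
        out.push_back(static_cast<uint8_t>(v >> 8));
        out.push_back(static_cast<uint8_t>(v >> 16));
        out.push_back(static_cast<uint8_t>(v >> 24));
    }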
> AKA you write your structures out directly as binary data.
What does that even mean? Depends on what the output should be and how much control you need, I guess.
To comment on the binary vs. text digression - the computation overhead is probably not that much if the data is converted to a (non-portable) internal DOM representation and then typeset/rasterized anyway. And does the bandwidth/storage overhead matter? I assume a short video quickly outweighs the cost of 100%, or in extreme cases 1000%, overhead over a compressed DOM representation. And the human factors are definitely there.
> > It would be really nice to compose objects hierarchically: for example, if I had a widget class composed of several different renderable Rectangles, I want to be able to automatically generate the rendering code for my widget based on the existing rendering logic for its constituent shapes.
Actually, what's interesting is that both Lisp macros and C++ templates allow this to work pretty easily if you set up the boilerplate well enough. All this article is really talking about is making it easier to set up and use that boilerplate (e.g. put it in the language).
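For example, a rough sketch of the C++ template flavor (all names made up): each type lists its members once, and composites fall out of plain overload resolution.

    template <typename Ar>
    void serialize(Ar& ar, int v) { ar.write(v); }

    struct Rect { int x, y, w, h; };
    template <typename Ar>
    void serialize(Ar& ar, const Rect& r) {
        serialize(ar, r.x); serialize(ar, r.y);
        serialize(ar, r.w); serialize(ar, r.h);
    }

    struct Widget { Rect body, border; };
    template <typename Ar>
    void serialize(Ar& ar, const Widget& w) {
        serialize(ar, w.body);    // the composite's serializer is just
        serialize(ar, w.border);  // the composition of its parts'
    }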
> What's really huge for maintainability is this thing called aspect-oriented programming: Don't litter each of your classes with a line of boilerplate for this and another line for that. Make a file "serialize.cpp" that contains the serialization code for all the types, and nothing else.
And that works if your project is not meant to ever be extended. But generally the point of having object-oriented systems is being able to make new objects that can take the place of old ones. Of having abstraction.
If by "abstraction" you mean ADTs - of course there are some (not many) places where they are useful.
And you can easily have them, or change to them later. It's just another few lines of boilerplate to make a few method tables. But I don't like having the implementation logic in the places where the method dicts are wired, especially for interfaces with very little coherency, like GameModel3d: render, click, contextString, serialize... Doing that instead of the aspect-oriented approach is a violation of the "separation of concerns" principle.
Isn't the dream of a reflection library to build serialization using an Observer pattern, where the serialization logic lives unbound from the objects and never mixes with business logic? It would be great if the standard came up with something so broadly useful, allowing developers to focus on their domain instead of on serialization.
Although I agree with your idea, I've written some boilerplate code before to do something very similar. As the project grew it became a constant source of bugs, where someone had updated the code for one type and not another. At this point they're too far gone to have any hope of merging them, but had I known two years ago how much hassle it would cause, I would have refactored it into common code.
Yeah - I agree with both of you. First start with the boilerplate approach, and once there is boilerplate in place for a low number of entity types, say, 2<n<10, figure out if that can be refactored into something less boilerplatey.
But yeah, starting with a complex abstraction for code you've never written before is very seldom a good idea.
Hopefully the next version of the C++ standard will incorporate some form of the unified call syntax proposal, which will make the use of this style of code organization more consistent with traditional one-per-class declaration files.
I learned some things from reading that article. I hadn't heard about those other schemes to do RTTI. I just want it to be as easy as Objective-C was, and I still like the Smalltalk-like API calls that were in that.
But the article reminds me why I still feel frustrated that my programming work is divided between Java and C++. There are so many things that are easy in Java and hugely painful in C++, and C++ slowly adds those ideas, yet each addition makes the language more painful to use over time. I basically want to throw C++ away and replace it with something more regular that doesn't expose so many complexities to the end programmer, like the endless nuances of memory allocation, moving, &&, r-values, etc. But there's no hope of that. For the rest of my life, every year or two there will be additions to C++ that I will have to learn about and use and gradually discover their own warts, yet Java will keep cruising along.
What I want is a language over C++, as TypeScript is over JavaScript, that keeps compatibility with C and C++ code (Qt, Boost...) and provides a clear syntax and some advanced features (such as reflection), even paying a price for them. That's the trade: no more zero-cost abstraction.
The closest thing I can think of would be the subset of C++ used in Stepanov and McJones' Elements of Programming, with the addition of C++11 stuff like lambdas and std::function, plus (hopefully) C++20 stuff like compile-time reflection, concepts, and ranges.
No, D is a separate language. A bit too much so. Ideally you would be able to move between one version of the language and another at the class level, to allow an incremental migration of the code that must be "modernized".
Cruising along with frequent pit stops for garbage collection.
All jokes aside I agree with you to some extent: C++ can be frustrating to work with, and there are a lot of quirks that make it really easy to blow your foot off.
These quirks are not without benefit though. Being able to control how memory is allocated is hugely important in performance-sensitive applications where pointer chasing is not tolerable. The same can be said about r-values/move semantics.
You can get those benefits without the complexity of C++ if you're willing to break with the past (and as for the newer features, it's possible C++ could have implemented them more straightforwardly, though maybe they were already designed into a corner).
For example, look at C++'s concept of move semantics. The language still runs destructors regardless of whether a value has been moved. This means every movable class has to have a valid "empty" state the destructor can check to avoid double frees. That means you need move constructors which can run arbitrary code. And that means generic code that does moves has to care about exception safety.
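For instance, even a minimal unique-ownership handle (hypothetical type) ends up carrying that "empty" state:

    struct Handle {
        int* p = nullptr;

        explicit Handle(int v) : p(new int(v)) {}

        // Move constructor: steal the resource, then null out the source so
        // its destructor becomes a no-op instead of a double free.
        Handle(Handle&& other) noexcept : p(other.p) { other.p = nullptr; }

        Handle(const Handle&) = delete;
        Handle& operator=(const Handle&) = delete;

        ~Handle() { delete p; }  // still runs on moved-from objects
    };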
Instead, the language could statically avoid running the destructors of moved values. No more rvalue zoo needed to trigger a move, no more mandatory "empty" states because the destructor can assume a valid object, no more move constructors because they all reduce to memcpy, and thus no more exception safety problems and an easier job for the optimizer.
If you implement a move as a swap (as in the examples I saw when I first heard of move constructors, back when it was called C++0x), you don't need an empty state.
Not being able to keep track of which cases call a destructor and which ones do not sounds like a much bigger problem than the one you are talking about. It doesn't seem too hard to avoid the temptation to throw in a move constructor.
You can't always swap. For example, you may want to use a movable but non-copyable type to keep un-aliased access to a single resource (see linear/affine types) and the compiler treating the move source as dead is useful there. Or maybe you don't want to allocate anything for the move target because the resource you're managing is expensive.
Keeping track of when a destructor gets called is not a huge problem at all. From the user's perspective it's airtight; proper uninitialized-value analysis in the compiler makes it impossible to screw up. From the implementation perspective, all you need is the occasional flag on the stack (whose value is already calculated) for situations like `if ... { use_by_move(obj) }`, and a rule against moving things out of fields that you're otherwise not responsible for destructing (you can still swap there).
> If you implement a move as a swap (as in the examples I saw when I first heard of move constructors, back when it was called C++0x), you don't need an empty state.
You do, to e.g. default-construct the temporary you'll be swapping with in the first place. That's problematic if you're trying to construct e.g. a reference-like type with no null state.
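A sketch of why, with a hypothetical Buf type: the move constructor has to construct *this as something before it has anything to swap.

    #include <utility>

    struct Buf {
        int* data = nullptr;  // <- the "null state" in question
        Buf() = default;      // required so there is something to swap with
        Buf(Buf&& other) noexcept : Buf() { std::swap(data, other.data); }
        ~Buf() { delete data; }
    };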
I am writing a game engine and C++'s lack of proper reflection is a real hindrance for a variety of serialisation/deserialisation interfaces, be it networking or saving things to disk and loading them. You just end up writing redundant and more brittle code for no reason other than that a language feature is missing. I think it is great that this problem is given attention.
Edit: After thinking on it for a few minutes. All I want is a generic function that takes a class as the generic type T, and a string as an argument, and returns me a completely filled in object of type T, or throws an error on malformed input. I really don't think I am asking for too much here.
I'm not sure about your particular issue, but every time I had to choose between being smart and generic with serialization and being straightforward and boring (at the cost of writing one small line of code for each and every property), I always chose to stay boring.
Not only is it nicer to the person who's going to maintain your code and may not know about your smart trick, but with generic code there is always an exception that makes you write some code somewhere that breaks the abstraction.
> and returns me a completely filled in object of type T
What if type T contains a pointer to type U? Or a list of pointers to type U, with some duplicates in the list? What if T contains a floating-point number, and the application is compiled with -ffast-math?
What if the object contains a reference to some unknown type (a file, or socket)?
There are lots of ways of solving these problems, but each of the solutions has trade-offs that may be acceptable to you but aren't acceptable to me, and vice versa.
> All I want is a generic function that takes a class as the generic type T, and a string as an argument, and returns me a completely filled in object of type T, or throws an error on malformed input.
Well, what you need is a way to enumerate the members of a class. Everything else is already done by something like Cereal or Boost.Serialization.
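For example, with Cereal the member listing is the only hand-written part; a rough sketch with a hypothetical Point type:

    #include <cereal/archives/json.hpp>
    #include <sstream>
    #include <string>

    struct Point {
        int x = 0, y = 0;
        template <class Archive>
        void serialize(Archive& ar) { ar(x, y); }  // the part reflection would generate
    };

    // Close to the wished-for signature: throws on malformed input.
    template <class T>
    T from_json(const std::string& s) {
        std::istringstream in(s);
        cereal::JSONInputArchive ar(in);
        T obj;
        ar(obj);
        return obj;
    }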
.NET allows you to use reflection to emit bytecode which does the same work as reflection. More complicated to set up, but the speed-up is huge. Not sure if Java has an equivalent? I'd be curious to know.
Yes. You generate bytecode for classes that are loaded by a class loader, and thereby enter the JIT world etc. The tech predates .NET, but it isn't as developer friendly as DynamicMethod etc.
The author made a small mistake in this article which may be worth pointing out: -fno-rtti does not disable the generation of vtables (which would break virtual method dispatch) but only runtime type information. The two are connected in that RTTI is generally stored with an object's vtable, so objects with no virtual methods don't get it. The alternative would be to include a pointer to type information in all objects without a vtable, but the amount of memory this would require would be substantial, and it would break expectations about the size and memory layout of "plain old data" structs.
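For example (a sketch; the commented-out lines are the ones -fno-rtti rejects):

    struct Base {
        virtual void draw() {}
        virtual ~Base() = default;
    };
    struct Derived : Base {
        void draw() override {}
    };

    void f(Base& b) {
        b.draw();                       // fine: plain vtable dispatch
        // typeid(b);                   // rejected under -fno-rtti
        // dynamic_cast<Derived*>(&b);  // rejected under -fno-rtti
    }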
Is anyone doing a strong constexpr hash for use in compile-time serialization? It would be impressive to combine compile-time serialization with a consistent method of evaluating type differences. I've heard of people doing this with a dedicated compile/link step.
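The usual constexpr starting point would be something like FNV-1a - not cryptographically strong, but it sketches the shape (the layout string here is hypothetical):

    #include <cstdint>
    #include <string_view>

    constexpr std::uint64_t fnv1a(std::string_view s) {
        std::uint64_t h = 0xcbf29ce484222325ull;  // FNV offset basis
        for (char c : s) {
            h ^= static_cast<unsigned char>(c);
            h *= 0x100000001b3ull;                // FNV prime
        }
        return h;
    }

    // Two different layout descriptions hash differently at compile time.
    static_assert(fnv1a("Record{u32 id; f32 pos[3];}") !=
                  fnv1a("Record{u32 id;}"));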
Generally, trying to come up with complex compile-time higher-level abstractions in C++ is a waste of time. A better idea is to figure out a relational schema for the runtime representation of the data and represent that with a combination of standard data structures and PODs. I really, really would rather see mental effort spent on systems design and on simplifying the problem, not on finding the best way to contort C++ into some other language's paradigm (at the compiler level - the runtime behaviour of any complex dynamic data might start to look like Lisp, but please don't try to make the compiler understand that).
>$, as a way of getting object metadata out of a class, namespace, etc. (Using $ has been argued out of favor because it is common in legacy code, particularly in code generation and template systems which are not necessarily valid C++ but produce C++ sources.)
you have got to be kidding me, right?
Adding the "$" character to C++ too? What's left that we haven't added?
Why not just use something like "reflexpr", which is more C++ish than the $ character? Why make C++ more difficult to parse and read with every release?
TBH I genuinely don't understand why we need both. Why not only have "reflexpr"?
$ is already a valid identifier character in a clang and gcc extension. And there won't be both reflexpr and $; those are two different proposals. And due to the extension, it won't be $ (most likely).
Can someone remind me what that's for? I never use it, I keep trying to look it up and then I forget it again. Maybe because I never quite understood it properly.
I'm batting 0 for 2 so far in this thread, but I'll take a stab at it now that I've been reminded of those crazy beasts' existence. :)
    struct Foo {
        int i;
        int inc() { return ++i; }
    };

    int Foo::*i_ptr = &Foo::i;          // Pointer to member
    int (Foo::*inc_ptr)() = &Foo::inc;  // Pointer to method

    Foo my_foo{0};
    (&my_foo)->i_ptr;     // Error: my_foo has no member "i_ptr"
    (&my_foo)->*i_ptr;    // OK
    my_foo.inc_ptr();     // Error: my_foo has no member "inc_ptr"
    (my_foo.*inc_ptr)();  // OK - parens needed, the call () binds tighter than .*

    // This is also why you can't create references
    // to methods or other class members
    int (&print)(const char*) = std::puts;  // OK
    int (Foo::&inc_ref)() = Foo::inc;       // Error: no such type
    int& my_int = my_foo.i;                 // OK
    int Foo::&i_ref = Foo::i;               // Error: no such type
It's basically a way to use the result of a pointer to a member.
Let's say you have
    struct Foo
    {
        int m_bar;
        int m_baz;
    };
You want to store off which member of Foo you want to reference, but you want to do it in a way not tied to any specific instance of Foo.
    int Foo::*ptr = &Foo::m_baz;  // Basically offsetof(), but safely typed.
Cool, but now what are you going to do with that member? What if I want to set it to 10?
    int Foo::*ptr = &Foo::m_baz;
    std::vector<Foo*> foos;
    ...  // populate foos
    for (Foo* foo : foos)
    {   // For each foo, set whatever member ptr points at to 10
        foo->*ptr = 10;
    }
I've never seen it really used in practice (usually I see offsetof() and pointer math in this situation, because the object type isn't known beforehand), but it's an option.