An Introduction to Reflection in C++ (jackieokay.com)
140 points by frostmatthew on April 15, 2017 | 64 comments


> It would be really nice to compose objects hierarchally: for example, if I had a widget class composed of several different renderable Rectangles, I want to be able to automatically generate the rendering code for my widget based on the existing rendering logic for its constituent shapes.

So one thing that I've learned about programming: Don't overthink it. How many different object types do you want to serialize? Maybe a hundred? So just write those 100 lines of boilerplate. Not a huge amount of work, even when there is the occasional modification. An additional benefit is that you're not painting yourself into a corner: for example, you can leave out that additional bit of redundant information that you hacked in as an optimization.

What's really huge for maintainability is this thing called aspect-oriented programming: Don't litter each of your classes with a line of boilerplate for this and another line for that. Make a file "serialize.cpp" that contains the serialization code for all the types, and nothing else.
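A minimal sketch of that layout, assuming two made-up types (`Rectangle` and `Widget` are illustrative names, not taken from the article's code). The data types carry no serialization logic; one overload per type lives together in a single file:

```cpp
// serialize.cpp (file name from the comment above; contents are a hypothetical sketch)
#include <cassert>
#include <sstream>
#include <string>

// Plain data types with no serialization code inside them.
struct Rectangle { int x, y, w, h; };
struct Widget { Rectangle frame; Rectangle button; };

// One free-function overload per type, all kept in this one place.
void serialize(std::ostream& out, const Rectangle& r) {
    out << r.x << ' ' << r.y << ' ' << r.w << ' ' << r.h;
}

// A composite type reuses the overloads of its parts.
void serialize(std::ostream& out, const Widget& wd) {
    serialize(out, wd.frame);
    out << ' ';
    serialize(out, wd.button);
}

// Convenience wrapper that serializes a Widget into a string.
std::string widget_to_string(const Widget& wd) {
    std::ostringstream out;
    serialize(out, wd);
    assert(!out.fail());  // writing ints to a string stream should not fail
    return out.str();
}
```

Adding a new type means adding one more overload here, and nothing in the type itself changes.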

In other words, object-oriented is the wrong tool, for almost everything (sorry if that's old news).


C++ has "built in" serialization the same way that C does, except it's no longer considered acceptable to use, even though it's orders of magnitude faster than the alternatives. That is, you write your structures out directly as binary data. Bonus if you align them. The limitation is that you can't have classes/structures with actual pointers in them (use offsets instead), and you have to encourage your compiler to pack/align them consistently. Bonus if you override the new/placement allocator so that you can serialize gigabytes of data (including pointers) with a simple map/unmap call.
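As a rough sketch of what "write your structures out directly" can look like, assuming a made-up `Record` type that links to its neighbour by byte offset instead of by pointer. The byte vector stands in for a file or socket, and the result is only readable by a build with the same layout and endianness:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical record: fixed-width fields, and an offset where a pointer would be.
struct Record {
    std::uint32_t id;
    std::uint32_t next_offset;  // byte offset of the next record, 0 for "none"
    double        value;
};

// Raw copies are only safe for trivially copyable types.
static_assert(std::is_trivially_copyable<Record>::value,
              "Record must be trivially copyable to be dumped as raw bytes");

// "Serialize": copy the in-memory bytes as they are.
std::vector<unsigned char> dump(const std::vector<Record>& records) {
    std::vector<unsigned char> bytes(records.size() * sizeof(Record));
    if (!bytes.empty()) {
        std::memcpy(bytes.data(), records.data(), bytes.size());
    }
    return bytes;
}

// "Deserialize": copy the bytes back into properly aligned storage.
std::vector<Record> load(const std::vector<unsigned char>& bytes) {
    assert(bytes.size() % sizeof(Record) == 0);  // reject truncated input
    std::vector<Record> records(bytes.size() / sizeof(Record));
    if (!records.empty()) {
        std::memcpy(records.data(), bytes.data(), bytes.size());
    }
    return records;
}
```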

Frankly, the idea that data should be human readable is complete crap, especially for things which are going to be machine processed far more than human processed. In other words, use XML for your config file, not for the gigabytes of data you send over the wire/store on disk/etc. that will never be read by a human. It's funny that HTML/etc. has everyone in this strange mindset that somehow it's more efficient to build human-readable markup, which 99.9999% of the time is never read by a human, and has to be compressed with algorithms that further burn up tons of CPU time in order to squash the resulting data. The untold tons of carbon wasted transferring more data than necessary and/or compressing/decompressing it is insane for the possibility that someone will want to read it as it flies over the wire, rather than decoding it at the endpoint and viewing it in a debugger...


Dumping raw binary is a problem if you want your data to be readable across platforms or versions of the same code.

It is also a bad idea if reading a complete structure into memory is not possible, or a poor use of time. For example, if you need to read a single part of a gigabyte data set.


None of this is a show stopper, especially with a little planning: pad the structures a bit initially, or put version numbers in them. Even if you fail to do any of that early on, it can be bolted on with a bit of preprocessing as part of a version upgrade. It's really no different than a schema change between version numbers. Plus, if you're really paranoid, you verify field sizes and offsets at startup time against a previously generated bit of metadata.
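One way to read the "pad, version, and verify offsets" advice, as a sketch only. `SaveHeader`, `LayoutInfo`, and the version number are all hypothetical; the idea is that the producing build stores a small layout description next to the data and the consuming build compares it with its own:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical on-disk header with a version number and spare padding.
struct SaveHeader {
    std::uint32_t version;       // bumped on every layout change
    std::uint32_t flags;
    std::uint64_t payload_size;
    std::uint8_t  reserved[16];  // room for future fields
};

// Compile-time guard: an accidental layout change breaks the build.
static_assert(sizeof(SaveHeader) == 32, "SaveHeader size changed");
static_assert(offsetof(SaveHeader, payload_size) == 8, "payload_size moved");

// Metadata that would be written alongside the data by the producing build.
struct LayoutInfo {
    std::uint32_t version;
    std::uint32_t header_size;
    std::uint32_t payload_size_offset;
};

const std::uint32_t kCurrentVersion = 2;  // illustrative value

// Layout of the struct as this build sees it.
LayoutInfo current_layout() {
    LayoutInfo info;
    info.version = kCurrentVersion;
    info.header_size = static_cast<std::uint32_t>(sizeof(SaveHeader));
    info.payload_size_offset =
        static_cast<std::uint32_t>(offsetof(SaveHeader, payload_size));
    return info;
}

// Runtime check: does the stored metadata match this build?
bool layout_matches(const LayoutInfo& stored) {
    const LayoutInfo now = current_layout();
    return stored.version == now.version &&
           stored.header_size == now.header_size &&
           stored.payload_size_offset == now.payload_size_offset;
}

// Paranoid startup check, as suggested above.
void check_layout_at_startup(const LayoutInfo& stored) {
    assert(layout_matches(stored) && "stored data was written with another layout");
}
```

A mismatch is the point where the "bit of preprocessing as part of a version upgrade" would run instead of the assert.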

Also, if you happen to need to pick out a tiny bit of data from some huge tree/whatever data structure, then map it with an appropriate madvise() and simply let the kernel page in the portions you need to access. The whole dataset doesn't need to be read to pick out a couple of bytes here and there. Further, if this is a common use case, you probably shouldn't be using this method anyway; instead, store the index separately from the data in a manner which allows them to be accessed independently. This mechanism holds up for multi-TB datasets, and can be tuned by playing with the amount of physical RAM in the machine vs the speed of the disk. Then when storage class memory takes off, you don't even have to rewrite your application.

Finally, for a single GB dataset, a bulk read is likely well under a second on most reasonable storage mechanisms available for server use. Compared with picking off a couple hundred thousand fields from an object storage mechanism, this is going to seem pretty speedy.


Yes, binary data is good. It's better to explicitly serialize/deserialize it, though. That way endianness won't become an issue ever, and the programmer does not need to be concerned over runtime memory layout of the structs.
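A small sketch of explicit serialization for one 32-bit field. The function names are made up. The bytes are produced with shifts, so the result is little-endian on the wire regardless of the host's byte order or struct layout:

```cpp
#include <cassert>
#include <cstdint>

// Write v into out[0..3] as little-endian bytes, independent of host endianness.
void write_u32_le(unsigned char* out, std::uint32_t v) {
    assert(out != nullptr);
    out[0] = static_cast<unsigned char>(v & 0xFFu);
    out[1] = static_cast<unsigned char>((v >> 8) & 0xFFu);
    out[2] = static_cast<unsigned char>((v >> 16) & 0xFFu);
    out[3] = static_cast<unsigned char>((v >> 24) & 0xFFu);
}

// Read a little-endian 32-bit value from in[0..3].
std::uint32_t read_u32_le(const unsigned char* in) {
    assert(in != nullptr);
    return  static_cast<std::uint32_t>(in[0])
         | (static_cast<std::uint32_t>(in[1]) << 8)
         | (static_cast<std::uint32_t>(in[2]) << 16)
         | (static_cast<std::uint32_t>(in[3]) << 24);
}
```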


At this point little endian has pretty much won. Even IBM with their POWER architecture now uses little endian for their machines targeting Linux.

It's far more efficient to assume little endian and let the weird machines do their own translations when loading the data.


> AKA you write your structures out directly as binary data.

What does that even mean? Depends on what the output should be and how much control you need, I guess.

To comment on the binary vs text digression: computation overhead is probably not that much if the data is converted to a (non-portable) internal DOM representation and then typeset / rasterized anyway. And is bandwidth/storage overhead really a problem? I assume a short video quickly outweighs the costs for 100%, or in extreme cases 1000%, overhead compared to a compressed DOM representation. And the human factors are definitely there.


> > It would be really nice to compose objects hierarchally: for example, if I had a widget class composed of several different renderable Rectangles, I want to be able to automatically generate the rendering code for my widget based on the existing rendering logic for its constituent shapes.

Actually, what's interesting is that both Lisp macros and C++ templates allow this to work pretty easily if you set up the boilerplate well enough. All this article is really talking about is making it easier to set up and use that boilerplate (e.g. put it in the language).

> What's really huge for maintainability is this thing called aspect-oriented programming: Don't litter each of your classes with a line of boilerplate for this and another line for that. Make a file "serialize.cpp" that contains the serialization code for all the types, and nothing else.

And that works if your project is not meant to ever be extended. But generally the point of having object oriented systems is being able to make new objects that can take the place of old ones. Of having abstraction.


If by "abstraction" you mean ADTs - of course there are some (not many) places where they are useful.

And you can easily have them or change to them later. It's just another few lines of boilerplate to make a few method tables. But I don't like to have the implementation logic in the places where method dicts are wired, especially for interfaces with very little coherency, like GameModel3d: render, click, contextString, serialize... Doing that instead of aspect-oriented is a violation of the "separation of concerns" principle.


Isn't the dream of a reflection library to build serialization using an Observer pattern, where the serialization logic is unbounded and business logic never mixes? It would be great if the standard came up with something that would be so broadly useful and allow developers to focus on their domain instead of serialization.


> So just write those 100 lines of boilerplate.

Although I agree with your idea, I've written some boilerplate code before to do something very similar. As the project grew, it's become a constant source of bugs where someone has updated the code for one type and not another. At this point they're too far gone to have any hope of merging them, but had I known two years ago how much hassle it would cause, I would have refactored it into common code.


Yeah - I agree with both of you. First start with the boilerplate approach, and once there is boilerplate in place for a low number of entity types, say, 2<n<10, figure out if that can be refactored into something less boilerplatey.

But yeah, starting with a complex abstraction for code you've never written before is very seldom a good idea.


Hopefully the next version of the C++ standard will incorporate some form of the unified call syntax proposal, which will make the use of this style of code organization more consistent with traditional one-per-class declaration files.


I learned some things from reading that article. I hadn't heard about those other schemes to do RTTI. I just want it to be as easy as Objective-C was, and I still like those Smalltalk-like API calls that were in that.

But the article reminds me why I still feel frustrated that my programming work is divided between Java and C++. There are so many things that are easy in Java and hugely painful in C++, and C++ slowly adds those ideas, yet it makes the language more painful to use over time. I basically want to throw C++ away and replace it with something that is more regular and doesn't expose so many complexities to the end programmer, like the endless nuances of memory allocation, moving, &&, rvalues etc. But there's no hope of that. For the rest of my life, every year or two there will be additions to C++ that I will have to learn about and use and gradually discover their own warts, yet Java will still keep cruising along.


What I want is a language over C++, as TypeScript is over JavaScript, that keeps compatibility with C and C++ code (Qt, Boost...) and provides a clear syntax and some advanced features (such as reflection), even if that means paying a price for it. That's it, no more zero-cost abstraction.


I'm doing that, but it's more a CoffeeScript than a TypeScript (no semicolons and no commas).

An example:

    fn add(x1 x2)
    {
     return <x1+x2>
    }

    fn <main>(count:int arguments:<const char** const>):int
    {
     (<printf> "%d\n" (add 2 3))

     return 0
    }

Is converted to:

    auto add(const auto x1,const auto x2)
    {
     return x1+x2;
    }

    int main(const int count, const char** const arguments)
    {
     printf("%d\n",add(2,3));

     return 0;
    }

Everything inside the <angle brackets> is left as is in the C++ output.

I use S-expressions for function calls.



The closest thing I can think of would be the subset of C++ used in Stepanov + McJones' Elements of Programming, with the addition of C++11 stuff like lambdas and std::function, plus (hopefully) C++20 stuff like compile-time reflection, concepts and ranges.


That is a great idea and great metaphor to describe your new language.


Like D?


No, D is a separate language. A bit too much. Ideally you would be able to move between one version of the language and another at class level, to allow an incremental migration of the code that must be "modernized".


Cruising along with frequent pit stops for garbage collection.

All jokes aside I agree with you to some extent: C++ can be frustrating to work with, and there are a lot of quirks that make it really easy to blow your foot off.

These quirks are not without benefit though. Being able to control how memory is allocated is hugely important in performance-sensitive applications where pointer chasing is not tolerable. The same can be said about rvalues/move semantics.


You can get those benefits without the complexity of C++ if you're willing to break with the past (and as for the newer features, it's possible C++ could have implemented them more straightforwardly, though maybe they were already designed into a corner).

For example, look at C++'s concept of move semantics. The language still runs destructors regardless of whether a value has been moved. This means every movable class has to have a valid "empty" state the destructor can check to avoid double frees. That means you need move constructors which can run arbitrary code. And that means generic code that does moves has to care about exception safety.

Instead, the language could statically avoid running the destructors of moved values. No more rvalue zoo needed to trigger a move, no more mandatory "empty" states because the destructor can assume a valid object, no more move constructors because they all reduce to memcpy, and thus no more exception safety problems and an easier job for the optimizer.
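The current C++ behaviour described here can be shown with a minimal sketch. `Handle` is a made-up resource wrapper, and the counter stands in for a real "free" call. Both destructors run after the move, so the moved-from object needs a null state to avoid releasing twice:

```cpp
#include <cassert>
#include <utility>

// Hypothetical resource handle; incrementing *release_count models a real free().
struct Handle {
    int* release_count;  // nullptr is the "empty" moved-from state

    explicit Handle(int* counter) : release_count(counter) {
        assert(counter != nullptr);
    }

    // The move constructor must leave the source in the empty state,
    // because the source's destructor will still run later.
    Handle(Handle&& other) noexcept : release_count(other.release_count) {
        other.release_count = nullptr;
    }

    Handle(const Handle&) = delete;
    Handle& operator=(const Handle&) = delete;

    // The destructor has to check for the empty state on every call.
    ~Handle() {
        if (release_count) {
            ++*release_count;
        }
    }
};

// Returns how many times the resource was released after one move.
int releases_after_move() {
    int count = 0;
    {
        Handle a(&count);
        Handle b(std::move(a));
    }  // destructors of both b and a run here; only b releases
    return count;
}
```

Without the null check the counter would end up at 2, which is the double free the comment refers to.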


If you implement a move as a swap (as in the examples I saw when I first heard of move constructors back when it was called c++0x) you don't need an empty state.

Not being able to keep track of which cases call a destructor and which ones do not sounds like a much bigger problem than the one you are talking about. It doesn't seem too hard to avoid the temptation to throw in a move constructor.


You can't always swap. For example, you may want to use a movable but non-copyable type to keep un-aliased access to a single resource (see linear/affine types) and the compiler treating the move source as dead is useful there. Or maybe you don't want to allocate anything for the move target because the resource you're managing is expensive.

Keeping track of when a destructor gets called is not a huge problem at all. From the user perspective it's airtight: proper uninitialized value analysis in the compiler makes it impossible to screw up. From the implementation perspective all you need is the occasional flag on the stack (whose value is already calculated) for situations like `if ... { use_by_move(obj) }`, and a rule against moving things out of fields that you're otherwise not responsible for destructing (you can still swap there).


> If you implement a move as a swap (as in the examples I saw when I first heard of move constructors back when it was called c++0x) you don't need an empty state.

You do, e.g. to default-construct the temporary you'll be swapping with in the first place. That's problematic if you're trying to construct e.g. a reference-like type with no null state.


How do you handle move as a swap for something like a socket, or a file handle in that case?


These typically have null-like states in C++. If we were making a std::ifstream clone without templates, inheritance, etc. we might write...

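  #include <cstdio>   // std::FILE, std::fopen, std::fclose
  #include <utility>  // std::swap

  class ifstream {
    std::FILE* f;  // null means "no open file"
    ifstream(const ifstream&) = delete;
    ifstream& operator=(const ifstream&) = delete;
  public:
    ifstream(): f() {}
    explicit ifstream(const char* path): f(std::fopen(path, "rb")) {}
    ~ifstream() { if (f) std::fclose(f); }
    
    // Move construction: start out null, then take the source's handle.
    ifstream(ifstream&& original): f() {
      std::swap(f, original.f);
    }
    
    // Move assignment: the source ends up holding (and later closing) our old handle.
    ifstream& operator=(ifstream&& original) {
      std::swap(f, original.f);
      return *this;
    }
    
    // ...
  };

This is very similar to the common pattern of implementing vanilla assignment in terms of copy construction and a swap.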

EDIT: Added deleted copy ctor/assignment ops because I'm not a savage.
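For comparison, here is a sketch of that "assignment in terms of copy construction and a swap" pattern on a made-up `Buffer` class (not from the comment above). The assignment operator takes its argument by value, so the copy is made by the copy constructor and the old contents are released when the parameter goes out of scope:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical owning buffer used to illustrate copy-and-swap.
class Buffer {
    std::size_t size_;
    int* data_;
public:
    explicit Buffer(std::size_t n) : size_(n), data_(new int[n]()) {}

    Buffer(const Buffer& other) : size_(other.size_), data_(new int[other.size_]) {
        std::copy(other.data_, other.data_ + other.size_, data_);
    }

    ~Buffer() { delete[] data_; }

    friend void swap(Buffer& a, Buffer& b) noexcept {
        using std::swap;
        swap(a.size_, b.size_);
        swap(a.data_, b.data_);
    }

    // Copy-and-swap: 'other' is already a copy; steal its contents
    // and let its destructor free what we used to own.
    Buffer& operator=(Buffer other) {
        swap(*this, other);
        return *this;
    }

    std::size_t size() const { return size_; }

    int& operator[](std::size_t i) {
        assert(i < size_);
        return data_[i];
    }
};
```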


In the move constructor of the object wrapping an open file, swap the integer descriptors.


Have you had a look at D, yet?


A shame that D[1] wasn't mentioned under other alternatives as it has a comprehensive system for compile-time introspection built in!

[1] http://dlang.org/


I am writing a game engine and C++'s lack of proper reflection is a real hindrance for a variety of serialisation/deserialisation interfaces, be it networking or saving things to disk and loading them. You just end up writing redundant and more brittle code for no reason other than that a language feature is missing. I think it is great that this problem is given attention.

Edit: After thinking on it for a few minutes. All I want is a generic function that takes a class as the generic type T, and a string as an argument, and returns me a completely filled in object of type T, or throws an error on malformed input. I really don't think I am asking for too much here.


I'm not sure about your particular issue, but every time I had to choose between being smart and generic with serialization and being straightforward and boring (at the cost of writing one small line of code for each and every property), I always chose to stay boring.

Not only is it nicer to the person that's going to maintain your code and may not know about your smart trick, but with generic code there is always an exception that makes you need to write some code somewhere that breaks the abstraction.


Same here. Generic serialization often breaks when there are some changes to the data and you have to be compatible with different versions.


> and returns me a completely filled in object of type T

What if type T contains a pointer to type U? Or a list of pointers to a type U, with some duplicates in the list? What if T contains a floating point number, and the application is compiled with -ffast-math?

What if the object contains a reference to some unknown type (a file, or socket)?

There are lots of ways of solving these problems, but each of the solutions has trade offs which may be acceptable to you, but aren't acceptable to me and vice versa.


> All I want is a generic function that takes a class as the generic type T, and a string as an argument, and returns me a completely filled in object of type T, or throws an error on malformed input.

Well, what you need is a way to enumerate the members of a class. Everything else is already done by something like Cereal or Boost.Serialization.
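To illustrate the missing piece, here is a sketch where the member enumeration is written by hand. `Player`, `visit_members`, and `from_string` are all hypothetical names. With language-level reflection, `visit_members` is the part that could be generated; the generic function on top of it would stay the same:

```cpp
#include <cassert>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical game type.
struct Player {
    int health;
    double speed;
};

// Hand-written member enumeration: the boilerplate reflection would replace.
template <typename Visitor>
void visit_members(Player& p, Visitor&& visit) {
    visit("health", p.health);
    visit("speed", p.speed);
}

// Visitor that reads one whitespace-separated field per member.
struct FieldReader {
    std::istream& in;

    template <typename T>
    void operator()(const char* name, T& field) {
        assert(name != nullptr);
        if (!(in >> field)) {
            throw std::runtime_error(std::string("malformed field: ") + name);
        }
    }
};

// Generic "string in, filled-in object out" function; throws on malformed input.
template <typename T>
T from_string(const std::string& text) {
    std::istringstream in(text);
    T object{};
    visit_members(object, FieldReader{in});
    return object;
}
```

A call such as `from_string<Player>("90 1.5")` fills both fields, and input that cannot be parsed raises `std::runtime_error`. The text format here is a toy one chosen for brevity.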


Usually you might want to instead use code generation at compile time to do these things, even in Java.

Reflection is always orders of magnitude slower.


.NET allows you to use reflection to emit bytecode which does the same work as reflection. More complicated to set up, but the speed-up is huge. Not sure if Java has an equivalent? I'd be curious to know.


Yes. You generate bytecode for classes that are loaded by a class loader, and thereby enter the JIT world etc. The tech predates .NET, but it isn't as developer friendly as DynamicMethod etc.


The author made a small mistake in this article which may be worth pointing out: -fno-rtti does not disable the generation of vtables (which would break virtual method dispatch) but only runtime type information. The two are connected in that RTTI is generally stored with an object's vtable, so objects with no virtual methods don't get it. The alternative would be to include a pointer to type information in all objects without a vtable, but the amount of memory this would require would be substantial, and it would break expectations about the size and memory layout of "plain old data" structs.
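A small sketch of the distinction, with made-up `Shape` types. The code relies only on virtual dispatch and contains no `typeid` or `dynamic_cast`, which are the features that need RTTI, so it should build and behave the same with GCC or Clang's `-fno-rtti`:

```cpp
#include <cassert>
#include <vector>

// Hypothetical class hierarchy that uses virtual dispatch only.
struct Shape {
    virtual ~Shape() {}
    virtual int sides() const = 0;
};

struct Triangle : Shape {
    int sides() const override { return 3; }
};

struct Square : Shape {
    int sides() const override { return 4; }
};

// Each call goes through the vtable; no type information is queried.
int total_sides(const std::vector<const Shape*>& shapes) {
    int total = 0;
    for (const Shape* s : shapes) {
        assert(s != nullptr);
        total += s->sides();
    }
    return total;
}
```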


True, I will correct this in the article.


Is anyone doing a strong constexpr hash for use in compile-time serialization? It would be impressive to combine compile-time serialization with a consistent method of evaluating type differences. I've heard of people doing this with a dedicated compile/link step.


Be careful with doing this - the serialized identifier for the type needs to be separate from the type name or else you can never rename.
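A sketch of that separation, under stated assumptions: `fnv1a` below is 32-bit FNV-1a, which is a simple non-cryptographic hash and only a stand-in for whatever "strong" hash is wanted, and `wire_id` and `PlayerState` are hypothetical names. The identifier is hashed from an explicit string, so renaming the C++ type does not change what is written to disk:

```cpp
#include <cstdint>

// 32-bit FNV-1a, written recursively so it is a valid C++11 constexpr function.
constexpr std::uint32_t fnv1a(const char* s, std::uint32_t h = 2166136261u) {
    return *s == '\0'
        ? h
        : fnv1a(s + 1,
                (h ^ static_cast<std::uint32_t>(static_cast<unsigned char>(*s)))
                    * 16777619u);
}

// Primary template is intentionally undefined: each type must opt in.
template <typename T>
struct wire_id;

// Hypothetical serializable type.
struct PlayerState {
    int hp;
};

// The serialized identifier comes from a fixed string, not from the type name,
// so the struct can be renamed later without invalidating old data.
template <>
struct wire_id<PlayerState> {
    static constexpr std::uint32_t value = fnv1a("game.player_state");
};

// Evaluated entirely at compile time.
static_assert(wire_id<PlayerState>::value == fnv1a("game.player_state"),
              "identifier is derived from the fixed string");
```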


Generally, trying to come up with complex compile-time higher-level abstractions in C++ is a waste of time. A better idea is to figure out a relational schema for the runtime presentation of the data and represent that with a combination of standard data structures and PODs. I really, really would rather see mental effort spent on systems design and simplifying the problem than on finding the best way to contort C++ into some other language's paradigm (at the compiler level, that is: runtime behaviour for any complex dynamic data might start to look like Lisp, but please don't try to make the compiler understand that).


>$, as a way of getting object metadata out of a class, namespace, etc. (Using $ has been argued out of favor because it is common in legacy code, particularly in code generation and template systems which are not necessarily valid C++ but produce C++ sources.)

You have got to be kidding me, right? Adding the "$" character to C++ too? What haven't we added yet?

Why not use only something like "reflexpr", which is more C++-ish than the $ character? Why make C++ more difficult to parse and read with every release?

TBH I genuinely don't understand why we need both. Why not have only "reflexpr"?


$ is already a valid identifier character in a Clang and GCC extension. And there won't be both reflexpr and $; those are two different proposals. And due to the extension, it (most likely) won't be $.


Thank you. I really hope so.


> What haven't we added yet?

There's also the @ sign and the backtick `.

I think that's it for ASCII.


And caret (^), although it is used in Microsoft's C++/CLI and C++/CX extensions.


The caret is used for bitwise xor. C-style bitwise ops in general take up a lot of the good characters, in so many languages. :(


Many characters have multiple uses depending on context, e.g. a single asterisk or a single ampersand.


Yes, right, brain-fart. (In my defense bitwise xor is probably the least common operator).


->* begs to differ


Can someone remind me what that's for? I never use it, I keep trying to look it up and then I forget it again. Maybe because I never quite understood it properly.


I'm batting 0 for 2 so far in this thread, but I'll take a stab at it now I've been reminded of those crazy beasts' existence. :)

  struct Foo {
          int i;
          int inc()
          {
                  return ++i;
          }
  };

  int Foo::*i_ptr       = &Foo::i;   // Pointer to member
  int (Foo::*inc_ptr)() = &Foo::inc; // Pointer to method

  Foo my_foo{0};

  (&my_foo)->i_ptr;     // Error: my_foo has no member "i_ptr"
  (&my_foo)->*i_ptr;    // OK
  my_foo.inc_ptr();     // Error: my_foo has no member "inc_ptr"
  (my_foo.*inc_ptr)();  // OK (parentheses needed: the call binds tighter than .*)

  // This is also why you can't create references
  // to methods or other class members

  int (&print)(const char *) = std::puts; // OK
  int (Foo::&inc_ref)()      = Foo::inc;  // Error

  int &my_int     = my_foo.i; // OK
  int Foo::&i_ref = Foo::i;   // Error


Ok, lovely, but why is this useful? Like, is there any problem where this is the simplest way to solve it?


It's basically a way to use the result of a pointer to a member.

Let's say you have

  class Foo
  {
  public:
    int m_bar;
    int m_baz;
  };

You want to store off which member of Foo you want to reference, but you want to do it in a way not tied to any specific instance of Foo.

  int (Foo :: * ptr) = &Foo::m_baz; // Basically offsetof() but safely typed.

Cool, but now what are you going to do with that member? What if I want to set it to 10?

  int (Foo :: * ptr) = &Foo::m_baz;
  
  std::vector<Foo*> foos;
  ... // populate foos;
  for(Foo* foo : foos)
  { // For each foo, set whatever member ptr points at to 10
    foo->*ptr = 10;
  }

I've never seen it really used in practice (usually I see offsetof() and pointer math in this situation because the object type isn't known beforehand) but it's an option.
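For contrast, a sketch of the offsetof() and pointer math variant. `Particle` and the helper names are made up. The object arrives as an untyped pointer plus a byte offset, which is why this style shows up when the concrete type is not known at the call site, and also why it loses the type checking that `->*` keeps:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical standard-layout type; offsetof is well defined for it.
struct Particle {
    int id;
    int lifetime;
};

// Set an int field identified only by its byte offset within the object.
// Nothing checks that the offset really names an int member.
void set_int_field(void* object, std::size_t offset, int value) {
    assert(object != nullptr);
    std::memcpy(static_cast<unsigned char*>(object) + offset, &value, sizeof value);
}

// Read an int field identified by its byte offset.
int get_int_field(const void* object, std::size_t offset) {
    assert(object != nullptr);
    int value;
    std::memcpy(&value, static_cast<const unsigned char*>(object) + offset, sizeof value);
    return value;
}
```

A caller would pass something like `offsetof(Particle, lifetime)` as the offset, where the pointer-to-member version would pass `&Particle::lifetime`.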


So you have seen offsetof used in practice? Can I have a pointer? (I will not confirm or deny whether that pun was intentional.)


It's a matter of taste; for me 'reflexpr' is about as disgusting as 'constexpr'.


No, it is not a matter of taste. I don't have anything against $ in PHP, Perl, or most other languages. But it is not consistent with C++ syntax at all.


AKA consistent.


What blogging engine is that? It looks nice.


Looks like one of the standard Jekyll themes. I could be wrong though.




