Decoupling your code into isolated modules is a technique that should be well understood by now, in these times of OO programming: all the classes in a module should minimize their dependencies on the classes of other modules. Yet it's still perceived in the wrong way, as a tool mostly for improving code reusability and extensibility, or implemented in the wrong way, that's to say, in the wrong places.
Decoupling eases refactoring (local changes do not propagate), and everything changes, especially in games (requirements change, no design can be made totally upfront). It eases testing (tests stay local, failures do not propagate), which in turn eases refactoring again. But in some cases it can also be the best possible design to improve performance and to reduce (art) iteration time.
Many engines (especially the ones you find in books) are centered around the scene tree: a collection of linked nodes, each with a coordinate system, where the connections establish the relations between those systems, child nodes being expressed relative to their parent's system. Eventually, certain nodes have a volume (that usually includes the volume of their children, to form a BVH) so they can be culled for visibility, and are renderable, usually by holding links to meshes and materials.
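To make the coupling concrete, here is a sketch of the kind of "do everything" node such engines are built around; this is a hypothetical illustration, not any specific engine's type:

    #include <vector>

    struct Matrix4 { float m[16]; };              // placeholder math type
    struct BoundingBox { float min[3], max[3]; };
    struct Mesh;                                  // converted, render-ready data
    struct Material;

    // A typical scene tree node: transforms, hierarchy, culling volume and
    // render links all live in the same object (all names are illustrative).
    struct SceneNode {
        Matrix4 localTransform;                   // in the parent's coordinate system
        Matrix4 worldTransform;                   // parent->worldTransform * localTransform
        BoundingBox worldBounds;                  // grown to enclose the children (a BVH)
        SceneNode* parent = nullptr;
        std::vector<SceneNode*> children;
        Mesh* mesh = nullptr;                     // only some nodes actually draw
        Material* material = nullptr;
    };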
The engine then traverses the graph: if a node is visible, it gets rendered. This is the naive approach. Usually it's slow, as the traversal is not cache friendly, especially since per node we compute many different things on different data (visibility, material setup, mesh drawing). And when you face that problem, the first solution is to add caches, so you can precompute the traversal of all the static elements. That makes your code bloated, but it's not the only problem.
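In code, the naive traversal looks something like this (again just a sketch, reusing the hypothetical node above, with the helpers assumed to exist); note how each visit jumps between unrelated data and unrelated work:

    struct Frustum;                                       // assumed culling volume
    bool Intersects(const Frustum&, const BoundingBox&);  // assumed helpers
    void BindMaterial(const Material*);
    void DrawMesh(const Mesh*, const Matrix4&);

    void DrawRecursive(const SceneNode* node, const Frustum& frustum) {
        if (!Intersects(frustum, node->worldBounds))
            return;                                // bounds include children: prune subtree
        if (node->mesh != nullptr) {
            BindMaterial(node->material);          // state change
            DrawMesh(node->mesh, node->worldTransform);
        }
        for (const SceneNode* child : node->children)
            DrawRecursive(child, frustum);         // pointer chasing, cache unfriendly
    }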
Having the scene tree coupled with all kinds of engine concepts makes changing it a nightmare. Usually it's built from artist-authored data: a file exported from a 3D application is loaded and processed by the engine pipeline to create the scene tree, and then the scene tree is serialized, as the exported data usually has to be heavily processed in order to create render-optimized, platform-specific objects, something we don't want to do at load time.
But what if we later want to change how a given object is rendered? We have to change a node implementation. Say we add a new kind of node, to be used in certain situations: to construct this new kind of node, we have to change the conversion pipeline as well, and reprocess all the converted assets! Coupling turns change into a nightmare.
Also, I wonder why we structured our rendering data around a tree in the first place. Most objects do not need the coordinate system relationships; those are only useful for animated rigid objects with joints, which is not really the most common use case. And visibility computation may well need different relationships, hierarchical (a BVH) or none at all, depending on the algorithm we want to implement.
Rendering is better suited to being described by pipelines. Renderable objects are grouped in lists; an update function syncs render object state with the current frame's simulation (game) data; a visibility function selects the visible objects in each list; a rendering function turns those into a list of basic renderable entities (mesh, material); then we sort those entities, and finally we build the command buffer.
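A minimal sketch of such a pipeline, over flat lists instead of a tree, reusing the hypothetical types from the sketches above (all the stage names are made up for illustration):

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    struct GameState;                      // current frame simulation data
    struct RenderObject;                   // a renderable, already in converted form
    struct CommandBuffer;

    struct RenderItem {                    // basic renderable entity
        const Mesh*     mesh;
        const Material* material;
        Matrix4         transform;
        uint64_t        sortKey;           // e.g. material and depth packed together
    };

    // Assumed stage functions, each working on a flat list:
    void Update(std::vector<RenderObject>& objects, const GameState& game);
    void SelectVisible(const std::vector<RenderObject>& objects, const Frustum& f,
                       std::vector<const RenderObject*>& visible);
    void Extract(const std::vector<const RenderObject*>& visible,
                 std::vector<RenderItem>& items);
    void EmitDrawCall(CommandBuffer& cb, const RenderItem& item);

    void RenderFrame(std::vector<RenderObject>& objects, const GameState& game,
                     const Frustum& frustum, CommandBuffer& cb) {
        Update(objects, game);                        // sync with simulation state
        std::vector<const RenderObject*> visible;
        SelectVisible(objects, frustum, visible);     // visibility pass
        std::vector<RenderItem> items;
        Extract(visible, items);                      // build (mesh, material) entities
        std::sort(items.begin(), items.end(),         // sort to minimize state changes
                  [](const RenderItem& a, const RenderItem& b) {
                      return a.sortKey < b.sortKey;
                  });
        for (const RenderItem& item : items)
            EmitDrawCall(cb, item);                   // build the command buffer
    }

Each stage reads and writes contiguous arrays, so each kind of work stays local to its own data, which is exactly what the per-node traversal cannot offer.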
Artist-authored data usually comes in the form of a scene tree, but that should only be processed to create renderable objects, not used as the basis for our engine. Doing so, we can also choose whether to serialize the renderable objects or the converted scene trees used to build them. If we take the second option, we can always change the renderable object type that is built for a given scene. Of course, serializing less comes with the extra cost of rebuilding more data, but it also makes the (painful) serialization process less of a constraint, as we can change more without impacting it. So be careful: benchmark your IO speed, and then design your serialization. Most of the time you only need to save converted meshes (slow to process), textures, animations and materials, but NOT the containers of those things. That way, when an artist later changes a texture, you can simply reprocess THAT, and not the entire serialized scene.
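One way to get that per-asset granularity is to cache each converted asset keyed by its source file, so a changed texture invalidates only its own entry; a hypothetical sketch (the hashing and conversion functions are assumed to exist):

    #include <cstddef>
    #include <cstdint>
    #include <string>
    #include <unordered_map>
    #include <vector>

    struct ConvertedAsset {
        std::vector<std::byte> blob;       // platform-specific, render-ready data
        uint64_t sourceHash;               // hash of the source file it was built from
    };

    // Assumed to exist: hashing and the (slow) offline conversion step.
    uint64_t HashFileContents(const std::string& path);
    ConvertedAsset Convert(const std::string& path, uint64_t sourceHash);

    class AssetCache {
    public:
        const ConvertedAsset& Get(const std::string& sourcePath) {
            const uint64_t hash = HashFileContents(sourcePath);
            auto it = cache.find(sourcePath);
            if (it == cache.end() || it->second.sourceHash != hash) {
                // Reconvert just this asset; the scene containers are rebuilt
                // cheaply at load time from these cached pieces.
                it = cache.insert_or_assign(sourcePath, Convert(sourcePath, hash)).first;
            }
            return it->second;
        }
    private:
        std::unordered_map<std::string, ConvertedAsset> cache;
    };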
Usually, a render engine can be seen in layers: the first two places are taken by the hardware and the native APIs; then you build your data abstractions (meshes, shaders) and your render API (a cross-platform render device); then you build your engine on top of those (render algorithms, materials, effects etc.). Choose wisely at which layer you want your serialization to operate.
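To picture it (an illustrative layering, with a hypothetical device interface; none of these names come from a real engine):

    //  Layer 0: hardware
    //  Layer 1: native APIs (Direct3D, OpenGL, console libraries)
    //  Layer 2: data abstractions (meshes, shaders) + cross-platform render device
    //  Layer 3: engine (render algorithms, materials, effects, ...)

    #include <cstddef>
    #include <cstdint>
    using MeshHandle   = uint32_t;         // hypothetical opaque resource handles
    using ShaderHandle = uint32_t;
    struct CommandBuffer;

    // A minimal sketch of the layer-2 device the upper layers talk to:
    class RenderDevice {
    public:
        virtual ~RenderDevice() = default;
        virtual MeshHandle   CreateMesh(const void* data, std::size_t bytes) = 0;
        virtual ShaderHandle CreateShader(const void* bytecode, std::size_t bytes) = 0;
        virtual void         Submit(const CommandBuffer& cb) = 0;
    };

Serializing at the data abstraction layer, for instance, means saving mesh and shader blobs but rebuilding everything above them at load time.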
The bottom line is that having less "powerful", less "generic" nodes, and decoupling more, gives us not only more flexibility, but also faster iteration and better performance, as a renderable object can process its converted data and lay it out in the most cache/algorithm/etc. friendly way it needs.