Hi,
A more efficient way to do this is to avoid the Arduino String library and use standard C arrays/buffers instead.
I've made an example project for you; it's attached to this comment as a .zip. The example packs packet IDs, the Wireling port number, and the four data vectors you're interested in (accelerometer, gyroscope, compass, and fusion data). Overall, the data is sent in three fragmented packets totaling 52 bytes for each Wireling (so 104 bytes for both). 52 bytes probably isn't needed, and I'll come back to that later. For now, let's talk about the code.
I mentioned that the packets are fragmented. First, the packet ID and Wireling port ID are each 1 byte and can be read straight out of a packet once it's received; however, the 9-axis library (RTIMU) uses floats to hold sampled data, and each float is 4 bytes. We're sending accelerometer XYZ, gyroscope XYZ, compass XYZ, and fusion XYZ, which is (4*3)*4 = 48 bytes. There's also the one Wireling port ID to send (1 byte) and 3 packet ID bytes (3 bytes). In total there are 48 + 4 = 52 bytes to send in this configuration. Now, as you've found, everything needs to be sent in packets of at most 20 bytes. Because each packet starts with at least one extra byte, and because of the way everything lines up, some floats will be split across two packets instead of being contained in one. See the diagram below for each packet's layout.
PACKET 0:
| ID0 | PORT | AX[0] | AX[1] | AX[2] | AX[3] | AY[0] | AY[1] | AY[2] | AY[3] | AZ[0] | AZ[1] | AZ[2] | AZ[3] | GX[0] | GX[1] | GX[2] | GX[3] | GY[0] | GY[1] |
PACKET 1:
| ID1 | GY[2] | GY[3] | GZ[0] | GZ[1] | GZ[2] | GZ[3] | CX[0] | CX[1] | CX[2] | CX[3] | CY[0] | CY[1] | CY[2] | CY[3] | CZ[0] | CZ[1] | CZ[2] | CZ[3] | FX[0] |
PACKET 2:
| ID2 | FX[1] | FX[2] | FX[3] | FY[0] | FY[1] | FY[2] | FY[3] | FZ[0] | FZ[1] | FZ[2] | FZ[3] |
Looking above, GY[0] and GY[1] are in packet 0 while the other half of that float is in packet 1. Likewise, FX[0] is in packet 1 while FX[1] to FX[3] are in packet 2. Keep that in mind. There are some tricks that could be applied to get rid of the fragmentation, but we'll do it the hard way so the layout still works if more data needs to be added in the future. That's fine though; the attached example takes care of everything when the above data is sent. I just thought it was a good idea to show you how the packets are laid out.
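Here's a rough sketch of how that layout can be produced. This is not the code from the attached example; `build_packets` and the constants are names I made up for illustration, and it assumes a float is 4 bytes on your board.

```c
#include <stdint.h>
#include <string.h>

#define NUM_FLOATS    12  /* accel, gyro, compass, fusion: X, Y, Z each */
#define PAYLOAD_BYTES 48  /* NUM_FLOATS * 4, assuming 4-byte floats */
#define MAX_PACKET    20  /* packet size limit discussed above */
#define NUM_PACKETS   3

/* Hypothetical helper: splits 12 floats into the three packets shown in
 * the diagram. lens[] receives the number of bytes used in each packet. */
static void build_packets(uint8_t port, const float values[NUM_FLOATS],
                          uint8_t packets[NUM_PACKETS][MAX_PACKET],
                          uint8_t lens[NUM_PACKETS])
{
    uint8_t payload[PAYLOAD_BYTES];
    size_t src = 0;
    uint8_t id;

    /* Copy the raw float bytes in the sender's native byte order. */
    memcpy(payload, values, PAYLOAD_BYTES);

    for (id = 0; id < NUM_PACKETS; id++) {
        size_t pos = 0;
        packets[id][pos++] = id;        /* every packet starts with its ID */
        if (id == 0) {
            packets[id][pos++] = port;  /* port byte only in packet 0 */
        }
        /* Fill the rest of the packet, even if that splits a float. */
        while (pos < MAX_PACKET && src < PAYLOAD_BYTES) {
            packets[id][pos++] = payload[src++];
        }
        lens[id] = (uint8_t)pos;
    }
}
```

With this layout the three packets come out as 20, 20, and 12 bytes, and the receiver has to stitch GY and FX back together from two packets each.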
I don't know what the receiving side of your project looks like, but keep in mind that when you rebuild the floats from their 4 bytes, you need to know how both your sending and receiving hardware represent floats as bytes. Google "Endianness" and check the datasheets for both sides.
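One way to sidestep the question is to pick a byte order for the wire and convert explicitly on both ends. The sketch below is an assumption on my part, not what the attached example does: it fixes the order as least significant byte first and assumes both sides use 32-bit IEEE-754 floats. The function names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: split a float into 4 bytes, least significant first. */
static void float_to_le_bytes(float f, uint8_t out[4])
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);  /* reinterpret the float's bit pattern */
    out[0] = (uint8_t)(bits & 0xFF);
    out[1] = (uint8_t)((bits >> 8) & 0xFF);
    out[2] = (uint8_t)((bits >> 16) & 0xFF);
    out[3] = (uint8_t)((bits >> 24) & 0xFF);
}

/* Hypothetical helper: rebuild a float from 4 bytes, least significant first. */
static float float_from_le_bytes(const uint8_t b[4])
{
    uint32_t bits = (uint32_t)b[0]
                  | ((uint32_t)b[1] << 8)
                  | ((uint32_t)b[2] << 16)
                  | ((uint32_t)b[3] << 24);
    float f;
    memcpy(&f, &bits, sizeof f);  /* memcpy avoids pointer-cast aliasing issues */
    return f;
}
```

Because the shifts work on values rather than memory, the result is the same regardless of the native byte order of the machine running it.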
For the attached example, my hardware is a TinyScreen+, Wireling adapter shield, BLE Tinyshield, and 2 x 9DOF Wirelings. The Wirelings are connected to ports 0 and 1 on the adapter shield. You're going to be most interested in the functions 'SendIMUData()' at line 332 and 'SpecialUnpackPackets()' at line 237. Make sure to read the comments in each function. Although the BLE shield is set up, no Bluetooth data is sent in this example by default, but there are some lines in the 'SendIMUData' function you can uncomment to send the packets.
Also, I did not benchmark my example in any way, so see if it works for you in terms of speed. Remove or comment out the functions that print to serial when using it in your project; those are only there to help you see what's happening in the example, and serial prints slow the program down.
There are more optimizations that can be done. 'malloc' is called for every packet, but since the packets are the same size every time, the memory should only be allocated once and reused (a fixed-size global buffer). Calling 'malloc' over and over can also lead to heap fragmentation on a small microcontroller.
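A minimal sketch of the fixed-buffer idea, assuming one packet is built and sent at a time. `packet_buf` and `fill_packet` are hypothetical names, not functions from the attached example.

```c
#include <stdint.h>
#include <string.h>

#define MAX_PACKET 20

/* One buffer reserved for the lifetime of the program; no malloc/free. */
static uint8_t packet_buf[MAX_PACKET];

/* Hypothetical helper: writes the ID byte followed by up to 19 data bytes
 * into the shared buffer and returns the number of bytes used. */
static uint8_t fill_packet(uint8_t id, const uint8_t *data, uint8_t len)
{
    if (len > MAX_PACKET - 1) {
        len = MAX_PACKET - 1;  /* clamp so we never write past the buffer */
    }
    packet_buf[0] = id;
    memcpy(&packet_buf[1], data, len);
    return (uint8_t)(len + 1);
}
```

The trade-off is that the buffer's contents are overwritten on every call, so each packet has to be sent (or copied) before the next one is built.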
If this still doesn't work, note that the 9-axis IMU (LSM9DS1) on our Wirelings advertises 16-bit output (2 bytes per axis). The RTIMU library must cast this to a float (4 bytes) at some point. If you only needed the accelerometer, gyroscope, and compass data, you could send (2*3)*3 = 18 bytes of data, which together with the packet ID and port bytes fits in just one 20-byte packet! This may mean figuring out how to read the registers of the LSM9DS1 directly, or modifying the RTIMU library to be fixed-point and 16-bit (you probably won't be able to do fusion calculations unless that code is modified to be fixed-point as well). Or maybe the resulting data from RTIMU can be cast back to 16-bit, in which case we would need to make sure the cast data looks correct. The RTIMU library is supposed to work with many different kinds of IMUs, so floats were probably used to make the library flexible.
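To give a feel for the "cast back to 16-bit" option, here is a sketch that scales nine floats into 16-bit integers and packs them into one 20-byte packet. Everything here is an assumption for illustration: the names `to_fixed` and `pack_compact`, the single shared scale factor, and the low-byte-first order. A real version would need a scale chosen per sensor so the values neither saturate nor lose too much resolution.

```c
#include <stdint.h>

#define PACKET_BYTES 20  /* 1 ID + 1 port + 9 values * 2 bytes */

/* Hypothetical helper: scale a float and round it to a 16-bit integer,
 * saturating instead of overflowing. */
static int16_t to_fixed(float value, float scale)
{
    float scaled = value * scale;
    if (scaled > 32767.0f)  return INT16_MAX;
    if (scaled < -32768.0f) return INT16_MIN;
    return (int16_t)(scaled < 0.0f ? scaled - 0.5f : scaled + 0.5f);
}

/* Hypothetical helper: pack ID, port, and 9 values (accel, gyro, compass
 * XYZ) into a single packet, low byte first for each value. */
static void pack_compact(uint8_t id, uint8_t port, const float values[9],
                         float scale, uint8_t out[PACKET_BYTES])
{
    int i;
    out[0] = id;
    out[1] = port;
    for (i = 0; i < 9; i++) {
        uint16_t u = (uint16_t)to_fixed(values[i], scale);
        out[2 + 2 * i] = (uint8_t)(u & 0xFF);
        out[3 + 2 * i] = (uint8_t)(u >> 8);
    }
}
```

The receiver would divide each reassembled 16-bit value by the same scale to get back an approximate float, so both sides have to agree on it.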
There is a lot going on here, so let me know if you have any problems with the example.
NOTE: See my next reply; the version attached there allocates memory for the packets only once.