Saving and Loading

By Peter Hull
[email protected]
September 2001

Contents

  1. Introduction
  2. The code
  3. Allegro packfiles
  4. Versions and Compatibility
  5. Conclusions

Introduction

This document gives you some hints on implementing a save and load feature in your programs. This is also called persistence or serialization, which is a good way to describe what we're doing. The same procedures apply to I will use simple code examples to illustrate each point. Some of these examples might seem a bit odd or over-complex, but I wanted to show how you might do things in the general case.

The Code

I've written the code in C, as most people understand it, even if they prefer C++. For the first examples, you won't even need Allegro. The system I am implementing is a high score table, which will remember the best scores between sessions. I haven't included any of the actual mechanics of the table here, for example, how to insert a new score, enter initials and so on. I just cover the save and load parts.
The principle is very simple. Just decide what data you need to save, and remember this rule: Do not save any pointers. That's it. Pointers are a location in your computer's memory, so the actual value of a pointer is not the same from session to session.

Stage 1: The Simple Example

For convenience, I like to keep all the data together, in a C structure or array. In this example, there are no pointers, so it's fairly easy. I want to save the best five scores and record a score and initials for each.
#define N_ENTRIES 5
#define N_INITIALS 3
struct hi_entry
{
  int score;
  char initials[N_INITIALS];
} hiscores[N_ENTRIES];

I won't use any Allegro here, just standard C file input and output. If you've got a C book, it's bound to cover these functions in much more detail, so I suggest you get familiar with them first. Using Allegro functions actually makes this a bit easier, as you'll see later. My save and load functions are
void save_hi(struct hi_entry* h, FILE* f)
{
  fwrite( h, sizeof(struct hi_entry), 1, f);
}
void load_hi(struct hi_entry* h, FILE* f)
{
  fread( h, sizeof(struct hi_entry), 1, f);
}

You can see the correspondence between the save and load functions, I hope. The parameter f is a file descriptor, and we save one structure with a size of sizeof(struct hi_entry). What gets written to disk is
{ssss}{ccc}
You use them like this
FILE* fhi;
int i;
fhi=fopen("hi.dat", "rb"); /* open for reading */
for (i=0; i<N_ENTRIES; ++i)
  load_hi(&hiscores[i], fhi);
fclose(fhi);
/* ..rest of program ... */
fhi=fopen("hi.dat", "wb"); /* open for writing */
for (i=0; i<N_ENTRIES; ++i)
  save_hi(&hiscores[i], fhi);
fclose(fhi);

The "rb"/"wb" are the file modes; open in binary read mode or binary write mode. It's important to specify binary mode on DOS/Windows, otherwise the system will try and convert it to a text file. In the real world, you should check for the fopen, and if it fails to read, set up the score table to a sensible default.
I should emphasise that this approach is not very portable, so if someone compiles your program on a different type of computer or even a different compiler, they might not be able to read your saved files. I'll come to this later.

Stage 2: Pointers that are selectors

By this, I mean saving structures with pointers, where you're not going to save the thing it's pointing to. In the hiscore example, suppose it was a space game where you could pick your home planet from a list at the start. I don't want to save the name of the planet in the hiscore table, but I do want to record which one it was. My structure is
struct hi_entry
{
  int score;
  char initials[N_INITIALS];
  char* planet;
} hiscores[N_ENTRIES];
char* planets[]={"Earth", "Mars", "Venus"};

You might think that you could just set, for example, hiscores[0].planet=planets[2]; and save it, since the planets array is part of the program itself. However, you can't rely on this at all. What you must do is convert planet to an identifying number, and save that. In this case, one way is to convert it to an index in the planets array:
void save_hi(struct hi_entry* h, FILE* f)
{
	int index;
	
	fwrite( &h->score, sizeof(int), 1, f);
	fwrite( h->initials, sizeof(char), N_INITIALS, f);
	index=0;
	/* convert pointer to index */
	while (planets[index]!=h->planet)
		++index;
	fwrite( &index, sizeof(int), 1, f);

}
void load_hi(struct hi_entry* h, FILE* f)
{
	int index;
	fread( &h->score, sizeof(int), 1, f);
	fread( h->initials, sizeof(char), N_INITIALS, f);
	fread( &index, sizeof(int), 1, f);
	h->planet=planets[index];
}

You can see I've had to go onto saving each element of the structure individually. The resulting disk file is

Another way (simpler in this case) is to store the index in the structure instead of (or as well as) the pointer.
You may also come across this when you are saving the pointed-to-object, but you're going to do it later. In an adventure game, for example, each collectable item that you can pick up might have a pointer to its location. You will want to save the state of the locations, too, but in another part of the file. If there are two items in the same location, you don't want to save the state of the location twice. You would save all locations, then save all items, using functions like int convert_location_to_identifier(struct location*) and struct location* convert_identifier_to_location(int).

Stage 3: Saving pointers

If, for every structure X, you have functions like save_X(struct X*, FILE*); and load_X(struct X*, FILE*) then it's easy. For example:

struct inner
{
  int a,b,c;
};
struct outer
{
  int x,y;
  struct inner z;
};
void save_outer(struct outer* o, FILE* f)
{
  /* save x and y */
  fwrite( &o->x, sizeof(int), 1, f);
  fwrite( &o->y, sizeof(int), 1, f);
  /* go into the 'inner' member */
  save_inner(&o->z, f);
}
void save_inner(struct inner* i, FILE* f)
{
 /* ... you know how to do this! ... */
}

To use pointers, there's only a slight modification:

struct inner
{
  int a,b,c;
};
struct outer
{
  int x,y;
  struct inner *z;
};
void save_outer(struct outer* o, FILE* f)
{
  /* save x and y */
  fwrite( &o->x, sizeof(int), 1, f);
  fwrite( &o->y, sizeof(int), 1, f);
  /* go into the 'inner' member */
  save_inner(o->z, f);
}
void load_outer(struct outer* o, FILE* f)
{
  /* load x and y */
  fread( &o->x, sizeof(int), 1, f);
  fread( &o->y, sizeof(int), 1, f);
  /* allocate a new inner member */
  o->z=malloc(sizeof(struct inner));
  /* go into the 'inner' member */
  load_inner(o->z, f);
}
(on disk, this is identical to the previous one)

As I'm now using malloc to allocate the memory for my inner structure, I need to use free somewhere to avoid a memory leak (leaving memory allocated when I'm not using it any more.)

If there's a chance that z could be a NULL pointer, I need to handle that. The code, as it stands, would crash in the save_inner function. The solution is to write an extra value, which is zero if there's a NULL pointer and non-zero otherwise.

void save_outer(struct outer* o, FILE* f)
{
  int nullp;
  /* save x and y */
  fwrite( &o->x, sizeof(int), 1, f);
  fwrite( &o->y, sizeof(int), 1, f);
  if (o->z==NULL)
  {
    nullp=0;
    fwrite(&nullp, sizeof(int), 1, f);
  }
  else
  {
    nullp=1;
    fwrite(&nullp, sizeof(int), 1, f);
    /* go into the 'inner' member */
    save_inner(o->z, f);
  }
}
void load_outer(struct outer* o, FILE* f)
{
  int nullp;
  /* load x and y */
  fread( &o->x, sizeof(int), 1, f);
  fread( &o->y, sizeof(int), 1, f);
  fread (&nullp, sizeof(int), 1, f);
  if (nullp)
  {
    /* allocate a new inner member */
    o->z=malloc(sizeof(struct inner));
    /* go into the 'inner' member */
    load_inner(o->z, f);
  }
  else
    o->z=NULL;
}
(when z is not NULL)
(when z is NULL)
This is getting quite complicated; you need to have good grasp on what's a pointer and what's an embedded structure.

Stage 4: Variable length arrays

This is the final complication. Sometimes a pointer to X is actually a pointer to the start of an array of Xs. The classic example is, of course, the C string. A C string is just an array of characters where the last one is a zero to mark the end. So, "hello" is represented in memory as {'h','e','l','l','o','\0'}, six characters in total. Let's replace the initials in the hiscore structure with a string, so we can have longer names, of variable length

struct hi_entry
{
  int score;
  char* name;
};
I just use an extra number, as before. It holds the length of the string plus 1. The +1 allows me to distinguish between an empty string "" (in memory {'\0'}), and a null pointer.

void save_string(char* s, FILE* f)
{
  size_t l;
  if (s)
  {
  /* not a null string */
    l=strlen(s)+1;
    fwrite( &l, sizeof(size_t),1, f); /* save length */
    fwrite( s, sizeof(char), l, f);
  }
  else
  {
    /* null string */
    l=0;
    fwrite( &l, sizeof(size_t),1, f); /* save length=0 */
  }
}

char* load_string(FILE* f)
{
  size_t l;
  char * str;
  fread(&l, sizeof(size_t), 1, f);
  if (l)
  {
  /* not null */
     str=malloc(l);
     fread(str, sizeof(char), l, f);
  }
  else
  {
    /* null pointer */
    str=NULL;
  }
  return str;
}
(the string "Hello")
(a null string)
I had to change the form of the load function to allocate the memory itself and return it, as we don't know how long the string is going to be until we've read in the first value.

This made use of the final '\0' to mark the end of the string. This isn't always available. Suppose, in the game, you are offered a choice of missions at the start. You can pick any mission, and once it's done, pick another until you have completed all the missions. I want to record which missions have been played and what the score was on each. I have to include an extra value to keep track of how long the array is.


struct mission
{
  int id;
  int score;
};
struct hi_entry
{
  int score;
  struct mission* missions;
  int n_missions; /* how many in the array */
  char *name;
};
void save_hi(struct hi_entry* h, FILE* f)
{
  int i;
  fwrite(&h->score, sizeof(int), 1, f);
  /* store how long the array is */
  fwrite(&h->n_missions, sizeof(int), 1, f);
  /* save each mission in turn */
  for (i=0; i<h->n_missions; ++i)
    save_mission(&h->missions[i], f);
  save_string(h->name, f);
}
void load_hi(struct hi_entry* h, FILE* f)
{
  int i;
  fwrite(&h->score, sizeof(int), 1, f);
  /* how many missions to expect */
  fwrite(&h->n_missions, sizeof(int), 1, f);
  /* create an array */
  h->missions=malloc(sizeof (struct mission)*h->n_missions);
  /* load each one in turn */
  for (i=0; i<h->n_missions; ++i)
    load_mission(&h->missions[i], f);
  h->name=load_string(f);
}
That just about wraps it up for this section. The main points are:

Allegro Packfiles

Allegro includes functions to automatically compress and decompress saved data, so it's worth using them if you can. It also makes the files a little harder to hack into!
If you understand the C file functions I've been using above, it's easy to convert to the Allegro ones.
Replace this......with this
FILEPACKFILE
fopen(s, "rb")pack_fopen(s, F_READ_PACKED)
fopen(s, "wb")pack_fopen(s, F_WRITE_PACKED)
fread(p, l, n, f)pack_fread(p, l*n, f)
fwrite(p, l, n, f)pack_fwrite(p, l*n, f)
fclose(f)pack_fclose(f)
Note in particular that Allegro uses a single argument for the size to read/write, instead of the (size, number of items) for C.

Versions and Compatibility

Using what we've got so far, you will certainly be able to save some data and read it back in again on your machine. However, if you plan to distribute your program to people with different operating systems, processors or even different C compilers, you need to take extra care. I've been telling you, basically, to transfer chunks of memory to the disk using fwrite. However, the way data is held in memory is not defined by the C standard, so you may expect some differences.

Alignment and Endian

When you define a structure, the C compiler may insert some extra, hidden members into it. Some processors are much faster at accessing data whose address is multiple of two (word alignment) or four (dword alignment.) For most 32 bit compilers (such as djgpp and MinGW), every int is 4 bytes, every short is two, and a char is one. The compiler might, or might not, insert padding to ensure the correct alignment. You can tell it not to, but then you will pay a performance penalty. If your structure is

struct s /* 8 bytes long */
{
  short a; /* at offset 0 */
  char b; /* at offset 2 */
  char c; /* at offset 3 */
  int d; /* at offset 4 */
};
and you need dword alignment, it may be compiled as

struct s /* 16 bytes long */
{
  short a; /* at offset 0 */
  short _pad1; /* an invisible member which you can't access */
  char b; /* at offset 4 */
  char _pad2[3];
  char c; /* at offset 8 */
  char _pad3[3];
  int d; /* at offset 12 */
};
This is why I always use sizeof instead of calculating it by hand. If you save the structure as one block, using fwrite(p, sizeof(struct s), 1, f); and read it back on a computer which only needs word alignment, the two structures will not match up, and disaster will occur. Therefore, it's better to save each element individually. It takes up less space, and, as we're all super C programmers, is more correct.
Even so, there's still a case when you might have problems. An int is a sequence of four bytes on 32 bit machine, and on Intel processors, the first byte is always the least significant, which is called little endian. So, the sequence {0x12,0x34,0x56,0x78} represents the integer 0x78563412. On other processors, Motorola's for example, it is the other way round (big endian): {0x12,0x34,0x56,0x78} represents 0x12345678.
Allegro, being multi-platform, has functions for this. Use pack_iputl to write a 32-bit value, and pack_igetl to read it back, always using the Intel sequence. If you are primarily a Mac user, pack_mputl uses the Motorola sequence. There's also pack_iputw for 16-bit values and pack_putc for 8-bit values. Note that there's only one variant for bytes. I shan't explain why.

Magic numbers

Magic numbers are a useful trick in some circumstances. These are numbers that don't have any particular meaning in themselves, but can identify something. For example, my telephone extension number at work is 1732. This doesn't tell you anything about where I sit, or my job or status, but it does allow the exchange to route calls to my telephone. If two telephone numbers are consecutive, that doesn't imply anything about the relationship between the two phones.
Allegro uses magic numbers to identify packfiles, Windows uses them to identify executables and many other types of file. Magic numbers are useful Here is some code from the hiscores example. I'll use the Allegro functions, too.

/* this could be any 32-bit number, really */
#define HI_MAGIC 0x99669966

/* save the table */
PACKFILE* f=pack_fopen("hi.sav", F_WRITE_PACKED);
pack_iputl(HI_MAGIC, f); /* write magic number right at the start */
/* ..now save the table itself; */
for (i=0; i<N_ENTRIES; ++i)
  save_hi(&hiscores[i], f);
pack_fclose(f);


/* load the table */
PACKFILE* f=pack_fopen("hi.sav", F_READ_PACKED);
if (pack_igetl(f)==HI_MAGIC)
{ 
  /* ..now load the table itself; */
  for (i=0; i<N_ENTRIES; ++i)
    load_hi(&hiscores[i], f);
}
else
{
  /* error - not a hiscore table */
}
pack_fclose(f);


Suppose I released version 1.0 of my game, then changed the definition of struct hi_entry for version 2.0. I would change the value of HI_MAGIC to something else, to prevent the program from trying to load an old-format table. If I were really clever, I would do this

/* this is for version 1.0 */
#define HI_MAGIC_1_0 0x99669966
/* this is for version 2.0 */
#define HI_MAGIC_2_0 0x87654321

/* save the table */
PACKFILE* f=pack_fopen("hi.sav", F_WRITE_PACKED);
pack_iputl(HI_MAGIC_2_0, f); /* write magic no right at the start */
/* ..now save the table itself; */
for (i=0; i<N_ENTRIES; ++i)
  save_hi(&hiscores[i], f);
pack_fclose(f);


/* load the table */
PACKFILE* f=pack_fopen("hi.sav", F_READ_PACKED);
long int m=pack_igetl(f);
if (m==HI_MAGIC_2_0)
{ 
  /* ..now load the table itself; */
  for (i=0; i<N_ENTRIES; ++i)
    load_hi(&hiscores[i], f);
}
else if (m==HI_MAGIC_1_0)
{ 
  /* ..import from old format and convert to new */
  for (i=0; i<N_ENTRIES; ++i)
    load_hi_1_0(&hiscores[i], f);
}
else
{
  /* error - not a hiscore table */
}
pack_fclose(f);


To show my third use for magic numbers, I'll need to jump into C++. Supposing I have a class Inventory for keeping track of the player's possessions. Each possession is a subclass of class Item, with virtual functions. Very briefly:

class Item
{
  public:
  virtual int id() const=0;
  virtual void save(PACKFILE*);
  virtual void load(PACKFILE*);
  /* rest of class... */
};
#define I_RUBY 0
class Ruby: public Item
{
  public:
  virtual int id() const {return I_RUBY;}
  virtual void save(PACKFILE*);
  virtual void load(PACKFILE*);
  /* rest of class... */
};
#define I_CHEESE 1
class Cheese: public Item
{
  public:
  virtual int id() const {return I_CHEESE;}
  virtual void save(PACKFILE*);
  virtual void load(PACKFILE*);
  /* rest of class... */
};
#define I_SWORD 2
class Sword: public Item
{
  public:
  virtual int id() const {return I_SWORD;}
  virtual void save(PACKFILE*);
  virtual void load(PACKFILE*);
  /* rest of class... */
};
class Inventory
{
  public:
  Item* items[10];
  void save(PACKFILE*);
  void load(PACKFILE*);
  /* rest of class... */
};

void Inventory::save(PACKFILE* f)
{
  for (int i=0; i<10; ++i)
  {
    pack_iputl(items[i]->id(), f);
    items[i]->save(f);
  }
  /* save rest of this class */
}
void Inventory::load(PACKFILE* f)
{
  for (int i=0; i<10; ++i)
  {
    Item* it;
    switch(pack_igetl(f))
    /* which class is coming up in the file? */
    {
      case I_RUBY:
        it=new Ruby();
        break;
      case I_SWORD:
        it=new Sword();
        break;
      case I_CHEESE:
        it=new Cheese();
        break;
      default:
        /* error! */
        it=0;
        break;
    }
    if (it)
      it->load(f);
    items[i]=it;
  }
  /* load rest of this class */
}

That was about the bare minimum to illustrate the point. In reality I would make a static Item* Item::fromID(int) to contain that switch statement.

Conclusions

This article actually turned out quite long! I hope that you can see that the principle is very straightforward; in most cases you will just need to write out the members one by one. You will also have seen some pitfalls, and especially why you shouldn't just write out your structure as a single block. Finally I hope these examples are useful for your own code.
Pete