Basic Concepts - Simplified Learning

In this tutorial we will learn the basic concepts of Google Protocol Buffers.

Defining A Message Type

A .proto file usually starts with a package declaration, which helps to prevent naming conflicts between different projects. In the file you define how you want your data to be structured, using a message format. The protocol buffer compiler (protoc) consumes this file and, for Java, generates a class with getter and setter methods so that you can serialize and deserialize Java objects to and from a variety of streams. You can define a message type in a .proto file as follows:

message User {
   required string name = 1;
   required int32 id = 2;
   optional string email = 3;
}

The message format is very straightforward. Each message type has one or more uniquely numbered fields. Nested message types have their own set of uniquely numbered fields. Value types can be numbers, Booleans, strings, bytes, collections and enumerations (inspired by the Java enum). You can also nest other message types, which lets you structure your data hierarchically in much the same way JSON does.

Specifying Field Types

Fields can be specified as optional, required, or repeated. Don’t let the type of the field (e.g., enum, int32, float, string) confuse you when implementing protocol buffers in Python. The field types are just hints to protoc about how to serialize a field’s value and produce the encoded format of your message. The encoded format looks like a flattened, compact representation of your object. You write this specification in exactly the same way whether you are using protocol buffers in Python, Java, or C++.
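To make the idea of a flattened, compact encoding more concrete, here is a minimal hand-written sketch in Python of how the User message could be laid out on the wire. It does not use protoc or the protobuf library; the helper names (encode_varint, encode_key, encode_user) are made up for this illustration, and the sketch only handles non-negative integers and UTF-8 strings.

```python
VARINT = 0            # wire type used for int32, bool, enum, ...
LENGTH_DELIMITED = 2  # wire type used for string, bytes, nested messages


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch only handles non-negative values")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_key(field_number: int, wire_type: int) -> bytes:
    """A field key combines the tag number and the wire type."""
    return encode_varint((field_number << 3) | wire_type)


def encode_string_field(field_number: int, text: str) -> bytes:
    data = text.encode("utf-8")
    return encode_key(field_number, LENGTH_DELIMITED) + encode_varint(len(data)) + data


def encode_int32_field(field_number: int, value: int) -> bytes:
    return encode_key(field_number, VARINT) + encode_varint(value)


def encode_user(name: str, user_id: int, email=None) -> bytes:
    """Hand-encode the User message: name = 1, id = 2, email = 3."""
    out = encode_string_field(1, name) + encode_int32_field(2, user_id)
    if email is not None:  # an optional field is simply left out when unset
        out += encode_string_field(3, email)
    return out


print(encode_user("Al", 150).hex())  # 0a02416c109601
```

Field names never appear in the output, only the tag numbers and the values. This is why the same .proto specification can serve every target language.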

In the above example, all the fields are scalar types: two strings and one int32. However, you can also specify composite types for your fields, including enumerations and other message types. The scalar types are any one of the following: double, float, int32, int64, uint32, uint64, sint32, sint64, fixed32, fixed64, sfixed32, sfixed64, bool, string, bytes.
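As a rough illustration of how the scalar types act as serialization hints, the following Python sketch groups the fifteen scalar types by the wire type used to encode them. The dictionary and the wire_type_for helper are hypothetical names used only for this example and are not part of any protobuf API.

```python
# Wire types as numbered in the protocol buffer encoding.
VARINT = 0            # variable-length integers
FIXED64 = 1           # always 8 bytes
LENGTH_DELIMITED = 2  # length prefix followed by that many bytes
FIXED32 = 5           # always 4 bytes

WIRE_TYPE_FOR_SCALAR = {
    "int32": VARINT, "int64": VARINT,
    "uint32": VARINT, "uint64": VARINT,
    "sint32": VARINT, "sint64": VARINT,
    "bool": VARINT,
    "fixed64": FIXED64, "sfixed64": FIXED64, "double": FIXED64,
    "string": LENGTH_DELIMITED, "bytes": LENGTH_DELIMITED,
    "fixed32": FIXED32, "sfixed32": FIXED32, "float": FIXED32,
}


def wire_type_for(scalar_type: str) -> int:
    """Look up the wire type for a scalar type name from a .proto file."""
    try:
        return WIRE_TYPE_FOR_SCALAR[scalar_type]
    except KeyError:
        raise ValueError(f"not a scalar type: {scalar_type!r}") from None


for name in ("string", "int32", "double", "float"):
    print(name, "->", wire_type_for(name))
```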

Assigning Tags

Each field in the message definition has a unique numbered tag. These tags are used to identify your fields in the message binary format, and should not be changed once your message type is in use. Note that tags with values in the range 1 through 15 take one byte to encode, including the identifying number and the field’s type (you can find out more about this in Protocol Buffer Encoding). Tags in the range 16 through 2047 take two bytes. So you should reserve the tags 1 through 15 for very frequently occurring message elements.
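The one-byte and two-byte ranges follow from how a field key is built: the tag number is shifted left by three bits, combined with the wire type, and written as a varint that carries seven payload bits per byte. The sketch below checks this in plain Python; key_size is a hypothetical helper written for this example.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def key_size(field_number: int, wire_type: int = 0) -> int:
    """Number of bytes needed to encode the key of a field."""
    return len(encode_varint((field_number << 3) | wire_type))


for tag in (1, 15, 16, 2047, 2048):
    print(f"tag {tag:>4}: {key_size(tag)} byte(s)")
```

A one-byte varint holds 7 bits, 3 of which go to the wire type, which leaves 4 bits for the tag (1 through 15). Two bytes hold 14 bits, which leaves 11 bits for the tag (up to 2047).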

The smallest tag number you can specify is 1, and the largest is 2^29 - 1, or 536,870,911. You also cannot use the numbers 19000 through 19999, as they are reserved for the Protocol Buffers implementation; the protocol buffer compiler will complain if you use one of these reserved numbers in your .proto. Similarly, you cannot use any previously reserved tags.
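These limits can be expressed as a small check. The following Python sketch mirrors the rules stated above; is_valid_tag is a hypothetical helper and not something protoc exposes, and it does not cover tags reserved by your own .proto files.

```python
MIN_TAG = 1
MAX_TAG = 2**29 - 1  # 536,870,911
IMPLEMENTATION_RESERVED = range(19000, 20000)  # 19000 through 19999


def is_valid_tag(tag: int) -> bool:
    """Return True if the tag number may be used for a field."""
    if not MIN_TAG <= tag <= MAX_TAG:
        return False
    return tag not in IMPLEMENTATION_RESERVED


for tag in (0, 1, 18999, 19000, 19999, 20000, MAX_TAG, MAX_TAG + 1):
    print(tag, is_valid_tag(tag))
```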

Specifying Field Rules

You specify that message fields are one of the following:

1. required

A required field must be given a value; otherwise the message is considered uninitialized.

2. optional

If an optional field is not set, a default value is assigned to it. You can also specify your own default value in the .proto file, for example on a field whose type is the PhoneType enum defined in the Enumerations section.

3. repeated

This field can be repeated any number of times (including zero) in a well-formed message. The order of the repeated values will be preserved.

For historical reasons, repeated fields of scalar numeric types aren’t encoded as efficiently as they could be. New code should use the special option [packed=true] to get a more efficient encoding.

For example:

repeated int32 samples = 4 [packed=true];
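To see where the saving comes from, the sketch below hand-encodes the samples field both ways in plain Python. Without packing, every element carries its own key; with packing, a single key and a length prefix are followed by all the values back to back. The helper names are made up for this illustration, and only non-negative values are handled.

```python
def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a base-128 varint."""
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # high bit set: more bytes follow
        else:
            out.append(low7)
            return bytes(out)


def encode_unpacked(field_number: int, values) -> bytes:
    """One key (wire type 0, varint) in front of every element."""
    key = encode_varint((field_number << 3) | 0)
    return b"".join(key + encode_varint(v) for v in values)


def encode_packed(field_number: int, values) -> bytes:
    """One key (wire type 2, length-delimited), a length, then all elements."""
    payload = b"".join(encode_varint(v) for v in values)
    key = encode_varint((field_number << 3) | 2)
    return key + encode_varint(len(payload)) + payload


samples = [3, 270, 86942]
print(encode_unpacked(4, samples).hex())  # 2003208e02209ea705 (9 bytes)
print(encode_packed(4, samples).hex())    # 2206038e029ea705   (8 bytes)
```

With only three elements the difference is a single byte, but the unpacked form grows by one extra key per element, so the gap widens as the list gets longer.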

Adding More Message Types

Multiple message types can be defined in a single .proto file. This is useful if you are defining multiple related messages.

Adding Comments

To add comments to your .proto files, use C/C++-style // syntax:

message User {
   required string name = 1;
   required int32 id = 2; // Id of the User
   optional string email = 3; // Email of the User
}

What’s Generated From Your .proto?

When you run the protocol buffer compiler on a .proto, the compiler generates code in your chosen language that you’ll need to work with the message types you’ve described in the file, including getting and setting field values, serializing your messages to an output stream, and parsing your messages from an input stream.

For C++, the compiler generates a .h and .cc file from each .proto, with a class for each message type described in your file.

For Java, the compiler generates a .java file with a class for each message type, as well as a special Builder class for creating message class instances.

Python is a little different: the Python compiler generates a module with a static descriptor of each message type in your .proto, which is then used with a metaclass to create the necessary Python data access class at runtime.

For Go, the compiler generates a .pb.go file with a type for each message type in your file.

Enumerations

When you’re defining a message type, you might want one of its fields to only have one of a predefined list of values.

For example:

enum PhoneType {
   MOBILE = 0;
   HOME = 1;
   WORK = 2;
}
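As a loose analogy only, the Python standard library's enum module shows the same idea of a value restricted to a predefined list. This is not the code protoc generates for PhoneType; it is a stand-alone sketch, and parse_phone_type is a hypothetical helper.

```python
from enum import IntEnum


class PhoneType(IntEnum):
    """Python analogue of the PhoneType enum from the .proto above."""
    MOBILE = 0
    HOME = 1
    WORK = 2


def parse_phone_type(number: int) -> PhoneType:
    """Map a number to a PhoneType, rejecting anything outside the list."""
    try:
        return PhoneType(number)
    except ValueError:
        raise ValueError(f"{number} is not a defined PhoneType value") from None


print(parse_phone_type(1).name)  # HOME
```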
