String Data Types

Strings are an essential part of programming, and proper manipulation of strings is an essential skill for any programmer. You will need to use strings in virtually every program you will write, from games to data entry programs. Strings are important because humans use strings to communicate concepts and ideas. A menu in a game communicates the various game options to the user. An address stored in a database communicates the location of the person who lives at that address. Programs are solutions to problems, and in most cases, must communicate that solution to the user through the use of strings.

Computers, however, only understand numbers. The string “¡FreeBasic está fresco!” has no meaning to the computer. For the very first computers this did not matter much, since those early computers were just glorified calculators, albeit calculators that took up a whole room. However, it soon became evident to computer engineers that for the computer to be a useful tool, it needed to recognize string data and be able to manipulate it.

Since computers only understand numbers, the solution was to convert alpha-numeric characters to numbers via translation tables. The familiar ASCII code table is one such translation scheme that takes a set of alpha-numeric characters and converts them to numbers. The “A” character is encoded as decimal 65, the exclamation point as decimal 33 and the digit 1 as decimal 49. When you press any key on your keyboard, a scancode for that key is generated and stored in the computer as a number.
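As a small illustration of the translation table idea, the following sketch asks FreeBasic for the ASCII codes of the three characters mentioned above, and then goes the other way with Chr. The variable names are made up for this example.

```freebasic
' Look up the ASCII codes of the characters named in the text
Dim codeA As Long = Asc("A")
Dim codeBang As Long = Asc("!")
Dim codeOne As Long = Asc("1")

Print codeA         ' 65
Print codeBang      ' 33
Print codeOne       ' 49

' Chr converts codes back into characters
Dim letters As String = Chr(70, 66)
Print letters       ' FB
```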

Humans group letters together to form words, and words to form sentences. A certain arrangement of letters and words means something according to the semantic rules of the writer's language. When you read the string “FreeBasic is cool!” you understand the meaning of the words if you can read the English language. When you read that something is “cool” you know that this is slang for great or excellent. The computer, however, doesn't know what FreeBasic is and doesn't know what it means to be cool. All the computer can do is store the string in memory in a way that preserves the format of the string so that a human reading the string will understand its meaning.

While computers have grown in both speed and capacity, the basic computing processes haven't changed since the 1950s. The next revolution in computing won't be quantum processors or holographic memory; it will be when computers can understand language.

The solution to the storage problem was to simply store string data in memory as a sequence of bytes, and terminate that sequence of bytes with a character 0. To put it another way, a string in computer memory is an array of characters, terminated with a character 0 to signal the end of the string.
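To make the byte sequence visible, here is a minimal sketch that stores a five-letter word in a Zstring (a type covered later in this chapter) and prints each byte, including the terminating character 0. The name greeting is only for illustration.

```freebasic
' Room for five characters plus the terminating character 0
Dim greeting As ZString * 6
greeting = "Hello"

' Index 0 is the first byte; index 5 holds the terminator
For i As Integer = 0 To 5
    Print i, greeting[i]
Next

Dim firstByte As Long = greeting[0]   ' 72, the code for "H"
Dim lastByte As Long = greeting[5]    ' 0, the end-of-string marker
```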

Virtually all early programming languages, and many modern ones, have no native String data type. A string in C is a Null (character 0) terminated array of Char, where Char can be interpreted as both a character and number. While this scheme accurately reflects the internal structure of a string in memory, it is hard to work with and error prone, and doesn't reflect the way humans work with string data.

A better solution to the problem was the creation of the native String data type. The internal structure of a string has not changed; it is still stored in memory as a sequence of bytes terminated by a Null character, but the programmer can interact with string data in a more natural, humanistic way.

FreeBasic has four intrinsic string data types, listed in the following table.

| String Type | Declaration | Purpose |
|---|---|---|
| Dynamic String | Dim myString as String | Variable-length string that is dynamically allocated and automatically resized |
| Fixed Length String | Dim myString as String * length | Fixed-length string |
| Zstring | Dim myString as Zstring * length | Fixed-length, Null terminated character string |
| Zstring | Dim myString as Zstring Ptr | Pointer to a Null terminated character string whose memory you manage yourself |
| Wstring | Dim myString as Wstring * length | Fixed-length, Null terminated string used with Unicode functions |
| Wstring | Dim myString as Wstring Ptr | Pointer to a Null terminated wide string, used with Unicode functions, whose memory you manage yourself |

Dynamic Strings

Dynamic strings are variable-length strings that the compiler automatically allocates and resizes as needed. A dynamic string is actually a structure stored in memory, called a string descriptor, that contains a pointer to the string data, the length of the string data and the size of the string allocation. Dynamic strings are allocated in 36 byte blocks, which reduces the amount of resizing that must be done if the string size changes. The actual string data is stored in memory, and can be any size up to 2 GB or available memory.
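A short sketch of a dynamic string growing on demand follows. Nothing here sets a size; the descriptor described above is updated behind the scenes each time the contents change.

```freebasic
Dim myString As String

myString = "Free"
Dim shortLen As Integer = Len(myString)
Print shortLen           ' 4

' Appending needs no manual resizing; the string grows automatically
myString &= "Basic is cool!"
Dim longLen As Integer = Len(myString)
Print myString           ' FreeBasic is cool!
Print longLen            ' 18
```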

Dynamic strings use Zstrings internally so that you can pass a dynamic string to a function that expects a Null terminated string and the function will work correctly. The Windows API, the C runtime library and most third-party libraries written in C expect a Null terminated string as a string parameter.

In other versions of Basic, strings have been used to load and manipulate binary data. If the binary data contains a Null character, this could cause problems with FreeBasic strings, since the Null character is used to terminate a string. While it is possible to have an embedded Null in a dynamic string (since dynamic strings have a string descriptor), if you pass this data to a third-party library function that is expecting a C-style string, the data will be read only up to the Null character, resulting in data loss. Instead of using strings to read binary data, you should use byte arrays.
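The difference can be seen in a minimal sketch, assuming a dynamic string with one embedded Null. Len reads the length from the descriptor, while reading the same data through its Zstring pointer (which is how a C function would see it) stops at the first Null. The variable names are illustrative only.

```freebasic
' A dynamic string with an embedded Null character in the middle
Dim raw As String = "abc" & Chr(0) & "def"

' Length recorded in the string descriptor
Dim fullLen As Integer = Len(raw)

' Length as a C-style function would see it: stops at the first Null
Dim cLen As Integer = Len(*StrPtr(raw))

Print fullLen   ' 7
Print cLen      ' 3
```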

Since a dynamic string is actually a pointer to a string descriptor, you cannot use dynamic strings in type definitions that you are going to save to a file. You should use a fixed length string instead.
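The following sketch shows one way this might look in practice: a type with a fixed length string field is written to a binary file and read back. The type name Customer, its fields and the file name customer.dat are all invented for this example.

```freebasic
Type Customer
    fullName As String * 20   ' fixed length, stored inside the record itself
    age As Long
End Type

Dim saved As Customer
saved.fullName = "Ada Lovelace"
saved.age = 36

' Write the whole record to disk
Dim f As Long = FreeFile
Open "customer.dat" For Binary As #f
Put #f, , saved
Close #f

' Read it back into a second variable
Dim loaded As Customer
f = FreeFile
Open "customer.dat" For Binary As #f
Get #f, , loaded
Close #f

Print Left(loaded.fullName, 12)   ' Ada Lovelace
Print loaded.age                  ' 36

Kill "customer.dat"               ' remove the example file
```

Had fullName been declared As String, the file would have received the string descriptor rather than the characters themselves.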

Fixed Length Strings

Fixed length strings are defined using a length parameter, and can only hold strings that are less than or equal to the defined size. Trying to initialize a fixed length string with data that is longer than the defined size will result in the string being truncated to fit the defined size.
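A minimal sketch of the truncation behavior, using made-up variable names:

```freebasic
Dim source As String = "FreeBasic"
Dim fixedStr As String * 5

' Only the first five characters fit
fixedStr = source

Print fixedStr          ' FreeB
Print Len(fixedStr)     ' 5
```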

Zstrings

Zstrings are Null terminated, C-style strings. The main purpose of a Zstring is to interface with third-party libraries that expect C-style strings; however, they are useful even if you do not plan to pass them to third-party libraries. Zstrings can be defined just like a fixed-length string, Dim myZstring as Zstring * 10, and FreeBasic will handle them just like fixed strings, automatically truncating data to fit the defined size.

Unlike fixed strings, however, Zstrings can be dynamically managed by declaring a Zstring pointer and using the associated pointer memory functions. When using a dynamically allocated Zstring, you must be careful not to write past the allocated size of the string, as this will overwrite parts of memory not contained in the string and may cause the program or even the operating system to crash.
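Here is a hedged sketch of a dynamically managed Zstring. It assumes a 16 byte buffer is enough for the data; the constant BUFFER_SIZE and the variable names are hypothetical. Note that the string assigned must fit in the buffer with one byte to spare for the terminator.

```freebasic
Const BUFFER_SIZE = 16

Dim used As Integer = 0

' CAllocate returns zero-filled memory, or 0 if the request fails
Dim buffer As ZString Ptr = CAllocate(BUFFER_SIZE)

If buffer <> 0 Then
    ' 9 characters plus the terminator fit comfortably in 16 bytes
    *buffer = "FreeBasic"
    used = Len(*buffer)
    Print *buffer        ' FreeBasic
    Print used           ' 9

    ' Memory obtained with CAllocate must be released by hand
    Deallocate(buffer)
End If
```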

When using either type of Zstring, FreeBasic will manage the terminating Null character for you, but the number of characters a Zstring can hold will be 1 less than the defined size since the Null character occupies the last character position. When calculating the size of a Zstring, be sure to add 1 to the value to account for the Null terminator. You must also not include the character 0 in any of your data, or the data will be truncated since FreeBasic will see the Null character as the end of the string.
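The off-by-one can be checked with a small sketch: a Zstring defined with a size of 10 holds at most 9 characters, because the last position is reserved for the Null terminator.

```freebasic
Dim source As String = "FreeBasic is cool!"
Dim myZstring As ZString * 10

' Data longer than 9 characters is truncated
myZstring = source

Print myZstring           ' FreeBasic
Print Len(myZstring)      ' 9
Print SizeOf(myZstring)   ' 10
```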

Wstrings

Dynamic, fixed and Zstrings all use 1 byte per character. Wstrings, also called wide strings, use 2 bytes per character on Windows (the width depends on the platform) and are generally used in conjunction with Unicode string functions. Unicode is, strictly speaking, a character coding scheme designed to associate a number with every character of every language used in the world today, as well as some ancient languages that are of historical interest. In the context of developing an application, Unicode is used to internationalize a program so that the end-user can view the program in their native language.

Wstrings can be both fixed length and dynamic and are similar to Zstrings.
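A brief sketch of a fixed length Wstring follows. It assumes that SizeOf applied to the variable reports the full storage size, so dividing by the defined length gives the bytes used per character on the platform the program is compiled for. No expected value is shown for that line, since it differs between platforms.

```freebasic
Dim wide As WString * 20
wide = "FreeBasic"

' Len counts characters, not bytes
Dim charCount As Integer = Len(wide)
Print charCount              ' 9

' Bytes per character on this platform
Print SizeOf(wide) \ 20
```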

String Functions

Working with strings in your application often means manipulating them, and FreeBasic has a rich set of string functions, which are listed in the following table.

| Function | Syntax | Comment |
|---|---|---|
| Asc | B = Asc(string) | Returns the character code of the first character in string as an unsigned integer. |
| Asc | B = Asc(string, position) | Returns the character code of the character at position. |
| Bin | B = Bin(number) | Returns the binary form of number as a string. |
| Bin | B = Bin(number, digits) | Returns the binary form of number as a string, limited to the given number of digits. |
| Chr | B = Chr(code) | Returns the character represented by the ASCII code. |
| Chr | B = Chr(code, code, …) | Returns a string made up of the characters represented by each ASCII code. |
| Hex | B = Hex(number) | Returns the hexadecimal form of number as a string. |
| Hex | B = Hex(number, digits) | Returns the hexadecimal form of number as a string, limited to the given number of digits. |
| Instr | B = Instr(string, substring) | Returns the position of substring within string as an integer. If substring is not found, 0 is returned. |
| Instr | B = Instr(start, string, substring) | Same as above, but the search begins at position start. |
| Lcase | B = Lcase(string) | Converts string to all lowercase. |
| Left | B = Left(string, number) | Returns the leftmost number of characters from string. |
| Len | B = Len(string) | Returns the length of a string. |
| Len | B = Len(data_type) | Returns the length of a numeric data type. |
| Lset | Lset string_variable, string | Left justifies string within string variable. |
| Ltrim | B = Ltrim(string) | Trims all spaces from the left side of string. |
| Ltrim | B = Ltrim(string, trimset) | Trims characters from the left side of string if they exactly match trimset. |
| Ltrim | B = Ltrim(string, ANY trimset) | Trims characters from the left side of string if they match any character in trimset. |
| Mid (Function) | B = Mid(string, start) | Returns a substring from string starting at start and running to the end of the string. |
| Mid (Function) | B = Mid(string, start, length) | Returns a substring of length characters from string starting at start. |
| Mid (Statement) | Mid(string, start) = B | Copies all of B into string starting at start. The current characters in string are replaced. |
| Mid (Statement) | Mid(string, start, length) = B | Copies length characters of B into string starting at start. The current characters in string are replaced. |
| Oct | B = Oct(number) | Returns the octal form of number as a string. |
| Oct | B = Oct(number, digits) | Returns the octal form of number as a string, limited to the given number of digits. |
| Right | B = Right(string, number) | Returns the rightmost number of characters from string. |
| Rset | Rset string_variable, string | Right justifies string within string variable. |
| Rtrim | B = Rtrim(string) | Trims all spaces from the right side of string. |
| Rtrim | B = Rtrim(string, trimset) | Trims characters from the right side of string if they exactly match trimset. |
| Rtrim | B = Rtrim(string, ANY trimset) | Trims characters from the right side of string if they match any character in trimset. |
| Space | B = Space(number) | Returns a string with number of spaces. |
| String | B = String(number, code) | Returns a string of number characters, each being the character for the ASCII code. |
| String | B = String(number, string) | Returns a string of number characters, each being the first character of string. |
| Trim | B = Trim(string) | Trims all spaces from the left and right sides of string. |
| Trim | B = Trim(string, trimset) | Trims characters from the left and right sides of string if they exactly match trimset. |
| Trim | B = Trim(string, ANY trimset) | Trims characters from the left and right sides of string if they match any character in trimset. |
| Ucase | B = Ucase(string) | Converts string to all uppercase. |
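A few of these functions in action, as a quick sketch rather than a full tour. The sample string and variable names are only for illustration.

```freebasic
Dim s As String = "FreeBasic is cool"

Dim found As Integer = Instr(s, "is")
Print found                          ' 11
Print Left(s, 9)                     ' FreeBasic
Print Right(s, 4)                    ' cool
Print Mid(s, 11, 2)                  ' is
Print Ucase(s)                       ' FREEBASIC IS COOL
Print Trim("  padded  ")             ' padded
Print Trim("**boxed!!", Any "*!")    ' boxed
Print Hex(255)                       ' FF
Print Bin(5, 8)                      ' 00000101

' Mid as a statement replaces characters in place
Mid(s, 14, 4) = "neat"
Print s                              ' FreeBasic is neat
```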

The MK* and CV* String Functions

There are times when you will want to save numeric data to a disk file, such as when you are creating your own database system. Saving numeric data as strings can be problematic since the string representation of the data can vary. One solution to this problem is to use the various MK* functions, which convert the binary representation of a number into a string, and the CV* functions, which convert the string back into a number.

The advantage of using these functions is consistent numeric representation; an integer is converted into a 4-byte string (on 32-bit targets), a double is converted into an 8-byte string. This makes saving and reading binary data from the disk quite easy. The following table lists the MK* and CV* functions.

| Function | Syntax | Comment |
|---|---|---|
| Mkd | B = Mkd(number) | Converts a double-type number to a string with length of 8 bytes. |
| Mki | B = Mki(number) | Converts an integer-type number to a string with length of 4 bytes (the size of an integer on 32-bit targets). |
| Mkl | B = Mkl(number) | Converts a long-type number to a string with length of 4 bytes. |
| Mklongint | B = Mklongint(number) | Converts a longint-type number to a string with length of 8 bytes. |
| Mks | B = Mks(number) | Converts a single-type number to a string with length of 4 bytes. |
| Mkshort | B = Mkshort(number) | Converts a short-type number to a string with length of 2 bytes. |
| Cvd | B = Cvd(string) | Converts an 8 byte string created with Mkd into a double-type number. |
| Cvi | B = Cvi(string) | Converts a 4 byte string created with Mki into an integer-type number. |
| Cvl | B = Cvl(string) | Converts a 4 byte string created with Mkl into a long-type number. |
| Cvlongint | B = Cvlongint(string) | Converts an 8 byte string created with Mklongint into a longint-type number. |
| Cvs | B = Cvs(string) | Converts a 4 byte string created with Mks into a single-type number. |
| Cvshort | B = Cvshort(string) | Converts a 2 byte string created with Mkshort into a short-type number. |
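A round trip through a few of these functions might look like the following sketch. It sticks to Mkl, Mkd and Mkshort, whose sizes do not depend on the target platform; the variable names are made up.

```freebasic
' Pack numbers into fixed-size binary strings
Dim packedLong As String = Mkl(123456)
Dim packedDouble As String = Mkd(124.5)
Dim packedShort As String = Mkshort(-7)

Print Len(packedLong)        ' 4
Print Len(packedDouble)      ' 8
Print Len(packedShort)       ' 2

' Unpack them again with the matching CV* function
Print Cvl(packedLong)        ' 123456
Print Cvd(packedDouble)      ' 124.5
Print Cvshort(packedShort)   ' -7
```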

Numeric String Conversion Functions

You will find as you write programs that there are instances where you need to convert a text string such as “124.5” into a number, and the number 124.5 into a string. FreeBasic has several conversion functions that can be used to accomplish these tasks.

| Function | Syntax | Comment |
|---|---|---|
| Format | B = Format(number, format_string) | Returns a formatted number. You must include "vbcompat.bi" in your program to use this function. |
| Str | B = Str(number) | Converts a numeric expression to a string representation. That is, 145 will become “145”. |
| Val | B = Val(string) | Converts a string to a double value. The Val functions will convert from left to right, ending at the first non-numeric character. |
| Valint | B = Valint(string) | Converts a string to an integer value. |
| Vallng | B = Vallng(string) | Converts a string to a long integer value. |
| Valuint | B = Valuint(string) | Converts a string to an unsigned integer value. |
| Valulong | B = Valulong(string) | Converts a string to an unsigned long integer value. |

These functions work just like the Mk* and Cv* functions, except that these functions work with text representations of the numeric values rather than binary representations. A common use is reading text files that contain numbers stored as text, such as INI files: the text is converted to a number for processing, and then back to a string for output to the disk. They are also useful for getting input from the user.
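The text conversions can be sketched as follows, using the 124.5 example from above. The variable names are illustrative.

```freebasic
' Text to number
Dim asNumber As Double = Val("124.5")
Dim partial As Double = Val("12abc")     ' stops at the first non-numeric character
Dim whole As Integer = Valint("42")

' Number to text
Dim asText As String = Str(124.5)

Print asNumber + 1     ' 125.5
Print partial          ' 12
Print whole            ' 42
Print asText & "!"     ' 124.5!
```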

Wide String Functions

Since wide strings contain 16-bit characters, there are a few string functions that work specifically with wide strings. The functions listed in the following table behave in the same manner as their 8-bit counterparts.

| Function | Syntax | Comment |
|---|---|---|
| Wbin | B = Wbin(number) | Returns the binary form of number as a wide string. |
| Wbin | B = Wbin(number, digits) | Returns the binary form of number as a wide string, limited to the given number of digits. |
| Wchr | B = Wchr(unicode) | Returns the character represented by the Unicode code. |
| Wchr | B = Wchr(unicode, unicode, …) | Returns a string made up of the Unicode characters represented by each code. |
| Whex | B = Whex(number) | Returns the hexadecimal form of number as a wide string. |
| Whex | B = Whex(number, digits) | Returns the hexadecimal form of number as a wide string, limited to the given number of digits. |
| Woct | B = Woct(number) | Returns the octal form of number as a wide string. |
| Woct | B = Woct(number, digits) | Returns the octal form of number as a wide string, limited to the given number of digits. |
| Wspace | B = Wspace(number) | Returns a wide string with number of spaces. |
| Wstr | B = Wstr(number) | Returns a wide string representation of a number. |
| Wstr | B = Wstr(ascii_string) | Converts an ASCII string to a Unicode string. |
| Wstring | B = Wstring(number, code) | Returns a wide string of number characters, each being the character for the given code. |
| Wstring | B = Wstring(number, string) | Returns a wide string of number characters, each being the first character of string. |

String Operators

There are two string operators, & and +, which concatenate two or more strings together. & is preferred over + since & will automatically convert the operands to strings, whereas + will not.
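A minimal sketch of the difference, with invented variable names: & accepts the number directly, while + needs the number converted with Str first.

```freebasic
Dim itemCount As Integer = 3

' & converts the integer to a string automatically
Dim viaAmpersand As String = "Items: " & itemCount

' + only joins strings, so the number is converted explicitly
Dim viaPlus As String = "Items: " + Str(itemCount)

Print viaAmpersand   ' Items: 3
Print viaPlus        ' Items: 3
```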