MQTT C Client Libraries Internals
utf-8.c File Reference

Functions for checking that strings contain UTF-8 characters only.

#include "utf-8.h"
#include <stdlib.h>
#include <string.h>
#include "StackTrace.h"

Macros

#define ARRAY_SIZE(a)   (sizeof(a) / sizeof(a[0]))
 Macro to determine the number of elements in a single-dimension array.
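
As a minimal illustration of how such a macro is typically used, the sketch below iterates over a fixed-size table. The table `sample_primes` and the function `sum_sample_primes` are hypothetical names invented for this example and are not part of utf-8.c.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))

/* Hypothetical table, used only for this illustration. */
static const int sample_primes[] = {2, 3, 5, 7, 11};

/* Sum every element. ARRAY_SIZE is applied to the array itself,
 * in a scope where its full type is still visible. */
static int sum_sample_primes(void)
{
    int total = 0;
    size_t i;

    for (i = 0; i < ARRAY_SIZE(sample_primes); ++i)
        total += sample_primes[i];
    return total;
}
```

The macro only gives the element count for a true array. Applied to a pointer, including an array parameter that has decayed to a pointer, it divides the pointer size by the element size instead.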
 

Functions

static const char * UTF8_char_validate (int len, const char *data)
 Validate a single UTF-8 character.
 
int UTF8_validate (int len, const char *data)
 Validate that a length-delimited string contains only UTF-8 characters.
 
int UTF8_validateString (const char *string)
 Validate that a null-terminated string contains only UTF-8 characters.
 

Variables

struct {
   int   len;              /* number of elements in the following array (1 to 4) */
   struct {
      char   lower;        /* lower limit of valid range */
      char   upper;        /* upper limit of valid range */
   }   bytes [4];          /* up to 4 bytes can be used per character */
} valid_ranges []
 Structure to hold the valid ranges of UTF-8 characters, for each byte up to 4.
 

Detailed Description

Functions for checking that strings contain UTF-8 characters only.

See page 104 of the Unicode Standard 5.0 for the list of well-formed UTF-8 byte sequences.

Function Documentation

◆ UTF8_char_validate()

static const char * UTF8_char_validate (int len, const char *data)

Validate a single UTF-8 character.

Parameters
len   the length of the string in "data"
data  the bytes to check for a valid UTF-8 char
Returns
pointer to the start of the next UTF-8 character in "data", or NULL if the bytes at "data" do not form a valid UTF-8 character
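
The following is a minimal sketch of one way to validate a single character against a byte-range table. It is not the library's implementation. The names `sketch_ranges` and `sketch_char_validate` are hypothetical. The table copies the values shown for valid_ranges, but stores them as `unsigned char` (an assumption made here so that comparisons do not depend on whether `char` is signed). Signalling failure with NULL is likewise a choice made for this sketch.

```c
#include <stddef.h>

#define ARRAY_SIZE(a) (sizeof(a) / sizeof(a[0]))

/* Hypothetical copy of the range table, using unsigned bytes. */
struct byte_range { unsigned char lower, upper; };

static const struct {
    int len;                     /* number of bytes in this row (1 to 4) */
    struct byte_range bytes[4];  /* allowed range for each byte position */
} sketch_ranges[] = {
    {1, { {0x00, 0x7F} } },
    {2, { {0xC2, 0xDF}, {0x80, 0xBF} } },
    {3, { {0xE0, 0xE0}, {0xA0, 0xBF}, {0x80, 0xBF} } },
    {3, { {0xE1, 0xEC}, {0x80, 0xBF}, {0x80, 0xBF} } },
    {3, { {0xED, 0xED}, {0x80, 0x9F}, {0x80, 0xBF} } },
    {3, { {0xEE, 0xEF}, {0x80, 0xBF}, {0x80, 0xBF} } },
    {4, { {0xF0, 0xF0}, {0x90, 0xBF}, {0x80, 0xBF}, {0x80, 0xBF} } },
    {4, { {0xF1, 0xF3}, {0x80, 0xBF}, {0x80, 0xBF}, {0x80, 0xBF} } },
    {4, { {0xF4, 0xF4}, {0x80, 0x8F}, {0x80, 0xBF}, {0x80, 0xBF} } },
};

/* Return the start of the next character, or NULL if the bytes at
 * `data` match no row or the character is cut off by `len`. */
static const char *sketch_char_validate(int len, const char *data)
{
    const unsigned char *p = (const unsigned char *)data;
    size_t row;

    if (data == NULL || len < 1)
        return NULL;

    for (row = 0; row < ARRAY_SIZE(sketch_ranges); ++row) {
        int n = sketch_ranges[row].len;
        int i;

        if (n > len)
            continue;            /* not enough bytes left for this row */
        for (i = 0; i < n; ++i) {
            if (p[i] < sketch_ranges[row].bytes[i].lower ||
                p[i] > sketch_ranges[row].bytes[i].upper)
                break;           /* byte i is outside the allowed range */
        }
        if (i == n)
            return data + n;     /* every byte matched this row */
    }
    return NULL;
}
```

Because the first-byte ranges of the rows do not overlap, at most one row can match any input, so the order in which rows are tried does not affect the result.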

◆ UTF8_validate()

int UTF8_validate (int len, const char *data)

Validate that a length-delimited string contains only UTF-8 characters.

Parameters
len   the length of the string in "data"
data  the bytes to check for valid UTF-8 characters
Returns
1 (true) if the string has only UTF-8 characters, 0 (false) otherwise
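
A length-delimited check of this kind can be built by calling a per-character step repeatedly until the buffer is used up. The sketch below shows only that loop, under stated assumptions: `char_step_fn`, `ascii_step`, and `validate_with` are hypothetical names, the step function is a toy that accepts 7-bit ASCII only, and returning 0 for a NULL buffer is a choice made here rather than documented behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-character step: returns the start of the next
 * character, or NULL if the bytes at `data` are not acceptable. */
typedef const char *(*char_step_fn)(int len, const char *data);

/* Toy step used only for this illustration: accepts 7-bit ASCII bytes. */
static const char *ascii_step(int len, const char *data)
{
    if (len < 1 || ((unsigned char)data[0] & 0x80) != 0)
        return NULL;
    return data + 1;
}

/* Walk `len` bytes one character at a time.
 * Returns 1 if every step succeeds, 0 otherwise. */
static int validate_with(char_step_fn step, int len, const char *data)
{
    const char *cur = data;
    const char *end;

    assert(step != NULL);
    if (data == NULL || len < 0)
        return 0;

    end = data + len;
    while (cur < end) {
        cur = step((int)(end - cur), cur);
        if (cur == NULL)
            return 0;            /* malformed or truncated character */
    }
    return 1;                    /* an empty buffer is trivially valid */
}
```

Since the length is passed explicitly, the loop does not stop at a zero byte; whether a zero byte is accepted depends entirely on the step function.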

◆ UTF8_validateString()

int UTF8_validateString (const char *string)

Validate that a null-terminated string contains only UTF-8 characters.

Parameters
string  the string to check for valid UTF-8 characters
Returns
1 (true) if the string has only UTF-8 characters, 0 (false) otherwise

Variable Documentation

◆ valid_ranges

struct { ... } valid_ranges[]
Initial value:
=
{
{1, { {00, 0x7F} } },
{2, { {0xC2, 0xDF}, {0x80, 0xBF} } },
{3, { {0xE0, 0xE0}, {0xA0, 0xBF}, {0x80, 0xBF} } },
{3, { {0xE1, 0xEC}, {0x80, 0xBF}, {0x80, 0xBF} } },
{3, { {0xED, 0xED}, {0x80, 0x9F}, {0x80, 0xBF} } },
{3, { {0xEE, 0xEF}, {0x80, 0xBF}, {0x80, 0xBF} } },
{4, { {0xF0, 0xF0}, {0x90, 0xBF}, {0x80, 0xBF}, {0x80, 0xBF} } },
{4, { {0xF1, 0xF3}, {0x80, 0xBF}, {0x80, 0xBF}, {0x80, 0xBF} } },
{4, { {0xF4, 0xF4}, {0x80, 0x8F}, {0x80, 0xBF}, {0x80, 0xBF} } },
}

Structure to hold the valid ranges of UTF-8 characters, for each byte up to 4.
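
The row boundaries follow from how UTF-8 maps bytes to code points: they exclude overlong encodings (lead bytes 0xC0 and 0xC1, and 0xE0 or 0xF0 followed by too small a second byte), the surrogate range U+D800 to U+DFFF (0xED followed by 0xA0 to 0xBF), and values above U+10FFFF (0xF4 followed by 0x90 or more). The sketch below decodes the first and last sequence of several rows to make those limits visible. `decode_utf8` is a hypothetical helper, not part of utf-8.c, and it assumes its input has already been validated.

```c
#include <stdint.h>

/* Hypothetical helper: decode one already-validated UTF-8 sequence of
 * n bytes (1 to 4) into its code point. No validation is done here. */
static uint32_t decode_utf8(const char *data, int n)
{
    const unsigned char *b = (const unsigned char *)data;

    switch (n) {
    case 1:
        return b[0];
    case 2:
        return ((uint32_t)(b[0] & 0x1F) << 6) |
                (uint32_t)(b[1] & 0x3F);
    case 3:
        return ((uint32_t)(b[0] & 0x0F) << 12) |
               ((uint32_t)(b[1] & 0x3F) << 6) |
                (uint32_t)(b[2] & 0x3F);
    case 4:
        return ((uint32_t)(b[0] & 0x07) << 18) |
               ((uint32_t)(b[1] & 0x3F) << 12) |
               ((uint32_t)(b[2] & 0x3F) << 6) |
                (uint32_t)(b[3] & 0x3F);
    default:
        return 0xFFFFFFFFu;      /* unsupported length */
    }
}
```

For example, the lowest sequence of the first 3-byte row, E0 A0 80, decodes to U+0800, the first code point that needs three bytes. The highest sequence of the last row, F4 8F BF BF, decodes to U+10FFFF, the upper limit of the Unicode code space.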