393. UTF-8 Validation LeetCode Solution

In this guide, you will get 393. UTF-8 Validation LeetCode Solution with the best time and space complexity. The solution to UTF- Validation problem is provided in various programming languages like C++, Java, and Python. This will be helpful for you if you are preparing for placements, hackathons, interviews, or practice purposes. The solutions provided here are very easy to follow and include detailed explanations.

Table of Contents

  1. Problem Statement
  2. Complexity Analysis
  3. UTF- Validation solution in C++
  4. UTF- Validation solution in Java
  5. UTF- Validation solution in Python
  6. Additional Resources
393. UTF-8 Validation LeetCode Solution image

Problem Statement of UTF- Validation

Given an integer array data representing the data, return whether it is a valid UTF-8 encoding (i.e. it translates to a sequence of valid UTF-8 encoded characters).
A character in UTF8 can be from 1 to 4 bytes long, subjected to the following rules:

For a 1-byte character, the first bit is a 0, followed by its Unicode code.
For an n-bytes character, the first n bits are all one’s, the n + 1 bit is 0, followed by n – 1 bytes with the most significant 2 bits being 10.

This is how the UTF-8 encoding would work:

Number of Bytes | UTF-8 Octet Sequence
| (binary)
——————–+—————————————–
1 | 0xxxxxxx
2 | 110xxxxx 10xxxxxx
3 | 1110xxxx 10xxxxxx 10xxxxxx
4 | 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

x denotes a bit in the binary form of a byte that may be either 0 or 1.
Note: The input is an array of integers. Only the least significant 8 bits of each integer is used to store the data. This means each integer represents only 1 byte of data.

Example 1:

Input: data = [197,130,1]
Output: true
Explanation: data represents the octet sequence: 11000101 10000010 00000001.
It is a valid utf-8 encoding for a 2-bytes character followed by a 1-byte character.

See also  103. Binary Tree Zigzag Level Order Traversal LeetCode Solution

Example 2:

Input: data = [235,140,4]
Output: false
Explanation: data represented the octet sequence: 11101011 10001100 00000100.
The first 3 bits are all one’s and the 4th bit is 0 means it is a 3-bytes character.
The next byte is a continuation byte which starts with 10 and that’s correct.
But the second continuation byte does not start with 10, so it is invalid.

Constraints:

1 <= data.length <= 2 * 104
0 <= data[i] <= 255

Complexity Analysis

  • Time Complexity: O(n)
  • Space Complexity: O(1)

393. UTF-8 Validation LeetCode Solution in C++

class Solution {
 public:
  bool validUtf8(vector<int>& data) {
    int followedBytes = 0;

    for (const int d : data)
      if (followedBytes == 0) {
        if ((d >> 3) == 0b11110)
          followedBytes = 3;
        else if ((d >> 4) == 0b1110)
          followedBytes = 2;
        else if ((d >> 5) == 0b110)
          followedBytes = 1;
        else if ((d >> 7) == 0b0)
          followedBytes = 0;
        else
          return false;
      } else {
        if ((d >> 6) != 0b10)
          return false;
        --followedBytes;
      }

    return followedBytes == 0;
  }
};
/* code provided by PROGIEZ */

393. UTF-8 Validation LeetCode Solution in Java

class Solution {
  public boolean validUtf8(int[] data) {
    int followedBytes = 0;

    for (final int d : data)
      if (followedBytes == 0) {
        if ((d >> 3) == 0b11110)
          followedBytes = 3;
        else if ((d >> 4) == 0b1110)
          followedBytes = 2;
        else if ((d >> 5) == 0b110)
          followedBytes = 1;
        else if ((d >> 7) == 0b0)
          followedBytes = 0;
        else
          return false;
      } else {
        if ((d >> 6) != 0b10)
          return false;
        --followedBytes;
      }

    return followedBytes == 0;
  }
}
// code provided by PROGIEZ

393. UTF-8 Validation LeetCode Solution in Python

class Solution:
  def validUtf8(self, data: list[int]) -> bool:
    followedBytes = 0

    for d in data:
      if followedBytes == 0:
        if (d >> 3) == 0b11110:
          followedBytes = 3
        elif (d >> 4) == 0b1110:
          followedBytes = 2
        elif (d >> 5) == 0b110:
          followedBytes = 1
        elif (d >> 7) == 0b0:
          followedBytes = 0
        else:
          return False
      else:
        if (d >> 6) != 0b10:
          return False
        followedBytes -= 1

    return followedBytes == 0
# code by PROGIEZ

Additional Resources

See also  558. Logical OR of Two Binary Grids Represented as Quad-Trees LeetCode Solution

Happy Coding! Keep following PROGIEZ for more updates and solutions.