IB Computer Science Advanced Data Parsing in Java

IB Computer Science Learning Goals

In this IB Computer Science lesson you will be learning about:

  • How to use tokens and delimiters to separate Strings into individual pieces

Parsing

Parsing refers to extracting different parts of information from a single String.

For example, suppose you had a String, time, that contains a time such as “9:23AM” or “12:45PM”. Let’s say you wanted to extract the hour, the minute, and the AM or PM from that String.
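
One position-based way to do this is sketched below with indexOf and substring. This is only an illustration of the idea; the class name TimeParts and the helper parse are assumptions, not part of the lesson.

```java
// Position-based parsing of a time String such as "9:23AM" or "12:45PM".
class TimeParts {
    // Returns {hour, minute, AM-or-PM}, located by character position.
    static String[] parse(String time) {
        int colon = time.indexOf(':');                        // the hour may be 1 or 2 digits
        String hour = time.substring(0, colon);               // everything before the colon
        String minute = time.substring(colon + 1, colon + 3); // the two digits after it
        String amPm = time.substring(colon + 3);              // whatever is left
        return new String[] { hour, minute, amPm };
    }

    public static void main(String[] args) {
        String[] parts = parse("12:45PM");
        System.out.println(parts[0]); // prints 12
        System.out.println(parts[1]); // prints 45
        System.out.println(parts[2]); // prints PM
    }
}
```

Every index here depends on the exact layout of the String, so a small change in the format (a space before AM, for instance) breaks the arithmetic.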

[Image: Java data parsing with substring]

Tokens and Delimiters

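The previous example is a poor way of parsing a String: pulling pieces out by character position is fragile when the pieces vary in length, as the hour does in “9:23AM” and “12:45PM”.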

Often an input String contains several groups of characters, where each group acts as a unit that has significance. Such a group is called a token. Tokens are separated from each other by delimiter characters. You may have heard of tab-delimited or comma-delimited files before.

Example: 12 8 5 32

  • You would automatically group these characters to form 4 integers separated by spaces.
  • The tokens in this case are 12, 8, 5, and 32.
  • These tokens are delimited by the space character.

What counts as a token and what counts as a delimiter depends on the application.

Example: 643,983,104

How many tokens are there? What are the delimiters?

  • A good answer might be: it depends on the context.
  • This could be the tokens 643, 983, and 104, separated by the comma as the delimiter character.
  • This could be a single token representing a large integer with the thousands and millions marked with commas.
  • This could be something completely different.

The StringTokenizer Class breaks a String into tokens.

You can use its default delimiters, or you can specify your own.

  • The default delimiters are the space, tab, newline, carriage return, and form feed characters

The StringTokenizer class is found in the java.util package, which must be imported for the program to compile.

  • To use the StringTokenizer class, you must create an object from it.
  • There are three different ways to create this object (three constructors), depending on which options you want to use with the StringTokenizer object.
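
Returning to the 643,983,104 example, the sketch below shows how the choice of delimiters changes the answer. The class name TokenContext and the helper count are illustrative assumptions; the constructor and the countTokens() method it uses are covered in the sections that follow.

```java
import java.util.StringTokenizer;

class TokenContext {
    // Counts the tokens in text when it is split on the given delimiter characters.
    static int count(String text, String delimiters) {
        return new StringTokenizer(text, delimiters).countTokens();
    }

    public static void main(String[] args) {
        String text = "643,983,104";
        System.out.println(count(text, ",")); // prints 3 (tokens 643, 983, 104)
        System.out.println(count(text, " ")); // prints 1 (the whole String is one token)
    }
}
```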

 

Creating Tokens and Delimiters

The following example creates a StringTokenizer object, t, for a string using the default delimiters, which in this case will be the space character.

The tokens for str1 are 12, 8, 5, and 32.
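
A minimal sketch of this case, assuming str1 holds "12 8 5 32" from the earlier example. The class name DefaultDelimiters and the helper tokens are illustrative; the loop uses methods described later in this lesson.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class DefaultDelimiters {
    // Splits str1 on the default delimiters and collects the tokens in order.
    static List<String> tokens(String str1) {
        StringTokenizer t = new StringTokenizer(str1); // one-argument constructor
        List<String> result = new ArrayList<>();
        while (t.hasMoreTokens()) {
            result.add(t.nextToken());
        }
        return result;
    }

    public static void main(String[] args) {
        String str1 = "12 8 5 32"; // assumed contents of str1
        System.out.println(tokens(str1)); // prints [12, 8, 5, 32]
    }
}
```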

The following example creates a StringTokenizer object, t, for a string using the space, comma, and slash characters as delimiters.
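
A minimal sketch of this case, assuming str1 holds "12 8/5,32" (a value chosen to match the tokens listed in this section). The class name CustomDelimiters and the helper split are illustrative.

```java
import java.util.StringTokenizer;

class CustomDelimiters {
    // Joins the tokens of str1 with '|' so the token boundaries are easy to see.
    static String split(String str1) {
        // Second argument: every character in it is treated as a delimiter.
        StringTokenizer t = new StringTokenizer(str1, " ,/");
        StringBuilder out = new StringBuilder();
        while (t.hasMoreTokens()) {
            out.append(t.nextToken());
            if (t.hasMoreTokens()) {
                out.append('|');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String str1 = "12 8/5,32"; // assumed contents of str1
        System.out.println(split(str1)); // prints 12|8|5|32
    }
}
```

The second argument is a set of single-character delimiters, not one multi-character separator.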

[Image: StringTokenizer delimiter characters]

The tokens for str1 are still 12, 8, 5, and 32.

The following example creates a StringTokenizer object, t, for a string using the space, comma, and slash characters as delimiters, but it also returns the delimiters as tokens.

The tokens for str1 are 12, a space, 8, a slash (/), 5, a comma (,), and 32.
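
A minimal sketch of this case, again assuming str1 holds "12 8/5,32". Passing true as the third constructor argument asks the tokenizer to return each delimiter character as its own one-character token. The class name ReturnDelimiters and the helper show are illustrative.

```java
import java.util.StringTokenizer;

class ReturnDelimiters {
    // Wraps each token in brackets so that delimiter tokens such as a space are visible.
    static String show(String str1) {
        // Third argument true: delimiters are returned as tokens as well.
        StringTokenizer t = new StringTokenizer(str1, " ,/", true);
        StringBuilder out = new StringBuilder();
        while (t.hasMoreTokens()) {
            out.append('[').append(t.nextToken()).append(']');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String str1 = "12 8/5,32"; // assumed contents of str1
        System.out.println(show(str1)); // prints [12][ ][8][/][5][,][32]
    }
}
```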

Methods for using Tokens and Delimiters

Now that you know how to set up a StringTokenizer object that breaks a string down into tokens, it would be useful to know how to actually access those tokens so they can be manipulated in your program. The following are some of the useful methods in the StringTokenizer class:

  • nextToken() – Returns the next token as a String
  • countTokens() – Returns the number of tokens remaining
  • hasMoreTokens() – Returns true if there are more tokens remaining
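
The three methods can be seen working together in the sketch below, which is only an illustration: the class name TokenMethods and the helper sum are assumptions, and the input reuses the "12 8 5 32" example.

```java
import java.util.StringTokenizer;

class TokenMethods {
    // Adds up the integer tokens in a space-separated String.
    static int sum(String numbers) {
        StringTokenizer t = new StringTokenizer(numbers);
        int total = 0;
        while (t.hasMoreTokens()) {                    // true while tokens remain
            total += Integer.parseInt(t.nextToken());  // nextToken() returns a String
        }
        return total;
    }

    public static void main(String[] args) {
        StringTokenizer t = new StringTokenizer("12 8 5 32");
        System.out.println(t.countTokens());  // prints 4
        System.out.println(t.nextToken());    // prints 12
        System.out.println(t.countTokens());  // prints 3, one token has been consumed
        System.out.println(sum("12 8 5 32")); // prints 57
    }
}
```

Calling nextToken() when no tokens remain throws a NoSuchElementException, which is why the loop checks hasMoreTokens() first.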

The following program asks the user to enter a sentence and prints the words of that sentence, one per line.
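
A minimal sketch of such a program, assuming the sentence is typed on a single line and words are separated by the default delimiters. The class name SentenceWords, the helper onePerLine, and the prompt text are illustrative.

```java
import java.util.Scanner;
import java.util.StringTokenizer;

class SentenceWords {
    // Builds a String holding the words of the sentence, one per line.
    static String onePerLine(String sentence) {
        StringTokenizer t = new StringTokenizer(sentence);
        StringBuilder lines = new StringBuilder();
        while (t.hasMoreTokens()) {
            lines.append(t.nextToken()).append('\n');
        }
        return lines.toString();
    }

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        System.out.print("Enter a sentence: ");
        if (in.hasNextLine()) { // guard against missing input
            System.out.print(onePerLine(in.nextLine()));
        }
    }
}
```

Because only whitespace is a delimiter here, punctuation stays attached to the word next to it.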

The following program takes 5 comma-delimited marks from each line of a data file and prints out the maximum mark from each data set. For simplicity, let’s say there are 3 data sets in the file.

Sample Input:

1,2,3,4,5

10,9,8,7,6

12,14,15,13,11

Sample Output:

5

10

15
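
A minimal sketch of such a program, under a few assumptions: the data file is called marks.txt (a hypothetical name, since the lesson does not name the file), each data set sits on its own line with no blank lines in between, and every mark is an integer. The class name MaxMark and the helper maxOfLine are illustrative.

```java
import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;
import java.util.StringTokenizer;

class MaxMark {
    // Returns the largest integer among the comma-delimited marks on one line.
    static int maxOfLine(String line) {
        StringTokenizer t = new StringTokenizer(line, ",");
        int max = Integer.parseInt(t.nextToken().trim());
        while (t.hasMoreTokens()) {
            int mark = Integer.parseInt(t.nextToken().trim());
            if (mark > max) {
                max = mark;
            }
        }
        return max;
    }

    public static void main(String[] args) {
        // "marks.txt" is a hypothetical file name used only for this sketch.
        try (Scanner file = new Scanner(new File("marks.txt"))) {
            // Read at most 3 data sets, one per line.
            for (int set = 0; set < 3 && file.hasNextLine(); set++) {
                System.out.println(maxOfLine(file.nextLine()));
            }
        } catch (FileNotFoundException e) {
            System.out.println("Could not open marks.txt");
        }
    }
}
```

Keeping the per-line work in maxOfLine separates the tokenizing from the file handling, so the same method works for any number of marks on a line.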

[Image: Java StringTokenizer file I/O program]

IB Computer Science Practice

1. Letter Frequency
Write a program that does the following:

  • Finds how many times each letter occurs in a given sentence. Spaces don’t need to be counted.
  • Displays the letter that appears most frequently.

2. Random Word
Write a program that asks the user for a bunch of words, all separated by spaces. Display a random word from the sentence on the screen.

3. camelCaseNaming
When we write programs we use the camel case naming convention, for example firstName and surfaceArea. The first word is lowercase, and each following word starts with a capital letter, with no spaces between the words. Write a program that asks the user for several words and creates the camel case variable name for those words.

Example:
“Welcome to Class” becomes “welcomeToClass”
“THE QUICK BROWN FOX” becomes “theQuickBrownFox”
“HellO HOW aRe you” becomes “helloHowAreYou”

4. Palindrome
A palindrome is a word that reads the same forward and backward, for example RADAR. You can create a palindrome from any word by reversing the word and adding that reversed word to the end of the original word. Write a program that reads 10 sentences from a file, one sentence per line, and outputs the words of each sentence as palindromes. Make sure to check whether a word is already a palindrome.

5. Radius
Given the Cartesian plane coordinates for the center of a circle (x1, y1) and a point on the circle’s circumference (x2, y2), calculate the area of the circle.

The input for this program will contain five lines of data from an external text file. Each line will contain 4 real values representing the coordinates x1, y1, x2, y2. Each value will be separated by a single space. You can output the final results to the screen.

 
