>> lecture 6 >> -- strings and regular expressionsd00922011/matlab/306/20181212.pdf ·...

>> Lecture 6
>>
>> -- Strings and Regular Expressions
>>

Zheng-Liang Lu 268 / 332


Post on 29-Oct-2019



(Most) Common Codec: ASCII[2]

• Everything in a computer is encoded in binary.
• ASCII is a character-encoding scheme, originally based on the English alphabet, that encodes 128 specified characters into 7-bit binary integers (see the next slide).
• Unicode[1] has been the dominant standard in modern systems since around 2007.
• Unicode is backward compatible with ASCII: the first 128 Unicode code points are exactly the ASCII characters.

[1] See Unicode 8.0 Character Code Charts.
[2] Codec: coder-decoder; ASCII: American Standard Code for Information Interchange; also see http://zh.wikipedia.org/wiki/ASCII.

[Figure: ASCII table]

Characters and Strings (Revisited)

• In general, a string is a sequence of characters, just as a numeric array is a sequence of numbers.
  • For example, 'ntu'.
• An array of strings can be stored in a cell array that contains several strings.
  • For example, {'ntu', 'csie'}.
  • Most built-in text functions can be applied to such arrays.

    clear; clc;

    s1 = 'ntu'; s2 = 'csie';
    s = {s1, s2};
    upper(s)   % output: {'NTU'} {'CSIE'}


• Since R2017a, you can create a string by enclosing a piece of text in double quotes.[3]
  • For example, "ntu".
• The main difference between character arrays and strings is shown below:

    clear; clc;

    s1 = 'ntu'; s2 = 'NTU';
    s1 + s2   % output: 188 200 202 (character codes are added elementwise)

    s3 = string(s1); s4 = string(s2);
    s3 + s4   % output: "ntuNTU" (strings are concatenated)

[3] See https://www.mathworks.com/help/matlab/ref/string.html.


Selected Text Operations[4]

sprintf      Format data into string
strcat       Concatenate strings horizontally
contains     Determine if pattern is in string
count        Count occurrences of pattern in string
endsWith     Determine if string ends with pattern
startsWith   Determine if string starts with pattern
strfind      Find one string within another
replace      Find and replace substrings in string array
split        Split strings in string array
strjoin      Join text in array
lower        Convert string to lowercase
upper        Convert string to uppercase
reverse      Reverse order of characters in string

[4] See https://www.mathworks.com/help/matlab/characters-and-strings.html.
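A quick sketch of a few of the functions above on ordinary character vectors. The sample text is made up, and the sketch sticks to functions from the table that also accept plain char arrays and cell arrays.

```matlab
% Hypothetical sample text
name = 'ntu';
dept = 'csie';

label = sprintf('%s-%s', upper(name), dept)   % 'NTU-csie'
joined = strjoin({name, dept}, ', ')          % 'ntu, csie'
glued = strcat(name, dept)                    % 'ntucsie'
idx = strfind('banana', 'ana')                % [2 4]: overlapping occurrences are all reported
small = lower('MATLAB')                       % 'matlab'
```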


Example: Caesar Cipher

• The Caesar cipher is a substitution cipher in which each letter in the plaintext is replaced by the letter a fixed number of positions down the alphabet.[5]
• Write a program that encrypts an input string with the Caesar cipher.
  • Try not to use loops.

[5] See https://en.wikipedia.org/wiki/Caesar_cipher.
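One loop-free approach, sketched under the assumption that only the letters A to Z and a to z are shifted while spaces and punctuation are copied unchanged; the key `shift = 3` and the plaintext are hypothetical. Characters are treated as their ASCII codes, shifted with `mod` so they wrap around within the 26-letter alphabet, and selected with logical masks instead of a loop.

```matlab
% Hypothetical plaintext and key
plaintext = 'Hello, World!';
shift = 3;

isUpper = plaintext >= 'A' & plaintext <= 'Z';   % logical masks, no loop needed
isLower = plaintext >= 'a' & plaintext <= 'z';

ciphertext = plaintext;   % non-letters are copied unchanged
% Map letters to 0..25, shift with wrap-around via mod, map back to letters
ciphertext(isUpper) = char(mod(plaintext(isUpper) - 'A' + shift, 26) + 'A');
ciphertext(isLower) = char(mod(plaintext(isLower) - 'a' + shift, 26) + 'a');
disp(ciphertext)          % Khoor, Zruog!

% Decrypt by shifting the other way (encryption does not change which positions hold letters)
decrypted = ciphertext;
decrypted(isUpper) = char(mod(ciphertext(isUpper) - 'A' - shift, 26) + 'A');
decrypted(isLower) = char(mod(ciphertext(isLower) - 'a' - shift, 26) + 'a');
disp(decrypted)           % Hello, World!
```

Because `mod` with a positive modulus always returns a value in 0 to 25, the same expression with `-shift` undoes the encryption.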


Introduction to Regular Expressions[6]

• A regular expression, also called a pattern, is an expression that specifies a set of strings for a particular purpose.
• Check this: https://regexone.com.

[6] See https://en.wikipedia.org/wiki/Regular_expression; also see https://www.mathworks.com/help/matlab/ref/regexp.html.


Example

    >> text = 'bat cat can car coat court CUT ct CAT-scan';
    >> pattern = 'c[aeiou]+t';
    >> start_idx = regexp(text, pattern)

    start_idx =

         5    17

• The pattern 'c[aeiou]+t' specifies the set of strings in which:
  • c is the first character;
  • c is followed by one of the characters in the brackets [aeiou], and t is the last character;
  • in particular, [aeiou] must occur one or more times, as indicated by the + operator.
• Matching is case-sensitive, so CUT and CAT do not match, and ct does not match because + requires at least one vowel.
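To see which substrings matched and where they end, you can request more outputs from the same call. A short sketch reusing the text and pattern above; the output names `startIdx`, `endIdx`, and `words` are arbitrary.

```matlab
% Same text and pattern as in the example above
text = 'bat cat can car coat court CUT ct CAT-scan';
pattern = 'c[aeiou]+t';

[startIdx, endIdx] = regexp(text, pattern)   % startIdx = [5 17], endIdx = [7 20]
words = regexp(text, pattern, 'match')       % {'cat', 'coat'}
```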


Formalisms

Each quantifier applies to the element immediately before it.

Operator   Definition
|          Boolean OR (alternation)
*          0 or more times consecutively
?          0 times or 1 time
+          1 or more times consecutively
{n}        exactly n times consecutively
{m,}       at least m times consecutively
{0,n}      at most n times consecutively
{m,n}      at least m times, but no more than n times consecutively
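The quantifiers in the table can be tried on a small made-up string; each comment gives the expected 'match' output.

```matlab
s = 'color colour colouur';
m1 = regexp(s, 'colou?r', 'match')                % {'color', 'colour'}: the u is optional
m2 = regexp(s, 'colou*r', 'match')                % {'color', 'colour', 'colouur'}
m3 = regexp(s, 'colou{2}r', 'match')              % {'colouur'}: exactly two u's
m4 = regexp('a aa aaa', 'a{2,}', 'match')         % {'aa', 'aaa'}: at least two a's
m5 = regexp('cat dog bird', 'cat|dog', 'match')   % {'cat', 'dog'}
```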


Operator    Definition
.           any single character, including white space
[c1c2c3]    any character contained within the brackets
[^c1c2c3]   any character not contained within the brackets
[c1-c2]     any character in the range c1 through c2
\s          any white-space character
\w          any word character: alphabetic, numeric, or underscore
\W          any character that is not a word character
\d          any numeric digit; equivalent to [0-9]
\D          any non-digit character; equivalent to [^0-9]
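A sketch exercising several of these character classes on a hypothetical sample string; the comments list what each pattern should return.

```matlab
s = 'Room 101, Floor 3_B';
nums = regexp(s, '\d+', 'match')     % {'101', '3'}
words = regexp(s, '\w+', 'match')    % {'Room', '101', 'Floor', '3_B'}: _ counts as a word character
caps = regexp(s, '[A-Z]', 'match')   % {'R', 'F', 'B'}
spaces = regexp(s, '\s', 'start')    % [5 10 16]
```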


Output Keywords

Keyword    Output
'start'    starting indices of all matches (the default)
'end'      ending indices of all matches
'match'    text of each substring that matches the pattern
'tokens'   text of each captured token
'split'    text of the nonmatching substrings
'names'    name and text of each named token
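A sketch of the 'tokens' and 'split' keywords on a made-up string of assignments. When several keywords are given in one call, the outputs come back in the order the keywords are listed.

```matlab
str = 'x=1, y=22, z=333';
% (\w) and (\d+) are the two tokens captured from each match
[tok, parts] = regexp(str, '(\w)=(\d+)', 'tokens', 'split');
tok{2}         % {'y', '22'}
parts{2}       % ', ' (the text between the first and second matches)
numel(parts)   % 4: one more than the number of matches
```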


Examples

    clear; clc;

    text1 = {'Madrid, Spain', 'Romeo and Juliet', 'MATLAB is great'};
    tokens = regexp(text1, '\s', 'split')   % each string is split at white space

    text2 = 'EXTRA! The regexp function helps you relax.';
    matches = regexp(text2, '\w*x\w*', 'match')   % {'regexp', 'relax'}; EXTRA has an uppercase X


Exercise: Processing Multiple Files (Revisited)

    clear; clc;

    file_list = dir;
    filenames = {file_list(:).name};
    % ^ and $ anchor the whole name, so 'foo.mat' or 'a-b.m' do not yield partial matches
    A = regexp(filenames, '^\w+\.m$', 'match');
    mask = cellfun(@(x) ~isempty(x), A);
    % filesep is '\' on Windows and '/' elsewhere
    cellfun(@(f) fprintf('%s%s%s\n', pwd, filesep, f{:}), A(mask));


Example: Names

• You can associate names with tokens so that they are more easily identifiable.
• For example,

    >> str = 'Here is a date: 01-Apr-2020';
    >> expr = '(?<day>\d+)-(?<month>\w+)-(?<year>\d+)';
    >> mydate = regexp(str, expr, 'names')

    mydate =

          day: '01'
        month: 'Apr'
         year: '2020'


Exercise: Web Crawler

• Write a script that collects the names of HTML tags by defining a token within a regular expression.
• For example,

    >> str = '<title>My Title</title><p>Here is some text.</p>';
    >> pattern = '<(\w+).*>.*</\1>';
    >> [tokens, matches] = regexp(str, pattern, 'tokens', 'match')

• Here (\w+) captures the tag name as a token, and \1 refers back to it, so each match runs from an opening tag to the closing tag with the same name.
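One possible way to finish the exercise, sketched for the example string only: each match yields a one-element token cell holding the tag name, so the names can be pulled out with cellfun. A single regular expression is not a robust HTML parser; nested or unclosed tags will confuse it.

```matlab
str = '<title>My Title</title><p>Here is some text.</p>';
pattern = '<(\w+).*>.*</\1>';
tokens = regexp(str, pattern, 'tokens');    % {{'title'}, {'p'}}
tagNames = cellfun(@(t) t{1}, tokens, 'UniformOutput', false)   % {'title', 'p'}
```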


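More Regexp Functions

• See regexpi (case-insensitive matching), regexprep (replacement), and regexptranslate (translating text into a pattern, e.g. escaping special characters).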
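A brief sketch of the three functions on made-up inputs. In regexprep, $1 and $2 refer to the first and second captured tokens.

```matlab
s = 'Hello World';
m = regexpi(s, 'o\s*w', 'match')                 % {'o W'}: case is ignored
swapped = regexprep(s, '(\w+) (\w+)', '$2 $1')    % 'World Hello'

% Escape special characters so that '.' matches only a literal dot
pat = regexptranslate('escape', 'a.b');
hits = regexp('a.b axb', pat, 'match')            % {'a.b'}
```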


>> Lecture 7
>>
>> -- Matrix Computation
>>


Vectors

• Let $\mathbb{R}$ be the set of all real numbers.
• $\mathbb{R}^m$ denotes the vector space of all m-by-1 column vectors:

$$u = (u_i) = \begin{bmatrix} u_1 \\ \vdots \\ u_m \end{bmatrix}. \tag{1}$$

• You can use the colon (:) operator to reshape any array into a column vector in column-major order, e.g. u(:).
• Similarly, a row vector v is

$$v = (v_i) = \begin{bmatrix} v_1 & \cdots & v_n \end{bmatrix}. \tag{2}$$

• We use column vectors unless stated otherwise.
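A small check of column-major ordering on a made-up 2-by-2 matrix: `A(:)` stacks the columns one after another, and transposing the result gives a row vector.

```matlab
A = [1 2; 3 4];
u = A(:)      % [1; 3; 2; 4]: first column, then second column
v = u'        % row vector [1 3 2 4]
```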


Matrices

• $M_{m\times n}(\mathbb{R})$ denotes the vector space of all m-by-n real matrices A:

$$A = (a_{ij}) = \begin{bmatrix} a_{11} & \cdots & a_{1n} \\ \vdots & \ddots & \vdots \\ a_{m1} & \cdots & a_{mn} \end{bmatrix}.$$

• Complex vectors and matrices[7] follow similar definitions, and the operations introduced later apply to them with some care.

[7] Replace $\mathbb{R}$ by $\mathbb{C}$.


Transposition

    >> A = [1 i];
    >> A'    % conjugate (Hermitian) transpose; see any linear algebra textbook

    ans =

       1.0000 + 0.0000i
       0.0000 - 1.0000i

    >> A.'   % transpose of A, without conjugation

    ans =

       1.0000 + 0.0000i
       0.0000 + 1.0000i


Arithmetic Operations

• Let $a_{ij}$ and $b_{ij}$ be the elements of the matrices $A, B \in M_{m\times n}(\mathbb{R})$ for $1 \le i \le m$ and $1 \le j \le n$.
• Then $C = A \pm B$ is calculated entrywise by $c_{ij} = a_{ij} \pm b_{ij}$. (Try.)


Inner Product[8]

• Let $u, v \in \mathbb{R}^m$.
• Then the inner product, denoted by $u \cdot v$, is calculated by

$$u \cdot v = u' v = \begin{bmatrix} u_1 & \cdots & u_m \end{bmatrix} \begin{bmatrix} v_1 \\ \vdots \\ v_m \end{bmatrix} = \sum_{i=1}^{m} u_i v_i.$$

    clear; clc;

    u = [1; 2; 3];
    v = [4; 5; 6];
    u' * v      % the usual way; orientation matters (u * v' is a 3-by-3 matrix)
    dot(u, v)   % using the built-in function; both return 32

[8] Also known as the dot product or scalar product.


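• The inner product is closely tied to projection: $u \cdot v = \|u\|\,\|v\|\cos\theta$, where θ is the angle between u and v, so $u \cdot v / \|v\|$ is the signed length of the projection of u onto v.
• Recall that $u \cdot v = 0$ if and only if the two vectors are orthogonal to each other, denoted by $u \perp v$.
• You may use norm(u) to calculate the length of u and subspace(u, v) for the angle between u and v.[9] Note that subspace returns the angle between the lines spanned by u and v, which always lies in [0, π/2].

[9] See https://en.wikipedia.org/wiki/Norm_(mathematics).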
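Because subspace measures the angle between lines, it never exceeds π/2. A sketch comparing it with the angle from the inner-product formula, for a made-up pair of vectors 135 degrees apart.

```matlab
u = [1; 0];
v = [-1; 1];
theta = acos(dot(u, v) / (norm(u) * norm(v)))   % 3*pi/4, about 2.3562
phi = subspace(u, v)                            % pi/4, about 0.7854
```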


Generalization of Inner Product

• For simplicity, consider $x \in \mathbb{R}$.
• Let f(x) and g(x) be real-valued functions.
  • In particular, g(x) is a basis function.[10]
• Then we can define the inner product of f and g on [a, b] by

$$\langle f, g \rangle = \int_a^b f(x)\, g(x)\, dx.$$

[10] See https://en.wikipedia.org/wiki/Basis_function, https://en.wikipedia.org/wiki/Eigenfunction, and https://en.wikipedia.org/wiki/Approximation_theory.
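A numerical version of this inner product, sketched with the built-in integral function; the helper name `innerProd` is hypothetical. It checks the classic fact that sin and cos are orthogonal on [0, 2π].

```matlab
% Hypothetical helper: <f, g> on [a, b] by numerical quadrature
innerProd = @(f, g, a, b) integral(@(x) f(x) .* g(x), a, b);

ip1 = innerProd(@sin, @cos, 0, 2*pi)   % about 0: sin and cos are orthogonal on [0, 2*pi]
ip2 = innerProd(@sin, @sin, 0, 2*pi)   % about pi
```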


• For example, the Fourier transform is widely used in engineering and science.
• The Fourier integral[11] is defined as

$$F(\omega) = \int_{-\infty}^{\infty} f(t)\, e^{-i\omega t}\, dt,$$

where f(t) is a square-integrable function.
• The fast Fourier transform (FFT) algorithm computes the discrete Fourier transform (DFT) in $O(n \log n)$ time, compared with $O(n^2)$ for direct evaluation.[12][13]

[11] See https://en.wikipedia.org/wiki/Fourier_transform.
[12] Cooley and Tukey (1965).
[13] See https://en.wikipedia.org/wiki/Fast_Fourier_transform.
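To connect the transform with the FFT, the sketch below evaluates the DFT directly as a matrix-vector product, $X_k = \sum_{j=0}^{n-1} x_j e^{-2\pi i jk/n}$, which costs $O(n^2)$, and compares it with fft. The input vector is made up.

```matlab
x = [1; 2; 3; 4];
n = numel(x);
k = (0 : n - 1)';
W = exp(-2i * pi * (k * k') / n);   % n-by-n DFT matrix; applying it costs O(n^2)
X_direct = W * x;
X_fast = fft(x);                     % same result in O(n log n)
err = max(abs(X_direct - X_fast))    % round-off only
X_fast                               % [10; -2+2i; -2; -2-2i]
```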


Cross Product[15]

• cross(u, v) returns the cross product of the vectors u and v of length 3.[14]
• For example,

    >> u = [1; 0; 0];
    >> v = [0; 1; 0];
    >> w = cross(u, v)   % built-in function

    w =

         0
         0
         1

[14] A cross product with the usual properties exists only in 3- and 7-dimensional Euclidean spaces.
[15] Applications include angular momentum, the Lorentz force, and the Poynting vector.
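A sketch with made-up vectors illustrating two standard properties: the cross product is orthogonal to both inputs, and swapping the inputs flips its sign.

```matlab
u = [1; 2; 3];
v = [4; 5; 6];
w = cross(u, v)                       % [-3; 6; -3]
orthCheck = [dot(w, u), dot(w, v)]    % [0 0]: w is orthogonal to u and v
anti = cross(v, u)                    % -w: the cross product is anticommutative
```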


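Matrix Multiplication

• Let $A \in M_{m\times q}(\mathbb{R})$ and $B \in M_{q\times n}(\mathbb{R})$.
• Then $C = AB \in M_{m\times n}(\mathbb{R})$ is given by

$$c_{ij} = \sum_{k=1}^{q} a_{ik}\, b_{kj}. \tag{3}$$

• For example,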


Example

    clear; clc;

    A = randi(10, 5, 4);   % 5-by-4
    B = randi(10, 4, 3);   % 4-by-3
    C = zeros(size(A, 1), size(B, 2));
    for i = 1 : size(A, 1)
        for j = 1 : size(B, 2)
            for k = 1 : size(A, 2)
                C(i, j) = C(i, j) + A(i, k) * B(k, j);
            end
        end
    end
    C   % display C

• Time complexity: $O(n^3)$ for n-by-n matrices.
• Strassen (1969): $O(n^{2.807355})$, where $2.807355 \approx \log_2 7$.
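Equation (3) says that each entry $c_{ij}$ is the inner product of row i of A with column j of B. The sketch below uses that view to replace the innermost loop and checks the result against the built-in product A * B.

```matlab
A = randi(10, 5, 4);    % 5-by-4
B = randi(10, 4, 3);    % 4-by-3
C = zeros(size(A, 1), size(B, 2));
for i = 1 : size(A, 1)
    for j = 1 : size(B, 2)
        C(i, j) = A(i, :) * B(:, j);   % row i of A times column j of B
    end
end
same = isequal(C, A * B)   % true: the entries are small integers, so there is no round-off
```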


Matrix Exponentiation

• Raising a matrix to a power is equivalent to repeatedly multiplying the matrix by itself.
  • For example, $A^2 = AA$.
  • It implies that A should be square. (Why?)
• The matrix exponential[16] is a matrix function on square matrices analogous to the ordinary exponential function.
• More explicitly,

$$e^A = \sum_{n=0}^{\infty} \frac{A^n}{n!}.$$

• However, raising a matrix to a matrix power, that is, $A^B$, is not allowed.

[16] See matrix exponentials and Pauli matrices.


Example

    clear; clc;

    A = [0 1; 1 0];

    trial1 = exp(A)            % elementwise exponential, not the matrix exponential
    trial2 = eye(size(A));
    for n = 1 : 10
        trial2 = trial2 + A ^ n / factorial(n);   % partial sum of the series
    end
    trial2
    trial3 = exp(1) ^ A        % equivalent to expm(A)

Output:

    trial1 =

        1.0000    2.7183
        2.7183    1.0000

    trial2 =

        1.5431    1.1752
        1.1752    1.5431

    trial3 =

        1.5431    1.1752
        1.1752    1.5431
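The loop above recomputes each power of A from scratch. A sketch of an alternative that builds each term $A^n/n!$ from the previous one and compares the partial sum with the built-in expm. For this particular A, $A^2 = I$, so $e^A = \cosh(1)\,I + \sinh(1)\,A$, which explains the entries 1.5431 and 1.1752.

```matlab
A = [0 1; 1 0];
S = eye(size(A));       % partial sum, starting with the n = 0 term
term = eye(size(A));
for n = 1 : 20
    term = term * A / n;    % A^n / n!, updated from A^(n-1) / (n-1)!
    S = S + term;
end
E = expm(A);
err = max(abs(S(:) - E(:)))                   % round-off level
closed = [cosh(1) sinh(1); sinh(1) cosh(1)];  % closed form, valid because A^2 = I
err2 = max(abs(E(:) - closed(:)))
```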


Determinants[17]

• Consider the matrix

$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}.$$

• Then $\det(A) = ad - bc$ is called the determinant of A.
• Try det(A).
• Recall the determinant calculation from high school (multiplying along the diagonals).
  • That shortcut happens to give the correct answer for 2-by-2 and 3-by-3 matrices, but it does not generalize to larger matrices.
• Let's try the general formula for det(A).

[17] See http://en.wikipedia.org/wiki/Determinant.


Naive Algorithm

    function y = myDet(A)
    % Determinant by cofactor expansion along the first row.

        [r, ~] = size(A);

        if r == 1
            y = A;
        elseif r == 2
            y = A(1, 1) * A(2, 2) - A(1, 2) * A(2, 1);
        else
            y = 0;
            for i = 1 : r
                B = A(2 : r, [1 : i - 1, i + 1 : r]);   % minor: delete row 1 and column i
                cofactor = (-1) ^ (i + 1) * myDet(B);
                y = y + A(1, i) * cofactor;
            end
        end
    end


• The cofactor expansion has n! terms in the sum of products, so this algorithm runs in O(n!) time!
• Try this algorithm and compare it with the built-in function det.
• In practice, determinants can be calculated in $O(n^3)$ time, e.g. by LU decomposition.[18]
• Moreover, various matrix decompositions are used to implement efficient matrix algorithms in numerical analysis.[19]

[18] See https://en.wikipedia.org/wiki/LU_decomposition.
[19] See https://en.wikipedia.org/wiki/Matrix_decomposition.
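A sketch of the LU route to the determinant, on a made-up 3-by-3 matrix whose determinant is -16. Since $PA = LU$ with L unit lower triangular and P a permutation, $\det(A) = \det(P) \prod_i u_{ii}$, and the factorization itself costs $O(n^3)$.

```matlab
A = [2 1 1; 4 -6 0; -2 7 2];    % hypothetical example; det(A) = -16
[L, U, P] = lu(A);              % P * A = L * U
d = det(P) * prod(diag(U))      % det(L) = 1, and det(P) is +1 or -1
d_builtin = det(A)
```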


Inverse Matrices[21]

• Let $A \in M_{n\times n}(\mathbb{R})$, and consider the linear system $Ax = y$ for $x, y \in \mathbb{R}^n$.
• Then A is invertible if there exists $B \in M_{n\times n}(\mathbb{R})$ such that

$$AB = BA = I_n,$$

where $I_n$ denotes the n × n identity matrix.
  • You can use eye(n) to generate the identity matrix $I_n$.
• A is invertible if and only if $\det(A) \neq 0$.
  • Recall Cramer's rule.[20]

[20] See https://en.wikipedia.org/wiki/Cramer's_rule.
[21] See https://en.wikipedia.org/wiki/Invertible_matrix#The_invertible_matrix_theorem.
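Cramer's rule expresses each unknown as a ratio of determinants, $x_i = \det(A_i)/\det(A)$, where $A_i$ is A with column i replaced by the right-hand side. A sketch on a made-up 3-by-3 system with $\det(A) = -3$; it is fine for tiny systems but far slower than A \ b in general.

```matlab
A = [3 2 -1; 1 -1 2; -2 1 -2];
y = [1; -1; 0];
x = zeros(3, 1);
for i = 1 : 3
    Ai = A;
    Ai(:, i) = y;               % replace column i by the right-hand side
    x(i) = det(Ai) / det(A);    % Cramer's rule
end
x                   % [1; -2; -2]
check = inv(A) * A  % the identity, up to round-off
```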


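• Use inv to calculate the inverse of a matrix.
• However, inv may return an unreliable result when the matrix is ill-conditioned. The condition number indicates how much the output value of a function can change for a small change in the input argument.[22]
• For example, let

$$A = \begin{bmatrix} 1 & 2 & 3 \\ 4 & 5 & 6 \\ 7 & 8 & 9 \end{bmatrix}.$$

• Then it is easy to see that $\det(A) = 0$, since the third row minus the second equals the second row minus the first.
• Try det(A) and inv(A): because of round-off error, det(A) may return a tiny nonzero number, and inv(A) warns that the matrix is close to singular.

[22] You may refer to the condition number of a function with respect to an argument. Also try rcond.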
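A sketch of how to detect the problem before calling inv: rank reveals that A is singular, rcond gives a tiny reciprocal condition estimate, and null recovers the dependency among the columns (column 1 minus twice column 2 plus column 3 is zero).

```matlab
A = [1 2 3; 4 5 6; 7 8 9];
r = rank(A)      % 2: A is singular
rc = rcond(A)    % tiny (far below eps), so inv(A) cannot be trusted
N = null(A)      % unit vector spanning the null space, parallel to [1; -2; 1]
```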


Digress: First-Order Approximation

• Let f(x) be a nonlinear function that is infinitely differentiable at $x_0$.
• By Taylor's expansion[23], we have

$$f(x) = f(x_0) + f'(x_0)(x - x_0) + O(\Delta x^2),$$

where $\Delta x = x - x_0$ and $O(\Delta x^2)$ collects the higher-order terms, which can be neglected as $\Delta x \to 0$.
• Then we have the linear approximation

$$f(x) \approx f'(x_0)\, x + k,$$

where $k = f(x_0) - x_0 f'(x_0)$ is a constant.

[23] See https://en.wikipedia.org/wiki/Taylor_series.
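A worked instance of the linear approximation, assuming $f(x) = \sqrt{x}$ and $x_0 = 4$ (both chosen only for illustration): $f'(4) = 1/4$ and $k = 2 - 4 \cdot (1/4) = 1$, so $\sqrt{x} \approx x/4 + 1$ near 4. The error at $x = 4.1$ is of order $\Delta x^2 = 0.01$ times a small constant.

```matlab
f = @(x) sqrt(x);
x0 = 4;
fp = 1 / (2 * sqrt(x0));       % f'(x0) = 1/4
k = f(x0) - x0 * fp;           % the constant k: 2 - 4/4 = 1
linApprox = @(x) fp * x + k;   % f(x) is approximately f'(x0) x + k

approx = linApprox(4.1)        % 2.025
exact = f(4.1)                 % about 2.0248
err = abs(approx - exact)      % about 1.5e-4
```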


Digress: Local Linearization

• For example, we barely notice the curvature of the ground; however, seen from the Moon, Earth is clearly a sphere.
  • Try https://www.google.com.tw/search?q=flat+earth&tbm=isch.
  • Also see https://en.wikipedia.org/wiki/Non-Euclidean_geometry.
  • Riemannian geometry: https://en.wikipedia.org/wiki/Riemannian_geometry.


• For another example, Newton's kinetic energy is a low-speed approximation (the classical limit) of Einstein's total energy.
• Let $m_0$ be the rest mass and v be the velocity relative to an inertial frame.
• We know that the total energy of this body is

$$E = \frac{m_0 c^2}{\sqrt{1 - (v/c)^2}}.$$

• By applying the first-order approximation in $(v/c)^2$,

$$E \approx m_0 c^2 + \frac{1}{2} m_0 v^2.$$

• This concept is striking and profound!
• You can find many more examples in everyday life.
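The approximation step can be filled in as follows: apply the first-order approximation to $g(s) = (1 - s)^{-1/2}$ at $s_0 = 0$ with $s = \beta^2$, where $\beta = v/c$. Since $g(0) = 1$ and $g'(0) = 1/2$,

```latex
\begin{aligned}
(1 - \beta^2)^{-1/2} &= 1 + \tfrac{1}{2}\beta^2 + \tfrac{3}{8}\beta^4 + \cdots, \\
E = \frac{m_0 c^2}{\sqrt{1 - \beta^2}} &\approx m_0 c^2 \left(1 + \frac{v^2}{2c^2}\right) = m_0 c^2 + \frac{1}{2} m_0 v^2 .
\end{aligned}
```

The first term is the rest energy and the second is Newton's kinetic energy; the neglected terms are of order $\beta^4$.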


System of Linear Equations[24]

• For example, Kirchhoff's laws for electrical circuits lead to such systems.

[24] See https://en.wikipedia.org/wiki/System_of_linear_equations.


General Form

• A general system of m linear equations with n unknowns can be written as

$$\begin{aligned}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n &= b_1 \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n &= b_2 \\
&\;\;\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n &= b_m
\end{aligned}$$

where $x_1, \ldots, x_n$ are the unknowns, $a_{11}, \ldots, a_{mn}$ are the coefficients of the system, and $b_1, \ldots, b_m$ are the constant terms.


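Matrix Equation

• Hence we can rewrite the equations above as

$$Ax = b,$$

where

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}, \quad x = \begin{bmatrix} x_1 \\ \vdots \\ x_n \end{bmatrix}, \quad \text{and} \quad b = \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix}.$$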


Linear Transformation[25]

[25] See https://en.wikipedia.org/wiki/Linear_map; also see https://kevinbinz.com/2017/02/20/linear-algebra/.


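Linear Independence

• Let $K = \{a_1, a_2, \ldots, a_n\}$ with each $a_i \in \mathbb{R}^m$.
• Now consider the linear superposition

$$x_1 a_1 + x_2 a_2 + \cdots + x_n a_n = 0,$$

where $x_1, x_2, \ldots, x_n \in \mathbb{R}$ are the coefficients.
• Then K is linearly independent if and only if the only solution is

$$x_1 = x_2 = \cdots = x_n = 0.$$

• When n = m, you can check whether K is linearly independent by det(K), where K also denotes the matrix whose columns are $a_1, \ldots, a_n$.
  • $\det(K) \neq 0$ if and only if K is linearly independent.
  • Recall that $\det(K) \neq 0$ if and only if K is invertible.
• In general, including n ≠ m, K is linearly independent if and only if rank(K) = n.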


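• Let

$$K_1 = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix} \right\}.$$

• It is clear that $K_1$ is linearly independent.
• Moreover, every vector in $\mathbb{R}^3$ can be written as a linear superposition of the vectors in $K_1$.
• We call the set of all such linear superpositions the span of $K_1$, denoted by $\mathrm{Span}(K_1)$.[26]
• Clearly, $\mathrm{Span}(K_1) = \mathbb{R}^3$.

[26] See https://en.wikipedia.org/wiki/Linear_span.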


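• Now let

$$K_2 = \left\{ \begin{bmatrix} 1 \\ 0 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 1 \\ 0 \end{bmatrix}, \begin{bmatrix} 0 \\ 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 1 \\ 2 \\ 3 \end{bmatrix} \right\}.$$

• Then $K_2$ is not a linearly independent set. (Why?)
• If you remove any one or more vectors from $K_2$, the remaining set is linearly independent.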
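One way to answer the (Why?) numerically, sketched with rank and null rather than det: det applies only to square matrices, and here the four vectors form a 3-by-4 matrix.

```matlab
K2 = [1 0 0 1;
      0 1 0 2;
      0 0 1 3];            % columns are the four vectors of K2
r3 = rank(K2(:, 1:3))      % 3: the vectors of K1 are independent
r4 = rank(K2)              % 3 < 4: K2 is dependent
c = null(K2)               % nonzero coefficients with K2 * c = 0, parallel to [1; 2; 3; -1]
```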


Basis of Vector Space & Its Dimension[27]

• However, you can remove at most one vector from $K_2$ if you still want to represent all vectors in $\mathbb{R}^3$. (Why?)
• The dimension of $\mathbb{R}^3$ is exactly the size of $K_1$, namely 3.
• A basis of $\mathbb{R}^n$ is a maximal linearly independent set; every basis of $\mathbb{R}^n$ has exactly n vectors.
• For example, $K_1$ is a basis of $\mathbb{R}^3$.

[27] See https://en.wikipedia.org/wiki/Basis_(linear_algebra), https://en.wikipedia.org/wiki/Vector_space, and https://en.wikipedia.org/wiki/Dimension_(vector_space).


Solution Set to System of Linear Equations[29]

• Let n be the number of unknowns and m be the number of constraints.[28]
• If m = n, then there exists a unique solution.
• If m > n, then in general there is no exact solution.
  • Fortunately, we can find a least-squares solution, that is, one for which $\|Ax - b\|_2$ is minimal.
• If m < n, then there are infinitely many solutions.
• We can compute solutions in all three cases by

    x = A \ b

[28] Assume that these constraints cannot be reduced, that is, A has full rank, rank(A) = min(m, n).
[29] See https://www.mathworks.com/help/matlab/ref/mldivide.html.


Case 1: Unique Solution (m = n)

• For example,

$$\begin{cases} 3x + 2y - z = 1 \\ x - y + 2z = -1 \\ -2x + y - 2z = 0 \end{cases}$$

    >> A = [3 2 -1; 1 -1 2; -2 1 -2];
    >> b = [1; -1; 0];
    >> x = A \ b

    x =

         1
        -2
        -2


Case 2: Overdetermined System (m > n)

• For example,

$$\begin{cases} 2x - y = 2 \\ x - 2y = -2 \\ x + y = 1 \end{cases}$$

    >> A = [2 -1; 1 -2; 1 1];
    >> b = [2; -2; 1];
    >> x = A \ b

    x =

         1
         1
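This system has no exact solution (x = y = 1 gives 2x − y = 1, not 2), so A \ b returns the least-squares solution. A sketch that cross-checks it with the normal equations $A^T A x = A^T b$ and confirms that the residual is orthogonal to the columns of A. Forming $A^T A$ squares the condition number, so backslash is preferable in practice.

```matlab
A = [2 -1; 1 -2; 1 1];
b = [2; -2; 1];
x = A \ b                      % least-squares solution: [1; 1]
x_ne = (A' * A) \ (A' * b)     % normal equations give the same answer
r = A * x - b                  % residual [-1; 1; 1]: not zero, so no exact solution
A' * r                         % about [0; 0]: the residual is orthogonal to the columns of A
```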


Case 3: Underdetermined System (m < n)

• For example,

$$\begin{cases} x + 2y + 3z = 7 \\ 4x + 5y + 6z = 8 \end{cases}$$

    >> A = [1 2 3; 4 5 6];
    >> b = [7; 8];
    >> x = A \ b

    x =

       -3.0000
             0
        3.3333

• Note that this solution is a basic solution (it has at most rank(A) nonzero entries), one of infinitely many.
• How do we find the direction vector of the solution set?
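One answer to the question above: the direction vector spans the null space of A, which null computes, and every solution has the form $x_p + tN$. A sketch in which the step size t is arbitrary; pinv is shown as one way to pick the minimum-norm solution on the same line.

```matlab
A = [1 2 3; 4 5 6];
b = [7; 8];
xp = A \ b;            % one particular solution (the basic solution shown above, in MATLAB)
N = null(A);           % unit direction vector, parallel to [1; -2; 1]
t = 2.5;               % any real t gives another solution
x = xp + t * N;
res = norm(A * x - b)  % about 0
xmin = pinv(A) * b     % the minimum-norm solution, also on this line
```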


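Gaussian Elimination Algorithm[30]

• First we represent the linear system as an augmented matrix $[A \mid b]$.
• We then use elementary row operations to transform A into an upper triangular matrix with unit diagonal:

$$[A \mid b] \;\to\; \left[\begin{array}{cccc|c} 1 & a_{12} & \cdots & a_{1n} & b_1 \\ 0 & 1 & \cdots & a_{2n} & b_2 \\ \vdots & \vdots & \ddots & \vdots & \vdots \\ 0 & 0 & \cdots & 1 & b_n \end{array}\right],$$

where the $a_{ij}$'s and $b_i$'s are the resulting values after the elementary row operations.
• This matrix is said to be in row echelon form. (The reduced row echelon form, computed by rref, additionally has zeros above each leading 1.)

[30] See https://en.wikipedia.org/wiki/Gaussian_elimination.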


• From this form, we obtain the solution by backward substitution:

$$x_i = b_i - \sum_{j=i+1}^{n} a_{ij} x_j,$$

for $i = n, n-1, \ldots, 1$.
• Time complexity: $O(n^3)$.


Exercise

    clear; clc;

    A = [3 2 -1; 1 -1 2; -2 1 -2];
    b = [1; -1; 0];
    A \ b   % check the answer

    % forward elimination
    for i = 1 : 3
        for j = i : 3
            b(j) = b(j) / A(j, i);   % why first?
            A(j, :) = A(j, :) / A(j, i);
        end
        for j = i + 1 : 3
            A(j, :) = A(j, :) - A(i, :);
            b(j) = b(j) - b(i);
        end
    end

    % backward substitution
    x = zeros(3, 1);
    for i = 3 : -1 : 1
        x(i) = b(i);
        for j = i + 1 : 1 : 3
            x(i) = x(i) - A(i, j) * x(j);
        end
    end
    x
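The exercise code divides by A(j, i) without checking it, so it breaks down whenever that entry is zero. A sketch of a general n-by-n version with partial pivoting (a standard technique, not necessarily the algorithm mldivide uses), applied to the same system; it still assumes A is nonsingular.

```matlab
A = [3 2 -1; 1 -1 2; -2 1 -2];
b = [1; -1; 0];
n = size(A, 1);
Ab = [A b];                                  % augmented matrix [A | b]

for i = 1 : n
    [~, p] = max(abs(Ab(i : n, i)));         % largest |entry| in column i, rows i..n
    p = p + i - 1;
    Ab([i p], :) = Ab([p i], :);             % swap rows i and p
    Ab(i, :) = Ab(i, :) / Ab(i, i);          % scale so the pivot becomes 1
    for j = i + 1 : n
        Ab(j, :) = Ab(j, :) - Ab(j, i) * Ab(i, :);   % zero out the entry below the pivot
    end
end

x = zeros(n, 1);
for i = n : -1 : 1                           % backward substitution, i = n, ..., 1
    x(i) = Ab(i, n + 1) - Ab(i, i + 1 : n) * x(i + 1 : n);
end
x   % expected: [1; -2; -2]
```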


More Functions of Linear Algebra[31]

• Matrix properties: norm, rcond, det, null, orth, rank, rref, trace, subspace.
• Matrix decompositions: lu, chol, qr.

[31] See https://www.mathworks.com/help/matlab/linear-algebra.html.
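A short sketch verifying each decomposition on a made-up symmetric positive definite matrix (chol requires positive definiteness; lu and qr work for general matrices).

```matlab
A = [4 2 2; 2 5 3; 2 3 6];   % hypothetical symmetric positive definite matrix
[L, U, P] = lu(A);           % P*A = L*U
R = chol(A);                 % A = R'*R with R upper triangular
[Q, R2] = qr(A);             % A = Q*R2 with Q orthogonal
errs = [norm(P*A - L*U), norm(R'*R - A), norm(Q*R2 - A)]   % all at round-off level
```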


Numerical Example: 2D Laplace's Equation

• A partial differential equation (PDE) is a differential equation that contains an unknown multivariable function and its partial derivatives.[32]
• Let $\Phi(x, y)$ be a scalar field on $\mathbb{R}^2$.
• Consider Laplace's equation[33]:

$$\nabla^2 \Phi(x, y) = 0,$$

where $\nabla^2 = \dfrac{\partial^2}{\partial x^2} + \dfrac{\partial^2}{\partial y^2}$ is the Laplace operator.
• Consider the system shown on the next page.

[32] See https://en.wikipedia.org/wiki/Partial_differential_equation.
[33] Pierre-Simon Laplace (1749–1827).


[Figure: the unit square [0, 1] × [0, 1] covered by a 5-by-5 grid of nodes labeled V1, V2, ..., V25, numbered row by row (V1 to V5, V6 to V10, and so on).]

• Consider the boundary conditions:
  • $V_1 = V_2 = \cdots = V_4 = 0$.
  • $V_{21} = V_{22} = \cdots = V_{24} = 0$.
  • $V_1 = V_6 = \cdots = V_{16} = 0$.
  • $V_5 = V_{10} = \cdots = V_{25} = 1$.


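A Simple Approximation[34]

• As you can see, we partition the region into many subregions by applying a suitable mesh generation.
• Then $\Phi(x, y)$ can be approximated by

$$\Phi(x, y) \approx \frac{\Phi(x + h, y) + \Phi(x - h, y) + \Phi(x, y + h) + \Phi(x, y - h)}{4},$$

where h is small enough.
• This follows from substituting the central differences $\partial^2\Phi/\partial x^2 \approx [\Phi(x + h, y) - 2\Phi(x, y) + \Phi(x - h, y)]/h^2$ (and likewise in y) into $\nabla^2 \Phi = 0$.

[34] See https://en.wikipedia.org/wiki/Finite_difference_method#Example:_The_Laplace_operator.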
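Reading the approximation as an update rule gives an iterative solver: repeatedly replace each interior value by the average of its four neighbors. This is the Jacobi method, an alternative to forming and solving Ax = b directly. The sketch assumes the nodes are stored in a 5-by-5 array in the order V1 to V5, V6 to V10, and so on, with the boundary values from the previous slide; 200 sweeps is an arbitrary but ample iteration count for this tiny grid.

```matlab
% V(r, c) holds node number 5*(r-1) + c, so row 1 is V1..V5 and column 5 is V5, V10, ..., V25
V = zeros(5);
V(:, 5) = 1;        % V5 = V10 = ... = V25 = 1; all other boundary nodes stay 0
for iter = 1 : 200
    % Jacobi sweep: the right-hand side is evaluated in full before the assignment,
    % so every interior node is updated from the previous iterate
    V(2:4, 2:4) = (V(1:3, 2:4) + V(3:5, 2:4) + V(2:4, 1:3) + V(2:4, 3:5)) / 4;
end
V
```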


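Matrix Formation

• By collecting all constraints at the interior nodes V7, V8, V9, V12, V13, V14, V17, V18, V19 (in this order), we have Ax = b, where

$$A = \begin{bmatrix}
4 & -1 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\
-1 & 4 & -1 & 0 & -1 & 0 & 0 & 0 & 0 \\
0 & -1 & 4 & 0 & 0 & -1 & 0 & 0 & 0 \\
-1 & 0 & 0 & 4 & -1 & 0 & -1 & 0 & 0 \\
0 & -1 & 0 & -1 & 4 & -1 & 0 & -1 & 0 \\
0 & 0 & -1 & 0 & -1 & 4 & 0 & 0 & -1 \\
0 & 0 & 0 & -1 & 0 & 0 & 4 & -1 & 0 \\
0 & 0 & 0 & 0 & -1 & 0 & -1 & 4 & -1 \\
0 & 0 & 0 & 0 & 0 & -1 & 0 & -1 & 4
\end{bmatrix}$$

and

$$b = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 & 1 & 0 & 0 & 1 \end{bmatrix}^T.$$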
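Rather than typing the 9-by-9 matrix, one can assemble it from Kronecker products, since it is block tridiagonal. A sketch assuming the interior ordering V7, V8, V9, V12, V13, V14, V17, V18, V19; the solution should reproduce the interior values in the final figure (e.g. V7 = 1/14 ≈ 0.0714 and V13 = 0.25).

```matlab
% Interior unknowns, in order: V7, V8, V9, V12, V13, V14, V17, V18, V19
T = 4 * eye(3) - diag(ones(2, 1), 1) - diag(ones(2, 1), -1);   % couplings within one grid row
S = diag(ones(2, 1), 1) + diag(ones(2, 1), -1);                % which grid rows are adjacent
A = kron(eye(3), T) - kron(S, eye(3));                         % the 9-by-9 matrix above
b = repmat([0; 0; 1], 3, 1);                                   % V9, V14, V19 have a right neighbor equal to 1
x = A \ b
```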


Dimension Reduction by Symmetry

• As you can see, by symmetry about the middle row of the grid, V7 = V17, V8 = V18, and V9 = V19.
• So we can reduce A to A′:

$$A' = \begin{bmatrix}
4 & -1 & 0 & -1 & 0 & 0 \\
-1 & 4 & -1 & 0 & -1 & 0 \\
0 & -1 & 4 & 0 & 0 & -1 \\
-2 & 0 & 0 & 4 & -1 & 0 \\
0 & -2 & 0 & -1 & 4 & -1 \\
0 & 0 & -2 & 0 & -1 & 4
\end{bmatrix}$$

and

$$b' = \begin{bmatrix} 0 & 0 & 1 & 0 & 0 & 1 \end{bmatrix}^T.$$

• The number of unknowns drops from 9 to 6.
• This trick helps to alleviate the curse of dimensionality.[35]

[35] See https://en.wikipedia.org/wiki/Curse_of_dimensionality.
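A sketch that solves the reduced 6-by-6 system directly. Its unknowns are V7, V8, V9, V12, V13, V14; by symmetry, V17, V18, V19 are the first three entries again. The exact fractions in the comment satisfy all six equations.

```matlab
Ar = [ 4 -1  0 -1  0  0;
      -1  4 -1  0 -1  0;
       0 -1  4  0  0 -1;
      -2  0  0  4 -1  0;
       0 -2  0 -1  4 -1;
       0  0 -2  0 -1  4];
br = [0; 0; 1; 0; 0; 1];
y = Ar \ br    % V7, V8, V9, V12, V13, V14 = 1/14, 3/16, 3/7, 11/112, 1/4, 59/112
```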


[Figure: the same 5-by-5 grid on the unit square, labeled with the computed value at each node:]

V1 to V5:     0.0000  0.0000  0.0000  0.0000  1.0000
V6 to V10:    0.0000  0.0714  0.1875  0.4286  1.0000
V11 to V15:   0.0000  0.0982  0.2500  0.5268  1.0000
V16 to V20:   0.0000  0.0714  0.1875  0.4286  1.0000
V21 to V25:   0.0000  0.0000  0.0000  0.0000  1.0000


Remarks

• This is a toy example of numerical methods for PDEs.
• We can use the Partial Differential Equation Toolbox for this case. (Try.)
  • You may consider the finite element method (FEM).[36]
• Mesh generation is also crucial for numerical methods.[37]
  • You can use MATLAB's computational geometry functions (e.g. delaunay) to build triangular meshes.[38]

[36] See https://en.wikipedia.org/wiki/Finite_element_method.
[37] See https://en.wikipedia.org/wiki/Mesh_generation.
[38] See https://www.mathworks.com/help/matlab/computational-geometry.html.