Ivan Idris's Blog, page 19
January 19, 2012
NumPy Project Euler Problem 3
Project Euler Problem 3 seems almost impossible to crack. However, using the right algorithm – Fermat's factorization method and NumPy, it becomes very easy. The idea is to factor a number N into two numbers c and d.
N = c * d = (a + b) (a - b) = a ** 2 - b ** 2
We can apply the factorization recursively, until we get the required prime factors.
1. Create array of trial values
The algorithm requires us to try a number of trial values for a. It makes sense to create a NumPy array and eliminate the need for loops. However, you should be careful to not create an array that is too big. On my system an array of a million elements seems to be just the right size.
a = numpy.ceil(numpy.sqrt(n))
lim = min(n, LIM)
a = numpy.arange(a, a + lim)
b2 = a ** 2 - n
2. Get the fractional part of the b array
We are now supposed to check whether b is a square. Use the NumPy modf function to get the fractional part of the b array.
fractions = numpy.modf(numpy.sqrt(b2))[0]
3. Find 0 fractions
Call the NumPy where function to find the indices of 0 fractions.
indices = numpy.where(fractions == 0)
4. Find the first occurrence of a 0 fraction
Actually we only need the first occurrence of a 0 fraction. First, call the NumPy take function with the indices array from the previous step to get the a values of 0 fractions. Now we need to "flatten" this array with the NumPy ravel function.
a = numpy.ravel(numpy.take(a, indices))[0]
Below is the entire code needed to solve the problem.
import numpy
#The prime factors of 13195 are 5, 7, 13 and 29.
#What is the largest prime factor of the number 600851475143 ?
N = 600851475143
LIM = 10 ** 6
def factor(n):
#1. Create array of trial values
a = numpy.ceil(numpy.sqrt(n))
lim = min(n, LIM)
a = numpy.arange(a, a + lim)
b2 = a ** 2 - n
#2. Check whether b is a square
fractions = numpy.modf(numpy.sqrt(b2))[0]
#3. Find 0 fractions
indices = numpy.where(fractions == 0)
#4. Find the first occurrence of a 0 fraction
a = numpy.ravel(numpy.take(a, indices))[0]
a = int(a)
b = numpy.sqrt(a ** 2 - n)
b = int(b)
c = a + b
d = a - b
if c == 1 or d == 1:
return
print c, d
factor(c)
factor(d)
factor(N)
The code prints the following output:
1234169 486847
1471 839
6857 71
Have a go
I would suggest writing an unit test to check the outcome. Use for example trial division. To get you started here is some of the code of a possible test function. Please fill in the dots.
def is_prime(n):
a = numpy.arange(2, n)
return len(a[...]) == 0
Also maybe you should try finetuning the size of the trial values array, but be careful.
If you liked this post and are interested in NumPy check out NumPy Beginner's Guide by yours truly.
January 15, 2012
2 Beautiful Computer Graphics Books
These Computer Graphics books are beautiful, because they have lots of color plates and figures, hardcover with nice designs. I think you should appreciate these kind of things, otherwise you could just as well hold a stack of paper in your hands or read an ebook.
1. Interactive Computer Graphics: A Top-Down Approach With Opengl
Author
Edward Angel
ISBN-10
0201855712
This is an introductory book about OpenGL. We had an optional course on OpenGL in University and this book was required reading for this course.
Chapter 1 introduces some basic concepts, such as camera models and ray tracing.
Chapter 2 gets us started wih the OpenGL API.
Chapter 3 discusses input devices, the client-server perspective and menus.
Chapter 4 starts with a bit of geometry and linear algebra. This is followed by an OpenGL example. The chapter ends with transformations supported by OpenGL.
Chapter 5 deals with projections and perspective.
Chapter 6 is about light, light sources, reflection and ray tracing.
Chapter 7 involves studying implementation algorithms for geometric transformations, clipping and rasterization.
Modeling the real world is the topic of Chapter 8. This includes physical models based on elementary Newtonian mechanics.
Chapter 9 teaches us about curves and splines.
Chapter 10 mentions texture mapping, bit and pixel operations, composition techniques, sampling and aliasing.
The chapter titles could have been a bit longer and more descriptive. The book goes over some mathematics and theory. However, it never gets very challenging. I would not buy the book for that. There are lots of OpenGL examples. This is the strength of the book. Although the book has color plates and a nice hard cover, the code does not have syntax highlighting. You might say that I am too used to IDE's, but I have seen syntax highlighting in at least one book, so it should be possible. I give this book 3 stars out of 5.
2. Computer Graphics: Principles and Practice in C (2nd Edition)
Author
James D. Foley, Andries van Dam, Steven K. Feiner and John F. Hughes
ISBN-10
0201848406
Computer Graphics is about computer graphics and principles. This book has four authors, who are experts in their field. It has a hardcover, is richly illustrated with color plates and lots of figures. If the code had syntax highlighting, then it would have been even better.
Chapter 1 covers the basics. Chapter 2 is about SGRP (Simple Raster Graphics Package). Chapter 3 presents basic raster graphics algorithms for drawing 2D primitives.
Chapter 4 describes graphics hardware. Chapter 5 introduces geometrical transformations. Chapter 6 discusses viewing in 3D, projections and perspective.
Chapter 7 is dedicated to SPHIGS (Simple Programmer's Hierarchical Interactive Graphics System). Chapter 8 is the first of three chapters on GUI's. These three chapters are low on mathematics and code. Dialogue design is the title of Chapter 9. Chapter 10 examines user interface software.
Chapter 11 is about representing curves and surfaces. Finally, we get back to some code and mathematics. Chapter 12 builds on the previous chapter and continues with solid modeling. Sadly, no code in this chapter. Achromatic and colored light is the subject of chapter 13. This is a fun chapter, however, the lack of color usage in this chapter about color seems paradoxical to me.
In Chapter 14 we embark on a quest for visual realism. Chapter 15 is about visible-surface determination. Chapter 16 discusses illumination and shading.
Chapter 17 explores image manipulation and storage. Chapter 18 discusses advanced raster graphics architecture. Chapter 19 describes advanced geometric and raster algorithms.
Chapter 20 concentrates on advanced modeling techniques. Chapter 21 brings to life animation.
As I said above, this book is written by four authors. Obviously, this means that the writing style differs between chapters. Some chapters have no code or mathematics. Some chapters have one or the other. These are not my favorite chapters. The code examples are written in C or sometimes in pseudocode that looks a lot like C, so you need to have some knowledge of C.
The book is quite thorough and seems pedagogically sound. It is considered a "classic" for many reasons. Personally, I learned a lot about computer graphics algorithms. I give this book 4 stars out of 5.
January 12, 2012
NumPy Project Euler Problem 2
Project Euler Problem 2 is definitely harder than Problem 1. This one requires reading the Fibonacci numbers Wikipedia article.
I read that the Project Euler problems are some sort of katas. Katas are what martial artist call exercises, preparing you for real-life fighting situations. For instance, imagine that your house is on fire and that your family is surrounded by heavily armed thugs. What do you do? In this case your secret weapon is NumPy, which is like the Dim Mak death touch. If you master that like me, then you don't even need to bother with katas.
1. Calculate the golden ratio
When the bad ninja guys are walking towards you, the first thing to do, is calculate the golden ratio, also called the golden section. This part of the kata is called Moro Ashi Dachi.
1
2
phi = (1 numpy.sqrt(5))/2
print "Phi", phi
This prints the golden mean.
1
Phi 1.61803398875
2. Find the index below 4 million
Next in the kata we need to find the index below 4 million. A formula for this is given in the Wikipedia page. All we need to do is convert log bases. We don't need to round the result down to the closest integer. This is automatically done for us in the next step of the kata.
1
2
n = numpy.log(4 * 10 ** 6 * numpy.sqrt(5) 0.5)/numpy.log(phi)
print n
We get the result 33.2629480359. You can double check this yourself from the Wikipedia article.
3. Create an array of 1-n
No, you say. Not the arange function again. We know about it already, when are we going to learn about the Jyutping. Patience, let's first master the basics.
1
n = numpy.arange(1, n)
4. Compute Fibonacci numbers
There is a convenient formula we can use to calculate the Fibonacci numbers. We will need the golden ratio and the array from the previous step in the kata as input parameters. Print the first 9 Fibonaci numbers to check the result. I could have made an unit test instead of a print statement. This variation of the kata is left as an exercise for the reader.
1
2
fib = (phi**n - (-1/phi)**n)/numpy.sqrt(5)
print "First 9 Fibonacci Numbers", fib[:9]
The output is below. You can plug this right into an unit test, if you want.
1
First 9 Fibonacci Numbers [ 1. 1. 2. 3. 5. 8. 13. 21. 34.]
5. Convert to integers
This step is optional. I think it's nice to have an integer result at the end. It's like the difference between Mawashi Geri and Mawashi Geri Koshi. The effect is almost the same, but one feels better than the other. OK, I actually wanted to show you the astype function.
1
2
fib = fib.astype(int)
print "Integers", fib
A long list of numbers is printed.
1
2
3
Integers [ 1 1 2 3 5 8 13 21 34
... snip ... snip ...
317811 514229 832040 1346269 2178309 3524578]
6. Select even-valued terms
The kata demands that we select the even valued terms now. This should be easy for you, if you followed the previous kata.
1
2
eventerms = fib[fib % 2 == 0]
print eventerms
There we go. Another opportunity to write unit tests.
1
2
[ 2 8 34 144 610 2584 10946 46368 196418
832040 3524578]
7. Sum the selected terms
This is the final step. Your enemies are lying on the ground helpless, unable to fight back. Finish the job, call the ndarray sum method.
1
print eventerms.sum()
For completeness, below is the complete code of this kata.
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
import numpy
#Each new term in the Fibonacci sequence is generated by adding the previous two terms.
#By starting with 1 and 2, the first 10 terms will be:
#1, 2, 3, 5, 8, 13, 21, 34, 55, 89, ...
#By considering the terms in the Fibonacci sequence whose values do not exceed four million,
#find the sum of the even-valued terms.
#1. Calculate phi
phi = (1 numpy.sqrt(5))/2
print "Phi", phi
#2. Find the index below 4 million
n = numpy.log(4 * 10 ** 6 * numpy.sqrt(5) 0.5)/numpy.log(phi)
print n
#3. Create an array of 1-n
n = numpy.arange(1, n)
print n
#4. Compute Fibonacci numbers
fib = (phi**n - (-1/phi)**n)/numpy.sqrt(5)
print "First 9 Fibonacci Numbers", fib[:9]
#5. Convert to integers
# optional
fib = fib.astype(int)
print "Integers", fib
#6. Select even-valued terms
eventerms = fib[fib % 2 == 0]
print eventerms
#7. Sum the selected terms
print eventerms.sum()
That was quite a workout. More katas to come soon. Please stay tuned.
If you liked this post and are interested in NumPy check out NumPy Beginner's Guide by yours truly. I would like to thank Christopher Felton and Gael Varoquaux for their recent reviews of my book.
January 9, 2012
3 Great Computer Science Books in My Library
Here are 3 amazing (if a bit old) Computer Science books from my library.
1. TCP/IP Illustrated Volume 1
Author
W. Richard Stevens
ISBN-10
0201633469
This is a fantastic textbook about different network protocols. The protocols are illustrated using the output of tcpdump and other utilities. Protocols discussed include:
IP
ARP
RARP
ICMP
UDP
IGMP
TFTP
BOOTP
TCP
SNMP
FTP
SMTP
Beside the protocols several diagnostic tools such as ping and traceroute are covered in detail. The book contains lots of diagrams, that illustrate the protocols even further. Each chapter ends with exercises. The solutions of some of those exercises can be found in the book as well. I give this book 4 stars out of 5.
2. Database System Concepts
Authors
Abraham Silberschatz, Henry Korth, S. Sudarshan
ISBN-10
0071122680
This book teaches about the inner workings, the nuts and bolts of databases without requiring a lot of prior knowledge:
Data Models
Relational Databases
Object Based Databases and XML
Data Storage and Querying
Transaction Management
Database System Architecture
Personally I feel that some material doesn't belong here, such as XML, but it seems to be a tradition to include it. I found the chapter on indices quite interesting.
At the end of the book there are several chapters on popular commercial databases. It seems that these chapters do not add any value. In any case the authors should have devoted at least one chapter to an open source database. In the real world you have as much chance to work with a commercial database as with an open source one.
My copy is quite outdated by now. New developments, for instance NoSQL databases, are not mentioned. I give this book 3 out of 5 stars.
3. Machine Learning
Author
Tom Mitchell
ISBN-10
0071154671
This is an introductory book on Machine Learning. There is quite a lot of mathematics and statistics in the book, which I like. A large number of methods and algorithms are introduced:
Neural Networks
Bayesian Learning
Genetic Algorithms
Reinforcement Learning
The material covered is very interesting and clearly explained. I find the presentation, however, a bit lacking. I think it has to do with the chosen fonts and lack of highlighting of important terms. Maybe it would have been better to have shorter paragraphs too.
If you are looking for an introductory book on machine learning right now, I would not recommend this book, because in recent years better books have been written on the subject. These are better obviously, because they include coverage of more modern techniques. I give this book 3 out of 5 stars.
Thanks for the NumPy 1.5 Beginner's Guide review by Chris Fonnesbeck on the Strong Inference Blog.
January 7, 2012
NumPy Project Euler Problem 1
Project Euler is a website that lists a number of mathematical problems, which are perfect to be solved with NumPy. Let's start the new year with Problem 1.
1. Call the arange function
Call the arange function in order to store all the integers from 1 to 1000 in an array.
# 1. Numbers 1 - 1000
a = numpy.arange(1, 1000)
2. Select the multiples of 3 or 5
Select using the [] operator.
# 2. Select multiple of 3 or 5
a = a[(a % 3 == 0) | (a % 5 == 0)]
print a[:10]
This prints as expected
[ 3 5 6 9 10 12 15 18 20 21]
3. Sum the array elements
Call the sum method on the NumPy array.
# 3. Sum the numbers
print a.sum()
Once again the code below in its entirety.
import numpy
#Problem 1.
#If we list all the natural numbers below 10 that are multiples of 3 or 5, we get 3, 5, 6 and 9.
#The sum of these multiples is 23.
#Find the sum of all the multiples of 3 or 5 below 1000.
# 1. Numbers 1 - 1000
a = numpy.arange(1, 1000)
# 2. Select multiple of 3 or 5
a = a[(a % 3 == 0) | (a % 5 == 0)]
print a[:10]
# 3. Sum the numbers
print a.sum()
If you liked this post and are interested in NumPy check out NumPy Beginner's Guide by yours truly.
January 1, 2012
4 Great Physics books I had to read
I made a list of four great introductory level Physics books I needed to read for my Master's degree. You can see them below.
1. Analytical Mechanics
Authors
Grant R. Fowles and George L. Cassiday
ISBN-10
0030989744
Mechanics was one of the first (if not the first) courses I followed in University. "Analytical Mechanics" is a brilliant junior level book that covers:
Newton's Laws of Motion
Harmonic motion
Kepler's Laws
Rigid bodies
Lagrangian Mechanics
The authors use real world examples such as falling raindrops to illustrate various concepts. The problems at the end of each chapter are quite challenging and interesting. There are also some numerical problems in the book, that should be solved with a computer. In my opinion there could have been more of those. The study group, that I was part of, ignored the "computer" problems completely.
2. Optics
Author
Eugene Hecht
ISBN-10
020111609X
I had a course on Optics in my first year in University with this book. The book has lots of examples and is very "wordy". I suspect the author hoped to make it a reference as well as a study book. The number of photos in "Optics" is unusually high for a Physics book. I wouldn't mind if some of those were colour photos, but then again it would have made the book less affordable. Topics include:
Mathematics of Waves
Light
Geometrical Optics
Polarization
Interference
Diffraction
Overall, I would say that this was a good book. Ideal for people who prefer long explanations.
3. Introduction to Electrodynamics
Author
David J. Griffiths
ISBN-10
013805326X
This book is an introduction on electricity and magnetism covering:
Electrostatics
Electric Fields
Magnetostatics
Magnetic Fields
Radiation
Relativistic Electrodynamics
The style of the book is very concise. In my opinion the majority of the examples are a bit abstract. Still the explanations are clear and the main points are indicated in such a way that you cannot miss them. For instance, in the story about Faraday's law, the sentence
"A changing magnetic field induces an electric field."
is printed in bold with a box around it and lots of whitespace. To be honest, I first tried winging it, only reading handouts and lecture notes. This was a big mistake. Having this book in my possession was a huge improvement.
4. Mathematical Methods for Physicists
Authors
George B. Arfken and Hans J. Weber
ISBN-10
0120598167
As far as I remember this book was required reading for the "Special Functions" course. You could argue that this is not a pure Physics book. However, it is certainly not a pure Mathematics book, since the examples are related to Physics problems:
Vector Analysis
Group Theory
Infinite Series
Complex Algebra
Special Functions
I think that this is an awesome book. Sometimes I just read it for fun. Seriously. I can't understand, how people can be so negative about it. Probably they were forced to go through the material at high speed without much help from their professors. I admit that this is not a good approach, especially given the size of the book.
December 28, 2011
2012 Python Meme
My "Python meme" replies.
What's the coolest Python application, framework or library you have discovered in 2011?
NumPy, because of my book Numpy 1.5 Beginner's Guide published in November this year. Although I did not discover it in 2011. Still I learned a lot of things about NumPy in the process, so it was a rediscovery, if you will. Yes, NumPy is supercool. Trust me, I am an expert on cool.
What new programming technique did you learn in 2011?
I can't tell you that, it's all strictly confidential, top secret stuff. I think I said too much already. OK, I learned some basic AI techniques. Nothing special. I did some spikes with SQLAlchemy too.
What's the name of the open source project you contributed the most in 2011? What did you do?
NumPy. I hope that I contributed to NumPy through my blog and book. Evangelist is a Big Word, so I would not use that one to describe my contribution. Let's say, I am an Apprentice Evangelist. In training.
What was the Python blog or website you read the most in 2011?
I have been reading the NumPy, SciPy and Matplotlib online documentation a lot.
What are the three top things you want to learn in 2012?
That's easy.
(Internet) marketing.
Social media.
Artificial intelligence, machine learning and natural language processing.
The first two items I want to learn mostly because of my book. I am exploring social media, which means in practice that I joined a lot of websites and created profiles. The list is on the social section of the unofficial book website. If you Like me, I will Like you back. If you Follow me, I will Follow you back, I promise. Let's be friends! The last item is something I always wanted to do. I always dreamed of having a robot or computer do all my hard work.
What are the top software, app or lib you wish someone would write in 2012?
I really like VisualVM, which is a Java profiling tool. I haven't found a Python equivalent yet, so please write one. Also I want it to be easy to use and install.
Want to do your own list? here's how:
copy-paste the questions and answer to them in your blog tweet it with the #2012pythonmeme hashtag
I would like to thank Gokhan Sever for his recent review of NumPy Beginner's Guide on the Pycloud blog. Happy New Year, readers! I wish you all the best for 2012!
December 23, 2011
Christmas NumPy Book Giveaway
Merry Christmas, dear readers!
Since it's the season of giving, Packt Publishing offered to organize a contest with prize – 2 print copies and 2 ebooks of my book NumPy Beginner's Guide.
The Prize
What you will learn from NumPy 1.5 Beginner's Guide
Installing NumPy
Learn to load arrays from files and write arrays to files
Work with universal functions
Create NumPy matrices
Use basic modules that NumPy offers
Write unit tests for NumPy code
Plot mathematical NumPy results with Matplotlib
Integrate with Scipy, a high level Python scientific computing framework built on top of NumPy
The book is written in beginner's guide style with each aspect of NumPy demonstrated by real world examples. You can also download a sample chapter.
How to Win NumPy Beginner's Guide
You can enter by writing a comment to this post explaining why you would like to have the book. The contest has already started and will end on January 31st 2012 at 11:59 PM GMT. Winners will be randomly chosen and notified by email, after termination of the contest.
The contest is open to everybody in the world, however print copies are only available to residents of the US and Europe.
Comments are moderated by me, so your comment will not appear immediately.
Good luck, readers!
December 17, 2011
Steady State Vector of Markov Chains
Numpy Strategies 0.1.3
Unlucky Sprint 13. Scary, especially if you suffer from Triskaidekaphobia. The task for this sprint:
Determine the steady state vector of our Markov chain model.
This task was part of the story:
I want to know how to bet in the long term.
A very trivial task especially with NumPy. Far into the distant future or in theory infinity, the state of our Markov chain system will not change anymore. This is also called a steady state. The stochasic matrix A we calculated last time applied to the steady state, will yield the same state x. Using mathematical notation:
Ax = x
Another way to look at this is as the eigenvector for eigenvalue 1.
1. Smoothing
Last time we counted the number of occurrences for each state transition in our model. This gave us a bit skewed results, because in our sample a Flat or no change situation never occurred. In real life we could have a day that the close price does not change, although unlikely for liquid stock markets. One way to deal with 0 occurrences is to apply additive smoothing. The idea is to add a certain constant to the number of occurrences we find, getting rid of zeroes. Of course, we will have to normalize the probabilites afterwards. Two lines have to change due to the smoothing process. First, change this line
N = len(start_indices)
to
N = len(start_indices) + k * NDIM
where k is an integer smoothing constant and NDIM is the number of states, in our case 3. Second, update the line
SM[i][j] = occurrences/float(N)
to
SM[i][j] = (occurrences + k)/float(N)
The stochastic matrix for k=1 becomes
[[ 0.47826087 0.00869565 0.51304348]
[ 0.33333333 0.33333333 0.33333333]
[ 0.41134752 0.0070922 0.58156028]]
2. Eigenvalues and eigenvectors
To get the eigenvalues and eigenvectors we will need the numpy linalg module and the eig function.
eig_out = numpy.linalg.eig(SM)
print eig_out
The eig function returns an array containing the eigenvalues and an array containing the eigenvectors.
(array([ 1. , 0.06739662, 0.32575786]), array([[ 0.57735027, 0.7670805 , 0.0082198 ],
[ 0.57735027, -0.19564812, -0.99986103],
[ 0.57735027, -0.61099044, 0.01450345]]))
3. Selecting the eigenvector for eigenvalue 1
Currently we are only interested in the eigenvector for eigenvalue 1. In reality the eigenvalue might not be exactly 1, so we should build in a margin for error. We can find the index for eigenvalue between 0.9 and 1.1 as follows
idx_vec = numpy.where(numpy.abs(eig_out[0] - 1) < 0.1)
print "Index eigenvalue 1", idx_vec
We get index 0 as expected
Index eigenvalue 1 (array([0]),)
Select the steady state vector
x = eig_out[1][:,idx_vec].flatten()
print "Steady state vector", x
The steady state vector
Steady state vector [ 0.57735027 0.57735027 0.57735027]
4. Check
The check is simple – just multiply the stochastic matrix and the steady state vector we got.
print "Check", numpy.dot(SM, x)
Seems like the result is correct. I will leave it up to you to decide whether it is meaningful or not. Task resolved.
Check [ 0.57735027 0.57735027 0.57735027]
Here is the code in its entirety with imports and all.
#!/usr/bin/env python
from matplotlib.finance import quotes_historical_yahoo
from datetime import date
import numpy
import sys
today = date.today()
start = (today.year - 1, today.month, today.day)
quotes = quotes_historical_yahoo(sys.argv[1], start, today)
close = [q[4] for q in quotes]
states = numpy.sign(numpy.diff(close))
NDIM = 3
SM = numpy.zeros((NDIM, NDIM))
signs = [-1, 0, 1]
k = int(sys.argv[2])
for i in xrange(len(signs)):
#we start the transition from the state with the specified sign
start_indices = numpy.where(states[:-1] == signs[i])[0]
N = len(start_indices) + k * NDIM
# skip since there are no transitions possible
if N == 0:
continue
#find the values of states at the end positions
end_values = states[start_indices + 1]
for j in xrange(len(signs)):
# number of occurrences of this transition
occurrences = len(end_values[end_values == signs[j]])
SM[i][j] = (occurrences + k)/float(N)
print SM
eig_out = numpy.linalg.eig(SM)
print eig_out
idx_vec = numpy.where(numpy.abs(eig_out[0] - 1) < 0.1)
print "Index eigenvalue 1", idx_vec
x = eig_out[1][:,idx_vec].flatten()
print "Steady state vector", x
print "Check", numpy.dot(SM, x)
Have a go
Let's put the Product Owner hat on. Possible improvement stories I can think of:
Fine tune the smoothing constant.
Introduce more states.
Use different parameters, for instance, option price or volume.
Try out different stocks, stock indices or a combination of stocks.
Extend to different time periods instead of days.
Make some unit tests.
I am looking forward to hearing about your improvements.
If you liked this post and are interested in NumPy check out NumPy Beginner's Guide by yours truly. I would like to thank Mike Driscoll for his book review on the Mouse vs Python blog and on Amazon.com I will be back next week with your Christmas present, which will remain a surprise for now.
December 10, 2011
Stochastic matrix of the FUD states
Numpy Strategies 0.1.2
So we are now in Sprint 12 of Project "NumPy Strategies". Last week you finished the task of determining states and defining a Markov chain model. This took you ten minutes even though the estimate was 2 days. Hey, that's the power of NumPy! The rest of the time you spent playing computer games/pool/pinball or reading your favorite book NumPy Beginner's Guide.
Pop Quiz
Let's see whether you remember what we decided the states should be:
flat F, up U, bottom B, average A, rally R
flat F, up U, down D
None of the above.
This is a trick question!
The next task is:
Create the stochastic matrix with transition probabilities
We will estimate this task to 2 days again for obvious reasons.This task can be split into several subtasks, but these are of course not estimated.
1. Initialize the stochastic matrix to 0 values
We have 3 possible start states and 3 possible end states for each transition. For instance, if we start from a U state, we could switch to:
U again
F
D
Initialize the stochastic matrix with the NumPy zeros function.
1
SM = numpy.zeros((3, 3))
2. Create a list containing the signs
This is just a normal Python list.
1
signs = [-1, 0, 1]
3. For each sign select the corresponding start state indices
Now the code becomes a bit messy. We will have to use actual loops! Maybe a better solution will occur to me one day. We will loop over the signs and select the start state indices corresponding to each sign. Select the indices with the NumPy where function.
1
2
3
for i in xrange(len(signs)):
#we start the transition from the state with the specified sign
start_indices = numpy.where(states[:-1] == signs[i])[0]
4. Continue to the next iteration if no start states are found
This is just straightforward Python code.
1
2
3
4
5
N = len(start_indices)
# skip since there are no transitions possible
if N == 0:
continue
5. Compute the end state values
The end state indices in our simple model can be found by just getting the next available position.
1
2
#find the values of states at the end positions
end_values = states[start_indices + 1]
6. Calculate the transition probabilities
We can now count the number of occurrences of each transition. Dividing by the total number of transitions for a given start state gives us the transition probabilities for our stochastic matrix. This is not the best method by the way, since it could be overfitting.
1
2
3
4
5
6
for j in xrange(len(signs)):
# number of occurrences of this transition
occurrences = len(end_values[end_values == signs[j]])
SM[i][j] = occurrences/float(N)
print SM
The result for the stochastic matrix is:
1
2
3
[[ 0.47787611 0. 0.52212389]
[ 0. 0. 0. ]
[ 0.42753623 0. 0.57246377]]
As you can see the transitions involving the flat no change state have 0 probability, while the other transitions have a probability close to 50%. This is I guess what we would have expected.
If you liked this post and are interested in NumPy check out NumPy Beginner's Guide by yours truly. Thanks goes to David W. Lambert for his recent review on Amazon.com and contribution to the unofficial errata. As far as I know there are still 3 reviews in the pipeline for December. One of them was announced on The Glowing Python blog. This announcement also contains a review of the free sample chapter. I will be back next week with the steady state vector task.


