Read, Think, Code

February 23

brief review of my journey into the new data-intensive world

I joined the big company four and a half years ago. After spending a year and a half in the cash-cow department and two years in the bleeding-edge department, I switched to its brain powerhouse. Overall, I am quite impressed by the smartness and the raw and academic intelligence of most of my colleagues, especially in the current working environment. We have a senior scientist who was an MIT professor 10 years ago, and young scientists from math with a rigorous approach to real-world algorithmic problems. We have PhDs from all the top universities... and we have internal courses on machine learning, algorithmic game theory, etc.
 
The data world is quite different from the systems world or the application world. First, it requires some additional mathematical maturity on top of the foundation of algorithmic thinking, invariants, and formal logical thinking. The additional math is mostly matrix analysis, probability/statistics, and numerical optimization methods.  
 
The best matrix book must be Strang's book: vector spaces, column space, row space, matrix multiplication, null space, linear dependence, span, subspaces, LU, QR, SVD, eigenvalues, eigenvalue decomposition, semidefinite and symmetric matrices, Markov matrices, the power method, projection, inner products.
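To make the power method from that list concrete, here is a minimal pure-Python sketch (no NumPy, to keep it self-contained). It repeatedly multiplies a start vector by the matrix and renormalizes, so the component along the dominant eigenvector takes over. The function name and tolerance are my own choices, and it assumes the matrix has a single dominant eigenvalue and that the all-ones start vector is not orthogonal to its eigenvector.

```python
import math


def power_method(a, num_iters=1000, tol=1e-12):
    """Estimate the dominant eigenvalue and eigenvector of a square matrix `a`
    (a list of rows) by repeated multiplication and normalization."""
    n = len(a)

    def matvec(v):
        return [sum(a[i][j] * v[j] for j in range(n)) for i in range(n)]

    v = [1.0] * n  # start vector; assumed not orthogonal to the dominant eigenvector
    eigenvalue = 0.0
    for _ in range(num_iters):
        w = matvec(v)
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]  # renormalize so v stays unit length
        # Rayleigh quotient v^T A v estimates the eigenvalue for unit-length v
        new_eigenvalue = sum(vi * avi for vi, avi in zip(v, matvec(v)))
        if abs(new_eigenvalue - eigenvalue) < tol:
            return new_eigenvalue, v
        eigenvalue = new_eigenvalue
    return eigenvalue, v


# A Markov (column-stochastic) matrix: its dominant eigenvalue is 1.
markov = [[0.9, 0.5],
          [0.1, 0.5]]
lam, v = power_method(markov)
stationary = [x / sum(v) for x in v]  # close to [5/6, 1/6]
```

For a Markov matrix, the dominant eigenvector scaled to sum to 1 is the stationary distribution, which is one reason the power method shows up so often in data work. The convergence speed depends on the ratio of the second-largest to the largest eigenvalue (0.4 in this example).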
 
I don't know of a good probability/statistics book yet, but we are using The Elements of Statistical Learning by three Stanford professors, which is quite a hard book. Probability, Bayes' rule, prior, posterior, likelihood, parametric estimation, the Gaussian, the 2-SD rule, maximum entropy, PDF, CDF, the central limit theorem, linear regression, least squares, logistic regression.
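A tiny illustration of least squares from that list: fitting a line y ≈ a + b·x in closed form. Under the usual assumption of independent Gaussian noise, this least-squares fit is also the maximum-likelihood estimate, which ties several of the listed terms together. The function name is mine, for illustration only.

```python
def least_squares_line(xs, ys):
    """Fit y ~ a + b*x by ordinary least squares (minimizes the sum of squared residuals)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x); the 1/n factors cancel
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x  # the fitted line passes through the point of means
    return a, b


a, b = least_squares_line([0, 1, 2, 3], [1, 3, 5, 7])  # a = 1.0, b = 2.0
```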
 
On the numerical optimization side: gradient descent, conjugate gradient, Newton's method, SVD, KKT conditions, Lagrangians, Taylor expansion, partial derivatives, positive semidefiniteness, line search, etc.
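Gradient descent with a backtracking (Armijo) line search is probably the simplest way to see how gradients and line search fit together. The sketch below is plain Python; the parameter names (`alpha0`, `beta`, `c`) and the quadratic test function are assumptions for illustration, and real solvers (conjugate gradient, Newton, quasi-Newton) pick their search directions more cleverly.

```python
def gradient_descent(f, grad, x0, alpha0=1.0, beta=0.5, c=1e-4, tol=1e-8, max_iters=1000):
    """Minimize f from x0 by steepest descent with a backtracking (Armijo) line search."""
    x = list(x0)
    for _ in range(max_iters):
        g = grad(x)
        gnorm2 = sum(gi * gi for gi in g)
        if gnorm2 < tol * tol:  # stop when the gradient is (nearly) zero
            break
        fx = f(x)
        step = alpha0
        # Shrink the step until it gives a "sufficient decrease" along -g.
        while f([xi - step * gi for xi, gi in zip(x, g)]) > fx - c * step * gnorm2:
            step *= beta
        x = [xi - step * gi for xi, gi in zip(x, g)]
    return x


# Example: an elongated quadratic bowl with its minimum at (3, -1).
def f(x):
    return (x[0] - 3) ** 2 + 10 * (x[1] + 1) ** 2


def grad(x):
    return [2 * (x[0] - 3), 20 * (x[1] + 1)]


x_min = gradient_descent(f, grad, [0.0, 0.0])
```

The elongated bowl is deliberate: the accepted step has to stay small enough for the steep direction, so progress along the shallow direction is slower. That ill-conditioning is exactly what conjugate gradient and Newton's method are designed to handle better.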
 
Over the years, machine learning has also developed some well-known algorithms and techniques of its own, like neural networks, naive Bayes, SVMs, boosting, HMMs, and CRFs.
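Of those, naive Bayes is the easiest to show in a few lines, and it connects back to the prior/posterior/likelihood vocabulary above: pick the class that maximizes the log prior plus the sum of per-word log likelihoods, assuming words are conditionally independent given the class. This is a minimal multinomial variant with add-one smoothing; the class name and the toy spam/ham data are made up for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes over whitespace-separated tokens, with add-one smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)  # class -> word -> count
        self.vocab = set()
        for doc, label in zip(docs, labels):
            for word in doc.split():
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            words = self.word_counts[label]
            total = sum(words.values())
            score = math.log(count / n_docs)  # log prior
            for word in doc.split():
                # log likelihood of the word given the class, add-one (Laplace) smoothed
                score += math.log((words[word] + 1) / (total + v))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


docs = ["cheap pills buy now", "buy cheap watches", "meeting at noon", "lunch meeting tomorrow"]
labels = ["spam", "spam", "ham", "ham"]
model = NaiveBayes().fit(docs, labels)
model.predict("cheap buy")  # -> "spam"
```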

October 24

brave new world of data

A few months after my last post about how I disliked the assigned work, I jumped to a different team inside the big company. The team is research oriented, focusing on innovative Internet-related technologies. Before I decided to join, I was very impressed by the people I met and by how happy they all seemed to be; I wondered whether I worked for the same company. All with PhDs, all in their 30s, doing code reviews over beer in a local bar, talking philosophy (after relativity) over dinner: I knew I had found some interesting people to work with.
 
In the first few months, I worked on building a distributed system to store the whole web graph, with graph algorithms on top of it. It was quite fun; it was the first truly distributed system I saw built from scratch (on TCP/IP). Later I got the chance to work on search, mostly on the indexing part. We wanted to run a page classifier (a linear model trained by logistic regression) at indexing time to extract metadata and store it in the indexes to better serve queries. We also got to play with query classification, ranking, ranker training, and a distributed storage/computation platform equivalent to GFS/MapReduce, and I did quite a lot of log processing and web page processing at very large scale (computations reading 10+ TB of data). Search is fun because of its scale, its inherent complexity, and its interdisciplinary nature. I definitely spent more time on it; it was so much fun.
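I can't show the actual page classifier, but a minimal sketch of the general technique (a linear model over sparse page features, trained by stochastic gradient descent on the logistic loss) looks like this. The feature names, learning rate, epoch count, and toy pages are all hypothetical; a production indexer would train offline and only evaluate the model at indexing time.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def score(w, b, x):
    """Linear score b + w . x over a sparse feature dict."""
    return b + sum(w.get(f, 0.0) * v for f, v in x.items())


def train_logistic_regression(examples, epochs=50, lr=0.5, seed=0):
    """SGD on the log loss. examples: list of (features: dict[str, float], label in {0, 1})."""
    rng = random.Random(seed)
    data = list(examples)
    w, b = {}, 0.0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            err = sigmoid(score(w, b, x)) - y  # derivative of the log loss w.r.t. the score
            b -= lr * err
            for f, v in x.items():  # only features present on this page are updated
                w[f] = w.get(f, 0.0) - lr * err * v
    return w, b


def predict_proba(w, b, x):
    return sigmoid(score(w, b, x))


# Hypothetical "is this a cooking page?" training data.
pages = [
    ({"word:recipe": 1.0, "word:oven": 1.0}, 1),
    ({"word:recipe": 1.0, "word:bake": 1.0}, 1),
    ({"word:team": 1.0, "word:score": 1.0}, 0),
    ({"word:team": 1.0, "word:match": 1.0}, 0),
]
w, b = train_logistic_regression(pages)
```

Sparse dicts keep each update proportional to the number of features on the page rather than the size of the whole vocabulary, which matters when the vocabulary is web-sized.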
 
Now I am working on information extraction technologies based on machine learning. The basic idea is to generate structured output (rather than the simple categorical or numerical output commonly found in classification/regression problems). It is a research frontier with a lot of papers published each year, and most of the new and good algorithms were published after 2000. It is pretty interesting to enter this field because numerical computation plays a central role in machine learning algorithms: at the very bottom, most machine learning algorithms reduce to optimization problems, usually solved with numerical methods such as gradient descent, stochastic gradient descent, conjugate gradient, Newton's method, etc. I used to do a lot of scientific computation solving ODE/PDE/kinetic equations, so I knew a lot of numerical methods before I knew quicksort. Now it is like going back to the old school, but spoken in a totally different language. It is fun.
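Sequence labeling is the textbook case of structured output: instead of one label per input, the model picks a whole label sequence. For an HMM (one of the models listed above), the best sequence is found by Viterbi dynamic programming. The sketch below works in log space; the two-state NAME/O tagger, its word-shape observations, and its probabilities are invented for illustration. Linear-chain CRFs decode with the same recurrence, using learned feature scores in place of log probabilities.

```python
import math


def viterbi(observations, states, log_start, log_trans, log_emit):
    """Most likely state sequence under an HMM.
    log_start[s], log_trans[p][s] (p -> s), log_emit[s][o] are log-probabilities."""
    # best[t][s]: best log-score of any path ending in state s at position t
    best = [{s: log_start[s] + log_emit[s][observations[0]] for s in states}]
    back = [{}]
    for t in range(1, len(observations)):
        best.append({})
        back.append({})
        for s in states:
            prev, prev_score = max(
                ((p, best[t - 1][p] + log_trans[p][s]) for p in states),
                key=lambda item: item[1],
            )
            best[t][s] = prev_score + log_emit[s][observations[t]]
            back[t][s] = prev
    # Trace back from the best final state.
    path = [max(states, key=lambda s: best[-1][s])]
    for t in range(len(observations) - 1, 0, -1):
        path.append(back[t][path[-1]])
    return list(reversed(path))


def logs(table):
    """Convert a (nested) dict of probabilities to log-probabilities."""
    return {k: logs(v) if isinstance(v, dict) else math.log(v) for k, v in table.items()}


states = ["O", "NAME"]
start = logs({"O": 0.7, "NAME": 0.3})
trans = logs({"O": {"O": 0.8, "NAME": 0.2}, "NAME": {"O": 0.5, "NAME": 0.5}})
emit = logs({"O": {"Cap": 0.1, "low": 0.9}, "NAME": {"Cap": 0.9, "low": 0.1}})
tags = viterbi(["low", "Cap", "Cap", "low"], states, start, trans, emit)
# tags == ["O", "NAME", "NAME", "O"]
```

The cost is O(T·S²) for T tokens and S states, instead of enumerating all S^T possible label sequences.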
 
I also stumbled upon a book recently called Super Crunchers. Its main theme is how data-driven decision making is superior to intuition-based approaches; it advocates statistical methods, data-driven decisions, regression, etc. I completely agree with it. What I am working on, reading, and learning daily is just about turning data into useful information that can be consumed by other programs.
 
BTW: the kinetic equation in plasma physics is basically a differential equation governing a probability distribution function in (x, v, t) space. Perhaps we can derive a similar equation in machine learning; that would be fun.
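For reference, the collisionless form of that equation (the Vlasov equation, in SI units) for the distribution function $f(\mathbf{x}, \mathbf{v}, t)$ of a particle species with charge $q$ and mass $m$ in electric and magnetic fields $\mathbf{E}$ and $\mathbf{B}$ is shown below, together with the density obtained by integrating out velocity. Adding a collision operator on the right-hand side gives the Boltzmann or Fokker-Planck forms.

```latex
\frac{\partial f}{\partial t}
  + \mathbf{v} \cdot \nabla_{\mathbf{x}} f
  + \frac{q}{m}\left(\mathbf{E} + \mathbf{v} \times \mathbf{B}\right) \cdot \nabla_{\mathbf{v}} f = 0,
\qquad
n(\mathbf{x}, t) = \int f(\mathbf{x}, \mathbf{v}, t)\, d^{3}v
```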

December 15

Get it started, move

Recently, I got assigned to work on a tar-pit-like project. It involves a major rewrite of existing running code to "modern" technologies, and fixing a couple of big problems that could not be fixed in the existing architecture.

I did not like it initially and did not have the motivation to get started. For a couple of weeks, every day started with checking email and reading online blogs/books, and ended with that stuff too. I hardly made any progress, even though I had created the project files and had another developer working with me. The good thing is that I learned quite a lot about JavaScript, Ajax, the DOM, and ASP.NET, and got to know another interesting project inside the big team quite well.

Starting this Monday, I got a hard deadline from higher up the chain. Now I am under pressure to start, and still things ramp up slowly. I began to profile my time usage and found that, just before I planned to start working on it, I frequently got randomized by something like interviewing a candidate or an internal technical presentation. After those randomizations, I found myself staring at the monitor with an empty mind, with the time close to 4-5 PM. So again, a day wasted.

I began rewarding myself whenever I got something done on the project; the reward is usually a nice lunch. It usually works.
I also began to pray to God to give me the wisdom to handle this kind of situation, and every morning I just tell myself: get started, get started.

I began to enjoy the work today after I got a couple of things working. I felt great because I could overcome myself and get started.

November 22

helloworld to web programming

I have been working on server backends for large web applications for the last year and a half. I designed one database on SQL Server 2005 to asynchronously process long-running jobs; it supports standard FIFO ordering, throttling control, retries, and scheduling with great throughput. I designed another database to store metadata about our backend DB procs, to streamline middle-tier calls to the backend databases. I also built some middle-tier infrastructure to coordinate backend DB farms containing 10+ different SQL Server clusters, partitioned both horizontally and vertically. So it is not uncommon for one frontend request to require combining data from multiple databases, and I wrote some multithreaded distributed query code and implemented some in-memory join algorithms to do that; some even support a dynamic rollup operation, which is usually done in … I also did some middle-tier caching to avoid frontend caching and reduce backend traffic, and some batching solutions to increase backend throughput and reduce backend resource usage.
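The in-memory join mentioned above is easiest to picture as a classic hash join: build a hash table on the join key from the smaller result set, then probe it with the rows of the other. The sketch below is single-threaded Python over lists of dicts, assuming the rows have already been fetched from the different partitions; it is not the original middle-tier code, which join algorithms that code used is not stated, and the column names are hypothetical.

```python
from collections import defaultdict


def hash_join(left, right, left_key, right_key):
    """Inner equi-join of two lists of dict rows (e.g. result sets from two databases).
    Builds a hash table on the smaller input and probes it with the larger one."""
    swapped = len(right) < len(left)
    build, probe = (right, left) if swapped else (left, right)
    build_key, probe_key = (right_key, left_key) if swapped else (left_key, right_key)

    table = defaultdict(list)  # join key -> build-side rows with that key
    for row in build:
        table[row[build_key]].append(row)

    result = []
    for row in probe:
        for match in table.get(row[probe_key], []):
            l, r = (row, match) if swapped else (match, row)
            result.append({**l, **r})  # on a column-name clash, the right-hand row wins
    return result


users = [{"user_id": 1, "name": "alice"}, {"user_id": 2, "name": "bob"}]
orders = [{"user_id": 1, "order_id": 10}, {"user_id": 1, "order_id": 11},
          {"user_id": 3, "order_id": 12}]
rows = hash_join(users, orders, "user_id", "user_id")  # two rows, both for alice
```

Building on the smaller side keeps the hash table small, and the join runs in roughly linear time in the total number of rows, which is why hash joins are a common default for equi-joins in memory.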
 
Before that, I worked on building a reusable managed library for a pretty popular application domain (real-time communication) on top of a signaling protocol (SIP); the DLL can be, and has been, used to build both client-side (Windows) applications and server-side (web) applications.
 
Only recently did I come to the brave new web world. I knew just a little HTML, plus some ASP.NET from past experience building web services and working with fellow UI developers, so I understood a little about things like the page life cycle and how the ASP.NET runtime compiles an .aspx page into a C# class. But I had absolutely no clue what was going on on the client side (browser side) until one day my boss recommended Firebug to me. It was such an eye opener, like giving a kid the key to a candy store. I started to figure out the JavaScript function/object model from the debugger, and the flexibility of its dynamically typed model. Closures were not hard for me to understand because I know Scheme/Lisp very well. The HTML DOM was also pretty easy to understand because I know the XML DOM. With the Firebug console, I could see Ajax requests and responses. I was also introduced to XSLT and CSS; both provide an orthogonal view of the imperative programming world, like aspect-oriented programming to me.
 
I began to dig into the ASP.NET server-side control model; it is a pretty interesting object model for web programming that I did not see in PHP, JSP, or CGI. It gives a lot of flexibility through the template method pattern: a simple implementation of IPostBackDataHandler and IPostBackEventHandler on a custom control gives it access to postback data and events, and the RenderBeginTag, RenderContents, and RenderEndTag methods give it flexibility in rendering.
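A rough Python analogue of that render pipeline shows the template method pattern at work: the base class fixes the order (begin tag, contents, end tag) and subclasses override only the steps they care about. The class and method names here are hypothetical stand-ins, not the ASP.NET API, whose real counterparts live on `WebControl` and write to an `HtmlTextWriter`.

```python
import html


class Control:
    """Hypothetical server-side control; `render` is the template method."""

    tag = "span"

    def render(self, out):
        # The skeleton is fixed here; subclasses override the individual steps.
        self.render_begin_tag(out)
        self.render_contents(out)
        self.render_end_tag(out)

    def attributes(self):
        return {}

    def render_begin_tag(self, out):
        attrs = "".join(f' {k}="{html.escape(str(v))}"' for k, v in self.attributes().items())
        out.append(f"<{self.tag}{attrs}>")

    def render_contents(self, out):
        pass  # default: empty element

    def render_end_tag(self, out):
        out.append(f"</{self.tag}>")


class Label(Control):
    def __init__(self, text):
        self.text = text

    def render_contents(self, out):
        out.append(html.escape(self.text))


class Link(Control):
    tag = "a"

    def __init__(self, href, text):
        self.href, self.text = href, text

    def attributes(self):
        return {"href": self.href}

    def render_contents(self, out):
        out.append(html.escape(self.text))


out = []
Link("/home", "Home").render(out)
page = "".join(out)  # '<a href="/home">Home</a>'
```

The point of the design is that a new control only supplies the varying pieces (tag, attributes, contents), while the base class guarantees the tags are always opened and closed in the right order.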
 
Just out of curiosity, I took a look at the PHP world for its equivalent of server-side controls, but I saw none. PHP is just a scripting language in the scripting (text processing) sense: you embed code in HTML text, with no objects representing HTML tags or elements that can participate in the page processing cycle on the web server. What a primitive world; it reminds me of my text processing days using Perl, with all those here-documents to embed HTML in the code. But Google builds a much faster website than what we are building, with very little client-side JavaScript. Since it seems obvious to me that ASP.NET is superior to PHP technology-wise and abstraction-wise, my only explanation is: all the smart people using Microsoft technologies have been attracted to work for Microsoft, and those working outside it are mostly ordinary Joes, so ASP.NET is used only to build ordinary-Joe websites, not the super cool sites.
 
 
 