Monday, November 30, 2009

Windows 7 Ubuntu-like Multi-desktop

Windows 7 doesn't provide multiple virtual desktops as expected. One solution for an Ubuntu-like multi-desktop is to use a 3rd-party extension. It's not perfect, but it works.

Here's a lightweight, free one you may want to try:

Windows Pager

They also have some nice screenshots and a Flash tutorial video.

Source:WindowsPager

Monday, November 16, 2009

RDP on Windows over SSH

First of all, you need to forward the port. Usually we use putty on a Windows machine, or ssh on Linux.

putty -L port1:DESKTOP_IP:port2 account@ip

or, with ssh on Linux:

ssh -L port1:DESKTOP_IP:port2 account@ip

and leave the putty window (or ssh session) open. In many cases:
port1: the local port to listen on, often 3389, the default RDP port on Windows machines
DESKTOP_IP: the IP of the remote desktop
port2: the RDP port on the remote desktop (3389 by default)

Sometimes on a Windows machine, connecting this way produces an error saying that no more than one console session is allowed. In certain Windows versions, remoting over ssh on that port is blocked. The easy fix is to use a local port other than 3389, such as 23468; then in the Remote Desktop address field, type localhost:23468, which fixes the problem.

Thursday, October 1, 2009

Article on Java modularity

Java 7 is on the way. Here's a good article that explains some of the nice features.

InfoQ: Modular Java: What is it?

Sunday, May 10, 2009

Data Analyst Job

Description:

The bioinformatics department seeks an exceptionally talented and
motivated individual to assist with gene information curation and
database development, and provide data analysis support for research
departments.

Responsibilities:

The successful candidate will perform regular data curation of the
in-house gene information databases. You will work with other
Bioinformatics scientists and programmers to identify needs and
opportunities in gene information integration and provide solutions.
The successful candidate will also work closely with lab scientists in
analyzing protein sequences, genomic structure and microarray data.
You will be encouraged to help design software for DNA and protein
sequence analysis and to pursue creative solutions to other tasks
commensurate with your experience and ability.

Requirements:

Applicants should possess a M.S. degree in bioinformatics, computer
science, or have equivalent experience. The successful candidate must
have strong programming experience in Java, Perl and database design.
Knowledge of R, and web application technologies is highly desirable.
You should have outstanding communication skills and the ability to
work independently and succeed in a complex, dynamic, team-oriented,
multi-disciplinary environment. You should be resourceful and pay
attention to details. Previous experience with biological databases
and sequence analysis is required.


DIVISION: Research
REQUISITION NUMBER: 08-1000025151 PROG ANALYST

Wednesday, April 29, 2009

R library som

Self-organizing maps for clustering. They need a large number of iterations.
A filtering function in the package,

filtering()

is very helpful for applying a floor and a ceiling to the input data table.

Wednesday, April 22, 2009

Corr

mantel.rtest {ade4},

This provides a comparison of two distance matrices.
Still looking for a good way of translating correlation to distance.
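Two conversions that come up a lot are sketched below in Python, purely as an illustration. The function names are made up for this example and are not from ade4 or any other package. The first, sqrt(2 * (1 - r)), is the Euclidean distance between two standardized unit-length vectors, so anti-correlated items end up far apart. The second, 1 - |r|, treats strong negative correlation as "close". Which one fits depends on whether the sign of the correlation matters for the analysis.

```python
import math


def correlation_to_distance(r, signed=True):
    """Map a correlation r in [-1, 1] to a dissimilarity (illustrative helper).

    signed=True  -> sqrt(2 * (1 - r)): r = 1 gives 0, r = -1 gives 2.
    signed=False -> 1 - |r|: perfectly anti-correlated items get distance 0.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation must lie in [-1, 1]")
    if signed:
        return math.sqrt(2.0 * (1.0 - r))
    return 1.0 - abs(r)


def distance_matrix(corr, signed=True):
    """Apply the conversion to every entry of a correlation matrix (list of lists)."""
    return [[correlation_to_distance(r, signed) for r in row] for row in corr]
```

The resulting matrix could then be handed to whatever distance-matrix comparison is being used.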

mahalanobis {stats}

Calculates the Mahalanobis distance.

Calculating the Mahalanobis distance requires a covariance estimate, which can be either the classical covariance or a robust covariance estimate.
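To make the role of the covariance estimate concrete, here is a small Python sketch for the two-variable case. The helper names are hypothetical, and the covariance here is the classical sample covariance; a robust estimate of the center and covariance could be passed to the second function instead. Like R's mahalanobis(), it returns the squared distance.

```python
def mean_and_cov_2d(points):
    """Classical sample mean and covariance of a list of (x, y) pairs."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mahalanobis_sq_2d(x, center, cov):
    """Squared Mahalanobis distance (x - center)^T cov^-1 (x - center) in 2D."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    dx = x[0] - center[0]
    dy = x[1] - center[1]
    # The inverse of [[a, b], [c, d]] is [[d, -b], [-c, a]] / det.
    return (d * dx * dx - (b + c) * dx * dy + a * dy * dy) / det
```

With an identity covariance this reduces to the squared Euclidean distance; a larger variance along one axis shrinks distances along that axis.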

Monday, April 20, 2009

Procrustes Analysis

Procrustes analysis: procrustes() in vegan provides Procrustes analysis. The package also provides functions for ordination; further information on that area is given in the Environmetrics task view. Generalised Procrustes analysis via GPA() is available from FactoMineR.

Wednesday, April 15, 2009

clues package for clustering evaluation

clues contains the calculation of five different indexes for comparing two clusterings/classifications.

adjustedRand(cl1, cl2, randMethod = c("Rand", "HA", "MA", "FM", "Jaccard"))

The adjustedRandIndex function from mclust, by contrast, computes only one of them.
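For a feel of what these indexes measure, here is a rough Python sketch of two of them: the plain Rand index and the Hubert-Arabie adjusted Rand index ("HA" in the list above). It is not the clues or mclust code, and the function name is invented for this example. Both indexes are computed by counting pairs of items that the two labelings put in the same cluster.

```python
from collections import Counter
from math import comb


def rand_indices(labels_a, labels_b):
    """Return (Rand index, Hubert-Arabie adjusted Rand index) for two labelings."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("need at least two items")
    pairs = comb(n, 2)
    # Pairs placed together by both labelings, by the first, and by the second.
    same_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    same_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    same_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    # Agreements = together in both + apart in both.
    rand = (pairs + 2 * same_both - same_a - same_b) / pairs
    expected = same_a * same_b / pairs
    maximum = (same_a + same_b) / 2
    if maximum == expected:
        adjusted = 1.0  # degenerate case, e.g. both labelings are a single cluster
    else:
        adjusted = (same_both - expected) / (maximum - expected)
    return rand, adjusted
```

Relabeling the clusters does not change either value, and the adjusted version can go negative when the two labelings agree less than chance would predict.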

Saturday, April 4, 2009

ggplot2: Great plot libs

Just put up a post first. More details coming up soon.

Wednesday, April 1, 2009

Robust correlations

The traditional correlation measures are not suitable for noisy data or data with outliers. Non-parametric methods like Kendall and Spearman can do a slightly better job than Pearson, but not enough.
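A toy illustration in Python, with made-up data and hand-rolled helpers (not from any package): six points on a perfect line, with the last y value replaced by a gross outlier.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def ranks(values):
    """1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=values.__getitem__)
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


x = [1, 2, 3, 4, 5, 6]
y_outlier = [1, 2, 3, 4, 5, -50]  # one bad measurement on an otherwise perfect line
```

On this data pearson(x, y_outlier) comes out negative, while spearman(x, y_outlier) stays positive but only at about 0.14, even though five of the six points lie exactly on a line. Ranks limit how far one point can pull the estimate, but a single outlier still does a lot of damage.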

The R package robust provides nice robust correlation methods, covRob:

R LINK

Other robust methods can be found here:
Robust Task View

For outlier removal, you may refer to the outliers package. Honestly, it doesn't do a good job at all.

Tuesday, March 31, 2009

Heatmaps

There are several functions you may employ to get heatmaps:

the default heatmap and image functions
heatmap.2 in the gplots package
heatplot in the made4 Bioconductor package
heatmap_2 and heatmap_plus in the Heatplus Bioconductor package

So far heatmap.2 in gplots works best. If you want to plot some covariates, use heatmap_plus.

The easiest way to define a custom color palette is

colorRampPalette(colors)(num_of_interpolation)
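What the palette call does is interpolate between the anchor colors. A rough Python sketch of the same idea follows: linear interpolation in RGB between '#RRGGBB' strings. The function name is made up for this example, and rounding may differ slightly from what R returns.

```python
def color_ramp_palette(colors, n):
    """Return n hex colors spread evenly along a ramp through the given colors."""
    if len(colors) < 2 or n < 2:
        raise ValueError("need at least two anchor colors and n >= 2")
    # Parse '#RRGGBB' into (r, g, b) integer triples.
    rgb = [tuple(int(c[i:i + 2], 16) for i in (1, 3, 5)) for c in colors]
    palette = []
    for k in range(n):
        # Position along the ramp, measured in segments between anchors.
        pos = k * (len(rgb) - 1) / (n - 1)
        lo = min(int(pos), len(rgb) - 2)
        frac = pos - lo
        channels = [round(a + (b - a) * frac) for a, b in zip(rgb[lo], rgb[lo + 1])]
        palette.append("#{:02X}{:02X}{:02X}".format(*channels))
    return palette
```

For example, color_ramp_palette(["#0000FF", "#FFFFFF", "#FF0000"], 5) gives a blue-white-red ramp with white in the middle, the usual kind of palette for a heatmap of centered data.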

Monday, March 30, 2009

Granger Test of Causality

Test whether one time series can predict another
Here's the R link:

R Link

P.S.
notes: sometimes I hate the nomenclature differences between programming languages:

In R, converting a value to a string is done with as.character(). I was looking for a function like toString, string... blah, and never had any luck. This reminds me of the simple task of getting the length of an array:

Java: arrayObject.length
Javascript: arrayObject.length
R: length(arrayObject)
Perl: scalar(@arrayObject)
PHP: sizeof($arrayObject)
...

Sunday, March 29, 2009

Thursday, March 26, 2009

Adding Counter to your blog

Blogspot doesn't support a page counter, so it's still pretty hard to track how many people have read your blog. However, you can track visits with 3rd-party scripts.

Here's a tutorial, but it's not very detailed. What you need to do is here:

Link to tutorial

But before you get started, you need to register a free account on sitemeter:


sitemeter


Then you will have your own page counter! :)

Google web visualization api

If you have a small dataset that you want to visualize and share on the web, the handiest way is the Google Chart API. The data requests go via HTTP GET, so you probably can't visualize thousands of data points, but it's very handy for getting simple plots to work in a realtime manner.

And everyone can make up to 250,000 calls per day! Wow! Which means you can do some really cool web data services. This is not entirely new; some libraries like Ext have extension gadgets specifically aimed at Google Charts.

Nevertheless, it took me only about 40 min to read the API and add a little working-hours tracking gadget to this blog. It's actually dynamic; I will add some navigation buttons later. It's just another incentive to keep reminding myself not to waste time on forums, and to concentrate on the work =)

Google Chart Link

Tuesday, March 24, 2009

R libraries [Clustering, Classification, LM]

Nice functions to quickly manipulate Rmetrics: fUtilities package


Basic libs:

cluster
class
stats

Model based:
mclust

Linear Models:
MASS
lmtest
car

This post will be updated.

Monday, March 23, 2009

WGCNA Package

A package for weighted correlation network analysis. It would be interesting to look at how well this method does for network analysis.

Resource Link

Paper Link

GeneNet Package

For data transformations, there's a package named GeneNet, designed to analyze gene expression (time series) data with a focus on gene networks.

It has handy transformation functions:

z.transform and hotelling.transform.

Note that these two transformations only work on correlation coefficients, with the range from -1 to 1.
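As a small illustration of why the range matters, here is Fisher's z-transformation written out in Python. This is a sketch of the standard formula z = atanh(r), not the GeneNet source, and the function names are made up. The transform blows up at r = -1 and r = 1, so the input has to be a correlation strictly inside that range.

```python
import math


def fisher_z(r):
    """Fisher's z-transformation of a correlation coefficient: z = atanh(r)."""
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and 1")
    return math.atanh(r)


def inverse_fisher_z(z):
    """Map a z value back to a correlation coefficient: r = tanh(z)."""
    return math.tanh(z)
```

Near zero the transform barely changes the value; close to the ends of the range it stretches the scale a lot, which is the point of using it before averaging or testing correlations.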

Link

Sunday, March 22, 2009

Bagged Clustering

A given partitioning clustering algorithm is run repeatedly on bootstrapped data, and the resulting cluster centers are then merged by a hierarchical clustering algorithm. This is good for noisy data / small sample sizes.

From R library e1071, function bclust
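The idea can be sketched in a few lines of Python for one-dimensional data. This is only a toy version under several assumptions: a simple k-means as the base method, centroid linkage for the merge step, and invented function and parameter names. It is not how bclust is implemented.

```python
import random


def kmeans_1d(data, k, rng, iterations=20):
    """Plain k-means on a list of numbers; returns the k cluster centers."""
    centers = rng.sample(data, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for x in data:
            nearest = min(range(k), key=lambda i: abs(x - centers[i]))
            groups[nearest].append(x)
        # An empty group keeps its previous center.
        centers = [sum(g) / len(g) if g else centers[i] for i, g in enumerate(groups)]
    return centers


def merge_centers(centers, n_clusters):
    """Agglomerative merge of 1D centers until n_clusters groups remain."""
    clusters = [[c] for c in sorted(centers)]
    while len(clusters) > n_clusters:
        means = [sum(c) / len(c) for c in clusters]
        # In sorted 1D data the closest pair of groups is always adjacent.
        i = min(range(len(means) - 1), key=lambda j: means[j + 1] - means[j])
        clusters[i:i + 2] = [clusters[i] + clusters[i + 1]]
    return [sum(c) / len(c) for c in clusters]


def bagged_clustering(data, base_k, n_clusters, n_boot=10, seed=0):
    """Run k-means on bootstrap samples, then merge all the centers found."""
    rng = random.Random(seed)
    all_centers = []
    for _ in range(n_boot):
        sample = [rng.choice(data) for _ in data]  # resample with replacement
        all_centers.extend(kmeans_1d(sample, base_k, rng))
    return merge_centers(all_centers, n_clusters)
```

Each bootstrap run contributes base_k centers, so the merge step works on n_boot * base_k points rather than on the raw data, which is where the robustness to noise and small samples is supposed to come from.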

R Doc Link

Also here's an online powerpoint that demonstrates the application on simulated metabolomics data:

Bagged K-means Metabolomics PPT

Thursday, January 8, 2009

Vista 64 = Messy Awesomeness

It took me a whole day to back up and install Windows Vista 64. Finally I can access my 8G of RAM. Some software chews through RAM heavily (the gigantic Adobe bundles), so even 3.25G of RAM is kinda skinny now.

Here are some of the issues:
1. Some OEM Vista CD keys are interchangeable between the 32-bit and 64-bit versions. I tried my Ultimate 32-bit Vista key and it works gracefully with the 64-bit DVD.

2. Cheap stuff = bad compatibility. The wireless dongle I bought for 15 bucks now shows its weakness next to a PCI wireless card: no driver for Vista 64. I had to hook up two 3-meter cables to reach the router, using a plastic connector I still had from 3 years ago. And it rocks!

3. Weird file protection issues. OK, here's what it is. Some software like Acrobat Reader puts protection on certain files so that even an admin can't delete them in the first few tries. You need to take ownership of the file, then grant the administrators group full access. Luckily these commands support *.* batches... otherwise I would have had to deal with hundreds of files manually.

If you got access denied:
takeown /f file_name
icacls file_name /grant administrators:F

Then you may delete all the files.
Maybe there's a hidden manual somewhere for running all of Vista from the command line?

4. There was a weird *.mov file on my computer under Vista 32 that I could not remove. At first I thought it was related to hardware failure (NOOOO!), but chkdsk was totally green. I could play the file fine, but any other operation, like rename, cut, delete, or even right-clicking for properties, would freeze Explorer. I turned off OAS and still had no luck. Some googling showed that people have run into similar issues, but no clear solutions were provided. Besides, this was not related to privileges.

Now I will see how Vista works with my 8G of RAM.