Vishal Lambe's Blog, page 4

January 12, 2015

Which search algorithm does GNU grep use?

I have always wondered why grep is so blazingly fast.
How does grep manage the yin and yang of space and time to return results at such speed?

A quick Google search showed that GNU grep uses the Boyer-Moore string search algorithm. Boyer-Moore looks first for the final letter of the target string, and uses a lookup table to tell it how far ahead it can skip in the input whenever it finds a non-matching character.

In other words, the idea behind the algorithm is to match on the tail of the pattern rather than the head, and to skip along the text in jumps of multiple characters rather than searching every single character in the text.

Therefore, in general, the algorithm runs faster as the pattern length increases.
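To make the skipping idea concrete, here is a minimal Python sketch of the Boyer-Moore-Horspool variant. It keeps only the "bad character" skip table described above; full Boyer-Moore also adds a "good suffix" rule, and GNU grep's real matcher is considerably more elaborate. The function names are hypothetical and this is an illustration, not grep's implementation.

```python
def build_shift_table(pattern: str) -> dict[str, int]:
    """Bad-character table: how far to slide the pattern when a given text
    character sits under the pattern's last position."""
    m = len(pattern)
    # Every character except the last records its distance from the end;
    # later occurrences overwrite earlier ones, keeping the smallest shift.
    return {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}


def horspool_search(text: str, pattern: str) -> int:
    """Return the index of the first occurrence of pattern in text, or -1."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    shift = build_shift_table(pattern)
    pos = 0  # where the pattern's first character is aligned in the text
    while pos <= n - m:
        # Compare from the tail of the pattern toward the head.
        j = m - 1
        while j >= 0 and text[pos + j] == pattern[j]:
            j -= 1
        if j < 0:
            return pos
        # Skip based on the text character under the pattern's last position;
        # a character absent from the pattern allows a full jump of m.
        pos += shift.get(text[pos + m - 1], m)
    return -1
```

For example, `horspool_search("here is a simple example", "example")` returns 17. Because an absent character lets the pattern jump by its full length, longer patterns tend to allow longer jumps, which is why the algorithm generally gets faster as the pattern grows.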

Published on January 12, 2015 09:13

December 25, 2014

Why Fact tables should not hold textual information

Although it is theoretically possible for a measured fact to be textual, one should avoid making it part of a fact table in dimensional modelling.

Textual information is mostly a description of something, and is therefore non-additive. It should be pushed to dimension tables, as we do not want to store redundant textual information in fact tables.

According to Ralph Kimball, analysing a true textual fact is next to impossible due to the unpredictable context of a text fact.

Also, fact tables take up around 90 percent of the space in a dimensional database. Given this space utilization, it is best to delegate text information to dimension tables.
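As a toy illustration of pushing text out of a fact table, the Python sketch below replaces a repeated textual attribute with an integer surrogate key and collects each distinct text value once in a separate dimension lookup. The function and column names are hypothetical; in a real warehouse this is done in the ETL layer rather than in application code.

```python
def split_text_into_dimension(rows, text_column, key_column):
    """Move a textual attribute out of fact rows into a dimension.

    Each distinct text value gets a small integer surrogate key; fact rows keep
    only that key, so the text is stored once instead of once per fact row.
    """
    keys_by_text = {}  # text value -> surrogate key
    facts = []
    for row in rows:
        text = row[text_column]
        if text not in keys_by_text:
            keys_by_text[text] = len(keys_by_text) + 1
        # Copy every column except the text, then attach the surrogate key.
        fact = {k: v for k, v in row.items() if k != text_column}
        fact[key_column] = keys_by_text[text]
        facts.append(fact)
    # Present the dimension as key -> description, like a dimension table.
    dimension = {key: text for text, key in keys_by_text.items()}
    return facts, dimension
```

With three sales rows where two share the description "Red cotton T-shirt, size M", the fact rows end up holding only the numeric `amount` (still additive) plus product keys 1, 2, 1, while the descriptions are stored once in the dimension.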

Published on December 25, 2014 07:08

December 21, 2014

[Informatica] Transformations that cannot be used in a Mapplet

The following objects CANNOT be used inside a Mapplet.

COBOL sources and Normalizer transformations
Any type of target
Non-reusable Sequence Generator transformations
XML sources and targets
Another Mapplet inside a Mapplet

Published on December 21, 2014 10:40

[Unix] less vs more vs vi

'less is more, more or less'

Mark Nudelman, the developer of the 'less' utility, joked that it is the reverse of more; hence the name less.

While vi is a full-scale text editor with capabilities that any text editor under the sun could envy, more and less are text viewers. They do not offer editing capabilities like vi.

less does the same job as more, but it is more advanced. It provides richer ways to navigate through a text file, such as scrolling backward as well as forward. less also takes a completely different approach to reading files.

While more is commonly described as loading the complete file into memory before displaying it on the terminal, less reads the file incrementally, only as far as it needs to fill the screen. This results in faster load times with large files.
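The contrast between the two loading strategies can be sketched in a few lines of Python. This is not how more or less are implemented; it is a hypothetical illustration (the function names are made up) of reading a whole file up front versus reading only as far as the screen being shown.

```python
from itertools import islice


def read_all_lines(path):
    """Whole-file approach: every line is read into memory before anything is shown."""
    with open(path, encoding="utf-8") as f:
        return f.readlines()


def read_screen(path, start_line, height=24):
    """Incremental approach: read only as far as the requested screen.

    start_line is 1-based. The file is consumed only up to roughly the end of
    the requested screen (plus Python's I/O buffer), so showing the first page
    of a huge file is cheap.
    """
    with open(path, encoding="utf-8") as f:
        return list(islice(f, start_line - 1, start_line - 1 + height))
```

On a very large file, `read_screen(path, 1)` returns as soon as the first 24 lines are available, while `read_all_lines(path)` has to get through the entire file first.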

The syntax to open a file with less is exactly the same as with more: less followed by the filename, as shown below.
less mytextfile.txt

Navigating a file with less is very handy, as it lets you browse with the up, down, left and right arrow keys. You can also type a line number followed by 'g' to go directly to that line.
For example, to go directly to line 37, type
37g

Published on December 21, 2014 10:28

December 7, 2014

[Informatica] Runtime files created by Informatica

The Integration Service process generates the following output files when you run workflows and sessions. 
- Workflow log
- Session log
- Session details file
- Performance details file
- Reject files and bad files
- Output file
- Cache files

The Integration Service process creates cache files for the following mapping objects:
- Aggregator transformation
- Joiner transformation
- Rank transformation
- Lookup transformation
- Sorter transformation
- XML target 

By default, for Aggregator, Rank, Joiner, and Lookup transformations and XML targets, the DTM creates an index cache file (PM*.idx) and a data cache file (PM*.dat) in the directory configured for $PMCacheDir.

The Integration Service process creates the cache file for a Sorter transformation in the $PMTempDir.
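If you need to check what is sitting in the cache directory, for example when watching disk usage, a small script can group files by the naming pattern described above. This is a hypothetical Python sketch: `cache_dir` stands in for whatever $PMCacheDir resolves to on your server, and it only matches the PM*.idx and PM*.dat patterns, not the exact names Informatica generates.

```python
from pathlib import Path


def list_cache_files(cache_dir):
    """Group cache files found in cache_dir by type.

    Index caches are matched with PM*.idx and data caches with PM*.dat,
    following the default naming described above.
    """
    root = Path(cache_dir)
    return {
        "index": sorted(p.name for p in root.glob("PM*.idx")),
        "data": sorted(p.name for p in root.glob("PM*.dat")),
    }
```

Note that `Path.glob` is case-sensitive on most Unix file systems, so the pattern has to match the case of the file names on disk.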

Published on December 07, 2014 05:22

September 21, 2014

[Informatica] Bulk Load vs Normal Load

While loading data to relational targets, Informatica provides an option to do either a Bulk load or a Normal load. Selecting the appropriate option can not only improve your session performance but also make your rollback strategy foolproof.

Bulk load should be used in cases where the data volume is huge (for example, greater than 100 GB) and performance is critical. Bulk load can greatly speed up your session in such a case.

However, the trade-off with Bulk load is that it bypasses the database logs, so recovery in case of failure is not possible. Normal load, on the other hand, logs each and every database transaction while loading, which enables rollback.

Also, Bulk load cannot be used when indexes are defined on the target; the load fails in such a case. One must drop the indexes, do the Bulk load, and then recreate the indexes. The other option is to use Normal load.
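The drop, load, recreate sequence looks roughly like the Python sketch below. sqlite3 is only a stand-in here: it has no bulk mode that bypasses logging, and the table and index names are hypothetical. In Informatica itself, the drop and recreate statements are commonly placed in the target's pre- and post-session SQL.

```python
import sqlite3


def load_without_index(conn: sqlite3.Connection, rows):
    """Drop the target index, load all rows, then rebuild the index.

    Table 'sales' and index 'idx_sales_customer' are hypothetical names.
    """
    # 1. Drop the index so the load does not maintain it row by row.
    conn.execute("DROP INDEX IF EXISTS idx_sales_customer")
    # 2. Load everything in one transaction; the context manager commits on
    #    success and rolls back if an exception is raised.
    with conn:
        conn.executemany(
            "INSERT INTO sales (customer_id, amount) VALUES (?, ?)", rows
        )
    # 3. Recreate the index once, over the fully loaded table.
    conn.execute("CREATE INDEX idx_sales_customer ON sales (customer_id)")
    conn.commit()
```

Rebuilding the index once after the load is usually cheaper than updating it for every inserted row, which is the same reasoning behind dropping indexes before a bulk load.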

Published on September 21, 2014 11:59

August 30, 2014

:set commands with vi editor

Here is the gist of the most commonly used set commands in the vi editor.

:set nu
Display line numbers in vi editor

:set nonu
Do not display line numbers in vi editor

:set eb
Beep the speaker every time an error occurs

:set noeb
Do not beep the speaker in case of error

:set ai
Set auto indent on

:set noai
Set auto indent off

:set ic
Ignore case while searching a pattern in vi editor

:set noic
Make the search case sensitive, don't ignore case

:set terse
Make messages terse

:set noterse
Do not make error messages terse

:set mesg
Permit receipt of messages from other terminals

:set nomesg
Do not permit receipt of messages

There are of course numerous other options available. To get a complete list of all the options with set, use the command below.

:set all

Published on August 30, 2014 21:49

An Open Letter To My Younger Self

Hi Vishal,

I know what you're thinking right now, that constant restlessness. Believe me, I've been through it. I know the feeling. Everyone wanting to tell you what to do and how to behave. I also know you hate the constant vigil. However, bear with it. This is a defining moment for you.

One thing you should continue doing is reading. Read as much as you can: magazines, articles, stories, books and newspapers. Remember, while reading, nothing is irrelevant. Take everything in. In the long run, no knowledge goes unutilized.

One thing I want you to do is stop associating yourself with any particular idea, be it a philosophy, a religion or any school of thought. Diversify. Do not let the thoughts of others make an impact on your thought process. Don't let the moment define you; define the moment.

Respect knowledge, not age. Remember, respect is earned, do not demand it. Simplicity is the ultimate sophistication. Do not be materialistic; live minimalistic. Do not rush into things, let them take their own course. Make the most of all you have.

Most importantly, invest. Invest in relationships. Have quality friends. Build lifetime relationships.

And yes, they were wrong when they told you that money does not grow on trees. It does, but you need to water the tree with your sweat.

Don't overthink. Quit thinking about what others think of you, and make no attempt to impress others. If it has to be, it will. Everything will fall into place. Make no haste.

Remember where you come from and what it is like to be there. Stay connected to your roots.

Lovingly,
Your older self.

P.S. Question everything; that is the shortcut to learning. And start writing; the sooner you start, the more polished it shall be.

Published on August 30, 2014 08:31

August 27, 2014

Some excerpts from the 'I Have a Dream' speech

Some excerpts from Martin Luther King's 'I Have a Dream' speech:

Now is the time to make real the promises of democracy. Now is the time to rise from the dark and desolate valley of segregation to the sunlit path of racial justice. Now is the time to lift our nation from the quicksands of racial injustice to the solid rock of brotherhood. Now is the time to make justice a reality for all of God's children.

I have a dream that one day this nation will rise up and live out the true meaning of its creed: "We hold these truths to be self-evident, that all men are created equal."

I have a dream that my four little children will one day live in a nation where they will not be judged by the color of their skin but by the content of their character.

I have a dream that one day every valley shall be exalted, every hill and mountain shall be made low, the rough places will be made plain, and the crooked places will be made straight, and the glory of the Lord shall be revealed, and all flesh shall see it together.

-Martin Luther King 
28 August 1963
Published on August 27, 2014 23:49

August 5, 2014

[Informatica] Defining Expression Strings In Parameter File

If you have Expression logic that is expected to change frequently, you can parameterize it so that you don't have to update the mapping each time the expression changes.

Suppose you have the Expression logic IIF(inSalary=10000,'Y','N') and you wish to parameterize it. To do so, follow these steps:

1. Define a mapping parameter, e.g. $$Expr, under the Parameters and Variables section. Make sure to set IsExprVar to TRUE and the datatype to STRING. Only then will the Integration Service be able to evaluate the expression at run time.
2. Use this parameter, $$Expr, in the Expression port where you need the logic.
3. In the parameter file, set the expression parameter as follows:
$$Expr= IIF(inSalary=10000,'Y','N')
4. Configure the workflow to use the parameter file.

Similarly, such logic in Aggregators, Filters and Routers can also be parameterized.
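Since the point of this approach is to change the expression without touching the mapping, the parameter file itself is often generated or edited by a script. The sketch below is a hypothetical Python helper that renders one section of a parameter file. The section heading follows the usual folder, workflow (WF:) and session (ST:) convention, but MyFolder, wf_salary and s_salary are made-up names; check the heading format against your PowerCenter version.

```python
def render_parameter_file(section, params):
    """Render one section of an Informatica-style parameter file as text.

    section: heading such as "MyFolder.WF:wf_salary.ST:s_salary" (hypothetical).
    params: mapping of parameter name to value, written as name=value lines.
    """
    lines = [f"[{section}]"]
    for name, value in params.items():
        lines.append(f"{name}={value}")
    return "\n".join(lines) + "\n"
```

Calling `render_parameter_file("MyFolder.WF:wf_salary.ST:s_salary", {"$$Expr": "IIF(inSalary=10000,'Y','N')"})` produces the section heading followed by the `$$Expr=IIF(inSalary=10000,'Y','N')` line; changing the business rule then only means regenerating this file.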

Published on August 05, 2014 20:24