Monday, October 31, 2011

Using SSD for high-frequency trading

BiTMICRO made a number of introductions and announcements in 1999 around flash-based SSDs including an 18 GB 3.5 in SSD.[19] Fusion-io announced a PCIe-based SSD with 100,000 input/output operations per second (IOPS) of performance in a single card with capacities up to 320 gigabytes in 2007.[20] At Cebit 2009, OCZ demonstrated a 1 terabyte (TB) flash SSD using a PCI Express ×8 interface. It achieves a maximum write speed of 654 megabytes per second (MB/s) and maximum read speed of 712 MB/s.[21] In December 2009, Micron Technology announced the world's first SSD using a 6 gigabits per second (Gbit/s) or 600 (MB/s) SATA interface.[22]
PCI attached IO Accelerator SSD


from http://en.wikipedia.org/wiki/Solid-state_drive

We can use an SSD in place of a normal HDD in the OS for better performance wherever read/write IO is involved.
Please note that an SSD can achieve roughly 100,000 reads/writes per second, which is about 10 microseconds per operation as long as each transfer is smaller than about 7 KB (the break-even size implied by the 712 MB/s throughput figure). This speed is still not comparable to reading/writing DDR SDRAM. And Samsung has developed the first DDR4 RAM, which is even faster.

About high-frequency trading

I have been working on a high-frequency trading platform for the last 3 years. Currently most of our competitors quote round-trip latencies on the order of single-digit microseconds within their own systems, that is, from market tick received to order leaving for the exchange.

The fastest round-trip time I have heard of is achieved using FPGA technology, with a latency of only 2 microseconds.

Below is a rough description of the speeds involved in high-frequency trading, but do not be fooled by the latency numbers they provide. We can have further discussion and idea sharing in my following posts. A basic guideline is to make your market-tick-to-order path as short as possible. Beyond the software itself, different combinations of OS, network cards, routers, and switches have a huge impact on system performance. An FPGA board has the advantage of bypassing the OS completely.

High-frequency trading now revolves around microseconds and even nanoseconds. Picoseconds are on the horizon.
WHAT’S IN A SECOND?
    1 millisecond (ms) = one thousandth of a second
    1 microsecond (us) = one millionth of a second
    1 nanosecond (ns) = one billionth of second
    1 picosecond (ps) = one-trillionth of a second
A fast trader can type and submit perhaps five trades in a minute, said Paul Michaud, a trading and risk management specialist at the software group of International Business Machines Corp <IBM.N> in Houston. In those 60 seconds, exchange systems and black boxes will soon be able to transmit 60 million trades, he said.
“Generally people view we’re in a race to zero here. I mean literally we’re in a race to zero. Speed of light is actually an issue for a lot of our clients,” Michaud said.  
“An electrical signal can travel down a wire 200 meters in one microsecond,” said Greg Allen, vice president of governance, architecture and planning at TMX Group Inc <X.TO>, parent of the Toronto Stock Exchange.
“A blink of the eye is about 200 milliseconds,” Allen said. “The fastest exchange or ATS (alternative trading system) would’ve been in the range of 5 milliseconds,” referring to trading venues built up to five years ago. “Now, the best ones claim to be around 500 microseconds — so half a millisecond.”

Saturday, October 8, 2011

Stack vs Heap Allocation

Quote:
Stack vs Heap Allocation

We conclude our discussion of storage class and scope by briefly describing how the memory of the computer is organized for a running program. When a program is loaded into memory, it is organized into three areas of memory, called segments: the text segment, stack segment, and heap segment. The text segment (sometimes also called the code segment) is where the compiled code of the program itself resides. This is the machine language representation of the program steps to be carried out, including all functions making up the program, both user-defined and system.

The remaining two areas of system memory are where storage may be allocated by the compiler for data storage. The stack is where memory is allocated for automatic variables within functions. A stack is a Last In First Out (LIFO) storage device where new storage is allocated and deallocated at only one end, called the top of the stack.

When a program begins executing in the function main(), space is allocated on the stack for all variables declared within main(). If main() calls a function, func1(), additional storage is allocated for the variables in func1() at the top of the stack. Notice that the parameters passed by main() to func1() are also stored on the stack. If func1() were to call any additional functions, storage would be allocated at the new top of the stack. When func1() returns, storage for its local variables is deallocated, and the top of the stack returns to its previous position. If main() were to call another function, storage would be allocated for that function at the new top. As can be seen, the memory allocated in the stack area is used and reused during program execution. It should be clear that memory allocated in this area will contain garbage values left over from previous usage.

The heap segment provides more stable storage of data for a program; memory allocated in the heap remains in existence for the duration of a program. Therefore, global variables (storage class external), and static variables are allocated on the heap. The memory allocated in the heap area, if initialized to zero at program start, remains zero until the program makes use of it. Thus, the heap area need not contain garbage.
In case I forget the difference between heap and stack.

C++ General: How is floating point represented?

http://www.codeguru.com/forum/showthread.php?t=323835


Another very interesting article discussing integer security:

http://www.codeguru.com/cpp/sample_chapter/article.php/c11111

 

C++ High Performance Dynamic Typing

 
One is boost::any.
Another is from http://www.codeproject.com/KB/cpp/dynamic_typing.aspx

Both can wrap a variable of any type in an "any" holder.

Fast Memory Management

For example, suppose we have an array of 1 million records, where each record is a structure.

How can we efficiently identify an empty slot for use?

We can use the same trick Oracle uses:
use every bit of an integer to store the status flag of one element of the array.

An unsigned integer is normally 32 bits, so for 1 million records we need a total of 1,000,000 / 32 = 31,250 integers; with 64-bit integers we can halve that again.

When we look for an empty slot, we first check whether an integer's value is 4294967295 (UINT_MAX, i.e. all bits set).
If it is less than UINT_MAX, we search this unsigned integer bit by bit to find a bit with a zero value.
If it is equal to UINT_MAX, we move on to the next integer.

Scanning 31,250 unsigned integers is still quite a lot, so perhaps we can find a more efficient method.

Once we find an empty slot and use it, we should carry its index along whenever the record's information has to be passed between processes via message queues or other channels; otherwise, repeatedly searching such a big array could be a painful experience.

How to represent a floating-point number in binary?

The answer to the OP's question can be found in the C
standard (to which the C++ standard delegates) and is
reasoned on the following *symmetric* floating-point
model, described in (C99) 5.2.4.2.2:
x = s b^e Sum[f_k b^(-k), {k, 1, p}]
where
x: Any floating-point number
s: Sign (+1 or -1)
b: Base/radix of exponent repr.
e: Exponent, where e_min <= e <= e_max
p: Precision
f_k: Nonnegative integer digits
using Mathematica notation as described in
http://documents.wolfram.com/mathematica/functions/Sum
and where _ and ^ denote subscripts and
superscripts, respectively.
This obvious symmetry of floating point values
relative to the sign is also demanded *not* to be
disturbed by subnormals, infinities and/or NaN's,
which follows by further constraints described in
5.2.4.2.2/3

Sequential processing will always be faster

Consider some code that sums a square array:
[code]   for (row = 0; row < N; ++row)
      for (col = 0; col < N; ++col)
         sum += A[row][col];[/code] 
Or you can do it the other way round:
[code]   for (col = 0; col < N; ++col)
      for (row = 0; row < N; ++row)
         sum += A[row][col];[/code]
So does it matter? Indeed it does!
In C++ arrays are stored row-wise in contiguous memory. So if you traverse the array rows first you will traverse the data sequentially. That means that the next data point you need is likely to be in pipeline, cache, RAM, and the next hard drive sector before you need it. But if you traverse columns first then you will be repeatedly reading just one item from each row before reading from the next row. As a result your system's caching and lookahead will fail, and you will waste a lot of time waiting for RAM, which is about three to five times slower than cache. Even worse, you may wind up waiting for the hard drive, which can be millions of times slower than RAM if accessed in large random jumps.
How much time is wasted? On my machine summing a billion bytes row-wise takes 9 seconds, whereas summing them column-wise takes 51 seconds. The less sequential your data access, the slower your program will run.