There was an error in this gadget

Tuesday, November 17, 2015

Programming languages 101: Python containers vs. C++ containers

Hi all,
    I am always fascinated about programming languages, computer science and technology in general. At a dinner with my friends, we were discussing about programming languages Python, R, Fortran, C/C++ and data science. At a particular time, we were moving into an topic with a simple question how to implement linked-list in Python. For an experienced programmer like me, it is quite ridiculous for me to ask this question, because we do not need it in most cases. I think that Python will take care of basic data structures for you and you just use it. Python uses C to implement basic data structures for you, called CPython. And if you are crazy enough or want to reinvent the wheel or implement a new data structure, you can implement it in C/C++ and bind it to Python.
    So, I spent that a whole night to rethink about a new question: What underlying implementation of Python container list [] and what is the equivalence containers/implementations in C++?
    At first, I think the equivalent container in C++ would be vector. In the end, it has a slight  difference in their implementation. The implementation of Python list uses a contiguous array of references to other objects, and keeps a pointer to this array and the array’s length in a list head structure. Some explanations can found at [1], [2]. The intuitive idea is represented through this simple picture.
Courtesy from Laurent Luce [2]
The first left column represents array of pointers to keep tracking of Python objects. The right column represents Python objects itself. Any insertion, append, removal of elements would affect arrangements of the first column (array of pointers), but not on the second column (Python objects). In fact, moving expensive Python objects around is more than costly than its pointers. In contrast, C++ vector utilizes array as underlying storage mechanism. In a sense, Python list [] is better than C++ vector in the insert operation. In particular, if you insert an element in the middle of the array, the Python list just only need to shift references of Python objects, while C++ need to shift the C++ objects to new positions. The efficiency, you can imagine, is usually between moving heavy class objects and moving slight pointers. However, C++ offers more containers that are suitable your need. For example, if you want to insert objects frequently, it is efficient to use C++ lists (implemented by a double linked-list).
typedef struct {
    Py_ssize_t ob_size;

    /* Vector of pointers to list elements.  list[0] is ob_item[0], etc. */
    PyObject **ob_item;

    /* ob_item contains space for 'allocated' elements.  The number
     * currently in use is ob_size.
     * Invariants:
     *     0 <= ob_size <= allocated
     *     len(list) == ob_size
     *     ob_item == NULL implies ob_size == allocated == 0
    Py_ssize_t allocated;
} PyListObject;

    For me, Python is still new. And I think the best way to learn and understand a language is to ask simple and stupid questions like this. I have recently admired the development of Python language because it is motivated and evolved from exciting huge efforts of an open-source community.
    Of course, the container are quite an important concept of a programming language. The container in Python includes list, tupple, dict, string and something else. The container in C++11 includes vector, list (equivalent to Python list), forward_list (single linked-list), map (similar concept to Python dict), unorder_map, deque (double linked-list queue), string, and something else.
    Today, I have motivations to write blogs in Vietnamese. I decided that this blog is bilingual, i.e., English and Vietnamese. For me, it is easier to write technical contents in English, although Vietnamese is my mother tongue. It is because I tried to adapt to technical contents around Internet in English when I first came to learn computer science in university. Now, I find it hard to describe some technical concept in Vietnamese, especially with fast-pace changing field like Computer Science (CS). Overall, it is helpful because it is fun to contribute back to the community.


Sunday, July 19, 2015

OpenCV vs Matlab in Image Processing

OpenCv and Matlab are frequently used in image processing and computer vision because of their well-supported from OpenCv developers and Mathworks. In this post, I would like to highlight some differences in some basic concepts in memory layout, model, math expressions and etc. These concepts are very vital for programmers who use these tools.

1. Memory layout of an image
  • OpenCV supports only interleaved images (packing multiple channels one after the other for each pixel). This is more ridiculous and take notices when reading pixel values of opencv image.
  • Always to like this in scanning an image in Opencv 
    int channels = I.channels();

    int nRows = I.rows;
    int nCols = I.cols * channels;

    if (I.isContinuous())
        nCols *= nRows;
        nRows = 1;

    int i,j;
    uchar* p;
    for( i = 0; i < nRows; ++i)
        p = I.ptr<uchar>(i);
        for ( j = 0; j < nCols; ++j)
            p[j] = table[p[j]];
  • Matlab all of the channels clustered into image planes with the planes placed one after another. There is one advantage that matlab is homogenous, and single invariant to deal with.
2. Memory model
  • OpenCV is row-major
  • Matlab is column-major.

3. Matrix allocation and copy
  • Opencv: Do not expect a deep copy of an assignment operation in OpenCv. It just copies the header of Mat object.

  • Matlab: Somehow, Matlab will delay the deep copy until necessarily.
Ex: a = rand(5000);
      b = a;                // still not copy from a
      a(1) = 1;            // a will make a deep copy of itself and modify first element to 1.

This is the only way to implement Mat in opencv, because Mat will be shared between many variables rather than deep copying around, which is heavily prohibited for large object in memory.

4. Math experssions:
Although cv::Mat supports some mathematical expressions, it is very limited. Matlab is very intuitive and expressive language of math.

5. Matrix dimension in Opencv:
Semantic meaning x ~ width, y ~ height. But the first dim is y, second dim is x. 
For example, to access elements in position (x, y) in Mat, we should do<int>(y, x).
Similarly, Matlab also uses the same notations.

6. Continuity of memory layout:
Memory layout for a matrix in Matlab is continuous, column after column. However, in Opencv, it can be separate or continuous. We can check its continuity by calling function Mat::isContinuous() in Mat objects.

7. Size vs. Size
OpenCV returns size() method as Size(cols, rows). This is inconsistent with Mat constructor Mat(rows, cols, type). While Matlab is always consistent, a matrix will return size as index dimensions of matrix (0, 1, 2, ...).
The problem with OpenCV that it has notations of width, height with width ~ x ~ cols, height ~ y ~ rows. Remembering this correspondence is problematic.
REMEMBER: Size(x, y).

Selamat Hari Raya

Tuesday, November 25, 2014

What happened if GCC search for libraries in C/C++

Hi all,
In some beautiful days, we wonder how GCC search for standard libraries like Cool! It really matters for those guys who works on developments of new standard libraries. They want to have both versions of libraries at the same time. It is a controversial topics. After I googled it, here is the summary I found.

There are two definitions denoted as ld you need to differentiate: linker in GCC and loader in linux. ld can refer to ld (GNU linker) (at compiling time). It can also refer to loader in linux (at running time).

How to know search path during linking and running time? 

During linking of GCC, we can use this instruction:
1/ You can do this by executing the following command:
ld --verbose | grep SEARCH_DIR | tr -s ' ;' \\012

2/ gcc passes a few extra -L paths to the linker, which you can list with the following command:
gcc -print-search-dirs | sed '/^lib/b 1;d;:1;s,/[^/.][^/]*/\.\./,/,;t 1;s,:[^=]*=,:;,;s,;,;  ,g' | tr \; \\012

The answers suggesting to use and ldconfig are not correct because they refer to the paths searched by the runtime dynamic linker (i.e. whenever a program is executed), which is not the same as the path searched by ld (i.e. whenever a program is linked).

The most important keynote: GCC search -L paths first, then the default library (i.e, 2 first, then 1).

During run-time:
ldconfig -v 2>/dev/null | grep -v ^$'\t'

Wednesday, May 21, 2014

what happened after # in C/C++

Hi all,
If you ever want to know what will happen after including a header, what does GCC look for, or where does it find these headers, or then can I read this header files in Linux. Awesom.......e!!!
If you feel awesome, you need to know a little bit about it.
Well, GCC will look in default "include" folder in Linux such as

If you mention -Idir option in the command line or makefile scripts, it will search your directory first, then the default folder. For example, if you want to search for headers in current working folder, then use "-I."
More information,

Friday, April 4, 2014

HOW-TO: things after fresh Ubuntu 12.04 Desktop installation as a computing server.

Hi all,
After installing Ubuntu 12.04 Desktop, you should install some packages which is very useful in long term. All of these tips, I collect from Internet. It's totally up to purpose of your machine. My machine is up to running experiments, workstation usages, not for entertainments.
1. If you occasionally build applications from source or make your own deb files, here are some basic packages you should install:

sudo apt-get install build-essential automake make checkinstall dpatch patchutils autotools-dev debhelper quilt fakeroot xutils lintian cmake dh-make libtool autoconf git git-core subversion bzr

2. I install Ubuntu on a workstation as a server. At home or office, you have a personal Windows machine. So, remote desktop logging into Linux workstation from a Windows computer is very important. I recommend you use X11rdp, because you can copy text or files between Windows and Linux computers. Other remote desktop protocols such as VNC don't have this feature. This feature is very useful for developments latter.

3. SSH connection.
If you are a fan of terminal and SSH, you should set up a SSH server on your workstation. Then, it is vital for a connection from Linux machine to a Linux machine w/o using remote desktop logging.

4. If you have a GPU, setting up NVIDIA driver and CUDA development kit are enlightened.

5. Sharing is important. Samba is the most useful way to share files between Linux and Linux machine. It is also compatible with Windows machine.
sudo apt-get install samba.

Tuesday, March 25, 2014

Peekaboo: Machine Learning Toolkits

Peekaboo: Machine Learning Toolkits: Wow. So much to read today . While following link upon link, I found so many great toolkits that I think it is worth listing them here. On...

Peekaboo: Favourite Code from Andy's blog

Peekaboo: Favourite Code: In gerneral I use Python with Numpy/Scipy/Matplotlib for most of my experiments, so most of these are python tools. Here are some projects ...