Installing Pandas on t2.micro AWS Instance

We use pandas for numerous data analysis tasks in our backend. However, while setting up Jenkins for automated builds and tests, I ran into a problem installing our requirements in a low-memory environment.

Overview of the issue

(.pyenv)[[email protected] MyAPI]# pip install pandas
Collecting pandas
  Using cached pandas-0.18.0.tar.gz
.....
    pandas/algos.c:41172:21: warning: ‘__pyx_pybuffernd_tot_wgt.diminfo[0].strides’ may be used uninitialized in this function [-Wmaybe-uninitialized]
       __Pyx_LocalBuf_ND __pyx_pybuffernd_tot_wgt;
                         ^
    pandas/algos.c:41637:21: warning: ‘__pyx_pybuffernd_tot_wgt.diminfo[0].shape’ may be used uninitialized in this function [-Wmaybe-uninitialized]
               } else if (unlikely(__pyx_t_20 >= __pyx_pybuffernd_tot_wgt.diminfo[0].shape)) __pyx_t_7 = 0;
                         ^
    {standard input}: Assembler messages:
    {standard input}:700931: Warning: end of file not at end of a line; newline inserted
    {standard input}:702091: Error: unknown pseudo-op: `.cfi'
    {standard input}: Error: open CFI at the end of file; missing .cfi_endproc directive
    gcc: internal compiler error: Killed (program cc1)
    Please submit a full bug report,
    with preprocessed source if appropriate.
    See <http://bugzilla.redhat.com/bugzilla> for instructions.
    error: command 'gcc' failed with exit status 4
    
    ----------------------------------------
Command "/var/lib/jenkins/workspace/MyAPI/.pyenv/bin/python2.7 -u -c "import setuptools, tokenize;__file__='/tmp/pip-build-izfM62/pandas/setup.py';exec(compile(getattr(tokenize, 'open', open)(__file__).read().replace('\r\n', '\n'), __file__, 'exec'))" install --record /tmp/pip-fMQFbn-record/install-record.txt --single-version-externally-managed --compile --install-headers /var/lib/jenkins/workspace/MyAPI/.pyenv/include/site/python2.7/pandas" failed with error code 1 in /tmp/pip-build-izfM62/pandas/

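Here gcc is being killed because the instance runs out of memory while compiling the pandas C extensions. The line "gcc: internal compiler error: Killed (program cc1)" means the compiler process was terminated from outside rather than crashing on its own, which on Linux usually points to the kernel's out-of-memory killer. A t2.micro has only 1 GiB of RAM.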
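
To confirm that diagnosis before changing anything, it can help to look at how much RAM and swap the instance has and whether the kernel log mentions an out-of-memory kill. The following is a minimal sketch, not part of the original setup: meminfo_mib is a hypothetical helper name, and the log messages it greps for are the usual wording but may differ between kernel versions.

```shell
#!/bin/sh
# meminfo_mib is a hypothetical helper: print one field of /proc/meminfo
# (stored in kB) converted to whole MiB. The optional second argument
# points at a different file, which is handy for testing.
meminfo_mib() {
    awk -v key="$1:" '$1 == key { printf "%d\n", $2 / 1024 }' "${2:-/proc/meminfo}"
}

echo "RAM:  $(meminfo_mib MemTotal) MiB"
echo "Swap: $(meminfo_mib SwapTotal) MiB"

# Look for OOM-killer entries in the kernel log (reading it may need root).
dmesg 2>/dev/null | grep -i -E 'out of memory|killed process' \
    || echo "no OOM entries found (or dmesg not readable)"
```

If the swap figure is 0 and the kernel log shows cc1 being killed, memory is very likely the cause.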

A solution

Since memory was the issue, we added a swap file to the instance to cover the shortfall while the pandas module compiles.

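# Run as root. Creates and enables a 512 MiB swap file.
mkdir -p /var/cache/swap/
dd if=/dev/zero of=/var/cache/swap/swap0 bs=1M count=512   # allocate 512 MiB of zeros
chmod 0600 /var/cache/swap/swap0                           # readable by root only
mkswap /var/cache/swap/swap0                               # format the file as swap
swapon /var/cache/swap/swap0                               # enable it until the next reboot

With the swap file enabled, the pandas install now finishes instead of gcc being killed: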
(.pyenv)[[email protected] MyAPI]# pip install pandas
Collecting pandas
  Using cached pandas-0.18.0.tar.gz
Requirement already satisfied (use --upgrade to upgrade): python-dateutil in ./.pyenv/lib/python2.7/site-packages (from pandas)
Requirement already satisfied (use --upgrade to upgrade): pytz>=2011k in ./.pyenv/lib/python2.7/site-packages (from pandas)
Collecting numpy>=1.7.0 (from pandas)
  Using cached numpy-1.11.0-cp27-cp27mu-manylinux1_x86_64.whl
Requirement already satisfied (use --upgrade to upgrade): six>=1.5 in ./.pyenv/lib/python2.7/site-packages (from python-dateutil->pandas)
Installing collected packages: numpy, pandas
  Running setup.py install for pandas ... done
Successfully installed numpy pandas
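
Because swapon only lasts until the next reboot, it may be worth checking afterwards whether the swap file is still active and deciding whether to keep it. The sketch below assumes the same path as above; swap_active is a hypothetical helper name, and the removal and /etc/fstab lines are left as comments because they need root and depend on whether you want the swap file to be permanent.

```shell
#!/bin/sh
# swap_active is a hypothetical helper: succeeds if the given file is listed
# as an active swap area. The second argument defaults to /proc/swaps.
swap_active() {
    awk -v f="$1" 'NR > 1 && $1 == f { found = 1 } END { exit !found }' "${2:-/proc/swaps}"
}

SWAP_FILE=/var/cache/swap/swap0

if swap_active "$SWAP_FILE"; then
    echo "$SWAP_FILE is active"
    # To remove it once the build is done (as root):
    #   swapoff "$SWAP_FILE" && rm "$SWAP_FILE"
    # To keep it across reboots instead, the usual /etc/fstab entry is:
    #   /var/cache/swap/swap0 none swap sw 0 0
else
    echo "$SWAP_FILE is not active"
fi
```

On a Jenkins box that rebuilds the virtualenv regularly, keeping the swap file permanently is probably the simpler choice.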

 
