Bioinformatics: UNIX Basics

Jay
Feb 25, 2019

Command line

  • “Unix” and “Command Line” refer to the same thing. Unix originally meant an operating system, but the term is now also used to mean the command line.

The terms “Unix” and “Command Line” are used interchangeably both in this Handbook and in the real world. Technically speaking, Unix is a class of operating system that originally used command line as the primary mode of interaction with the user. As computer technologies have progressed, the two terms have come to refer to the same thing. — biostar handbook

  • Command line: bioinformatics relies on the classic command line rather than a GUI, and it is very powerful (convenient and fast). As shown below, typing a single line of just a few words can accomplish very complex tasks.
cat sraids.txt | parallel fastq-dump -O sra --split-files {}

As for the above — and don’t worry if you can’t follow this yet — it is a way to download all the published sequencing data that corresponds to the ids stored in the sraids.txt file. The command as listed above invokes three other commands:

a funny one named cat ,
a more logical sounding one parallel
and a mysterious fastq-dump .

This series of commands will read the content of the sraids.txt and will feed that line by line into the parallel program that in turn will launch as many fastq-dump programs as there are CPU cores on the machine. This parallel execution typically speeds up processing a great deal as most machines can execute more than one process at a time. — biostar handbook
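The flow described in the quote (read ids line by line, hand each one to its own process, run several at once) can be tried without downloading anything. The following is only a dry-run sketch under stated assumptions: xargs with -P stands in for parallel, echo prints each fastq-dump command instead of running it, and the ids are made-up placeholders.

```shell
# Dry run: xargs stands in for parallel, echo stands in for running fastq-dump.
workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical accession ids, one per line (placeholders, not from the handbook).
printf 'SRR0000001\nSRR0000002\nSRR0000003\n' > sraids.txt

# Hand each line to its own command, up to 2 at a time (-P 2).
# Completion order can differ between runs, so the result is sorted.
planned=$(cat sraids.txt | xargs -n 1 -P 2 echo fastq-dump -O sra --split-files | sort)
echo "$planned"
```

Each printed line is the command that would be launched for one id; removing echo would run the real program, provided fastq-dump is installed.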

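  • Advantages: clarity, shareability, reproducibility, automation
  • Disadvantage: a steep barrier to entry
  • Is learning the command line required?
    - Yes. It is essential if you want to find the most efficient way to solve a problem.
  • Is learning the command line hard?
    - That is the wrong question. Learning the command line is really about breaking a specific problem down into very simple steps. You improve with every round of trial and error, so practice every day.
  • Everyone makes mistakes; focus on how to correct them quickly.
  • How to access the command line: the terminal
  • What is a shell? bash is the best known.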
echo 'Hello World!'

Unix Bootcamp

Man pages:

man ls
man cd
man man
  • man <command> : shows the manual page for <command> in a text viewer (typically less)

Directory level command

ls
ls /bin
ls /
pwd
mkdir <folder>
mkdir -p <pathway>
cd <folder>
  • ls : lists the contents of the current directory
  • ls /bin : lists the contents of /bin, which holds many programs
  • ls / : lists the contents of the root ( / ) directory
  • pwd : print working directory
  • mkdir : make directory
  • mkdir -p : make a directory together with any missing parent directories
  • cd : change directory
cd /step1/step2
cd step1/step2/
  • cd /step1/step2 : change directory using an absolute path, starting from root ( / )
  • cd step1/step2/ : change directory using a relative path, starting from the current location
cd ..
cd ../..
  • cd .. : go up one level
  • cd ../.. : go up two levels
cd ../child
cd ~/child/grandchild
cd /grandparent/parent/child/grandchild
  • cd ../child : go up one level, then down into child (relative path)
  • cd ~/child/grandchild : go to child/grandchild under the home directory ( ~ ), whatever the current location
  • cd /grandparent/parent/child/grandchild : change to an absolute path
cd /
cd ~
cd
cd ~/child/grandchild
  • cd / : to root
  • cd, cd ~ : to home
  • cd ~/child/grandchild : go to child/grandchild under the home directory from anywhere
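To see the difference between absolute and relative paths in practice, here is a small sketch that builds a throwaway directory tree and moves around in it. The tree lives in a temporary directory, and the names parent, child and grandchild are just the placeholders used in these notes.

```shell
# Build a small directory tree in a temporary location.
base=$(mktemp -d)
mkdir -p "$base/parent/child/grandchild"

cd "$base/parent/child/grandchild"   # absolute path: works from anywhere
cd ../..                             # relative path: up two levels
here=$(basename "$PWD")
echo "$here"                         # prints: parent

cd child/grandchild                  # relative path: down two levels
cd ..                                # up one level
there=$(basename "$PWD")
echo "$there"                        # prints: child
```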
ls ../../
  • ls ../../ : list the contents of the directory two levels above the current location
ls -l ~
  • ls -l ~ : long-format listing of the home directory, with more detail per entry than the default
rmdir <folder>
  • rmdir : remove an empty directory (move out of <folder> before removing it)
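A short sketch of how mkdir -p and rmdir behave together. It runs in a temporary directory, and the file name note.txt is made up for the example. The point it illustrates is that rmdir only removes directories that are already empty.

```shell
base=$(mktemp -d)
cd "$base"

mkdir -p step1/step2          # creates step1 and step1/step2 in one go
touch step1/step2/note.txt

# rmdir refuses to remove a directory that still has contents.
if rmdir step1/step2 2>/dev/null; then
    first_try="removed"
else
    first_try="refused"
fi
echo "$first_try"             # prints: refused

rm step1/step2/note.txt       # empty the directory first
rmdir step1/step2             # now it succeeds
```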
history
  • history : show the list of previously entered commands
touch example.txt
  • touch : create empty files, here example.txt
mkdir new_folder
mv example.txt new_folder
  • mv : move example.txt file to new_folder
mv *.txt new_folder
mv *t new_folder
mv *xam* new_folder
  • ( * ) : wildcard character, matches any run of characters. ( ? ) is similar but matches exactly one character.
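A safe way to check what a wildcard will match before using it with mv or rm is to pass the same pattern to echo. A minimal sketch, using made-up file names in a temporary directory:

```shell
base=$(mktemp -d)
cd "$base"
touch example.txt notes.txt data.csv a1 a22

star_txt=$(echo *.txt)    # every name ending in .txt
star_xam=$(echo *xam*)    # every name containing "xam"
one_char=$(echo a?)       # "a" followed by exactly one character

echo "$star_txt"          # prints: example.txt notes.txt
echo "$star_xam"          # prints: example.txt
echo "$one_char"          # prints: a1
```

Note that a? does not match a22, because ? stands for exactly one character.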
touch original_name
mv original_name folder/new_name
mv folder/original_name folder/new_name
  • mv <source> <target> : moves a file, renames it, or both
rm <target>
rm -i <target>
  • rm : remove; irreversible (= dangerous).
  • rm -i : ask for confirmation before each removal
touch file1
cp file1 file2
  • cp : copy a file under a different name
cp ~/folder/file1 ~/folder/folder2/
  • cp <path1>/<file> <path2> : copy the file in path1 to path2
cp -r <source_folder> <target_folder>
  • cp -r : copy a directory recursively (the directory and everything inside it)
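The copy and move commands above can be tried together in a scratch directory. This is a sketch with hypothetical names (folder, folder2, folder_backup, renamed), not a prescribed layout.

```shell
base=$(mktemp -d)
cd "$base"

mkdir -p folder/folder2
echo "some text" > folder/file1

cp folder/file1 folder/folder2/     # copy a file into another directory
cp -r folder folder_backup          # copy the directory and everything in it
mv folder/file1 folder/renamed      # rename a file within the same directory

ls -R folder_backup                 # the backup still holds file1 and folder2/file1
```

The backup was taken before the rename, so it keeps the original name file1.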

View files with more/less/cat

echo "Hello World!"
Hello World!
echo "Hello World!" > hello.txt
  • ( > ) : redirect text into an output file, redirection. This will overwrite any existing file (careful).
more hello.txt
less hello.txt
  • more or less : text viewers
  • keys in less: h : help, space : scroll forward one screen, j : forward one line, k : backward one line, q : quit
echo "Goodbye World" >> hello.txt
  • ( >> ): append to a file.
cat hello.txt
  • cat : displays contents of file or files and returns to command line.
cat hello.txt > hello_copy.txt
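The difference between > and >> is easiest to see side by side. A minimal sketch in a temporary directory; the "Starting over" line is made up to show the overwrite.

```shell
base=$(mktemp -d)
cd "$base"

echo "Hello World!" > hello.txt      # creates hello.txt
echo "Goodbye World" >> hello.txt    # appends a second line
two_lines=$(cat hello.txt)

echo "Starting over" > hello.txt     # overwrites: the earlier two lines are gone
after_overwrite=$(cat hello.txt)

echo "$two_lines"
echo "$after_overwrite"              # prints: Starting over
```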

Counting characters in a file

wc test.txt
wc -l test.txt
  • wc : word count; returns the number of lines, words and characters (bytes).
  • wc -l : returns only the number of lines
(Screenshots: contents of test.txt and its word counts)
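A small sketch of the three counts that wc reports, using two of the sample sentences from these notes. Reading the file through < keeps the file name out of the output, and tr -d ' ' strips the padding that some wc versions add.

```shell
base=$(mktemp -d)
cd "$base"
printf 'It was love at first sight.\nI am an invisible man.\n' > test.txt

lines=$(wc -l < test.txt | tr -d ' ')   # number of lines
words=$(wc -w < test.txt | tr -d ' ')   # number of words
chars=$(wc -c < test.txt | tr -d ' ')   # number of bytes, newlines included

echo "lines=$lines words=$words chars=$chars"   # prints: lines=2 words=11 chars=51
```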

Editing text with nano

nano test.txt
In the nano window, ^X means Ctrl+X (exit).

$PATH, environment variables

echo $USER
echo $HOME
echo $PATH
  • echo $PATH : prints a colon-separated list of the directories the shell searches for runnable programs, for example a ‘bin’ directory.
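The colon-separated value of $PATH is easier to read with one directory per line. The sketch below does that and then asks the shell where it finds one program; ls is used only as an example, and the directories printed will differ from machine to machine.

```shell
# Print each directory on the search path on its own line.
echo "$PATH" | tr ':' '\n'
dir_count=$(echo "$PATH" | tr ':' '\n' | wc -l | tr -d ' ')

# Ask the shell which file it would run for a given command name.
ls_location=$(command -v ls)
echo "$ls_location"
```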

Matching lines in files, grep

Now is the winter of our discontent.
All children, except one, grow up.
The Galactic Empire was dying.
In a hole in the ground there lived a hobbit.
It was a pleasure to burn.
It was a bright, cold day in April, and the clocks were striking thirteen.
It was love at first sight.
I am an invisible man.
It was the day my grandmother exploded.
When he was nearly thirteen, my brother Jem got his arm badly broken at the elbow.
Marley was dead, to begin with.
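  • add the lines above to test.txt with nano
nano test.txt
Ctrl + O to write out (save), then Ctrl + X to exit
grep was test.txt
grep --color=auto was test.txt
  • shows the lines in test.txt that contain ‘was’; --color=auto highlights the matches when printing to a terminal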

Options for grep

  • show lines that match a specified pattern
  • ignore case when matching ( -i )
  • only match whole words ( -w )
  • show lines that don’t match a pattern ( -v )
  • Use wildcard characters and other patterns to allow for alternatives ( * , . , and [] )
grep -v was test.txt
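The options listed above can be compared by counting matches with -c. This is a sketch on a small sample file; the last two lines ("Marley WAS ..." in capitals and "He washed the dishes.") are made up here so that -i and -w give visibly different results.

```shell
base=$(mktemp -d)
cd "$base"
printf '%s\n' \
  'It was a pleasure to burn.' \
  'It was love at first sight.' \
  'I am an invisible man.' \
  'Marley WAS dead, to begin with.' \
  'He washed the dishes.' > sample.txt

plain=$(grep -c was sample.txt)       # "was" anywhere, case-sensitive
nocase=$(grep -ci was sample.txt)     # also matches "WAS"
whole=$(grep -cw was sample.txt)      # whole word only, so "washed" is skipped
inverted=$(grep -cv was sample.txt)   # lines that do NOT contain "was"

echo "plain=$plain nocase=$nocase whole=$whole inverted=$inverted"
# prints: plain=3 nocase=4 whole=2 inverted=2
```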

Combining Unix Commands with pipes

grep was test.txt | wc -c
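counts the characters (bytes) in the lines containing ‘was’ ( wc -c counts characters, not words)
grep was test.txt | sort | head -n 3 | wc -c
the same count, but only for the first three matching lines after sorting
  • example use of grep
lspci
lspci : list all PCI devices
lspci | grep -i vga
check which video graphics card (VGA controller) is present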
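In a pipeline, each command reads the output of the one before it. The sketch below builds the chain one stage at a time on a few of the sample sentences, so the effect of every stage can be checked. It uses wc -l rather than wc -c to keep the result easy to verify by eye.

```shell
base=$(mktemp -d)
cd "$base"
printf '%s\n' \
  'The Galactic Empire was dying.' \
  'It was a pleasure to burn.' \
  'I am an invisible man.' \
  'It was love at first sight.' \
  'Marley was dead, to begin with.' > test.txt

matching=$(grep was test.txt | wc -l | tr -d ' ')                 # lines containing "was"
first_sorted=$(grep was test.txt | sort | head -n 1)              # first line after sorting
kept=$(grep was test.txt | sort | head -n 2 | wc -l | tr -d ' ')  # lines left after head

echo "$matching"       # prints: 4
echo "$first_sorted"   # prints: It was a pleasure to burn.
echo "$kept"           # prints: 2
```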

Miscellaneous Unix power commands (assorted useful commands)

tail -n 20 test.txt | head
  • Show the first 10 of the last 20 lines ( head defaults to -n 10 )
grep "^ATG" file.txt
  • Find lines that begin with the start codon ATG ( ^ anchors the match to the start of the line)
cut -f 3 file.txt | sort -u
  • Cut out the 3rd column of a tab-delimited text file and sort it to only show unique lines (i.e. remove duplicates)
grep -c '[bc]at' file.txt
  • Count the lines containing ‘cat’ or ‘bat’ ( -c counts matching lines)
cat file.txt | tr 'a-z' 'A-Z'
  • Convert lowercase to uppercase; tr is short for ‘transliterate’
cat file.txt | sed 's/Chr1/Chromosome 1/' > file2.txt
  • Replace ‘Chr1’ with ‘Chromosome 1’ (the first occurrence on each line) and write the result to a new file
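The commands above assume a file.txt that is not shown, so here is a sketch that creates small stand-in files and runs a few of them. The tab-delimited records (id, chromosome, feature type) and the two sequences in seqs.txt are hypothetical test data.

```shell
base=$(mktemp -d)
cd "$base"
# Hypothetical tab-delimited records: id, chromosome, feature type.
printf 'g1\tChr1\tgene\ng2\tChr1\texon\ng3\tChr2\tgene\n' > file.txt
# Hypothetical sequences: only the first one starts with ATG.
printf 'ATGCCC\nCCATGC\n' > seqs.txt

unique_types=$(cut -f 3 file.txt | sort -u)           # unique values in column 3
upper_first=$(head -n 1 file.txt | cut -f 2 | tr 'a-z' 'A-Z')
starts=$(grep -c '^ATG' seqs.txt)                     # lines beginning with ATG
sed 's/Chr1/Chromosome 1/' file.txt > file2.txt       # substituted copy
renamed=$(grep -c 'Chromosome 1' file2.txt)

echo "$unique_types"   # prints: exon, then gene, on separate lines
echo "$upper_first"    # prints: CHR1
echo "$starts"         # prints: 1
echo "$renamed"        # prints: 2
```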

References

The Biostar Handbook (biostar-handbook-September-2018.pdf)
