Pandas Best Practice

Data Manipulation

Dedup DataFrame

Sometimes we want to drop all the duplicated rows in our DataFrame, and we can use the drop_duplicates() function. For example:

    df = pd.DataFrame({
        'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
        'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
        'rating': [4, 4, 3.5, 15, 5]
    })
    df

         brand style  rating
    0  Yum Yum   cup     4.0
    1  Yum Yum   cup     4....
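A minimal sketch of the main drop_duplicates() options, reusing the sample DataFrame from the excerpt. The `subset` parameter limits which columns are compared, and `keep` chooses which copy survives ('first', 'last', or False to drop every duplicated row). The result variable names are illustrative only.

```python
import pandas as pd

# Same sample data as in the excerpt.
df = pd.DataFrame({
    'brand': ['Yum Yum', 'Yum Yum', 'Indomie', 'Indomie', 'Indomie'],
    'style': ['cup', 'cup', 'cup', 'pack', 'pack'],
    'rating': [4, 4, 3.5, 15, 5],
})

# Default: a row is a duplicate only if every column matches; the first copy is kept.
all_cols = df.drop_duplicates()

# subset=: compare only the listed columns when deciding what counts as a duplicate.
by_brand = df.drop_duplicates(subset=['brand'])

# keep='last': keep the last row of each duplicate group instead of the first.
last_per_pair = df.drop_duplicates(subset=['brand', 'style'], keep='last')

# keep=False: drop every row that has a duplicate anywhere in the frame.
no_dups_at_all = df.drop_duplicates(keep=False)

print(all_cols.index.tolist())        # [0, 2, 3, 4]
print(by_brand.index.tolist())        # [0, 2]
print(last_per_pair.index.tolist())   # [1, 2, 4]
print(no_dups_at_all.index.tolist())  # [2, 3, 4]
```

Each call returns a new DataFrame and leaves df unchanged, so the original index labels show exactly which rows were kept.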

March 1, 2019 · 2 min · 330 words · Eric

PostgreSQL Best Practice

Table Creation

Add essential field-checking rules when creating a table:

    CREATE TABLE IF NOT EXISTS time (
        start_time TIMESTAMP CONSTRAINT time_pk PRIMARY KEY,
        hour INT NOT NULL CHECK (hour >= 0),
        day INT NOT NULL CHECK (day >= 0),
        week INT NOT NULL CHECK (week >= 0),
        month INT NOT NULL CHECK (month >= 0),
        year INT NOT NULL CHECK (year >= 0),
        weekday VARCHAR NOT NULL
    );

Column Referencing

If you already know that some columns are foreign keys referencing other tables, you can declare that when creating your table....
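A self-contained sketch of what these constraints buy you. Because a PostgreSQL server cannot be bundled into a snippet, this swaps in Python's built-in sqlite3 module as a stand-in: a CHECK rule rejects a negative hour, and a REFERENCES clause rejects a row pointing at a missing parent. The child table `events` and its columns are hypothetical. Note that SQLite, unlike PostgreSQL, only enforces foreign keys after `PRAGMA foreign_keys = ON`; in PostgreSQL the column would be declared the same way, e.g. `start_time TIMESTAMP REFERENCES time (start_time)`.

```python
import sqlite3


def build_db():
    """Create an in-memory SQLite stand-in for the post's PostgreSQL schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite (unlike PostgreSQL) ignores foreign keys unless this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    # Trimmed version of the `time` table, keeping one CHECK rule.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS time (
            start_time TIMESTAMP CONSTRAINT time_pk PRIMARY KEY,
            hour INT NOT NULL CHECK (hour >= 0),
            weekday VARCHAR NOT NULL
        )
    """)
    # Hypothetical child table: start_time must already exist in time.
    conn.execute("""
        CREATE TABLE IF NOT EXISTS events (
            event_id INTEGER PRIMARY KEY,
            start_time TIMESTAMP NOT NULL REFERENCES time (start_time)
        )
    """)
    return conn


def try_insert(conn, sql, params):
    """Run one INSERT; return True on success, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql, params)
        return True
    except sqlite3.IntegrityError:
        return False


conn = build_db()
ts = "2018-11-01 00:00:00"
insert_time = "INSERT INTO time VALUES (?, ?, ?)"
insert_event = "INSERT INTO events (start_time) VALUES (?)"

ok_time = try_insert(conn, insert_time, (ts, 0, "Thursday"))
bad_hour = try_insert(conn, insert_time, ("2018-11-01 01:00:00", -1, "Thursday"))
ok_event = try_insert(conn, insert_event, (ts,))
orphan = try_insert(conn, insert_event, ("1999-01-01 00:00:00",))

print(ok_time, bad_hour, ok_event, orphan)  # True False True False
```

Letting the database reject bad rows this way catches data errors at write time instead of leaving them for downstream queries to trip over.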

March 1, 2019 · 2 min · 299 words · Eric

Tips and tricks of Jupyter Notebook  [draft]

References: 28 Jupyter Notebook tips, tricks, and shortcuts; Nazif Berat, Boosting Your Jupyter Notebook Productivity

September 29, 2018 · 1 min · 15 words · Eric

Best practices of AWS EMR  [draft]

Reference: AWS Big Data Blog, Best practices for resizing and automatic scaling in Amazon EMR

July 3, 2018 · 1 min · 15 words · Eric