Feature Engineering A-Z
100 🏗️ IDs

100.1 IDs

WIP

100.2 Pros and Cons

100.2.1 Pros

100.2.2 Cons

100.3 R Examples

100.4 Python Examples