Typical Interview Questions for Hadoop - Bigdata
1. What is Big Data?
Big data refers to voluminous structured, semi-structured, or unstructured data that has great potential for mining but is too large to be processed with traditional database systems.
2. What do the four V's of Big Data denote?
The four V's are the critical features of big data: a) Volume – scale of data b) Velocity – analysis of streaming data c) Variety – different forms of data d) Veracity – uncertainty of data
3. Differentiate between Structured and Unstructured data.
Structured data can be stored in traditional database systems in rows and columns; online purchase transactions are an example. Semi-structured data can be stored only partially in traditional database systems; data in XML records is an example. Unstructured data is unorganized, raw data that cannot be categorized as structured or semi-structured.
4. What are the main components of a Hadoop Application?
The core components of a Hadoop application are: 1) Hadoop Common 2) HDFS 3) Hadoop MapReduce 4) YARN
5. What is Hadoop streaming?
Hadoop Streaming is a utility shipped with the Hadoop distribution that lets you write Map and Reduce jobs in any language that can read from standard input and write to standard output, such as Python, Perl, or Ruby.
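As a rough illustration, a word-count mapper and reducer for Hadoop Streaming could look like the sketch below. The function names `map_lines` and `reduce_lines` and the sample input are hypothetical; the sketch assumes the default streaming convention of tab-separated key/value lines and simulates the shuffle-and-sort step locally with `sorted`.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one tab-separated 'word<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield f"{word.lower()}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts per word. Input must be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Local simulation of map -> sort -> reduce on made-up input.
# In a real streaming job, the mapper and reducer would be separate scripts
# that iterate over sys.stdin and print each record.
sample = ["big data big", "Data lake"]
result = list(reduce_lines(sorted(map_lines(sample))))
```

Because the two steps only exchange plain text lines, the same logic could be written in any language that reads standard input and writes standard output.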
6. What is a block and block scanner in HDFS?
Block - The minimum amount of data that can be read or written is generally referred to as a "block" in HDFS. The default block size is 64 MB in Hadoop 1.x and 128 MB in Hadoop 2.x and later. Block Scanner - The Block Scanner tracks the list of blocks present on a DataNode and verifies them to detect checksum errors.
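To make the effect of the block size concrete, the following sketch computes how many blocks a file occupies. The helper `block_count` and the 500 MB file size are made up for illustration; 128 MB is assumed as the default block size (Hadoop 2.x and later), with 64 MB (Hadoop 1.x) shown for comparison.

```python
import math

MB = 1024 * 1024


def block_count(file_size_bytes, block_size_bytes=128 * MB):
    """Number of HDFS blocks needed to hold a file of the given size."""
    return math.ceil(file_size_bytes / block_size_bytes)


file_size = 500 * MB                              # hypothetical file
blocks_128 = block_count(file_size)               # 128 MB blocks
blocks_64 = block_count(file_size, 64 * MB)       # 64 MB blocks

# The last block only holds the data that is left over.
last_block_mb = (file_size - (blocks_128 - 1) * 128 * MB) // MB
```

With these numbers the file needs 4 blocks of 128 MB (the last one holding 116 MB) or 8 blocks of 64 MB.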
7. Explain the indexing process in HDFS.
The indexing process in HDFS depends on the block size. HDFS stores the last part of the data, which points to the address where the next chunk of data is stored.
8. What happens to a NameNode that has no data?
A NameNode without data does not exist. If a node is a NameNode, it holds some data.
9. Explain the difference between HBase and Hive.
HBase and Hive are completely different Hadoop-based technologies. Hive is a data warehouse infrastructure on top of Hadoop, whereas HBase is a NoSQL key-value store that runs on top of Hadoop. Hive lets SQL-savvy people run MapReduce jobs, whereas HBase supports four primary operations: put, get, scan, and delete.
10. Explain the difference between RDBMS data model and HBase data model.
An RDBMS is a schema-based database, whereas HBase has a schema-less data model. An RDBMS has no built-in support for partitioning, whereas HBase partitions data automatically. An RDBMS stores normalized data, whereas HBase stores de-normalized data.
11. What happens when a user submits a Hadoop job while the NameNode is down: is the job put on hold, or does it fail?
The Hadoop job fails when the NameNode is down.
12. How can Sqoop be used in a Java program?
Include the Sqoop jar in the classpath of the Java program, then invoke the Sqoop.runTool() method. The necessary parameters must be passed to Sqoop programmatically, just as they would be on the command line.
Typical SQL Interview Questions for Software Testing
1. Define join in SQL.
The JOIN keyword is used to fetch data from two or more related tables. An inner join, the default, returns only the rows that have at least one match in both tables; outer joins also keep unmatched rows from one or both tables.
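A small sketch using Python's built-in `sqlite3` module shows the matching behaviour. The `customers` and `orders` tables and their contents are invented for illustration; a LEFT JOIN is included only to contrast it with the inner join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, item TEXT);
    INSERT INTO customers VALUES (1, 'Asha'), (2, 'Ben'), (3, 'Chen');
    INSERT INTO orders VALUES (10, 1, 'book'), (11, 1, 'pen'), (12, 2, 'lamp');
""")

# Inner join: only customers that have at least one matching order.
inner_rows = conn.execute("""
    SELECT c.name, o.item
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.name, o.item
""").fetchall()

# Left join: every customer, with NULL (None) where no order matches.
left_rows = conn.execute("""
    SELECT c.name, o.item
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    ORDER BY c.name, o.item
""").fetchall()
```

Here 'Chen' has no orders, so the inner join omits that customer while the left join returns the row with an empty item.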
2. What is a primary key?
A primary key is a column, or a set of columns, whose values uniquely identify every row in a table. Primary key values can never be duplicated or NULL, and as a matter of good practice they should never be reused.
3. What is NULL in SQL?
A value of NULL is different from an empty or zero value. No two null values are equal. Comparisons between two null values, or between a NULL and any other value, return unknown because the value of each NULL is unknown.
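The comparison rules can be checked with a short sketch in Python's `sqlite3` module, where SQL NULL and the "unknown" result both surface as `None`. The table `t` is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Comparing with NULL yields unknown (None); IS NULL yields true (1).
null_eq_null, null_eq_zero, null_is_null = conn.execute(
    "SELECT NULL = NULL, NULL = 0, NULL IS NULL"
).fetchone()

conn.executescript("""
    CREATE TABLE t (v INTEGER);
    INSERT INTO t VALUES (1), (NULL);
""")

# '= NULL' never matches a row; 'IS NULL' is the correct test.
matches_equals = conn.execute("SELECT COUNT(*) FROM t WHERE v = NULL").fetchone()[0]
matches_is_null = conn.execute("SELECT COUNT(*) FROM t WHERE v IS NULL").fetchone()[0]
```

This is why filters on missing values are written with IS NULL or IS NOT NULL rather than with `=` or `<>`.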
4. What is an identity in SQL?
An identity column in SQL automatically generates numeric values. A start value and an increment can be defined for the identity column.
5. Is it possible for a table to have more than one foreign key?
Yes, a table can have many foreign keys and only one primary key.
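6. What are the possible values for a BOOLEAN data field?
A BOOLEAN data field has two possible values: true and false. How they are stored depends on the database; Microsoft Access, for example, stores them as -1 (true) and 0 (false).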
7. Explain DML and DDL?
DML stands for Data Manipulation Language. INSERT, UPDATE and DELETE are DML statements. DDL stands for Data Definition Language. CREATE, ALTER, DROP and RENAME are DDL statements.
8. Difference between TRUNCATE, DELETE and DROP commands?
DELETE removes some or all rows from a table, depending on the condition. It can be rolled back. TRUNCATE removes ALL rows from a table by de-allocating the data pages. In many databases, Oracle for example, this operation cannot be rolled back. DROP removes the table itself from the database completely.
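The DELETE and DROP behaviour can be sketched with Python's `sqlite3` module. Note the assumptions: the `logs` table is invented, and SQLite has no TRUNCATE statement, so only DELETE (with a rollback) and DROP are shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, msg TEXT)")
conn.executemany("INSERT INTO logs (msg) VALUES (?)", [("a",), ("b",), ("c",)])
conn.commit()

# DELETE removes only the rows that match the condition...
conn.execute("DELETE FROM logs WHERE msg = 'a'")
rows_during = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]

# ...and can be undone before the transaction is committed.
conn.rollback()
rows_after_rollback = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]

# DROP removes the table definition together with its data.
conn.execute("DROP TABLE logs")
table_exists = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'logs'"
).fetchone()[0]
```

After the rollback all three rows are back; after the DROP the table no longer appears in the catalog.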
9. What is difference between UNIQUE and PRIMARY KEY constraints?
A table can have only one PRIMARY KEY whereas there can be any number of UNIQUE keys. Primary key cannot contain Null values whereas Unique key can contain Null values.
10. What is a subquery?
A subquery is a SELECT statement nested inside another query; its return values are used in the filtering conditions of the main query.
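For instance, a subquery can supply the threshold that the outer query filters on. The sketch below uses Python's `sqlite3` module with a hypothetical `employees` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary INTEGER);
    INSERT INTO employees VALUES ('Asha', 50), ('Ben', 70), ('Chen', 90);
""")

# The inner SELECT computes the average salary; the outer query
# keeps only the employees who earn more than that value.
above_average = [
    row[0]
    for row in conn.execute("""
        SELECT name
        FROM employees
        WHERE salary > (SELECT AVG(salary) FROM employees)
        ORDER BY name
    """)
]
```

The average salary in this sample is 70, so only 'Chen' is returned.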
11. Explain the difference between Rename and Alias?
Rename is a permanent new name given to a table or column whereas Alias is a temporary name given to a table or column.
12. What is CTE?
A CTE, or common table expression, is a named temporary result set that is defined with the WITH keyword and exists only for the duration of a single SQL statement.
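A minimal sketch of a CTE, run through Python's `sqlite3` module, is shown below. The `sales` table, the name `dept_totals`, and the threshold of 100 are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (dept TEXT, amount INTEGER);
    INSERT INTO sales VALUES
        ('toys', 60), ('toys', 70), ('books', 40), ('games', 150);
""")

# dept_totals exists only while this one statement runs.
big_departments = conn.execute("""
    WITH dept_totals AS (
        SELECT dept, SUM(amount) AS total
        FROM sales
        GROUP BY dept
    )
    SELECT dept, total
    FROM dept_totals
    WHERE total > 100
    ORDER BY dept
""").fetchall()
```

The main query reads from `dept_totals` as if it were a table, which often reads more clearly than nesting the aggregation as a subquery.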
13. What is the difference between subqueries and joins?
Subqueries: a subquery can be written across two or more tables even when no relationship exists among those tables. Joins: a join across two or more tables requires a relationship among those tables.
14. What is the difference between union and union all?
The UNION operator selects only distinct values by default. To allow duplicate values, use UNION ALL.
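The difference is easy to see on two small tables that share a value. This sketch uses Python's `sqlite3` module; the tables `a` and `b` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (v INTEGER);
    CREATE TABLE b (v INTEGER);
    INSERT INTO a VALUES (1), (2), (2);
    INSERT INTO b VALUES (2), (3);
""")

# UNION removes duplicates from the combined result.
union_rows = [r[0] for r in conn.execute(
    "SELECT v FROM a UNION SELECT v FROM b ORDER BY v")]

# UNION ALL keeps every row from both inputs.
union_all_rows = [r[0] for r in conn.execute(
    "SELECT v FROM a UNION ALL SELECT v FROM b ORDER BY v")]
```

UNION yields 1, 2, 3, while UNION ALL yields 1, 2, 2, 2, 3. Since UNION ALL skips the duplicate-removal step, it is usually the cheaper choice when duplicates are acceptable.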
15. Why can a "group by" or "order by" clause be expensive (time consuming) to process?
Processing a "group by" or "order by" clause often requires creating temporary tables to hold the results of the query, which can be very expensive depending on the size of the result set.
16. How can duplicate records be avoided in a query?
Duplicate records in a query result can be avoided by using the DISTINCT keyword.
17. What is difference between Having clause and Where clause?
Both specify a search condition, but WHERE filters individual rows before grouping, whereas HAVING filters groups after aggregation. The HAVING clause is used only with the SELECT statement and typically together with a GROUP BY clause. If GROUP BY is not used, HAVING behaves like a WHERE clause.
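The following sketch combines both clauses in one query so the order of filtering is visible. It uses Python's `sqlite3` module; the `orders` table and the thresholds are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (customer TEXT, amount INTEGER);
    INSERT INTO orders VALUES
        ('Asha', 10), ('Asha', 200), ('Ben', 50), ('Ben', 60), ('Chen', 5);
""")

# WHERE drops individual rows first (Chen's order of 5);
# HAVING then drops whole groups whose total is too small (Ben's 110).
big_spenders = conn.execute("""
    SELECT customer, SUM(amount) AS total
    FROM orders
    WHERE amount >= 10
    GROUP BY customer
    HAVING SUM(amount) > 150
    ORDER BY customer
""").fetchall()
```

A condition on an aggregate such as SUM(amount) has to go in HAVING, because the aggregate does not exist yet when WHERE is evaluated.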
18. Can a primary key contain more than one column?
Yes. A primary key created on more than one column is called a composite primary key.
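As a sketch, a composite primary key can be declared as a table-level constraint. The example below uses Python's `sqlite3` module with a hypothetical `enrollments` table; the columns are also marked NOT NULL explicitly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL,
        course_id  INTEGER NOT NULL,
        PRIMARY KEY (student_id, course_id)
    )
""")

# Each column may repeat on its own; only the pair must be unique.
conn.execute("INSERT INTO enrollments VALUES (1, 101)")
conn.execute("INSERT INTO enrollments VALUES (1, 102)")
conn.execute("INSERT INTO enrollments VALUES (2, 101)")

try:
    conn.execute("INSERT INTO enrollments VALUES (1, 101)")  # same pair again
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM enrollments").fetchone()[0]
```

The fourth insert repeats an existing (student_id, course_id) pair, so the database rejects it and the table keeps three rows.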
19. Define COMMIT.
COMMIT saves all changes made by DML statements. Once COMMIT is done, the changes cannot be rolled back.
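20. List the various privileges that a user can grant to another user.
Examples include SELECT, CONNECT and RESOURCE. In Oracle, for instance, SELECT is an object privilege, while CONNECT and RESOURCE are predefined roles.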
21. What is the difference between a primary key and a unique key in SQL?
Although both PRIMARY KEY and UNIQUE KEY enforce the uniqueness of values, they differ in the following ways.
1. A primary key does not allow NULL values, whereas a unique key does. How many NULLs a unique key accepts depends on the database: SQL Server allows only one, while several others allow more than one.
2. A table can have only one PRIMARY KEY (on one or more columns), whereas it can have more than one UNIQUE key.
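How a unique key treats NULL varies by database, so the sketch below should be read as SQLite behaviour only: it accepts several NULLs in a UNIQUE column, whereas SQL Server accepts just one. It uses Python's `sqlite3` module, and the `users` table is a made-up example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT UNIQUE)")

# SQLite treats NULLs as distinct, so both inserts succeed.
conn.execute("INSERT INTO users (email) VALUES (NULL)")
conn.execute("INSERT INTO users (email) VALUES (NULL)")
null_count = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email IS NULL").fetchone()[0]

# A repeated non-NULL value violates the UNIQUE constraint.
conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
try:
    conn.execute("INSERT INTO users (email) VALUES ('a@example.com')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True
```

When an answer depends on NULL handling in unique keys, it is worth stating which database is meant.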