The Big Data QA Engineer provides expert guidance and delivers through self and others to:
Review the code produced by Data Engineering to load the data, from local and group sources, onto the shared platform that is required for analysis and for commercial actions;
Review the code produced by Data Engineering to build Big Data applications. The applications must be “fit for purpose”: they have to work as designed and efficiently manage and manipulate large volumes of data, both for ingestion into the Big Data Platform and for the Use Case engineering.
Define, build, automate and execute testing for the code mentioned at points 1 and 2, following agile practice and according to the sprint planning.
Comply with the framework and DevOps approach defined by the Big Data Platform central team, which sets the QA strategy, tools and ways of working to be applied in every Local Market.
Validate, where requested, the models produced by the Data Science team.
Key accountabilities and decision ownership:
Ensure that Data Engineering produces high-performing and stable applications to perform complex processing of massive volumes of data in a multi-tenancy Hadoop environment
Guarantee the quality of end-to-end applications (through review and testing) that make use of large volumes of source data from the operational systems and output insights back to business systems
Proactively work with the Data Engineering (and data extraction) team as one squad to ensure the right quality checks are in place at the various stages of the data integration
Support and contribute, with the central team of Data Engineering/QA/Data Science, to define best practice for the agile development of applications to run on the Big Data Platform, from a QA perspective
Core competencies, knowledge and experience [max 5]:
Proven experience in designing, building and managing applications to process large amounts of data in a Hadoop ecosystem;
Proven experience with performance tuning applications on Hadoop and configuring Hadoop systems to maximise performance;
Experience building systems to perform real-time data processing using Spark Streaming or Storm;
Experience with managing the development life cycle for agile software development projects;
Experience working in a multi-tenancy Hadoop environment;
Expert-level experience with Spark, YARN, Hive and Oozie;
Strong Python and Scala programming ability;
Working knowledge of HBase, Solr, Kafka and Flume;
Some experience with other distributed technologies such as Cassandra, Elasticsearch and Flink would also be desirable;
Test automation and DevOps experience.