Monday, January 12, 2015

[Flume + MongoDB] Flume NG MongoDB Sink plugin

Installing Flume on CentOS:

Visit the Apache Flume Downloads page, copy the URL of the binary tarball for the version you want and download it:
   bash# cd /usr/local
   bash# wget http://apache.rediris.es/flume/1.5.2/apache-flume-1.5.2-bin.tar.gz

Extract the archive, rename the directory and create a version-independent symlink:
   bash# tar xzf apache-flume-1.5.2-bin.tar.gz
   bash# mv apache-flume-1.5.2-bin/ apache-flume-1.5.2
   bash# ln -s apache-flume-1.5.2/ flume
   bash# ls -al flume
   lrwxrwxrwx 1 root root 19 ene  9 16:52 flume -> apache-flume-1.5.2/
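The same download, extract and symlink pattern is repeated later for Maven, so it can be wrapped in a small shell function. This is a minimal sketch: the name install_from_tarball is hypothetical, it keeps the extracted directory name as-is (e.g. apache-flume-1.5.2-bin) instead of renaming it, and it assumes the archive has a single top-level directory, as Apache binary tarballs do.

```bash
#!/bin/bash
# Hypothetical helper: extract a binary tarball into PREFIX and point a
# stable symlink (e.g. /usr/local/flume) at the extracted directory.
install_from_tarball() {
    local tarball=$1   # e.g. apache-flume-1.5.2-bin.tar.gz
    local prefix=$2    # e.g. /usr/local
    local link=$3      # e.g. flume
    local topdir
    # The first archive entry gives the top-level directory name.
    topdir=$(tar tzf "$tarball" | head -n 1 | cut -d/ -f1)
    [ -n "$topdir" ] || return 1
    mkdir -p "$prefix" || return 1
    tar xzf "$tarball" -C "$prefix" || return 1
    # Relative target, so the link keeps working if PREFIX is moved;
    # -n replaces an existing link instead of descending into it.
    ln -sfn "$topdir" "$prefix/$link"
}
# Usage: install_from_tarball apache-flume-1.5.2-bin.tar.gz /usr/local flume
```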

Set the environment variables:
   bash# vim /etc/profile.d/flume.sh
   ... (add the following lines)
   export FLUME_HOME=/usr/local/flume
   export PATH=$PATH:$FLUME_HOME/bin

   bash# source /etc/profile
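If you prefer not to open an editor, the profile script can also be written non-interactively. A minimal sketch (write_flume_profile is a hypothetical name): the second line is single-quoted so that $PATH and $FLUME_HOME are written literally and only expanded when a login shell sources the file.

```bash
#!/bin/bash
# Write the two export lines of /etc/profile.d/flume.sh without an editor.
# The target path is a parameter so it can be tried on a scratch file first.
write_flume_profile() {
    local target=$1      # e.g. /etc/profile.d/flume.sh
    local flume_home=$2  # e.g. /usr/local/flume
    # FLUME_HOME is expanded now; the PATH line is kept literal.
    printf '%s\n' "export FLUME_HOME=$flume_home" \
        'export PATH=$PATH:$FLUME_HOME/bin' > "$target"
}
# Usage (as root):
#   write_flume_profile /etc/profile.d/flume.sh /usr/local/flume
#   source /etc/profile.d/flume.sh
```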

Edit the configuration and point JAVA_HOME at your JDK:
   bash# cd flume
   bash# cp conf/flume-env.sh.template conf/flume-env.sh
   bash# vim conf/flume-env.sh
      ... (edit the following line)
   JAVA_HOME=/usr/java/default
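The same edit can be scripted with sed. A sketch, assuming GNU sed (as on CentOS) and a template where the variable appears as a commented-out line such as "# export JAVA_HOME=..."; set_env_var is a hypothetical helper, it writes the "export NAME=VALUE" form, and VALUE must not contain "|" or "&", which sed would interpret.

```bash
#!/bin/bash
# Set (or add) "export NAME=VALUE" in a shell env file such as flume-env.sh.
# An existing line for NAME, commented out or not, is replaced in place;
# otherwise a new line is appended at the end of the file.
set_env_var() {
    local file=$1 name=$2 value=$3
    local pattern="^[[:space:]]*#?[[:space:]]*(export[[:space:]]+)?${name}="
    if grep -Eq "$pattern" "$file"; then
        # "|" is the sed delimiter because VALUE is usually a path.
        sed -E -i "s|${pattern}.*|export ${name}=${value}|" "$file"
    else
        printf 'export %s=%s\n' "$name" "$value" >> "$file"
    fi
}
# Usage: set_env_var /usr/local/flume/conf/flume-env.sh JAVA_HOME /usr/java/default
```

The same helper could also set FLUME_CLASSPATH in step 4.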

The Flume NG MongoDB Sink plugin

The flume-ng-mongodb-sink plugin's own page only lists the five steps needed to install and configure it, without going into any detail about them.

My aim here is to go through them one by one.
  1. Clone the repository
    If you are behind a proxy, remember to configure git accordingly first:
    bash# git config --global http.proxy http://user:password@host:port
    bash# git config --global https.proxy https://user:password@host:port

    bash# cd /usr/local
    bash# git clone https://github.com/leonlee/flume-ng-mongodb-sink.git
  2. Install latest Maven and build source by 'mvn package'
    Visit the Apache Maven project site => Downloads and copy the URL of the binary tarball for the version you want.

    bash# wget http://ftp.cixug.es/apache/maven/maven-3/3.2.5/binaries/apache-maven-3.2.5-bin.tar.gz

    bash# tar xzf apache-maven-3.2.5-bin.tar.gz
    bash# ln -s apache-maven-3.2.5 maven
    bash# ls -al maven

    lrwxrwxrwx 1 user group 18 ene  9 16:10 maven -> apache-maven-3.2.5

    bash# vim /etc/profile.d/maven.sh
    ... (add the following lines)
    export M2_HOME=/usr/local/maven
    export PATH=$PATH:$M2_HOME/bin


    bash# source /etc/profile
    bash# mvn -version     (check that it is installed correctly)
    Apache Maven 3.2.5...

    Only if you are behind a proxy: create ~/.m2/settings.xml with the following content, adapted to your environment:
    bash# mkdir -p ~/.m2
    bash# vi ~/.m2/settings.xml

    <settings>
      <proxies>
        <proxy>
          <active>true</active>
          <protocol>http</protocol>
          <host>host.domain</host>
          <port>port</port>
          <username>username</username>
          <password>password</password>
          <nonProxyHosts></nonProxyHosts>
        </proxy>
      </proxies>
    </settings>

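When the same proxy settings have to be repeated on several machines, the file can be generated from variables. A sketch only: the function write_maven_proxy_settings and the variables PROXY_HOST, PROXY_PORT, PROXY_USER and PROXY_PASS are hypothetical names, the 8080 default port is an arbitrary assumption, and values must not contain XML special characters such as < or &.

```bash
#!/bin/bash
# Hypothetical helper: write a Maven settings.xml with a single HTTP proxy.
write_maven_proxy_settings() {
    local target=$1
    if [ -z "${PROXY_HOST:-}" ]; then
        echo "PROXY_HOST is not set" >&2
        return 1
    fi
    mkdir -p "$(dirname "$target")" || return 1
    cat > "$target" <<EOF
<settings>
  <proxies>
    <proxy>
      <active>true</active>
      <protocol>http</protocol>
      <host>${PROXY_HOST}</host>
      <port>${PROXY_PORT:-8080}</port>
      <username>${PROXY_USER:-}</username>
      <password>${PROXY_PASS:-}</password>
      <nonProxyHosts></nonProxyHosts>
    </proxy>
  </proxies>
</settings>
EOF
    # The file may hold a password, so keep it readable by its owner only.
    chmod 600 "$target"
}
# Usage: PROXY_HOST=host.domain PROXY_PORT=3128 write_maven_proxy_settings ~/.m2/settings.xml
```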

    bash# cd flume-ng-mongodb-sink
    bash# mvn package
    [INFO] Scanning for projects...
    [INFO]                                                                      
    [INFO] ------------------------------------------------------------------------
    [INFO] Building Flume NG MongoDB sink 1.0.0
    [INFO] ------------------------------------------------------------------------
    Downloading: https://repo.maven.apache.org/maven2/org/apache/maven/plugins/maven-resources-plugin/2.6/maven-resources-plugin-2.6.pom
    ...
    [INFO] Building jar: /usr/local/flume-ng-mongodb-sink/target/flume-ng-mongodb-sink-1.0.0.jar
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 31.738 s
    [INFO] Finished at: 2015-01-09T16:33:15+01:00
    [INFO] Final Memory: 14M/139M
    [INFO] ------------------------------------------------------------------------

  3. Generate classpath by 'mvn dependency:build-classpath'
    bash# mvn dependency:build-classpath
    [INFO] Scanning for projects...
    [INFO]                                                                         
    [INFO] ------------------------------------------------------------------------
    [INFO] Building Flume NG MongoDB sink 1.0.0
    [INFO] ------------------------------------------------------------------------
    [INFO] 
    [INFO] --- maven-dependency-plugin:2.8:build-classpath (default-cli) @ flume-ng-mongodb-sink ---
    [INFO] Dependencies classpath:
    /root/.m2/repository/org/apache/flume/flume-ng-sdk/1.3.0/flume-ng-sdk-1.3.0.jar:/root/.m2/repository/org/apache/avro/avro/1.7.2/avro-1.7.2.jar:/root/.m2/repository/org/codehaus/jackson/jackson-core-asl/1.8.8/jackson-core-asl-1.8.8.jar:/root/.m2/repository/org/codehaus/jackson/jackson-mapper-asl/1.8.8/jackson-mapper-asl-1.8.8.jar:/root/.m2/repository/com/thoughtworks/paranamer/paranamer/2.3/paranamer-2.3.jar:/root/.m2/repository/org/xerial/snappy/snappy-java/1.0.4.1/snappy-java-1.0.4.1.jar:/root/.m2/repository/org/apache/avro/avro-ipc/1.7.2/avro-ipc-1.7.2.jar:/root/.m2/repository/org/apache/velocity/velocity/1.7/velocity-1.7.jar:/root/.m2/repository/commons-collections/commons-collections/3.2.1/commons-collections-3.2.1.jar:/root/.m2/repository/commons-lang/commons-lang/2.4/commons-lang-2.4.jar:/root/.m2/repository/io/netty/netty/3.4.0.Final/netty-3.4.0.Final.jar:/root/.m2/repository/org/apache/flume/flume-ng-core/1.3.0/flume-ng-core-1.3.0.jar:/root/.m2/repository/org/apache/flume/flume-ng-configuration/1.3.0/flume-ng-configuration-1.3.0.jar:/root/.m2/repository/com/google/guava/guava/10.0.1/guava-10.0.1.jar:/root/.m2/repository/com/google/code/findbugs/jsr305/1.3.9/jsr305-1.3.9.jar:/root/.m2/repository/commons-io/commons-io/2.1/commons-io-2.1.jar:/root/.m2/repository/log4j/log4j/1.2.16/log4j-1.2.16.jar:/root/.m2/repository/commons-cli/commons-cli/1.2/commons-cli-1.2.jar:/root/.m2/repository/joda-time/joda-time/2.1/joda-time-2.1.jar:/root/.m2/repository/org/mortbay/jetty/servlet-api/2.5-20110124/servlet-api-2.5-20110124.jar:/root/.m2/repository/org/mortbay/jetty/jetty-util/6.1.26/jetty-util-6.1.26.jar:/root/.m2/repository/org/mortbay/jetty/jetty/6.1.26/jetty-6.1.26.jar:/root/.m2/repository/com/google/code/gson/gson/2.2.2/gson-2.2.2.jar:/root/.m2/repository/org/apache/mina/mina-core/2.0.4/mina-core-2.0.4.jar:/root/.m2/repository/org/slf4j/slf4j-api/1.6.1/slf4j-api-1.6.1.jar:/root/.m2/repository/org/slf4j/slf4j-log4j12/1.6.1/slf4j-log4j12-1.6.1.jar:/root/.m2/repository/org/testng/testng/6.7/testng-6.7.jar:/root/.m2/repository/junit/junit/4.10/junit-4.10.jar:/root/.m2/repository/org/hamcrest/hamcrest-core/1.1/hamcrest-core-1.1.jar:/root/.m2/repository/org/beanshell/bsh/2.0b4/bsh-2.0b4.jar:/root/.m2/repository/com/beust/jcommander/1.12/jcommander-1.12.jar:/root/.m2/repository/org/yaml/snakeyaml/1.6/snakeyaml-1.6.jar:/root/.m2/repository/org/mongodb/mongo-java-driver/2.10.1/mongo-java-driver-2.10.1.jar:/root/.m2/repository/com/googlecode/json-simple/json-simple/1.1.1/json-simple-1.1.1.jar
    [INFO] ------------------------------------------------------------------------
    [INFO] BUILD SUCCESS
    [INFO] ------------------------------------------------------------------------
    [INFO] Total time: 1.062 s
    [INFO] Finished at: 2015-01-09T16:35:47+01:00
    [INFO] Final Memory: 14M/211M
    [INFO] ------------------------------------------------------------------------
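Rather than copying jar paths out of the console output, the maven-dependency-plugin can write the classpath to a file through its outputFile parameter (-Dmdep.outputFile=...). Given that file, a small helper can pick out individual jars, for example the MongoDB Java driver the sink depends on. jars_matching is a hypothetical name, and the sketch assumes the ":"-separated single-line format that build-classpath produces on Linux.

```bash
#!/bin/bash
# Print the classpath entries whose jar file name starts with PATTERN.
# CPFILE holds one ":"-separated line, as written by
#   mvn dependency:build-classpath -Dmdep.outputFile=classpath.txt
jars_matching() {
    local cpfile=$1 pattern=$2
    # One entry per line, then match on the file name part only.
    tr ':' '\n' < "$cpfile" | grep -E "/${pattern}[^/]*\.jar$"
}
# Usage:
#   cd /usr/local/flume-ng-mongodb-sink
#   mvn dependency:build-classpath -Dmdep.outputFile=classpath.txt
#   jars_matching classpath.txt mongo-java-driver
```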
  4. Append classpath in $FLUME_HOME/conf/flume-env.sh

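    bash# vim /usr/local/flume/conf/flume-env.sh
          ... (edit the following line)
       FLUME_CLASSPATH=/usr/local/flume-ng-mongodb-sink/target/classes/

    Note that this adds only the plugin's compiled classes, not the dependencies listed in step 3, so the MongoDB Java driver is still missing from Flume's classpath. This is what causes the NoClassDefFoundError in step 6.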

  5. Add the sink definition according to Configuration
    bash# vim /usr/local/flume/conf/flume-conf.properties
    ... (add the following lines)
    agent1.sources = source1
    agent1.channels = channel1
    agent1.sinks = sinkMongo

    # Source config
    agent1.sources.source1.type = netcat
    agent1.sources.source1.bind = <serverName>
    agent1.sources.source1.port = 1982
    agent1.sources.source1.channels = channel1

    # Channel Config
    agent1.channels.channel1.type = memory
    agent1.channels.channel1.capacity = 1000000
    agent1.channels.channel1.transactionCapacity = 800
    agent1.channels.channel1.keep-alive = 3

    # Flume NG MongoDB Sink config
    agent1.sinks.sinkMongo.channel = channel1
    agent1.sinks.sinkMongo.type = org.riderzen.flume.sink.MongoSink
    agent1.sinks.sinkMongo.host = <serverName>
    agent1.sinks.sinkMongo.port = 27017
    agent1.sinks.sinkMongo.model = single
    agent1.sinks.sinkMongo.db = events
    agent1.sinks.sinkMongo.collection = events
    agent1.sinks.sinkMongo.batch = 100
    agent1.sinks.sinkMongo.timestampField = "yyyy-MM-dd HH:mm:ss"
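Before starting the agent it can save time to check that every component listed in agent1.sources, agent1.channels and agent1.sinks has a matching .type property, since a typo in a component name is easy to miss. This is a rough sketch (check_flume_components is a hypothetical name) that only understands simple one-line "key = value" entries; it is not Flume's own configuration validator.

```bash
#!/bin/bash
# Report every source, channel or sink declared for AGENT that has no
# "<agent>.<kind>.<name>.type" line. Returns 1 if anything is missing.
check_flume_components() {
    local file=$1 agent=$2 kind names name missing=0
    for kind in sources channels sinks; do
        # Value of "agent.kind = a b c" (space-separated component names).
        names=$(sed -n -E "s/^[[:space:]]*${agent}\.${kind}[[:space:]]*=[[:space:]]*//p" "$file")
        for name in $names; do
            if ! grep -Eq "^[[:space:]]*${agent}\.${kind}\.${name}\.type[[:space:]]*=" "$file"; then
                echo "missing: ${agent}.${kind}.${name}.type" >&2
                missing=1
            fi
        done
    done
    return "$missing"
}
# Usage: check_flume_components /usr/local/flume/conf/flume-conf.properties agent1
```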
  6. TEST
    First, try to start the Flume agent:
    bash# flume-ng agent -n agent1 -f /usr/local/flume/conf/flume-conf.properties -c /usr/local/flume/conf
    Info: Sourcing environment configuration script /usr/local/flume/conf/flume-env.sh
    + exec /usr/java/default/bin/java -Xmx20m -cp '/usr/local/flume/conf:/usr/local/flume/lib/*:/usr/local/flume-ng-mongodb-sink/target/classes' -Djava.library.path= org.apache.flume.node.Application -n agent1 -f /usr/local/flume/conf/flume-conf.properties


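    The agent appears to have started, but to confirm it I recommend checking the contents of /usr/local/flume-ng-mongodb-sink/logs/flume.log. (The log ends up there because Flume's default log4j.properties writes to ./logs, relative to the directory the agent is started from, and we were still inside /usr/local/flume-ng-mongodb-sink.)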

    bash# more /usr/local/flume-ng-mongodb-sink/logs/flume.log

    If you see the following error:

    "09 ene 2015 16:45:27,366 ERROR [conf-file-poller-0] (org.apache.flume.node.PollingPropertiesFileConfigurationProvider$FileWatcherRunnable.run:145)  - Failed to start agent because dependencies were not found in classpath. Error follows.
    java.lang.NoClassDefFoundError: com/mongodb/BasicDBObject"
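The class named in the error points at the missing jar. This sketch (missing_classes is a hypothetical name) pulls every class reported in a NoClassDefFoundError out of a log and converts the JVM's slash-separated form into the usual dotted class name, which is easier to search for.

```bash
#!/bin/bash
# List the distinct classes reported as missing in a Flume log,
# e.g. com/mongodb/BasicDBObject -> com.mongodb.BasicDBObject
missing_classes() {
    local log=$1
    grep -o 'NoClassDefFoundError: [A-Za-z0-9_/$]*' "$log" \
        | sed -e 's/^NoClassDefFoundError: //' -e 's#/#.#g' \
        | sort -u
}
# Usage: missing_classes /usr/local/flume-ng-mongodb-sink/logs/flume.log
```

For the error above it prints com.mongodb.BasicDBObject, a class from the MongoDB Java driver.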


    then copy the MongoDB Java driver jar into Flume's lib directory:

    bash# find / -name "*mongo*.jar"
    /root/.m2/repository/org/mongodb/mongo-java-driver/2.10.1/mongo-java-driver-2.10.1.jar
    bash# cp /root/.m2/repository/org/mongodb/mongo-java-driver/2.10.1/mongo-java-driver-2.10.1.jar /usr/local/flume/lib
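Copying the driver into $FLUME_HOME/lib works, but it mixes third-party jars with Flume's own. Since Flume 1.4 the flume-ng script also picks up plugins from $FLUME_HOME/plugins.d/<plugin>/, with the plugin jar in lib/ and its dependencies in libext/. A sketch of installing the sink that way, using the jar built by mvn package in step 2; install_flume_plugin and the directory name mongodb-sink are hypothetical choices.

```bash
#!/bin/bash
# Install a third-party plugin into Flume's plugins.d layout:
#   $FLUME_HOME/plugins.d/<name>/lib     -> the plugin jar
#   $FLUME_HOME/plugins.d/<name>/libext  -> its dependency jars
install_flume_plugin() {
    local flume_home=$1 name=$2 plugin_jar=$3
    shift 3
    local dir="$flume_home/plugins.d/$name"
    mkdir -p "$dir/lib" "$dir/libext" || return 1
    cp "$plugin_jar" "$dir/lib/" || return 1
    # Any remaining arguments are dependency jars (e.g. the MongoDB driver).
    if [ "$#" -gt 0 ]; then
        cp "$@" "$dir/libext/" || return 1
    fi
}
# Usage:
#   install_flume_plugin /usr/local/flume mongodb-sink \
#     /usr/local/flume-ng-mongodb-sink/target/flume-ng-mongodb-sink-1.0.0.jar \
#     /root/.m2/repository/org/mongodb/mongo-java-driver/2.10.1/mongo-java-driver-2.10.1.jar
```

With this layout, the FLUME_CLASSPATH entry from step 4 and the copy into lib should no longer be needed.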

    Start the agent again:
    bash# flume-ng agent -n agent1 -f /usr/local/flume/conf/flume-conf.properties -c /usr/local/flume/conf
    Info: Sourcing environment configuration script /usr/local/flume/conf/flume-env.sh
    + exec /usr/java/default/bin/java -Xmx20m -cp '/usr/local/flume/conf:/usr/local/flume/lib/*:/usr/local/flume-ng-mongodb-sink/target/classes' -Djava.library.path= org.apache.flume.node.Application -n agent1 -f /usr/local/flume/conf/flume-conf.properties


    bash# tail /usr/local/flume-ng-mongodb-sink/logs/flume.log
    ...
    09 ene 2015 16:55:09,825 INFO  [lifecycleSupervisor-1-0] (org.apache.flume.source.NetcatSource.start:164)  - Created serverSocket:sun.nio.ch.ServerSocketChannelImpl[/:1982]
    09 ene 2015 16:55:09,845 INFO  [lifecycleSupervisor-1-3] (org.riderzen.flume.sink.MongoSink.start:134)  - Started MongSink_0.
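Instead of re-running tail by hand, a small polling loop can wait until the sink reports that it has started. A sketch (wait_for_log is a hypothetical name); the pattern MongoSink.start matches the class and method shown in the log line above, since the message text itself is spelled "MongSink".

```bash
#!/bin/bash
# Poll LOG until a line matching PATTERN appears or TIMEOUT seconds pass.
# Returns 0 when the pattern is found, 1 on timeout.
wait_for_log() {
    local log=$1 pattern=$2 timeout=${3:-30} waited=0
    until grep -q -- "$pattern" "$log" 2>/dev/null; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}
# Usage: wait_for_log /usr/local/flume-ng-mongodb-sink/logs/flume.log 'MongoSink.start' 60
```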

    From another terminal, open a connection (telnet or nc) to the server and port configured as the data source. Each message must be a JSON document on a single line; the netcat source turns every line into one event and answers OK. For example:
    bash# nc <serverName> 1982
    {"evento":"Hello World!"}
    OK
    {"evento":"prueba", "severtidad":"INFO"}
    OK
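For repeatable tests, the events can be kept in a file (one JSON document per line) and piped into nc. A sketch with a hypothetical helper send_events; blank lines are dropped so that no empty events are sent. Whether nc exits by itself once its input ends depends on the netcat variant installed, so you may need to stop it with Ctrl+C.

```bash
#!/bin/bash
# Send every non-blank line of FILE to the Flume netcat source at HOST:PORT.
send_events() {
    local host=$1 port=$2 file=$3
    grep -v '^[[:space:]]*$' "$file" | nc "$host" "$port"
}
# Usage:
#   printf '%s\n' '{"evento":"Hello World!"}' > events.json
#   send_events <serverName> 1982 events.json
```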

    If we now connect to MongoDB and inspect the "events" collection in the "events" database, we should see the following:
    bash# mongo <serverName>/events

    MongoDB shell version: 2.6.6
    connecting to: <serverName>/events

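    > show collections
    events
    system.indexes

    > db.events.find()
    { "_id" : ObjectId("54b3a21fe4b0f77f70218bcb"), "evento" : "Hello World!", ""yyyy-MM-dd HH:mm:ss"" : ISODate("2015-01-12T10:29:48.258Z") }
    { "_id" : ObjectId("54b3a21fe4b0f77f70218bcc"), "evento" : "prueba", "severtidad" : "INFO", ""yyyy-MM-dd HH:mm:ss"" : ISODate("2015-01-12T10:29:48.258Z") }

    Note that the timestamp was stored under a field literally named "yyyy-MM-dd HH:mm:ss", quotes included: the sink uses the value of timestampField as the field name, not as a date format. A plain name such as timestamp in step 5 would give cleaner documents.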

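The same check can be scripted with the mongo shell's --quiet and --eval options, for example to count the documents written by the sink. A sketch; count_events is a hypothetical name, and db.getCollection() is used so the collection name can be passed as a plain string.

```bash
#!/bin/bash
# Print the number of documents in DB.COLL on HOST using the mongo shell.
# --quiet suppresses the connection banner, so only the count is printed.
count_events() {
    local host=$1 db=$2 coll=$3
    mongo --quiet --eval "print(db.getCollection('$coll').count())" "$host/$db"
}
# Usage: count_events <serverName> events events
```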
2 comments:

  1. Congratulations on the blog! A question: how did you get this logs folder to be generated? more /usr/local/flume-ng-mongodb-sink/logs/flume.log


    1. Hi Alexander,

      Thank you very much.

      As for your question, Flume creates the logs folder automatically as soon as you run the agent. By default it is a logs directory under the directory from which you start the agent (so $FLUME_HOME/logs if you launch it from $FLUME_HOME), but this can be customized in the configuration file $FLUME_HOME/conf/log4j.properties.

      I hope this helps.

      Thanks, and sorry for the late reply.

      Best regards,
      BigData-Apuntes.
